Parsing

This page documents the parsing process used by mappyfile to parse Mapfiles. mappyfile uses lark as the parsing engine.

Mapfile Keywords

Links to the keywords that are used within Mapfiles:

Mapfiles

Details on the structure of the Mapfile can be found at https://mapserver.org/mapfile/#notes:

  • The Mapfile is NOT case-sensitive.

  • Strings containing non-alphanumeric characters or a MapServer keyword MUST be quoted. It is recommended to put ALL strings in double-quotes.

  • The Mapfile has a hierarchical structure, with the MAP object being the root. All other objects fall under this one.

  • Comments are designated with a #.

  • C-style comments were added in 2017: https://github.com/mapserver/mapserver/pull/5362 - both single line (e.g. /* foo */) and multi-line comments work.
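The two comment styles above can be illustrated with a short standard-library sketch. It is not how mappyfile handles comments (that is done by the lark grammar); the function name strip_comments is hypothetical, and the sketch assumes quoted strings never contain an escaped quote character.

```python
import re

# Quoted strings come first in the alternation so that a '#' or '/*'
# inside quotes is never mistaken for the start of a comment.
_PATTERN = re.compile(
    r'(?P<string>"[^"]*"|\'[^\']*\')'   # quoted string (assumes no escaped quotes)
    r"|(?P<block>/\*.*?\*/)"            # C-style comment, possibly multi-line
    r"|(?P<line>#[^\n]*)",              # hash comment, runs to the end of the line
    re.DOTALL,
)


def strip_comments(text):
    """Remove # and /* */ comments, leaving quoted strings untouched."""
    def keep_strings(match):
        # Strings are copied through unchanged; comments become empty text.
        return match.group(0) if match.group("string") else ""
    return _PATTERN.sub(keep_strings, text)


source = 'NAME "a#b" # trailing comment\n/* multi\nline */ DATA "lakes.shp"\n'
cleaned = strip_comments(source)
```

Matching strings and comments in a single pass is what keeps the # inside "a#b" from being treated as a comment.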

Hierarchy

A summary of all the main Mapfile components is shown below. These are directives of the form TYPE ... END.

_images/map_classes.png

The LAYER type has been split out into its own diagram due to its more complex nature:

_images/layer_classes.png

Mapfile Notes

This section details the various declaration types found in a Mapfile.

  • Quoted strings. Used for quoted property values e.g.

    NAME "Layer1"
    DATA "lakes.shp"
    
  • Non-quoted lists. E.g. a POINTS block can be defined as follows:

    POINTS
        0 100
        100 200
        40 90
    END
    
  • Quoted lists. Used for property lists that should be quoted. E.g. the PROJECTION block can be defined as follows:

    PROJECTION
        'proj=utm'
        'ellps=GRS80'
        'datum=NAD83'
        'zone=15'
        'units=m'
        'north'
        'no_defs'
    END
    
  • Key-value lists:

    PROCESSING "BANDS=1"
    PROCESSING "CONTOUR_ITEM=elevation"
    PROCESSING "CONTOUR_INTERVAL=20"
    
  • Key-double-value lists. As above but there are two strings for each directive:

    CONFIG MS_ERRORFILE "stderr"
    CONFIG "PROJ_DEBUG" "OFF"
    CONFIG "ON_MISSING_DATA" "IGNORE"
    
  • Composite types. Container declarations that finish with the keyword END. Examples:

    MAP ... END
    LAYER ... END
    CLASS ... END
    STYLE ... END
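As an illustration of the key-value list type, the sketch below turns the strings from repeated PROCESSING directives into a dictionary. This is only one possible way to post-process such values and is not necessarily how mappyfile represents them; parse_processing is a hypothetical helper.

```python
def parse_processing(values):
    """Split 'KEY=VALUE' strings into a dict; a repeated key keeps its last value."""
    result = {}
    for item in values:
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"not a KEY=VALUE string: {item!r}")
        result[key.strip()] = value.strip()
    return result


processing = parse_processing(
    ["BANDS=1", "CONTOUR_ITEM=elevation", "CONTOUR_INTERVAL=20"]
)
```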
    

Including Files

The parser allows for files (containing further Mapfile declarations) referenced in the Mapfile to be loaded and parsed. Notes on the INCLUDE directive can be found at https://mapserver.org/mapfile/include.html:

  • Includes may be nested, up to 5 deep.

  • File locations can be given as a full path to the file, or as a path relative to the Mapfile.

  • If a string is provided to the parse method, then an optional root_folder parameter can be used to resolve relative paths.

MAP
    NAME "include_mapfile"
    EXTENT 0 0 500 500
    SIZE 250 250

    INCLUDE "test_include_symbols.map"
    INCLUDE "C:\Includes\test_include_layer.map"
END
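The include rules above can be sketched with the standard library alone. This is an illustration, not mappyfile's implementation: expand_includes and MAX_INCLUDE_DEPTH are hypothetical names, INCLUDE is assumed to sit on its own line, and every relative path is assumed to resolve against a single root_folder.

```python
import os
import re
import tempfile

# Matches a line such as:  INCLUDE "layers.map"  (the keyword is case-insensitive)
_INCLUDE = re.compile(r"""^\s*INCLUDE\s+(["'])(?P<path>.+?)\1\s*$""", re.IGNORECASE)

MAX_INCLUDE_DEPTH = 5  # nesting limit quoted in the notes above


def expand_includes(text, root_folder, depth=0):
    """Return text with every INCLUDE line replaced by the included file's lines."""
    lines = []
    for line in text.splitlines():
        match = _INCLUDE.match(line)
        if match is None:
            lines.append(line)
            continue
        if depth >= MAX_INCLUDE_DEPTH:
            raise ValueError("INCLUDE nesting is deeper than %d" % MAX_INCLUDE_DEPTH)
        path = match.group("path")
        if not os.path.isabs(path):
            # Assumption: relative paths are resolved against root_folder.
            path = os.path.join(root_folder, path)
        with open(path, encoding="utf-8") as f:
            included = f.read()
        lines.extend(expand_includes(included, root_folder, depth + 1).splitlines())
    return "\n".join(lines)


# Usage: write a small include file to a temporary folder and expand it.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "symbols.map"), "w", encoding="utf-8") as f:
        f.write('SYMBOL\n    NAME "circle"\nEND\n')
    expanded = expand_includes('MAP\n    INCLUDE "symbols.map"\nEND', root_folder=folder)
```

Tracking the depth explicitly also stops a file that includes itself from recursing forever.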

Parsing Notes

The Mapfile has a very flexible syntax. This section points out some of those syntax features, explains their significance for parsing, and details the solutions used to accommodate them.

Unquoted Strings

Most programming languages insist that all strings are quoted. Unquoted strings can lead to a lot of ambiguity, as they do in the Mapfile format. For example, in the line:

TYPE LINE

It is unclear to the lexer (short for “lexical analyzer”, the component responsible for converting a Mapfile into tokens) whether LINE is a keyword like TYPE or a string. In this case it is a string, but it is left to the parser to disambiguate it, which is not always a simple process.

In our parser, we simply allow attribute names to appear as values. In post-processing, they are treated the same as strings.
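The idea can be sketched with a toy lexer. This is a simplified illustration rather than the lark grammar mappyfile uses: the KEYWORDS set is deliberately tiny, and lex and attribute are hypothetical helpers.

```python
import re

# A deliberately tiny keyword set for illustration; the real format has many more.
KEYWORDS = {"TYPE", "LINE", "NAME", "END", "LAYER"}

# Double-quoted string, single-quoted string, or a run of bare characters.
_TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[^\s"\']+')


def lex(text):
    """Yield (kind, value) pairs; keywords and bare strings look alike to the lexer."""
    for match in _TOKEN.finditer(text):
        value = match.group(0)
        if value[0] in "\"'":
            yield ("STRING", value[1:-1])
        elif value.upper() in KEYWORDS:
            yield ("KEYWORD", value.upper())
        else:
            yield ("BARE", value)


def attribute(tokens):
    """Post-process a two-token attribute: whatever sits in value position is a string."""
    (_, name), (_, value) = tokens
    return {name.lower(): value}


tokens = list(lex("TYPE LINE"))
result = attribute(tokens)
```

Both tokens come out of the lexer tagged as keywords. Only the post-processing step, which knows that the second token sits in value position, turns LINE into an ordinary string.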

Composite and Attribute Ambiguity

Two composite names, STYLE and SYMBOL, are also attribute names. For example:

# a style block
STYLE
    OUTLINECOLOR 0 255 0
END

QUERYMAP
    # a style attribute
    STYLE SELECTED
    COLOR 255 0 0
END

The above example is not a problem to parse, but it becomes very tricky when compounded by the next issue: line breaks.

Resolving the SYMBOL ambiguity issue required the use of an interactive LALR parser. See this commit for details.
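For the simpler STYLE case only, the sketch below shows how a single token of lookahead can separate the two uses. It is not the approach mappyfile takes, which relies on the parser itself; classify_style is a hypothetical helper, and it assumes the STYLE attribute of QUERYMAP only takes the values NORMAL, HILITE or SELECTED.

```python
# Values assumed to be accepted by the STYLE attribute of QUERYMAP.
QUERYMAP_STYLES = {"NORMAL", "HILITE", "SELECTED"}


def classify_style(tokens, index):
    """Return 'attribute' or 'composite' for the STYLE token at position index."""
    if tokens[index].upper() != "STYLE":
        raise ValueError("expected a STYLE token")
    # Peek at the next token, if there is one.
    following = tokens[index + 1].upper() if index + 1 < len(tokens) else ""
    return "attribute" if following in QUERYMAP_STYLES else "composite"


block = "STYLE OUTLINECOLOR 0 255 0 END".split()
querymap = "QUERYMAP STYLE SELECTED COLOR 255 0 0 END".split()
kinds = (classify_style(block, 0), classify_style(querymap, 1))
```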

Line-Break Fluidity

On the surface, the Mapfile format appears very consistent in its use of line breaks, but in practice a lot of variation is allowed. For example:

STYLE  COLOR 255 0 0  END

Containers can be placed completely on one line, but also partially:

LAYER DEBUG 5
GROUP "default"
...
END

In this example, both attributes belong to LAYER, but only one of them is on the same line.

This last example combines all three issues to create a high level of ambiguity. It is impossible to know whether LAYER here is a composite or an attribute; only by looking much further ahead could a smart parser figure it out.
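One consequence of this fluidity is that a tokenizer cannot use line breaks as delimiters. The sketch below ignores comments and quoted strings and simply splits on whitespace, to show that a one-line and a multi-line STYLE block produce the same token stream. The function name is hypothetical.

```python
def whitespace_tokens(text):
    """Split on any run of whitespace, treating newlines like spaces."""
    return text.split()


one_line = "STYLE  COLOR 255 0 0  END"
multi_line = """
STYLE
    COLOR 255 0 0
END
"""

same_tokens = whitespace_tokens(one_line) == whitespace_tokens(multi_line)
```

Since both layouts yield identical tokens, any disambiguation has to come from the grammar and not from the position of line breaks.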