This page documents the parsing process used by mappyfile to parse Mapfiles. mappyfile uses lark as the parsing engine.
Links to the keywords that are used within Mapfiles:
All the tokens used by MapServer are listed in the following file: https://github.com/mapserver/mapserver/blob/main/maplexer.l
Keywords are also listed at: https://mapserver.org/mapfile/index.html
There is a Sphinx RegexLexer for code formatting at https://github.com/mapserver/docs/blob/main/conf.py (see bottom of file)
Details on the structure of the Mapfile can be found at https://mapserver.org/mapfile/#notes:
The Mapfile is NOT case-sensitive
Strings containing non-alphanumeric characters or a MapServer keyword MUST be quoted. It is recommended to put ALL strings in double-quotes.
The Mapfile has a hierarchical structure, with the MAP object being the root. All other objects fall under this one.
Comments are designated with a #.
C-style comments were added in 2017: https://github.com/mapserver/mapserver/pull/5362 - both single line (e.g. /* foo */) and multi-line comments work.
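The two comment styles in the notes above can be illustrated with a small stand-alone sketch. It uses only Python's `re` module and is not how mappyfile or MapServer handles comments; `strip_comments` is a hypothetical helper, and escaped quotes inside strings are deliberately not handled.

```python
import re

# Alternatives are tried left to right, so quoted strings are matched first
# and any comment markers inside them are left alone.
_COMMENT_OR_STRING = re.compile(
    r'''
    (?P<string>"[^"]*"|'[^']*')    # quoted string, kept unchanged
  | (?P<block>/\*.*?\*/)           # C-style comment, may span several lines
  | (?P<line>\#[^\n]*)             # hash comment, runs to the end of the line
    ''',
    re.VERBOSE | re.DOTALL,
)


def strip_comments(text):
    """Remove # and /* */ comments while leaving quoted strings untouched."""
    def keep_strings(match):
        # Comments are replaced by nothing; strings are put back as they were
        return match.group("string") or ""
    return _COMMENT_OR_STRING.sub(keep_strings, text)


source = 'NAME "a#b" # trailing\n/* multi\nline */ TYPE LINE\n'
print(strip_comments(source))
```

Matching strings and comments in one pattern avoids treating the `#` in `"a#b"` as the start of a comment.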
A summary of all the main Mapfile components is shown below. These are directives that are in the form
LAYER type has been split out into its own diagram due to its more complex nature:
This section details the various declaration types found in a Mapfile.
Quoted strings. Used for quoted property values e.g.
    NAME "Layer1"
    DATA "lakes.shp"
Non-quoted lists. E.g. a POINTS block can be defined as follows:
POINTS 0 100 100 200 40 90 END
Quoted lists. Used for property lists that should be quoted. E.g. the PROJECTION block can be defined as follows:
    PROJECTION
        'proj=utm'
        'ellps=GRS80'
        'datum=NAD83'
        'zone=15'
        'units=m'
        'north'
        'no_defs'
    END
Key-value lists. A directive that is repeated, with one quoted string each time. E.g.:
    PROCESSING "BANDS=1"
    PROCESSING "CONTOUR_ITEM=elevation"
    PROCESSING "CONTOUR_INTERVAL=20"
Key-double-value lists. As above but there are two strings for each directive:
    CONFIG MS_ERRORFILE "stderr"
    CONFIG "PROJ_DEBUG" "OFF"
    CONFIG "ON_MISSING_DATA" "IGNORE"
Composite types. Container declarations which finish with the keyword END. Examples:
    MAP ... END
    LAYER ... END
    CLASS ... END
    STYLE ... END
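As an illustration of how the repeated directives above differ in shape, the sketch below groups `PROCESSING` lines (one string each) into a list and `CONFIG` lines (two strings each) into a dictionary. The function name and the choice of list and dict are assumptions made for this example; they are not taken from mappyfile's own data model.

```python
import re


def collect_directives(text):
    """Group repeated PROCESSING and CONFIG lines from a Mapfile fragment."""
    processing = []  # one string per PROCESSING line
    config = {}      # a key and a value per CONFIG line
    for line in text.splitlines():
        # Quoted strings stay whole; everything else splits on whitespace
        parts = re.findall(r'"[^"]*"|\S+', line)
        if not parts:
            continue
        keyword = parts[0].upper()  # the Mapfile is not case-sensitive
        args = [part.strip('"') for part in parts[1:]]
        if keyword == "PROCESSING" and len(args) == 1:
            processing.append(args[0])
        elif keyword == "CONFIG" and len(args) == 2:
            config[args[0]] = args[1]
    return processing, config


fragment = '''
PROCESSING "BANDS=1"
PROCESSING "CONTOUR_ITEM=elevation"
CONFIG MS_ERRORFILE "stderr"
CONFIG "PROJ_DEBUG" "OFF"
'''
print(collect_directives(fragment))
```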
The parser allows for files (containing further Mapfile declarations) referenced in the Mapfile to be loaded and parsed. Notes on the INCLUDE directive can be found at https://mapserver.org/mapfile/include.html:
Includes may be nested, up to 5 deep.
File locations can be given as a full path to the file, or as a path relative to the Mapfile
If a string is provided to the parse method, then an optional root_folder parameter can be used to work with relative paths.
    MAP
        NAME "include_mapfile"
        EXTENT 0 0 500 500
        SIZE 250 250
        INCLUDE "test_include_symbols.map"
        INCLUDE "C:\Includes\test_include_layer.map"
    END
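The include rules above can be sketched as a recursive text expansion. This is a minimal illustration, not mappyfile's implementation: `expand_includes` is a hypothetical helper, it only recognises an INCLUDE that sits alone on its line, and it assumes nested relative paths are resolved against the same `root_folder`.

```python
import os
import re
import tempfile

MAX_INCLUDE_DEPTH = 5  # nesting limit stated in the INCLUDE notes
_INCLUDE = re.compile(
    r'^\s*INCLUDE\s+["\'](?P<path>[^"\']+)["\']\s*$', re.IGNORECASE
)


def expand_includes(text, root_folder, depth=0):
    """Return text with each INCLUDE line replaced by the file it names."""
    lines = []
    for line in text.splitlines():
        match = _INCLUDE.match(line)
        if match is None:
            lines.append(line)
            continue
        if depth >= MAX_INCLUDE_DEPTH:
            raise ValueError("INCLUDE nested more than %d deep" % MAX_INCLUDE_DEPTH)
        path = match.group("path")
        if not os.path.isabs(path):
            # Relative paths are resolved against root_folder (an assumption here)
            path = os.path.join(root_folder, path)
        with open(path) as handle:
            included = handle.read()
        lines.extend(expand_includes(included, root_folder, depth + 1).splitlines())
    return "\n".join(lines)


# Usage: write a throwaway include file, then expand a Mapfile string.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "symbols.map"), "w") as handle:
    handle.write('SYMBOL\n  NAME "circle"\nEND\n')
expanded = expand_includes('MAP\n  INCLUDE "symbols.map"\nEND', demo_dir)
print(expanded)
```

The depth counter also stops a file that includes itself, which would otherwise recurse without end.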
The Mapfile has a very flexible syntax. This section points out some of those syntax features, explains their significance to parsing, and details the solutions used to accommodate them.
Most programming languages insist that all strings are quoted. Unquoted strings can lead to a lot of ambiguity, as it does in the Mapfile format. For example, in the line:
It is unclear to the lexer (short for “lexical analyzer”, the component responsible for converting a Mapfile into tokens) whether LINE is a command like TYPE, or a string. In this case of course it’s a string, but it’s left to the parser to disambiguate it. This is not always a simple process.
In our parser, we simply allowed for attribute names as a value. In post-processing, we treat them the same as strings.
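The approach described above can be sketched in a few lines. This is a toy model rather than mappyfile's Lark grammar: the keyword set is a hypothetical subset, `TYPE LINE` is used as an assumed example input, and `lex` and `parse_attribute` are names invented for this illustration.

```python
import re

# Hypothetical, heavily shortened list of attribute names
ATTRIBUTE_NAMES = {"TYPE", "LINE", "NAME", "STATUS"}


def lex(line):
    """Return (kind, text) pairs; kind is ATTRIBUTE, STRING or WORD."""
    tokens = []
    for text in re.findall(r'"[^"]*"|\'[^\']*\'|\S+', line):
        if text[0] in "\"'":
            tokens.append(("STRING", text[1:-1]))
        elif text.upper() in ATTRIBUTE_NAMES:
            # The lexer cannot tell whether this is a keyword or a value
            tokens.append(("ATTRIBUTE", text.upper()))
        else:
            tokens.append(("WORD", text))
    return tokens


def parse_attribute(line):
    """Parse 'KEY value', accepting an ATTRIBUTE token in the value position."""
    (key_kind, key), (value_kind, value) = lex(line)
    if key_kind != "ATTRIBUTE":
        raise ValueError("expected an attribute name, got %r" % key)
    # Post-processing: whatever kind the lexer assigned, the value is a string
    return key, str(value)


print(parse_attribute("TYPE LINE"))
print(parse_attribute('NAME "Layer1"'))
```

Accepting the ambiguous token in the value position keeps the lexer simple and moves the decision to the one place that knows the context.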
Composite and Attribute Ambiguity
Two composite names, STYLE and SYMBOL, are also attribute names. For example:
    # a style block
    STYLE
        OUTLINECOLOR 0 255 0
    END

    QUERYMAP
        # a style attribute
        STYLE SELECTED
        COLOR 255 0 0
    END
The above example is not a problem to parse on its own, but it becomes very tricky when compounded by the next issue: line breaks.
On its surface, the Mapfile format appears very consistent in its line-break usage. But actually, there is a lot of variance allowed. For example:
STYLE COLOR 255 0 0 END
Containers can be placed completely on one line, but also partially:
    LAYER DEBUG 5
        GROUP "default"
        ...
    END
In this example, both attributes belong to LAYER, but only one of them is on the same line.
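One consequence of this variance is that a tokenizer cannot rely on newlines as delimiters. The sketch below, which is an illustration and not mappyfile's grammar, treats every run of whitespace the same way, so the one-line and multi-line forms yield identical token streams. `tokenize` is a hypothetical helper.

```python
import re


def tokenize(text):
    """Split on any whitespace, newlines included, keeping quoted strings whole."""
    return re.findall(r'"[^"]*"|\'[^\']*\'|\S+', text)


one_line = 'STYLE COLOR 255 0 0 END'
multi_line = 'STYLE\n  COLOR 255 0 0\nEND'
partial = 'LAYER DEBUG 5\n  GROUP "default"\nEND'

# Both layouts of the STYLE block produce the same tokens
print(tokenize(one_line) == tokenize(multi_line))
# DEBUG and GROUP arrive as siblings regardless of which line they were on
print(tokenize(partial))
```

Because layout carries no information after tokenizing, deciding where a container starts and ends falls entirely to the grammar.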
In this last example, all 3 issues combine to create a high level of ambiguity. It’s impossible to know if LAYER here is a composite or an attribute. Only after looking much further ahead could a smart parser figure it out.