1. Parsing

mappyfile uses Lark as the parsing engine.

  1. A new mapfile.g grammar file will be created.
  2. The grammar should be tested against all the available sample maps (see Testing section below).

1.1. Keywords

1.2. MapFiles

Details on the structure of the Mapfile can be found at: http://mapserver.org/mapfile/index.html#notes

  • The Mapfile is NOT case-sensitive
  • Strings containing non-alphanumeric characters or a MapServer keyword MUST be quoted. It is recommended to put ALL strings in double-quotes.
  • The Mapfile has a hierarchical structure, with the MAP object being the root. All other objects fall under this one.
  • Comments are designated with a #.
  • C-style comments have recently been added (see https://github.com/mapserver/mapserver/pull/5362). Both single-line (e.g. /* foo */) and multi-line comments work.
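
To make the rules above concrete, here is a minimal tokenizer sketch using only the Python standard library. It is not the Lark grammar mappyfile will use; the pattern names and the tokenize function are hypothetical, and string escapes and MapServer expressions are deliberately not handled. It shows case-insensitive keywords, quoted strings (where a # is not a comment), and both comment styles.

```python
import re

# Hypothetical token pattern for illustration; not mappyfile's actual grammar.
TOKEN_RE = re.compile(
    r"""
    (?P<block_comment>/\*.*?\*/)        # C-style comment, may span lines
  | (?P<line_comment>\#[^\n]*)          # hash comment runs to end of line
  | (?P<string>"[^"]*"|'[^']*')         # quoted string (escapes not handled)
  | (?P<number>-?\d+(?:\.\d+)?)         # integer or decimal
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)    # keyword or unquoted value
  | (?P<space>\s+)                      # whitespace between tokens
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(text):
    """Yield (kind, value) pairs, dropping comments and whitespace."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind in ("block_comment", "line_comment", "space"):
            continue
        value = match.group()
        if kind == "word":
            # The Mapfile is not case-sensitive, so normalise unquoted words.
            value = value.upper()
        yield kind, value
```

Because quoted strings are matched before a bare # can be seen on its own, a value such as "roads#1" survives intact, which is one reason quoting all strings is the safer convention.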

1.3. Design Notes

mappyfile will include a single method, parse, which will return a Mapfile object that can be treated in a similar manner to a dictionary.

# a file name can be sent to the parse function

# alternatively a string containing Mapfile syntax can be parsed directly

# if the string contains INCLUDE references then an optional root_folder can
# be passed to the parse method that can be used for relative paths
mappyfile.parse(string, root_folder=r"C:\Data")

This will take a string, or read the contents of a file, and attempt to create a valid Mapfile tree or object. If no valid object can be created, a parsing exception will be thrown.

It is not yet clear how this will be best achieved. Assuming a single grammar at the Mapfile level, would any subclass need to be wrapped in its parent hierarchy keywords to parse correctly?

E.g. a STYLE is associated with a CLASS associated with a LAYER which in turn is associated with a MAP.

style_string = """
STYLE
    COLOR 107 208 107
    WIDTH 1
END
"""

new_class = mappyfile.parse(style_string)

  • Would mappyfile need to take care of this by wrapping the STYLE string in CLASS, LAYER, and MAP keywords to parse correctly?
  • If so, would each need to be a separate method, e.g. mappyfile.parse_layer, mappyfile.parse_class, etc.?
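
One possible answer to the first question is sketched below: a helper that wraps a fragment in its ancestor blocks until it reaches MAP, so that a single Mapfile-level grammar could parse it. The PARENTS table and the wrap_in_parents function are hypothetical and are not part of mappyfile; the sketch also assumes the fragment starts with its own keyword (e.g. STYLE ... END).

```python
# Hypothetical parent chain, taken from the STYLE -> CLASS -> LAYER -> MAP
# hierarchy described above. Not part of mappyfile.
PARENTS = {"STYLE": "CLASS", "CLASS": "LAYER", "LAYER": "MAP"}


def wrap_in_parents(snippet):
    """Wrap a Mapfile fragment in its ancestor blocks, up to MAP."""
    words = snippet.split()
    if not words:
        raise ValueError("empty snippet")
    keyword = words[0].upper()  # keywords are not case-sensitive
    if keyword != "MAP" and keyword not in PARENTS:
        raise ValueError(f"unsupported root keyword: {keyword}")
    wrapped = snippet.strip()
    while keyword in PARENTS:
        keyword = PARENTS[keyword]
        wrapped = f"{keyword}\n{wrapped}\nEND"
    return wrapped
```

With a helper like this, a single parse method could accept any fragment and unwrap the result afterwards, rather than exposing a separate method per object type.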

Other hierarchies and relationships can be seen on the http://www.mapserver.org/mapscript/mapscript.html#mapscript-classes page.

+-------+ 0..*    1 +-------+
| Style | <-------- | Class |
+-------+           +-------+

+-------+ 0..*     1 +-------+
| Class | <--------> | Layer |
+-------+            +-------+

 +-----+ 0..1  0..* +-------+
 | Map | <--------> | Layer |
 +-----+            +-------+

1.4. Including Files

The parser will also need to allow for files (containing further Mapfile declarations) referenced in the Mapfile to be loaded and parsed.

  • Includes may be nested, up to 5 deep.
  • File location can be given as a full path to the file, or as a path relative to the Mapfile.
  • If a string is provided to the parse method, then an optional root_folder parameter can be used to resolve relative paths.

See http://mapserver.org/mapfile/include.html for further details.

MAP
    NAME "include_mapfile"
    EXTENT 0 0 500 500
    SIZE 250 250

    INCLUDE "test_include_symbols.map"
    INCLUDE "C:\Includes\test_include_layer.map"
END

Is it easy to have an option to not process the INCLUDEs and leave them as a simple line of text?
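
As a rough sketch of how this could work, the function below expands INCLUDE lines before parsing, enforces the nesting limit of 5, and offers a flag to leave INCLUDE lines untouched. The names expand_includes and process_includes are hypothetical, and it assumes that relative paths in nested files also resolve against root_folder, which should be checked against the MapServer documentation.

```python
import os
import re

MAX_INCLUDE_DEPTH = 5  # nesting limit stated above

# Matches a whole line such as: INCLUDE "layers/roads.map"
INCLUDE_RE = re.compile(
    r'^\s*INCLUDE\s+["\'](?P<path>[^"\']+)["\']\s*$', re.IGNORECASE
)


def expand_includes(text, root_folder, process_includes=True, depth=0):
    """Return text with each INCLUDE line replaced by the file's contents."""
    if not process_includes:
        return text  # leave INCLUDE lines as plain text
    expanded = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            expanded.append(line)
            continue
        if depth >= MAX_INCLUDE_DEPTH:
            raise ValueError(f"INCLUDE nested more than {MAX_INCLUDE_DEPTH} deep")
        path = match.group("path")
        if not os.path.isabs(path):
            path = os.path.join(root_folder, path)
        with open(path, encoding="utf-8") as handle:
            included = handle.read()
        # Assumption: nested relative paths also resolve against root_folder.
        expanded.append(expand_includes(included, root_folder, True, depth + 1))
    return "\n".join(expanded)
```

Expanding includes as a text pre-processing step keeps the grammar itself unaware of the file system; with process_includes=False the grammar would instead need a rule for the INCLUDE line.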

1.4.1. Benchmarking

I chose to use the Earley algorithm due to unexpected flexibility in the syntax of the mapfiles (I can expand on that subject if you wish). However, many of the files can still be parsed using PLY. You may notice that the test script tries parsing with PLY first and only falls back to Earley if that fails. The fallback is not necessary, but it is about 3 times faster under CPython. Alternatively, you may choose to use PyPy, which is the fastest when using just the Earley parser.
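
The try-fast-then-fall-back pattern described here can be sketched generically. The parse_with_fallback helper and its parameters are hypothetical; fast_parse and slow_parse stand in for a PLY-based parser and an Earley parser respectively, and no real Lark or PLY calls are shown.

```python
def parse_with_fallback(text, fast_parse, slow_parse, fast_errors=(Exception,)):
    """Try the fast parser first; use the slower one only if it fails."""
    try:
        return fast_parse(text)
    except fast_errors:
        # The fast parser could not handle this input, so pay the cost of
        # the more flexible parser.
        return slow_parse(text)


# Stand-in parsers for illustration only.
def strict_parser(text):
    if "tricky" in text:
        raise ValueError("syntax not supported by the fast parser")
    return ("fast", text)


def flexible_parser(text):
    return ("slow", text)
```

Narrowing fast_errors to the fast parser's own exception type would avoid hiding unrelated bugs behind the fallback.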

Here are some benchmarks from my PC for parsing all 301 files (2MB):

  • PyPy: 3.5 seconds
  • CPython: 15 seconds
  • CPython with fallback