Brussels / 3 & 4 February 2018


Parsing Posix [S]hell

Yann Regis-Gianas

Parsing the POSIX shell language is challenging: lexical analysis depends on the parsing context; parsing is specified by an ambiguous BNF grammar annotated with ad hoc rules; the language specification is informal... and, as the icing on the cake, statically parsing a shell script is actually undecidable!
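
To make the first point concrete, here is a minimal Python sketch of why a shell lexer cannot classify a word without knowing where the parser stands: `if` is a reserved word at the start of a command but an ordinary argument in `echo if`. The function name `classify`, the tag names and the simplified rules are assumptions made for illustration only; they are loosely modeled on the reserved-word rule of the POSIX specification and are not taken from Morbig. The word `in` is left out because recognizing it needs even more context.

```python
# Hypothetical sketch: a toy classifier, not a real shell lexer.
# It ignores quoting, operators glued to words, here-documents, aliases, etc.
RESERVED = {"if", "then", "else", "elif", "fi", "for", "do", "done",
            "while", "until", "case", "esac"}
SEPARATORS = {";", "|", "&&", "||", "&"}


def classify(words):
    """Tag each pre-split word as RESERVED, SEP or WORD.

    A reserved word is only recognized in command position: at the start
    of the input, after a separator, or after another reserved word other
    than `case` and `for` (which are followed by a plain word).
    """
    tagged = []
    command_position = True
    for w in words:
        if w in SEPARATORS:
            tagged.append((w, "SEP"))
            command_position = True
        elif command_position and w in RESERVED:
            tagged.append((w, "RESERVED"))
            # A command may follow `then`, `do`, ...; a name follows `for`.
            command_position = w not in {"case", "for"}
        else:
            tagged.append((w, "WORD"))
            command_position = False
    return tagged


if __name__ == "__main__":
    for word, tag in classify("if true ; then echo if ; fi".split()):
        print(f"{word:5} {tag}")
```

The same spelling `if` receives two different tags in a single line, which is exactly the kind of feedback from parser state to lexer that standard lexer generators do not provide out of the box.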

Forget about your textbooks! Existing implementations of shell interpreters contain hand-crafted, character-level, non-modular syntactic analyzers and yes, their source code is very hard to read.

What if you had to program the parser of a shell script verifier? Would you agree to throw away your favorite lexer and parser generators, which have let you write high-level declarative implementations for decades? Would you agree to mix lexing, parsing and evaluation in a single large bowl of spaghetti code? Of course not. You would hardly trust such a parser.

In this talk, we will introduce "Morbig", a modular, high-level and interoperable parser for POSIX shell. The implementation of Morbig relies on key features of Menhir, an LR(1) parser generator for OCaml.

