WisiToken Overview

WisiToken is an LALR, LR1, and Packrat parser generator and run-time. The grammar can be specified in a bison-like input file, or as Ada statements. The LR parser supports generalized parsing (parallel parsers to handle conflicts) and syntax error recovery (inserting and deleting tokens to allow parsing to continue).

The primary customer for WisiToken is the Emacs Ada mode indentation, fontification, and navigation engine, which uses an LR1 parser generated by WisiToken.

The Ada grammar specified in the Ada Language Reference manual is not LALR(1); it has shift/reduce and reduce/reduce conflicts. Modifying the grammar to be LALR(1) is hard, and complicates adding future changes to the language.. Using a generalized LALR parser to handle the conflicts allows keeping the grammar (mostly) in its original form.

Recovering from syntax errors allows indentation, fontification, and navigation in files that are in the process of being edited. The algorithm is described here.

WisiToken provides two lexers:

a simple regular-expression based lexer, used mostly in unit tests.
an interface to the re2c lexer, which supports unicode source files.

WisiToken is distributed under GPL version 3, with the GNAT modification that allows use in non-GPL projects.

WisiToken is a significant rewrite of OpenToken, deleting support for recursive-descent, adding syntax error correction in the parser, and adding support for simple Packrat parsers.

current version: 2.1

User guide

WisiToken may be obtained in two ways:

It relies on external libraries:

WisiToken has been tested with AdaCore GNAT Community Edition 2019. It uses Ada 2012 containers, so it requires an Ada 2012 compiler. It also uses some gnatcoll packages, so it may only work with GNAT.

Some packages have SPARK annotations, and can be proven by SPARK.

WisiToken is maintained by Stephen Leake. Submit bugs directly to Stephen. Discussion about WisiToken is on the newsgroup comp.lang.ada, or the Emacs Ada mode mailing list.

History

Version ?

?
LR1 generate time improved by a factor of 3 for very large grammars.

Version 2.1

4 Jun 2020
Improve translation of EBNF to BNF
Syntax tree improvements to support translation of EBNF to BNF
New executable wisitoken-to_tree_sitter translates wisitoken grammar file to tree-sitter grammar file.

Version 2.0

9 May 2020
Improve handling of recursion in error recovery Minimal_Complete_Action
Other improvements in error recovery.
Syntax tree improvements to support using virtual tokens in indent in Emacs ada-mode.
text_rep parse table format changed.
support re2c version 1.3
Fix ada-mode debbugs #37620; hang in error recovery tasking.
match SAL 3.4 changes

Version 1.3.1

16 Dec 2019
Improve data output to recover log
Add executable 'recover_stats' to summarize recover log.

Version 1.3.0

13 Aug 2019
Add "repair image" field to grammar file token syntax; used by Emacs wisi-repair-error.
Delete elisp output language from wisitoken-bnf-generate.
Parse table action lists changed to sorted array; allows binary search in Action_For, Goto_For; improves recover time.
Use bounded Spark stacks in Configuration; improves recover time by 30% by eliminating some dynamic memory allocation.
Misc bug fixes.

Version 1.2.0

11 Jul 2019
Add support for several flavors of extended Backus Naur Form (EBNF), to allow parsing the Java and Python grammars. LR parser generation is supported by translating the EBNF syntax to BNF. Packrat does not yet support EBNF.
Allow more than two actions in a conflict; required by the Java and Python grammars.
Support new wisi action wisi-name-action.
Recursion computation in Compute_Minimal_Terminal_Sequences is improved, using SAL.Gen_Graphs. A grammar file option %partial_recursion allows a less accurate but more tractable computation using only strongly connected components; necessary for Java.
Grammar file option %mckenzie_cost_limit is deleted; use %mckenzie_enqueue_limit instead.
New grammar file options for cost of undo_reduce, fast_forward, matching_begin.
Language function Use_Minimal_Complete_Actions is renamed Matching_Begin_Tokens, with different parameters.
Error recovery is improved.

Version 1.1.0

21 Mar 2019
Partial parse supported; the lexer accepts begin, end arguments, the parser exits on a parse-time action.
Improved error recovery; insert minimimal tokens to complete a statement/declaration before or after the error point.
Add pragma Inline in many places for speed.
Several bug fixes.

Version 1.0.1

8 Dec 2018
Assign copyright to Free Software Foundation, for use with Emacs.
Update user guide.

Version 1.0

17 Nov 2018
Major rewrite/renaming of OpenToken.
Refactor packages to support error correction in the run-time parser, and to support LR1 parsers.
Delete support for recursive descent parsers; broken by the refactor, not worth fixing.
Extend the LALR parser to support error recovery.
Add support for re2c lexer
Add support for simple Packrat parsers

my home page

Last modified: Fri Jul 24 09:43:26 Pacific Daylight Time 2020