WisiToken is an LALR, LR1, and Packrat parser generator and
run-time. The grammar can be specified in a bison-like input file, or
as Ada statements. The LR parser supports generalized parsing
(parallel parsers to handle conflicts) and syntax error recovery
(inserting and deleting tokens to allow parsing to continue).
The primary customer for WisiToken is the Emacs Ada mode
indentation, fontification, and navigation engine, which uses an LR1
parser generated by WisiToken.
The Ada grammar specified in the Ada Language Reference manual is
not LALR(1); it has shift/reduce and reduce/reduce conflicts.
Modifying the grammar to be LALR(1) is hard, and complicates adding
future changes to the language.. Using a generalized LALR parser to
handle the conflicts allows keeping the grammar (mostly) in its
original form.
Recovering from syntax errors allows indentation, fontification,
and navigation in files that are in the process of being edited. The
algorithm is
described here.
WisiToken provides two lexers:
- a simple regular-expression based lexer, used
mostly in unit tests.
- an interface to the re2c
lexer, which supports unicode source files.
WisiToken is distributed under GPL version
3, with the GNAT modification that allows use in non-GPL
projects.
WisiToken is a significant rewrite of OpenToken, deleting support
for recursive-descent, adding syntax error correction in the parser,
and adding support for simple Packrat parsers.
current version: 2.1
User guide
WisiToken may be obtained in two ways:
It relies on external libraries:
WisiToken has been tested with AdaCore GNAT Community Edition 2019.
It uses Ada 2012 containers, so it requires an Ada 2012 compiler. It
also uses some gnatcoll packages, so it may only work with GNAT.
Some packages have SPARK annotations, and can be proven by SPARK.
WisiToken is maintained
by Stephen Leake.
Submit bugs directly to Stephen. Discussion about WisiToken is on the
newsgroup comp.lang.ada, or
the Emacs
Ada mode mailing list.
History
Version ?
- ?
- LR1 generate time improved by a factor of 3 for very large grammars.
Version 2.1
- 4 Jun 2020
- Improve translation of EBNF to BNF
- Syntax tree improvements to support translation of EBNF to BNF
- New executable wisitoken-to_tree_sitter translates wisitoken
grammar file to tree-sitter grammar file.
Version 2.0
- 9 May 2020
- Improve handling of recursion in error recovery Minimal_Complete_Action
- Other improvements in error recovery.
- Syntax tree improvements to support using virtual tokens in indent in Emacs ada-mode.
- text_rep parse table format changed.
- support re2c version 1.3
- Fix ada-mode debbugs #37620; hang in error recovery tasking.
- match SAL 3.4 changes
Version 1.3.1
- 16 Dec 2019
- Improve data output to recover log
- Add executable 'recover_stats' to summarize recover log.
Version 1.3.0
- 13 Aug 2019
- Add "repair image" field to grammar file token syntax; used by
Emacs wisi-repair-error.
- Delete elisp output language from wisitoken-bnf-generate.
- Parse table action lists changed to sorted array; allows binary
search in Action_For, Goto_For; improves recover time.
- Use bounded Spark stacks in Configuration; improves recover time
by 30% by eliminating some dynamic memory allocation.
- Misc bug fixes.
Version 1.2.0
- 11 Jul 2019
- Add support for several flavors of extended Backus Naur Form
(EBNF), to allow parsing the Java and Python grammars. LR parser
generation is supported by translating the EBNF syntax to BNF.
Packrat does not yet support EBNF.
- Allow more than two actions in a conflict; required by the
Java and Python grammars.
- Support new wisi action wisi-name-action.
- Recursion computation in Compute_Minimal_Terminal_Sequences is
improved, using SAL.Gen_Graphs. A grammar file option
%partial_recursion allows a less accurate but more tractable
computation using only strongly connected components; necessary for
Java.
- Grammar file option %mckenzie_cost_limit is deleted; use
%mckenzie_enqueue_limit instead.
- New grammar file options for cost of undo_reduce, fast_forward, matching_begin.
- Language function Use_Minimal_Complete_Actions is renamed
Matching_Begin_Tokens, with different parameters.
- Error recovery is improved.
Version 1.1.0
- 21 Mar 2019
- Partial parse supported; the lexer accepts begin, end arguments,
the parser exits on a parse-time action.
- Improved error recovery; insert minimimal tokens to complete a
statement/declaration before or after the error point.
- Add pragma Inline in many places for speed.
- Several bug fixes.
Version 1.0.1
- 8 Dec 2018
- Assign copyright to Free Software Foundation, for use with Emacs.
- Update user guide.
Version 1.0
- 17 Nov 2018
- Major rewrite/renaming of OpenToken.
- Refactor packages to support error correction in the run-time
parser, and to support LR1 parsers.
- Delete support for recursive descent parsers; broken by the
refactor, not worth fixing.
- Extend the LALR parser to support error recovery.
- Add support for re2c lexer
- Add support for simple Packrat parsers
my home page
Last modified: Fri Jul 24 09:43:26 Pacific Daylight Time 2020