LRSTAR: LR(*) parser generator for C++


Home Downloads Feedback Comparison Theoretical Documentation Contact


  • An improved version of LRSTAR is coming, God willing. I was finding bugs so I removed the download. I guess you should use ANTLR at this time. Get on the mailing list (below) if you want to be notified about the next release.
  • Name:

    LRSTAR: LR(*) parser generator v 2020

  • Creates LR(1)/LR(*) parsers in C++
  • Creates extremely fast parsers with a compressed-matrix structure.
  • Reads EBNF grammar notation (i.e. +,*,?).
  • Reads Yacc/Bison and ANTLR grammars (after a few modifications)
  • Grammars are completely separate from code
  • Handles context-sensitivity (e.g. typedef in C)
  • Parsers have a symbol-table builder
  • Parsers can do AST construction and traversal
  • Includes Visual Studio 2017 projects, which work with VS 2019
  • DFA: fast lexer generator v 2020

  • Creates DFA compressed-matrix lexers
  • Lexers do keyword recognition

  • LR(1)/LR(*) Parsers

    LRSTAR creates minimal LR(1) parsers, the default. If you specify option /k=2 or more, then it creates LR(*) parsers, which are minimal LR(1) with the LR(*) algorithm being used. LR(*) is a look-ahead algorithm that is used to resolve conflicts in your grammar at parsing time. The LR(*) algorithm can lookahead 2, 3, or more tokens. In one of our test files it needed to look ahead 13 tokens. The LR(*) algorithm is efficient and does not impact the parsing speed very much.

    Extremely Fast Parsers

    LR(1) parsers, created by LRSTAR, are slightly faster than Yacc/Bison parsers. This is accomplished by using compressed-matrix parser tables and a chain-reduction elimination technique. Keep in mind that LRSTAR parsers also build a symbol table and an abstract-syntax tree (AST) automatically. Compared to ANTLR? A test of ANTLR 4.8 using the C++ target showed that the ANTLR parser required 10.0 seconds, to process a C source file of 227,000 lines. The LRSTAR parser required only 0.10 seconds (100 times faster).

    EBNF Grammar Notation

    EBNF is modern grammar notation, which allows regular expressions, whereas '+' means one or more, '*' means zero or more, '?' means optional. For example, if you want to define a list of one or more numbers, you could write: <number>+ or for a list of arguments separated by commas, you could write: Arg/','+

    Context Sensitivity

    The "typedef" declaration in the C and C++ languages is a context-sensitive issue.  This cannot be solved by upgrading from LALR(1) to LR(1) or LR(k).  It requires making use of a symbol table while parsing and this is how an LRSTAR parser handles the context-sensitive issue.  LRSTAR has this feature built into the grammar notation, making life easy for a user.

    DFA Compressed Matrix

    A DFA lexer is a finite-state automata.  DFA is the fastest recognition algorithm for reading characters and forming the tokens of your language. DFA's work fine for most programming languages.  You may include the keywords of your language in the lexical grammar which makes keyword recognition as fast as possible. The data structure created by DFA is a compressed-matrix. Our tests show that DFA lexers are almost twice the speed of flex lexers and nearly the same size.

    (c) Copyright LRTEC 2020.  All rights reserved.