Lexer

Parse code and other text into structured trees.

The Lexer loops over the characters in a string and processes them through grammars to generate an Abstract Syntax Tree (AST): a nested, multi-dimensional array that describes the code.

Documentation

  • Installation & Getting Started are below
  • Create a Parser - Create a language parser that builds an AST, or modify an existing parser.

Other/Old Documentation

  • Extend Lexer - Create your own grammars to support new languages
    • @see_file(docs/GettingStarted.md, Getting Started) - Write a grammar class, create directives, and test your grammar.
    • @see_file(docs/Tips.md, Tips & Overview) - How To tips & troubleshooting
    • @see_file(docs/DirectiveCommands.md, Directive Commands) - Commands you can use in your directives.
    • @see_file(docs/Examples.md, Extra Examples)
    • @see_file(docs/Testing.md, Testing your grammar)
  • @see_file(docs/api/code/) - Generated api documentation
  • @see_file(docs/Architecture.md, Architecture)
  • Development - Further develop this library
    • @see_file(Status.md, Development Status & Notes) - The current state of development, and common development tips.
    • @see_file(docs/Development/PhpGrammar.md, Php Grammar) - How to further develop the PHP Grammar.
  • Defunct documentation
    • @see_file(docs/GrammarExample.md, Grammar Examples)
    • @see_file(docs/OldReadme1.md) - An old README that might have useful information.

Supported Languages

Additional language support can be added through new grammars. See @see_file(docs/Extend.md).

  • Docblocks: Parses standard docblocks (/** */) for description and @param. Somewhat language independent.
  • PHP: Very good (PHP 7.4 - 8.2)
  • Bash: Broken. An old grammar parsed functions and docblocks (using ##\n#), but it has not been updated to work with the current lexer.

Install

@template(php/composer_install)

Parse with CLI

This prints an Abstract Syntax Tree (AST) as JSON.

# only file path is required
bin/lex file rel/path/to/file.php -nocache -debug -stop_at -1

Basic Usage

For built-in grammars, it is easiest to use the helper class.

<?php
$full_path = '/path/to/file.php'; // absolute path to the file you want to parse
$helper = new \Tlf\Lexer\Helper();
$lexer = $helper->get_lexer_for_file($full_path); // \Tlf\Lexer

# These are the default settings
$lexer->useCache = true; // set false to disable caching
$lexer->debug = false; // set true to show debug messages
$lexer->stop_loop = -1; // set to a positive int to stop processing & print current lexer status (for debugging)

$ast = $lexer->lexFile($full_path); // \Tlf\Lexer\Ast

print_r($ast->getTree()); // array

Example Tree

Grammars determine the tree structure. The tree below is for a PHP file in this repo, @see_file(code/Ast/StringAst.php), which has one method and no properties. For a larger class, run bin/lex file code/Lexer.php from the root of this repo.

{
    "type": "file",
    "ext": "php",
    "name": "StringAst",
    "path": "\/path-to-downloads-dir\/Lexer\/code\/Ast\/StringAst.php",
    "namespace": {
        "type": "namespace",
        "name": "Tlf\\Lexer",
        "declaration": "namespace Tlf\\Lexer;",
        "class": [
            {
                "type": "class",
                "namespace": "Tlf\\Lexer",
                "fqn": "Tlf\\Lexer\\StringAst",
                "name": "StringAst",
                "extends": "Ast",
                "declaration": "class StringAst extends Ast",
                "methods": [
                    {
                        "type": "method",
                        "args": [
                            {
                                "type": "arg",
                                "name": "sourceTree",
                                "value": "null",
                                "declaration": "$sourceTree = null"
                            }
                        ],
                        "modifiers": [
                            "public"
                        ],
                        "name": "getTree",
                        "body": "return $this->get('value');",
                        "declaration": "public function getTree($sourceTree = null)"
                    }
                ]
            }
        ]
    }
}
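
Since the tree is a plain nested array, it can be walked with ordinary loops. Below is a minimal sketch that collects method declarations from a tree shaped like the example above. The function list_method_declarations() is a hypothetical helper, not part of this library, and it assumes classes always sit under the "namespace" key as they do in this example. The sample JSON is a trimmed copy of the tree above.

```php
<?php
// Hypothetical helper: list_method_declarations() is NOT part of the library.
// It relies only on the array shape shown in the example tree.

/**
 * Collect "Class::method" => declaration pairs from a decoded file tree.
 *
 * @param array $tree a file node, e.g. json_decode($json, true)
 * @return array<string, string>
 */
function list_method_declarations(array $tree): array
{
    $found = [];
    // Assumption: classes are listed under the namespace node, as in the example
    $classes = $tree['namespace']['class'] ?? [];
    foreach ($classes as $class) {
        foreach ($class['methods'] ?? [] as $method) {
            $key = $class['name'] . '::' . $method['name'];
            $found[$key] = $method['declaration'];
        }
    }
    return $found;
}

// A trimmed copy of the example tree (nowdoc, so "$sourceTree" is not interpolated)
$json = <<<'JSON'
{
    "type": "file",
    "namespace": {
        "type": "namespace",
        "name": "Tlf\\Lexer",
        "class": [
            {
                "type": "class",
                "name": "StringAst",
                "methods": [
                    {
                        "type": "method",
                        "name": "getTree",
                        "declaration": "public function getTree($sourceTree = null)"
                    }
                ]
            }
        ]
    }
}
JSON;

$declarations = list_method_declarations(json_decode($json, true));
print_r($declarations);
```

The same loop should work on the array returned by getTree() in Basic Usage, provided the parsed file produces the same keys as this example.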

A PHP class property looks like:

{
    "type": "property",
    "modifiers": [
        "public",
        "int"
    ],
    "docblock": {
        "type": "docblock",
        "description": "The loop we're on. 0 means not started. 1 means we're executing the first loop. 2 means 2nd loop, etc."
    },
    "name": "loop_count",
    "value": "0",
    "declaration": "public int $loop_count = 0;"
}
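
A property node like the one above can be turned into a one-line summary, for example when generating documentation. This is a sketch only: describe_property() is a hypothetical helper that is not part of this library, and it assumes the keys shown in the property example ("modifiers", "name", "value", and an optional "docblock"). The docblock description is shortened here for brevity.

```php
<?php
// Hypothetical helper: describe_property() is NOT part of the library.
// It assumes the node keys shown in the property example.

/**
 * Build a one-line summary of a property node.
 *
 * @param array $node a node with "type": "property"
 */
function describe_property(array $node): string
{
    // Modifiers include visibility and type, e.g. ["public", "int"]
    $modifiers = implode(' ', $node['modifiers'] ?? []);
    $summary = trim($modifiers . ' $' . $node['name']);

    // The default value is stored as a string and is only present when declared
    if (isset($node['value'])) {
        $summary .= ' = ' . $node['value'];
    }

    // Append the docblock description when one exists
    $description = $node['docblock']['description'] ?? '';
    return $description === '' ? $summary : $summary . ' // ' . $description;
}

$property = [
    'type' => 'property',
    'modifiers' => ['public', 'int'],
    'docblock' => ['type' => 'docblock', 'description' => 'The loop we\'re on.'],
    'name' => 'loop_count',
    'value' => '0',
];

$line = describe_property($property);
echo $line, "\n";
```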