Parse code and other text into structured trees.
The Lexer loops over the characters in a string and processes them through Grammars to generate Abstract Syntax Trees (ASTs). An AST here is essentially a multi-dimensional array that describes the code.
- Installation & Getting Started are below
- Create a Parser - Create a language parser that builds an AST, or modify an existing parser.
Other/Old Documentation
- Extend Lexer - Create your own grammars to support new languages
- @see_file(docs/GettingStarted.md, Getting Started) - Write a grammar class, create directives, and test your grammar.
- @see_file(docs/Tips.md, Tips & Overview) - How To tips & troubleshooting
- @see_file(docs/DirectiveCommands.md, Directive Commands) - Commands you can use in your directives.
- @see_file(docs/Examples.md, Extra Examples)
- @see_file(docs/Testing.md, Testing your grammar)
- @see_file(docs/api/code/) - Generated api documentation
- @see_file(docs/Architecture.md, Architecture)
- Development - Further develop this library
- @see_file(Status.md, Development Status & Notes) - The current state of development, and common development tips.
- @see_file(docs/Development/PhpGrammar.md, Php Grammar) - How to further develop the PHP Grammar.
- Defunct documentation
- @see_file(docs/GrammarExample.md, Grammar Examples)
- @see_file(docs/OldReadme1.md) - An old README that might have useful information.
Supported Languages
Additional language support can be added through new grammars. See @see_file(docs/Extend.md).
- Docblocks: Parses standard docblocks (/** */) for the description and @param. Somewhat language independent.
- Php: Very Good (php 7.4 - 8.2)
- Bash: Broken. There was an old grammar that parsed functions and docblocks, but it has not been restored to work with the current lexer.
Parse with CLI
This prints an Abstract Syntax Tree (AST) as JSON.
# only file path is required
bin/lex file rel/path/to/file.php -nocache -debug -stop_at -1
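Because the CLI output is JSON, it can be read back from any script. The following is a minimal sketch, assuming the output was captured into a string with no debug messages mixed in. The function name summarize_ast_json and the sample string are hypothetical; the sample only mirrors keys shown under Example Tree.

```php
<?php
/**
 * Decode AST JSON and return a one-line summary.
 * Hypothetical helper, not part of the Lexer library.
 */
function summarize_ast_json(string $json): string
{
    // JSON_THROW_ON_ERROR turns malformed input into a JsonException
    $tree = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Fall back to placeholders when a key is missing
    $type = $tree['type'] ?? 'unknown';
    $name = $tree['name'] ?? '(unnamed)';
    $namespace = $tree['namespace']['name'] ?? '(global)';

    return "{$type} {$name} in namespace {$namespace}";
}

// Sample input: a trimmed-down copy of fields from the example tree
$json = <<<'JSON'
{
    "type": "file",
    "ext": "php",
    "name": "StringAst",
    "namespace": {
        "type": "namespace",
        "name": "Tlf\\Lexer"
    }
}
JSON;

echo summarize_ast_json($json), "\n";
```

Decoding with the associative flag gives nested arrays, which should be close to what print_r shows in Basic Usage, though that equivalence is an assumption here.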
Basic Usage
For built-in grammars, it is easiest to use the helper class.
$full_path = '/path/to/file.php';
$helper = new \Tlf\Lexer\Helper();
$lexer = $helper->get_lexer_for_file($full_path); // \Tlf\Lexer
# These are the default settings
$lexer->useCache = true; // set false to disable caching
$lexer->debug = false; // set true to show debug messages
$lexer->stop_loop = -1; // set to a positive int to stop processing & print current lexer status (for debugging)
$ast = $lexer->lexFile($full_path); // \Tlf\Lexer\Ast
print_r($ast->getTree()); // array
Example Tree
Grammars determine the tree structure. The tree below is for a PHP file in this repo; see @see_file(code/Ast/StringAst.php). This example has only one method and no properties. From the root of this repo, run bin/lex file code/Lexer.php for an example of a larger class.
{
    "type": "file",
    "ext": "php",
    "name": "StringAst",
    "path": "\/path-to-downloads-dir\/Lexer\/code\/Ast\/StringAst.php",
    "namespace": {
        "type": "namespace",
        "name": "Tlf\\Lexer",
        "declaration": "namespace Tlf\\Lexer;",
        "class": [
            {
                "type": "class",
                "namespace": "Tlf\\Lexer",
                "fqn": "Tlf\\Lexer\\StringAst",
                "name": "StringAst",
                "extends": "Ast",
                "declaration": "class StringAst extends Ast",
                "methods": [
                    {
                        "type": "method",
                        "args": [
                            {
                                "type": "arg",
                                "name": "sourceTree",
                                "value": "null",
                                "declaration": "$sourceTree = null"
                            }
                        ],
                        "modifiers": [ ... ],
                        "name": "getTree",
                        "body": "return $this->get('value');",
                        "declaration": "public function getTree($sourceTree = null)"
                    }
                ]
            }
        ]
    }
}
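To show how this nesting can be consumed, here is a minimal sketch that collects one signature string per method. It assumes the array from getTree() uses the same keys as the JSON above (namespace, class, methods, args), which this README does not state outright. The function list_method_signatures is hypothetical and not part of the library, and the sample array is a hand-built, trimmed copy of the example.

```php
<?php
/**
 * Collect "Fqn::method(args)" strings from a file tree.
 * Hypothetical helper; assumes the key layout of the example tree.
 */
function list_method_signatures(array $fileTree): array
{
    $signatures = [];
    // Classes sit under the namespace node in the example tree
    $classes = $fileTree['namespace']['class'] ?? [];
    foreach ($classes as $class) {
        foreach ($class['methods'] ?? [] as $method) {
            // Each arg carries its own source text, e.g. "$sourceTree = null"
            $args = array_column($method['args'] ?? [], 'declaration');
            $signatures[] = $class['fqn'] . '::' . $method['name']
                . '(' . implode(', ', $args) . ')';
        }
    }
    return $signatures;
}

// Hand-built sample mirroring the example tree (trimmed)
$tree = [
    'type' => 'file',
    'namespace' => [
        'type' => 'namespace',
        'name' => 'Tlf\\Lexer',
        'class' => [[
            'type' => 'class',
            'fqn' => 'Tlf\\Lexer\\StringAst',
            'name' => 'StringAst',
            'methods' => [[
                'type' => 'method',
                'args' => [[
                    'type' => 'arg',
                    'name' => 'sourceTree',
                    'value' => 'null',
                    'declaration' => '$sourceTree = null',
                ]],
                'name' => 'getTree',
            ]],
        ]],
    ],
];

print_r(list_method_signatures($tree));
```

The sketch only looks under the namespace node. Where classes appear for a file without a namespace is not covered by the example, so check the real output before relying on it.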
A php class property looks like:
{
    "type": "property",
    "modifiers": [ ... ],
    "docblock": {
        "type": "docblock",
        "description": "The loop we're on. 0 means not started. 1 means we're executing the first loop. 2 means 2nd loop, etc."
    },
    "name": "loop_count",
    "value": "0",
    "declaration": "public int $loop_count = 0;"
}
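A property node pairs the source declaration with its parsed docblock, which makes it straightforward to print short summaries. The sketch below is a hypothetical helper (describe_property is not part of the library) that takes a single property node with the keys shown above. Where such nodes sit inside a class tree is not shown in this README, so the sketch does not assume it.

```php
<?php
/**
 * Format one property node as "declaration  // docblock description".
 * Hypothetical helper; assumes the keys shown in the property example.
 */
function describe_property(array $property): string
{
    $line = $property['declaration'] ?? '';
    // The docblock is optional, so fall back to null when it is absent
    $description = $property['docblock']['description'] ?? null;
    if ($description !== null && $description !== '') {
        $line .= '  // ' . $description;
    }
    return $line;
}

// Hand-built sample mirroring the property example (description shortened)
$property = [
    'type' => 'property',
    'docblock' => [
        'type' => 'docblock',
        'description' => 'The loop we\'re on. 0 means not started.',
    ],
    'name' => 'loop_count',
    'value' => '0',
    'declaration' => 'public int $loop_count = 0;',
];

echo describe_property($property), "\n";
```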