Lexer
Parse code and other text into structured trees.
The Lexer loops over the characters in a string and processes them through Grammars to generate an Abstract Syntax Tree (AST): essentially a multi-dimensional array that describes the code.
Documentation
- Installation & Getting Started are below
- Create a Parser - Create a language parser that builds an AST, or modify an existing parser.
Other/Old Documentation
- Extend Lexer - Create your own grammars to support new languages
- @see_file(docs/GettingStarted.md, Getting Started) - Write a grammar class, create directives, and test your grammar.
- @see_file(docs/Tips.md, Tips & Overview) - How To tips & troubleshooting
- @see_file(docs/DirectiveCommands.md, Directive Commands) - Commands you can use in your directives.
- @see_file(docs/Examples.md, Extra Examples)
- @see_file(docs/Testing.md, Testing your grammar)
- @see_file(docs/api/code/) - Generated API documentation
- @see_file(docs/Architecture.md, Architecture)
- Development - Further develop this library
- @see_file(Status.md, Development Status & Notes) - The current state of development, and common development tips.
- @see_file(docs/Development/PhpGrammar.md, Php Grammar) - How to further develop the PHP Grammar.
- Defunct documentation
- @see_file(docs/GrammarExample.md, Grammar Examples)
- @see_file(docs/OldReadme1.md) - An old README that might have useful information.
Supported Languages
Additional language support can be added through new grammars. See @see_file(docs/Extend.md).
- Docblocks: Parses standard docblocks (/** */) for description and @param. Somewhat language independent.
- Php: Very Good (php 7.4 - 8.2)
- Bash: Broken. An old grammar parsed functions and docblocks (using ##\n#), but it has not been restored to work with the current lexer.
Install
@template(php/composer_install)
Parse with CLI
This prints an Abstract Syntax Tree (AST) as JSON.
# only file path is required
bin/lex file rel/path/to/file.php -nocache -debug -stop_at -1
Basic Usage
For built-in grammars, it is easiest to use the helper class.
<?php
$full_path = '/path/to/file.php'; // absolute path to the file you want to parse
$helper = new \Tlf\Lexer\Helper();
$lexer = $helper->get_lexer_for_file($full_path); // \Tlf\Lexer
# These are the default settings
$lexer->useCache = true; // set false to disable caching
$lexer->debug = false; // set true to show debug messages
$lexer->stop_loop = -1; // set to a positive int to stop processing & print current lexer status (for debugging)
$ast = $lexer->lexFile($full_path); // \Tlf\Lexer\Ast
print_r($ast->getTree()); // array
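Because getTree() returns a plain array, the tree can be serialized with PHP's built-in json_encode, which is presumably close to what the CLI prints. The following is a minimal sketch, assuming a hand-written $tree array as a stand-in for the real return value so that it runs without the library installed:

```php
<?php
// Stand-in for $ast->getTree(); this array is an assumption for illustration only.
$tree = [
    'type' => 'file',
    'ext'  => 'php',
    'name' => 'StringAst',
    'path' => '/path/to/StringAst.php',
];

// JSON_PRETTY_PRINT adds indentation; JSON_THROW_ON_ERROR (PHP 7.3+) raises a
// JsonException instead of silently returning false on failure.
$json = json_encode($tree, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);

echo $json, "\n";
```

By default json_encode escapes "/" as "\/". Add the JSON_UNESCAPED_SLASHES flag if you prefer unescaped paths in the output.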
Example Tree
Grammars determine tree structure. The example below is the tree for a PHP file in this repo, @see_file(code/Ast/StringAst.php), which has only one method and no properties. From the root of this repo, run bin/lex file code/Lexer.php for an example of a larger class.
{
    "type": "file",
    "ext": "php",
    "name": "StringAst",
    "path": "\/path-to-downloads-dir\/Lexer\/code\/Ast\/StringAst.php",
    "namespace": {
        "type": "namespace",
        "name": "Tlf\\Lexer",
        "declaration": "namespace Tlf\\Lexer;",
        "class": [
            {
                "type": "class",
                "namespace": "Tlf\\Lexer",
                "fqn": "Tlf\\Lexer\\StringAst",
                "name": "StringAst",
                "extends": "Ast",
                "declaration": "class StringAst extends Ast",
                "methods": [
                    {
                        "type": "method",
                        "args": [
                            {
                                "type": "arg",
                                "name": "sourceTree",
                                "value": "null",
                                "declaration": "$sourceTree = null"
                            }
                        ],
                        "modifiers": [
                            "public"
                        ],
                        "name": "getTree",
                        "body": "return $this->get('value');",
                        "declaration": "public function getTree($sourceTree = null)"
                    }
                ]
            }
        ]
    }
}
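A tree of this shape can be walked with ordinary array access. The sketch below collects method declarations per class. The function name list_method_declarations is hypothetical (it is not part of the library), and the code assumes the file has a namespace and that the "class" and "methods" keys may be absent, as they would be for a file without classes:

```php
<?php
/**
 * Hypothetical helper, not part of the library: map "Fqn::method" to the
 * method's declaration for a tree shaped like the example above.
 */
function list_method_declarations(array $tree): array
{
    $found = [];
    // Default to empty lists so files without classes or methods are skipped.
    foreach ($tree['namespace']['class'] ?? [] as $class) {
        foreach ($class['methods'] ?? [] as $method) {
            $found[$class['fqn'] . '::' . $method['name']] = $method['declaration'];
        }
    }
    return $found;
}

// Trimmed copy of the example tree, written as a PHP array.
$tree = [
    'type' => 'file',
    'namespace' => [
        'type' => 'namespace',
        'name' => 'Tlf\\Lexer',
        'class' => [[
            'type' => 'class',
            'fqn' => 'Tlf\\Lexer\\StringAst',
            'name' => 'StringAst',
            'methods' => [[
                'type' => 'method',
                'name' => 'getTree',
                'declaration' => 'public function getTree($sourceTree = null)',
            ]],
        ]],
    ],
];

print_r(list_method_declarations($tree));
```

With a real file you would pass the array returned by getTree() instead of the hand-written $tree.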
A PHP class property looks like:
{
    "type": "property",
    "modifiers": [
        "public",
        "int"
    ],
    "docblock": {
        "type": "docblock",
        "description": "The loop we're on. 0 means not started. 1 means we're executing the first loop. 2 means 2nd loop, etc."
    },
    "name": "loop_count",
    "value": "0",
    "declaration": "public int $loop_count = 0;"
}
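Property nodes can be consumed the same way as the rest of the tree. As a rough sketch, the hypothetical helper summarize_property below (not part of the library) turns one property node into a one-line summary. It assumes the "modifiers" and "docblock" keys are optional:

```php
<?php
/**
 * Hypothetical helper, not part of the library: one-line summary of a
 * property node shaped like the example above.
 */
function summarize_property(array $property): string
{
    // Note that the type hint ("int") is listed among the modifiers.
    $modifiers = implode(' ', $property['modifiers'] ?? []);
    $description = $property['docblock']['description'] ?? '(no docblock)';
    return sprintf('%s $%s: %s', $modifiers, $property['name'], $description);
}

// Trimmed copy of the example property node, written as a PHP array.
$property = [
    'type' => 'property',
    'modifiers' => ['public', 'int'],
    'docblock' => [
        'type' => 'docblock',
        'description' => "The loop we're on. 0 means not started.",
    ],
    'name' => 'loop_count',
    'value' => '0',
];

echo summarize_property($property), "\n";
```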