Php Lexer
A declarative lexer that hooks into PHP functions and builds ASTs for multiple languages.
Development Status / Roadmap
Sept 9, 2021 update: v0.7 is under development & I haven't updated the documentation for this branch.
v0.6 is fairly stable & feature rich. I still develop on this main v0.6 branch. Implementation of some instructions may change while remaining on this branch, but there shouldn't be any major breaking changes.
If you use this Lexer, please submit an issue asking me to improve the branching workflow & stability. I won't do it until someone needs it.
Install
For development, it depends upon taeluf/php/php-tests and taeluf/php/CodeScrawl, which will be installed via composer.
composer require taeluf/lexer v0.7.x-dev
or in your composer.json
{"require":{ "taeluf/lexer": "v0.7.x-dev"}}
Generate an AST
See doc/Examples.md for more examples
Example:
$lexer = new \Tlf\Lexer();
$lexer->useCache = false; // cache is disabled only for testing
$lexer->addGrammar($phpGrammar = new \Tlf\Lexer\PhpGrammar());
$ast = $lexer->lexFile(dirname(__DIR__).'/php/SampleClass.php');
// An array detailing the file
$tree = $ast->getTree();
See test/php/SampleClass.php for the input file and test/php/SampleClass.tree.php for the output $tree.
Status of Grammars
- Php: Early implementation that catches most class information (in a lazy form) but may have bugs
- Docblock: Currently handles /* style, cleans up indentation, removes the leading * from each line, and processes simple attributes (start a line with * @something description).
  - Coming soon (maybe): Processing of @method_attributes(arg1,arg2)
- Bash: Coming soon, but will only catch function declarations & their docblocks.
  - The docblocks start with ## and each subsequent line must start with whitespace then # or just #.
  - I'm writing it so I can document git-bent
- Javascript: Coming soon, but will only catch docblocks, classes, methods, static functions, and mayyybee properties on classes.
  - I'm writing it so I can document js-autowire
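To make the Docblock cleanup described above concrete, here is a minimal sketch of those steps in plain PHP: strip the comment delimiters, remove indentation and the leading * from each line, and collect simple @something description attributes. The cleanDocblock() function and its return shape are hypothetical and written only for illustration. This is not the library's implementation, and the real Docblock grammar may behave differently.

```php
<?php
// Hypothetical helper, NOT part of taeluf/lexer: illustrates the cleanup steps only.
function cleanDocblock(string $raw): array
{
    // Drop the opening "/*" (or "/**") and the closing "*/"
    $body = preg_replace('#^\s*/\*+|\*+/\s*$#', '', $raw);

    $description = [];
    $attributes = [];
    foreach (preg_split('/\R/', $body) as $line) {
        // Remove indentation, one leading "*", and a single space after it
        $line = preg_replace('/^\s*\*? ?/', '', $line);
        if (preg_match('/^@(\w+)\s*(.*)$/', $line, $m)) {
            // A simple attribute: "@name description"
            $attributes[$m[1]] = trim($m[2]);
        } else {
            $description[] = rtrim($line);
        }
    }

    return [
        'description' => trim(implode("\n", $description)),
        'attributes'  => $attributes,
    ];
}

$doc = cleanDocblock("/**\n     * Adds two numbers.\n     *\n     * @return int the sum\n     */");
```

Here $doc holds the cleaned description text plus an attributes array keyed by attribute name. A repeated attribute name would overwrite the earlier one in this sketch.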
Write a Grammar
A Grammar is an array declaration of directives that define instructions. Those instructions may call built-in commands or may explicitly call methods on a grammar, the lexer, the token, or the head ast.
Writing a grammar is very involved, so please see doc/GrammarWriting.md for details.
Warning
- Sometimes when you run the lexer, there will be echoed output. Use output buffering if you want to stop this.
- During onLexerEnd(...), Docblock does $ast->add('docblock', $lexer->previous('docblock')) IF there's a previous docblock set.
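For the echoed output mentioned above, one way to suppress it is PHP's built-in output buffering. The sketch below uses a hypothetical runQuietly() helper (not part of this library) that runs any callable and discards whatever it echoes. The closure is a stand-in for a real lexer call, since the exact output the lexer produces is not documented here.

```php
<?php
// Hypothetical helper, NOT part of taeluf/lexer: run a callable and discard its echoed output.
function runQuietly(callable $fn)
{
    ob_start();
    try {
        return $fn();
    } finally {
        // Throw away the captured output and close this buffer level,
        // even if the callable threw an exception
        ob_end_clean();
    }
}

// Stand-in for a noisy call; with the real library this might wrap $lexer->lexFile($path)
$result = runQuietly(function () {
    echo "debug output from the lexer\n";
    return ['type' => 'file'];
});
```

If you would rather inspect the output than discard it, ob_get_clean() returns the buffered text and closes the buffer in one call.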
Contribute
- Need features? Check out the Status.md document and see what needs to be done. Open up an issue if you're working on something, so we don't double efforts.