Extending the lexer

Convert code or other text into a structured tree (multi-dimensional array).

In This File:

  • create a grammar with directives & handler functions
  • test a grammar
  • a complex directive that builds an AST without PHP
  • how to write an extension

Extending

To extend the lexer, we will do four things, iteratively.

  1. Create a Grammar
  2. Create an array of Directives
  3. Create a trait for callbacks Directives use
  4. Write Directive tests

To run it, I suggest using your preferred test suite. Note that the built-in directive tests require the composer package taeluf/tester.

1. Grammar

We'll be recreating part of the Bash Grammar.

<?php

class MyBashGrammar extends \Tlf\Lexer\Grammar {

    use MyBashGrammarCallbacks;

    public function getNamespace(){return 'mybash';}

    protected $directives = [];

    public function __construct(){
        // `require` executes directives.php and gives back the array that file returns
        $directives = require(__DIR__.'/directives.php');
        $this->directives = $directives;
        // @tip use `array_merge()` to load directives from multiple files
    }

    public function onLexerStart($lexer,$file,$token){
        // if you have any additional setup to do
    }
}
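The `array_merge()` tip in the constructor can be sketched on its own, without the lexer. This is a minimal illustration only: the file names `core.php` and `comments.php`, the temp directory, and the helper `write_directive_file()` are all hypothetical and exist just to make the example runnable.

```php
<?php
// Hypothetical helper: writes a PHP file that returns the given array.
function write_directive_file(string $path, array $directives): void {
    file_put_contents($path, "<?php\nreturn " . var_export($directives, true) . ";\n");
}

// Hypothetical directive files in a throwaway temp directory.
$dir = sys_get_temp_dir() . '/mybash_directives_' . uniqid();
mkdir($dir);
write_directive_file($dir . '/core.php', [
    'root'     => ['is' => [':docblock', ':function']],
    'function' => ['start' => ['stop', 'buffer.clear']],
]);
write_directive_file($dir . '/comments.php', [
    'comment' => ['start' => ['buffer.clear']],
]);

// Each `require` returns one array; array_merge() combines them.
$directives = [];
foreach ([$dir . '/core.php', $dir . '/comments.php'] as $file) {
    $directives = array_merge($directives, require $file);
}

print_r(array_keys($directives));

// Clean up the temp files.
unlink($dir . '/core.php');
unlink($dir . '/comments.php');
rmdir($dir);
```

With string keys, `array_merge()` lets a later file override a directive of the same name from an earlier file, so the load order matters.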

2. Directives

directives.php

<?php
return [
        'root'=>[
            'is'=>[
                // ':comment',
                ':docblock',
                ':function',
            ],
        ],

        'docblock'=>[
            'start'=>[
                'match'=>'##',
            ],
            'stop'=>[
                'match'=>'/(^\s*[^\#])/m',
                'rewind 2',
                'this:handleDocblockEnd',
                'buffer.clear',
                // 'forward 2'
            ]
        ],

        'function'=>[
            'start'=>[
                'match'=>'/(?:function\s+)?([a-zA-Z\_0-9]*)(?:(?:\s*\(\))|\s+)\{/',
                'this:handleFunction',
                'stop',
                'buffer.clear',
            ]
        ],
        // an additional 'comment' directive is below
    ];
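Before wiring the patterns into the lexer, it can help to try them with plain `preg_match()`. The sketch below only exercises the two regular expressions copied from the directives above. It does not reproduce how the lexer feeds its buffer to them, and the sample Bash lines are made up for illustration.

```php
<?php
// Patterns copied from the 'function' and 'docblock' directives.
$function_start = '/(?:function\s+)?([a-zA-Z\_0-9]*)(?:(?:\s*\(\))|\s+)\{/';
$docblock_stop  = '/(^\s*[^\#])/m';

// Capture group 1 is the function name (what $token->match(1) refers to).
$names = [];
foreach (['function greet {', 'deploy(){'] as $line) {
    if (preg_match($function_start, $line, $m)) {
        $names[] = $m[1];
    }
}
print_r($names);

// The docblock stops at the first line that does not begin with '#'.
$source = "## Say hello\n# to the user\ngreet(){\n";
preg_match($docblock_stop, $source, $m, PREG_OFFSET_CAPTURE);
$stop_offset = $m[0][1];

// Everything before the stop offset is the docblock text.
echo substr($source, 0, $stop_offset);
```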

3. Callbacks

We'll write a trait with functions to handle directive calls. You'll notice this:handleFunction in the directives above; it calls handleFunction(...) on the trait below.

<?php
trait MyBashGrammarCallbacks {

    public function handleDocblockEnd($lexer, $ast, $token, $directive){
        $block = $token->buffer();
        $clean_input = preg_replace('/^\s*#+/m','',$block);
        $db_grammar = new \Tlf\Lexer\DocblockGrammar();
        $ast = $db_grammar->buildAstWithAttributes(explode("\n",$clean_input));
        $lexer->setPrevious('docblock', $ast);
    }

    public function handleFunction($lexer, $ast, $token, $directive){
        // $func_name = $token->match(1);
        $func = new \Tlf\Lexer\Ast('function');
        $func->name = $token->match(1);
        $func->docblock = $lexer->previous('docblock');
        $lexer->getHead()->add('function', $func);
    }
}
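The cleanup step inside handleDocblockEnd can be tried in isolation. This sketch runs the same `preg_replace()` and `explode()` calls on a made-up docblock string; the result is the kind of line array that gets passed on to the docblock grammar.

```php
<?php
// A made-up Bash docblock, as it might sit in the token buffer.
$block = "## Say hello\n# @arg name who to greet";

// Same pattern as in handleDocblockEnd: strip leading whitespace and '#' marks per line.
$clean_input = preg_replace('/^\s*#+/m', '', $block);

// Split into lines, as the callback does before building the AST.
$lines = explode("\n", $clean_input);

var_export($lines);
```

Note that the space after the `#` marks is kept, so each line still starts with a space.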

4. Directive Tests

You need to be using taeluf/tester for the built-in tests to work.

<?php

class MyBashGrammarTest extends \Tlf\Lexer\Test\Tester {

    protected $my_tests = [
        'Comments'=>[
            // the 'comment' directive is below and can be added to the `MyBashGrammar` that is above
            'start'=>'comment', // t
            'input'=>"var=\"abc\"\n#I am a comment\nvarb=\"def\"",
            'expect'=>[
                "comments"=>[
                    0=>[
                        'type'=>'comment',
                        'src'=>'#I am a comment',
                        'description'=> "I am a comment",
                    ]
                ],
            ],
        ],
    ];

    public function testBashDirectives(){
        $myGrammar = new \MyBashGrammar();
        $grammars = [
            $myGrammar
        ];
        // $docGram->buildDirectives();

        $this->runDirectiveTests($grammars, $this->my_tests);
    }

}

A more complex directive

<?php
// you would put this in your directives file (directives.php)
$directives = [
    'comment'=>[
        'start'=>[
            'match'=>'/#[^\#]/',
            'rewind 2',
            'buffer.clear',
            'forward 1',
            // you can create & modify ASTs all in the directive code, without PHP
            'ast.new'=>[
                '_addto'=>'comments',
                '_type'=>'comment',
                'src'=>'_token:buffer',
            ],
            'buffer.clear //again',
        ],

        // `match` gets called for each char after `start`
        'match'=>[
            'match'=>'/@[a-zA-Z0-9]/', // match an @attribute
            'rewind 1',
            'ast.append src',
            'rewind 1 // again',
            'ast.append description',
            'forward 2',
            'buffer.clear',
            'then :+'=>[ // the :+ means that we're defining a new directive rather than referencing an existing one
                'start'=>[
                    //just immediately start
                    'match'=>'',
                    'rewind 1',
                ],
                'stop'=>[
                    // i honestly don't know why I have this here.
                    'match'=>'/(\\r|\\n)/',
                    'rewind 1',
                    'ast.append src',
                    'buffer.clear',
                ]
            ],

        ],
        'stop'=>[
            'match'=>'/(\\r|\\n)/',
            'rewind'=>1,
            'ast.append src',
            'ast.append description',
            'forward'=>1,
            'buffer.clear',
        ],
    ]
];
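To see what the 'comment' directive is aiming for, the sketch below rebuilds the expected comment entry from the test input using only the directive's `start` and `stop` patterns. This is a rough stand-in under stated assumptions: it uses plain `preg_match()` with offsets rather than the lexer's buffer and `rewind`/`forward` instructions, and it ignores the nested `@attribute` handling.

```php
<?php
// Input copied from the 'Comments' test case.
$input = "var=\"abc\"\n#I am a comment\nvarb=\"def\"";

// 'start' pattern: a '#' followed by any character other than '#'.
preg_match('/#[^\#]/', $input, $start, PREG_OFFSET_CAPTURE);
$begin = $start[0][1];

// 'stop' pattern: the first line break at or after the start of the comment.
preg_match('/(\\r|\\n)/', $input, $stop, PREG_OFFSET_CAPTURE, $begin);
$end = $stop[0][1];

// 'src' keeps the leading '#'; 'description' drops it.
$src = substr($input, $begin, $end - $begin);
$comment = [
    'type'        => 'comment',
    'src'         => $src,
    'description' => substr($src, 1),
];

var_export($comment);
```

The resulting array has the same `type`, `src` and `description` values that the 'Comments' test case expects for its first comment.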