Write a Grammar
A Grammar is an array declaration of directives that define instructions. Those instructions may call built-in commands or may explicitly call methods on the grammar, the lexer, the token, or the head ast.
First: Look at the example in doc/GrammarExample.md
Then:
- Read through this document
- Look at the commands available in doc/GrammarCommands.md
- Learn how to test a grammar. See doc/GrammarTesting.md
- Review the Architecture, if you like. See doc/Architecture.md
Tips
- `$grammar->getDirectives(':directive_name')` returns an associative array of directives.
- `then` directive:
  - You can pass a directive declaration to `then`, like `then :name=>['start'=>[/*instructions*/]]`, to override the targeted directive.
  - You can pass `:directive_name.stop` to use the `stop` as `start`.
    - I have not confirmed whether you can override in this case, but I think you can.
- `then.pop :directive_name X` lets you pop X layers when `:directive_name` is matched.
- `inherit :directive.stop` or `inherit :directive.start` lets you auto-execute all commands from the named directive & instruction set.
- Pass `:+name` to `then` to create a new directive, rather than loading from/merging with an existing directive.
  - `:_blank` and `:_blank-name` are deprecated alternatives.
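As a rough illustration of the `then` override tip, the fragment below declares a directive whose `start` instruction set passes a directive declaration to `then`. This is only a sketch: the directive name `:string` and its `match` value are hypothetical, and placing `then` inside `start` is an assumption based on the instruction format described in this document.

```php
<?php
// Sketch only: ':string' is a hypothetical directive name.
$directives = [
    'php_open' => [
        'start' => [
            'match' => '<?php',
            // Pass a declaration to `then` to override parts of the
            // targeted directive (here: its start match).
            'then :string' => [
                'start' => ['match' => '"'],
            ],
        ],
    ],
];
```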
Troubleshooting Tips
- Always `rewind` BEFORE `buffer.clear`, or no rewind is performed.
- `match` has special handling & the recommended style is `'match'=>'string'` or `'match'=>'/regex/'`. The alternate styles like `match string` or `match /regex/` should work, but might cause problems.
- Set `$lexer->useCache` to false to disable cache.
- Set `$lexer->debug = true` to print debug information.
- Set `$lexer->stop_loop = 30` to stop processing on loop 30 & print debug info.
- `rewind` can cause an infinite loop. Ex: The instructions `match == :` & `rewind == 1` on the same directive. The `:` is matched, then we rewind 1, then the `:` is matched & we rewind 1 & so on.
- The `stop` instruction ALWAYS acts upon the top directive list at the time it is executed. If the current directive is not in the directive list's `started`, then nothing happens. Meaning it is NOT added to the `unstarted` list.
- The `directive.inherit` instruction ALWAYS ignores the `match` instruction of the inherited directive.
Recommended structure
To keep files smaller & more organized, I keep my directives inside traits that my grammar `use`s.
- `MyGrammarClass extends \Tlf\Lexer\Grammar`
  - `use MyGrammar\Main_Directives`
  - `use MyGrammar\Comments_Directives`
  - `function buildDirectives()`: `$this->directives = array_merge( comments_directives, main_directives)`
    - override `onGrammarAdded()` to implement this
  - `onLexerStart()`/`onLexerEnd()` if needed
  - methods your directives will call
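The layout above might look roughly like the sketch below. To keep it runnable on its own, the class does not extend `\Tlf\Lexer\Grammar`, and namespaces are left out. The property names `$main_directives` and `$comments_directives`, the `html_comment` directive, and the helper `getDirectiveNames()` are all hypothetical. The real `onGrammarAdded()` hook may take parameters that this document does not show.

```php
<?php
// Hypothetical trait holding the main directives.
trait Main_Directives
{
    protected $main_directives = [
        'php_open' => [
            'start' => ['match' => '<?php'],
            'stop'  => ['match' => '?>'],
        ],
    ];
}

// Hypothetical trait holding comment-related directives.
trait Comments_Directives
{
    protected $comments_directives = [
        'html_comment' => [
            'start' => ['match' => '<!--'],
            'stop'  => ['match' => '-->'],
        ],
    ];
}

// Stand-in class; in real code this would extend \Tlf\Lexer\Grammar.
class MyGrammarClass
{
    use Main_Directives, Comments_Directives;

    protected $directives = [];

    // Merge the per-trait arrays into the single directives array.
    public function buildDirectives(): void
    {
        $this->directives = array_merge(
            $this->comments_directives,
            $this->main_directives
        );
    }

    // Assumed hook: the real signature is not shown in this document.
    public function onGrammarAdded(): void
    {
        $this->buildDirectives();
    }

    // Helper for this sketch only.
    public function getDirectiveNames(): array
    {
        return array_keys($this->directives);
    }
}

$grammar = new MyGrammarClass();
$grammar->onGrammarAdded();
$names = $grammar->getDirectiveNames();
```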
Structure of directives
The form is `$directives -> directive_name -> instruction set -> array of instructions`. There are two instruction sets: `start` and `stop`. There is a third instruction set, but I plan to remove or change it.
<?php
protected $directives = [
    'php_open'=>[
        'start'=>[
            'match'=>'<?php',
            //instructions go here
        ],
        'stop'=>[
            'match'=>'?>',
            //instructions go here
        ],
    ],
];
- When `<?php` matches, `php_open` becomes `started`.
- On subsequent loops `stop` will be checked.
- When `?>` matches, `php_open` becomes stopped.
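The start/stop lifecycle above can be pictured with a toy trace. This is a simplified model, not the library's matching logic: `traceDirective()` is a hypothetical helper that reads one character per loop, checks only plain-string `match` values, and clears its buffer after each match.

```php
<?php
$directives = [
    'php_open' => [
        'start' => ['match' => '<?php'],
        'stop'  => ['match' => '?>'],
    ],
];

// Toy model: report when a single directive would start and stop.
function traceDirective(array $directive, string $input): array
{
    $events = [];
    $started = false;
    $buffer = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $buffer .= $input[$i];
        // Before starting we check 'start'; once started we check 'stop'.
        $set = $started ? 'stop' : 'start';
        $needle = $directive[$set]['match'];
        if (substr($buffer, -strlen($needle)) === $needle) {
            $started = !$started;
            $events[] = ($started ? 'started' : 'stopped') . '@' . ($i + 1);
            $buffer = '';
        }
    }
    return $events;
}

$events = traceDirective($directives['php_open'], '<?php echo 1; ?>');
```

Each event records the number of characters consumed when the state changed.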
Notes
- The subsequent instructions only execute if `match` passes.
- `match` is NOT a required instruction.
- `match` does NOT have to be the first instruction.
- `match` has a lot of special handling to handle merging of overridden directives.
Declaring instructions
Many commands have a shorthand and a longhand, like `stop` and `directive.stop`.
Examples:
- `'command arg1 arg2' => 'arg3'`
- `'command arg1 arg2 //comment' => 'arg3'`
- `'command arg1 ...' => ['arg2', 'arg3', 'arg4']`
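One way to read the three example forms is sketched below. `splitInstruction()` is a hypothetical helper, not part of the library, and the handling of `//comment` and `...` reflects my reading of the examples rather than the library's actual parser.

```php
<?php
// Hypothetical helper: split an instruction key/value pair into a command
// name and a flat argument list.
function splitInstruction(string $key, $value): array
{
    // Drop a trailing "//comment" from the key, if present.
    $commentAt = strpos($key, '//');
    if ($commentAt !== false) {
        $key = substr($key, 0, $commentAt);
    }
    $parts = preg_split('/\s+/', trim($key));
    $command = array_shift($parts);

    // Assumed meaning of "...": the value array supplies the remaining args.
    $spread = false;
    if (end($parts) === '...') {
        array_pop($parts);
        $spread = true;
    }
    $args = $spread
        ? array_merge($parts, (array) $value)
        : array_merge($parts, [$value]);

    return ['command' => $command, 'args' => $args];
}

$plain   = splitInstruction('command arg1 arg2', 'arg3');
$comment = splitInstruction('command arg1 arg2 //comment', 'arg3');
$spread  = splitInstruction('command arg1 ...', ['arg2', 'arg3', 'arg4']);
```

Note that this simple comment stripping would also cut a key that legitimately contains `//`, which is one more reason to prefer the `'match'=>'string'` style for `match`.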
Instead of a `command`, you can use a `namespace:method` to directly call a method on an object from internally defined namespace targets or from one of the available grammars.
The available namespace targets are defined here:
$namespaceTargets = [
'lexer'=>$this,
'token'=>$this->token,
'ast'=>$this->getHead(),
];
$grammarTargets = $this->grammars;
$grammarTargets['this'] = $directive->_grammar ?? null;
Special object-calling
Some commands, like `ast.new`, allow you to give values that call objects+methods in much the same way as instructions. This is on a command-by-command basis.
The format is `_namespace:method arg1 arg2 arg3`
Example:
['ast.new'=>[
'_type'=>'class',
'name'=> '_token:buffer',
'docblock'=> '_lexer:unsetPrevious docblock'
]
]
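To show how such values could be resolved, here is a minimal sketch. It does not use the library: `ToyToken`, `ToyLexer`, and `resolveValue()` are hypothetical stand-ins, and the behavior given to `buffer()` and `unsetPrevious()` here is assumed purely for illustration.

```php
<?php
// Stand-in objects; the method behavior is assumed, not the library's.
class ToyToken
{
    public function buffer(): string
    {
        return 'MyClass';
    }
}

class ToyLexer
{
    public $previous = ['docblock' => '/** docs */'];

    // Return a stored value and forget it.
    public function unsetPrevious(string $key)
    {
        $value = $this->previous[$key] ?? null;
        unset($this->previous[$key]);
        return $value;
    }
}

// Resolve "_namespace:method arg1 arg2" against the given targets.
// Anything that does not fit that format is returned unchanged.
function resolveValue($value, array $targets)
{
    if (!is_string($value)
        || !preg_match('/^_(\w+):(\w+)(?:\s+(.*))?$/', $value, $m)) {
        return $value;
    }
    $namespace = $m[1];
    $method = $m[2];
    if (!isset($targets[$namespace])) {
        return $value;
    }
    $args = (isset($m[3]) && $m[3] !== '')
        ? preg_split('/\s+/', trim($m[3]))
        : [];
    return $targets[$namespace]->$method(...$args);
}

$targets = ['token' => new ToyToken(), 'lexer' => new ToyLexer()];
$spec = [
    '_type'    => 'class',
    'name'     => '_token:buffer',
    'docblock' => '_lexer:unsetPrevious docblock',
];
$resolved = array_map(function ($value) use ($targets) {
    return resolveValue($value, $targets);
}, $spec);
```

Only values are resolved in this sketch, so the `_type` key passes through with its literal value.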