Development Status of Lexer

Notes

  • PHP directive tests:
    • phptest -test ShowMePhpFeatures for a summary of the directive tests
      • see test/output/PhpFeatures.md or view in the terminal
    • phptest -test Directives -run Directive.TestName
      • add -stop_loop 50 to stop after the 50th loop
      • add -version 0.1 to run without new code. The default in tests is version 1.0
      • see test/src/Php to create a new directive test
      • see test/Tester.php::runDirectiveTests()
  • Docblock directive tests:
    • phptest -test DocblockDirectives -version 0.1
  • Generate Documentation (memory limit):
    • php -d memory_limit=900M "$scrawl_dir/scrawl" where $scrawl_dir is the path to Code Scrawl's bin directory

Bad stuff

  • function abc() use ($var){} uses method_arglist instructions. It works; it just doesn't create the ideal AST structure.

Versions

  • v0.2: has simple (poor) bash lexing and old (incomplete, partially broken) php lexing
  • v0.3: I don't know (broken, probably?)
  • v0.5: I don't know (broken, probably?)
  • v0.6: abandoned intermediate version, I BELIEVE (but not 100% sure)
  • v0.7: the up-and-coming version
  • v0.8: get rid of the old PHP grammar. Clean up tests. Document. Maybe some additional polish & niceties for running the lexer.

Grammar Changes

  • Feb 14, 2022, PHP: reset $xpn on op_block_end() when the AST type is block_body. See Php->testScrawl2()

Nov 8, 2023

I fixed the problem of the AST being empty at the end: the test was set to 'expect.previous', but it must be 'expect' if I want to compare to the AST tree. Now I seem to have one more array than I would like. The root AST has a 'docblock' entry holding an array whose 0th index is the actual docblock. It should just be one docblock, without a parent array. I don't know why that's happening.
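
To make the extra level concrete, here is a minimal sketch of the two shapes. The keys inside the docblock ('type', 'description') are assumptions for illustration, not the lexer's actual AST keys.

```php
<?php
// Hypothetical docblock AST; the key names are illustrative assumptions.
$docblock = ['type' => 'docblock', 'description' => 'One line docblock'];

// Shape currently produced: the docblock is wrapped in an extra list.
$actual = ['docblock' => [$docblock]];

// Intended shape: the 'docblock' entry IS the docblock.
$expected = ['docblock' => $docblock];

// Same data, but one extra index step is needed to reach it.
var_dump($actual['docblock'][0] === $expected['docblock']); // bool(true)
var_dump($actual === $expected);                             // bool(false)
```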

I got Docblock.OneLine & Docblock./**OneLine working, except for the parent array problem. I haven't run the other directive tests.

phptest -test DocblockDirectives -version 0.1 -run Docblock.OneLine -stop_loop 11

Oct 24, 2023

I worked on the new Docblock grammar a bit. I'm working on phptest -test DocblockDirectives -version 0.1 -run Docblock.OneLine -stop_loop 11 ... It's printing what I expect, but if I remove the stop_loop, it appears there is no AST, & I believe this is an issue with my AST stack. I don't know. I'm confused.

I'm also not really sure what I should do with the docblock. Inside the PHP grammar, I'm setting a docblock to a previous key, not to an AST. But I want the docblock to finish and be able to return a full docblock if that's the only thing being processed, I guess. I want it to be testable on its own, but also usable WITHIN other language parsers, so that each language doesn't have to make its own.

I'm also making some decisions within the new docblock directives that I don't like, so I'll probably need to change some of that. But that's okay; it's just part of the process.

Oct 22, 2023

  • removed PHP handleWhitespace(), replaced it with instructions
  • removed PHP handleComment(), added buffer.trim
  • worked on a new implementation of the docblock grammar that is entirely directive-driven.

My biggest holdup with the Docblock grammar is that I don't know what the rules are. I have an existing PHP implementation that works, but I find it VERY confusing, which makes it extremely hard to edit. It doesn't really need to be changed, but it's a relatively small part to develop, and I think it will help me practice & get ready for the much more difficult rewrite of PHP's parser into directives.

I can probably find a reference that defines how docblocks should be processed. Perhaps Javadoc. If not, I should set some expectations ahead of time about what AST structure we're trying to build (I should do that anyway!). I should describe the structure of the PHP AST! That would be a lot of work, but it would be supremely useful for navigating the AST, and for keeping me consistent in developing.

Oct 21, 2023

I'm making a .bak file of the current PHP grammar. I may make some major changes to the directives. I'm tempted to start fresh, but I kind of know that's going to be a bad idea. I want to eliminate all of the PHP components except for more generic directives. I want all of the logic to be in the Directive definitions.

I deleted the .bak file.

I set up Commands.php, where I can define methods like run_something_something(...), and 'something.something' will automatically work in the directives! I added 2 instructions there and included Commands.php in the directive processing. I removed the PHP methods for handling strings and for handling a backslash inside a string; those are replaced with the 2 new instructions I wrote. I added one or two of them to the CreateParser documentation.
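
A minimal sketch of that name-based mapping, assuming an instruction name becomes a method name by prefixing run_ and replacing dots with underscores. CommandsSketch, its methods, and run_instruction() are hypothetical; the real Commands.php methods receive lexer state rather than a plain string.

```php
<?php
// Hypothetical sketch of dispatch by method name: instruction 'buffer.trim'
// maps to method run_buffer_trim(). Real Commands.php methods take lexer state.
final class CommandsSketch
{
    public function run_buffer_trim(string $buffer): string
    {
        return trim($buffer);
    }

    // Made-up second instruction, only to show that new methods work automatically.
    public function run_buffer_upper(string $buffer): string
    {
        return strtoupper($buffer);
    }
}

function run_instruction(object $commands, string $instruction, string $buffer): string
{
    // 'buffer.trim' -> 'run_buffer_trim'
    $method = 'run_' . str_replace('.', '_', $instruction);
    if (!method_exists($commands, $method)) {
        throw new InvalidArgumentException("Unknown instruction: $instruction");
    }
    return $commands->$method($buffer);
}

echo run_instruction(new CommandsSketch(), 'buffer.trim', "  hello  "), PHP_EOL; // hello
```

Adding an instruction then only means adding a method. The trade-off is that a misspelled instruction name in a directive is caught only at run time, which is why the sketch throws on unknown names.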

Next time, I'll want to continue replacing PHP code with directive instructions. Oh, and I restructured the PHP grammar directory; see code/Php/*. I didn't update any trait/class names because I'm lazy.

I had previously made a versioning system, but I hate it. I want to do this in directives instead. And I want a PHP version instruction too.

Oct 18, 2023

I'm basically done with CreateParser.src.md. I should maybe add a better example of a full Directive. I may want to move the breakdown of instructions into its own file, and maybe rename CreateParser. Also, I suppose I haven't actually touched on the PHP side of creating a Parser. Yeah. I guess I'll need to do that. So yeah, there's still work to do on CreateParser.src.md.

I updated the README slightly. I commented out the "Use an ast" & "parse code" documentation links, but those docs will need to be written eventually.

Oct 17, 2023

Documented in CreateParser.src.md. Did a little docblocking. Only 7 more instructions to document!!!

Oct 16, 2023

I worked on documentation. I started a new README (DON'T RUN SCRAWL!). I added .docsrc/CreateParser.src.md and documented a fair bit about how the lexer works, as well as several instructions. I documented ALL the instructions that are in the MappedMethods.php file. I still need to document all the instructions in the switch statement in Instructions.php.

At some point it might be nice to refactor this stuff. It's all pretty confusing. But, I'm doing fine going through it and writing documentation, so I guess it doesn't really matter. As long as the documentation is good, it should be fine.

I probably need a new / better format for documenting the instructions, but what I have currently in CreateParser.src.md is quite alright for now.

I didn't add or modify any docblocks, and some of the documentation I found was wrong.

Aug 10, 2023

  • started in-depth parsing of vars in method bodies
  • added some return types
  • added some docblocks
  • added \Tlf\Lexer\Versions for versioning code
  • added $signal to the lexer for expressing expectations simply
  • added has() to ast to check for a key

Production by default runs with the old lexing. Tests, however, by default use Lexer\Versions::_1 (i.e. 1.0)

To check the var parsing & new signal-stuff see:

  • phptest -test Directives -run Var.Assign.Variable -stop_loop 27
  • phptest -test Directives -run Var.Assign.String (I broke this somehow, but it WAS working)
  • or to run directive tests without new code: phptest -test Directives -version 0.1
  • code/Php/Operations.php
  • code/Php/Words.php
  • test/src/Php/Vars.php

TODO

  • (May 13, 2022) LilMigrations fails parsing. I believe foreach() is the problem, starting at line 105. Need to figure out / fix this.
  • (Apr 27, 2022) PHP method bodies do not contain blank lines. They should, though, so I can print them back out exactly as written.
  • review documentation
  • add to documentation: test/output/PhpFeatures.md contains a list of directive unit tests (their description, input code, and whether they passed their tests). Looking at this is a good way to see what language features have been implemented
  • maybe stop nesting a PHP class inside a namespace AST ... I just don't like it
  • am I handling interfaces?
  • clean up the php grammar (remove old commented debug code & whatnot)
  • create passing BashGrammar tests
  • create an easy mode feature like: $ast = $lexer->easy_mode('php', $string_to_lex)
  • delete the old BashGrammar, JsonGrammar, and defunct Docblock grammar
  • document directives (use docblocks, but I don't think the lexer can currently get docblocks attached to array keys like I want for this)
  • make generic base grammar that works how the PhpGrammar does
    • maybe make a StringGrammar as well, since string features are often shared between languages

March 13, 2022

All my tests are passing! I haven't implemented all of the PHP features, but I have the majority of them handled. I haven't added anything that's new in PHP 8.0 or 8.1, & I don't know how I'll handle versioning ...

My directive tests are cleaned up. All my grammar tests are cleaned up. Basically, this thing is now awesome, robust, trustworthy.

It will still probably need features added here & there, but it's finally at a point where I can count on it.

It would be nice (but not necessary) to write a code-scrawl extension that documents all the directive tests for each grammar.

I would like to add an "EasyMode" to simply do like ... $ast = $lexer->easy_mode('php', $string_to_lex)

And I want to clean up this document.

Eventually, I want to catch expressions, so a script or function body would have an array of expressions like $var = 'whatever'; and return true;, as well as breaking down ifs/foreaches (at least the ones with a block) as expressions containing expressions.

Eventually, I would like to break down each expression into a generic form, so $var = 'whatever' becomes something like: type=set variable, name='var', value='whatever'
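
As a rough illustration of that generic shape, the sketch below turns the single case $var = 'whatever'; into an array with type, name, and value keys. It only handles a variable assigned a single-quoted string, and the function name and exact key/type strings are assumptions; it is not how the lexer parses expressions.

```php
<?php
// Hypothetical helper: recognizes only `$name = 'literal';` (single quotes,
// no escapes) and returns the generic expression shape, or null otherwise.
function break_down_assignment(string $code): ?array
{
    // In a single-quoted PHP string, \' is a literal quote; \s, \$ and \w pass through.
    $pattern = '/^\s*\$([A-Za-z_]\w*)\s*=\s*\'([^\']*)\'\s*;?\s*$/';
    if (preg_match($pattern, $code, $m) !== 1) {
        return null;
    }
    return ['type' => 'set variable', 'name' => $m[1], 'value' => $m[2]];
}

print_r(break_down_assignment('$var = \'whatever\';'));
```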

Eventually, all this expression breakdown could be used to transpile between different languages.

TODOs (old, but maybe still relevant)

  • Finish the GrammarTesting documentation (once method bodies are available)

High Priority (internals)

  • Make DocblockGrammar work for bash (docblock starts with ##)
  • BashGrammar + simple tests. Catch:
    • docblocks
    • functions
    • maybe comments
    • nothing else
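
A minimal sketch of the bash docblock rule, assuming a docblock is a run of consecutive lines starting with ## directly above a function defined as name() or function name(). The function bash_function_docblocks() is hypothetical, and the form function name { (no parentheses) is deliberately not handled.

```php
<?php
// Hypothetical sketch: maps function name => docblock text for bash scripts,
// where a docblock is consecutive '##' lines directly above `name()` or
// `function name()`. The `function name {` form is not handled.
function bash_function_docblocks(string $script): array
{
    $docs = [];
    $pending = []; // '##' lines collected since the last non-docblock line

    foreach (preg_split('/\R/', $script) as $line) {
        $trimmed = trim($line);

        if (strncmp($trimmed, '##', 2) === 0) {
            // Drop the '##' marker and at most one following space.
            $pending[] = preg_replace('/^##\s?/', '', $trimmed);
            continue;
        }

        if ($pending !== []
            && preg_match('/^(?:function\s+)?([A-Za-z_][\w-]*)\s*\(\s*\)/', $trimmed, $m) === 1
        ) {
            $docs[$m[1]] = implode("\n", $pending);
        }

        // Any other line ends the current docblock.
        $pending = [];
    }

    return $docs;
}

$script = "#!/bin/bash\n## Greets someone.\n## Usage: greet NAME\ngreet() {\n  echo \"hi \$1\"\n}\n";
print_r(bash_function_docblocks($script));
```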

Low Priority

  • rename previous to meta or data (I'm thinking meta), & make it its own group if it's not already
  • Add meta information to ASTs & OPTIONALLY include meta information when getting an AST tree
    • Convert type to _type on ASTs maybe? (it's meta information)
  • add additional command arg expansion features ([] and !)
  • abort feature:
    • Add an instruction that marks a directive as an abort target
    • Add an instruction that pops directives until a named abort target is on the top of the stack
  • Threaded parsing. Starts as a single thread, then any instruction can create a new thread & now we'll continue processing on the main thread & process on the new thread as well. Ultimately only one thread can "win", so all but the "correct" thread will be discarded
    • move the while loop in Internals into its own function like do_the_actual_lexing($lexer, $token) or something.
    • Might just have to create multiple lexer instances to do this.
  • Performance: Use arrays, & do object conversions on directives at the last possible moment. I only want them as objects, so they're easier to work with by reference.
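
The abort feature could look roughly like this, assuming the directive stack is a plain array and an abort target is a name attached to one stack entry. DirectiveStack, markAbortTarget(), and abortTo() are hypothetical names for the two proposed instructions, not existing lexer API.

```php
<?php
// Hypothetical sketch of the proposed abort instructions; not existing lexer API.
final class DirectiveStack
{
    /** @var list<array{name: string, abort_target: ?string}> */
    private array $stack = [];

    public function push(string $directiveName): void
    {
        $this->stack[] = ['name' => $directiveName, 'abort_target' => null];
    }

    /** Proposed instruction 1: mark the top directive as a named abort target. */
    public function markAbortTarget(string $target): void
    {
        if ($this->stack === []) {
            throw new LogicException('Cannot mark an abort target on an empty stack');
        }
        $this->stack[array_key_last($this->stack)]['abort_target'] = $target;
    }

    /**
     * Proposed instruction 2: pop directives until the named abort target is on top.
     * Returns the names of the popped directives, most recent first.
     */
    public function abortTo(string $target): array
    {
        // Check first, so an unknown target does not empty the whole stack.
        if (!in_array($target, array_column($this->stack, 'abort_target'), true)) {
            throw new RuntimeException("Abort target '$target' is not on the stack");
        }
        $popped = [];
        while (end($this->stack)['abort_target'] !== $target) {
            $popped[] = array_pop($this->stack)['name'];
        }
        return $popped;
    }

    public function top(): ?string
    {
        return $this->stack === [] ? null : end($this->stack)['name'];
    }
}

$stack = new DirectiveStack();
$stack->push('php_code');
$stack->markAbortTarget('code');
$stack->push('class_body');
$stack->push('method_body');
print_r($stack->abortTo('code')); // popped: method_body, then class_body
echo $stack->top(), PHP_EOL;      // php_code
```

Checking that the target exists before popping keeps a mistyped target name from silently emptying the whole stack.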

Latest (pretty old list)

  • Support return by reference on methods: public function &global_warming() (this method launches billionaires to space temporarily & returns a reference to an array detailing the CO2 pollution. (which is bad, because you can just lie and change it to less pollution, now that we're done with the space trip))
  • Support all arithmetic operators
  • added none operation target, which just stops the directive from being processed if it is that operation
    • Used by the /* operation so that the docblock directive still handles it
  • Refactored operation stuff & multi-char operations are now functional.
  • wrote operation_match() method to check if the current buffer is an operation. Simplifies matching multi-char operations.
  • Operators are refactored, so multi-char operators can be used
  • support simple array declaration
  • add Values trait to test value assignment
  • parse traits the same way I do classes (for the most part)
  • implemented: namespace Abc; class Def {} - get fully qualified name for the class
  • support & test php class use TraitName;
  • support & test php class consts
  • add a stop instruction set to php_code directive
  • catch <?php and ?>
  • Unit test static properties and methods
  • add minimal debugging output to the wd_ and op_ routing.
  • clean up notes
  • Property & method tests passing (& all other current tests)
  • Fix the "methods are inside class modifiers" bug
  • Add ability to -run Namespace.* and run any tests that wildcard match, basically.
  • Separate php directive tests into traits
  • Add word routing to wd_theword($lexer, $xpn, $ast)
  • Rename OtherDirectives to CoreDirectives
  • Refactored operations into Handlers, Operations, and Words
  • Methods are completely handled! (I think)
  • Implementation almost complete for properties
    • Properties handle string concatenation, but NOT numeric operations
  • add operation routing to op_opname($lexer, $xpn)
  • New mechanism of operations & words
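
The operation mechanism described in this list (operation_match() on the current buffer, multi-char operators, routing to op_opname()) can be sketched as follows. The operator table, OperationRouter, could_extend(), and route() are assumptions; the real operation_match() and op_opname($lexer, $xpn) work on lexer state, not a bare string.

```php
<?php
// Hypothetical sketch of operation matching and op_<name>() routing.
// The real operation_match() and op_opname($lexer, $xpn) use lexer state.
final class OperationRouter
{
    /** Operator text => operation name (a small illustrative subset). */
    private const OPERATIONS = [
        '='   => 'assign',
        '=='  => 'equals',
        '===' => 'identical',
        '.'   => 'concat',
        '.='  => 'concat_assign',
    ];

    /** Name of the operation if the buffer is exactly a known operator, else null. */
    public function operation_match(string $buffer): ?string
    {
        return self::OPERATIONS[$buffer] ?? null;
    }

    /** True if reading more characters could turn the buffer into a longer operator. */
    public function could_extend(string $buffer): bool
    {
        $len = strlen($buffer);
        foreach (array_keys(self::OPERATIONS) as $op) {
            if (strlen($op) > $len && strncmp($op, $buffer, $len) === 0) {
                return true;
            }
        }
        return false;
    }

    /** Route a matched operation to op_<name>() if such a handler exists. */
    public function route(string $buffer): ?string
    {
        $name = $this->operation_match($buffer);
        if ($name === null || !method_exists($this, 'op_' . $name)) {
            return null;
        }
        return $this->{'op_' . $name}();
    }

    private function op_assign(): string
    {
        return 'assign handled';
    }

    private function op_concat_assign(): string
    {
        return 'concat_assign handled';
    }
}

$router = new OperationRouter();
var_dump($router->could_extend('.'));  // bool(true): '.=' may still follow
echo $router->route('.='), PHP_EOL;     // concat_assign handled
```

could_extend() is what makes multi-char operators safe: with '.' in the buffer, the lexer should keep reading rather than commit to concatenation, because '.=' might follow.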