Docblock + Comments Grammar
July 13, 2021
I decided to just do a custom php implementation of the whole thing. The grammar only catches the start & end of the docblock & calls the appropriate php function. Doing it all in the grammar meant I couldn't really store anything to the ast until I had already gone through the entire docblock, & I need to know about lines, so it got really complex. Writing pure php was easier.
I still think it should be re-written as a proper grammar, but there's no explicit reason to do this.
I may want a grammar that processes strings containing attributes, though, so I can capture `@attr(arg, arg2, etc)` and its description.
But I don't need that right now, so I'm not doing it right now.
The "OLD" notes below are likely a good starting point for writing a proper docblock grammar.
OLD (before July 13, 2021)
I wrote most of a Docblock Grammar. Then I started trying to handle the padding issue & it was a non-starter. So that has to basically be scratched so I can implement this new version.
I think I'll just go through the breakdown WITH attributes & set that one up, instead of trying to work attributes in after setting up the simpler attribute-free one.
I would maybe like to set up some smaller tests. But I dunno. I'm probably okay with just parsing one big docblock & making sure it's how I expect it. Especially since docblocks aren't complex & don't have a lot of components.
Some rules
- leading whitespace is always removed from the first line. Like `/** something` becomes `something`
- left-pad consists of `whitespace * whitespace` & is converted to full whitespace
  - the `*` is always replaced with a space
- `leftPadToRemove` = the length of whichever line has the shortest left-pad.
  - Does NOT count any lines after a start-of-line `@attribute`
  - Does NOT count the first line
  - Does NOT count empty lines (though they will be trimmed)
  - Then each non-empty line has that much left-pad removed from them.
- Attributes MUST be `[a-zA-Z_]+`
- Attributes MAY have parenthetical arguments like `@attr(arg1, arg2)`
  - Can occur anywhere
  - Can only have a description if it's the first thing on a line
    - `\n * something @see(this) thing yes please`. There is no description for `@see(this)`
    - `\n * @see(this) thing yes please`. `thing yes please` is the description
- Attributes without parenthetical arguments
  - MUST be the first non-whitespace non-star character. Like `\n * @attr whatever`
  - Captures a description until the next `@attribute`
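The attribute rules above can be sketched roughly like this. It is only an illustration in Python (not the php implementation these notes describe), and the names `ATTRIBUTE` and `find_attributes` are made up for the example. It assumes the line has already had its star & left-pad removed, and it only looks at one line at a time, so it does not capture multi-line descriptions.

```python
import re

# Hypothetical pattern: "@", a name of letters/underscores, then optional "(args)".
ATTRIBUTE = re.compile(r"@([a-zA-Z_]+)(?:\(([^)]*)\))?")


def find_attributes(line):
    """Return (name, args, description) tuples for one cleaned docblock line."""
    found = []
    stripped = line.strip()
    for match in ATTRIBUTE.finditer(stripped):
        has_parens = match.group(2) is not None
        at_line_start = match.start() == 0
        if not has_parens and not at_line_start:
            # Rule: attributes without parentheses must start the line.
            continue
        args = []
        if has_parens:
            args = [a.strip() for a in match.group(2).split(",") if a.strip()]
        # Rule: only a start-of-line attribute gets a description.
        description = stripped[match.end():].strip() if at_line_start else None
        found.append((match.group(1), args, description))
    return found
```

With this sketch, `something @see(this) thing yes please` yields `see` with no description, while `@see(this) thing yes please` yields `see` with the description `thing yes please`.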
The Lexer breakdown
/* abc
* def
* ghi
* jkl
mno
*/
- match `/*` & discard all before.
- match `\n` & store `abc` as `docblock.line1`
- match `*` & replace it with a space
- match `\n` & push the line `def` onto `docblock.lines`
- match `*` & replace it with a space
- match `\n` & push the line `ghi` onto `docblock.lines`
- match `\n` & push the line `mno` onto `docblock.lines`
- match `*/` & set the line `docblock.last_line`
- call php method to clean up the docblock, which will do:
  - trim the first line
  - iterate over all lines & find the shortest left-pad
  - remove left-padding from all lines per the rules
  - trim the last line
  - combine all the lines for the description & the attributes.
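The cleanup step could look something like the following. This is a Python sketch for illustration, not the actual php method; the function name `clean_docblock` is hypothetical. It assumes the start-of-line `@attribute` line itself still counts toward the shortest left-pad & only the lines after it are ignored, which is my reading of the rules rather than something they state outright.

```python
import re


def clean_docblock(raw):
    """Strip comment markers and the shared left-pad from a /* ... */ docblock."""
    body = raw.strip()
    if body.startswith("/*") and body.endswith("*/"):
        body = body[2:-2]
    lines = body.split("\n")
    # The first line is trimmed and never takes part in the left-pad search.
    first = lines[0].lstrip("*").strip()
    # Replace each leading star with a space so the left-pad is pure whitespace.
    rest = [re.sub(r"^(\s*)\*", r"\1 ", line).rstrip() for line in lines[1:]]

    pads = []
    for line in rest:
        if not line:
            continue  # empty lines do not count
        pads.append(len(line) - len(line.lstrip()))
        if line.lstrip().startswith("@"):
            break  # assumption: lines after a start-of-line @attribute are ignored
    pad = min(pads) if pads else 0

    cleaned = [first]
    for line in rest:
        # Never remove more than the whitespace a line actually has.
        removable = min(pad, len(line) - len(line.lstrip()))
        cleaned.append(line[removable:])
    return "\n".join(cleaned).strip()
```

The relative indentation of the lines survives, because every line only loses the amount of padding that the least-indented line has.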
Breakdown with attributes
/* abc
* def
@param this is a sentence about @param
line 2 param
* @param(two,three) has a description. @see(something)
*/
- match `/*` & discard all before.
- match `\n` & store `abc` as `docblock.line1`
- match `*` & replace it with a space
- match `\n` & push the line `def` onto `docblock.lines`
- match `@param`, create new attribute `ast` & set its name
- match `\n`. Store the line on the attribute ast as `attr.line1`
- match `\n`. Store the line `line 2 param` on `attr.lines`
  - it will be trimmed via the same left-pad trimming as docblock lines
- match `*` & replace it with a space
- match `@param(`, terminate processing of any prior `@attribute`. Create new attribute ast & set its name & create an empty `args` list
- match `,`. push `two` onto `attribute.args`
- match `)`. push `three` onto `attribute.args`.
- match `@see(`. new attribute `ast`, set name. create `attribute.args` list.
  - Append it to `@param(two,three)`'s `attributes`
  - OR add the attribute to the docblock? (probably not)
- match `)`. push `something` to `attribute.args`. immediately stop this directive.
- match `\n`. Store `has a description. @see(something)` as `attribute.line1` on `@param(two,three)`
- match `*/`. set the line `attribute.last_line`
- call php method to clean up the docblock, which will do:
  - trim the first line
  - iterate over all lines & find the shortest left-pad
  - remove left-padding from all lines per the rules
  - trim the last line
  - combine all the lines for the description & the attributes.
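To make the resulting structure concrete, here is a rough Python sketch of what the breakdown above would produce (again just an illustration, not the php code). The `Attribute` class, `parse_attribute`, and `parse_lines` are hypothetical names. It assumes the lines were already cleaned of stars & left-pad, and it only looks for nested inline attributes on the attribute's first line.

```python
import re
from dataclasses import dataclass, field

ATTR = re.compile(r"@([a-zA-Z_]+)(?:\(([^)]*)\))?")


@dataclass
class Attribute:
    name: str
    args: list = field(default_factory=list)
    lines: list = field(default_factory=list)       # description lines
    attributes: list = field(default_factory=list)  # inline attributes nested under this one


def parse_attribute(match):
    raw_args = match.group(2)
    args = [a.strip() for a in raw_args.split(",") if a.strip()] if raw_args else []
    return Attribute(match.group(1), args)


def parse_lines(lines):
    """Split cleaned docblock lines into description lines and attributes."""
    description, attributes, current = [], [], None
    for line in lines:
        text = line.strip()
        start = ATTR.match(text)
        if start:
            # A start-of-line attribute terminates any prior attribute.
            current = parse_attribute(start)
            attributes.append(current)
            rest = text[start.end():].strip()
            current.lines.append(rest)
            # Parenthesised attributes in the description are appended to
            # the attribute that owns the line, not to the docblock.
            for inline in ATTR.finditer(rest):
                if inline.group(2) is not None:
                    current.attributes.append(parse_attribute(inline))
        elif current is not None:
            current.lines.append(text)  # continuation of the attribute description
        else:
            description.append(text)
    return description, attributes
```

For the example docblock, this gives the description lines `abc` & `def`, a first `param` with two description lines, and a second `param` with args `two` & `three` that owns a nested `see` attribute.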
Older notes
The indentation problem
/* abc
* def
* ghi
* jkl
mno
*/
Should become:
abc
def
ghi
jkl
mno
So the indent to remove is the shortest indent, other than the first line.
The shortest indent is: `*`, before `def`
For the rest of the lines, it doesn't matter if they have a star or not. It just matters the distance from the 0 column to the first non-star char
So we find the SMALLEST distance from the 0 column to the first non-star char
The first line is always free from indentation considerations, due to how the grammar processes it
The old notes
- Write DocBlock + Comments Grammar
  - `src`, `description`, and `attributes`
  - Can come from multiple different styles (`##\n#\n#\n#` or `/**\n*\n*` or `// comment` or `! comment` or whatever)
    - Maybe there is a docblock cleanup step that is non lexerryy?? Or I have to add something to make it flexible ...
    - I suppose I could have a method that modifies one of my directives, so basically you just set the docblock type to `/*`, then the relevant directives are modified accordingly. This is a fairly convoluted approach, but it would also work. It could even be "aware" of what other language grammar is present, maybe. (maybe not though).
  - catches `@name and the description about it`, `@name(arg1,arg2) description about it`, ... what about `@arg prop_name description about it` & catch `prop_name`??? I think this is a case-by-case kind of thing. Maybe depending upon the particular attribute? maybe the language grammar can define it
-
/** * */
or /** */
or
or
or // // //
So, first I have to make a decision about "what is a docblock?", well the common answer is:
/**
*
*
*/
But if I want docblock functionality in different languages (bash), how tf do I do that? Well I've proposed:
##
#
#
#
Which doesn't supply a clear terminator, but it makes sense.
I could also do just a series of single-line comments. But I think that breaks a common expectation. So no. I won't do that.
So right now, I'm counting `##\n#\n#` and `/**\n*\n*/` as well as their single-line counterparts `## stuff` and `/** */`
But the `## stuff` breaks the docblock contract
But how do I count `## whatever` as a docblock but NOT catch it as a comment?
by simply defining things separately.... basically. The `#` would start a comment, but listen for docblock. Then if the next char is `#`, then that means we have a docblock & the comment listening can stop & we go into docblock mode.
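As a rough illustration of that "start as a comment, switch to docblock on a second `#`" idea, here is a small Python sketch (not the actual grammar). The name `scan_hash_blocks` is hypothetical. Since the `##` style has no clear terminator, the sketch assumes a docblock simply ends at the first line that does not start with `#`.

```python
def scan_hash_blocks(source):
    """Return a list of (kind, lines) for '#' comments and '##' docblocks."""
    blocks = []
    docblock = None  # lines of the docblock currently being collected, if any
    for line in source.split("\n"):
        text = line.strip()
        if text.startswith("##") and docblock is None:
            # A second "#" right after the first switches into docblock mode.
            docblock = [text[2:].strip()]
        elif text.startswith("#") and docblock is not None:
            # Inside a docblock, each "#" line is a continuation.
            docblock.append(text[1:].strip())
        else:
            if docblock is not None:
                # Assumption: the first non-"#" line terminates the docblock.
                blocks.append(("docblock", docblock))
                docblock = None
            if text.startswith("#"):
                blocks.append(("comment", [text[1:].strip()]))
    if docblock is not None:
        blocks.append(("docblock", docblock))
    return blocks
```

A plain `# comment` that is not preceded by a `##` line stays an ordinary comment, so the two are defined separately even though they share the same first character.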
But how does docblock processing actually work? Let's start with a single line example.
/** I am a simple docblock */
`/*` starts a docblock
`*/` ends a docblock
`I am a simple docblock` is the description / body
Then
/**
* Simple multi-line docblock
*/
So ... previously, I was just parsing all of the stars out after getting the end of the docblock. I could do that still, I suppose, but it's kind of a jank solution, especially considering I have a lexer!
So the things I want to remove are: `^\s+\*`
If there is no `*` on a line,
- Get a `/*` to start a docblock
- Get a `\n` to end a line
  - rewind 1
  - append to body. Append to description?
  - forward 1
- Get a `\n` or a `*` and discard what's before `*`
Attributes:
- Start-of-line attributes like `@param $argName description of arg`
- in-line attributes like `I want you to @see TheThing`
- start-of-line like `@param($argName, string) description of the arg`
- in-line like: `Lets @see(TheThing)`
I don't really care about the `This is @abad Inline Type` style. Because it's just ... not really clear how that should be interpreted & I don't think I want to make decisions about it & Code Scrawl doesn't use this style. Or it hasn't, anyway.