rof File Format

STATUS: Likely trashed

I couldn't get the regex-only parsing to work. I needed a config format for bash, but that turned out to be kind of... complex. I thought I could find a really simple, straightforward way to handle key:value pairs in multiple programming languages, but... I dunno. I'm very likely giving up on this project.

a.key: A value

Another_key-whoo: A
multiline
Value

A.Key.again: \   <- Keep leading whitespace
Escape the next line, because it looks like a key
\not.really-a_key: Needs escaping
\\ <- To have an actual backslash

The next key has an empty value, but that's okay.
This key wants to keep trailing whitespace      \

last.key:

Under Development

This idea is kind of a flop atm. But I have a crude version that kind of works, detailed just below.

What's working:

The Regex

  • ^([a-zA-Z0-9_\-\.]+):(?:\s|\r|\n)*((?:(?:.|\n|\r)(?!^[a-zA-Z0-9_\-\.]+:))+)
  • Requires the multi-line & global flags/modifiers

Rules:

  • Keys contain a-z, A-Z, 0-9, dash (-), underscore (_), and dot (.)
  • Keys are terminated by :
  • Values contain any characters
  • Values can be multi-line
  • A value line, if it matches keypattern:, will be parsed as a key & a new value will start
  • White-space is trimmed from the beginning of a value
  • White-space is NOT trimmed from the end of a value
    • I desperately want to fix this
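
The regex and rules above can be tried out with a minimal Python sketch. Python is not one of the languages this README targets, and `KEY_VALUE` and `parse_rof` are hypothetical names used only for illustration. `re.MULTILINE` stands in for the multi-line flag, and `finditer` plays the role of the global flag.

```python
import re

# The crude regex from this README, unchanged.
KEY_VALUE = re.compile(
    r"^([a-zA-Z0-9_\-\.]+):(?:\s|\r|\n)*((?:(?:.|\n|\r)(?!^[a-zA-Z0-9_\-\.]+:))+)",
    re.MULTILINE,
)


def parse_rof(text):
    """Return a dict of key -> raw value (hypothetical helper, no escape handling)."""
    return {m.group(1): m.group(2) for m in KEY_VALUE.finditer(text)}


sample = "a.key: A value\n\nAnother_key-whoo: A\nmultiline\nValue\n"
pairs = parse_rof(sample)
print(pairs)
```

The first value comes back with a trailing newline still attached, which is the untrimmed-end behaviour listed in the rules.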

Decompressed version of the regex

^([a-zA-Z0-9_\-\.]+) # key
 :
(?:\s|\r|\n)*
 ((?:
    (?:.|\n|\r) # characters we want
    (?!^[a-zA-Z0-9_\-\.]+:) # But NOT if those characters make up a key
 )+)
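
Python's `re.VERBOSE` flag accepts almost exactly this layout, since it ignores whitespace and `#` comments inside the pattern. A sketch, assuming Python as the host language (the name `VERBOSE_KEY_VALUE` is made up for this example):

```python
import re

VERBOSE_KEY_VALUE = re.compile(
    r"""
    ^([a-zA-Z0-9_\-\.]+)            # key
    :
    (?:\s|\r|\n)*                   # leading whitespace, kept out of the value
    ((?:
        (?:.|\n|\r)                 # characters we want
        (?!^[a-zA-Z0-9_\-\.]+:)     # but NOT if a key starts right after them
    )+)
    """,
    re.MULTILINE | re.VERBOSE,
)

matches = VERBOSE_KEY_VALUE.findall("one: 1\ntwo: 2\n")
print(matches)
```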

TODO

  • Refine the regex so all keys are $1 & all values are $2
  • Trim whitespace surrounding values
  • Expand the keys to allow for ([a-zA-Z\-\_0-9\.]+) & possibly other characters.
    • I'm testing with just a-z keys because that's wayyyy simpler
  • Create multi-language examples, at least: bash, PHP, JavaScript (because I know how to use those lol)
  • And never use (.|\r|\n)*, use (?s).*
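
As a rough sketch of where the trimming and `(?s)` items might lead, here is a Python variation. It is not the regex above: it assumes the lookahead is moved in front of each character (so an empty value still matches), it uses `re.DOTALL` in place of `(?s)`, and it trims with `str.strip()` after matching rather than inside the regex. It ignores the backslash escapes entirely, and `KEY_VALUE_DOTALL` and `parse_trimmed` are hypothetical names.

```python
import re

# Assumption: a "tempered" variant of the README regex. The negative
# lookahead is checked BEFORE consuming each character, and "." matches
# newlines thanks to re.DOTALL, so (.|\r|\n) is no longer needed.
KEY_VALUE_DOTALL = re.compile(
    r"^([a-zA-Z0-9_.\-]+):((?:(?!^[a-zA-Z0-9_.\-]+:).)*)",
    re.MULTILINE | re.DOTALL,
)


def parse_trimmed(text):
    """Return key -> value with surrounding whitespace stripped."""
    return {key: value.strip() for key, value in KEY_VALUE_DOTALL.findall(text)}


trimmed = parse_trimmed("one:   1  \n\ntwo:\nthree: a\nb\n")
print(trimmed)
```

Trimming outside the regex sidesteps the trailing-whitespace problem, at the cost of also dropping whitespace that the backslash syntax is supposed to preserve.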



Notes & Wishful thinking

The target format is

key: value 1
nightmare:DELIM:
notakey:
    obviously not a key
notakey:
:DELIM:
abc: value 2
new line
anotherkey:: value 
nostring: on this one
::

Which would yield these key/value pairs:

key

value 1

nightmare

notakey:
    obviously not a key
notakey:

abc

value 2
new line

anotherkey

value 
nostring: on this one

What is working

  • Correctly matches non-delimited keys & values
    • ([a-z]+):((?:(?:.|\n|\r)(?!^[a-z]+:))+)
    • ([a-z]+):((?:(?:.|\n|\r)(?!^[a-z]+:(?![A-Z]*:)))+)
  • Correctly matches delimited keys & values
    • ([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2
  • Matches everything correctly, BUT references are not right
    • (?:(?:([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2)|([a-z]+):((?:(?:.|\n|\r)(?!^[a-z]+:))+))
    • The 'list' feature on regexr, using ($1|$5) = --$3 || $6 --\n shows everything with clear differentiation between delim & non-delim
      • (?:(?:([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2)|([a-z]+):()((?:(?:.|\n|\r)(?!^[a-z]+:))+)) to make it 1/5 & 3/7
    • # $1$5\n$3$6\n - shows everything cleanly
  • Matches everything and gives me victory:
    • (?|(?:([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2)|([a-z]+):()((?:(?:.|\n|\r)(?!^[a-z]+:))+))
    • $1 = $3\n\n does a nice print of it
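
The `(?|...)` branch-reset group is not available everywhere; Python's standard `re` module, for instance, does not support it. A sketch of a workaround, assuming Python and the combined regex without branch reset from the list above: pick the key from group 1 or 5 and the value from group 3 or 6, depending on which alternative matched. `COMBINED` and `parse_target` are hypothetical names.

```python
import re

# Combined regex from the notes above (the variant without branch reset).
COMBINED = re.compile(
    r"(?:(?:([a-z]+):([A-Z]*:)((.|\r|\n)*)^:\2)"
    r"|([a-z]+):((?:(?:.|\n|\r)(?!^[a-z]+:))+))",
    re.MULTILINE,
)


def parse_target(text):
    """Return key -> raw value for delimited and non-delimited entries."""
    pairs = {}
    for m in COMBINED.finditer(text):
        if m.group(1) is not None:      # delimited alternative matched
            pairs[m.group(1)] = m.group(3)
        else:                           # plain key: value alternative
            pairs[m.group(5)] = m.group(6)
    return pairs


target = "\n".join([
    "key: value 1",
    "nightmare:DELIM:",
    "notakey:",
    "    obviously not a key",
    "notakey:",
    ":DELIM:",
    "abc: value 2",
    "new line",
    "anotherkey:: value ",
    "nostring: on this one",
    "::",
])
result = parse_target(target)
for key, value in result.items():
    print(key, "=", repr(value))
```

Values are returned raw here, so leading spaces and the newlines around delimited blocks are still present.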