Yeah, though that is likely not a problem for you - just something to be wary of.
Btw, I also use a greedy tokenizer, and it does not require spaces between tokens, with one exception: concatenations of "+" and "-" (e.g. "+-" or "---") fail at tokenization time, so such runs must be separated by spaces. To name an atrocious but unambiguous, well-defined example: "--x -= --x - - - - x--". When parsing, due to the LL(1) nature, whenever I encounter a token that was not expected, I "degrade" it (if allowed) and provide the token the parser asks for. Ex: ">>=" -> ">" ">" "=", ">>" -> ">" ">".
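A minimal Python sketch of that tokenizer plus degradation, just to make it concrete. The operator table is a small hypothetical subset, and `tokenize`, `expect` and `LexError` are names made up for this example: the lexer always takes the longest match and rejects a pure "+"/"-" token glued to another "+" or "-", while the parser-side `expect` splits ">>" or ">>=" in place when it asks for ">".

```python
# Hypothetical subset of the operator table; longest entries first, so the
# first hit is also the longest match.
OPERATORS = sorted(
    ["+", "-", "++", "--", "+=", "-=", "=", "==", "<", ">", "<<", ">>",
     "<=", ">=", ">>=", "&", "&&", "(", ")", ","],
    key=len, reverse=True)
DEGRADABLE = {">>": [">", ">"], ">>=": [">", ">", "="]}
PLUS_MINUS = set("+-")


class LexError(Exception):
    pass


def tokenize(src):
    """Greedy longest-match tokenizer; spaces are only needed to split runs of +/-."""
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        if src[i].isalnum() or src[i] == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
            continue
        op = next((o for o in OPERATORS if src.startswith(o, i)), None)
        if op is None:
            raise LexError(f"unexpected character {src[i]!r} at {i}")
        end = i + len(op)
        # A pure "+"/"-" token glued to another "+" or "-" ("+-", "---") is rejected.
        if set(op) <= PLUS_MINUS and end < len(src) and src[end] in PLUS_MINUS:
            raise LexError(f"nonsensical run of '+'/'-' at {i}")
        tokens.append(op)
        i = end
    return tokens


def expect(tokens, pos, wanted):
    """Consume `wanted` at `pos`; if the parser asks for ">" but sees ">>" or
    ">>=", degrade that token in place and consume its first part."""
    tok = tokens[pos] if pos < len(tokens) else None
    if tok == wanted:
        return pos + 1
    if tok in DEGRADABLE and DEGRADABLE[tok][0] == wanted:
        tokens[pos:pos + 1] = DEGRADABLE[tok]
        return pos + 1
    raise SyntaxError(f"expected {wanted!r}, got {tok!r}")
```

For "Foo<Bar<Int>>" the lexer emits a single ">>", and the type parser's two `expect(..., ">")` calls split it on demand, so the tokenizer never needs to know about generics.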
On my end: I did not have time to do anything yesterday, but today was quite fruitful.
I decided that I need to write down the pseudocode for the lexer (the part pertaining to the "off-side rule") to ensure sanity in my production rules.
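For reference, a common way to implement the off-side part is an indentation stack, as in Python's own tokenizer. This is a sketch under assumptions, not the lexer described here: "OPEN", "CLOSE", "NL" and "EOF" stand in for the grammar's "\OPEN", "\CLOSE", \n and "\EOF", blank lines are ignored, only spaces count as indentation, and `offside`/`IndentError` are hypothetical names.

```python
class IndentError(Exception):
    pass


def offside(lines):
    """Turn indentation into explicit block tokens using an indent stack.
    Emits ("LINE", text) per non-blank line, "NL" between items on the same
    level, "OPEN" when indentation grows, and one "CLOSE" per level left."""
    out, stack = [], [0]
    for raw in lines:
        text = raw.rstrip()
        if not text.strip():
            continue  # blank lines do not affect the layout
        indent = len(text) - len(text.lstrip(" "))
        if indent > stack[-1]:
            stack.append(indent)
            out.append("OPEN")
        else:
            while indent < stack[-1]:
                stack.pop()
                out.append("CLOSE")
            if indent != stack[-1]:
                raise IndentError(f"dedent to unknown level {indent}: {text!r}")
            if out:
                out.append("NL")  # separator between items at the same level
        out.append(("LINE", text.strip()))
    while len(stack) > 1:  # close every block still open at end of input
        stack.pop()
        out.append("CLOSE")
    out.append("EOF")
    return out
```

With this layout, a function header followed by an indented body yields `LINE OPEN LINE NL LINE CLOSE`, which lines up with 'statementBlock' : "\OPEN" 'statements' \n "\CLOSE", and the NL after a CLOSE separates consecutive 'def's as in 'source'.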
Also, the production rules have seen quite a lot of changes and fixes and are almost done. The follow-up (writing down what ends up happening and how, as well as what I discarded and why) is progressing nicely too.
Notation:
'??' - nonterminal symbol
<??> - terminal symbol representing a group (à la numbers, names, etc.)
"??" - terminal symbol
() - mandatory choice list
[] - optional choice list
| - option separator
.. - optional repetition of preceding element without separators
?? - optional repetition of preceding element with ?? as separator
note: degradable tokens: ">>", ">>=" - nothing else is degradable, and the longest match for a token is taken.
note: nonsensical strings of "+" and "-" (à la "---", "+-7") generate lexer errors and do not produce tokens.
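In a recursive-descent parser the two repetition forms map onto small loops. A sketch, reading both as "one element, optionally repeated" and assuming tokens are plain strings; `sep_list`, `repeat`, `name` and `store_suffix` are hypothetical helpers. `X ,` keeps going while a "," follows, and `X ..` needs an LL(1) check on the next token to decide whether another X starts.

```python
def sep_list(parse_item, sep):
    """`X sep`: one X, then another X after every `sep` (e.g. 'varElem' ,)."""
    def parse(tokens, pos):
        item, pos = parse_item(tokens, pos)
        items = [item]
        while pos < len(tokens) and tokens[pos] == sep:
            item, pos = parse_item(tokens, pos + 1)
            items.append(item)
        return items, pos
    return parse


def repeat(parse_item, can_start):
    """`X ..`: one X, then more X back to back while `can_start(next_token)` holds."""
    def parse(tokens, pos):
        item, pos = parse_item(tokens, pos)
        items = [item]
        while pos < len(tokens) and can_start(tokens[pos]):
            item, pos = parse_item(tokens, pos)
            items.append(item)
        return items, pos
    return parse


def name(tokens, pos):
    """Hypothetical leaf parser for <name>."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected a name at {pos}")


def store_suffix(tokens, pos):
    """Hypothetical leaf parser for the "&" / "*" alternatives of 'typeStore'."""
    if pos < len(tokens) and tokens[pos] in ("&", "*"):
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected '&' or '*' at {pos}")
```

A trailing separator ("a, )") fails inside `parse_item`, which is the behaviour the `X ,` notation implies.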
================================================================================ various bits and pieces
'typeVar' : ["read"] ( "var" | 'type' ) // note: "var" is valid even in 'defContext'! (especially good for: "var {a, b} = ...")
'type' : 'typeBase' ['typeStore']
'typeStore' : ( "&" | "*" | "[" ['expr'] "]" ) .. // note: degrades "&&"
'typeBase' : [<Type> . "."] <Type> ["<" 'typeSub' , ">"] // note: degrades ">>", ">>="
'typeSub' : ('typeBase' | "_") // trailing arguments can be omitted, ex: "Foo<T1 = A, T2, T3 = B>" -> "Foo<_, C>"
: 'typeAlias' ["=" 'typeBase'] // only valid in 'defContext'. note: storage is not part of type. degrades ">>", ">>="
'typeAlias' : "<" <Type> ">"
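My reading of the 'typeSub' example, as a sketch: "_" picks the parameter's default and omitted trailing arguments behave like "_", so "Foo<_, C>" against "Foo<T1 = A, T2, T3 = B>" resolves to T1 = A, T2 = C, T3 = B. Treating "_" on a parameter without a default as an error is an assumption, and `resolve_type_args` is a hypothetical name.

```python
def resolve_type_args(params, args):
    """params: [(name, default_or_None)], args: type names or "_" (use default)."""
    if len(args) > len(params):
        raise TypeError(f"expected at most {len(params)} type arguments, got {len(args)}")
    resolved = {}
    for i, (pname, default) in enumerate(params):
        arg = args[i] if i < len(args) else "_"  # omitted trailing argument acts like "_"
        if arg == "_":
            if default is None:
                # assumption: without a default the argument must be spelled out
                raise TypeError(f"type parameter {pname} has no default")
            arg = default
        resolved[pname] = arg
    return resolved


FOO = [("T1", "A"), ("T2", None), ("T3", "B")]  # Foo<T1 = A, T2, T3 = B>
```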
--------------------------------------------------------------------------------
'varName' : ["#"] <name> [<tag>] // "#" and <tag> are only valid in 'defStatic' and 'defContext'
'varBlock' : ( "\" 'varList' ; | "\OPEN" ('varList' ; ) \n "\CLOSE" )
'varList' : 'typeVar' 'varElem' ,
'varElem' : 'varName' ['varConstruct' | "=" 'expr']
: 'deconstruct' ["=" 'expr']
'varConstruct' : 'construct'
: ["." <name>] "(" 'expr' , ")"
--------------------------------------------------------------------------------
'fnParams' : ( ['typeVar' | 'typeAlias'] 'varElem' ) ,
'fnResult' : "->" 'typeVar' ["=" 'expr']
--------------------------------------------------------------------------------
'constructObj' : "new" 'typeBase' 'construct'
: "new" 'typeBase' ["." <name>] [ "(" 'expr' , ")"]
: 'typeBase' 'construct'
'deconstruct' : "{" (['typeVar'] 'varName') , "}" // ex: "var {a, b} = {1, 3.14}" "Int a, {Float f} = {3.14}"
'construct' : "{" ([ ['typeVar'] "=" ] 'expr') , "}" // ex: "return Foo{Int& = a, b}"
'extent' : ['expr'] (".." | "<=" | ">=" | "<" | ">") ['expr']
--------------------------------------------------------------------------------
'alias' : ("!" | "as" <Type>) // ex: "alias Foo.Bar.Snafu!" == "alias Foo.Bar.Snafu as Snafu"
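The "!" shorthand of 'alias' is just sugar for "as" plus the last path segment. A string-level sketch (hypothetical `parse_alias`; a real parser would work on tokens and would also have to skip the optional <unit> in 'defAlias'):

```python
def parse_alias(text):
    """Split "Foo.Bar.Snafu!" or "Foo.Bar.Snafu as Name" into (path, alias)."""
    text = text.strip()
    if text.endswith("!"):
        path = text[:-1].strip()
        return path, path.rsplit(".", 1)[-1]  # "!" -> alias is the last segment
    path, sep, alias = text.partition(" as ")
    if not sep or not alias.strip():
        raise SyntaxError(f"expected '!' or 'as <Type>' in {text!r}")
    return path.strip(), alias.strip()
```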
================================================================================ SOURCE
'source' : ['def'] \n "\EOF"
--------------------------------------------------------------------------------
'def' : 'defFunction'
: 'defContext'
: 'defAlias'
: 'defStatic'
: 'defInterface'
: 'defOperator'
: 'defUnwrap'
--------------------------------------------------------------------------------
'defContext' : ("context" | "namespace") 'typeBase' ['alias']
: ("class" | "struct") ['typeBase' "->"] 'typeBase' [<unit>] ['alias'] [<tag>] ['varBlock']
'defAlias' : "alias" ('typeBase' [<unit>] 'alias') ,
'defStatic' : ["const"] 'varList'
'defFunction' : ["final"] ("function" | "override" | "abstract") ['typeBase' "."] <name> ["(" 'fnParams' ")"] [<tag>] ['fnResult'] 'codeBlockTrail'
'defInterface' : "interface" ['typeBase' "."] <name> "->" (["#"] <name> ["(" 'typeBase' , ")"]) ,
'defOperator' : ["final"] "operator" <name> ['typeBase' "."] ("+" | "-" | "*" | "/" | "%" | "&" | "|" | "^" | "<<" | ">>" | "==" | "!=" | ">=" | "<=" | ">" | "<" | "&&" | "||" | "^^")
['type' | 'typeAlias'] <name> ['fnResult'] 'codeBlockTrail'
: ["final"] "operator" ['typeBase' "."] ("++" | "--" | "!" | "~" | "-" | "+") <name> ['fnResult'] 'codeBlockTrail'
: ["final"] "operator" "[" (['type'] <name>) , "]" ['fnResult'] 'codeBlockTrail'
: ["final"] "operator" <name> ['typeBase' "."] ("++" | "--") 'codeBlockTrail' // clones source object before calling this (note: no return value)
: "operator" ['typeBase' "."] "new" ["." <name>] ["(" 'fnParams' ")"] ["clone"] ['fnResult'] 'codeBlockTrail' // "clone" -> define clone operator using this constructor (one parameter, no conversions - good direct/unwrap match)
: "operator" ['typeBase' "."] "bye" ["." <name>] ["(" 'fnParams' ")"] 'codeBlockTrail'
: "operator" ['typeBase' "."] "clone" [ <name> 'codeBlockTrail' ] // this = new_nonbuilt_obj, name = source_obj. missing-code: "operator clone obj \ this(obj)" -> use copy-constructor
'defUnwrap' : "unwrap" ['typeBase'] "->" ['fnResult'] 'codeBlockTrail'
: "unwrap" "return" 'expr'
: "unwrap" <name>
================================================================================ code
'statements' : ('statementC' | 'statementV') ;
'statementsV' : 'statementV' ; // variable declarations and expressions only
'statementBlock': "\OPEN" 'statements' \n "\CLOSE"
'codeBlock' : ('statementBlock' | 'statements')
'codeBlockTrail': ['statementBlock' | "\" 'statements']
'codeBlockCase' : ['statements'] ['statementBlock'] // combine - 'statements' begins where 'statementBlock' indent is
--------------------------------------------------------------------------------
'statementC' : 'jumpContinue' // note: all begin with a terminal
: 'jumpBreak'
: 'jumpReturn'
: 'jumpGoto'
: 'flowIf'
: 'flowWhile'
: 'flowDo'
: 'flowForeach'
: 'flowSwitch'
: 'flowScope' // not allowing 'statementBlock' directly
: 'flowDefer'
: 'label'
: 'assume'
'statementV' : 'varList' // both might begin with 'typeBase', but will diverge immediately afterwards
: 'expr' // ...
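How 'statementV' could pick between 'varList' and 'expr' with bounded lookahead, as a sketch. It assumes, hypothetically, that <Type> tokens start with an uppercase letter and <name> tokens with a lowercase one, skips only dotted type names and the "&", "&&", "*" storage suffixes, and ignores template arguments and "[...]". A "{" after the type is reported as "ambiguous", since per the rules above it can start either a 'deconstruct' ("Int {a, b} = ...") or a 'construct' ("Foo{1, 2}").

```python
KEYWORDS = {"read", "var"}
STORAGE = {"&", "&&", "*"}  # 'typeStore' suffixes handled here ("[...]" omitted)


def is_type(tok):
    return tok[:1].isupper()  # assumption: <Type> is capitalized


def is_name(tok):
    return tok[:1].islower() and tok not in KEYWORDS  # assumption: <name> is lowercase


def classify_statement(tokens):
    """Return "varList", "expr" or "ambiguous" for the start of a 'statementV'."""
    if not tokens:
        raise SyntaxError("empty statement")
    if tokens[0] in KEYWORDS:
        return "varList"  # "read" / "var" can only start a 'typeVar'
    if not is_type(tokens[0]):
        return "expr"  # no leading type, e.g. "a += 1"
    pos = 1
    # skip the rest of a dotted type such as "Foo.Bar" (template arguments omitted)
    while pos + 1 < len(tokens) and tokens[pos] == "." and is_type(tokens[pos + 1]):
        pos += 2
    while pos < len(tokens) and tokens[pos] in STORAGE:
        pos += 1
    nxt = tokens[pos] if pos < len(tokens) else None
    if nxt is not None and is_name(nxt):
        return "varList"  # "Type name ...", e.g. "Int a = 1"
    if nxt == "{":
        return "ambiguous"  # 'deconstruct' or 'construct'
    return "expr"  # e.g. "Foo.bar(...)"
```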
--------------------------------------------------------------------------------
'jumpContinue' : "continue" ['exprTern']
'jumpBreak' : "break" [<name> | <value>]
'jumpReturn' : "return" ['expr']
'jumpGoto' : "goto" <name>
'flowIf' : "if" 'statementsV' 'codeBlockTrail' ["else" 'codeBlock']
'flowWhile' : "while" ['statementsV'] 'codeBlockTrail'
'flowDo' : "do" 'codeBlock' "while" 'statementsV'
'flowForeach' : "foreach" 'statementsV' "in" 'expr' // e.g. "foreach var a = 10; a.foo() - 42; _ in ..."
'flowSwitch' : "switch" 'statementsV' "\n" 'swCase' \n
'flowScope' : "scope" 'codeBlock'
'flowDefer' : "defer" ["?"] 'codeBlock'
'label' : "@" <name> ['statement']
'assume' : "assume" 'exprBool'
--------------------------------------------------------------------------------
'swCase' : "case" ["@" <name>] 'swOption' , ":" 'codeBlockCase'
'swOption' : ('extent' | 'expr') // Note: 'extent' begins with 'expr' (i.e. first build 'expr', then 'extent' if possible)
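A sketch of that "expr first, then extent" order, using a hypothetical stand-in for 'expr' (a single integer literal) and hypothetical names `parse_number`/`parse_sw_option`. The stand-in deliberately excludes comparisons: with the full 'exprCmp' rule, a leading value followed by "<" would be taken as a comparison before the extent check ever sees it.

```python
EXTENT_OPS = ("..", "<=", ">=", "<", ">")


def parse_number(tokens, pos):
    """Hypothetical stand-in for 'expr': one integer literal, or nothing."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    return None, pos


def parse_sw_option(tokens, pos):
    """'swOption': parse the (optional) leading value first, then upgrade to an
    'extent' if an extent operator follows; both extent ends are optional."""
    low, pos = parse_number(tokens, pos)
    if pos < len(tokens) and tokens[pos] in EXTENT_OPS:
        op = tokens[pos]
        high, pos = parse_number(tokens, pos + 1)
        return ("extent", low, op, high), pos
    if low is None:
        raise SyntaxError(f"expected a case value at {pos}")
    return ("value", low), pos
```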
================================================================================ expression: precedence rules
'expr' : 'deconstruct' "=" 'exprAssign'
: 'exprAssign'
'exprAssign' : 'exprTern' [("+=" | "-=" | "*=" | "/=" | "%=" | "&=" | "|=" | "^=" | "<<=" | ">>=" | "=" | ":=") 'exprAssign'] // note: ":=" makes anything besides just <name> before it invalid
'exprTern' : 'exprBool' ["?" 'expr' ":" 'expr']
'exprBool' : 'exprIs' [("&&" | "||" | "^^") 'exprBool']
'exprIs' : 'exprCmp' [("is" | "instanceof") 'typeBase'] // "is" -> is-a (public interfaces), "instanceof" -> exact type match
'exprCmp' : 'exprAdd' [("==" | "!=" | ">=" | "<=" | ">" | "<") 'exprCmp']
'exprAdd' : 'exprMul' [("-" | "+") 'exprAdd']
'exprMul' : 'exprBin' [("*" | "/" | "%") 'exprMul']
'exprBin' : 'exprMov' [("&" | "|" | "^") 'exprBin']
'exprMov' : 'exprUn' [("<<" | ">>") 'exprMov']
'exprUn' : ("!" | "~" | "-" | "+") 'value'
: "&" 'value' // note: there can be only one (cannot degrade "&&" and "& &foo" is also wrong)! "&" means that value should maintain its indirection ("new" does that too), ex: "Int&& foo, bar = &foo".
: ("++" | "--") 'value'
: 'value' ("++" | "--")
: 'value'
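One thing to watch when turning these rules into code: read literally, right-recursive rules like 'exprAdd' : 'exprMul' [("-" | "+") 'exprAdd'] group "10 - 4 - 3" as 10 - (4 - 3). The sketch below is an integer-evaluating subset of 'exprAdd' down to 'exprUn' (hypothetical names). It keeps the precedence order as written, so "*" binds looser than "&" and "<<" binds tighter than both, but parses each level with a loop, which gives left associativity. "/" uses Python's floor division, which rounds negative results differently from C.

```python
import operator

# 'exprAdd' .. 'exprMov', loosest first, in the order the rules above nest them.
LEVELS = [
    {"+": operator.add, "-": operator.sub},                          # 'exprAdd'
    {"*": operator.mul, "/": operator.floordiv, "%": operator.mod},  # 'exprMul'
    {"&": operator.and_, "|": operator.or_, "^": operator.xor},      # 'exprBin'
    {"<<": operator.lshift, ">>": operator.rshift},                  # 'exprMov'
]
PREFIX = {"-": operator.neg, "+": operator.pos, "~": operator.invert}


def parse_level(tokens, pos=0, level=0):
    """Parse and evaluate one precedence level; returns (value, next_pos)."""
    if level == len(LEVELS):
        return parse_unary(tokens, pos)
    ops = LEVELS[level]
    left, pos = parse_level(tokens, pos, level + 1)
    # A loop instead of right recursion: "10 - 4 - 3" is (10 - 4) - 3.
    while pos < len(tokens) and tokens[pos] in ops:
        apply = ops[tokens[pos]]
        right, pos = parse_level(tokens, pos + 1, level + 1)
        left = apply(left, right)
    return left, pos


def parse_unary(tokens, pos):
    """'exprUn' subset: at most one prefix operator, then a 'value'."""
    apply = PREFIX.get(tokens[pos])
    if apply is not None:
        value, pos = parse_value(tokens, pos + 1)
        return apply(value), pos
    return parse_value(tokens, pos)


def parse_value(tokens, pos):
    """'value' subset: an integer literal or a parenthesized expression."""
    if tokens[pos] == "(":
        value, pos = parse_level(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError(f"expected ')' at {pos}")
        return value, pos + 1
    return int(tokens[pos]), pos + 1
```

Right recursion is still the natural fit for 'exprAssign', where "a = b = c" should group to the right.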
================================================================================ value
'value' : <const>
: "(" 'expr' ")" // note: the parenthesized expression is the value - if an expression were also a value, a syntax error would cause infinite recursion ... that would be bad
: "null" ["(" 'type' ")"]
: 'lambda'
: "_" // anonymous/internal variable
: 'construct' // anonymous structure: "{1, 2}[0] == 1" -> because properties have same type and hence the structure can be treated as an array
: "[" 'extent' "]"
: 'constructObj' // both might begin with 'typeBase', but will diverge immediately afterwards
: 'lookup' // ...
--------------------------------------------------------------------------------
'lambda' : "$" ["&"] [<name> ,] ["(" 'fnParams' ")"] ['fnResult'] 'codeBlockTrail' // type: Handle (exact type is tracked by compiler)
-------------------------------------------------------------------------------- lots of cutting corners here
'lookup' : ['typeBase' | "this"] ["." "super"] . 'lookupBody' // 'typeBase' only valid on initial lookup, ex: "Foo.handlerVar.this.super.getter(7)" calls handlerVar(7) with this = this.super.getter
'lookupBody' : 'varConstruct' // call dependent constructors (will be done automatically first if not done explicitly)
: "." ('fnOperator' | <name>) "(" ['typeBase' ,] ")" // ex: "&Foo.bar.dooh(Int, Int)" - not allowed to omit the signature!
: "." 'fnOperator' "(" 'expr' , ")" ['fnInvokers'] ["." 'lookup'] ??? extra bloody dots everywhere
: ["."] <name> ['fnInvokers'] ['lookupBody'] // "." can only be omitted if there is nothing before it; 'varConstruct' is not applicable anymore.
... eh, redo this crap
'fnInvokers' : ( "(" 'expr' , ")" | "[" 'expr' , "]" ) ..
'fnOperator' : ("+" | "-" | "*" | "/" | "%" | "&" | "|" | "^" | "<<" | ">>"
| "==" | "!=" | ">=" | "<=" | ">" | "<" | "&&" | "||"
| "^^" | "++" | "--" | "!" | "~" | "-" | "+" | "[" "]"
| "\" ("++" | "--")) // post- increment/decrement
... still some crap to straighten.
That would be nice ... but I cannot think of a way to do it :/
Nothing clean, at least - and ugly hacks might lead us back to C++ nightmares.