Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammar railroad diagram #50

Open
mingodad opened this issue Dec 27, 2021 · 2 comments
Open

Grammar railroad diagram #50

mingodad opened this issue Dec 27, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@mingodad
Copy link

After going through the CiParser.cs and writing an LL(1) grammar with a modified https://github.com/mingodad/CocoR-CSharp I've got it to generate an EBNF understood by https://www.bottlecaps.de/rr/ui to generate railroad diagram (https://en.wikipedia.org/wiki/Syntax_diagram).

Copy the EBNF shown bellow and paste on https://www.bottlecaps.de/rr/ui in the tab Edit Grammar then switch to the tab View Diagram :

//
// EBNF generated by CocoR parser generator to be viewed with https://www.bottlecaps.de/rr/ui
//

//
// productions
//

Cito ::=  stmt* EOF
stmt ::=  ParseDoc? Public? ( ParseClass | ParseEnum | ParseNative )
ParseDoc ::=  DocComment ParseCodeDoc
ParseClass ::=  ( Static | Abstract | Sealed )? Class ParseId ( Colon ParseId )? LeftBrace ( ParseDoc? CiVisibility? ( ParseConst | ParseCallType? ( Void ParseClassMember | ParseType ( ParseBlock | ParseClassMember ) ) ) )* RightBrace
ParseEnum ::=  Enum Asterisk? ParseId LeftBrace EnumElement ( Comma EnumElement )* RightBrace
ParseNative ::=  Native LeftBrace RightBrace
ParseCodeDoc ::=  ParsePara ( Dot ParseBlock* )?
ParsePara ::=  ParseText ( CodeDelimiter ParseText )*
ParseBlock ::=  LeftBrace ParseStatement* RightBrace
ParseText ::=  ANY
ParseId ::=  ident
CiVisibility ::=  Internal | Protected | Public | Private
ParseConst ::=  Const ParseType ParseId Assign ParseConstInitializer Semicolon
ParseCallType ::=  Static | Abstract | Virtual | Override | Sealed
ParseClassMember ::=  ParseId ( ParseMethod | ( Assign ParseExpr )? Semicolon )
ParseType ::=  ParseExpr ( Range ParseExpr )?
ParseMethod ::=  ExclamationMark? LeftParenthesis ( MethodParameter ( Comma MethodParameter )* )? RightParenthesis Throws? ( Semicolon | FatArrow ParseReturnExpr | ParseBlock )
ParseExpr ::=  ParseCondOrExpr ( QuestionMark ParseExpr Colon ParseExpr )?
ParseConstInitializer ::=  LeftBrace ParseCollection? RightBrace | ParseExpr
ParseCollection ::=  ParseExpr ( Comma ParseExpr )*
MethodParameter ::=  ParseDoc? ParseVarType
ParseReturnExpr ::=  ParseExpr? Semicolon
ParseVarType ::=  ParseType ParseVarAssign
ParseVarAssign ::=  ParseId ( Assign ParseAssign )?
ParseAssign ::=  ParseType ( ( Assign | AddAssign | SubAssign | MulAssign | DivAssign | ModAssign | AndAssign | OrAssign | XorAssign | ShiftLeftAssign | ShiftRightAssign ) ParseAssign | ParseVarAssign )?
EnumElement ::=  ParseDoc? ParseId ( Assign ParseExpr )?
ParseStatement ::=  ParseBlock | ParseAssert | ParseBreak | ParseConst | ParseContinue | ParseDoWhile | ParseFor | ParseForeach | ParseIf | ParseLock | ParseNative | ParseReturn | ParseSwitch | ParseThrow | ParseWhile | ParseAssign Semicolon
ParseAssert ::=  Assert ParseExpr ( Comma ParseExpr )? Semicolon
ParseBreak ::=  Break Semicolon
ParseContinue ::=  Continue Semicolon
ParseDoWhile ::=  Do ParseLoopBody While ParseParenthesized Semicolon
ParseFor ::=  For LeftParenthesis ParseAssign? Semicolon ParseExpr? Semicolon ParseAssign? RightParenthesis ParseLoopBody
ParseForeach ::=  Foreach LeftParenthesis ( LeftParenthesis ParseForeachIterator Comma ParseForeachIterator RightParenthesis | ParseForeachIterator ) In ParseExpr RightParenthesis ParseLoopBody
ParseIf ::=  If ParseParenthesized ParseStatement ( Else ParseStatement )?
ParseLock ::=  Lock ParseParenthesized ParseStatement
ParseReturn ::=  Return ParseReturnExpr
ParseSwitch ::=  Switch ParseParenthesized LeftBrace ParseCaseStatement* ( Default Colon ParseStatement* )? RightBrace
ParseThrow ::=  Throw ParseExpr Semicolon
ParseWhile ::=  While ParseParenthesized ParseLoopBody
ParseLoopBody ::=  ParseStatement
ParseParenthesized ::=  LeftParenthesis ParseExpr RightParenthesis
ParseForeachIterator ::=  ParseType ParseId
ParseCaseStatement ::=  ParseCase ParseCase* ParseStatement*
ParseCase ::=  Case ParseExpr Colon
ParseCondOrExpr ::=  ParseCondAndExpr ( CondOr ParseCondAndExpr )*
ParseCondAndExpr ::=  ParseOrExpr ( CondAnd ParseOrExpr )*
ParseOrExpr ::=  ParseXorExpr ( Or ParseXorExpr )*
ParseXorExpr ::=  ParseAndExpr ( Xor ParseAndExpr )*
ParseAndExpr ::=  ParseEqualityExpr ( And ParseEqualityExpr )*
ParseEqualityExpr ::=  ParseRelExpr ( ( Equal | NotEqual ) ParseRelExpr )*
ParseRelExpr ::=  ParseShiftExpr ( ( Less | LessOrEqual | Greater | GreaterOrEqual ) ParseShiftExpr | Is ParsePrimaryExpr ParseId? )*
ParseShiftExpr ::=  ParseAddExpr ( ( ShiftLeft | ShiftRight ) ParseAddExpr )*
ParsePrimaryExpr ::=  ( Increment | Decrement | Minus | Tilde | ExclamationMark | New ) ParsePrimaryExpr | ( Literal | ParseInterpolatedString | LeftParenthesis ParseType RightParenthesis | ParseSymbolReference | ParseListType | ParseDictionaryType | Resource Less BYTE LeftBracket RightBracket Greater ParseParenthesized ) ( Dot ParseSymbolReference | LeftParenthesis ParseCollection? RightParenthesis | LeftBracket ParseExpr? RightBracket | ( Increment | Decrement | ExclamationMark | Hash ) )*
ParseAddExpr ::=  ParseMulExpr ( ( Plus | Minus ) ParseMulExpr )*
ParseMulExpr ::=  ParsePrimaryExpr ( ( Asterisk | Slash | Mod ) ParsePrimaryExpr )*
Literal ::=  intcon | floatcon | string | charcon | BoxedFalse | BoxedTrue | BoxedNull
ParseInterpolatedString ::=  interpolatedstring
ParseSymbolReference ::=  ParseId
ParseListType ::=  List Less ParseTypeTemplate Greater
ParseDictionaryType ::=  ( Dictionary | SortedDictionary ) Less ParseTypeTemplate Comma ParseTypeTemplate Greater
ParseTypeTemplate ::=  ParsePrimaryExpr

//
// tokens
//

Abstract ::= "abstract"
AddAssign ::= "+="
And ::= "&"
AndAssign ::= "&="
Assert ::= "assert"
Assign ::= "="
Asterisk ::= "*"
Break ::= "break"
Case ::= "case"
Class ::= "class"
CodeDelimiter ::= "`"
Colon ::= ":"
Comma ::= ","
CondAnd ::= "&&"
CondOr ::= "||"
Const ::= "const"
Continue ::= "continue"
Decrement ::= "--"
Default ::= "default"
Dictionary ::= "Dictionary"
DivAssign ::= "/="
DocComment ::= "///"
Do ::= "do"
Dot ::= "."
Else ::= "else"
Enum ::= "enum"
Equal ::= "=="
ExclamationMark ::= "!"
FatArrow ::= "=>"
Foreach ::= "foreach"
For ::= "for"
Greater ::= ">"
GreaterOrEqual ::= ">="
Hash ::= "#"
If ::= "if"
Increment ::= "++"
In ::= "in"
Internal ::= "internal"
Is ::= "is"
LeftBrace ::= "{"
LeftBracket ::= "["
LeftParenthesis ::= "("
Less ::= "<"
LessOrEqual ::= "<="
List ::= "List"
Lock ::= "lock"
Minus ::= "-"
Mod ::= "%"
ModAssign ::= "%="
MulAssign ::= "*="
Native ::= "native"
New ::= "new"
NotEqual ::= "!="
Or ::= "|"
OrAssign ::= "|="
Override ::= "override"
Plus ::= "+"
Protected ::= "protected"
Public ::= "public"
QuestionMark ::= "?"
Range ::= ".."
Resource ::= "resource"
Return ::= "return"
RightBrace ::= "}"
RightBracket ::= "]"
RightParenthesis ::= ")"
Sealed ::= "sealed"
Semicolon ::= ";"
ShiftLeft ::= "<<"
ShiftLeftAssign ::= "<<="
ShiftRight ::= ">>"
ShiftRightAssign ::= ">>="
Slash ::= "/"
SortedDictionary ::= "SortedDictionary"
Static ::= "static"
SubAssign ::= "-="
Switch ::= "switch"
Throw ::= "throw"
Throws ::= "throws"
Tilde ::= "~"
Virtual ::= "virtual"
Void ::= "void"
While ::= "while"
Xor ::= "^"
XorAssign ::= "^="
BoxedFalse ::= "false"
BoxedTrue ::= "true"
BoxedNull ::= "null"
BYTE ::= "byte"
Private ::= "private"

This grammar deviates a bit from the code flow on CiParser.cs but I think is good enough to give a global overview of Cito grammar
actually it recognizes almost all of test/*.ci:


COMPILER Cito

TERMINALS
	T_SYMBOL

CHARACTERS
	letter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_".
	oct        = '0'..'7'.
	digit     = "0123456789".
	nzdigit    = '1'..'9'.
	cr        = '\r'.
	lf        = '\n'.
	tab       = '\t'.
	stringCh  = ANY - '"' - '\\' - cr - lf.
	charCh    = ANY - '\'' - '\\' - cr - lf.
	printable = '\u0020' .. '\u007e'.
	hex       = "0123456789abcdefABCDEF".

	newLine   = cr + lf.
	notNewLine = ANY - newLine .
	ws         = " " + tab + '\u000b' + '\u000c'.

TOKENS
	ident     = letter { letter | digit }.
	floatcon = ( '.' digit {digit} [('e'|'E')  ['+'|'-'] digit {digit}]
		| digit {digit} '.' {digit} [('e'|'E')  ['+'|'-'] digit {digit}]
		| digit {digit} ('e'|'E')  ['+'|'-'] digit {digit}
		) ['f'|'l'|'F'|'L'].

	intcon   = ( nzdigit {digit}
		| '0' {oct}
		| ("0x"|"0X") hex {hex}
		) {'u'|'U'|'l'|'L'}.

	string    = '"' { stringCh | '\\' printable } '"'.
	interpolatedstring    = "$\"" { stringCh | '\\' printable } '"'.
	badString = '"' { stringCh | '\\' printable } (cr | lf).
	charcon      = '\'' ( charCh | '\\' printable { hex } ) '\''.

	Abstract = "abstract" .
	AddAssign = "+=" .
	And = "&" .
	AndAssign = "&=" .
	Assert = "assert" .
	Assign = "=" .
	Asterisk = "*" .
	Break = "break" .
	Case = "case" .
	Class = "class" .
	CodeDelimiter = "`" .
	Colon = ":" .
	Comma = "," .
	CondAnd = "&&" .
	CondOr = "||" .
	Const = "const" .
	Continue = "continue" .
	Decrement = "--" .
	Default = "default" .
	Dictionary = "Dictionary" .
	DivAssign = "/=".
	DocComment = "///" .
	Do = "do" .
	Dot = "." .
	Else = "else" .
	Enum = "enum" .
	Equal = "==" .
	ExclamationMark = "!" .
	FatArrow = "=>" .
	Foreach = "foreach" .
	For = "for" .
	Greater = ">" .
	GreaterOrEqual = ">=" .
	Hash = "#" .
	If = "if" .
	Increment = "++" .
	In = "in" .
	Internal = "internal" .
	Is = "is" .
	LeftBrace = "{" .
	LeftBracket = "[" .
	LeftParenthesis = "(" .
	Less = "<" .
	LessOrEqual = "<=" .
	List = "List" .
	Lock = "lock" .
	Minus = "-" .
	Mod = "%" .
	ModAssign = "%=" .
	MulAssign = "*=" .
	Native = "native" .
	New = "new" .
	NotEqual = "!=" .
	Or = "|" .
	OrAssign = "|=" .
	Override = "override" .
	Plus = "+" .
	Protected = "protected" .
	Public = "public" .
	QuestionMark = "?" .
	Range = ".." .
	Resource = "resource" .
	Return = "return" .
	//RightAngle = ">" .
	RightBrace = "}" .
	RightBracket = "]" .
	RightParenthesis = ")" .
	Sealed = "sealed" .
	Semicolon  = ";" .
	ShiftLeft = "<<" .
	ShiftLeftAssign = "<<=" .
	ShiftRight = ">>" .
	ShiftRightAssign = ">>=" .
	Slash = "/" .
	SortedDictionary = "SortedDictionary" .
	Static = "static" .
	SubAssign = "-=" .
	Switch = "switch" .
	Throw = "throw" .
	Throws = "throws" .
	Tilde = "~" .
	Virtual = "virtual" .
	Void = "void" .
	While = "while" .
	Xor = "^" .
	XorAssign = "^=" .

	BoxedFalse = "false" .
	BoxedTrue = "true" .
	BoxedNull = "null" .

	BYTE : ident = "byte" .
	Private : ident = "private" .

PRAGMAS
	PreIf = "#if" {notNewLine} newLine.
	PreElIf = "#elif" {notNewLine} newLine.
	PreElse = "#else" {notNewLine} newLine.
	PreEndIf = "#endif" {notNewLine} newLine.

	COMMENTS FROM "/*" TO "*/" NESTED
	COMMENTS FROM "//" TO lf

IGNORE cr + lf + tab

/*-------------------------------------------------------------------------*/

PRODUCTIONS

Cito =
	{stmt}
	EOF
	.

stmt =
	[ParseDoc]
	["public"] (
		ParseClass
		| ParseEnum
		| ParseNative
	)
	.

ParseDoc =
	DocComment ParseCodeDoc
	.

ParseCodeDoc =
	ParsePara ['.' {ParseBlock}]
	.

ParsePara =
	ParseText {CodeDelimiter ParseText}
	.

ParseClass =
	[Static | Abstract | Sealed]
		Class ParseId [Colon ParseId]
		LeftBrace
		{
			[ParseDoc]
			[CiVisibility] (
				ParseConst
				| [ParseCallType] (
					Void ParseClassMember
					| ParseType (
						ParseBlock //Constructor
						| ParseClassMember
						)
					)
			)
		}
		RightBrace
	.

ParseClassMember =
	ParseId (
		ParseMethod
		| [Assign ParseExpr] Semicolon
	)
	.

CiVisibility =
	Internal
	| Protected
	| Public
	| Private
	.

ParseConst =
	Const ParseType ParseId Assign ParseConstInitializer Semicolon
	.

ParseConstInitializer =
	LeftBrace [ParseCollection] RightBrace
	| ParseExpr
	.

ParseType =
	ParseExpr [Range ParseExpr]
	.

ParseCallType =
	Static
	| Abstract
	| Virtual
	| Override
	| Sealed
	.

ParseMethod =
	[ExclamationMark]
		LeftParenthesis
			[MethodParameter { Comma MethodParameter }]
		RightParenthesis [Throws]
		(
			Semicolon
			| FatArrow ParseReturnExpr
			| ParseBlock
		)
	.

MethodParameter =
	[ParseDoc] ParseVarType
	.

ParseVarType =
	ParseType ParseVarAssign
	.

ParseVarAssign =
	ParseId [Assign ParseAssign]
	.

ParseAssign =
	ParseType [
		(
			Assign
			| AddAssign
			| SubAssign
			| MulAssign
			| DivAssign
			| ModAssign
			| AndAssign
			| OrAssign
			| XorAssign
			| ShiftLeftAssign
			| ShiftRightAssign
		) ParseAssign
		| ParseVarAssign
	]
	.

ParseEnum =
	Enum [Asterisk] ParseId LeftBrace EnumElement {',' EnumElement} RightBrace
	.

EnumElement =
	[ParseDoc] ParseId [Assign ParseExpr]
	.

ParseBlock =
	LeftBrace {ParseStatement} RightBrace
	.

ParseStatement =
	ParseBlock
	| ParseAssert
	| ParseBreak
	| ParseConst
	| ParseContinue
	| ParseDoWhile
	| ParseFor
	| ParseForeach
	| ParseIf
	| ParseLock
	| ParseNative
	| ParseReturn
	| ParseSwitch
	| ParseThrow
	| ParseWhile
	| ParseAssign Semicolon
	.

ParseAssert =
	Assert ParseExpr [Comma ParseExpr] Semicolon
	.

ParseBreak =
	Break Semicolon
	.

ParseContinue =
	Continue Semicolon
	.

ParseLoopBody =
	ParseStatement
	.

ParseDoWhile =
	Do ParseLoopBody While ParseParenthesized Semicolon
	.

ParseFor =
	For LeftParenthesis
			[ParseAssign] Semicolon [ParseExpr] Semicolon [ParseAssign]
		RightParenthesis ParseLoopBody
	.

ParseForeachIterator =
	ParseType ParseId
	.

ParseForeach =
	Foreach LeftParenthesis (
		LeftParenthesis ParseForeachIterator Comma ParseForeachIterator RightParenthesis
		| ParseForeachIterator
		) In ParseExpr RightParenthesis ParseLoopBody
	.

ParseIf =
	If ParseParenthesized ParseStatement [Else ParseStatement]
	.

ParseLock =
	Lock ParseParenthesized ParseStatement
	.

ParseNative =
	Native LeftBrace (. SkipNested(_LeftBrace, _RightBrace); .) RightBrace
	.

ParseReturn =
	Return ParseReturnExpr
	.

ParseReturnExpr =
	[ParseExpr] Semicolon
	.

ParseSwitch =
	Switch ParseParenthesized LeftBrace {ParseCaseStatement} [Default Colon {ParseStatement}] RightBrace
	.

ParseCaseStatement =
	ParseCase {ParseCase} {ParseStatement}
	.

ParseCase =
	Case ParseExpr Colon
	.

ParseThrow =
	Throw ParseExpr Semicolon
	.

ParseWhile =
	While ParseParenthesized ParseLoopBody
	.

ParseParenthesized =
	LeftParenthesis ParseExpr RightParenthesis
	.

ParseExpr =
	ParseCondOrExpr [QuestionMark ParseExpr Colon ParseExpr]
	.

ParseCondOrExpr =
	ParseCondAndExpr {CondOr ParseCondAndExpr }
	.

ParseCondAndExpr =
	ParseOrExpr {CondAnd ParseOrExpr}
	.

ParseOrExpr =
	ParseXorExpr {Or ParseXorExpr}
	.

ParseXorExpr =
	ParseAndExpr {Xor ParseAndExpr}
	.

ParseAndExpr =
	ParseEqualityExpr {And ParseEqualityExpr}
	.

ParseEqualityExpr =
	ParseRelExpr {(Equal | NotEqual) ParseRelExpr}
	.

ParseRelExpr =
	ParseShiftExpr {
		(Less | LessOrEqual | Greater | GreaterOrEqual) ParseShiftExpr
		| Is ParsePrimaryExpr [ParseId]
		}
	.

ParseShiftExpr =
	ParseAddExpr {(ShiftLeft | ShiftRight) ParseAddExpr}
	.

ParseAddExpr =
	ParseMulExpr {(Plus | Minus) ParseMulExpr}
	.

ParseMulExpr =
	ParsePrimaryExpr {(Asterisk | Slash | Mod) ParsePrimaryExpr}
	.

ParsePrimaryExpr =
	(Increment | Decrement | Minus | Tilde | ExclamationMark | New) ParsePrimaryExpr
	| (
		Literal
		| ParseInterpolatedString
		| LeftParenthesis ParseType RightParenthesis
		| ParseSymbolReference
		| ParseListType
		| ParseDictionaryType
		| Resource Less BYTE LeftBracket RightBracket Greater ParseParenthesized
	) {
		Dot ParseSymbolReference
		| LeftParenthesis [ParseCollection] RightParenthesis
		| LeftBracket [ParseExpr] RightBracket
		| (Increment | Decrement | ExclamationMark | Hash)
	}
	.

/*
CheckXcrementParent =
	(Increment | Decrement) ParsePrimaryExpr
	.
*/
ParseInterpolatedString =
	interpolatedstring //InterpolatedString ParseExpr [Comma ParseExpr] [Colon Number] "\"$"//RightBrace
	.

ParseDictionaryType =
	(Dictionary | SortedDictionary) Less ParseTypeTemplate Comma ParseTypeTemplate Greater
	.

ParseListType =
	List Less ParseTypeTemplate Greater
	.

ParseTypeTemplate =
	ParsePrimaryExpr
	.

ParseSymbolReference =
	ParseId
	.

ParseId =
	ident
	.

ParseCollection =
	ParseExpr {Comma ParseExpr}
	.

ParseText =
	ANY (. SkipTillEOL(); .)
	.

Literal =
	intcon
	| floatcon
	| string
	| charcon
	| BoxedFalse
	| BoxedTrue
	| BoxedNull
	.

END Cito.
@andr1972
Copy link

Fundamental question: is Ć grammar LL(1) or LL(k) or LL(*) ? Which tools is used to compile second form of this grammar?

@pfusik
Copy link
Collaborator

pfusik commented Mar 13, 2022

is Ć grammar LL(1) or LL(k) or LL(*)

It's not LL(k), as can be seen in these examples:

C[2][3][n * 2] a;   // three-dimensional array storage local variable

a[2][3][i * 2] = 5; // array element assignment

The number of array dimensions isn't limited by the language. Also, the expressions in brackets can be arbitrarily long.
If the brackets are followed by an identifier, it's a definition. Otherwise, it's an expression. I think that makes it LL(*).

CiParser.cs is a recursive-descent LL(1) parser. It handles the above case by outputting a temporary form for the variable definitions: the type is specified as an expression that is translated into proper type in CiResolver.ToType. That means that CiParser doesn't catch all the syntax errors, for example:

a[] = 5; // syntax error

These are checked in CiResolver.cs.

@pfusik pfusik added the enhancement New feature or request label Aug 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants