Define formal grammar for datacard #818

Open
nsmith- opened this issue Feb 9, 2023 · 0 comments

nsmith- (Collaborator) commented Feb 9, 2023

I propose defining a formal grammar that could then be used to parse datacards into in-memory structures.
I'm somewhat familiar with parsing expression grammars (PEGs; see references below), but there are many other formalisms to choose from.

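Some references:
- Bryan Ford, "Parsing Expression Grammars: A Recognition-Based Syntactic Foundation" (POPL 2004): https://bford.info/pub/lang/peg.pdf
- PEP 617, "New PEG parser for CPython": https://peps.python.org/pep-0617/#overview
- cpp-peglib, which correctionlib uses for its formulas: https://github.com/yhirose/cpp-peglib
- Lark, a parsing toolkit for Python: https://lark-parser.readthedocs.io/en/latest/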
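For readers less familiar with PEGs, the semantics that matter most here (as described in the Ford paper) are that ordered choice `/` commits to the first alternative that matches, repetition is greedy and never gives input back, and `!e` is a lookahead that consumes nothing. The following is a minimal sketch of those operators as Python combinators. It is not code from any of the libraries above, the helper names (`lit`, `charset`, `seq`, `choice`, `many`, `opt`, `not_`) are made up for illustration, and it is only a recognizer: it reports how far a rule matches and builds no tree. It encodes the `Float` and `SystEffect` rules from the grammar in this issue to show why the order of alternatives matters.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the position just after its match,
# or None if it fails. The combinator names here are illustrative only.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """'abc': match a literal string."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def charset(chars: str) -> Parser:
    """[...]: match one character from a set."""
    return lambda text, pos: pos + 1 if pos < len(text) and text[pos] in chars else None


def any_char(text: str, pos: int) -> Optional[int]:
    """'.': match any single character."""
    return pos + 1 if pos < len(text) else None


def seq(*parsers: Parser) -> Parser:
    """e1 e2 ...: every sub-expression must match, one after another."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            nxt = p(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """e1 / e2 / ...: ordered choice; the first alternative that matches wins,
    and later alternatives are not tried even if they would match more."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            nxt = p(text, pos)
            if nxt is not None:
                return nxt
        return None
    return parse


def many(p: Parser, minimum: int = 0) -> Parser:
    """e* (minimum=0) or e+ (minimum=1): greedy repetition, never gives back input."""
    def parse(text: str, pos: int) -> Optional[int]:
        count = 0
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or on an empty match
                break
            pos, count = nxt, count + 1
        return pos if count >= minimum else None
    return parse


def opt(p: Parser) -> Parser:
    """e?: succeed without consuming input if e fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        nxt = p(text, pos)
        return pos if nxt is None else nxt
    return parse


def not_(p: Parser) -> Parser:
    """!e: negative lookahead; succeeds, consuming nothing, iff e fails."""
    return lambda text, pos: pos if p(text, pos) is None else None


def full_match(p: Parser, text: str) -> bool:
    return p(text, 0) == len(text)


digit = charset("0123456789")
space = charset(" \t")
eol = choice(lit("\r\n"), lit("\n"), lit("\r"))
# Float <- '-'? [0-9]+ ('.' [0-9]*)?
float_ = seq(opt(lit("-")), many(digit, 1), opt(seq(lit("."), many(digit))))
# SystEffect <- (Float '/' Float) / Float / '-'
syst_effect = choice(seq(float_, lit("/"), float_), float_, lit("-"))
# Same alternatives, wrong order: Float alone succeeds on "0.9/1.1" after
# consuming only "0.9", so the asymmetric alternative is never reached.
syst_effect_swapped = choice(float_, seq(float_, lit("/"), float_), lit("-"))
# A whitespace-delimited token: (!Space !EndOfLine .)+
token = many(seq(not_(space), not_(eol), any_char), 1)

print(full_match(syst_effect, "0.9/1.1"))          # True
print(full_match(syst_effect_swapped, "0.9/1.1"))  # False
print(token("lumi lnN", 0))                        # 4
```

Because ordered choice does not go back to a later alternative once an earlier one has succeeded, a longer form such as `Float '/' Float` generally has to be listed before any alternative that matches its prefix.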

A work-in-progress grammar, written in the cpp-peglib dialect, is shown below (it could probably be expressed much better):

```
# Top-level layout: one rule per datacard section
EXPRESSION  <- CountChecks ShapeSources? Observation Expectation Systematics
CountChecks <- CountCheck{3} SectionSeparator
ShapeSources <- ShapeSource+ SectionSeparator
Observation <- BinList 'observation' (Space+ Float)* RestOfLine SectionSeparator
Expectation <- BinList ProcessList ProcessIndexList RateList SectionSeparator
Systematics <- Systematic* SectionSeparator RateParam*

# Line-level rules
CountCheck  <- CountType Space+ (Integer / '*') (!EndOfLine .)* EndOfLine

ShapeSource <- 'shapes' Space+ ProcessMatch Space+ ChannelMatch Space+ FileName Space+ HistName (Space+ HistName)? RestOfLine

BinList <- 'bin' Space+ (ChannelName (Space+ ChannelName)*) RestOfLine
ProcessList <- 'process' (Space+ ProcessName)* RestOfLine
ProcessIndexList <- 'process' (Space+ Integer)* RestOfLine
RateList <- 'rate' (Space+ Float)* RestOfLine

Systematic <- SystName Space+ SystType (Space+ SystEffect)* RestOfLine

RateParam <- SystName Space+ 'rateParam' Space+ ChannelMatch Space+ ProcessMatch Space+ (Param / Formula) RestOfLine

# Lexical rules: '~' drops a rule from the AST, '< >' captures the matched text as a token
~EndOfLine   <- '\r\n' / '\n' / '\r'
~RestOfLine <- (Space* (EndOfLine / Comment))*
~Space  <- ' ' / '\t'
Comment     <- '#' (!EndOfLine .)* EndOfLine
CountType   <- < [ijk] 'max' >
Integer     <- < '-'? [0-9]+ >
Float       <- < '-'? [0-9]+ ('.' [0-9]*)? >
~SectionSeparator <- ('-'+ RestOfLine)?

ProcessMatch <- < ProcessName / '*' >
ChannelMatch <- < ChannelName / '*' >
ProcessName <- < [a-zA-Z0-9_]+ >
ChannelName <- < [a-zA-Z0-9_]+ >
FileName <- < [^ \t\r\n]+ >
HistName <- < [a-zA-Z0-9_/$]+ >
SystName <- < [a-zA-Z0-9_]+ >
SystType <- < 'lnN' / 'lnU' / 'trG' >
SystEffect <- < (Float '/' Float) / Float / '-' >
Param <- Float (Space* '[' Space* Float Space* ',' Space* Float Space* ']')?
Formula <- (QuotedString / String) (Space+ Param)*
String <- < (!Space !EndOfLine .)+ >
QuotedString  <- < '"' (!'"' ('\\"' / .))* '"' >
```

You can experiment with the grammar online at https://yhirose.github.io/cpp-peglib/
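To make "in-memory structures" concrete, here is a rough sketch of the kind of object a datacard parser might produce, using a hand-written line-oriented reader rather than a PEG. It follows the same section order as the grammar (count checks, optional `shapes` lines, observation, expectation, systematics) but is deliberately simplified: it assumes whitespace-separated tokens, treats any `#` as the start of a comment, assumes the process-name line comes before the process-index line, skips `rateParam` lines, and keeps systematic effects as raw strings. The names `Datacard`, `Systematic`, `parse_datacard` and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


# Hypothetical in-memory structures; names and fields are illustrative only.
@dataclass
class Systematic:
    name: str
    kind: str                      # e.g. "lnN"
    effects: List[Optional[str]]   # raw per-column entries; None for '-'


@dataclass
class Datacard:
    counts: Dict[str, Optional[int]]  # imax/jmax/kmax; None for '*'
    shapes: List[List[str]] = field(default_factory=list)
    observation: Dict[str, float] = field(default_factory=dict)
    # One (bin, process name, process index, rate) tuple per column
    columns: List[Tuple[str, str, int, float]] = field(default_factory=list)
    systematics: List[Systematic] = field(default_factory=list)


def _rows(text: str) -> Iterator[List[str]]:
    """Split each line into whitespace-separated tokens, skipping blank
    lines, '#' comments and '-----' separator lines (simplification: any
    '#' starts a comment, wherever it appears)."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and set(line) != {"-"}:
            yield line.split()


def parse_datacard(text: str) -> Datacard:
    rows = list(_rows(text))
    card = Datacard(counts={})
    # CountChecks: three 'imax/jmax/kmax <int or *> ...' lines
    for row in rows[:3]:
        if row[0] not in ("imax", "jmax", "kmax"):
            raise ValueError(f"expected imax/jmax/kmax, got {row[0]!r}")
        card.counts[row[0]] = None if row[1] == "*" else int(row[1])
    i = 3
    # ShapeSources: zero or more 'shapes' lines
    while rows[i][0] == "shapes":
        card.shapes.append(rows[i][1:])
        i += 1
    # Observation: a 'bin' line followed by an 'observation' line
    bins, obs = rows[i], rows[i + 1]
    if bins[0] != "bin" or obs[0] != "observation":
        raise ValueError("expected 'bin' then 'observation' lines")
    card.observation = dict(zip(bins[1:], map(float, obs[1:])))
    i += 2
    # Expectation: 'bin', 'process' (names), 'process' (indices), 'rate'
    b, names, idx, rate = rows[i:i + 4]
    if [r[0] for r in (b, names, idx, rate)] != ["bin", "process", "process", "rate"]:
        raise ValueError("expected bin/process/process/rate lines")
    if len({len(b), len(names), len(idx), len(rate)}) != 1:
        raise ValueError("expectation lines have different numbers of columns")
    card.columns = list(zip(b[1:], names[1:], map(int, idx[1:]), map(float, rate[1:])))
    # Systematics: 'name type effect...' lines; rateParam lines are skipped
    for row in rows[i + 4:]:
        if len(row) > 1 and row[1] == "rateParam":
            continue
        effects = [None if e == "-" else e for e in row[2:]]
        if len(effects) != len(card.columns):
            raise ValueError(f"{row[0]!r}: {len(effects)} entries, expected {len(card.columns)}")
        card.systematics.append(Systematic(row[0], row[1], effects))
    return card


# Usage example with a small counting-experiment datacard
EXAMPLE = """\
imax 1  number of bins
jmax 1  number of backgrounds
kmax *  number of nuisance parameters
----------------
bin          ch1
observation  10
----------------
# expected yields per (bin, process) column
bin          ch1    ch1
process      sig    bkg
process      0      1
rate         2.5    8.0
----------------
lumi     lnN        1.025  1.025
bkgnorm  lnN        -      1.10
bkgrate  rateParam  ch1 bkg 1 [0,5]
"""
card = parse_datacard(EXAMPLE)
print(card.columns)  # [('ch1', 'sig', 0, 2.5), ('ch1', 'bkg', 1, 8.0)]
```

Checks such as "every `bin`/`process`/`rate` line has the same number of columns" or "each systematic has one entry per column" are done here after tokenizing, because they relate counts across lines; a PEG-based parser would probably need a similar validation pass over its parse tree.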
