Proposed Refactor to make the formats easier to comprehend, more flexible #4

Open
zostay opened this issue Oct 12, 2012 · 0 comments

Comments

@zostay
Contributor

zostay commented Oct 12, 2012

Right now the parser uses a relatively ad hoc mechanism: a bunch of regular expressions that have been cobbled together and built up over time. While this works pretty well, I've been reading through the code again and I can see that the documentation I added for the various formats is plainly wrong in several cases. However, the code doesn't make it obvious that the documentation is wrong until you really, really, really dig in. That really shouldn't be necessary.

The other problem is that this thing is really anti-extensible. The only way to extend it is to wrap the parser in another method that preprocesses the string, catches additional cases before calling the range parser, and/or tries additional cases after calling the range parser. Adding more languages, or adjusting certain notions of relative dates by region or context, is not very practical.

I propose we replace Date::RangeParser::EN with Date::RangeParser, and that the new module be an interpreter that works with range parser implementations, such as Date::RangeParser::EN, which can then be further customized to parse dates in whatever manner the developer wants.

This interpreter will do the following:

  1. Compile a program from the range parser implementation, e.g. Date::RangeParser::EN. The implementation will be a combination of Perl code for advanced parsing and a YAML configuration (probably in DATA) for simple rule matching (similar to regular expressions).
  2. Take the input string and divide it into tokens.
  3. Run the interpreter against the tokens to try to find a match.
  4. Once a match is found, the match will be run through some predefined operations to compile a start and end date. Example operations include: "now", "-3 days", "start of day", "end of month", "weekday tuesday", etc.
  5. Return the computed dates to the caller.
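
Steps 2 and 3 might look roughly like the minimal sketch below. It is written in Python purely for illustration (the module itself would be Perl), and it assumes a few things the draft does not pin down: input is split on whitespace, upper-case words in a `match` pattern name token classes while every other word must match literally, and the first matching rule wins. `SKIP_WORDS`, `TOKEN_CLASSES`, `WORD_NUMBERS`, and `PATTERNS` are hypothetical, abridged stand-ins for the YAML configuration and for Lingua::EN::Words2Nums.

```python
import re

# Hypothetical, abridged copies of the draft "skip_words" and "tokens"
# sections; a real interpreter would compile these from the YAML config.
SKIP_WORDS = {"the", "an", "a", "of"}
TOKEN_CLASSES = {
    "CURRENT": {"this": "this", "current": "current"},
    "PAST": {"last": "last", "past": "past"},
    "PREVIOUS": {"last": "last", "past": "past", "previous": "previous"},
    # Each weekday synonym maps to its "value" (0 = Sunday); abridged.
    "WEEKDAYS": {"sunday": 0, "monday": 1, "tuesday": 2, "tues": 2, "tue": 2},
    "DAYS": {"day": "day", "days": "days"},
    "WEEKS": {"week": "week", "weeks": "weeks"},
}
# Tiny stand-in for Lingua::EN::Words2Nums (assumption: a few words only).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3}


def tokenize(text):
    """Step 2: lowercase, split on whitespace, and drop skip words."""
    return [w for w in text.lower().split() if w not in SKIP_WORDS]


def classify(word):
    """Return {token_class: value} for every class the word belongs to."""
    classes = {name: words[word]
               for name, words in TOKEN_CLASSES.items() if word in words}
    if word.isdigit():
        classes["CARDINAL"] = int(word)
    elif word in WORD_NUMBERS:
        classes["CARDINAL"] = WORD_NUMBERS[word]
    ordinal = re.fullmatch(r"(\d+)(st|nd|rd|th)", word)
    if ordinal:
        classes["ORDINAL"] = int(ordinal.group(1))
    return classes


def match(words, patterns):
    """Step 3: return (pattern, captures) for the first matching pattern.

    Upper-case pattern words name token classes; any other pattern word
    must equal the input word exactly.
    """
    classified = [classify(w) for w in words]
    for pattern in patterns:
        parts = pattern.split()
        if len(parts) != len(words):
            continue
        captures = {}
        for part, word, classes in zip(parts, words, classified):
            if part.isupper():
                if part not in classes:
                    break
                captures[part] = classes[part]
            elif part != word:
                break
        else:
            return pattern, captures
    return None


# Hypothetical subset of the "match" patterns, in rule order.
PATTERNS = [
    "today",
    "CURRENT week",
    "PAST CARDINAL DAYS",
    "CARDINAL WEEKS ago",
    "PREVIOUS WEEKDAYS",
    "ORDINAL",
]

print(match(tokenize("the past 3 days"), PATTERNS))
# ('PAST CARDINAL DAYS', {'PAST': 'past', 'CARDINAL': 3, 'DAYS': 'days'})
```

Because a word such as "last" belongs to more than one class (PAST and PREVIOUS), the matcher records every class a word belongs to and lets each pattern pick the one it needs; rule order settles any remaining ambiguity.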

I have come up with the following configuration. It still needs to be tweaked, but I think it's self-documenting. In fact, I think Date::RangeParser itself or a release script could generate the format documentation for each implementation automatically. Paired with Pod::Weaver and Dist::Zilla, that could greatly simplify the documentation process for this module.
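
As a rough illustration of generating format documentation from the rules, the hypothetical `rules_to_pod` function below (written in Python for brevity) walks rules that have already been loaded from YAML into plain data and emits one POD `=item` per accepted phrasing. The two sample rules and the wording of each entry are assumptions; a real release script would presumably hand this output to Pod::Weaver rather than print it.

```python
# Hypothetical rules in the shape of the draft's "rules" section, already
# loaded from YAML into plain Python data.
RULES = [
    {"match": ["today", "CURRENT day"],
     "beginning": ["start of day"],
     "end": ["end of day"]},
    {"match": "PAST CARDINAL DAYS",
     "beginning": ["-$CARDINAL days", "start of day"],
     "end": ["end of day"]},
]


def rules_to_pod(rules):
    """Emit a POD list with one =item per accepted phrasing."""
    lines = ["=over", ""]
    for rule in rules:
        patterns = rule["match"]
        if isinstance(patterns, str):  # "match" may be a single pattern
            patterns = [patterns]
        summary = (f"Begins: {'; '.join(rule['beginning'])}. "
                   f"Ends: {'; '.join(rule['end'])}.")
        for pattern in patterns:
            lines += [f"=item {pattern}", "", summary, ""]
    lines.append("=back")
    return "\n".join(lines)


print(rules_to_pod(RULES))
```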

Anyway, here's the sample configuration. It is just a draft and it is not complete. Time does not allow me to consider this further today.


---
words2nums_package: Lingua::EN::Words2Nums

skip_words:
    - the
    - an
    - a
    - of

tokens:
    DATE: "^"
    ORDINAL: "#th"
    CARDINAL: "#"
    CURRENT: [ this, current ]
    WEEKDAYS:
        - value: 0
          synonyms: [ sunday, sundays, sun, sun. ]
        - value: 1
          synonyms: [ monday, mondays, mon, mon. ]
        - value: 2
          synonyms: [ tuesday, tuesdays, tues, tues., tue, tue. ]
        - value: 3
          synonyms: [ wednesday, wednesdays, wed, wed. ]
        - value: 4
          synonyms: [ thursday, thursdays, thurs, thurs., thur, thur., thu, thu. ]
        - value: 5
          synonyms: [ friday, fridays, fri, fri. ]
        - value: 6
          synonyms: [ saturday, saturdays, sat, sat. ]
    PAST: [ last, past ]
    PREVIOUS: [ last, past, previous ]
    HOURS: [ hour, hours ]
    DAYS: [ day, days ]
    WEEKS: [ week, weeks ]
    MONTHS: [ month, months ]
    QUARTERS: [ quarter, quarters ]
    YEARS: [ year, years ]

rules:
    - match:
        - today
        - CURRENT day
      beginning:
        - start of day
      end:
        - end of day

    - match: CURRENT week
      beginning:
        - start of week
      end:
        - end of week

    - match: CURRENT month
      beginning:
        - start of month
      end:
        - end of month

    - match: CURRENT quarter
      beginning:
        - start of quarter
      end:
        - end of quarter

    - match: CURRENT year
      beginning:
        - start of year
      end:
        - end of year

    - match:
        - CURRENT WEEKDAYS
        - WEEKDAYS
      beginning:
        - weekday $WEEKDAYS
        - start of day
      end:
        - weekday $WEEKDAYS
        - end of day

    - match:
        - PAST CARDINAL HOURS
        - CARDINAL HOURS ago
      beginning:
        - -$CARDINAL hours
        - start of hour
      end:
        - now

    - match: PAST CARDINAL DAYS
      beginning:
        - -$CARDINAL days
        - start of day
      end:
        - end of day

    - match: PAST CARDINAL WEEKS
      beginning:
        - -$CARDINAL weeks
        - start of week
      end:
        - end of day

    - match: PAST CARDINAL MONTHS
      beginning:
        - -$CARDINAL months
        - start of month
      end:
        - end of month

    - match: PAST CARDINAL YEARS
      beginning:
        - -$CARDINAL years
        - start of year
      end:
        - end of year

    - match: PAST CARDINAL QUARTERS
      beginning:
        - -$CARDINAL quarters
        - start of quarter
      end:
        - end of quarter

    - match: CARDINAL MONTHS ago
      beginning:
        - -$CARDINAL months
        - start of month
      end:
        - beginning
        - end of month

    - match: CARDINAL DAYS ago
      beginning:
        - -$CARDINAL days
        - start of day
      end:
        - beginning
        - end of day

    - match: CARDINAL WEEKS ago
      beginning:
        - -$CARDINAL weeks
        - start of week
      end:
        - beginning
        - end of week

    - match: CARDINAL QUARTERS ago
      beginning:
        - -$CARDINAL quarters
        - start of quarter
      end:
        - beginning
        - end of quarter

    - match:
        - CARDINAL WEEKDAYS ago
      beginning:
        - -$CARDINAL weeks
        - weekday $WEEKDAYS
        - start of day
      end:
        - beginning
        - end of day

    - match: yesterday
      beginning:
        - -1 days
        - start of day
      end:
        - beginning
        - end of day

    - match: PREVIOUS week
      beginning:
        - -1 weeks
        - start of week
      end:
        - beginning
        - end of week

    - match: PREVIOUS month
      beginning:
        - -1 months
        - start of month
      end:
        - beginning
        - end of month

    - match: PREVIOUS quarter
      beginning:
        - -1 quarters
        - start of quarter
      end:
        - beginning
        - end of quarter

    - match: PREVIOUS year
      beginning:
        - -1 years
        - start of year
      end:
        - beginning
        - end of year

    - match:
        - PREVIOUS WEEKDAYS
      beginning:
        - -1 weeks
        - weekday $WEEKDAYS
        - start of day
      end:
        - beginning
        - end of day

    - match:
        - this past WEEKDAYS
        - past WEEKDAYS
      beginning:
        - weekday $WEEKDAYS
        - if future: -1 weeks
        - start of day
      end:
        - beginning
        - end of day

    - match:
        - this coming WEEKDAYS
        - coming WEEKDAYS
        - this WEEKDAYS
      beginning:
        - weekday $WEEKDAYS
        - if past: +1 weeks
        - start of day
      end:
        - beginning
        - end of day

    - match:
        - next hour
        - hour from now
        - hour hence
      beginning:
        - now
      end:
        - +1 hours
        - end of hour

    - match:
        - next CARDINAL HOURS
        - CARDINAL HOURS from now
        - CARDINAL HOURS hence
      beginning:
        - now
      end:
        - +$CARDINAL hours
        - end of hour

    - match:
        - next CARDINAL DAYS
        - CARDINAL DAYS from now
        - CARDINAL DAYS hence
      beginning:
        - +1 days
        - start of day
      end:
        - +$CARDINAL days
        - -1 seconds

    - match:
        - tomorrow
        - next day
      beginning:
        - +1 days
        - start of day
      end:
        - beginning
        - end of day

    - match:
        - next CARDINAL WEEKS
        - CARDINAL WEEKS from now
        - CARDINAL WEEKS hence
      beginning:
        - +1 weeks
        - start of week
      end:
        - +$CARDINAL weeks
        - end of week

    - match: next week
      beginning:
        - +1 weeks
        - start of week
      end:
        - +1 weeks
        - end of week

    - match:
        - next CARDINAL MONTHS
        - CARDINAL MONTHS from now
        - CARDINAL MONTHS hence
      beginning:
        - +1 months
        - start of month
      end:
        - +$CARDINAL months
        - end of month

    - match:
        - next month
      beginning:
        - +1 months
        - start of month
      end:
        - +1 months
        - end of month

    - match:
        - next CARDINAL QUARTERS
        - CARDINAL QUARTERS from now
        - CARDINAL QUARTERS hence
      beginning:
        - +1 quarters
        - start of quarter
      end:
        - +$CARDINAL quarters
        - end of quarter

    - match:
        - next quarter
      beginning:
        - +1 quarters
        - start of quarter
      end:
        - +1 quarters
        - end of quarter

    - match:
        - next CARDINAL YEARS
        - CARDINAL YEARS from now
        - CARDINAL YEARS hence
      beginning:
        - +1 years
        - start of year
      end:
        - +$CARDINAL years
        - end of year

    - match:
        - next year
      beginning:
        - +1 years
        - start of year
      end:
        - +1 years
        - end of year

    - match:
        - next CARDINAL WEEKDAYS
        - CARDINAL WEEKDAYS from now
        - CARDINAL WEEKDAYS hence
      beginning:
        - +$CARDINAL weeks
        - weekday $WEEKDAYS
        - start of day
      end:
        - beginning
        - end of day

    - match: next WEEKDAYS
      beginning:
        - +1 weeks
        - weekday $WEEKDAYS
        - start of day
      end:
        - beginning
        - end of day

    - match:
        - ORDINAL
        - ORDINAL this month
      beginning:
        - day of month $ORDINAL
        - start of day
      end:
        - beginning
        - end of day

    - match: ORDINAL last month
      beginning:
        - -1 months
        - day of month $ORDINAL
        - start of day
      end:
        - beginning
        - end of day

    - match: ORDINAL next month
      beginning:
        - +1 months
        - day of month $ORDINAL
        - start of day
      end:
        - beginning
        - end of day

    - match:
        - end month
        - end this month
      beginning:
        - day of month end
        - start of day
      end:
        - beginning
        - end of day

    - match: end last month
      beginning:
        - -1 months
        - day of month end
        - start of day
      end:
        - beginning
        - end of day

    - match: end next month
      beginning:
        - +1 months
        - day of month end
        - start of day
      end:
        - beginning
        - end of day
# ... more to come ...
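
To make step 4 concrete, here is a hedged Python sketch of an evaluator for the `beginning` and `end` operation lists above. Its semantics are assumptions, not part of the draft: weeks start on Sunday (matching `WEEKDAYS` value 0), a quarter is three calendar months, "end of X" is the last whole second of X, `weekday N` moves within the current week, and `beginning` restarts from the already-computed start date. Conditional steps such as `if future: -1 weeks` are not handled, and all function names are hypothetical.

```python
import calendar
import re
from datetime import datetime, timedelta

# Assumed month multiples for calendar units (a quarter is 3 months).
MONTHS_PER = {"month": 1, "quarter": 3, "year": 12}


def add_months(dt, months):
    """Shift by whole months, clamping the day to the target month's length."""
    year, month0 = divmod(dt.year * 12 + dt.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return dt.replace(year=year, month=month0 + 1, day=min(dt.day, last_day))


def shift(dt, amount, unit):
    """Handle '+N days', '-1 seconds', '-N quarters', and similar steps."""
    unit = unit.rstrip("s")  # "days" -> "day", "seconds" -> "second"
    if unit in ("second", "hour", "day", "week"):
        return dt + timedelta(**{unit + "s": amount})
    return add_months(dt, MONTHS_PER[unit] * amount)


def start_of(dt, unit):
    if unit == "hour":
        return dt.replace(minute=0, second=0, microsecond=0)
    day = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "day":
        return day
    if unit == "week":
        # Assumption: weeks start on Sunday, matching WEEKDAYS value 0.
        return day - timedelta(days=(day.weekday() + 1) % 7)
    if unit == "month":
        return day.replace(day=1)
    if unit == "quarter":
        return day.replace(month=3 * ((day.month - 1) // 3) + 1, day=1)
    if unit == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unknown unit: {unit}")


def end_of(dt, unit):
    """Assumption: 'end of X' means the last whole second of X."""
    start = start_of(dt, unit)
    if unit in MONTHS_PER:
        following = add_months(start, MONTHS_PER[unit])
    else:
        following = shift(start, 1, unit)
    return following - timedelta(seconds=1)


def evaluate(ops, now, captures, beginning=None):
    """Apply one 'beginning' or 'end' list left to right, starting at now."""
    dt = now
    for op in ops:
        # Replace $NAME with the value captured for token class NAME.
        op = re.sub(r"\$(\w+)", lambda g: str(captures[g.group(1)]), op)
        if op == "now":
            dt = now
        elif op == "beginning":
            dt = beginning
        elif m := re.fullmatch(r"([+-]\d+) (\w+)", op):
            dt = shift(dt, int(m.group(1)), m.group(2))
        elif m := re.fullmatch(r"(start|end) of (\w+)", op):
            dt = (start_of if m.group(1) == "start" else end_of)(dt, m.group(2))
        elif m := re.fullmatch(r"weekday (\d)", op):
            # Move within the current Sunday-based week, keeping the time.
            dt += timedelta(days=int(m.group(1)) - (dt.weekday() + 1) % 7)
        elif m := re.fullmatch(r"day of month (\d+|end)", op):
            last = calendar.monthrange(dt.year, dt.month)[1]
            dt = dt.replace(day=last if m.group(1) == "end" else int(m.group(1)))
        else:
            raise ValueError(f"unsupported operation: {op!r}")
    return dt


def date_range(rule, captures, now):
    begin = evaluate(rule["beginning"], now, captures)
    return begin, evaluate(rule["end"], now, captures, beginning=begin)


now = datetime(2012, 10, 12, 15, 30)  # a Friday
past_days = {"beginning": ["-$CARDINAL days", "start of day"],
             "end": ["end of day"]}
print(date_range(past_days, {"CARDINAL": 3}, now))
# (datetime.datetime(2012, 10, 9, 0, 0), datetime.datetime(2012, 10, 12, 23, 59, 59))
```

Keeping each operation a small string means an implementation such as Date::RangeParser::EN only has to describe ranges in the config, while the interpreter owns the calendar arithmetic.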
afresh1 pushed a commit to GrantStreetGroup/Date-RangeParser-EN that referenced this issue Jun 12, 2024
Merge in GITHUB/date-rangeparser-en from SM-4899 to master

* commit 'ceac09134238bf4b61f134f31c9d1f89e075f2b9':
  SM-4899 Support weekday intervals