feat: ✨ OpenAI parser #245
Conversation
Looking forward to your feedback!
circuit_maintenance_parser/parser.py
Outdated
_llm_question = (
    "Can you extract the maintenance_id, the account_id, the impact, the status "
    "(e.g., confirmed, cancelled, rescheduled), the summary, the circuit ids (also defined as service or order), "
    "and the global start_time and end_time as EPOCH timestamps in JSON format with keys in "
    "lowercase underscore format? Reply with only the answer in JSON form and "
    "include no other commentary"
)
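For illustration only (the field values below are made up, not taken from the PR), the question above is asking the LLM for a response shaped roughly like this:

# Illustrative shape of the answer the question requests.
# Keys follow the lowercase-underscore convention; values are invented for this example.
expected_shape = {
    "maintenance_id": "VNOC-1-99999999999",
    "account_id": "ACME-CORP",
    "impact": "OUTAGE",
    "status": "confirmed",
    "summary": "Planned fiber maintenance on backbone link",
    "circuit_ids": ["ABC/123/XYZ"],
    "start_time": 1700000000,  # EPOCH timestamp
    "end_time": 1700007200,    # EPOCH timestamp
}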
Should this be moved to the OpenAI parser specifically, or at least made overridable? I could see that different LLMs might need variant phrasings of the question to get the desired outcome.
My idea is to have a generic one that is used by default, and then every LLM parser can override it as needed.
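A minimal sketch of that idea, assuming hypothetical class names rather than the PR's exact API: a base LLM parser carries the default question, and a concrete parser overrides it only when its model needs a different phrasing.

class LLM:
    """Hypothetical base class for LLM-backed parsers; owns the default question."""

    _llm_question = (
        "Can you extract the maintenance_id, the account_id, the impact, ... "
        "Reply with only the answer in JSON form and include no other commentary"
    )


class OpenAIParser(LLM):
    """Uses the generic default question as-is."""


class OtherLLMParser(LLM):
    """Hypothetical parser whose model responds better to a different prompt."""

    _llm_question = "Return only a JSON object with keys maintenance_id, account_id, ..."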
@staticmethod
def get_text_hook(raw: bytes) -> str:
    """Can be overwritten by subclasses."""
What purpose does this method serve? When would subclasses need/want to override it?
I made it more explicit why it is used: the final parsers could have a different way to decode bytes into a string.
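As a rough illustration of the hook's purpose (the subclass and decoding choices below are assumptions, not the PR's actual parsers): the base behavior can be a plain decode, while a parser whose payload arrives in a different encoding overrides the hook.

import base64


class LLM:
    """Hypothetical base class; defaults to a plain decode of the payload."""

    @staticmethod
    def get_text_hook(raw: bytes) -> str:
        return raw.decode(errors="ignore")


class Base64EmailParser(LLM):
    """Hypothetical subclass whose payload arrives base64-encoded."""

    @staticmethod
    def get_text_hook(raw: bytes) -> str:
        return base64.b64decode(raw).decode(errors="ignore")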
tests/unit/test_parsers.py
Outdated
parsed_notifications = parser_class().parse(raw_data)
parsed_notifications = parser_class().parse(
    raw_data, parser_class._data_types[0]  # pylint: disable=protected-access
)
Can you help me understand why this change? It feels weird to be passing a private attribute of the parser_class
into a method on said class - why doesn't the parser just look this up automatically?
Before this PR, the parsers were only able to manage one data type: HTML, CSV, text, etc. The LLM parser supports two types, HTML and text, but it requires a specific transform of the data into text (done here).
However, I should change this private access to use a public method to get them.
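One way the test could avoid reaching into the private attribute is a public accessor on the parser class. A sketch of that idea (the method name and values are assumptions, not necessarily what the PR ends up with):

class Parser:
    """Hypothetical simplification of the base parser."""

    _data_types = ["text/html", "html"]  # values are illustrative only

    @classmethod
    def get_data_types(cls) -> list:
        """Public accessor so tests and callers avoid the private attribute."""
        return cls._data_types


# The test could then read:
# parsed_notifications = parser_class().parse(raw_data, parser_class.get_data_types()[0])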
Co-authored-by: Glenn Matthews <[email protected]>
pyproject.toml
Outdated
@@ -32,6 +32,7 @@ geopy = "^2.1.0"
timezonefinder = "^6.0.1"
backoff = "^1.11.1"
chardet = "^5"
openai = "^0.28.1"
How heavyweight is this dependency? Would it be preferable to make it an optional extra instead of a hard requirement?
It doesn't look crazy: https://github.com/openai/openai-python/blob/main/pyproject.toml#L10
However, I think that making it optional is an extra safeguard that makes people aware they are using it.
And I also noticed that there is a stable version!
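If openai does become an optional extra, the import would need to be guarded so the rest of the library keeps working without it. A minimal sketch of that pattern, assuming a hypothetical entry point and error message:

# Guarded import so the library still loads when the extra isn't installed.
try:
    import openai
except ImportError:  # pragma: no cover
    openai = None


class OpenAIParser:
    """Hypothetical parser entry point that checks for the optional dependency."""

    def __init__(self):
        if openai is None:
            raise RuntimeError(
                "The 'openai' package is not installed; install it (or the "
                "corresponding optional extra) to use the OpenAI parser."
            )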
This PR tries to answer #225, establishing a pattern to use LLM APIs (such as OpenAI) as last-resort parsers.
TODO: