Added unstructured io parsers #274
…ons, removed older parsers, modified multimodal parser to support images
"additional_config": {
    "model_configuration": {
        "name": "truefoundry/openai-main/gpt-4-turbo"
    },
    "prompt": "Given an image containing one or more charts/graphs, and texts, provide a detailed analysis of the data represented in the charts. Your task is to analyze the image and provide insights based on the data it represents. Specifically, the information should include but not limited to: - Title of the Image: Provide a title from the charts or image if any. - Type of Chart: Determine the type of each chart (e.g., bar chart, line chart, pie chart, scatter plot, etc.) and its key features (e.g., labels, legends, data points). - Data Trends: Describe any notable trends or patterns visible in the data. This may include increasing/decreasing trends, seasonality, outliers, etc. - Key Insights: Extract key insights or observations from the charts. What do the charts reveal about the underlying data? Are there any significant findings that stand out? - Data Points: Identify specific data points or values represented in the charts, especially those that contribute to the overall analysis or insights. - Comparisons: Compare different charts within the same image or compare data points within a single chart. Highlight similarities, differences, or correlations between datasets. - Conclude with a summary of the key findings from your analysis and any recommendations based on those findings."
}
Please let's not stuff these things into additional_config.
Can we please refactor this into something like:
".pdf": {
    "class": "MultiModalParser",
    "kwargs": {
        "...": "..."
    }
}
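One way to read the suggested shape: each extension maps to a parser class name plus the keyword arguments passed to its constructor, so settings such as the model name and prompt become explicit parameters instead of entries in a free-form additional_config. A minimal sketch under that reading follows. The registry, the stand-in MultiModalParser, and its model_name and prompt parameters are all hypothetical and not taken from the repo.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class MultiModalParser:
    """Stand-in for the real parser; both fields are hypothetical."""

    model_name: str
    prompt: str = "Describe the image."


# Hypothetical registry mapping the class names used in config to classes.
PARSER_REGISTRY: Dict[str, Type[Any]] = {"MultiModalParser": MultiModalParser}


def build_parser(extension: str, parser_config: Dict[str, Dict[str, Any]]) -> Any:
    """Instantiate the parser configured for a file extension."""
    entry = parser_config[extension.lower()]
    parser_cls = PARSER_REGISTRY[entry["class"]]
    # kwargs go straight to the constructor, so an unknown option raises
    # TypeError instead of being silently ignored.
    return parser_cls(**entry.get("kwargs", {}))


parser_config = {
    ".pdf": {
        "class": "MultiModalParser",
        "kwargs": {"model_name": "truefoundry/openai-main/gpt-4-turbo"},
    }
}
parser = build_parser(".PDF", parser_config)
```

With this layout a misspelled option fails at construction time, which is harder to guarantee when options are looked up by key inside a nested dict.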
self.session = requests.Session()
self.retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["POST"],
)
self.adapter = HTTPAdapter(max_retries=self.retry_strategy)
self.session.mount("https://", self.adapter)
self.session.mount("http://", self.adapter)
self.headers = {
    "accept": "application/json",
}
All this retrying should be made a common utility across the repo
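A shared helper along these lines could look like the sketch below. The function name build_retry_session and its defaults are assumptions for illustration; only the Retry, HTTPAdapter, and Session calls mirror the snippet under review.

```python
from typing import Iterable

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retry_session(
    total: int = 3,
    backoff_factor: float = 1.0,
    status_forcelist: Iterable[int] = (429, 500, 502, 503, 504),
    allowed_methods: Iterable[str] = ("POST",),
) -> requests.Session:
    """Return a Session that retries the given methods on the given statuses."""
    retry_strategy = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=list(status_forcelist),
        allowed_methods=list(allowed_methods),
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session = requests.Session()
    # Mount the same adapter for both schemes, as the parser does today.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = build_retry_session()
```

Each parser would then call the helper once in its constructor and keep only its own headers and endpoint logic.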
Will take this up in a separate PR.
except Exception as e:
    logger.exception(f"Final Exception: {e}")
return final_texts
This error handling is too aggressive in my opinion; the caller should decide what to do with these errors.
This is a general comment that applies to all the parsers we maintain.
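A sketch of the alternative, assuming a hypothetical ParserError and a toy parse_chunks function (neither is from the PR): the parser adds context and re-raises, and the caller chooses whether to skip, retry, or abort.

```python
import logging
from typing import Callable, List

logger = logging.getLogger(__name__)


class ParserError(Exception):
    """Hypothetical error type raised when a parser cannot finish."""


def parse_chunks(pages: List[str], extract: Callable[[str], str]) -> List[str]:
    """Parse every page, raising instead of returning partial results."""
    final_texts: List[str] = []
    for index, page in enumerate(pages):
        try:
            final_texts.append(extract(page))
        except Exception as e:
            # Add context and chain the original exception; do not swallow it.
            raise ParserError(f"failed on page {index}") from e
    return final_texts


def failing_extract(page: str) -> str:
    """Toy extractor that rejects empty pages."""
    if not page:
        raise ValueError("empty page")
    return page.upper()


# The caller, not the parser, decides what a failure means.
try:
    texts = parse_chunks(["a", ""], failing_extract)
except ParserError as e:
    logger.warning("Skipping document: %s", e)
    texts = []
```

Here the caller chooses to log and skip the document; another caller could just as well retry or let the error surface to the user.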
Co-authored-by: Chirag Jain <[email protected]>
Keeping comments which will be addressed later as unresolved
Merge pull request #274 from truefoundry/ps_unstructured
".txt", ".eml", ".msg", ".xml", ".html", ".md", ".rst", ".json", ".rtf", ".jpeg", ".png", ".doc", ".docx", ".ppt", ".pptx", ".pdf", ".odt", ".epub", ".csv", ".tsv", ".xlsx"
".png", ".jpeg", ".jpg"