Support for other unstructured data like PDF, Word, etc. #309

Open
gnthaker opened this issue Jun 5, 2024 · 4 comments

@gnthaker

gnthaker commented Jun 5, 2024

It would be good if we could provide PDFs or other unstructured data (such as Word documents) from which to generate synthetic data.

@lhawthorn
Member

Thank you for filing this issue, @gnthaker! It helps us keep track of it. In last night's triage meeting, we discussed that we also want to create better tooling for PDF --> Markdown conversion and, more generally, make data ingestion a less cumbersome process. As we are a young, fast-moving project, I am not sure where on our roadmap this will land timing-wise.

Once again, thank you for filing this issue and ensuring we don't lose track of this clear need.

@lhawthorn
Member

The community members I know who have talked the most about this need are on the Triage team. If you want to talk to them about scoping this work, you can find them in #triage on InstructLab Slack.

@jjasghar
Member

Yep, @gnthaker, please reach out; we have some thoughts and suggestions to get something off the ground, but nothing formalized in a pipeline or anything.

Contributor

This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

github-actions bot added the stale label Aug 28, 2024
Projects: None yet
Development: No branches or pull requests
Participants: 3