MarkdownSplitter: Smarter Table Splitting with Header Preservation #422
Comments
Thanks for reaching out @hburrichter! Would it be fine for you if it were returned as context, as I would do with headers? (#116) This would allow you to choose what to do with it, but would require that you have enough buffer in your chunk to add it. Basically, your request is: if a chunk starts with a table row that is not the header row, the header row gets added?
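To illustrate the alternative described above (returning the header row as context rather than baking it into the chunk text, analogous to the header handling discussed in #116), here is a minimal sketch. The function name and the `(context, chunk)` tuple shape are assumptions for illustration, not the library's API:

```python
# Hypothetical sketch: pair a chunk with the table header rows it needs
# for rendering, instead of modifying the chunk text itself. This keeps
# the invariant that concatenating chunks reproduces the original text.

def chunk_with_table_context(chunk: str, header_row: str, delimiter_row: str):
    """Return (context, chunk); context holds the header rows when the
    chunk starts mid-table, and is empty otherwise."""
    first_line = chunk.lstrip().splitlines()[0] if chunk.strip() else ""
    starts_mid_table = first_line.startswith("|") and first_line != header_row
    context = f"{header_row}\n{delimiter_row}" if starts_mid_table else ""
    return context, chunk

# Usage: a chunk beginning with a body row gets the header as context.
ctx, text = chunk_with_table_context(
    "| Bob   | Designer |",
    "| Name  | Role     |",
    "|-------|----------|",
)
```

A consumer could then prepend `ctx` to `text` before rendering, or store it as metadata, depending on their buffer budget.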
Yes @benbrandt, that is correct! This would fix any markdown rendering issues and add valuable context to each chunk. For compatibility reasons (merging chunks should reproduce the original text), returning the header row as context/metadata might be a good solution, as you have already pointed out. Another option might be to make this feature opt-in by putting it behind a configuration parameter.
Something similar to this: https://youtu.be/s_Vh9HIeLVg?list=PLNUVZZ6hfXX1Y4Is-SbbMF_HutRDJBwiO&t=1691
Hello @benbrandt,
First, thank you for your work on the text-splitter library!
Feature Request:
I would like to propose a feature enhancement for smarter table splitting that preserves the header row in all chunks.
Use Case:
This feature would be useful for markdown documents with large tables to maintain the readability and formatting consistency in each table chunk.
Example:
Consider a markdown table that is too large to fit in a single chunk:
If the table needs to be split after the second row, the desired output would be:
Split 1:
Split 2:
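Since the example tables did not survive extraction, here is a minimal, self-contained sketch of the requested behavior with an illustrative table. The function and table contents are hypothetical, not part of text-splitter; the point is that the header row and delimiter row are repeated at the top of every chunk:

```python
# Hypothetical sketch: split a markdown table's body rows into chunks,
# prepending the header row and delimiter row to each chunk so every
# chunk renders as a valid, self-describing table.

def split_table_with_header(table: str, rows_per_chunk: int) -> list[str]:
    lines = table.strip().splitlines()
    header, delimiter, body = lines[0], lines[1], lines[2:]
    chunks = []
    for i in range(0, len(body), rows_per_chunk):
        chunk_rows = body[i:i + rows_per_chunk]
        chunks.append("\n".join([header, delimiter, *chunk_rows]))
    return chunks

# Illustrative table (four body rows, split two rows per chunk).
table = """\
| Name  | Role     |
|-------|----------|
| Alice | Engineer |
| Bob   | Designer |
| Carol | Manager  |
| Dave  | Analyst  |
"""

for chunk in split_table_with_header(table, rows_per_chunk=2):
    print(chunk)
    print()
```

With this input, Split 1 would contain Alice and Bob under the header, and Split 2 would contain Carol and Dave under a repeated copy of the same header, matching the desired output described above.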
Thank you for considering this feature request!