One benefit of the extra Markdown structure, beyond better split points, is that we can give each chunk extra context from the headings that apply to it.
It would be great to have an alternate chunk method that returns not only the chunk but also any relevant context, where `Context` holds the corresponding header text of the most recent heading at each level.
This would traverse the document until it reaches the offset of a given chunk, keeping a reference to the heading at each level it encounters. If it encounters a level it has already seen, it replaces the stored heading with the new one and also removes any references to lower heading levels.
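The traversal described above can be sketched roughly as follows. This is only an illustration, not the library's API: the `heading_context` function and the regex are hypothetical, it recognizes only ATX-style (`#`) headings, and it does not skip `#` lines inside fenced code blocks. It also assumes that "lower heading levels" means deeper levels (for example, a new level-2 heading clears any stored level-3 heading).

```python
import re
from typing import Dict

# Simplified ATX heading matcher (assumption: no Setext headings, no code-fence handling).
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def heading_context(markdown: str, chunk_offset: int) -> Dict[int, str]:
    """Return {level: heading text} for the headings in effect at chunk_offset."""
    context: Dict[int, str] = {}
    for match in HEADING_RE.finditer(markdown):
        # Stop once we reach the chunk; a heading that starts the chunk
        # is part of the chunk itself, not of its context.
        if match.start() >= chunk_offset:
            break
        level = len(match.group(1))
        # A new heading replaces the stored one at its level and
        # invalidates every deeper level recorded so far.
        for stale in [seen for seen in context if seen >= level]:
            del context[stale]
        context[level] = match.group(2)
    return context


doc = (
    "# Guide\n\nIntro.\n\n"
    "## Install\n\nSteps.\n\n"
    "### Linux\n\napt.\n\n"
    "## Usage\n\nRun it.\n"
)
print(heading_context(doc, doc.index("Run it.")))  # {1: 'Guide', 2: 'Usage'}
```

A map keyed by heading level is one possible return shape; a fixed-size array of six optional slots would work equally well and avoids hashing.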
Todo:
Define how context should be returned. For example, should this just be a hashmap of headings?
This should be opt-in, so that if it isn't desired, the extra computation isn't performed.
We’ll use Nike’s 2023 10-K to illustrate this. Here are the first 10 sections we identified:
Add contextual chunk headers
The purpose of the chunk header is to add context to the chunk text. Rather than using the chunk text by itself when embedding and reranking the chunk, we use the concatenation of the chunk header and the chunk text. This helps the ranking models (embeddings and rerankers) retrieve the correct chunks.
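As a minimal sketch of that concatenation, the function below prepends a header to the chunk text before it is sent to an embedding or reranking model. The function name, the `Document:` / `Section:` header layout, and the placeholder strings are assumptions for illustration; the text above only specifies that header and chunk text are concatenated.

```python
from typing import Optional


def build_embedding_text(
    chunk_text: str,
    document_title: str,
    section_title: Optional[str] = None,
) -> str:
    """Prepend a contextual header to a chunk (header format is hypothetical)."""
    header_lines = [f"Document: {document_title}"]
    if section_title:
        header_lines.append(f"Section: {section_title}")
    header = "\n".join(header_lines)
    # The model sees header + chunk; the original chunk text can still be
    # stored separately for display.
    return f"{header}\n\n{chunk_text}"


text = build_embedding_text(
    "Example chunk text.",
    document_title="Example report",
    section_title="Example section",
)
print(text)
```

Only the string passed to the ranking models changes; the stored chunk can remain the unmodified text.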