detect document language across all partitioners <- Ingest test fixtures update #1652

Merged
@@ -15,6 +15,9 @@
},
"filename": "ideas-page.html",
"filetype": "text/html",
"languages": [
"eng"
],
"page_number": 1,
"text_as_html": "<table><br><tbody><br><tr><td>January 2023</td><td>(</td><td>Someone</td><td>fed my essays into GPT to make something that could answer<br>questions based on them, then asked it where good ideas come from. The<br>answer was ok, but not what I would have said. This is what I would have said.)</td><td>The way to get new ideas is to notice anomalies: what seems strange,<br>or missing, or broken? You can see anomalies in everyday life (much<br>of standup comedy is based on this), but the best place to look for<br>them is at the frontiers of knowledge.</td><td>Knowledge grows fractally.<br>From a distance its edges look smooth, but when you learn enough<br>to get close to one, you&#x27;ll notice it&#x27;s full of gaps. These gaps<br>will seem obvious; it will seem inexplicable that no one has tried<br>x or wondered about y. In the best case, exploring such gaps yields<br>whole new fractal buds.</td></tr><br></tbody><br></table>"
},
@@ -16,7 +16,7 @@
"filename": "stanley-cups.xlsx",
"filetype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"languages": [
- "tur"
+ "eng"
],
"page_number": 1,
"page_name": "Stanley Cups",
@@ -41,7 +41,7 @@
"filename": "stanley-cups.xlsx",
"filetype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"languages": [
- "tur"
+ "eng"
],
"page_number": 1,
"page_name": "Stanley Cups",
@@ -66,7 +66,7 @@
"filename": "stanley-cups.xlsx",
"filetype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"languages": [
- "tur"
+ "eng"
],
"page_number": 2,
"page_name": "Stanley Cups Since 67",
@@ -91,7 +91,7 @@
"filename": "stanley-cups.xlsx",
"filetype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"languages": [
- "tur"
+ "eng"
],
"page_number": 2,
"page_name": "Stanley Cups Since 67",
@@ -15,6 +15,9 @@
},
"filename": "Home.html",
"filetype": "text/html",
"languages": [
"eng"
],
"page_number": 1
},
"text": "Documents"
@@ -35,6 +38,9 @@
},
"filename": "Home.html",
"filetype": "text/html",
"languages": [
"eng"
],
"page_number": 1
},
"text": "Events"
@@ -15,6 +15,10 @@
},
"filename": "This-is-a-title.html",
"filetype": "text/html",
"languages": [
"cat",
"fra"
],
"page_number": 1
},
"text": "This is a plain text site page for testing purposes"
@@ -35,6 +39,10 @@
},
"filename": "This-is-a-title.html",
"filetype": "text/html",
"languages": [
"cat",
"fra"
],
"page_number": 1
},
"text": "These are bullet points meant for testing"
@@ -55,6 +63,10 @@
},
"filename": "This-is-a-title.html",
"filetype": "text/html",
"languages": [
"cat",
"fra"
],
"page_number": 1
},
"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam ex tellus, sodales non nulla et, sodales consequat turpis. Etiam vestibulum nisl placerat risus elementum, a sodales purus rhoncus. Sed eget velit pharetra, pretium nisi nec, laoreet ligula. Duis luctus mi in ligula cursus, vel lacinia tortor ultricies. Aenean sit amet sodales odio, a maximus elit. Pellentesque vehicula diam sit amet leo placerat placerat. Integer varius elementum accumsan. Donec posuere elit mauris, eget efficitur nisl viverra vitae."
@@ -75,6 +87,10 @@
},
"filename": "This-is-a-title.html",
"filetype": "text/html",
"languages": [
"cat",
"fra"
],
"page_number": 1
},
"text": "Integer at dictum nisi. Cras venenatis non velit in posuere. Curabitur tristique, eros eget tristique pellentesque, neque metus ullamcorper ligula, nec posuere neque lacus nec felis. Nulla a libero eget eros consectetur hendrerit. Pellentesque interdum, diam eget tristique pretium, quam lorem pulvinar lorem, a eleifend nisl lectus at ex. Praesent pulvinar ex ut consequat condimentum. Sed rutrum, erat a hendrerit blandit, urna mauris posuere est, at porttitor risus diam non leo. Nullam rutrum vehicula dolor, quis venenatis ligula rutrum sit amet. Nam massa justo, fermentum in dui lacinia, tincidunt imperdiet nunc. Nam posuere tortor ac lectus elementum, non mollis urna consequat. In interdum non tellus sed pellentesque."
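The hunks above add or change a "languages" list on each fixture element. As a rough aid for reviewing such updates, the sketch below tallies the detected language codes per source file, which makes surprising results (such as "tur" on an English spreadsheet, or "cat"/"fra" on a Lorem ipsum page) easy to spot. It is a minimal illustration, not part of this PR: the function name `languages_by_file` is hypothetical, and it assumes each element keeps "filename" and "languages" under a "metadata" key, which the truncated hunks suggest but do not show.

```python
import json
from collections import Counter, defaultdict


def languages_by_file(elements):
    """Count detected language codes per source filename.

    Assumes each element is a dict with a "metadata" object holding
    "filename" and "languages" (the key name "metadata" is an assumption).
    """
    tally = defaultdict(Counter)
    for element in elements:
        meta = element.get("metadata", {})
        filename = meta.get("filename", "<unknown>")
        # Elements without a "languages" list simply contribute nothing.
        for code in meta.get("languages", []):
            tally[filename][code] += 1
    return {name: dict(counts) for name, counts in tally.items()}


# Tiny hand-written sample shaped like the fixture elements in this diff.
sample = json.loads("""
[
  {"metadata": {"filename": "stanley-cups.xlsx", "languages": ["eng"], "page_number": 1},
   "text": "row 1"},
  {"metadata": {"filename": "stanley-cups.xlsx", "languages": ["eng"], "page_number": 2},
   "text": "row 2"},
  {"metadata": {"filename": "This-is-a-title.html", "languages": ["cat", "fra"], "page_number": 1},
   "text": "Lorem ipsum dolor sit amet"}
]
""")

summary = languages_by_file(sample)
for name, counts in summary.items():
    print(name, counts)
```

To use it on a real fixture, replace `sample` with the result of `json.load` on the fixture file, provided the file holds a JSON list of elements.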

Large diffs are not rendered by default.