🚧 Colour based tagging & non-ocr page_to_text #111

Open
wants to merge 48 commits into base: main

Conversation

seanmcguire12
Contributor

WIP: implemented colour-based tagging & page_to_text_new, which doesn't use OCR.

seanmcguire12 and others added 9 commits June 25, 2024 17:24
…r to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.
…ase tagging from page_to_image() to page_to_text(), added method combine_annotations() to decouple the sorting logic from page_to_text().
✨ Added functionality for taking screenshot of original/raw page prior to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.
Updated APE-76 with double tagging fix.
@asim-shrestha
Contributor

h4q2uwr0z0sVFM0q5AV7n

  • Some headers are missing (Contact, login, signup)
  • Large whitespace gap. (We could maybe ensure there are never more than 10 consecutive newlines after rendering)
  • All elements in the same group as [85] Generous Gear Credit are in a completely different/incorrect position
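
A minimal sketch of the newline cap suggested above, assuming the rendered page text is a plain Python string. The helper name `cap_blank_runs` and the constant `MAX_CONSECUTIVE_NEWLINES` are hypothetical and are not part of tarsier:

```python
import re

# Hypothetical cap, taken from the "10 new lines in a row" suggestion.
MAX_CONSECUTIVE_NEWLINES = 10


def cap_blank_runs(text: str, limit: int = MAX_CONSECUTIVE_NEWLINES) -> str:
    """Shorten any run of more than `limit` newlines down to exactly `limit`."""
    # Match runs of limit + 1 or more newline characters.
    pattern = re.compile(r"\n{%d,}" % (limit + 1))
    return pattern.sub("\n" * limit, text)
```

This only collapses runs of bare newline characters. If the renderer pads blank lines with spaces, those lines would need to be stripped first for the cap to take effect.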

@asim-shrestha
Contributor

1JWoJWs3uZMt8Wa5ql6pr

  • Cookie disclaimer rendered at the top instead of the bottom

@asim-shrestha
Contributor

4KGjHFZbEpB345rOxuIzv
image

@asim-shrestha
Contributor

4KGjHFZbEpB345rOxuIzv
Duplicate text in sections:
image

Also a lot of horizontal space since there is some really large text at the top that doesn't actually exist

@asim-shrestha
Contributor

4KGjHFZbEpB345rOxuIzv
Bad text positions
image

@asim-shrestha
Contributor

484mHWaGAH0l8tgW95Hvv
Missing header elements
image

@asim-shrestha
Contributor

aLnmhAeCwsHCd3dM53rwG
Really wide page

@asim-shrestha
Contributor

aLnmhAeCwsHCd3dM53rwG
Missing elements
image

seanmcguire12 and others added 11 commits July 22, 2024 15:41
…n so that it doesn't return elements with bounding boxes that occupy no space (i.e., elements that do not actually appear on the page).
…formatting issues. Fixed issue where elements with tagged children were getting rendered twice & creating collisions.
…the next screenshot.

- Added getNextColors function to generate a diverse list of RGB colors based on the number of elements.
- Added colorDistance function to calculate the Euclidean distance between two colors.
- Added assignColors function to assign colors to elements, ensuring maximum color distance.
- Improvements to colourBasedTagify:
  - Tag and collect elements with bounding boxes > 0 and within the viewport.
  - Apply colors to element borders & set opacity to 1
  - Handle special cases for links
  - Set visibility of non-tagged child elements to hidden.
- Added function for disabling/enabling transitions and animation to be used before taking screenshots
- Added boundingBoxX and boundingBoxY attributes in ColouredElem to track element positions
- Updated `page_to_text` to handle recoloring of undetected elements:
  - Added functions to disable and enable transitions/animations during screenshots.
  - Added recoloring logic to improve element detection.
  - Ensured missing elements are recolored and re-checked for visibility.
- Added new _check_colours_ method to compare colors within a threshold to be used on the first pass.
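
The colour helpers listed above can be illustrated roughly as follows. This is a Python sketch under stated assumptions, not the PR's code: the PR's own helpers are named getNextColors, colorDistance and assignColors, and their actual implementation is not shown here. The candidate grid, the greedy farthest-point strategy, the snake_case names and the default threshold of 30 are all assumptions made for illustration.

```python
import math
from itertools import product

RGB = tuple[int, int, int]


def color_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two colours in RGB space."""
    return math.dist(a, b)


def candidate_colors(steps: int = 4) -> list[RGB]:
    """Evenly spaced grid of RGB colours (steps ** 3 entries). Assumed scheme."""
    levels = [round(i * 255 / (steps - 1)) for i in range(steps)]
    return list(product(levels, repeat=3))


def assign_colors(n: int, steps: int = 4) -> list[RGB]:
    """Pick n colours greedily so that each new colour maximises its
    minimum distance to the colours already chosen."""
    pool = candidate_colors(steps)
    if n > len(pool):
        raise ValueError("not enough candidate colours; increase steps")
    if n == 0:
        return []
    chosen = [pool[0]]
    # min_dist[i] is the distance from pool[i] to its nearest chosen colour.
    min_dist = [color_distance(c, chosen[0]) for c in pool]
    while len(chosen) < n:
        idx = max(range(len(pool)), key=min_dist.__getitem__)
        chosen.append(pool[idx])
        min_dist = [
            min(d, color_distance(c, pool[idx])) for c, d in zip(pool, min_dist)
        ]
    return chosen


def colors_match(expected: RGB, observed: RGB, threshold: float = 30.0) -> bool:
    """Threshold comparison in the spirit of _check_colours_ (threshold assumed)."""
    return color_distance(expected, observed) <= threshold
```

A threshold comparison rather than exact equality is useful here because anti-aliasing and image compression can shift the pixel colour read back from a screenshot slightly away from the colour that was applied to the border.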
# Conflicts:
#	poetry.lock
#	tarsier/core.py
#	tarsier/tag_utils.ts
# Conflicts:
#	tarsier-snapshots/snapshots/05W3ZEmj8pbuYSHArYUkz/ocr.txt
#	tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/ocr.txt
#	tarsier-snapshots/snapshots/0fdyKSMbc3kVUgL9RGiEk/ocr.txt
#	tarsier-snapshots/snapshots/0orFfEesEVpe1BN7B114a/ocr.txt
#	tarsier-snapshots/snapshots/11u8vZX9JHQsOrXVSWfJd/ocr.txt
#	tarsier-snapshots/snapshots/1JWoJWs3uZMt8Wa5ql6pr/ocr.txt
#	tarsier-snapshots/snapshots/1N0FTiHE53vO1j0nHDNG5/ocr.txt
#	tarsier-snapshots/snapshots/1qkOHewUy0Kqq9RVVSOoQ/ocr.txt
#	tarsier-snapshots/snapshots/1z50b4syZzf7J1kQt2k7W/ocr.txt
#	tarsier-snapshots/snapshots/24SLE3KnDhtOYYgIM4ote/ocr.txt
#	tarsier-snapshots/snapshots/2ErcEyBkupKnoHkQAhJCk/ocr.txt
#	tarsier-snapshots/snapshots/2HauD8zfTdDq75G7WRJzB/ocr.txt
#	tarsier-snapshots/snapshots/2tmwuvYVJ9KqVgHIOgctI/ocr.txt
#	tarsier-snapshots/snapshots/3MAIydQKH2qHnl1cuLmNc/ocr.txt
#	tarsier-snapshots/snapshots/3vDNIHFXtjcnarvJQWdd7/ocr.txt
#	tarsier-snapshots/snapshots/47D2wwbE0WZOV6obQbYA7/ocr.txt
#	tarsier-snapshots/snapshots/484mHWaGAH0l8tgW95Hvv/ocr.txt
#	tarsier-snapshots/snapshots/4Hmgj9cuidpeiVpWdXVBf/ocr.txt
#	tarsier-snapshots/snapshots/4Je6qSd4YFoyLxVZLQRb7/ocr.txt
#	tarsier-snapshots/snapshots/4KGjHFZbEpB345rOxuIzv/ocr.txt
#	tarsier-snapshots/snapshots/Ey2q7uEroarG84e6YZnym/ocr.txt
#	tarsier-snapshots/snapshots/F5AaImEw3SHGkneXd36eH/ocr.txt
#	tarsier-snapshots/snapshots/FIECXMTasC96yBFr7BcN1/ocr.txt
#	tarsier-snapshots/snapshots/FSfE85pVbn96ntVl1qEGp/ocr.txt
#	tarsier-snapshots/snapshots/FeRDLVQyg3Y1l62axB6az/ocr.txt
#	tarsier-snapshots/snapshots/FmpHbDna6mnBNe0hLyTYZ/ocr.txt
#	tarsier-snapshots/snapshots/Fw6hoBmn7nm2KAy4YDzv9/ocr.txt
#	tarsier-snapshots/snapshots/G9Xy74ZxrdukPaChWTWAo/ocr.txt
#	tarsier-snapshots/snapshots/GAeRa1QK7BcoGKelpEOA9/ocr.txt
#	tarsier-snapshots/snapshots/GNekmizdgssA6t94zWOId/ocr.txt
#	tarsier-snapshots/snapshots/GQfYTjppPhTgYtsuFUbXF/ocr.txt
#	tarsier-snapshots/snapshots/GcW0Q862yCbKr28CQTg2c/ocr.txt
#	tarsier-snapshots/snapshots/GuFznteaPUy3yrETrOh4Y/ocr.txt
#	tarsier-snapshots/snapshots/HCrPjvyx0XaLvNHxVBPZt/ocr.txt
#	tarsier-snapshots/snapshots/HixdQTqLbSa6zIaKmxxE1/ocr.txt
#	tarsier-snapshots/snapshots/HleEA9DcP1jBVN5cBEmFT/ocr.txt
#	tarsier-snapshots/snapshots/Hramb0PgtU7wHEj0D5OKj/ocr.txt
#	tarsier-snapshots/snapshots/I8Bj6okah8nrfPEinahWH/ocr.txt
#	tarsier-snapshots/snapshots/IUnyfHVheJUrv8frQYQib/ocr.txt
#	tarsier-snapshots/snapshots/JAiFFb1qWlEVk48Ny32ND/ocr.txt
#	tarsier-snapshots/snapshots/JNOSAEEZO4j2unWHPFBdO/ocr.txt
#	tarsier-snapshots/snapshots/JaVENaBu8Iu7yYoUNrORW/ocr.txt
#	tarsier-snapshots/snapshots/JwUW9qdzk0NgtnK2Y2BSS/ocr.txt
#	tarsier-snapshots/snapshots/Jxv57Kbqw1AP4qv1zvlqg/ocr.txt
#	tarsier-snapshots/snapshots/K88O7OW0FJoCfdVUD4xXH/ocr.txt
#	tarsier-snapshots/snapshots/KGEFdtgwltNKXKHGOkkaF/ocr.txt
#	tarsier-snapshots/snapshots/KNyomEvINtDSbA7cKRr1F/ocr.txt
#	tarsier-snapshots/snapshots/KTcoPSidqLGESp29nQtLv/ocr.txt
#	tarsier-snapshots/snapshots/KuDD2GuMDlbuKO4ozdbDA/ocr.txt
#	tarsier-snapshots/snapshots/KypCMQmDQ2XZ2GIbMKacI/ocr.txt
#	tarsier-snapshots/snapshots/L3uXGoAVL6YpRHGBCnlB8/ocr.txt
#	tarsier-snapshots/snapshots/L6BOPpJEJhhN5JHfWj4g1/ocr.txt
#	tarsier-snapshots/snapshots/LN4K9AZwaPC50Z4e513su/ocr.txt
#	tarsier-snapshots/snapshots/LNMVWWtQRcjkj54ONLebI/ocr.txt
#	tarsier-snapshots/snapshots/LOnORRBp7zDQifntNAcFO/ocr.txt
#	tarsier-snapshots/snapshots/LuM2bHYg5mnBvjhttTDlh/ocr.txt
#	tarsier-snapshots/snapshots/Ly1DY8GL7cV5mWnxr1DH5/ocr.txt
#	tarsier-snapshots/snapshots/MIZDQx8G6Gn562lO5hFQb/ocr.txt
#	tarsier-snapshots/snapshots/MP4p6ibb3PLD3i8AmBrZ3/ocr.txt
#	tarsier-snapshots/snapshots/MQOPSYI3SU7EEQRbHUMHr/ocr.txt
#	tarsier-snapshots/snapshots/MQrMR8W7oJtlUc056qZ6L/ocr.txt
#	tarsier-snapshots/snapshots/MRD347sMiS2vlw091LAqK/ocr.txt
#	tarsier-snapshots/snapshots/NHWkSmdwXKQb9oe9vVGZf/ocr.txt
#	tarsier-snapshots/snapshots/NJouZuI4JTRsMz3KYK1cV/ocr.txt
#	tarsier-snapshots/snapshots/NLtUSUexaGqmRUBomWj9R/ocr.txt
#	tarsier-snapshots/snapshots/NSVMR9p35Pku7LUyPCMHY/ocr.txt
#	tarsier-snapshots/snapshots/NUkrUYwOJuYfv5SC3GHTE/ocr.txt
#	tarsier-snapshots/snapshots/NV6JL1wEHaTPuK65dKt6t/ocr.txt
#	tarsier-snapshots/snapshots/NZoqFzLNm1OJsS96Pyxbi/ocr.txt
#	tarsier-snapshots/snapshots/O3kSfBi6P0CQBJTCmjV7B/ocr.txt
#	tarsier-snapshots/snapshots/O3t7Of3CTP2WUj71YddFO/ocr.txt
#	tarsier-snapshots/snapshots/OWLWiq0ePIJmx5VmtquOD/ocr.txt
#	tarsier-snapshots/snapshots/Ofe0weKbJ9yl5vEwkalCS/ocr.txt
#	tarsier-snapshots/snapshots/OlYrsJi04Czdu7Uvl1mIF/ocr.txt
#	tarsier-snapshots/snapshots/OmJeRJARVmguS9uMWU1Xb/ocr.txt
#	tarsier-snapshots/snapshots/P7dY0WRzR4PCfWZNuSeBf/ocr.txt
#	tarsier-snapshots/snapshots/PiQlpch5uQzWNXiEEvjX3/ocr.txt
#	tarsier-snapshots/snapshots/PthtpZsDczvCOCFYIogKI/ocr.txt
#	tarsier-snapshots/snapshots/PzN7n57ArAxzcJHzx63NY/ocr.txt
#	tarsier-snapshots/snapshots/QIOkg628A7yzKluLVB8of/ocr.txt
#	tarsier-snapshots/snapshots/QJ1O4XyX7e3CpAPQ3Bonw/ocr.txt
#	tarsier-snapshots/snapshots/QOZFTfvesXGZxgsmHqnrL/ocr.txt
#	tarsier-snapshots/snapshots/QWwSzGV7QMgJprOY5cOpP/ocr.txt
#	tarsier-snapshots/snapshots/Ql2B37FdNugeJ09WjopGa/ocr.txt
#	tarsier-snapshots/snapshots/QlAWMyjvSxPHh4E5Fkjfs/ocr.txt
#	tarsier-snapshots/snapshots/QuUpyX6Z5U2HUUQWJV3S4/ocr.txt
#	tarsier-snapshots/snapshots/QwiRD9fjb4YuRaY3Ypz3f/ocr.txt
#	tarsier-snapshots/snapshots/QxSSau0T34NCk6O1bq4Cd/ocr.txt
#	tarsier-snapshots/snapshots/R99SMT2jvCjJRqRGra2g6/ocr.txt
#	tarsier-snapshots/snapshots/RIqXLn8bSaFN0AG4DdoHO/ocr.txt
#	tarsier-snapshots/snapshots/RVotqLcMUyKXULUTqYCvm/ocr.txt
#	tarsier-snapshots/snapshots/RpjyEXqtmEQDFWgojBJMU/ocr.txt
#	tarsier-snapshots/snapshots/S8AKJlRl5F8Vci1UiLU1a/ocr.txt
#	tarsier-snapshots/snapshots/SEyENcYHqerkt0nmJZjl7/ocr.txt
#	tarsier-snapshots/snapshots/STPTr6OhlruneOtA24xi9/ocr.txt
#	tarsier-snapshots/snapshots/SjzTipa4JUYx4Ocn5VkCV/ocr.txt
#	tarsier-snapshots/snapshots/SlMfqkoK2KeAp31dHr88F/ocr.txt
#	tarsier-snapshots/snapshots/Sqb7SeHvAcouDW5rFl9yu/ocr.txt
#	tarsier-snapshots/snapshots/Std6TTbgilRTiLDGJOezx/ocr.txt
#	tarsier-snapshots/snapshots/T1pTeE6hYcFsaZ84no4GM/ocr.txt
#	tarsier-snapshots/snapshots/TG8dn0Xi3SJC0VHjWRH1P/ocr.txt
#	tarsier-snapshots/snapshots/TKUFwwdmB0ioMyUXvozpu/ocr.txt
#	tarsier-snapshots/snapshots/TLxVvFZ6MRB0nbSBWl8ym/ocr.txt
#	tarsier-snapshots/snapshots/TQyvtLuRcbSStSHq1seCq/ocr.txt
#	tarsier-snapshots/snapshots/U5wOXA13nV6xyogmib6uL/ocr.txt
#	tarsier-snapshots/snapshots/UEQ5bJeIeTst0YVL8ga9Z/ocr.txt
#	tarsier-snapshots/snapshots/UPCNbyQNGulQpM6v6sxUo/ocr.txt
#	tarsier-snapshots/snapshots/UjsF3B4ihFcZjXEcZCnm1/ocr.txt
#	tarsier-snapshots/snapshots/VPIrl5m9IfNLKS03UyzNH/ocr.txt
#	tarsier-snapshots/snapshots/Vba6zNQZmxgxA8byjpmaA/ocr.txt
#	tarsier-snapshots/snapshots/Vo8MreF9aVq5bE45XqaMz/ocr.txt
#	tarsier-snapshots/snapshots/VogIUZw1FJlCEiBzTUwYR/ocr.txt
#	tarsier-snapshots/snapshots/VqSaCh7ffPXKh1IymN8Oo/ocr.txt
#	tarsier-snapshots/snapshots/W8QTUDItaXJSOaBOZGAE8/ocr.txt
#	tarsier-snapshots/snapshots/WDGGGgqdb1RGaoGlseBJk/ocr.txt
#	tarsier-snapshots/snapshots/WEVQJfQEWky3KR7Hc2kuK/ocr.txt
#	tarsier-snapshots/snapshots/WyQg7esKNNds3EYMZCx2J/ocr.txt
#	tarsier-snapshots/snapshots/XSzc3ewTsGRYwwdHvb6LK/ocr.txt
#	tarsier-snapshots/snapshots/Xixe0WiedsLB1KFcKpv2r/ocr.txt
#	tarsier-snapshots/snapshots/Xnuxii49OIfjWntcihbjX/ocr.txt
#	tarsier-snapshots/snapshots/XsNkGYeq1DTAnyKuuvHPZ/ocr.txt
#	tarsier-snapshots/snapshots/Xu7Q49cgzMsp4cgMR0qqS/ocr.txt
#	tarsier-snapshots/snapshots/XxXTjDH2qRuu4n5BSLM5d/ocr.txt
#	tarsier-snapshots/snapshots/Yb4ug21SFYfiN4ENjJCcz/ocr.txt
#	tarsier-snapshots/snapshots/YuBInhOP8OdQAfy4Htvre/ocr.txt
#	tarsier-snapshots/snapshots/ZW0ihimOJEReeseRBrI5i/ocr.txt
#	tarsier-snapshots/snapshots/ZYBqV9WrmYmyFExthpKLD/ocr.txt
#	tarsier-snapshots/snapshots/a0pJxHhxIHFKcoFjkORnG/ocr.txt
#	tarsier-snapshots/snapshots/aLnmhAeCwsHCd3dM53rwG/ocr.txt
#	tarsier-snapshots/snapshots/aQZGYIDkaa6JY6aXv6wXQ/ocr.txt
#	tarsier-snapshots/snapshots/aa3t8r3kAlp9FYx2uSOFz/ocr.txt
#	tarsier-snapshots/snapshots/abgIXICPIttq3MhkmSVdV/ocr.txt
#	tarsier-snapshots/snapshots/ahEBAfuWtiZ8HM77W2d2D/ocr.txt
#	tarsier-snapshots/snapshots/aivDVkwH92hQdu5cDr4nv/ocr.txt
#	tarsier-snapshots/snapshots/apscD5vWHBV1dvAX6K7Vt/ocr.txt
#	tarsier-snapshots/snapshots/awL4PUmAj9TIIqR6L95fq/ocr.txt
#	tarsier-snapshots/snapshots/bOVNaNsrc6UrCdlhHLxGy/ocr.txt
#	tarsier-snapshots/snapshots/bOlARasPXtWAjEDfxtk2L/ocr.txt
#	tarsier-snapshots/snapshots/bZPREHVg723XRC2I6z9MQ/ocr.txt
#	tarsier-snapshots/snapshots/bwwko5J7aFk5K8qz61jBI/ocr.txt
#	tarsier-snapshots/snapshots/c3s1dYwKWMEJHKGyP3qnr/ocr.txt
#	tarsier-snapshots/snapshots/cAeniCN923UcmnXuOOIBJ/ocr.txt
#	tarsier-snapshots/snapshots/cFcnDQSGQgDeyHBnZtrU8/ocr.txt
#	tarsier-snapshots/snapshots/cMPCNSczVAPhdXJxBIBEd/ocr.txt
#	tarsier-snapshots/snapshots/cdFPVICHIa5evhnj1OiMx/ocr.txt
#	tarsier-snapshots/snapshots/cohMcyz81B0NHA04Qeik2/ocr.txt
#	tarsier-snapshots/snapshots/ct6PuXzujbOlM9zaARUpa/ocr.txt
#	tarsier-snapshots/snapshots/cv3sq0A9o3VHmD1UvEWse/ocr.txt
#	tarsier-snapshots/snapshots/e7iDpCvvfiq3oU1UAvxTC/ocr.txt
#	tarsier-snapshots/snapshots/eE46U0AMRoczeDL2eOcgf/ocr.txt
#	tarsier-snapshots/snapshots/eKKvQ3OZG6H0jjTIRINPs/ocr.txt
#	tarsier-snapshots/snapshots/eSG6HgfI2R9JpZQRozsSV/ocr.txt
#	tarsier-snapshots/snapshots/ecqQm32DLMtTUWt2AQxhm/ocr.txt
#	tarsier-snapshots/snapshots/f41Dz5iiwe5QjVbXqWpJJ/ocr.txt
#	tarsier-snapshots/snapshots/fJPQwUD42zT2WKhdBJLnN/ocr.txt
#	tarsier-snapshots/snapshots/fJWonTvHgvl7Ex9DdB1Px/ocr.txt
#	tarsier-snapshots/snapshots/gHXZyrqL7qpmKMFYM6oGE/ocr.txt
#	tarsier-snapshots/snapshots/gKfAQGripVAFa87dehr5m/ocr.txt
#	tarsier-snapshots/snapshots/gd2iNA5INcT66penKY175/ocr.txt
#	tarsier-snapshots/snapshots/gdtUqXUos3CdM6zVlMbbC/ocr.txt
#	tarsier-snapshots/snapshots/gg5AAaFekWGXPdKtYBoer/ocr.txt
#	tarsier-snapshots/snapshots/ggdDF9CwmrmiBHsQvZcDk/ocr.txt
#	tarsier-snapshots/snapshots/h4q2uwr0z0sVFM0q5AV7n/ocr.txt
#	tarsier-snapshots/snapshots/ijJbuKPqEOkA4OK0BzLPk/ocr.txt
#	tarsier-snapshots/snapshots/jCYLQBT1114BBW83zKQdt/ocr.txt
#	tarsier-snapshots/snapshots/jH56yUizuVbTYWAIwSJkM/ocr.txt
#	tarsier-snapshots/snapshots/k1I07SwT7Clry1xxPODfa/ocr.txt
#	tarsier-snapshots/snapshots/kZVEvHT3kuBfZtNUY8rC2/ocr.txt
#	tarsier-snapshots/snapshots/kbd8qO9tx1Efbf08MqZWQ/ocr.txt
#	tarsier-snapshots/snapshots/ke6newcCWvPhsxeZ5TCZ4/ocr.txt
#	tarsier-snapshots/snapshots/kfueRbnkKCdJwC0BRiggp/ocr.txt
#	tarsier-snapshots/snapshots/kvcH8Q2BG1SPgWSAN3f2h/ocr.txt
#	tarsier-snapshots/snapshots/kx3CBXYC9YUyRIFIMYTcD/ocr.txt
#	tarsier-snapshots/snapshots/l3mMTs6gZa1GvpGjknIFT/ocr.txt
#	tarsier-snapshots/snapshots/l8QvEOlveFkWUVYu1HNgD/ocr.txt
#	tarsier-snapshots/snapshots/lBTRjkiZqEdNvCSjTmoWG/ocr.txt
#	tarsier-snapshots/snapshots/lHjLewJTfQKFSAmGE5Wr1/ocr.txt
#	tarsier-snapshots/snapshots/lSwsaU5jAVRddpYTCsWEd/ocr.txt
#	tarsier-snapshots/snapshots/n1VHZA0AkvnKB3Qy2hqvB/ocr.txt
#	tarsier-snapshots/snapshots/n1zh09obI7c51LUTBNNBE/ocr.txt
#	tarsier-snapshots/snapshots/n28tTMFEZfIyMXsCxO6Ra/ocr.txt
#	tarsier-snapshots/snapshots/n7LTn5tVJ2B3IvDopFTFO/ocr.txt
#	tarsier-snapshots/snapshots/nAXVoJDSuul938vtPvfFB/ocr.txt
#	tarsier-snapshots/snapshots/nXWHr3UoycfzFqubWTUpn/ocr.txt
#	tarsier-snapshots/snapshots/njhgFq4h4BcMTdaRxtElY/ocr.txt
#	tarsier-snapshots/snapshots/nxkcxrThdmaRX01YRXtho/ocr.txt
#	tarsier-snapshots/snapshots/o28cv918RSdVcg2P55tGq/ocr.txt
#	tarsier-snapshots/snapshots/oBJMkbpRqNM02wNlOTP3N/ocr.txt
#	tarsier-snapshots/snapshots/oEAjw9fv6UXmS63CIzZlU/ocr.txt
#	tarsier-snapshots/snapshots/oaDAf9SeUsVwpDeKajNrs/ocr.txt
#	tarsier-snapshots/snapshots/ogRf0dLwJKiDJUQnzz4pn/ocr.txt
#	tarsier-snapshots/snapshots/pAObMNn95uFVSll7pCXpg/ocr.txt
#	tarsier-snapshots/snapshots/pNsTF6muOdSesbhNTFI9g/ocr.txt
#	tarsier-snapshots/snapshots/pXL6ojrOhW79o92e8IXw0/ocr.txt
#	tarsier-snapshots/snapshots/pk7eEZ2sweN4YzzFVK217/ocr.txt
#	tarsier-snapshots/snapshots/prf1dSczRpaoWLrEMseB1/ocr.txt
#	tarsier-snapshots/snapshots/q3jMY8P01UJCw3ggDs1OJ/ocr.txt
#	tarsier-snapshots/snapshots/q72iVxzE9cGatHU1cLKJX/ocr.txt
#	tarsier-snapshots/snapshots/qgEjcl77WINh8ltNc9NoC/ocr.txt
#	tarsier-snapshots/snapshots/qrWALKWSykHxTLuVy0Rl7/ocr.txt
#	tarsier-snapshots/snapshots/qtRibcsG6iq09TyGQoYhv/ocr.txt
#	tarsier-snapshots/snapshots/qyZjOcbaiHuVq4FpOB26b/ocr.txt
#	tarsier-snapshots/snapshots/rFp4CQs5ZxAebcIM0d62U/ocr.txt
#	tarsier-snapshots/snapshots/rGFdlkuftF7L1VlFL7LbS/ocr.txt
#	tarsier-snapshots/snapshots/rKCkTGVbx4Mpi0BAnKCRd/ocr.txt
#	tarsier-snapshots/snapshots/rZQpVHDs30D7WbTFIiXCr/ocr.txt
#	tarsier-snapshots/snapshots/ranUaEMdxbjMltYPt2AX7/ocr.txt
#	tarsier-snapshots/snapshots/rgCTp6HulNEsEqEupEUZN/ocr.txt
#	tarsier-snapshots/snapshots/rmMxc6dEoyE1WpLLWqTHV/ocr.txt
#	tarsier-snapshots/snapshots/t8biLN0RgFBPYO2hv2JYJ/ocr.txt
#	tarsier-snapshots/snapshots/tIowzAEvZcWH9ukP4Aofa/ocr.txt
#	tarsier-snapshots/snapshots/tV4VsHCiYAA3o6oKYyXVk/ocr.txt
#	tarsier-snapshots/snapshots/tVBOUnrTSDIHQbsMw2WgS/ocr.txt
#	tarsier-snapshots/snapshots/tbRxihP0jtq5O12zVhvEF/ocr.txt
#	tarsier-snapshots/snapshots/token_statistics.txt
#	tarsier-snapshots/snapshots/u2IEvb9Ke4lKLaD4LtJYE/ocr.txt
#	tarsier-snapshots/snapshots/u3fjwZRjKUEcvr8kkmy5v/ocr.txt
#	tarsier-snapshots/snapshots/u7I1P6OC5xX8f3u8Fwjvf/ocr.txt
#	tarsier-snapshots/snapshots/uOmbtFqUSqItS8CKmyi51/ocr.txt
#	tarsier-snapshots/snapshots/uPrnCohCwLCrVvwN8eXWZ/ocr.txt
#	tarsier-snapshots/snapshots/uibGV6FB4gcYvY93AIWJe/ocr.txt
#	tarsier-snapshots/snapshots/v7hgryy94evdLb0aHzDtY/ocr.txt
#	tarsier-snapshots/snapshots/vELUj6wGf96coJAqt0x5D/ocr.txt
#	tarsier-snapshots/snapshots/vVJc0PFcYOzKHHL1v1hev/ocr.txt
#	tarsier-snapshots/snapshots/vgTQTZN0Efl4vXQ0I9Iy8/ocr.txt
#	tarsier-snapshots/snapshots/wUHnayH90bjRjjjdCT0r2/ocr.txt
#	tarsier-snapshots/snapshots/wXhQ0YobLZ4z1BAZesBUF/ocr.txt
#	tarsier-snapshots/snapshots/wjmMahVNX7T1jH9GmVW9r/ocr.txt
#	tarsier-snapshots/snapshots/wqGtmRYz4PWe4LCxAW4UI/ocr.txt
#	tarsier-snapshots/snapshots/x9tCDlr2WOazDKVrF3njD/ocr.txt
#	tarsier-snapshots/snapshots/xCHAOXtOYz47HfNY9LeZq/ocr.txt
#	tarsier-snapshots/snapshots/xZCsA0eNaR7OMmhcBlsOv/ocr.txt
#	tarsier-snapshots/snapshots/xgnNjPdOMUY0LZ1GJdEsE/ocr.txt
#	tarsier-snapshots/snapshots/xh7zxFmYI3du3PWBnEjQ4/ocr.txt
#	tarsier-snapshots/snapshots/xkEtVvkl3HDnC827Flk3g/ocr.txt
#	tarsier-snapshots/snapshots/xkINPY1INO91Jv5ZokNGu/ocr.txt
#	tarsier-snapshots/snapshots/yXLMF4nocYqJnql2dt71R/ocr.txt
#	tarsier-snapshots/snapshots/yoqTH08pW464eBIPYwd5r/ocr.txt
#	tarsier-snapshots/snapshots/yzwuXotaBr52CyG4mUDhy/ocr.txt
#	tarsier-snapshots/snapshots/zKVOGYYHXR3uskE0WcG1A/ocr.txt
#	tarsier-snapshots/snapshots/zPfbTSTbZ3sOGYDiqwyj0/ocr.txt
#	tarsier-snapshots/snapshots/zRdqy27hn5RdNqJqnjzaA/ocr.txt
#	tarsier-snapshots/tarsier_snapshots/snapshots.py
@seanmcguire12 seanmcguire12 marked this pull request as ready for review August 19, 2024 16:36
pyproject.toml Outdated
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "tarsier"
-version = "0.6.3"
+version = "0.6.39"
Contributor

why 9?

Comment on lines +9 to +12
cd ./tarsier-snapshots || exit 1
poetry install
poetry run bananalyze --download
Contributor

Not really relevant for setting up tarsier. Would delete.

@asim-shrestha asim-shrestha changed the base branch from API-33 to main August 23, 2024 01:34
asim-shrestha and others added 25 commits August 22, 2024 18:53
# Conflicts:
#	poetry.lock
#	tarsier-snapshots/tarsier_snapshots/snapshots.py
…eturn value to match that of original page_to_text
# Conflicts:
#	poetry.lock
#	tarsier/core.py
#	tarsier/tag_utils.ts
Labels
None yet
Projects
None yet
2 participants