I noticed an unusual Grobid ERROR in HAL for just one file.
07/11/2023 15:59:44 INFO /home/issa/ISSA-2/data/hal/pdf_cache/tel-03125685v1.pdf already exists
07/11/2023 16:00:14 ERROR descriptor 'strip' requires a 'str' object but received a 'NoneType'
Traceback (most recent call last):
  File "./extract_text_from_pdf.py", line 522, in download_and_process_all
    process_pdf(f_pdf, f_json, pdf_content=pdf_content)
  File "./extract_text_from_pdf.py", line 412, in process_pdf
    pdf_dict = xml_to_dict(paper_id, xml)
  File "./extract_text_from_pdf.py", line 360, in xml_to_dict
    body = [{'text': get_all_text_as_one(root, xml_path, sep=cfg.MERGE_SEPARATOR)}]
  File "./extract_text_from_pdf.py", line 119, in get_all_text_as_one
    text_list = get_all_text_as_list(root, element_name_or_path)
  File "./extract_text_from_pdf.py", line 102, in get_all_text_as_list
    text_list = [t for t in list(map(str.strip, text_list)) if t]
TypeError: descriptor 'strip' requires a 'str' object but received a 'NoneType'
To figure out what causes the error, we need to check the XML in /home/issa/ISSA-2/data/hal/dataset-2-0/20231107/xml.
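The traceback shows that `map(str.strip, text_list)` received a `None` entry. One plausible cause, not yet confirmed against the XML, is an element with no text content, since `Element.text` is `None` for an empty element. The sketch below reproduces that situation and shows a defensive variant of the filtering step. The sample XML and the helper name `strip_non_empty` are hypothetical, and it assumes `text_list` is built from `.text` attributes, which the traceback does not show.

```python
import xml.etree.ElementTree as ET


def strip_non_empty(text_list):
    """Strip each entry, dropping None values and empty strings.

    Hypothetical replacement for the list comprehension on line 102.
    """
    stripped = (t.strip() for t in text_list if t is not None)
    return [s for s in stripped if s]


# Hypothetical sample: the empty <p/> element has .text == None.
root = ET.fromstring("<body><p> first </p><p/><p>second</p></body>")
texts = [p.text for p in root.iter("p")]

# The original expression fails as soon as one entry is None.
try:
    [t for t in list(map(str.strip, texts)) if t]
    original_failed = False
except TypeError:
    original_failed = True

cleaned = strip_non_empty(texts)
print(original_failed, cleaned)
```

If the XML for this file does contain empty elements, a filter like this would let processing continue. If it does not, the `None` comes from somewhere else and the XML still needs to be inspected.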