Using the following code returns "Error while parsing the file: Failed to parse the file:" (the full message is in the comment on the last line):
import { LlamaParseReader } from "llamaindex";

// Assumes `storage` is an already initialized Google Cloud Storage client.
const file = storage.bucket(process.env.GCP_BUCKET).file(fileId);
const [buffer] = await file.download();

// Log the content type that the bucket reports for the file.
const [metadata] = await file.getMetadata();
console.log(metadata.contentType);

const reader = new LlamaParseReader({
  resultType: "markdown",
  skipDiagonalText: true,
  verbose: true,
});

const uint8Array = new Uint8Array(buffer);
// -> Error while parsing the file: Failed to parse the file: c8e8b079-f3aa-4786-bfe3-e9b3981812cf, status: ERROR
const documents = await reader.loadDataAsContent(uint8Array);
Additional context:
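Digging into the SDK, I set a breakpoint in the LlamaParseReader function loadDataAsContent, which led to createJob. In the screenshot, the MIME type applied to the PPTX file is application/vnd.oasis.opendocument.spreadsheet, which does not look right: that is the type for OpenDocument spreadsheets (.ods), whereas a PPTX file is normally application/vnd.openxmlformats-officedocument.presentationml.presentation.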
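To make the mismatch easier to spot without a debugger, here is a minimal diagnostic sketch. It is not part of the LlamaIndex SDK and does not explain the root cause; the names `EXPECTED_MIME`, `checkMimeType` and `isZipContainer` are hypothetical helpers made up for illustration. It compares the MIME type you would expect from the file extension against whatever type was actually observed (for example the value seen in createJob or the contentType logged from the bucket metadata), and checks for the ZIP signature that DOCX, PPTX, XLSX and ODS files all share.

```typescript
// Hypothetical lookup table: standard MIME types for the formats in this report.
const EXPECTED_MIME: Record<string, string> = {
  pdf: "application/pdf",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  pptx: "application/vnd.openxmlformats-officedocument.presentationml.presentation",
  xlsx: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  ods: "application/vnd.oasis.opendocument.spreadsheet",
};

interface MimeCheck {
  expected: string | undefined; // undefined when the extension is not in the table
  observed: string;
  matches: boolean;
}

// Compare the MIME type implied by the file extension with the observed one.
function checkMimeType(fileName: string, observed: string): MimeCheck {
  const ext = fileName.split(".").pop()?.toLowerCase() ?? "";
  const expected = EXPECTED_MIME[ext];
  return { expected, observed, matches: expected === observed };
}

// DOCX, PPTX, XLSX and ODS are all ZIP archives, so they start with the same
// local-file-header signature "PK\x03\x04". The first bytes alone cannot tell
// them apart; a detector has to look inside the archive.
function isZipContainer(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 &&
    bytes[1] === 0x4b &&
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}

// Example with the values from this report.
const result = checkMimeType(
  "data/presentation.pptx",
  "application/vnd.oasis.opendocument.spreadsheet",
);
console.log(result.matches); // false
```

In the reproduction above, `isZipContainer(uint8Array)` could be called on the downloaded buffer as a sanity check that the download itself is intact before the file is handed to the reader.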
Files:
wget "https://meetings.wmo.int/Cg-19/PublishingImages/SitePages/FINAC-43/7%20-%20EC-77-Doc%205%20Financial%20Statements%20for%202022%20(FINAC).pptx" -O data/presentation.ppt
PPTX source file from the LlamaParse GitHub repo.
ajpanyteam changed the title (Oct 9, 2024) to: LlamaParse error parsing MSFT DOCX and PPTX files - suspect LlamaParseReader is applying wrong MIME Type
For certain files, this example code does not fail with an error but instead produces a garbage parse, based on data that is not at all related to the original file. See the example repo for a reproduction.
@himself65 I just validated a DOCX upload via the website. So, assuming both use the same API, it seems unlikely that this is on the LlamaParse side.
Describe the bug:
LlamaParse fails for PPTX and DOCX files but works for PDF and XLSX files.
Using the LlamaIndex TS SDK v0.6.17.
I suspect this is related to #1007