Improve performance #394

mgirlich · 2023-06-01T11:22:56Z

I use the paws package to work with S3, e.g. to list objects in a bucket. Because this took quite a lot of time, I did some profiling and noticed that most of the time is spent parsing the XML response (it uses/used as_list()). I created a PR (paws-r/paws#621) that improves the performance quite a bit, but it is still really slow (about 90% of the time is spent parsing).
To improve the performance further without using/abusing XPath even more, it is probably easier to improve the performance of xml2 in general.

D3SL · 2023-11-08T09:45:00Z

I work with XML files constantly and ran into this exact issue earlier this year as well. xml2 takes roughly a minute to extract data from a ~350 KB to 1.5 MB XML file into a data frame. For comparison, I can process 600 files in the same amount of time by reading the file as a single-column table with fread(), reformatting each row with stringr, flattening the table to a JSON string, converting it to JSON and then back to a table, and then going through a series of unnest_wider and unnest_longer operations to populate parent data to child nodes.

mgirlich mentioned this issue Jun 1, 2023

Replace structure() by class() #393

Merged

hadley added the upkeep maintenance, infrastructure, and similar label Oct 30, 2023

hadley removed the upkeep maintenance, infrastructure, and similar label Oct 7, 2024
