I use the paws package to work with S3, e.g. to list objects in a bucket. Because this took quite a lot of time, I did some profiling and noticed that most of the time is spent parsing the XML response (it uses/used as_list()). I created a PR (paws-r/paws#621) that improves the performance quite a bit, but it is still really slow (around 90% of the time is spent in parsing).
Rather than leaning on (or abusing) XPath even more to speed this up, it is probably easier to improve the performance of xml2 in general.
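A minimal sketch of the kind of comparison described above, for anyone who wants to reproduce the gap locally. It assumes a synthetic, namespace-free document that only resembles a list-objects response; the element names, the size `n`, and the helper names `keys_via_list()` and `keys_via_xpath()` are made up for illustration and are not taken from paws. Real S3 responses carry a default namespace, so the XPath would need a prefix or a prior `xml_ns_strip()` call.

```r
library(xml2)

# Synthetic stand-in for a list-objects style response (not real S3 output).
n <- 2000
contents <- sprintf(
  "<Contents><Key>file-%d.txt</Key><Size>%d</Size></Contents>",
  seq_len(n), seq_len(n)
)
doc <- read_xml(paste0(
  "<ListBucketResult>", paste(contents, collapse = ""), "</ListBucketResult>"
))

# Approach 1: convert the whole tree to nested R lists, then pick out a field.
# Assumes as_list() on a document wraps the result in the root element name.
keys_via_list <- function(doc) {
  items <- as_list(doc)$ListBucketResult
  unname(vapply(items, function(x) x$Key[[1]], character(1)))
}

# Approach 2: let XPath select only the nodes that are needed.
keys_via_xpath <- function(doc) {
  xml_text(xml_find_all(doc, "/ListBucketResult/Contents/Key"))
}

# Both approaches should return the same keys; only the timing differs.
print(system.time(k1 <- keys_via_list(doc)))
print(system.time(k2 <- keys_via_xpath(doc)))
```

The absolute timings depend on the machine and the xml2 version, so treat the numbers as a rough indication rather than a benchmark.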
I work with XML files constantly and ran into this exact issue earlier this year as well. xml2 takes roughly a minute to extract data from a ~350 KB to 1.5 MB XML file into a data frame. For comparison, I can process 600 files in the same amount of time by reading the file as a single-column table with fread(), reformatting each row with stringr, flattening the table to a JSON string, parsing that JSON back into a table, and then going through a series of unnest_wider and unnest_longer operations to populate parent data to child nodes.