When loading a fairly large XML file (~500 MB), calling `print()` on the document takes a long time and cannot be interrupted. However, printing the child nodes individually is fast.
I believe the `print()` call in the reprex below eventually reaches `show_nodes()`, which calls `as.character()` (xml2/R/xml_nodeset.R, line 73 in ab73051). That call takes a long time and blocks the interpreter.
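A smaller, download-free sketch of what I suspect is happening. The document is synthetic, and the names `small`, `big`, `item`, and `n_items` are made up for illustration. It assumes that `as.character()` on a node serializes the node together with all of its descendants, so summarizing one large child costs time proportional to its whole subtree, even though only a single truncated line ends up on screen.

```r
library(xml2)

# Synthetic document: one tiny child and one child with many descendants.
n_items <- 200000
xml_src <- paste0(
  "<root><small/><big>",
  strrep('<item a="1">x</item>', n_items),
  "</big></root>"
)
doc <- read_xml(xml_src)
kids <- xml_children(doc)

# Serializing the small child is cheap...
t_small <- system.time(s_small <- as.character(kids[[1]]))
# ...but serializing the big child renders every <item>, even though a
# printed summary would only keep the first few dozen characters.
t_big <- system.time(s_big <- as.character(kids[[2]]))

t_small[["elapsed"]]
t_big[["elapsed"]]
nchar(s_big)  # at least 20 characters per <item>
```

On the real file, the analogous large children are presumably `<cell-line-list>` and `<publication-list>`.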
```r
library(xml2)

# Download 490 MB:
if (!file.exists("cellosaurus.xml"))
  download.file("https://ftp.expasy.org/databases/cellosaurus/cellosaurus.xml", "cellosaurus.xml")

# Read XML:
cellosaurus_xml <- xml2::read_xml("cellosaurus.xml")

# My print (a fast version, closer to what I would expect)
cat(format(cellosaurus_xml))
#> <Cellosaurus>
children <- xml2:::xml_children(cellosaurus_xml)
for (child in children) {
  cat(format(child), "\n")
  xml2:::show_nodes(xml2:::xml_children(child))
}
#> <header>
#> [1] <terminology-name>Cellosaurus</terminology-name>
#> [2] <description>Cellosaurus: a controlled vocabulary of cell lines</descript ...
#> [3] <release version="48.0" updated="2024-01-30" nb-cell-lines="152231" nb-pu ...
#> [4] <terminology-list>\n <terminology name="NCBI-Taxonomy" source="National ...
#> <cell-line-list>
#> [1] <cell-line category="Hybridoma" created="2021-09-23" last-updated="2024- ...
#> [2] <cell-line category="Hybridoma" created="2021-09-23" last-updated="2024- ...
#> [3] <cell-line category="Transformed cell line" created="2012-10-22" last-up ...
#> [4] <cell-line category="Hybridoma" created="2017-08-22" last-updated="2023- ...
#> [5] <cell-line category="Cancer cell line" created="2017-05-15" last-updated ...
#> [6] <cell-line category="Hybridoma" created="2012-06-06" last-updated="2023- ...
#> [7] <cell-line category="Hybridoma" created="2014-07-17" last-updated="2023- ...
#> [8] <cell-line category="Hybridoma" created="2022-12-15" last-updated="2023- ...
#> [9] <cell-line category="Transformed cell line" created="2012-10-22" last-up ...
#> [10] <cell-line category="Hybridoma" created="2013-02-11" last-updated="2023- ...
#> [11] <cell-line category="Cancer cell line" created="2018-05-14" last-updated ...
#> [12] <cell-line category="Finite cell line" created="2012-04-04" last-updated ...
#> [13] <cell-line category="Finite cell line" created="2012-04-04" last-updated ...
#> [14] <cell-line category="Finite cell line" created="2013-11-05" last-updated ...
#> [15] <cell-line category="Finite cell line" created="2012-04-04" last-updated ...
#> [16] <cell-line category="Cancer cell line" created="2012-04-04" last-updated ...
#> [17] <cell-line category="Cancer cell line" created="2012-04-04" last-updated ...
#> [18] <cell-line category="Spontaneously immortalized cell line" created="2019 ...
#> [19] <cell-line category="Transformed cell line" created="2021-12-16" last-up ...
#> [20] <cell-line category="Cancer cell line" created="2024-01-30" last-updated ...
#> ...
#> <publication-list>
#> [1] <publication date="2005" type="article" journal-name="AAPS J." volume="7 ...
#> [2] <publication date="2011" type="article" journal-name="AAPS J." volume="1 ...
#> [3] <publication date="2011" type="article" journal-name="AAPS J." volume="1 ...
#> [4] <publication date="2016" type="article" journal-name="AAPS J." volume="1 ...
#> [5] <publication date="2000" type="article" journal-name="AAPS PharmSci" vol ...
#> [6] <publication date="2004" type="article" journal-name="AAPS PharmSci" vol ...
#> [7] <publication date="2008" type="article" journal-name="ACS Chem. Biol." v ...
#> [8] <publication date="2014" type="article" journal-name="ACS Chem. Biol." v ...
#> [9] <publication date="2018" type="article" journal-name="ACS Infect. Dis." ...
#> [10] <publication date="2023" type="article" journal-name="ACS Materials Au" ...
#> [11] <publication date="2022" type="article" journal-name="ACS Omega" volume= ...
#> [12] <publication date="2017" type="article" journal-name="ACS Synth. Biol." ...
#> [13] <publication date="2001" type="article" journal-name="Acta Astronaut." v ...
#> [14] <publication date="2013" type="article" journal-name="Acta Astronaut." v ...
#> [15] <publication date="2005" type="article" journal-name="Acta Biochim. Biop ...
#> [16] <publication date="2004" type="article" journal-name="Acta Biochim. Pol. ...
#> [17] <publication date="1988" type="article" journal-name="Acta Biol. Hung." ...
#> [18] <publication date="2015" type="article" journal-name="Acta Biol. Hung." ...
#> [19] <publication date="2016" type="article" journal-name="Acta Crystallogr. ...
#> [20] <publication date="2001" type="article" journal-name="Acta Cytol." volum ...
#> ...
#> <copyright>

# This is extremely slow, and non-interruptible:
# print(cellosaurus_xml)
```
Created on 2024-03-12 with reprex v2.1.0
Is this expected, or should `print()` scale better with large XML files?
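One possible direction, sketched under assumptions: build each child's summary from `xml_name()` and `xml_attrs()` instead of calling `as.character()`, so the cost of a line no longer depends on how many descendants the child has. `node_start_tag()` and `print_children_cheaply()` are hypothetical helpers, not part of xml2, and their output differs from the current `print()`: they show only start tags, never element contents.

```r
library(xml2)

# Hypothetical helper (not part of xml2): rebuild a node's start tag from its
# name and attributes only. Attribute values are not re-escaped (display only).
node_start_tag <- function(node) {
  attrs <- xml_attrs(node)
  attr_str <- ""
  if (length(attrs) > 0) {
    attr_str <- paste0(" ", paste0(names(attrs), '="', attrs, '"', collapse = " "))
  }
  paste0("<", xml_name(node), attr_str, ">")
}

# Hypothetical cheap printer: one start tag per child, at most max_n lines,
# each truncated to the console width.
print_children_cheaply <- function(x, max_n = 20, width = getOption("width")) {
  kids <- xml_children(x)
  n <- length(kids)
  if (n == 0) return(invisible(character(0)))
  shown <- seq_len(min(n, max_n))
  tags <- vapply(shown, function(i) node_start_tag(kids[[i]]), character(1))
  desc <- paste0("[", shown, "] ", tags)
  long <- nchar(desc) > width
  desc[long] <- paste(substr(desc[long], 1, width - 3), "...")
  cat(desc, sep = "\n")
  if (n > max_n) cat("...\n")
  invisible(desc)
}

# Usage on a tiny document:
doc <- read_xml('<root><a id="1"><b/><b/></a><c/></root>')
desc <- print_children_cheaply(doc)
```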