From 4a8111ee698a17038ffbc3f43ceef128f027de81 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Sun, 8 Mar 2020 17:53:37 -0600
Subject: [PATCH] api ref: change XML example in open to use SAX
 per https://github.com/iterative/dvc.org/pull/908#discussion_r388043786

---
 public/static/docs/api-reference/open.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/public/static/docs/api-reference/open.md b/public/static/docs/api-reference/open.md
index 99602624c7..3a90062e40 100644
--- a/public/static/docs/api-reference/open.md
+++ b/public/static/docs/api-reference/open.md
@@ -96,21 +96,28 @@ using this API. For example, an XML file tracked in a public DVC repo on Github
 can be processed directly in your Python app with:
 
 ```py
-from xml.dom.minidom import parse
+from xml.sax import parse
 import dvc.api
+from mymodule import mySAXHandler
 
 with dvc.api.open(
     'get-started/data.xml',
     repo='https://github.com/iterative/dataset-registry'
 ) as fd:
-    xmldom = parse(fd)
-    # ... Process DOM
+    parse(fd, mySAXHandler)
 ```
 
-> Notice that if you just need to load the complete file contents to memory, you
-> can use `dvc.api.read()` instead:
+Notice that we want to use a [SAX](http://www.saxproject.org/) XML parser here
+because `dvc.api.open()` is able to stream the file, the `mySAXHandler` object
+must handle the event-driven parsing of the document in this case.
+
+> If you just need to load the complete file contents to memory, you can use
+> `dvc.api.read()` instead:
 >
 > ```py
+> from xml.dom.minidom import parse
+> import dvc.api
+>
 > xmldata = dvc.api.read('get-started/data.xml',
 >                        repo='https://github.com/iterative/dataset-registry')
 > xmldom = parse(xmldata)
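
Note on `mySAXHandler`: the patch imports it from a placeholder module and does not show it, so here is a minimal sketch of what such an object might look like. `TagCounter`, `count_tags`, and the sample XML are hypothetical names made up for illustration, and an in-memory stream stands in for the file object that `dvc.api.open()` would provide. The sketch assumes the handler passed to `xml.sax.parse()` is a `ContentHandler` instance rather than a class.

```python
import io
from xml.sax import parse
from xml.sax.handler import ContentHandler


class TagCounter(ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Invoked once per opening tag while the parser reads the stream,
        # so the whole document never has to be held in memory at once.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(fd):
    """Parse a file-like object with SAX and return per-tag counts."""
    handler = TagCounter()  # an instance, not the class itself
    parse(fd, handler)
    return handler.counts


# Stand-in for the stream opened by dvc.api.open() in the patched example.
sample = io.BytesIO(b"<posts><row Id='1'/><row Id='2'/></posts>")
tag_counts = count_tags(sample)
```

In the documented example, `count_tags(fd)` would presumably be called inside the `with dvc.api.open(...) as fd:` block in place of `parse(fd, mySAXHandler)`.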