Preliminary XML support #224

retrography · 2016-02-20T21:43:58Z

This commit adds preliminary support for XML dataset import and export (no databook yet). The code uses only Python's standard library and works on both Python 2 and 3. It supports reading XML datasets with data saved as elements or as attributes. I am sure the code has a lot of room for improvement, but I would prefer to get some early feedback before finishing up.

kennethreitz · 2016-02-20T23:13:44Z

This library has had a joke in the documentation that "xml will never be supported". :)

I enjoy this joke, and it would entertain me to see it remain true. But, it can easily be removed instead.

kennethreitz · 2016-02-20T23:14:47Z

I haven't executed the code, but the approach looks relatively sensible. Do you think having XML in/out will be useful, even though XML data comes in so many different shapes?

retrography · 2016-02-20T23:56:00Z

Haha, I am aware of the joke. We will try to find another format to denigrate once this is done (let's say RDF)!

Actually, most datasets provided in XML format come in two pretty simple flavours: 1) one record per line, fields as attributes (like the Stack Exchange data dump), and 2) records as elements, fields as sub-elements. In my experience many XML data dumps are either already in one of these formats or can be reduced to one of them with a simple XPath expression.
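A rough sketch of the two flavours, using only `xml.etree.ElementTree` from the standard library. This is not the code in this PR; the sample documents and the helper names `read_records` and `to_table` are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Flavour 1: one record per element, fields stored as attributes.
ATTR_XML = """<posts>
  <row Id="1" Score="5" Title="Hello"/>
  <row Id="2" Score="3" Title="World"/>
</posts>"""

# Flavour 2: records as elements, fields as sub-elements.
ELEM_XML = """<people>
  <person><name>Ada</name><age>36</age></person>
  <person><name>Alan</name><age>41</age></person>
</people>"""


def read_records(xml_text):
    """Hypothetical helper: return one dict per child of the root element."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root:
        if record.attrib:
            # Flavour 1: the attributes are the fields.
            records.append(dict(record.attrib))
        else:
            # Flavour 2: each sub-element is a field; empty text becomes "".
            records.append({child.tag: (child.text or "") for child in record})
    return records


def to_table(records):
    """Hypothetical helper: build headers and rows, tolerating missing fields."""
    headers = []
    for rec in records:
        for key in rec:
            if key not in headers:
                headers.append(key)
    rows = [[rec.get(h, "") for h in headers] for rec in records]
    return headers, rows
```

Calling `to_table(read_records(ELEM_XML))` yields a header list plus one row per record, which is the shape a tabular dataset needs. Records with differing field sets are padded with empty strings rather than rejected.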

My objective, if I have time, is to support these two formats first. Then we can add XPath support in a later version. That is exactly what Google Sheets does, and I have found it largely sufficient for most data imports.
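To illustrate what reducing a dump with a path expression could look like, here is a minimal sketch that relies on the limited XPath subset supported by `ElementTree.findall`. It is an assumption about how such a feature might work, not something this PR implements; the sample document and the `select_records` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A dump where the records are nested below some wrapper elements.
NESTED_XML = """<export>
  <meta><source>example</source></meta>
  <data>
    <items>
      <item sku="A1" qty="2"/>
      <item sku="B7" qty="5"/>
    </items>
  </data>
</export>"""


def select_records(xml_text, path):
    """Hypothetical helper: return the elements matched by an ElementTree path."""
    root = ET.fromstring(xml_text)
    # findall understands a subset of XPath, e.g. "./a/b" or ".//b".
    return root.findall(path)


# Once the record elements are selected, they are in the
# "fields as attributes" flavour and can be read directly.
records = select_records(NESTED_XML, "./data/items/item")
rows = [dict(r.attrib) for r in records]
```

Full XPath would need a third-party library; the standard library only covers simple paths like the ones above.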

For now the reader code is a bit buggy, but it reads 70-80 percent of the files I have tried in the two formats I just mentioned. The XML writer should be more robust, since I spent some time on it this morning.
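For the export direction, a minimal sketch of writing headers and rows out as records with sub-elements might look like the following. It is not the writer from this PR; the `write_records` helper and the `dataset` and `row` tag names are assumptions, and it assumes every header is already a valid XML element name.

```python
import xml.etree.ElementTree as ET


def write_records(headers, rows, root_tag="dataset", record_tag="row"):
    """Hypothetical helper: serialise rows as elements with one sub-element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, record_tag)
        for header, value in zip(headers, row):
            # ElementTree escapes special characters in text on output.
            field = ET.SubElement(record, header)
            field.text = str(value)
    # encoding="unicode" returns a str instead of bytes (Python 3).
    return ET.tostring(root, encoding="unicode")


xml_out = write_records(["name", "age"], [["Ada", 36], ["Alan", 41]])
```

The resulting string can be parsed back with `ET.fromstring`, which gives a quick round-trip check when testing a reader and writer together.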

I sincerely hate dealing with XML files, and that is why I am writing this: I just want to be able to turn them into other less finicky formats with as little hassle as possible. I am more of a data analyst than a programmer, and I think such a tool can be very useful for people like me.

Try it with some data from Stack Exchange. It doesn't work with every dataset yet, but the outcome is pretty cool.

kennethreitz · 2016-02-21T00:39:50Z

We support RDF too! Don't worry, we'll think of something ;)

This is great work – I'm excited about it. If there's anything I can do to support the process, please let me know!

retrography · 2016-02-21T00:44:33Z

Just let me know if there are conventions in the existing code that I should follow. I am pretty excited too: finally I will have one data interchange package for all my needs (or most of them at least...)!

Preliminary XML support

a0e7591

Fixed crash on variable field sets

289235c

DeepSpace2 mentioned this pull request Apr 8, 2020

WIP: preliminary support for XML, 2020 #464

Draft
