namespace support #7

Open
gedaiu opened this issue May 4, 2018 · 10 comments
Labels
enhancement New feature or request

Comments

@gedaiu

gedaiu commented May 4, 2018

Would you consider adding namespace support for XML tags in the future?

@jmdavis
Owner

jmdavis commented May 4, 2018

What would you consider to be necessary for that? I'm not very familiar with XML namespacing other than the fact that the names then have a namespace in them which then tells the application something about what the tag is for. As it stands, the name is provided as-is, so it should be trivial for anyone to get the namespace by calling split or splitter on it. Is there something beyond that that's really needed for whatever folks normally do with namespaces?

@jmdavis jmdavis added the enhancement New feature or request label May 4, 2018
@gedaiu
Author

gedaiu commented May 4, 2018

I would like to replace my dummy XML implementation with yours in this library: https://github.com/gedaiu/vibe.dav

The problem is that the DAV protocol uses a lot of namespaces.

@jmdavis
Owner

jmdavis commented May 4, 2018

Okay. But what does "namespace support" mean to you? As I understand it, the namespace is just part of the name where the part before a colon is the namespace - e.g. <foo:bar> is in the namespace "foo". Protocols or specifications may then treat that namespace as meaning something (e.g. to differentiate between <foo:bar> and <other:bar>), but from what I can tell, from the standpoint of parsing, there really isn't anything special about them. They're just names with colons in them. dxml provides the name of the start and end tags to the program using it, so the namespace is there in the name and can be trivially pulled out of the full tag name using std.array.split or std.algorithm.splitter.
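To make the "just split on the colon" idea concrete, here is a minimal sketch using only Phobos. `splitQName` is a hypothetical helper name and is not part of dxml; it only separates the prefix from the local name and knows nothing about the URI the prefix is bound to.

```d
import std.algorithm.searching : findSplit;
import std.typecons : Tuple;

/// Hypothetical helper (not part of dxml): splits a tag name such as
/// "foo:bar" into its prefix ("foo") and local name ("bar").
Tuple!(string, "prefix", string, "local") splitQName(string name)
{
    auto parts = name.findSplit(":");
    // parts[1] is the separator; it is empty when the name has no colon.
    if (parts[1].length == 0)
        return typeof(return)("", name);
    return typeof(return)(parts[0], parts[2]);
}

void main()
{
    auto qualified = splitQName("foo:bar");
    assert(qualified.prefix == "foo" && qualified.local == "bar");

    auto plain = splitQName("bar");
    assert(plain.prefix == "" && plain.local == "bar");
}
```

With a dxml range, the argument would presumably be `range.front.name` on a start or end tag.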

Is the problem that you want an easier or more idiomatic way to pull out the namespace where you do something like range.front.name.namespace rather than calling split yourself? Or are you talking about adding some sort of validation related to namespaces? Or something else?

@gedaiu
Author

gedaiu commented May 4, 2018

You could use a split on the tag name, but I don't think that's enough, because you could have these two XML documents, which are equivalent:

<a:table xmlns:a="http://somedefinition.com">
  <a:name>some name</a:name>
</a:table>
<b:table xmlns:b="http://somedefinition.com">
  <b:name>some name</b:name>
</b:table>

And every DAV client has its own way of choosing the prefix name. It could definitely be handled by the client library, but it would be nice if something like this were possible with your library:

assert(range.front.namespace == "http://somedefinition.com") /// instead of `a` or `b`

@jmdavis
Owner

jmdavis commented May 4, 2018

So, when you say that you want the namespace, you don't mean the name of the namespace that goes in a tag name, you mean the URL associated with the namespace, because that uniquely identifies the namespace, whereas its name doesn't? That's definitely harder. It would be easy enough to provide a function for splitting the tag name into the namespace name and the local tag name, but the only place where the URL would be is in the start tag with the xmlns attribute. For the parser to provide that information, it would have to store it, which would mean allocating storage for it somewhere, which doesn't really make sense in the default case, especially since the parser is designed to allocate as little as possible. So, I'll have to think about a reasonable way to solve this.

My first thought is to provide a wrapper range that examines each start tag in popFront to see if it's a namespace declaration and adds it to the list of namespaces that it knows about, and it can then use that to provide the information. But I'll definitely have to study the XML namespace spec and think about this.
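As a rough illustration of the bookkeeping such a wrapper would need, the sketch below keeps one prefix-to-URI map per open element and resolves a prefix against the innermost binding. `NamespaceScopes`, `push`, `pop`, and `resolve` are hypothetical names, not dxml API, and the attributes are passed in as a plain associative array rather than read from a dxml range, so this is only a sketch of the idea under those assumptions.

```d
import std.algorithm.searching : startsWith;

/// Hypothetical sketch (not part of dxml): one prefix -> URI map per open element.
struct NamespaceScopes
{
    private string[string][] scopes;

    /// Call for each start tag, passing its attributes as name -> value.
    void push(string[string] attributes)
    {
        string[string] bindings;
        foreach (name, value; attributes)
        {
            if (name == "xmlns")
                bindings[""] = value; // default namespace declaration
            else if (name.startsWith("xmlns:"))
                bindings[name["xmlns:".length .. $]] = value;
        }
        scopes ~= bindings;
    }

    /// Call for each end tag; drops the bindings declared on that element.
    void pop()
    {
        scopes = scopes[0 .. $ - 1];
    }

    /// Returns the URI bound to `prefix` in the innermost scope, or null if unbound.
    string resolve(string prefix)
    {
        foreach_reverse (bindings; scopes)
        {
            if (auto uri = prefix in bindings)
                return *uri;
        }
        return null;
    }
}

void main()
{
    NamespaceScopes ns;
    ns.push(["xmlns:a": "http://somedefinition.com"]); // <a:table xmlns:a="...">
    ns.push(null);                                      // <a:name>
    assert(ns.resolve("a") == "http://somedefinition.com");
    assert(ns.resolve("b") is null);
    ns.pop();                                           // </a:name>
    ns.pop();                                           // </a:table>
    assert(ns.resolve("a") is null);
}
```

The storage here is exactly the allocation the plain parser avoids, which is one reason to keep it in an opt-in wrapper.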

@gedaiu
Author

gedaiu commented May 4, 2018

It's exactly what I was thinking... I think this might be the best approach. I'll watch the project for this feature :)

@bubnenkoff

I need to parse a lot of documents; some tags in them may have namespaces and some may not. How can I iterate through them without specifying the namespace? E.g.:

auto r2 = result.skipToPath("fcsProtocolEF3");

instead of:

auto r2 = result.skipToPath("ns2:fcsProtocolEF3");

@jmdavis
Owner

jmdavis commented Mar 12, 2019

I need to parse a lot of documents; some tags in them may have namespaces and some may not. How can I iterate through them without specifying the namespace?

dxml doesn't currently understand anything about namespaces. "fcsProtocolEF3" and "ns2:fcsProtocolEF3" are different names, because the entire string is the name. A function like skipToPath requires that the name be an exact match. If you're looking for a partial match, then you'll have to do something like

auto r2 = result.find!(a => a.type == EntityType.elementStart &&
                       (a.name == "fcsProtocolEF3" || a.name.endsWith(":fcsProtocolEF3")))();

though that's going to be reading through all of the entities linearly and would give you any tag anywhere in the document after the current entity which had a matching name, regardless of its depth or relation to the current entity. So, it's not really equivalent to skipToPath. To do what skipToPath does, you would have to navigate to each start tag and check it, using skipContents to skip any child tags of the start tag. So, something like

// assuming that you're on a start tag and that SplitEmpty.yes is used
// (endsWith is from std.algorithm.searching; the rest is from dxml.parser)
while(true)
{
    if(range.front.name == "fcsProtocolEF3" || range.front.name.endsWith(":fcsProtocolEF3"))
        break; // found the tag
    range = range.skipContents(); // skips to the corresponding end tag
    range.popFront(); // skips the corresponding end tag
    switch(range.front.type)
    {
        case EntityType.elementStart: continue; // next sibling; check it
        case EntityType.elementEnd: break; // we've gone up a level
        default:
        {
            range = range.skipToEntityType(EntityType.elementStart, EntityType.elementEnd);
            if(range.front.type == EntityType.elementEnd)
                break; // we've gone up a level
            continue;
        }
    }
    /+ do whatever you do when the tag isn't there +/
    break; // the breaks above only leave the switch, so exit the loop here
}

Alternatively, if you don't care about the memory consumption, you could call parseDOM on the parent tag and get the DOM tree for that section of the tree and then just check each of its direct children.

But really, as things stand, dxml doesn't really have any good helper functions for searching for tags based on their names unless you're looking for the exact name.
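Until such helpers exist, the name test used in the snippets above could be factored into a small predicate. This is a minimal sketch; `hasLocalName` is a hypothetical name, not a dxml function. It ignores the prefix entirely, so it treats same-named tags from different namespaces as equal, and unlike the plain `endsWith(":...")` check it rejects a name with an empty prefix.

```d
import std.algorithm.searching : endsWith;

/// Hypothetical helper (not part of dxml): true if `name` equals `local`
/// or has the form "<non-empty prefix>:<local>". Does not allocate.
bool hasLocalName(string name, string local)
{
    return name == local
        || (name.length > local.length + 1
            && name.endsWith(local)
            && name[$ - local.length - 1] == ':');
}

void main()
{
    assert(hasLocalName("fcsProtocolEF3", "fcsProtocolEF3"));
    assert(hasLocalName("ns2:fcsProtocolEF3", "fcsProtocolEF3"));
    assert(!hasLocalName("xfcsProtocolEF3", "fcsProtocolEF3"));
    assert(!hasLocalName(":fcsProtocolEF3", "fcsProtocolEF3"));
}
```

In the loop above, the condition would then presumably read `if(hasLocalName(range.front.name, "fcsProtocolEF3"))`.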

@bubnenkoff

@jmdavis could you add ns support in future?

@jmdavis
Owner

jmdavis commented Mar 12, 2019

I intend to add something, but I don't know exactly what it will look like yet. It will probably involve a helper wrapper around the existing functionality, but I have to find time to sit down and work out what's really needed.


3 participants