Deserializing Vec fails if there's something in between #55

Closed
mloc opened this issue Oct 31, 2017 · 31 comments · Fixed by #143

Comments

@mloc

mloc commented Oct 31, 2017

Using this code...

#[macro_use] extern crate serde_derive;
extern crate serde_xml_rs;

use std::io::stdin;

#[derive(Debug, Deserialize)]
struct Root {
    foo: Vec<String>,
    bar: Vec<String>
}

fn main() {
    let res: Root = match serde_xml_rs::deserialize(stdin()) {
        Ok(r) => r,
        Err(e) => panic!("{:?}", e),
    };

    println!("{:?}", res);
}

...to deserialize this file...

<root>
    <foo>abc</foo>
    <foo>def</foo>

    <bar>lmn</bar>
    <bar>opq</bar>

    <foo>ghi</foo>
</root>

...gives this error...

thread 'main' panicked at 'duplicate field `foo`', src/bin/bug.rs:15:18

This doesn't happen if the elements in the root are contiguous, like...

<root>
    <foo>abc</foo>
    <foo>def</foo>
    <foo>ghi</foo>

    <bar>lmn</bar>
    <bar>opq</bar>
</root>

Expected output in both cases is:

Root { foo: ["abc", "def", "ghi"], bar: ["lmn", "opq"] }

Am I doing something wrong here? Is there any way around this? The data I'm working with is formatted like the first example.

@oli-obk
Contributor

oli-obk commented Oct 31, 2017

This is really hard to do and if implemented, completely messes up how serde operates and requires heap allocations.

Additionally to considering solutions on the serde-end, please consider talking to the provider of your data whether they can produce a format that doesn't require a lot of state during deserialization.

@mloc
Author

mloc commented Nov 1, 2017

Changing the output of the original program probably isn't going to happen; it's old, and the order represents something from the underlying data.

For now I'm using a separate processing pass to group children together; I guess that's the best option. Thanks!
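
A grouping pass like the one mentioned above could look roughly like the following. This is only a sketch under assumptions: `group_children` is a hypothetical helper, and the child elements are assumed to have already been extracted as (tag, text) pairs by whatever XML reader is in use. It is not part of serde-xml-rs.

```rust
// Hypothetical helper (not part of serde-xml-rs): reorder sibling elements so
// that elements with the same tag name become contiguous. Tags keep the order
// in which they first appear, and elements keep their order within a tag.
fn group_children(children: Vec<(String, String)>) -> Vec<(String, String)> {
    // Record the order in which each tag name first appears.
    let mut tag_order: Vec<String> = Vec::new();
    for (tag, _) in &children {
        if !tag_order.contains(tag) {
            tag_order.push(tag.clone());
        }
    }

    let mut grouped = children;
    // `sort_by_key` is a stable sort, so elements sharing a tag keep their
    // relative order from the input.
    grouped.sort_by_key(|(tag, _)| tag_order.iter().position(|t| t == tag));
    grouped
}

fn main() {
    // Assumed input: the children of <root> from the failing document.
    let children: Vec<(String, String)> = vec![
        ("foo", "abc"),
        ("foo", "def"),
        ("bar", "lmn"),
        ("bar", "opq"),
        ("foo", "ghi"),
    ]
    .into_iter()
    .map(|(tag, text)| (tag.to_string(), text.to_string()))
    .collect();

    // Re-emit the children in grouped order.
    for (tag, text) in group_children(children) {
        println!("<{}>{}</{}>", tag, text, tag);
    }
}
```

The regrouped document can then be fed to the deserializer, at the cost of losing the original interleaving order.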

@Binero

Binero commented Dec 6, 2017

I opened an issue for this on serde's end, but they seem to point to this crate to handle it.

@cetra3

cetra3 commented Feb 27, 2018

You can use enums for this:

#[macro_use] extern crate serde_derive;
extern crate serde_xml_rs;

#[derive(Debug, Deserialize)]
struct Root {
    #[serde(rename = "$value")]
    items: Vec<Elems>,
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum Elems {
    Foo(String),
    Bar(String)
}


fn main() {

    let input: &[u8] = b"<root>
        <foo>abc</foo>
        <foo>def</foo>

        <bar>lmn</bar>
        <bar>opq</bar>

        <foo>ghi</foo>
        </root>";

    let res: Root = match serde_xml_rs::deserialize(input) {
        Ok(r) => r,
        Err(e) => panic!("{:?}", e),
    };

    println!("{:?}", res);
}
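
If the separate `foo` and `bar` vectors from the original question are still wanted, the ordered list produced by the enum approach can be regrouped afterwards. The following is a minimal sketch: `Grouped` and `regroup` are hypothetical names, and the enum is repeated here without the serde derives so that the block stands alone.

```rust
// Same shape as the `Elems` enum above, minus the serde attributes.
#[derive(Debug, PartialEq)]
enum Elems {
    Foo(String),
    Bar(String),
}

// Hypothetical target type: the layout the original question asked for.
#[derive(Debug, Default, PartialEq)]
struct Grouped {
    foo: Vec<String>,
    bar: Vec<String>,
}

// Walk the items in document order and push each one into its own vector.
fn regroup(items: Vec<Elems>) -> Grouped {
    let mut out = Grouped::default();
    for item in items {
        match item {
            Elems::Foo(s) => out.foo.push(s),
            Elems::Bar(s) => out.bar.push(s),
        }
    }
    out
}

fn main() {
    // The items in the order they appear in the interleaved document.
    let items = vec![
        Elems::Foo("abc".to_string()),
        Elems::Foo("def".to_string()),
        Elems::Bar("lmn".to_string()),
        Elems::Bar("opq".to_string()),
        Elems::Foo("ghi".to_string()),
    ];
    println!("{:?}", regroup(items));
    // prints: Grouped { foo: ["abc", "def", "ghi"], bar: ["lmn", "opq"] }
}
```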

@e-oz

e-oz commented Sep 18, 2018

This is really hard to do and if implemented, completely messes up how serde operates and requires heap allocations.

Additionally to considering solutions on the serde-end, please consider talking to the provider of your data whether they can produce a format that doesn't require a lot of state during deserialization.

It actually means your approach is unusable for XML, because the OP gave perfectly valid XML and any valid parser should parse it. If this design doesn't work, the parser should be redesigned. It's not some "extra feature", it's basic functionality.

@cetra3

cetra3 commented Sep 18, 2018

@e-oz I've provided working code as above. It's not an issue with this crate, just how serde works. You can always implement a custom deserializer as well.

@e-oz

e-oz commented Sep 18, 2018

@cetra3 I wasn't replying to the author of this crate. It's an issue with serde's design. We can all implement something ourselves, but only once basic functionality is covered can we call it an "xml deserializer".

@cetra3

cetra3 commented Sep 18, 2018

@e-oz You can use serde to deserialize data, you just need to instruct it how to construct your struct. Examples are given for map and struct.

If you are using the serde_derive crate, you will need to conform to the shape it expects the data in, but it will deserialize. If you tried the same thing with any other magical deserialization library (e.g., Jackson), you'd have the same problem.

@e-oz

e-oz commented Sep 18, 2018

@cetra3 thank you for trying to help me.

@Jonesey13

Jonesey13 commented Mar 17, 2019

Just to say this bit me as well. For those reading this in the future, here's a tl;dr:

  • There were discussions for customizing duplicate handling (Attribute to control behavior when deserializing duplicate fields serde-rs/serde#690) but for some reason this was de-scoped from the project and no workaround is available atm (to my knowledge)
  • Because serde-xml-rs does not load the entire document in memory (it reads it as a stream), it's unable to read forward without skipping over the items in between (so without a re-write + an agreement on reducing performance to support this it won't be happening anytime soon/ever)

The #[serde(rename = "$value")] + Enum approach doesn't work for me either as it expects to handle every child element it encounters and requires putting every possible tag into the enum (any attempt to declare it in the parent struct instead is ignored and triggers an error "unknown variant xxxx")

  • OK, I could put every possible element into the enum, but it will get annoying going through a list of enums just to get a singleton child element that should have its own field

The only real option atm is to either hack up your own solution or to modify your input xml.

  • (FYI this isn't just a problem with this serde deserializer, serde-json arguably has a worse problem with this, although repeated keys are much rarer!)
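
The streaming constraint described in this comment can be illustrated with a simplified model. This is not the code that serde or serde-xml-rs actually runs: `streaming_fill` and `buffered_fill` are hypothetical functions, and child elements are modelled as plain (tag, text) pairs. The first function fills each field once from a contiguous run of elements and rejects a tag that reappears later; the second keeps every field's vector open until the parent ends, which is what accepting interleaved elements would require.

```rust
use std::collections::HashMap;

// Simplified stand-in for a derived struct visitor: each field is filled once,
// from one contiguous run of same-named elements. Seeing the same tag again
// after the run has ended is reported as a duplicate field.
fn streaming_fill(events: &[(&str, &str)]) -> Result<HashMap<String, Vec<String>>, String> {
    let mut fields: HashMap<String, Vec<String>> = HashMap::new();
    let mut i = 0;
    while i < events.len() {
        let key = events[i].0;
        if fields.contains_key(key) {
            return Err(format!("duplicate field `{}`", key));
        }
        // Consume the contiguous run of elements that share this tag.
        let mut run = Vec::new();
        while i < events.len() && events[i].0 == key {
            run.push(events[i].1.to_string());
            i += 1;
        }
        fields.insert(key.to_string(), run);
    }
    Ok(fields)
}

// Buffering alternative: every field's vector stays open (on the heap) until
// all children have been seen, so interleaved elements are accepted.
fn buffered_fill(events: &[(&str, &str)]) -> HashMap<String, Vec<String>> {
    let mut fields: HashMap<String, Vec<String>> = HashMap::new();
    for (key, value) in events {
        fields.entry(key.to_string()).or_default().push(value.to_string());
    }
    fields
}

fn main() {
    let interleaved = [("foo", "abc"), ("bar", "lmn"), ("foo", "ghi")];
    println!("{:?}", streaming_fill(&interleaved));
    println!("{:?}", buffered_fill(&interleaved).get("foo"));
}
```

In this model the streaming version never needs to hold more than one field's run at a time, while the buffered version has to keep all partially filled fields alive until the end of the parent element.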

@ralpha

ralpha commented Apr 20, 2020

Because I have come across this issue multiple times now, I tried to solve it.
I first tried using the enum example above, but it had other limitations that I could not work around, and it made the code look terrible.
So I implemented my own deserializer. And got it working! 😃
I also tried implementing a derive macro, and after a lot of trial and error I got it working.
The result is not pretty, but you can find it here:
https://github.com/ralpha/serde_deserializer_best_effort
(it will probably not solve your problem, but might help you)

I included both the manual implementation and the derive trait.
The code could be so much nicer if it was included in the serde crate.
But I was not very familiar with writing macros before this.
(A good thing that the compilers course from a few years ago came in handy 😉 )
I don't know if this behavior could be implemented in this crate or in the serde crate/repo.

TL;DR: Solved it, but with limitations: https://github.com/ralpha/serde_deserializer_best_effort

@punkstarman
Collaborator

@Jonesey13 and @ralpha, thank you for these contributions to the discussion.

Please, would you mind contributing example schemas or documents?

@ralpha

ralpha commented Apr 23, 2020

@punkstarman Could you be a bit more specific? What part would you like me to clarify?
I looked even more into this and see that the problem is pretty much hard-coded into the serde crate.
So I think it should be changed there (with possible changes here afterwards, depending on how this is solved).

I put a longer and more detailed explanation of how this could be solved here: serde-rs/serde#1725 (comment)

@punkstarman
Collaborator

punkstarman commented Apr 25, 2020

Could you be a bit more specific? What part would you like me to clarify?

@ralpha, the examples you and others have provided are all minimalistic, unless I have overlooked something. This is great for reproducing the problem and checking whether it has been addressed, but it makes it difficult to be convinced that the problem is valid.

My difficulty currently is that I cannot find a compelling real-world use case where deserializing as an enum is unwarranted.

One case mentioned here is when there is a mixture of singleton and repeatable children. In the few XML schemas I know that allow this, the singleton elements must appear first. Other schemas avoid mixing singleton and repeated elements as siblings by putting the latter into a collection element.

On the other hand, I feel that the order is important when repeated elements can be interleaved as siblings. It seems odd to me to want to deserialize into segregated arrays or vectors and lose the order.

For instance in HTML order is important,

<body>
  <p>...</p>
  <ul>...</ul>
  <p>...</p>
  <ul>...</ul>
</body>

would not be deserialized as

Body {
  ps: [ ..., ... ]
  uls: [ ..., ... ]
}

The fact that I cannot find a case doesn't mean there isn't one, so if anyone can provide one, I would be grateful.

I looked even more into this and see that the problem is [pretty] hard-coded into the serde crate. So I think it should be changed there.

I agree

I think one or several compelling use cases would go a long way in motivating the change.

@ralpha

ralpha commented Apr 25, 2020

Okay, I'm deserializing XML data that was created by another program that is outside my control, so I cannot change the data or the structure it gives me. (And there are multiple versions of the software that give slightly different exports.)

The program that deserializes the data (the one I'm creating) is going to be used by clients who have their own export from that program. They can input their XML file(s), which are deserialized right there, and the data is used by the program.

The data consists of a bunch of lists of objects with a lot of optional tags. So I have many structs with about 30-40 tags in them, with sub-structs nested in them. Changing this to an enum makes the code much, much harder to manage. The order of the items in the objects is not important to me, just the data. So it is not like the HTML you put above.

The program that creates the xml (NOT my program) that I'm using probably has some code like this:

for x in list {
    if x.type == "type1" {
        println!("<type1>...<a>1</a><b>2</b>...With a lot of data in here with nested tags...</type1>");
    } else {
        println!("<type2>...<c>3</c><d>4</d>...With a lot of data in here with nested tags...</type2>");
    }
}

This (I think) is a common pattern in a lot of programs, and it creates interleaved tags. It is valid XML, so they do this. And there is no way for me to know up front whether the program does this (the code is closed source, so I cannot inspect it). So I just test it on 20+ different exports and see where it gives me errors.

But this means I have to rewrite my whole codebase just because there is/could be 1 interleaved tag in the whole xml.
The files I'm loading could be as large as 700 MB if not larger, so it could take a while to find 1 very infrequent tag that just happens to be interleaved with a probability of 1/2 000 000 or so. This will then throw an error and stop all the parsing of the whole file.

Summary
The enum solution might work in some cases but when you have a struct with 40 optional fields this quickly gets complicated. And when I want to use the data I have to write massive match blocks to capture all optional fields.

That is why, if this problem is fixed, it will be so much easier to actually use it. Writing that custom deserializer was less work (even though it took me quite a while) than changing the program to use enums.

So I think there are very valid use-cases for this.
And you only find out you have this problem when you have already created a large part of your code. Which is not fun in any way.
If this problem is solved in Serde it will also solve problems with json and others.

Here are reported issues I found where people have this problem (trying not to count people twice):
serde-rs/serde#1725 (2 people)
tafia/quick-xml#177 (1 new person)
#55 (this issue, 4 people)
serde-rs/serde#1113
serde-rs/serde#690
faebser/beautiful-wallpaper-every-day#1
#5
serde-rs/serde#1661
... And there are probably many, many more that did not report the issue or just read it...
( In some of these cases Enums solved the problem or the problem could be fixed with other options like serde_with )

@Jonesey13

Jonesey13 commented May 5, 2020

@punkstarman The use case I was working on was parsing SVG files. For example, when generating SVG in Inkscape it's quite common to output interleaved <rect> and <path> elements.

Your point about the ordering of the elements is a very good one and tbh I can't think of a good counterpoint (as it's also important in the SVG case like it is with HTML; technically you could argue layers should be used for the draw order but it's a bit of a moot point as it's part of the SVG spec).

However, I do agree with @ralpha that having to include all possible tags in the enum is annoying to work with at best when you can't choose to just ignore them.

@ralpha

ralpha commented May 10, 2020

Well, it seems like it is not getting fixed in Serde: serde-rs/serde#1725 (comment)
And as it is currently hard-coded in serde_derive, it is probably not going to be fixed there unless serde(_derive) is forked and changed for XML, which sadly splits the ecosystem a bit.
But maybe only the derive traits could be changed and included in this crate?
@punkstarman What do you think? What should happen, if anything at all?

@punkstarman
Collaborator

@ralpha

Well, it seems like it is not getting fixed in Serde: serde-rs/serde#1725 (comment)
And as it is currently hard-coded in serde_derive, it is probably not going to be fixed there unless serde(_derive) is forked and changed for XML, which sadly splits the ecosystem a bit.

Someone already did this, see https://github.com/media-io/yaserde.

But maybe only the derive traits could be changed and included in this crate?

I don't believe that this is technically possible.

I think that packaging a custom deserializer in its own crate would be the way to go.

@punkstarman What do you think? What should happen, if anything at all?

With regards to this problem, I don't see how anything can be done in the serde-xml-rs crate. The serde and serde_derive crates paint us into a corner and were never really designed with XML in mind.

@sapessi

sapessi commented May 22, 2020

I'm trying to use enums to parse a WSDL definition that has body/headers in the wrong order (example below). It seems that the parser fails to close the element or to "peek" at the next one. Have you encountered this before?

XML sample

<wsdl:operation name="ResolveNames">
  <soap:operation soapAction="http://schemas.microsoft.com/exchange/services/2006/messages/ResolveNames"/>
  <wsdl:input>
    <soap:body parts="request" use="literal"/>
    <soap:header message="tns:ResolveNamesSoapIn" part="Impersonation" use="literal"/>
    <soap:header message="tns:ResolveNamesSoapIn" part="MailboxCulture" use="literal"/>
    <soap:header message="tns:ResolveNamesSoapIn" part="RequestVersion" use="literal"/>
  </wsdl:input>
  <wsdl:output>
    <soap:body parts="ResolveNamesResult" use="literal"/>
    <soap:header message="tns:ResolveNamesSoapOut" part="ServerVersion" use="literal"/>
  </wsdl:output>
</wsdl:operation>

Rust structs

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "wsdl:operation")]
pub struct OperationBinding {
    name: String,
    operation: SoapOperation,
    #[serde(rename = "input")]
    input: Vec<SoapDataBinding>,
    #[serde(rename = "output")]
    output: Vec<SoapDataBinding>,
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "soap:operation")]
pub struct SoapOperation {
    #[serde(rename = "soapAction")]
    soap_action: String,
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum SoapDataBinding {
    Body(BodyBinding),
    Header(HeaderBinding),
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "soap:body")]
pub struct BodyBinding {
    parts: String,
    #[serde(rename = "use")]
    body_use: String,
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "soap:header")]
pub struct HeaderBinding {
    message: String,
    part: String,
    #[serde(rename = "use")]
    header_use: String,
}

Log output

2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked EndElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:operation)
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Fetched EndElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:operation)
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked StartElement({http://schemas.xmlsoap.org/wsdl/}wsdl:input, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked StartElement({http://schemas.xmlsoap.org/wsdl/}wsdl:input, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked StartElement({http://schemas.xmlsoap.org/wsdl/}wsdl:input, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked StartElement({http://schemas.xmlsoap.org/wsdl/}wsdl:input, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Fetched StartElement({http://schemas.xmlsoap.org/wsdl/}wsdl:input, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked StartElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:body, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"}, [parts -> request, use -> literal])
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Fetched StartElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:body, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"}, [parts -> request, use -> literal])
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Peeked EndElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:body)
2020-05-22 08:30:00,076 DEBUG [serde_xml_rs::de] Fetched EndElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:body)
2020-05-22 08:30:00,077 DEBUG [serde_xml_rs::de] Fetched StartElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:header, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"}, [message -> tns:ResolveNamesSoapIn, part -> Impersonation, use -> literal])
ERROR : Could not parse WSDL: Expected token XmlEvent::EndElement { name, .. }, found StartElement({http://schemas.xmlsoap.org/wsdl/soap/}soap:header, {"": "", "s": "http://www.w3.org/2001/XMLSchema", "soap": "http://schemas.xmlsoap.org/wsdl/soap/", "t": "http://schemas.microsoft.com/exchange/services/2006/types", "tns": "http://schemas.microsoft.com/exchange/services/2006/messages", "wsdl": "http://schemas.xmlsoap.org/wsdl/", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"}, [message -> tns:ResolveNamesSoapIn, part -> Impersonation, use -> literal])

@punkstarman punkstarman self-assigned this May 24, 2020
@punkstarman
Collaborator

@sapessi, I can reproduce this with the current version. I'm looking into how to fix this.

I'll also see when this bug was introduced because to my recollection this used to work.

@punkstarman
Collaborator

@sapessi, actually the only problem is maybe the error message.

The solution is to modify the type definitions to the following.

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "operation")]
pub struct OperationBinding {
    name: String,
    operation: SoapOperation,
    #[serde(rename = "input")]
    input: SoapDataBindings,
    #[serde(rename = "output")]
    output: SoapDataBindings,
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "operation")]
pub struct SoapOperation {
    #[serde(rename = "soapAction")]
    soap_action: String,
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub struct SoapDataBindings {
    #[serde(rename = "$value")]
    bindings: Vec<SoapDataBinding>
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum SoapDataBinding {
    Body(BodyBinding),
    Header(HeaderBinding),
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "body")]
pub struct BodyBinding {
    parts: String,
    #[serde(rename = "use")]
    body_use: String,
}

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "header")]
pub struct HeaderBinding {
    message: String,
    part: String,
    #[serde(rename = "use")]
    header_use: String,
}

@tobz1000
Contributor

tobz1000 commented Oct 3, 2020

Just as a heads up, I've started work on a potential solution to this issue. I hope to have a PR in the coming weeks.

@ralpha

ralpha commented Oct 4, 2020

@tobz1000 https://github.com/ralpha/serde_deserializer_best_effort I created this some time ago, it works but is not clean by any means. But maybe it helps.
If you actually solve this problem that would be wonderful! 😃

emmanueltouzery pushed a commit to emmanueltouzery/zbus that referenced this issue Nov 6, 2020
Unfortunately, serde-xml-rs doesn't handle nicely interleaved elements:
RReverser/serde-xml-rs#55

Implement the suggested workaround by using an enum.

That means for now we will fail to parse XML with unknown elements
though.
@fulara

fulara commented Dec 26, 2020

FYI, I am happy that this wasn't solved. Thanks to this error I realised that I have a dependency on the order of the XML children, and if this had been /working/ I would have spent hours trying to find the root cause! So the enum solution here is actually what I really wanted, thanks!

@SeedyROM

SeedyROM commented Jan 27, 2021

Just why? Having a reusable tag is not insane, especially if the types are the same! Am I missing something?
This seems like a cop out.

@lovasoa

lovasoa commented Jan 28, 2021

The best workaround I have found so far is to use an enum with #[serde(other)] and #[serde(deserialize_with)] to deserialize unknown tags:

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug)]
#[serde(rename_all = "lowercase")]
enum Item {
    Foo(String),
    Bar(String),
    #[serde(other, deserialize_with = "deserialize_ignore_any")]
    Other,
}

fn deserialize_ignore_any<'de, D: Deserializer<'de>>(deserializer: D) -> Result<(), D::Error> {
    serde::de::IgnoredAny::deserialize(deserializer)?;
    Ok(())
}

#[derive(Deserialize, Debug)]
struct Root{
    #[serde(rename="$value")]
    items: Vec<Item>
}

fn main() {
    let xml = r#"<root> <foo>a</foo> <bar>b</bar> <foo>c</foo>  <unknown>d</unknown> </root>"#;
    let v: Root = serde_xml_rs::from_str(xml).unwrap();
    println!("{:?}", v); // prints: Root { items: [Foo("a"), Bar("b"), Foo("c"), Other] }
}

@punkstarman
Collaborator

Thank you @lovasoa.

IMHO, this isn't a workaround but a solution.

@lovasoa

lovasoa commented Jan 28, 2021

I made a PR to serde_with so that deserialize_ignore_any can be imported instead of copy-pasted: jonasbb/serde_with#251

@1sra3l

1sra3l commented Jan 16, 2022

@punkstarman I can provide valid xml examples that are libre sourced to test.

Joe's Window Manager

The Standard XDG menu is another example.
I tried all your solutions, but may have missed something. In the XDG menu, multiple Merge elements may or may not be present, as in this part of the standard example:

<-- snip-->
            <DefaultLayout>
              <Merge type="menus"/>
              <Merge type="files"/>
              <Separator/>
              <Menuname>More</Menuname>
            </DefaultLayout>
<-- snip-->

@1sra3l

1sra3l commented Jan 18, 2022

@RReverser this issue still exists, as far as I can tell.
Specifically with both of the files I provide.
I have tried the enum, with the Other as well as using:

let mut de = serde_xml_rs::Deserializer::new_from_reader(file_string.as_bytes())
                                             .non_contiguous_seq_elements(true);

What am I missing if this is fixed?
JWM specifically uses out of place items:

    <Tray x="0" y="-1" autohide="off" delay="1000">
       <!-- Tray button 0 -->
       <TrayButton label="JWM">root:1</TrayButton>
       <!-- Spacer 0 -->
        <Spacer width="2"/>
       <!-- Tray button 1 -->
        <TrayButton label="_">showdesktop</TrayButton>
       <!-- Spacer 1 -->
        <Spacer width="2"/>
        <Pager labeled="true"/>
        <TaskList maxwidth="256"/>
        <Swallow width="32" height="32" name="xclock">xclock</Swallow>
        <Dock/>
        <Clock format="%l:%M %p"><Button mask="123">exec:xclock</Button></Clock>
    </Tray>

I have tried your tricks and they do not work. Which likely means I am missing something.
The Vec<enum> method does not work even with:

pub enum Stuff {
   //my enums
   #[serde(other, deserialize_with = "deserialize_ignore_any")]
   Other,
}
pub struct File {
  #[serde(rename = "$value")]
  pub items:Vec<Stuff>, // still get error "custom: duplicate field `<whatever>`"
}

@Mingun

Mingun commented May 21, 2022

I've implemented parsing of overlapped sequences in tafia/quick-xml#387 under the feature flag overlapped-lists. Note that enabling this feature can lead to quadratic parsing time and high memory consumption, so use it at your own risk.
