Deserializing Vec fails if there's something in between #55
Comments
This is really hard to do and, if implemented, would completely mess up how serde operates and require heap allocations. In addition to considering solutions on the serde end, please consider asking the provider of your data whether they can produce a format that doesn't require so much state during deserialization.
Changing the output of the original program probably isn't going to happen; it's old, and the order represents something in the underlying data. For now I'm using a separate processing pass to group children together; I guess that's the best option for now. Thanks!
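A minimal sketch of such a grouping pass, assuming the children of one parent element have already been read into a list of (tag name, content) pairs; the function name `group_by_first_appearance` is hypothetical and not part of serde or serde-xml-rs. It stably reorders the siblings so that all elements with the same tag become contiguous (groups appear in the order their tag is first seen), which is the shape a `Vec` field expects. Note that it deliberately throws away the interleaving, so it only fits data where that order carries no meaning.

```rust
use std::collections::HashMap;

/// Stably reorders sibling elements so that all elements sharing a tag name
/// are contiguous. Groups appear in the order their tag is first seen, and
/// elements keep their original relative order inside each group.
fn group_by_first_appearance<T: Clone>(children: &[(String, T)]) -> Vec<(String, T)> {
    // Rank of each tag = position of its first appearance among the tags.
    let mut rank: HashMap<&str, usize> = HashMap::new();
    for (tag, _) in children {
        let next = rank.len();
        rank.entry(tag.as_str()).or_insert(next);
    }
    let mut grouped: Vec<(String, T)> = children.to_vec();
    // `sort_by_key` is a stable sort, so equal-ranked elements keep their order.
    grouped.sort_by_key(|(tag, _)| rank[tag.as_str()]);
    grouped
}

fn main() {
    // Interleaved siblings: <foo>a</foo> <bar>b</bar> <foo>c</foo> <unknown>d</unknown>
    let children: Vec<(String, &str)> = vec![
        ("foo".to_string(), "a"),
        ("bar".to_string(), "b"),
        ("foo".to_string(), "c"),
        ("unknown".to_string(), "d"),
    ];
    for (tag, text) in group_by_first_appearance(&children) {
        println!("<{tag}>{text}</{tag}>");
    }
}
```

The reordered siblings could then be written back out and fed to the deserializer unchanged.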
I opened an issue for this on serde's end, but they seem to point to this crate to handle it.
You can use enums for this:
It actually means your approach is unusable for XML, because OP gave absolutely valid XML and any valid parser should parse it. If this design doesn't work, the parser should be redesigned. It's not some "extra feature"; it's basic functionality.
@e-oz I've provided working code above. It's not an issue with this crate, just with how serde works. You can always implement a custom deserializer as well.
@cetra3 I wasn't replying to the author of this crate. It's an issue with serde's design. And we can all implement something, but only if basic functionality is covered can we call it an "XML deserializer".
@e-oz You can use … But if you are using the …
@cetra3 thank you for trying to help me.
Just to say this bit me as well. For reference, for those reading this in future, here's a TL;DR:
The #[serde(rename = "$value")] + enum approach doesn't work for me either, as it expects to handle every child element it encounters and requires putting every possible tag into the enum (any attempt to declare a tag in the parent struct instead is ignored and triggers the error "unknown variant xxxx").
The only real option at the moment is to either hack up your own solution or modify your input XML.
Because I have come across this issue multiple times now, I tried to solve it. I included both the manual implementation and the derive trait. TL;DR: solved it, but with its limitations: https://github.com/ralpha/serde_deserializer_best_effort
@Jonesey13 and @ralpha, thank you for these contributions to the discussion. Would you mind contributing example schemas or documents?
@punkstarman Could you be a bit more specific? What part would you like me to clarify? I put a longer and more detailed explanation of how this could be solved here: serde-rs/serde#1725 (comment)
@ralpha, the examples you and others have provided are all minimalistic, unless I have overlooked something. This is great for reproducing the problem and checking whether it has been addressed, but it makes it hard to argue convincingly that the problem is real. My difficulty currently is that I cannot find a compelling real-world use case where deserializing as an enum is unwarranted. One case mentioned here is a mixture of singleton and repeatable children. In the few XML schemas I know that allow this, the singleton elements must appear first. Other schemas avoid mixing singleton and repeated elements as siblings by putting the latter into a collection element. On the other hand, I feel that the order is important when repeated elements can be interleaved as siblings. It seems odd to me to want to deserialize into segregated arrays or vectors and lose the order. For instance, in HTML order is important:
Would not be serialized as
The fact that I cannot find a case doesn't mean there isn't one, so if anyone can provide one, I would be grateful.
I agree; I think one or several compelling use cases would go a long way toward motivating the change.
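To make the ordering point above concrete, here is a small std-only sketch (the `Child` enum and `segregate` function are illustrative names, not from serde or serde-xml-rs). Two sibling orders that differ only in interleaving collapse to the same pair of segregated vectors, so the original order cannot be recovered from fields like `rects: Vec<_>` and `paths: Vec<_>`, while a single `Vec` of an enum keeps it.

```rust
/// One child element of a mixed-content parent, e.g. `<rect>` or `<path>` in SVG.
#[derive(Debug, Clone, PartialEq)]
enum Child {
    Rect(u32),
    Path(u32),
}

/// Splits an ordered list of children into one vector per element kind,
/// which is what a struct with `rects: Vec<_>` and `paths: Vec<_>` fields holds.
fn segregate(children: &[Child]) -> (Vec<u32>, Vec<u32>) {
    let mut rects = Vec::new();
    let mut paths = Vec::new();
    for child in children {
        match child {
            Child::Rect(id) => rects.push(*id),
            Child::Path(id) => paths.push(*id),
        }
    }
    (rects, paths)
}

fn main() {
    // <rect/><path/><rect/> versus <rect/><rect/><path/>
    let interleaved = vec![Child::Rect(1), Child::Path(2), Child::Rect(3)];
    let contiguous = vec![Child::Rect(1), Child::Rect(3), Child::Path(2)];

    // The enum vectors differ, so the order is preserved...
    assert_ne!(interleaved, contiguous);
    // ...but the segregated fields are identical, so the order is lost.
    assert_eq!(segregate(&interleaved), segregate(&contiguous));
    println!("{:?}", segregate(&interleaved)); // prints: ([1, 3], [2])
}
```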
Okay, I'm deserializing XML data that was created by another program that is outside my control, so I cannot change the data or the structure it gives me (and there are multiple versions of that software that produce slightly different exports). The program that deserializes the data (the one I'm creating) is going to be used by clients who have their own exports from that program. They can input their XML file(s), which are deserialized right there, and the data is then used by the program. The data consists of a bunch of lists of objects with a lot of optional tags. So I have many structs with about 30-40 tags in them, with sub-structs nested in them. Changing this to an enum makes the code much, much harder to manage. The order of the items in the objects is not important to me, just the data, so it is not like the HTML you put above. The program that creates the XML (NOT my program) probably has some code like this:
This (I think) is a common pattern in a lot of programs, and it creates interleaved tags. That is valid XML, so they do it. And there is no way for me to know up front whether it does this (the code is closed source, so I cannot inspect it). So I just test it on 20+ different exports and see where it gives me errors. But this means I have to rewrite my whole codebase just because there is (or could be) one interleaved tag in the whole XML. Summary: that is why fixing this problem would make it SO much easier to actually use this crate. Writing that custom deserializer was less work (even though it took me quite a while) than changing the program to use enums. So I think there are very valid use cases for this. Here are reported issues I found where people have this problem (not trying to count anyone twice):
@punkstarman The use case I was working on was parsing SVG files. For example, when generating SVG in Inkscape it's quite common to output interspersed <rect> and <path> elements. Your point about the ordering of the elements is a very good one and, tbh, I can't think of a good counterpoint (order is also important in the SVG case, as it is with HTML; technically you could argue that layers should be used for the draw order, but it's a bit of a moot point as it's part of the SVG spec). However, I do agree with @ralpha that having to include every possible tag in the enum is annoying to work with at best when you can't choose to just ignore them.
Well, it seems like this is not getting fixed in serde: serde-rs/serde#1725 (comment)
Someone already did this, see https://github.com/media-io/yaserde.
I don't believe that this is technically possible. I think that packaging a custom deserializer in its own crate would be the way to go.
With regard to this problem, I don't see how anything can be done in the …
I'm trying to use enums to parse a WSDL definition that has body/headers in the wrong order (example below). It seems that the parser fails to close the element correctly or to "peek" at the next one. Have you encountered this before?
XML sample:
<wsdl:operation name="ResolveNames">
    <soap:operation soapAction="http://schemas.microsoft.com/exchange/services/2006/messages/ResolveNames"/>
    <wsdl:input>
        <soap:body parts="request" use="literal"/>
        <soap:header message="tns:ResolveNamesSoapIn" part="Impersonation" use="literal"/>
        <soap:header message="tns:ResolveNamesSoapIn" part="MailboxCulture" use="literal"/>
        <soap:header message="tns:ResolveNamesSoapIn" part="RequestVersion" use="literal"/>
    </wsdl:input>
    <wsdl:output>
        <soap:body parts="ResolveNamesResult" use="literal"/>
        <soap:header message="tns:ResolveNamesSoapOut" part="ServerVersion" use="literal"/>
    </wsdl:output>
</wsdl:operation>
Rust structs:
use serde::{Deserialize, Serialize};
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "wsdl:operation")]
pub struct OperationBinding {
    name: String,
    operation: SoapOperation,
    #[serde(rename = "input")]
    input: Vec<SoapDataBinding>,
    #[serde(rename = "output")]
    output: Vec<SoapDataBinding>,
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "soap:operation")]
pub struct SoapOperation {
    #[serde(rename = "soapAction")]
    soap_action: String,
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum SoapDataBinding {
    Body(BodyBinding),
    Header(HeaderBinding),
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "soap:body")]
pub struct BodyBinding {
    parts: String,
    #[serde(rename = "use")]
    body_use: String,
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "soap:header")]
pub struct HeaderBinding {
    message: String,
    part: String,
    #[serde(rename = "use")]
    header_use: String,
}
Log output:
@sapessi, I can reproduce this with the current version. I'm looking into how to fix it. I'll also check when this bug was introduced, because to my recollection this used to work.
@sapessi, actually, the only problem may be the error message. The solution is to modify the type definitions to the following.
Just as a heads-up, I've started work on a potential solution to this issue. I hope to have a PR in the coming weeks.
@tobz1000 https://github.com/ralpha/serde_deserializer_best_effort I created this some time ago; it works, but it is not clean by any means. But maybe it helps.
Unfortunately, serde-xml-rs doesn't handle interleaved elements nicely: RReverser/serde-xml-rs#55. Implement the suggested workaround by using an enum. That means that, for now, we will fail to parse XML with unknown elements, though.
FYI, I am happy that this wasn't solved: thanks to this error I realised that I depend on the order of the XML children, and if this had been /working/ I would have spent hours trying to find the root cause! So the enum solution here is actually what I really wanted, thanks!
Just why? Having a reusable tag is not insane, especially if the types are the same! Am I missing something?
The best workaround I found at the moment is to use an enum with an ignored catch-all variant:
use serde::{Deserialize, Deserializer};
#[derive(Deserialize, Debug)]
#[serde(rename_all = "lowercase")]
enum Item {
    Foo(String),
    Bar(String),
    #[serde(other, deserialize_with = "deserialize_ignore_any")]
    Other,
}
fn deserialize_ignore_any<'de, D: Deserializer<'de>>(deserializer: D) -> Result<(), D::Error> {
    serde::de::IgnoredAny::deserialize(deserializer)?;
    Ok(())
}
#[derive(Deserialize, Debug)]
struct Root {
    #[serde(rename = "$value")]
    items: Vec<Item>,
}
fn main() {
    let xml = r#"<root> <foo>a</foo> <bar>b</bar> <foo>c</foo> <unknown>d</unknown> </root>"#;
    let v: Root = serde_xml_rs::from_str(xml).unwrap();
    println!("{:?}", v); // prints: Root { items: [Foo("a"), Bar("b"), Foo("c"), Other] }
}
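If the rest of a codebase expects one `Vec` per tag rather than a single ordered list, the `Vec<Item>` produced by the workaround above can be folded into that shape in a second step. A minimal sketch, assuming the same `Item` variants (redeclared here without the serde attributes so the snippet compiles on its own); `Grouped` and `group_items` are hypothetical names, not part of any crate:

```rust
/// Same variants as the `Item` enum above, minus the serde attributes.
#[derive(Debug)]
enum Item {
    Foo(String),
    Bar(String),
    Other,
}

/// The "one Vec per tag" shape that many struct definitions expect.
#[derive(Debug, Default, PartialEq)]
struct Grouped {
    foo: Vec<String>,
    bar: Vec<String>,
}

/// Moves each item into the vector for its tag; unknown elements are dropped.
fn group_items(items: Vec<Item>) -> Grouped {
    let mut grouped = Grouped::default();
    for item in items {
        match item {
            Item::Foo(s) => grouped.foo.push(s),
            Item::Bar(s) => grouped.bar.push(s),
            Item::Other => {}
        }
    }
    grouped
}

fn main() {
    let items = vec![
        Item::Foo("a".into()),
        Item::Bar("b".into()),
        Item::Foo("c".into()),
        Item::Other,
    ];
    println!("{:?}", group_items(items)); // prints: Grouped { foo: ["a", "c"], bar: ["b"] }
}
```

In principle serde's `#[serde(from = "...")]` container attribute could run a conversion like this through a `From` impl during deserialization; that wiring is not shown here.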
Thank you @lovasoa. IMHO, this isn't a workaround but a solution.
I made a PR to serde_with, so that …
@punkstarman I can provide valid, libre-sourced XML examples to test with. The standard XDG menu is another example:
<-- snip-->
<DefaultLayout>
    <Merge type="menus"/>
    <Merge type="files"/>
    <Separator/>
    <Menuname>More</Menuname>
</DefaultLayout>
<-- snip-->
@RReverser this issue still exists, as far as I can tell.
let mut de = serde_xml_rs::Deserializer::new_from_reader(file_string.as_bytes())
    .non_contiguous_seq_elements(true);
What am I missing if this is fixed?
<Tray x="0" y="-1" autohide="off" delay="1000">
    <!-- Tray button 0 -->
    <TrayButton label="JWM">root:1</TrayButton>
    <!-- Spacer 0 -->
    <Spacer width="2"/>
    <!-- Tray button 1 -->
    <TrayButton label="_">showdesktop</TrayButton>
    <!-- Spacer 1 -->
    <Spacer width="2"/>
    <Pager labeled="true"/>
    <TaskList maxwidth="256"/>
    <Swallow width="32" height="32" name="xclock">xclock</Swallow>
    <Dock/>
    <Clock format="%l:%M %p"><Button mask="123">exec:xclock</Button></Clock>
</Tray>
I have tried your tricks and they do not work, which likely means I am missing something.
pub enum Stuff {
    // my enums
    #[serde(other, deserialize_with = "deserialize_ignore_any")]
    Other,
}
pub struct File {
    #[serde(rename = "$value")]
    pub items: Vec<Stuff>, // still get error "custom: duplicate field `<whatever>`"
}
I've implemented parsing of overlapped sequences in tafia/quick-xml#387 under a feature flag.
Using this code...
...to deserialize this file...
...gives this error...
This doesn't happen if the elements in the root are contiguous, like...
Expected output in both cases is:
Am I doing something wrong here? Is there any way around this? The data I'm working with is formatted like the first example.