Document a couple of recommended patterns of usage #269
Comments
I agree more documentation is always better. I am not sure I'll find time to write it soon but in a sketch:
```rust
// Sketch against the buffer-based `Reader` API in use when this was written;
// later quick-xml releases changed several of these signatures.
use quick_xml::events::{BytesStart, Event};
use quick_xml::{Error, Reader};
use std::io::BufRead;

// Returns the unescaped value of attribute `name`, or an empty string if absent.
fn att_to_string<R: BufRead>(
    reader: &Reader<R>,
    event: &BytesStart,
    name: &[u8],
) -> Result<String, Error> {
    for a in event.attributes() {
        let a = a?;
        if a.key == name {
            return a.unescape_and_decode_value(reader);
        }
    }
    Ok(String::new())
}

fn parse_items<R: BufRead>(
    reader: &mut Reader<R>,
) -> Result<Vec<(String, String, Vec<String>)>, Error> {
    #[derive(Debug)]
    enum State {
        Start,
        Level0,
        Level1(String),
        Level2(String, String, Vec<String>),
    }

    let mut items = Vec::new();
    let mut state = State::Start;
    let mut buf = Vec::new();
    let mut txt_buf = Vec::new();

    loop {
        // Match on the current state and the next event together.
        state = match (state, reader.read_event(&mut buf)?) {
            (State::Start, Event::Start(e)) if e.name() == b"level0" => State::Level0,
            (State::Level0, Event::Start(e)) if e.name() == b"level1" => {
                State::Level1(att_to_string(reader, &e, b"attr1")?)
            }
            (State::Level1(att1), Event::Start(e)) if e.name() == b"level2" => {
                State::Level2(att1, att_to_string(reader, &e, b"attr2")?, Vec::new())
            }
            (State::Level2(att1, att2, mut lev3), Event::Start(e)) if e.name() == b"level3" => {
                lev3.push(reader.read_text(b"level3", &mut txt_buf)?);
                txt_buf.clear();
                State::Level2(att1, att2, lev3)
            }
            (State::Level2(att1, att2, lev3), Event::End(e)) if e.name() == b"level2" => {
                items.push((att1.clone(), att2, lev3)); // flatten level1
                State::Level1(att1)
            }
            (State::Level1(_), Event::End(e)) if e.name() == b"level1" => State::Level0,
            (State::Level0, Event::End(e)) if e.name() == b"level0" => return Ok(items),
            (state, Event::Eof) => return Err(Error::UnexpectedEof(format!("{:?}", state))),
            // Any other event leaves the state unchanged.
            (state, _) => state,
        };
        buf.clear();
    }
}
```
In terms of frequency, I believe 1 >> 2 >> 3. Thank you also for the sidenote; these functions are indeed very common, and we would benefit from having them implemented by default.
Thanks, that is helpful! What about using quick-xml without a state machine, with just nested readers? I've seen a couple of projects doing it, and it's the way my code is written at the moment, but are there downsides? I haven't gotten around to strict validation or anything like that yet, if that is where a state machine becomes helpful. https://github.com/dralley/rpmrepo_rs/blob/master/src/metadata/repomd.rs#L235-L319
Nested readers are good when there are really a lot of levels. I find them more complicated than simple state machines, but this is subjective (matching the state and the event at once really shows what we're expecting). One potential drawback of nested parsers is that it is hard to reuse the same
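For comparison, here is a minimal, library-independent sketch of the nested style, with one function per level. The `Ev` enum is a hypothetical stand-in for quick-xml's `Event`, and the `level1`/`level2`/`level3` names are borrowed from the state-machine sketch above; none of this is quick-xml API.

```rust
// Hypothetical stand-in for quick-xml's `Event`; not part of the library.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    Text(String),
    End(String),
}

#[derive(Debug, PartialEq)]
struct Level1 {
    // One Vec of <level3> texts per <level2> element.
    level2: Vec<Vec<String>>,
}

// Called after <level1> has been seen; consumes events up to </level1>.
fn parse_level1<I: Iterator<Item = Ev>>(events: &mut I) -> Result<Level1, String> {
    let mut level2 = Vec::new();
    while let Some(ev) = events.next() {
        match ev {
            Ev::Start(name) if name == "level2" => level2.push(parse_level2(events)?),
            Ev::End(name) if name == "level1" => return Ok(Level1 { level2 }),
            _ => {} // ignore anything else at this level
        }
    }
    Err("unexpected end of input inside <level1>".to_string())
}

// Called after <level2> has been seen; consumes events up to </level2>.
fn parse_level2<I: Iterator<Item = Ev>>(events: &mut I) -> Result<Vec<String>, String> {
    let mut texts = Vec::new();
    while let Some(ev) = events.next() {
        match ev {
            Ev::Start(name) if name == "level3" => texts.push(parse_text(events, "level3")?),
            Ev::End(name) if name == "level2" => return Ok(texts),
            _ => {}
        }
    }
    Err("unexpected end of input inside <level2>".to_string())
}

// Collects text until the matching end tag; any other event is an error.
fn parse_text<I: Iterator<Item = Ev>>(events: &mut I, end: &str) -> Result<String, String> {
    let mut out = String::new();
    while let Some(ev) = events.next() {
        match ev {
            Ev::Text(t) => out.push_str(&t),
            Ev::End(name) if name == end => return Ok(out),
            other => return Err(format!("unexpected {:?} inside <{}>", other, end)),
        }
    }
    Err(format!("unexpected end of input inside <{}>", end))
}
```

Each function owns exactly one level, which keeps the per-level logic small, but the expected sequence of events is spread over several functions instead of being visible in a single `match`.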
I am interested in your comment that performance-intensive code would be better served by the Reader/Writer APIs than by Serde. I have been using Serde for speed of development but am coming to realize that I should probably use these lower-level tools instead (the objects themselves are fairly simple, but the code is performance critical). However, I have a very large number of things that have to be parsed (the full protocol includes probably ~100). Do you have recommendations for implementing these? Additionally, does something spring to mind for good patterns when writing with nested elements?
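One possible shape for writing nested elements, offered only as a sketch and not as a maintainer recommendation: give every message type a small trait implementation that writes its own element and delegates to its children. The `WriteXml` trait and the `Order`/`Item` types are hypothetical, and the sketch writes to a plain `std::io::Write` rather than quick-xml's `Writer`, so it runs without the crate. Only numeric attribute values are used, so no escaping is needed here.

```rust
use std::io::{self, Write};

// Hypothetical trait: each message type writes itself as one element.
trait WriteXml {
    fn write_xml<W: Write>(&self, w: &mut W) -> io::Result<()>;
}

struct Item {
    id: u32,
}

struct Order {
    id: u32,
    items: Vec<Item>,
}

impl WriteXml for Item {
    fn write_xml<W: Write>(&self, w: &mut W) -> io::Result<()> {
        // Leaf element: a single self-closing tag.
        write!(w, "<item id=\"{}\"/>", self.id)
    }
}

impl WriteXml for Order {
    fn write_xml<W: Write>(&self, w: &mut W) -> io::Result<()> {
        write!(w, "<order id=\"{}\">", self.id)?;
        // Nested elements are written by their own implementations.
        for item in &self.items {
            item.write_xml(w)?;
        }
        write!(w, "</order>")
    }
}
```

With roughly a hundred message types, the per-type code stays local to each `impl`, and a parent never needs to know how its children are serialized.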
Hi, I've started using this library for a personal project, and I've found that it's difficult to figure out how my code should be structured. I think it would be great if there were some docs that were a little more prescriptive about certain patterns you can use to accomplish certain goals (such as when you might want to use a state machine) or to provide clean abstractions in a larger non-trivial codebase.
One example: A pattern like this is really great for parsing nested objects using nested readers. (The provided nested reader example uses no abstractions - if you try to make it more sophisticated than it already is it would get messy very quickly).
Another example could be the state machine pattern used in this blog post: https://usethe.computer/posts/14-xmhell.html. The issue68 example is somewhat similar but the namespaces make it more difficult to understand what the general case might look like.
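To make the general case concrete, here is a minimal sketch of the state machine pattern without namespaces. The `Ev` enum is a hypothetical stand-in for quick-xml's `Event`, and the `item`/`name` element names are invented for illustration.

```rust
// Hypothetical stand-in for quick-xml's `Event`; not part of the library.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Start(String),
    Text(String),
    End(String),
}

#[derive(Debug)]
enum State {
    Outside,
    InItem,
    InName(String), // text accumulated so far inside <name>
}

// Collects the text of every <name> nested directly inside an <item>.
fn collect_names(events: Vec<Ev>) -> Result<Vec<String>, String> {
    let mut names = Vec::new();
    let mut state = State::Outside;
    for ev in events {
        // Each arm names one (state, event) pair the parser expects.
        state = match (state, ev) {
            (State::Outside, Ev::Start(n)) if n == "item" => State::InItem,
            (State::InItem, Ev::Start(n)) if n == "name" => State::InName(String::new()),
            (State::InName(mut acc), Ev::Text(t)) => {
                acc.push_str(&t);
                State::InName(acc)
            }
            (State::InName(acc), Ev::End(n)) if n == "name" => {
                names.push(acc);
                State::InItem
            }
            (State::InItem, Ev::End(n)) if n == "item" => State::Outside,
            // Anything else leaves the state unchanged.
            (state, _) => state,
        };
    }
    match state {
        State::Outside => Ok(names),
        other => Err(format!("input ended in state {:?}", other)),
    }
}
```

For example, the events for `<item><name>foo</name></item>` yield a single name, while input that stops inside `<item>` is reported as an error.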
If you're not keen on putting too much detail in the quick-xml docs, then maybe just linking to a few projects / blog posts which use quick-xml "well" would be a good idea, or explain some of the general principles.
A sidenote:
Nearly every project I've looked at has some kind of implementation of get_element_text or get_attribute (#146) or write_text_element. I actually think it might be a good idea to include them in the library outright, but otherwise, showing some basic helpers like these in the examples would be great as well.
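As an illustration of how small such a helper can be, here is a sketch of `write_text_element` over a plain `std::io::Write`. The signature and the `escape_text` helper are assumptions made for this example, not quick-xml API; with the library itself, the helper would presumably wrap start, text and end events passed to `Writer::write_event`.

```rust
use std::io::{self, Write};

// Escapes the five predefined XML special characters.
fn escape_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

// Writes `<name>text</name>` with the text content escaped.
// `name` is assumed to already be a valid element name.
fn write_text_element<W: Write>(w: &mut W, name: &str, text: &str) -> io::Result<()> {
    write!(w, "<{}>{}</{}>", name, escape_text(text), name)
}
```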