Merge consequent text and CDATA events into one string #520
Codecov Report

| Coverage Diff | master | #520 | +/- |
|---|---|---|---|
| Coverage | 61.25% | 63.20% | +1.94% |
| Files | 32 | 32 | |
| Lines | 15654 | 16439 | +785 |
| Hits | 9589 | 10390 | +801 |
| Misses | 6065 | 6049 | -16 |
I've also checked that XmlBeans is able to parse split numbers: `12<!--comment--><![CDATA[34]]>` is a valid representation of the integer `1234`.
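The XmlBeans check above can be mirrored with a small, std-only Rust sketch. The `Event` enum and the `merged_text` function are hypothetical stand-ins (not quick-xml's `Event` type or API); they only model how text and CDATA pieces separated by a comment are concatenated before the value is parsed as an integer.

```rust
/// Hypothetical, simplified XML events (not quick-xml's `Event` type).
#[derive(Debug)]
enum Event {
    Text(String),
    CData(String),
    Comment(String),
    End(String),
}

/// Concatenates consecutive text and CDATA pieces, skipping comments,
/// and stops at the first `End` event, which closes the text run.
fn merged_text(events: &[Event]) -> String {
    let mut out = String::new();
    for event in events {
        match event {
            Event::Text(t) | Event::CData(t) => out.push_str(t),
            Event::Comment(_) => {} // comments do not break the text run
            Event::End(_) => break,
        }
    }
    out
}

fn main() {
    // Models `12<!--comment--><![CDATA[34]]></value>`
    let events = vec![
        Event::Text("12".into()),
        Event::Comment("comment".into()),
        Event::CData("34".into()),
        Event::End("value".into()),
    ];
    let text = merged_text(&events);
    let number: u32 = text.parse().expect("merged text is an integer");
    println!("{text} -> {number}");
}
```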
I found that, because of lookahead, we can cache a trimmed event and return incorrect results (that is, trim spaces inside the string). More work is needed.
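A minimal std-only sketch of why trimming has to happen after merging rather than per event. The two helper functions are hypothetical and only model the symptom described above: trimming each piece before joining drops the whitespace at the boundary between pieces.

```rust
/// Trims each piece before joining: whitespace at the boundary between
/// pieces is lost, which corrupts the inside of the merged string.
fn trim_then_merge(pieces: &[&str]) -> String {
    pieces.iter().map(|p| p.trim()).collect()
}

/// Joins the pieces first and trims only the ends of the merged text,
/// so whitespace inside the resulting string is preserved.
fn merge_then_trim(pieces: &[&str]) -> String {
    let merged: String = pieces.concat();
    merged.trim().to_string()
}

fn main() {
    // Models `  hello <!--c-->  world  `: text, comment (dropped), text.
    let pieces = ["  hello ", "  world  "];
    println!("{:?}", trim_then_merge(&pieces)); // "helloworld"
    println!("{:?}", merge_then_trim(&pieces)); // "hello   world"
}
```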
87e2dc6 to e2ee98a
OK, it's ready for review now. Of course, it would be great if quick-xml provided an API to read correctly trimmed text events from … So I do not provide a low-level API, but provide a …
### Bug Fixes
- [#537]: Restore ability to deserialize attributes that represent XML namespace
  mappings (`xmlns:xxx`), which was broken since [#490]
- [#520]: Merge consecutive (delimited only by comments and processing instructions)
If this only applies to the serde deserializer, that should probably be noted here.
Also, if this is something that isn't practical to handle in the low-level API for the time being, then it would be good to document the limitation. And if it may be possible to do in the future (perhaps requiring API changes to do so) then we should clone #474.
I appreciate the detailed writeup of the issue, by the way.
I clearly indicated that this applies only to the deserializer. Is the documentation concern resolved by the last commit in this branch (16fca58)?
16fca58 to 43634b2
43634b2 to 30636d3
…ate events to DeEvents. Actual conversion will be added in follow-up commits.
We want to merge consecutive text and CDATA content into one text, and because only text parts can be escaped, we now always unescape them.
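The sketch below models that conversion with hypothetical `RawEvent` and `DeEvent` enums (simplified stand-ins, not quick-xml's types): text parts are unescaped, CDATA content is copied verbatim, comments and processing instructions are skipped, and the result is one merged `DeEvent::Text`. The `unescape` helper is also an assumption; it handles only the five predefined XML entities.

```rust
/// Hypothetical raw events as produced by a low-level reader.
enum RawEvent<'a> {
    Text(&'a str),  // escaped character data
    CData(&'a str), // literal content of <![CDATA[...]]>
    Comment(&'a str),
    Pi(&'a str),
    End(&'a str),
}

/// Hypothetical deserializer events: text and CDATA collapse into `Text`.
#[derive(Debug, PartialEq)]
enum DeEvent {
    Text(String),
    End(String),
}

const ENTITIES: [(&str, char); 5] =
    [("&lt;", '<'), ("&gt;", '>'), ("&amp;", '&'), ("&quot;", '"'), ("&apos;", '\'')];

/// Replaces the five predefined entities; anything else is kept as-is.
fn unescape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        let found = ENTITIES.iter().find(|(name, _)| rest.starts_with(*name)).copied();
        match found {
            Some((name, ch)) => {
                out.push(ch);
                rest = &rest[name.len()..];
            }
            None => {
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    out
}

/// Merges each run of text/CDATA into a single `DeEvent::Text`.
fn to_de_events(raw: &[RawEvent]) -> Vec<DeEvent> {
    let mut out = Vec::new();
    let mut text: Option<String> = None;
    for event in raw {
        match event {
            RawEvent::Text(t) => text.get_or_insert_with(String::new).push_str(&unescape(t)),
            RawEvent::CData(c) => text.get_or_insert_with(String::new).push_str(c),
            RawEvent::Comment(_) | RawEvent::Pi(_) => {} // do not split the run
            RawEvent::End(name) => {
                if let Some(t) = text.take() {
                    out.push(DeEvent::Text(t));
                }
                out.push(DeEvent::End(name.to_string()));
            }
        }
    }
    if let Some(t) = text.take() {
        out.push(DeEvent::Text(t));
    }
    out
}

fn main() {
    // Models `a &amp; b<!--c--><?pi?><![CDATA[ &amp; c]]></x>`
    let raw = [
        RawEvent::Text("a &amp; b"),
        RawEvent::Comment("c"),
        RawEvent::Pi("pi"),
        RawEvent::CData(" &amp; c"),
        RawEvent::End("x"),
    ];
    println!("{:?}", to_de_events(&raw));
}
```

Unescaping each text part before concatenation matters: unescaping the merged string instead would wrongly decode the literal `&amp;` that came from the CDATA section.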
failures (36):
- de::tests::merge_text::cdata_and_cdata
- de::tests::merge_text::cdata_and_text
- de::tests::merge_text::comment_between::cdata
- de::tests::merge_text::comment_between::cdata_and_cdata
- de::tests::merge_text::comment_between::cdata_and_text
- de::tests::merge_text::comment_between::empty_cdata_and_text
- de::tests::merge_text::comment_between::text
- de::tests::merge_text::comment_between::text_and_cdata
- de::tests::merge_text::comment_between::text_and_empty_cdata
- de::tests::merge_text::empty_cdata_and_text
- de::tests::merge_text::pi_between::cdata
- de::tests::merge_text::pi_between::cdata_and_cdata
- de::tests::merge_text::pi_between::cdata_and_text
- de::tests::merge_text::pi_between::empty_cdata_and_text
- de::tests::merge_text::pi_between::text
- de::tests::merge_text::pi_between::text_and_cdata
- de::tests::merge_text::pi_between::text_and_empty_cdata
- de::tests::merge_text::text_and_cdata
- de::tests::merge_text::text_and_empty_cdata
- de::tests::triples::cdata::cdata::cdata
- de::tests::triples::cdata::cdata::end
- de::tests::triples::cdata::cdata::eof
- de::tests::triples::cdata::cdata::start
- de::tests::triples::cdata::cdata::text
- de::tests::triples::cdata::text::cdata
- de::tests::triples::cdata::text::end
- de::tests::triples::cdata::text::eof
- de::tests::triples::cdata::text::start
- de::tests::triples::start::cdata::cdata
- de::tests::triples::start::cdata::text
- de::tests::triples::start::text::cdata
- de::tests::triples::text::cdata::cdata
- de::tests::triples::text::cdata::end
- de::tests::triples::text::cdata::eof
- de::tests::triples::text::cdata::start
- de::tests::triples::text::cdata::text
fixed:
- de::tests::merge_text::comment_between::text_and_empty_cdata
- de::tests::merge_text::pi_between::text_and_empty_cdata
- de::tests::merge_text::text_and_empty_cdata
30636d3 to ac1ad0c
Hello, since my question is closely related and issues have been opened for this before, I'm posting my message here, as it feels more relevant. To be clear, I am only talking here about …
It is possible to add a setting to enable or disable trimming. I'll accept such a PR. To be clear: …
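If such a setting were added, one reasonable design is to apply it to the already merged text, so whitespace between merged pieces is never affected. A minimal sketch, assuming hypothetical `TrimConfig` and `apply_trim` names that are not part of quick-xml:

```rust
/// Hypothetical trimming settings (not an existing quick-xml type).
#[derive(Debug, Clone, Copy)]
struct TrimConfig {
    trim_start: bool,
    trim_end: bool,
}

impl Default for TrimConfig {
    /// Trimming enabled on both ends by default (an assumption).
    fn default() -> Self {
        Self { trim_start: true, trim_end: true }
    }
}

/// Applies the configured trimming to already merged text; inner
/// whitespace is never touched, only the two ends.
fn apply_trim(text: &str, config: TrimConfig) -> &str {
    let mut s = text;
    if config.trim_start {
        s = s.trim_start();
    }
    if config.trim_end {
        s = s.trim_end();
    }
    s
}

fn main() {
    let text = "  keep  inner  spaces  ";
    let none = TrimConfig { trim_start: false, trim_end: false };
    println!("{:?}", apply_trim(text, TrimConfig::default())); // "keep  inner  spaces"
    println!("{:?}", apply_trim(text, none)); // "  keep  inner  spaces  "
}
```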
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [quick-xml](https://github.com/tafia/quick-xml) | dependencies | minor | `0.27.1` -> `0.28.0` |

---

### Release Notes

<details>
<summary>tafia/quick-xml</summary>

### [`v0.28.0`](https://github.com/tafia/quick-xml/blob/HEAD/Changelog.md#0280----2023-03-13)

[Compare Source](tafia/quick-xml@v0.27.1...v0.28.0)

##### New Features

- [#541]: (De)serialize specially named `$text` enum variant in [externally tagged] enums to / from textual content
- [#556]: `to_writer` and `to_string` now accept `?Sized` types
- [#556]: Add new `to_writer_with_root` and `to_string_with_root` helper functions
- [#520]: Add methods `BytesText::inplace_trim_start` and `BytesText::inplace_trim_end` to trim leading and trailing spaces from text events
- [#565]: Allow deserializing special field names `$value` and `$text` into borrowed fields when using the serde deserializer
- [#568]: Rename `Writer::inner` to `Writer::get_mut`
- [#568]: Add method `Writer::get_ref`
- [#569]: Rewrite `Reader::read_event_into_async` as an async fn, making the future `Send` if possible
- [#571]: Borrow element names (`<element>`) when deserializing with serde. This change allows deserializing into `HashMap<&str, T>`, for example
- [#573]: Add basic support for async byte writers via tokio's `AsyncWrite`

##### Bug Fixes

- [#537]: Restore ability to deserialize attributes that represent XML namespace mappings (`xmlns:xxx`), which was broken since [#490]
- [#510]: Fix an error when deserializing `Option<T>` fields where `T` is some sequence type (for example, `Vec` or tuple)
- [#540]: Fix a compilation error (probably a rustc bug) in some circumstances. `Serializer::new` and `Serializer::with_root` now accept only references to `Write`rs.
- [#520]: Merge consecutive (delimited only by comments and processing instructions) texts and CDATA when deserializing using the serde deserializer. `DeEvent::Text` and `DeEvent::CData` events were replaced by `DeEvent::Text` with merged content. The same behavior is not (yet?) implemented for the `Reader` and should be implemented manually
- [#562]: Correctly set minimum required version of memchr dependency to 2.1
- [#565]: Correctly set minimum required version of tokio dependency to 1.10
- [#565]: Fix compilation error when building with serde <1.0.139

[externally tagged]: https://serde.rs/enum-representations.html#externally-tagged
[#490]: tafia/quick-xml#490
[#510]: tafia/quick-xml#510
[#520]: tafia/quick-xml#520
[#537]: tafia/quick-xml#537
[#540]: tafia/quick-xml#540
[#541]: tafia/quick-xml#541
[#556]: tafia/quick-xml#556
[#562]: tafia/quick-xml#562
[#565]: tafia/quick-xml#565
[#568]: tafia/quick-xml#568
[#569]: tafia/quick-xml#569
[#571]: tafia/quick-xml#571
[#573]: tafia/quick-xml#573

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: cabr2-bot
Co-authored-by: crapStone
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1818
Reviewed-by: crapStone
Co-authored-by: Calciumdibromid Bot
Co-committed-by: Calciumdibromid Bot
This PR fixes #474 and introduces a way to read the current parser configuration, which was previously impossible.
I've changed how the configuration is accessed and modified: instead of having functions that change configuration flags, readers now provide a reference to a `Config` object. Both immutable and mutable references are provided. This new feature is used to temporarily disable trimming while reading text events in the serde `Deserializer`.

After fixing #516, all configuration flags are safe to change at any time, because they do not change the internal state of a reader in a user-visible way. For example, `expand_empty_elements` changes the internal state of a reader, but that change is rolled back after the next call to `read_event`, so the user cannot see its consequences: it is safe to disable this setting just after reading a fake `Start` event and still get a fake `End` event after that.
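The reasoning about `expand_empty_elements` can be modelled with a toy reader. `Config`, `Reader`, `config()`, `config_mut()` and `read_event()` below are simplified, hypothetical stand-ins for the API described above, not quick-xml's implementation; the input is pre-tokenized into empty element names to keep the sketch short. It shows that the fake `End` is still delivered after the setting is switched off, because the pending state is recorded when the fake `Start` is produced and cleared on the next call.

```rust
/// Hypothetical reader configuration (simplified model, not quick-xml's `Config`).
#[derive(Debug, Clone, Default)]
struct Config {
    expand_empty_elements: bool,
}

#[derive(Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Empty(String),
    Eof,
}

/// Toy reader over a list of empty element names, e.g. `<a/><b/>`.
struct Reader {
    config: Config,
    input: Vec<String>,
    pos: usize,
    /// Set when a fake `Start` is emitted; the matching fake `End` is
    /// returned on the next call regardless of the current config.
    pending_end: Option<String>,
}

impl Reader {
    fn new(input: &[&str]) -> Self {
        Self {
            config: Config::default(),
            input: input.iter().map(|s| s.to_string()).collect(),
            pos: 0,
            pending_end: None,
        }
    }

    /// Immutable access to the configuration.
    fn config(&self) -> &Config {
        &self.config
    }

    /// Mutable access; flags may be changed between any two reads.
    fn config_mut(&mut self) -> &mut Config {
        &mut self.config
    }

    fn read_event(&mut self) -> Event {
        // The internal state change is rolled back here, on the next call.
        if let Some(name) = self.pending_end.take() {
            return Event::End(name);
        }
        let Some(name) = self.input.get(self.pos).cloned() else {
            return Event::Eof;
        };
        self.pos += 1;
        if self.config.expand_empty_elements {
            self.pending_end = Some(name.clone());
            Event::Start(name)
        } else {
            Event::Empty(name)
        }
    }
}

fn main() {
    let mut reader = Reader::new(&["a", "b"]);
    reader.config_mut().expand_empty_elements = true;
    println!("{:?}", reader.read_event()); // Start("a"), a fake Start
    // Disable expansion right after the fake Start ...
    reader.config_mut().expand_empty_elements = false;
    println!("{:?}", reader.config());
    println!("{:?}", reader.read_event()); // End("a"): still delivered
    println!("{:?}", reader.read_event()); // Empty("b")
    println!("{:?}", reader.read_event()); // Eof
}
```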