Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async support v2 #417

Closed
wants to merge 7 commits into from
Closed

async support v2 #417

wants to merge 7 commits into from

Conversation

999eagle
Copy link
Contributor

As discussed in #314, this is a new attempt to add support for async reading.

This PR is still WIP and open for discussion on implementation details.

@Mingun
Copy link
Collaborator

Mingun commented Jul 11, 2022

Response to #314 (comment)

Actually, I'll plan to merge similar traits XmlRead and XmlSource into one and use approach with a special intermediate struct SliceReader / IoReader instead of directly implementing XmlSource for BufRead and &[u8]. With that approach it should be possible to have an AsyncSliceReader / AsyncIoReader (which would wrap only a inner reader) for which new traits will be implemented.

So it is good time for you to investigate that surface and how it can be integrated with async support

@999eagle
Copy link
Contributor Author

Okay, I've moved a lot of stuff around to hopefully separate some things better. I've also implemented async support on the existing Reader. The following things have been changed as of now:

  • The Reader is now generic over the parser in use. This separates namespaced access and non-namespaced access through type checking at compile time.
  • To implement this, there is now a Parser-trait, implemented by a DefaultParser and a NamespacedParser.
  • The Reader is now built via a builder-pattern so no changes to the parser settings (like whether whitespace is trimmed or comments/end tags are checked) are possible after being built. This does change the public API and thus is a breaking change but allows the Parser separation and thus eliminates a lot of duplicate code.
  • There is now an AsyncXmlSource trait that's automatically implemented for everything that implements tokio's AsyncBufRead trait.
  • The Reader now has some read_..._async methods. Using the same names for sync and async methods sadly won't work as that would require to tell Rust that no type can ever implement both XmlSource and AsyncXmlSource (or BufRead and AsyncBufRead) at the same time which is (for now) impossible.
  • There is a new test_async test which is rather bare-bones as of now but I'll work on that.
  • Currently, tests with the serialize feature are broken because I haven't changed the deserialization to work with the new Reader yet.

@Mingun
Copy link
Collaborator

Mingun commented Jul 12, 2022

1852c70 is a pretty big change which is hard to review. Could you split this commit into several? At least I can suggest extract introducing builder pattern into a separate commit. Also, can you rebase your work and squash up all fixup changes as much as possible before the finishing work (for example, you've changed AsyncXmlSource trait in the commit where you are actually implement it. Actually, it would be better to introduce this trait in the same commit where you implement it).

Also, some changes seems to a little complicated to me. Actually I was thinking about something like (signatures approximate, just to illustrate the idea):

struct SliceReader<'i>(&'i [u8]);
struct IoReader<R: BufRead>(R);
struct AsyncIoReader<R: AsyncBufRead>(R);

/// The existing Reader struct
pub struct Reader<R> {
  reader: R,
  // ...
}

impl<R> Reader<R> {
  // common parsing code, setup code
}

impl<'i> Reader<SliceReader<'i>> {
  pub fn read_event(&mut self) { /* ... */ }
  pub fn read_event_async(&mut self) { /* ... */ }
}

impl<R: BufRead> Reader<IoReader<R>> {
  pub fn read_event_into(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

impl<R: AsyncBufRead> Reader<AsyncIoReader<R>> {
  pub fn read_event_into_async(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

////////////////////////////////////////////////////////////////////////////

/// The struct that I would introduce in my namespaces PR
pub struct NsReader<R> {
  reader: Reader<R>,
  // ...
}

impl<R> NsReader<R> {
  // common parsing code, setup code
}

impl<'i> NsReader<SliceReader<'i>> {
  pub fn read_event(&mut self) { /* ... */ }
  pub fn read_event_async(&mut self) { /* ... */ }
}

impl<R: BufRead> NsReader<IoReader<R>> {
  pub fn read_event_into(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

impl<R: AsyncBufRead> NsReader<AsyncIoReader<R>> {
  pub fn read_event_into_async(&mut self, buf: &mut Vec<u8>) { /* ... */ }
}

@999eagle
Copy link
Contributor Author

Okay, so you'd prefer having the NsReader<R> wrap the "standard" Reader<R> and then implement the basic parsing functions like that. I think that would work too for async support. Your example code also greatly simplifies reading as the generic bound for the buffer in XmlSource doesn't exist anymore and the different wrapper structs allow for strictly separating sync/async access.
With that approach, I think it's better to abandon my current code entirely and start anew. I'll try again from the current master using your approach.

@999eagle 999eagle force-pushed the feature/async branch 6 times, most recently from ad259c9 to 208d801 Compare July 13, 2022 15:04
@999eagle
Copy link
Contributor Author

@Mingun I think I've reimplemented this now the way you suggested and with minimal unrelated changes. The builder pattern can still be added again in a later PR. I also didn't separate the namespaced read methods in this PR but it should be relatively easy to add.

@codecov-commenter
Copy link

codecov-commenter commented Jul 13, 2022

Codecov Report

Merging #417 (6b35382) into master (a6588c2) will increase coverage by 0.90%.
The diff coverage is 73.44%.

@@            Coverage Diff             @@
##           master     #417      +/-   ##
==========================================
+ Coverage   49.58%   50.48%   +0.90%     
==========================================
  Files          22       26       +4     
  Lines       13935    14429     +494     
==========================================
+ Hits         6909     7284     +375     
- Misses       7026     7145     +119     
Flag Coverage Δ
unittests 50.48% <73.44%> (+0.90%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
benches/macrobenches.rs 0.00% <0.00%> (ø)
benches/microbenches.rs 0.00% <0.00%> (ø)
examples/custom_entities.rs 0.00% <0.00%> (ø)
examples/read_buffered.rs 0.00% <0.00%> (ø)
examples/read_texts.rs 0.00% <0.00%> (ø)
src/writer.rs 49.01% <0.00%> (-0.28%) ⬇️
src/lib.rs 12.74% <16.66%> (+0.41%) ⬆️
src/de/mod.rs 77.38% <50.00%> (+0.09%) ⬆️
src/reader/async_reader.rs 65.66% <65.66%> (ø)
src/reader/io_reader.rs 77.89% <77.89%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6588c2...6b35382. Read the comment docs.

Copy link
Collaborator

@Mingun Mingun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big change, so now I made only a quick overview. I'll plan to made a second review, probably after my remarks will be addressed.

Fist of all, this variant looks much more simpler and better that a previous attempt. For the maintenance purposes I prefer that each commit was as smaller as possible and focusing on the one thing. Your f9cdf27 commit actually does the much more that just splitting reader. I suggest to split it for a several commits:

  • the one that changes usage of buffered methods in tests and examples into borrowing ones
  • the one that introduces a helper macros which conditionally includes await and async keywords
  • the one that moves code to the separate files
  • the one that actually implements async reader. The adding tests commit then probably can be included in that commit
  • any other that you'll decide to made to keep the review burden low

Please ensure, that after each commit code in the buildable state by running commands that CI run and additionally cargo doc --all-features. If some tests will temporary fail, please include the list of failed tests to the commit message (just copy output of cargo test after "failures:")

Try to keep copies of code low. Use helper functions / traits / macros if needed. For example, the original read_bang_element probably contains a bug, because read are always >=1 but there is read == 0 check, I do not want to copy it over the whole project.

Also add an entry to the Changelog.md.

It also will be a great to run benchmarks for the async code and compare results with the sync code, so it will be a plus if you add these benchmarks.

@dralley, I also would to hear your opinion

src/reader.rs Outdated

#[cfg(feature = "async")]
pub use self::async_reader::AsyncReader;
pub use self::{io_reader::IoReader, slice_reader::SliceReader};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to not group imports

Suggested change
pub use self::{io_reader::IoReader, slice_reader::SliceReader};
pub use self::io_reader::IoReader;
pub use self::slice_reader::SliceReader;

/// loop {
/// match reader.read_event_into(&mut buf) {
/// match reader.read_event() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea to use borrowing reading in tests and examples, but I also want to show the buffering variant somewhere, so the users know, how to use it.

Besides, this is a quite isolated change and I prefer to have it as a separate commit. This will make the history much more clean and allow to focus on really important things when doing commit-by-commit review

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with this, the README does have an example, but it would be helpful for the documentation to have one readily available as well.

src/reader.rs Outdated
use crate::errors::Error;
use crate::reader::{BangType, XmlSource};
use crate::reader::{BangType};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use crate::reader::{BangType};
use crate::reader::BangType;

and in other similar places

Comment on lines 8 to 17
use tokio::{
fs::File,
io::{self, AsyncBufRead, AsyncBufReadExt, AsyncRead, BufReader},
};

use crate::{
events::{BytesText, Event},
name::{QName, ResolveResult},
Error, Result,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to not group imports

Suggested change
use tokio::{
fs::File,
io::{self, AsyncBufRead, AsyncBufReadExt, AsyncRead, BufReader},
};
use crate::{
events::{BytesText, Event},
name::{QName, ResolveResult},
Error, Result,
};
use tokio::fs::File;
use tokio::io::{self, AsyncBufRead, AsyncBufReadExt, AsyncRead, BufReader};
use crate::events::{BytesText, Event};
use crate::name::{QName, ResolveResult};
use crate::{Error, Result};

Also, could you add a brief documentation for a module?

let available = match self.fill_buf().await {
Ok(n) if n.is_empty() => break,
Ok(n) => n,
Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this ref required?

Suggested change
Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,

Comment on lines 1 to 12
use std::{
fs::File,
io::{self, BufRead, BufReader, Read},
ops::{Deref, DerefMut},
path::Path,
};

use crate::{
events::{BytesText, Event},
name::{QName, ResolveResult},
Error, Result,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to not group imports

Suggested change
use std::{
fs::File,
io::{self, BufRead, BufReader, Read},
ops::{Deref, DerefMut},
path::Path,
};
use crate::{
events::{BytesText, Event},
name::{QName, ResolveResult},
Error, Result,
};
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};
use std::ops::{Deref, DerefMut};
use std::path::Path;
use crate::events::{BytesText, Event};
use crate::name::{QName, ResolveResult};
use crate::{Error, Result};

Also, could you add a brief documentation for a module?

Comment on lines 6 to 10
use crate::{
events::{BytesText, Event},
name::{QName, ResolveResult},
Error, Result,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to not group imports

Suggested change
use crate::{
events::{BytesText, Event},
name::{QName, ResolveResult},
Error, Result,
};
use crate::events::{BytesText, Event};
use crate::name::{QName, ResolveResult};
use crate::{Error, Result};

Also, could you add a brief documentation for a module?

let mut buf = Vec::new();
let mut count = 0;
loop {
match reader.read_event_into(&mut buf).await.unwrap() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that function was read_event_into_async, because a reader usually will not see a reader type

#[tokio::test]
async fn test_sample() {
let path = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
let mut reader = Reader::from_file_async(path.join("tests/documents/sample_rss.xml"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use include_bytes! or include_str! unless this test tests the from_file_async method.

Cargo.toml Outdated
@@ -94,6 +97,9 @@ serialize = ["serde"]
## Enables support for recognizing all [HTML 5 entities](https://dev.w3.org/html5/html-author/charref)
escape-html = []

## Enable support for asynchronous reading
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more details here? References to corresponding builder methods would be enough

@Mingun Mingun requested a review from dralley July 13, 2022 19:15
@999eagle 999eagle force-pushed the feature/async branch 2 times, most recently from 3c2b0a6 to 6b35382 Compare July 14, 2022 10:57
@999eagle
Copy link
Contributor Author

Okay, I think I have addressed all your concerns now. Feel free to mark your discussions as resolved where it applies and keep them open otherwise. I've added some docs, updated examples and split the commits as best as I could.

@999eagle 999eagle marked this pull request as ready for review July 14, 2022 12:29
@999eagle 999eagle force-pushed the feature/async branch 2 times, most recently from 9df6f20 to 2af8199 Compare July 15, 2022 11:14
@999eagle
Copy link
Contributor Author

Rebased onto current master to resolve conflicts.

@999eagle 999eagle changed the title WIP: async support v2 async support v2 Jul 15, 2022
@dralley
Copy link
Collaborator

dralley commented Jul 16, 2022

@999eagle Apologies for all the breakages!

Thank you for splitting apart the commits - I actually suggest to go a step further and break off anything in this PR which is independent of the fundamental changes (e.g. maybe commits #1 and #4) into a separate PR that can be merged easily. Likewise if there are fundamental changes which aren't tied directly to the async-ones, we can possibly review and merge that separately.

The motivation is just to reduce the amount of the unmerged work and prevent merge conflicts. There are other fundamental changes being worked on simultaneously right now (see #158) and reconciling the massive patch sets together will not be a pleasant experience unless it is done in smaller pieces.

@999eagle
Copy link
Contributor Author

Hi @dralley, while those two commits are somewhat independent of the other changes, they don't really do anything useful on their own. The first commit is nothing but a requirement for testing the changes of the third commit, which in turn requires an example for explicitly buffered access for files as most examples just read strings or byte slices and thus don't have buffered reading anymore.
In my opinion, it'd make sense to split this PR into two separate PRs at most: Splitting the Reader into the explicit SliceReader and IoReader (that is, commits 1 through 4 from this current PR) for the first PR and all the async-specific implementation in a second PR. If that's what you want to do, tell me and I'll open a new PR for the non-async changes.

@dralley
Copy link
Collaborator

dralley commented Jul 17, 2022

@999eagle Yes, that makes perfect sense, and it would be helpful.

}
}

/// Private functions for a [`Reader`] based on an [`IoReader`].
Copy link
Collaborator

@dralley dralley Jul 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo I believe?

Suggested change
/// Private functions for a [`Reader`] based on an [`IoReader`].
/// Private functions for a [`Reader`] based on a [`SliceReader`].

@dralley
Copy link
Collaborator

dralley commented Jul 17, 2022

Almost done reviewing the first 4 commits (which as you said will be split into the new PR), and it is looking fine so far. I really appreciate all the work you've put in to get this cleaned up.

@999eagle
Copy link
Contributor Author

@dralley I've opened #425 for the first few changes. I've also rebased onto the current master. As this PR is now based on #425, it must be merged afterwards.

@dralley
Copy link
Collaborator

dralley commented Jul 27, 2022

This is relevant: https://blog.rust-lang.org/inside-rust/2022/07/27/keyword-generics.html

Anything that comes of that initiative would be months or years down the road, but it might be worthwhile to provide feedback and examples early in the process.

@Mingun Mingun mentioned this pull request Jul 31, 2022
@Mingun Mingun closed this in #450 Aug 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants