Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change serialization crate #105

Closed
pwinckles opened this issue Jul 27, 2020 · 9 comments
Closed

Change serialization crate #105

pwinckles opened this issue Jul 27, 2020 · 9 comments
Assignees

Comments

@pwinckles
Copy link

pwinckles commented Jul 27, 2020

First, thanks for writing this crate! I tried it after being disappointed with the get object performance in rusoto, and your implementation is significantly faster*.

Is your feature request related to a problem? Please describe.

Listing objects does not work against at least one non-AWS S3 implementation that I tested. rust-s3 fails to deserialize the results because the response XML contained Contents elements interleaved with CommonPrefixes elements. This is valid XML, however, it is not supported by serde.

rusoto supports this format through its own custom deserializer that you can see in this file, search for "impl ListObjectsV2OutputDeserializer".

Describe the solution you'd like

I fiddled around with it for a while, and was eventually able to get it to work by changing the serialization library to yaserde, a serde fork that specializes in XML.

I am happy to submit a PR using yaserde, but I'm not going to waste my time if there is no interest in changing crates. I understand that is likely not an issue when interacting with AWS S3, and is therefore unlikely to be very high priority.

Describe alternatives you've considered

I also tried using quick-xml integrated with serde, but it doesn't work either because the limitation is in the core serde crate.

[Edit] *Turns out the performance issues I noticed with rusoto are only present in debug builds. rusoto and rust-s3 perform similarly when rusoto is compiled as release.

@durch
Copy link
Owner

durch commented Jul 27, 2020

@pwinckles Thank you for the feedback, it feels good to provide value to folks :)

As for changing the crate, I'm not at all opposed. Its been a while since any work went into serialisation, and if another crate works better that is great. That goes double if you're interested in PRing the feature.

I'd suggest starting with the minimum change needed and putting it behind a feature flag. This would allow folks to opt out of an extra dependency if they're otherwise heavily invested into serde.

@pwinckles
Copy link
Author

@durch Cool, I'll put a PR together sometime this week using a feature flag.

There is a slight catch that you may be less excited about. yaserde does not presently support fields defined as Option<Vec<..>>, which is how common_prefixes is defined. The recommendation is to simply define the field as Vec<..>. I had to change the common_prefixes definition to get it to work. Leaving off the Option makes sense to me, but it would be a breaking change to your API.

@durch
Copy link
Owner

durch commented Jul 28, 2020

No worries, it is what it is, as long as its behind a feature flag we should be good :)

@pwinckles
Copy link
Author

@durch Sorry, I'm going to back out of doing this now. I was able to get the performance issues with Rusoto resolved, so I'm just going to use it because it's the path of least resistance.

Feel free to close this issue. I doubt anyone else cares because I suspect that AWS doesn't return interleaved results.

@durch
Copy link
Owner

durch commented Jul 30, 2020

No worries, I'll look into it, while it is a niche need no reason for it no to be served :)

@pwinckles
Copy link
Author

Okay, then to save you a little time because yaserde's documentation is lacking, here's what my ListBucketResult looks like:

/// The parsed result of a s3 bucket listing
#[derive(Deserialize, YaDeserialize, Debug, Clone)]
#[yaserde(root = "ListBucketResult", prefix = "s3", namespace = "s3: http://s3.amazonaws.com/doc/2006-03-01/")]
pub struct ListBucketResult {
    #[serde(rename = "Name")]
    #[yaserde(prefix = "s3", rename = "Name")]
    /// Name of the bucket.
    pub name: String,
    #[serde(rename = "NextMarker")]
    #[yaserde(prefix = "s3", rename = "NextMarker")]
    /// When the response is truncated (that is, the IsTruncated element value in the response
    /// is true), you can use the key name in this field as a marker in the subsequent request
    /// to get next set of objects. Amazon S3 lists objects in UTF-8 character encoding in
    /// lexicographical order.
    pub next_marker: Option<String>,
    #[serde(rename = "Delimiter")]
    #[yaserde(prefix = "s3", rename = "Delimiter")]
    /// A delimiter is a character you use to group keys.
    pub delimiter: Option<String>,
    #[serde(rename = "MaxKeys")]
    #[yaserde(prefix = "s3", rename = "MaxKeys")]
    /// Sets the maximum number of keys returned in the response body.
    pub max_keys: i32,
    #[serde(rename = "Prefix")]
    #[yaserde(prefix = "s3", rename = "Prefix")]
    /// Limits the response to keys that begin with the specified prefix.
    pub prefix: String,
    #[serde(rename = "Marker")]
    #[yaserde(prefix = "s3", rename = "Marker")]
    /// Indicates where in the bucket listing begins. Marker is included in the response if
    /// it was sent with the request.
    pub marker: Option<String>,
    #[serde(rename = "EncodingType")]
    #[yaserde(prefix = "s3", rename = "EncodingType")]
    /// Specifies the encoding method to used
    pub encoding_type: Option<String>,
    #[serde(rename = "IsTruncated", deserialize_with = "super::deserializer::bool_deserializer")]
    #[yaserde(prefix = "s3", rename = "IsTruncated")]
    ///  Specifies whether (true) or not (false) all of the results were returned.
    ///  If the number of results exceeds that specified by MaxKeys, all of the results
    ///  might not be returned.
    pub is_truncated: bool,
    #[serde(rename = "NextContinuationToken", default)]
    #[yaserde(prefix = "s3", rename = "NextContinuationToken")]
    pub next_continuation_token: Option<String>,
    #[serde(rename = "Contents", default)]
    #[yaserde(prefix = "s3", rename = "Contents")]
    /// Metadata about each object returned.
    pub contents: Vec<Object>,
    #[serde(rename = "CommonPrefixes", default)]
    #[yaserde(prefix = "s3", rename = "CommonPrefixes")]
    /// All of the keys rolled up into a common prefix count as a single return when
    /// calculating the number of returns.
    pub common_prefixes: Vec<CommonPrefix>,
}

As far as I could tell, every element must have its namespace explicitly defined, and every struct must be annotated with #[yaserde(prefix = "s3", namespace = "s3: http://s3.amazonaws.com/doc/2006-03-01/")]. I was unable to find a way to default it.

@durch
Copy link
Owner

durch commented Sep 25, 2022

@pwinckles could you please check if we're still seeing issues in 0.33, there have been some improvements to the deserialisation crate

@pwinckles
Copy link
Author

@durch Unfortunately, I am not able to verify the behavior at this time. I am no longer working for the organization that was using the non-AWS S3 implementation that was returning the interleaved elements. Sorry!

@durch
Copy link
Owner

durch commented Sep 26, 2022

@pwinckles no worries, thank you :), will close this for now

@durch durch closed this as completed Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants