Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional tooling access methods, trait impls, and a bug fix #780

Merged
merged 3 commits into from
May 31, 2024

Conversation

zslayton
Copy link
Contributor

This PR includes a variety of small fixes and improvements that I found myself needing as I went to upgrade the ion-cli to HEAD. I'd like to include these in the v1.0-rc4 release. I have completed the ion-cli upgrade in my local workspace using this branch, so I'm reasonably confident that these are the last items I need to get in.

  • Renames the RawReaderType enum to IonEncoding because it is generally useful in other contexts too.
  • Adds the ability for system and raw readers using AnyEncoding to report the encoding they're currently using.
  • Adds value_span and annotations_span methods to LazyValue. It already had a span method whose bytes included both the annotations and the value.
  • The binary 1.0 and 1.1 writers will now write f64s as f32s to save space when it can be done losslessly.
  • Adds WriteAsIon implementations for LazyList, LazySExp, and LazyStruct.
  • Adds IntoIterator impls for LazyList, LazySExp and LazyStruct. Previously, they only existed for borrowed (&) references to those types.
  • Fixes a bug in the StreamingRawReader that could cause a value's annotations span to be overwritten if reading that value consumed all of the data remaining in the buffer.
  • Renames the feature-gated LazyValue::lower method (which returns a LazyExpandedValue) to LazyValue::expanded() so I could add a raw() method alongside it that returns the underlying LazyRawValue when applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

* Renames the `RawReaderType` enum to `IonEncoding` because it is
  generally useful in other contexts too.
* Adds the ability for system and raw readers using `AnyEncoding`
  to report the encoding they're currently using.
* Adds `value_span` and `annotations_span` methods to `LazyValue`
  in addition to `span`, which includes both the annotations and
  the value.
* The binary 1.0 and 1.1 writers will now write `f64`s as `f32`s
  to save space when it can be done losslessly.
* Adds `WriteAsIon` implementations for `LazyList`, `LazySExp`,
  and `LazyStruct`.
* Adds `IntoIterator` impls for `LazyList`, `LazySExp` and
  `LazyStruct`. Previously, they only existed for borrowed (`&`)
  references to those types.
* Fixes a bug in the `StreamingRawReader` that could cause
  a value's annotations span to be overwritten if reading that
  value consumed all of the data remaining in the buffer.
* Renames the feature-gated `LazyValue::lower` method (which
  returns a `LazyExpandedValue`) to `LazyValue::expanded()`
  so I could add a `raw()` method alongside it that returns
  the underlying `LazyRawValue` when applicable.
@zslayton zslayton requested review from popematt and nirosys May 31, 2024 16:51
Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

Comment on lines -70 to +68
type ReaderSavedState = RawReaderType;
type ReaderSavedState = IonEncoding;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This type (an enum of the four Ion encodings) is now returned by the new (feature gated) SystemReader<AnyEncoding, _>::detected_encoding() method, allowing users to determine what encoding is currently being read.

Comment on lines +755 to +771
fn annotations_span(&self) -> Span<'top> {
match &self.encoding {
LazyRawValueKind::Text_1_0(v) => v.annotations_span(),
LazyRawValueKind::Binary_1_0(v) => v.annotations_span(),
LazyRawValueKind::Text_1_1(v) => v.annotations_span(),
LazyRawValueKind::Binary_1_1(v) => v.annotations_span(),
}
}

fn value_span(&self) -> Span<'top> {
match &self.encoding {
LazyRawValueKind::Text_1_0(v) => v.value_span(),
LazyRawValueKind::Binary_1_0(v) => v.value_span(),
LazyRawValueKind::Text_1_1(v) => v.value_span(),
LazyRawValueKind::Binary_1_1(v) => v.value_span(),
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The span() method returns a byte slice containing the complete value--both annotations and data.

These methods allow you to access the annotations and data separately.

Comment on lines 982 to 990
let mut expected_encoding = vec![];
let float32 = *value as f32;
if float32 as f64 == *value {
expected_encoding.push(0x5C);
expected_encoding.extend_from_slice(&float32.to_le_bytes()[..]);
} else {
expected_encoding.push(0x5D);
expected_encoding.extend_from_slice(&value.to_le_bytes()[..]);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ We have a lot of unit tests where we re-invent the wheel for the sake of bootstrapping the encoder/decoder. We should look at migrating some of the more painful ones (like this one) to a roundtrip test using our recently established readers and writers.

Comment on lines +299 to +301
List(l) => value_writer.write(l),
SExp(s) => value_writer.write(s),
Struct(s) => value_writer.write(s),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Giving these types their own WriteAsIon implementations streamlined this code path nicely, but my real aim was to make passing them to a writer in other contexts easier.

@@ -60,6 +60,8 @@ pub trait Encoding: Encoder + Decoder {
) -> IonResult<W> {
Self::default_write_config().encode_all_to(output, values)
}

fn encoding(&self) -> IonEncoding;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Types implementing the Encoding trait now must report which variant of the IonEncoding enum they correspond to.

@@ -282,6 +297,15 @@ impl<'a, 'top, 'data: 'top, D: Decoder> IntoIterator for &'a LazySExp<'top, D> {
}
}

impl<'top, 'data: 'top, D: Decoder> IntoIterator for LazySExp<'top, D> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Now that LazyList, LazySExp, and LazyStruct are Copy, it makes sense to allow them to be iterated over without having to borrow them first.

Comment on lines -86 to -89
// If we've exhausted the buffer...
if bytes_read >= available_bytes.len() || matches!(result, Err(IonError::Incomplete(_)))
{
// ...try to pull more data from the data source. If we get more data, try again.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This was a subtle bug enabled by my use of unsafe. If result was Ok(_) but also consumed all of the data in available_bytes, then this code path would have input shift its bytes to the beginning and pull in more data. Because result contained an Ok(value) referring to a slice of that buffer, its data would get clobbered.

@@ -319,6 +352,14 @@ impl<'a> IonInput for StdinLock<'a> {
}
}

impl<T: Read, U: Read> IonInput for io::Chain<T, U> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The ion-cli uses the io::Chain type to link together the first few bytes of an input stream (used to detect gzip, zstd, etc) with the rest of the stream.

Copy link
Contributor

@popematt popematt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed there were some functions where you removed the feature gate, and others where you kept the feature gate. How did you decide for which functions to remove vs retain the feature gate?

src/lazy/encoder/binary/v1_0/value_writer.rs Show resolved Hide resolved
src/lazy/encoder/binary/v1_1/value_writer.rs Outdated Show resolved Hide resolved
src/lazy/streaming_raw_reader.rs Show resolved Hide resolved
src/lazy/streaming_raw_reader.rs Show resolved Hide resolved
@zslayton
Copy link
Contributor Author

I noticed there were some functions where you removed the feature gate, and others where you kept the feature gate. How did you decide for which functions to remove vs retain the feature gate?

Some methods are on a fuzzy line between being reader-writer and tooling-apis. For example, LazyExpandedValue::context returns the EncodingContextRef used to decode that value. It's definitely part of the reader-writer API, but whether it's specifically tooling-access is debatable.

I decided I was ok with removing the feature gate from methods like that because they're still behind some feature gate, so either way there's no guarantee of stability being made.

@zslayton zslayton requested a review from popematt May 31, 2024 19:13
if value == 0f32 && !value.is_sign_negative() {
self.push_byte(0x5A);
return Ok(());
match value.smallest_repr() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we add support for f16, we could also DRY it up like this:

fn write_float_repr(mut self, value: FloatRepr) -> IonResult<()> {
    match value {
        // ...
    }
    Ok(())
}

pub fn write_f16(mut self, value: f16) -> IonResult<()> {
    self.write_float_repr(value.smallest_repr())
}

pub fn write_f32(mut self, value: f32) -> IonResult<()> {
    self.write_float_repr(value.smallest_repr())
}

pub fn write_f64(mut self, value: f64) -> IonResult<()> {
    self.write_float_repr(value.smallest_repr())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same thing 👍

@zslayton zslayton merged commit 80088b4 into main May 31, 2024
32 checks passed
@zslayton zslayton deleted the rc4-punch-list branch May 31, 2024 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants