-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional tooling access methods, trait impls, and a bug fix #780
Conversation
* Renames the `RawReaderType` enum to `IonEncoding` because it is generally useful in other contexts too. * Adds the ability for system and raw readers using `AnyEncoding` to report the encoding they're currently using. * Adds `value_span` and `annotations_span` methods to `LazyValue` in addition to `span`, which includes both the annotations and the value. * The binary 1.0 and 1.1 writers will now write `f64`s as `f32`s to save space when it can be done losslessly. * Adds `WriteAsIon` implementations for `LazyList`, `LazySExp`, and `LazyStruct`. * Adds `IntoIterator` impls for `LazyList`, `LazySExp` and `LazyStruct`. Previously, they only existed for borrowed (`&`) references to those types. * Fixes a bug in the `StreamingRawReader` that could cause a value's annotations span to be overwritten if reading that value consumed all of the data remaining in the buffer. * Renames the feature-gated `LazyValue::lower` method (which returns a `LazyExpandedValue`) to `LazyValue::expanded()` so I could add a `raw()` method alongside it that returns the underlying `LazyRawValue` when applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour
type ReaderSavedState = RawReaderType; | ||
type ReaderSavedState = IonEncoding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This type (an enum of the four Ion encodings) is now returned by the new (feature gated) SystemReader<AnyEncoding, _>::detected_encoding()
method, allowing users to determine what encoding is currently being read.
fn annotations_span(&self) -> Span<'top> { | ||
match &self.encoding { | ||
LazyRawValueKind::Text_1_0(v) => v.annotations_span(), | ||
LazyRawValueKind::Binary_1_0(v) => v.annotations_span(), | ||
LazyRawValueKind::Text_1_1(v) => v.annotations_span(), | ||
LazyRawValueKind::Binary_1_1(v) => v.annotations_span(), | ||
} | ||
} | ||
|
||
fn value_span(&self) -> Span<'top> { | ||
match &self.encoding { | ||
LazyRawValueKind::Text_1_0(v) => v.value_span(), | ||
LazyRawValueKind::Binary_1_0(v) => v.value_span(), | ||
LazyRawValueKind::Text_1_1(v) => v.value_span(), | ||
LazyRawValueKind::Binary_1_1(v) => v.value_span(), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The span()
method returns a byte slice containing the complete value--both annotations and data.
These methods allow you to access the annotations and data separately.
let mut expected_encoding = vec![]; | ||
let float32 = *value as f32; | ||
if float32 as f64 == *value { | ||
expected_encoding.push(0x5C); | ||
expected_encoding.extend_from_slice(&float32.to_le_bytes()[..]); | ||
} else { | ||
expected_encoding.push(0x5D); | ||
expected_encoding.extend_from_slice(&value.to_le_bytes()[..]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ We have a lot of unit tests where we re-invent the wheel for the sake of bootstrapping the encoder/decoder. We should look at migrating some of the more painful ones (like this one) to a roundtrip test using our recently established readers and writers.
List(l) => value_writer.write(l), | ||
SExp(s) => value_writer.write(s), | ||
Struct(s) => value_writer.write(s), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Giving these types their own WriteAsIon
implementations streamlined this code path nicely, but my real aim was to make passing them to a writer in other contexts easier.
@@ -60,6 +60,8 @@ pub trait Encoding: Encoder + Decoder { | |||
) -> IonResult<W> { | |||
Self::default_write_config().encode_all_to(output, values) | |||
} | |||
|
|||
fn encoding(&self) -> IonEncoding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Types implementing the Encoding
trait now must report which variant of the IonEncoding
enum they correspond to.
@@ -282,6 +297,15 @@ impl<'a, 'top, 'data: 'top, D: Decoder> IntoIterator for &'a LazySExp<'top, D> { | |||
} | |||
} | |||
|
|||
impl<'top, 'data: 'top, D: Decoder> IntoIterator for LazySExp<'top, D> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Now that LazyList
, LazySExp
, and LazyStruct
are Copy
, it makes sense to allow them to be iterated over without having to borrow them first.
// If we've exhausted the buffer... | ||
if bytes_read >= available_bytes.len() || matches!(result, Err(IonError::Incomplete(_))) | ||
{ | ||
// ...try to pull more data from the data source. If we get more data, try again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ This was a subtle bug enabled by my use of unsafe
. If result
was Ok(_)
but also consumed all of the data in available_bytes
, then this code path would have input
shift its bytes to the beginning and pull in more data. Because result
contained an Ok(value)
referring to a slice of that buffer, its data would get clobbered.
@@ -319,6 +352,14 @@ impl<'a> IonInput for StdinLock<'a> { | |||
} | |||
} | |||
|
|||
impl<T: Read, U: Read> IonInput for io::Chain<T, U> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ The ion-cli
uses the io::Chain
type to link together the first few bytes of an input stream (used to detect gzip, zstd, etc) with the rest of the stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed there were some functions where you removed the feature gate, and others where you kept the feature gate. How did you decide for which functions to remove vs retain the feature gate?
Some methods are on a fuzzy line between being I decided I was ok with removing the feature gate from methods like that because they're still behind some feature gate, so either way there's no guarantee of stability being made. |
if value == 0f32 && !value.is_sign_negative() { | ||
self.push_byte(0x5A); | ||
return Ok(()); | ||
match value.smallest_repr() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we add support for f16, we could also DRY it up like this:
fn write_float_repr(mut self, value: FloatRepr) -> IonResult<()> {
match value {
// ...
}
Ok(())
}
pub fn write_f16(mut self, value: f16) -> IonResult<()> {
self.write_float_repr(value.smallest_repr())
}
pub fn write_f32(mut self, value: f32) -> IonResult<()> {
self.write_float_repr(value.smallest_repr())
}
pub fn write_f64(mut self, value: f64) -> IonResult<()> {
self.write_float_repr(value.smallest_repr())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same thing 👍
This PR includes a variety of small fixes and improvements that I found myself needing as I went to upgrade the
ion-cli
toHEAD
. I'd like to include these in the v1.0-rc4 release. I have completed theion-cli
upgrade in my local workspace using this branch, so I'm reasonably confident that these are the last items I need to get in.RawReaderType
enum toIonEncoding
because it is generally useful in other contexts too.AnyEncoding
to report the encoding they're currently using.value_span
andannotations_span
methods toLazyValue
. It already had aspan
method whose bytes included both the annotations and the value.f64
s asf32
s to save space when it can be done losslessly.WriteAsIon
implementations forLazyList
,LazySExp
, andLazyStruct
.IntoIterator
impls forLazyList
,LazySExp
andLazyStruct
. Previously, they only existed for borrowed (&
) references to those types.StreamingRawReader
that could cause a value's annotations span to be overwritten if reading that value consumed all of the data remaining in the buffer.LazyValue::lower
method (which returns aLazyExpandedValue
) toLazyValue::expanded()
so I could add araw()
method alongside it that returns the underlyingLazyRawValue
when applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.