-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds read/write support for v1.1 system symbols #847
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ PR tour 🧭
// Before a reader using `AnyEncoding` begins reading, it expects text Ion v1.0. | ||
// At the outset of the stream, it inspects the first bytes to see if the stream is binary or text. | ||
// If it encounters a version marker, the expected version will change. | ||
const INITIAL_ENCODING_EXPECTED: IonEncoding = IonEncoding::Text_1_0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ When a reader is initialized for a given Decoder
, it needs to know what initial context to set up. For the four encoding-specific decoders, this is trivial; AnyEncoding
is the only one that needs an explicit hint.
reader.next(context_ref).expect_err("Ion 1.1 IVM should return an error."); | ||
return Ok(()) | ||
reader | ||
.next(context_ref) | ||
.expect_err("Ion 1.1 IVM should return an error."); | ||
return Ok(()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ rustfmt
, no logic change
#[case::single_address(&[0xE4, 0x07], 1, 1)] | ||
#[case::two_addresses(&[0xE5, 0x07, 0x09], 1, 2)] | ||
#[case::three_addresses(&[0xE6, 0x07, 0x07, 0x09, 0x0B], 2, 3)] | ||
#[case::single_flex_sym(&[0xE7, 0x07], 1, 1)] | ||
#[case::two_flex_syms(&[0xE8, 0x07, 0x09], 1, 2)] | ||
#[case::three_flex_syms(&[0xE9, 0x07, 0x07, 0x09, 0x0B], 2, 3)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ Previously, this test was only confirming that the annotations sequence parts were the expected length after parsing. I augmented the test to include confirming that the annotations being read were the correct ones.
assert_eq!( | ||
reader | ||
.next(context)? | ||
.expect_value()? | ||
.read()? | ||
.expect_symbol()?, | ||
"".into() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗺️ I rewrote this unit test to loop a list of expected symbols and compare them to the input. This made the test much more concise.
// In order to minimize the changes needed to introduce a second address space | ||
// for symbols in Ion 1.1, system symbol IDs are resolved eagerly and returned | ||
// as `Text`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty much how I tried to do it when I added read support to ion-java
.
RawSymbolRef::SystemSymbol_1_1(_system_symbol) => { | ||
unreachable!("Ion 1.1 symbol in Ion 1.0 writer") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make this provably unreachable with the use of Never
or something like that, or would that add too much complexity in the way of generic parameters that are polluting signatures everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best we could do without pervasive generics is have a separate enum with only SymbolId
/Text
and 'downgrade' to that whenever we enter a 1.0-only code path.
src/raw_symbol_ref.rs
Outdated
} | ||
} | ||
|
||
pub(crate) trait SystemSymbol: Sized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this trait used? I recall seeing SystemSymbol_1_1
used somewhere, but not this trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I had a SystemSymbol
struct that held an (IonVersion, SymbolAddress)
tuple. SystemSymbolTable
instances would return SystemSymbol
s, and RawSymbolRef
had a SystemSymbol
variant that wasn't Ion-1.1-specific.
However, this led to a lot of situations in Ion 1.0 code where it was unclear whether an Ion 1.0 system symbol would be represented as a SymbolId
or a SystemSymbol
. To eliminate that ambiguity, I split SystemSymbol
into SystemSymbol_1_1
and SystemSymbol_1_0
and narrowed RawSymbolRef::SystemSymbol
to RawSymbolRef::SystemSymbol_1_1
. Then I turned the SystemSymbol
type into a trait.
While I think this was a reasonable way to preserve the abstraction, we aren't using the abstraction much in practice right now. In the spirit of YAGNI, I can strip it out for the moment and we can reintroduce it later if/when we have a more concrete use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to having it, necessarily, but if we're adding something that isn't used yet, we should probably document that in the code.
src/types/symbol.rs
Outdated
@@ -98,6 +99,12 @@ impl Symbol { | |||
} | |||
} | |||
|
|||
pub fn system(text: &'static str) -> Symbol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually something that maybe useful for non-system symbols, right? For example, if I'm implementing WriteAsIon
for Ion Schema Language models, there's a whole lot of symbol text that is statically known (i.e. all of the constraint names and other ISL keywords).
(Unfortunately, we can't update impl From<&str> for Symbol
in such a way that it specializes for static and non-static strings, so use of a constructor function is required to specifically construct a symbol with static text.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally named this method static
but... that's a keyword :(
I renamed it static_text
--howzat?
/// A static, read-only map of text to Ion v1.0 system symbol addresses. | ||
/// Because the set of string keys is known at compile time, this map is able to use a | ||
/// perfect hashing function (PHF) to optimize lookup operations for those keys. | ||
pub(crate) static SYSTEM_SYMBOL_TEXT_TO_ID: phf::Map<&str, usize> = phf_map! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL https://docs.rs/phf/latest/phf/index.html
This is pretty neat!
/// Creates a new system symbol if the provided address is non-zero and in range for the | ||
/// corresponding system symbol table. | ||
fn new(symbol_address: SymbolAddress) -> Option<Self> { | ||
match symbol_address { | ||
0 => None, | ||
address if address >= Self::system_symbol_table().len() => None, | ||
address => Some(Self::new_unchecked(address)), | ||
} | ||
} | ||
|
||
/// Constructs a system symbol from the provided address, returning `Ok(symbol)`. | ||
/// If the provided address is zero or out of bounds, returns an `Err`. | ||
#[inline] | ||
fn try_new(symbol_address: SymbolAddress) -> IonResult<Self> { | ||
Self::new(symbol_address).ok_or_else(|| { | ||
ice!(IonError::decoding_error(format!( | ||
"system symbol address {symbol_address} is out of bounds" | ||
))) | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://doc.rust-lang.org/nomicon/constructors.html
In concrete contexts, a type will provide a static
new
method for any kind of "default" constructor. This has no relation tonew
in other languages and has no special meaning. It's just a naming convention.
I was curious about the try_new
here, wondered what I was missing. Apparently it's just a convention amongst libraries? I see the pattern in several others including in the standard library.
Also seen here: https://dev.to/cthutu/rust-5-naming-conventions-3cjf
Nothing to do here, just commenting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it's just a convention amongst libraries?
Yep. Methods that return a Result<T>
often have a name starting with try_
to distinguish themselves from sibling methods that return T
or Option<T>
. For example, into()
/try_into()
and from()
/try_from()
.
SystemSymbol_1_1
variant ofRawSymbolRef
RawSymbolRef::as_raw_symbol_token_ref
to shorteras_raw_symbol_ref
for consistency.0xEE
opcode, system symbol valuesFlexSym
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.