
v1.3: Bigtable inner transaction plumbing is incompatible with v1.2 #12494

Closed
mvines opened this issue Sep 25, 2020 · 14 comments · Fixed by #12526


mvines commented Sep 25, 2020

I'm pretty sure that the new field added to StoredConfirmedBlockTransactionStatusMeta here:

inner_instructions: Option<Vec<InnerInstructions>>,

will break the deserialization of existing StoredConfirmedBlock entries in BigTable:
// A serialized `StoredConfirmedBlock` is stored in the `block` table
//
// StoredConfirmedBlock holds the same contents as ConfirmedBlock, but is slightly compressed and avoids
// some serde JSON directives that cause issues with bincode
//
#[derive(Serialize, Deserialize)]
struct StoredConfirmedBlock {
previous_blockhash: String,
blockhash: String,
parent_slot: Slot,
transactions: Vec<StoredConfirmedBlockTransaction>,
rewards: Rewards,
block_time: Option<UnixTimestamp>,
}

It's impractical to reprocess the existing BigTable transaction contents, so we'll need two versions of StoredConfirmedBlockTransactionStatusMeta/StoredConfirmedBlock in the v1.3 branch: a legacy version for existing data, and the current version.
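A rough sketch of what that two-version approach could look like. The type and field names below are simplified stand-ins for illustration, not the actual solana structs, which carry many more fields:

```rust
#[derive(Debug, Clone, PartialEq)]
struct InnerInstructions {
    index: u8,
}

// Legacy layout: matches the bytes already written to BigTable
// (no inner_instructions field).
#[derive(Debug, Clone, PartialEq)]
struct LegacyStoredTransactionStatusMeta {
    status: Result<(), String>,
    fee: u64,
}

// Current layout: appends the new optional field at the end.
#[derive(Debug, Clone, PartialEq)]
struct StoredTransactionStatusMeta {
    status: Result<(), String>,
    fee: u64,
    inner_instructions: Option<Vec<InnerInstructions>>,
}

// Upgrading a legacy record loses no data; the new field is simply absent.
impl From<LegacyStoredTransactionStatusMeta> for StoredTransactionStatusMeta {
    fn from(legacy: LegacyStoredTransactionStatusMeta) -> Self {
        Self {
            status: legacy.status,
            fee: legacy.fee,
            inner_instructions: None,
        }
    }
}
```

The legacy layout converts losslessly into the current one, so read paths can try the current struct first and fall back to the legacy struct plus a `From` conversion.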

@mvines mvines added this to the v1.3.13 milestone Sep 25, 2020

mvines commented Sep 25, 2020

To prevent this from accidentally occurring again, the #[frozen_abi] attribute should be added to all the structs in storage-bigtable/ that are written to BigTable as well.


jstarry commented Sep 26, 2020

@mvines the whole point of Option-wrapping inner_instructions was to prevent this issue. The only issue I see is if the warehouse nodes are updated before the api nodes, which would cause v1.2 api nodes to try to deserialize v1.3-serialized transaction metas, which would break.

@CriesofCarrots

I think inner_instructions might need a #[serde(default)] tag for v1.3 code to be able to deserialize existing StoredConfirmedBlock entries in BigTable.


jstarry commented Sep 26, 2020

Got it, fiddling with this now


mvines commented Sep 26, 2020

Here are some easy steps to reproduce (STR):

$ solana-ledger-tool -l . bigtable block 37147442

Works on v1.2 but master dies with:

$ solana-ledger-tool -l . bigtable block 37147442
[2020-09-26T01:44:29.402773000Z INFO  solana_storage_bigtable::access_token] Requesting token for BigTableData scope
[2020-09-26T01:44:29.594210000Z INFO  solana_storage_bigtable::access_token] Token expires in 3599 seconds
[2020-09-26T01:44:30.652954000Z WARN  solana_storage_bigtable::bigtable] Failed to deserialize blocks/000000000236d332: invalid length 3, expected a multi-byte length
BigTableError(ObjectCorrupt("blocks/000000000236d332"))


jstarry commented Sep 26, 2020

This all comes down to our over-reliance on bincode, which we aren't using in a future-proof way. Neither the handling of Option-wrapped fields that I was expecting from working with serde_json nor the #[serde(default)] attribute that @CriesofCarrots suggested works for bincode.

At this point it's difficult to differentiate an old StoredConfirmedBlockTransactionStatusMeta from a new one, because the current meta is not versioned and has variable length. We could use a known epoch to indicate the switch-over between binary formats, or we could fall back to the old struct if the new struct fails to deserialize.

The same issue affects the blockstore: deserialization will fail when using solana-ledger-tool and when fetching transaction statuses over RPC. The RPC deserialization failures would be a problem for data accessed right after the switch from the legacy struct, because the fallback to BigTable wouldn't succeed for recent data.

> The only issue I see is if the warehouse nodes are updated before the api nodes, which would cause v1.2 api nodes to try to deserialize v1.3-serialized transaction metas, which would break.

By the way, this is actually not an issue because old code will ignore extra fields as long as they are added at the end of the struct.
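To illustrate why trailing fields are safe for old readers, here is a minimal hand-rolled codec. This is not bincode itself, just an analog of its fixed-fields-then-trailing-Option layout with made-up field names; the v1 decoder reads only the fields it knows about and never touches the appended bytes:

```rust
// v1 layout: just a u64 fee, little-endian.
fn encode_v1(fee: u64) -> Vec<u8> {
    fee.to_le_bytes().to_vec()
}

// v2 layout: same prefix, plus an Option-style tag byte and optional payload
// appended at the end.
fn encode_v2(fee: u64, inner: Option<u8>) -> Vec<u8> {
    let mut buf = fee.to_le_bytes().to_vec();
    match inner {
        None => buf.push(0), // tag: None
        Some(v) => {
            buf.push(1); // tag: Some
            buf.push(v);
        }
    }
    buf
}

// An old (v1) reader consumes only the fields it knows about, so trailing
// v2 bytes are simply ignored.
fn decode_v1(bytes: &[u8]) -> Option<u64> {
    let head: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    Some(u64::from_le_bytes(head))
}
```

This is exactly why the warehouse-before-api update order is harmless at the top level; the trouble described above only starts once the extended struct is nested inside another variable-length structure.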


jstarry commented Sep 26, 2020

Does this feel ok? It would allow us to continue appending new fields:

#[derive(Debug, Serialize, Deserialize)]
struct Test {
    pub field: u64,
    #[serde(deserialize_with = "default_on_eof")]
    pub inner: Option<Inner>,
}

fn default_on_eof<'de, T, D>(d: D) -> Result<Option<T>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    match Option::<T>::deserialize(d) {
        Err(err) if err.to_string() == "io error: unexpected end of file" => Ok(None),
        result => result,
    }
}


mvines commented Sep 26, 2020

Yep this seems pretty nice to me. Better than wrapping the struct in a versioned enum and a bunch of conversion code.

It'd be nice to check the actual error variant instead of comparing err.to_string().
Also, I think this can be generalized from Option<T> to the Default trait, so that Ok(None) becomes Ok(Default::default())?
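The proposed generalization could look something like the following pure-std analog. The real helper would be a serde `deserialize_with` function over any `T: Default`; this only shows the fall-back-to-default control flow, and the `Inner` struct and byte layout are made up for illustration:

```rust
// Hypothetical nested struct; Default gives us the fallback value.
#[derive(Debug, Default, PartialEq)]
struct Inner {
    value: u64,
}

// Read a little-endian u64 at `offset`, or None if the bytes run out (EOF).
fn read_u64(bytes: &[u8], offset: usize) -> Option<u64> {
    Some(u64::from_le_bytes(bytes.get(offset..offset + 8)?.try_into().ok()?))
}

// On EOF, fall back to T::default() instead of failing the whole
// deserialization -- the generalized form of default_on_eof.
fn read_inner_or_default(bytes: &[u8], offset: usize) -> Inner {
    read_u64(bytes, offset)
        .map(|value| Inner { value })
        .unwrap_or_default()
}
```

With `T: Default` the field no longer needs the `Option` wrapper at all, which is the simplification @CriesofCarrots describes below.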


CriesofCarrots commented Sep 26, 2020

Ah, I see... Thanks for the detail, @jstarry !
I had the same thought that default_on_eof could be made generic for T: Deserialize + Default; then we wouldn't even need to wrap inner_instructions in the Option. Could make this method broadly useful for struct versioning.


jstarry commented Sep 27, 2020

default_on_eof actually won't work for BigTable because StoredConfirmedBlockTransactionStatusMeta is nested, and I think the bincode deserializer would get corrupted when trying to deserialize data with missing fields rather than hit an EOF error. It would help the blockstore though


jstarry commented Sep 27, 2020

I think the only option is using a known slot to switch to a new versioned or tagged format.
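A slot-based switch could be as simple as the sketch below. The cutover constant is a made-up placeholder, not a real value from the codebase:

```rust
type Slot = u64;

// Hypothetical cutover point: blocks at or after this slot use the new format.
const FORMAT_CUTOVER_SLOT: Slot = 40_000_000;

#[derive(Debug, PartialEq)]
enum StorageFormat {
    Legacy,    // un-versioned bincode written by v1.2
    Versioned, // new tagged/versioned format written from v1.3 on
}

// Pick the decoder by comparing the block's slot to the known cutover slot.
fn format_for_slot(slot: Slot) -> StorageFormat {
    if slot < FORMAT_CUTOVER_SLOT {
        StorageFormat::Legacy
    } else {
        StorageFormat::Versioned
    }
}
```

The drawback, compared with a self-describing format, is that the cutover slot must be baked into every reader.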


jstarry commented Sep 27, 2020

I just dug a bit into the BigTable code, and it looks like we could switch over to a new "CellName" for versioned binary data. Instead of "bin", I think something like "vbin" could work.

For versioned binary data we could bite the bullet and switch to protobuf/capnproto or use an enum / add a version number to structs going forward
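The "version number on structs" idea amounts to a tagged encoding along these lines. This is a hand-rolled illustration, not the actual implementation, and the variant fields are made up:

```rust
#[derive(Debug, PartialEq)]
enum VersionedMeta {
    V0 { fee: u64 },                   // legacy layout
    V1 { fee: u64, inner_count: u32 }, // hypothetical extension
}

// Prefix every record with a one-byte version tag so future decoders can
// branch on it instead of guessing the layout.
fn encode(meta: &VersionedMeta) -> Vec<u8> {
    match meta {
        VersionedMeta::V0 { fee } => {
            let mut buf = vec![0u8];
            buf.extend_from_slice(&fee.to_le_bytes());
            buf
        }
        VersionedMeta::V1 { fee, inner_count } => {
            let mut buf = vec![1u8];
            buf.extend_from_slice(&fee.to_le_bytes());
            buf.extend_from_slice(&inner_count.to_le_bytes());
            buf
        }
    }
}

fn decode(bytes: &[u8]) -> Option<VersionedMeta> {
    let fee = u64::from_le_bytes(bytes.get(1..9)?.try_into().ok()?);
    match *bytes.first()? {
        0 => Some(VersionedMeta::V0 { fee }),
        1 => {
            let inner_count = u32::from_le_bytes(bytes.get(9..13)?.try_into().ok()?);
            Some(VersionedMeta::V1 { fee, inner_count })
        }
        _ => None, // unknown version: refuse rather than misread
    }
}
```

This is essentially what protobuf gives for free, which is why the discussion lands on it below.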

@jstarry jstarry self-assigned this Sep 28, 2020

jstarry commented Sep 28, 2020

My plan is to switch to protobuf to enable more future-proof, but still compact, data storage in BigTable. I'll add a new cell name called "proto" which will hold the protobuf-serialized data. When fetching confirmed blocks, v1.3 code will check first for "proto" but will fall back to "bin" for older blocks. v1.2 code will not know anything about "proto", so we just need to make sure that warehouse nodes are not updated before api nodes.

Rather than add a build script which will require everyone to install protobuf, I'll add generated rust structs from prost into source control as part of a generated module.
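The read path described here could be sketched as follows, with a HashMap standing in for a BigTable row's cells. Only the cell names "proto" and "bin" come from the plan above; the function and its shape are illustrative:

```rust
use std::collections::HashMap;

// Prefer the new "proto" cell; fall back to the legacy "bin" cell for blocks
// written before the switch. Returns the cell name read plus its bytes.
fn read_block_cell(row: &HashMap<String, Vec<u8>>) -> Option<(&'static str, &[u8])> {
    if let Some(data) = row.get("proto") {
        return Some(("proto", data.as_slice()));
    }
    row.get("bin").map(|data| ("bin", data.as_slice()))
}
```

Because the fallback lives entirely in the reader, old rows never need to be rewritten, which satisfies the "impractical to reprocess" constraint from the top of the issue.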


jstarry commented Sep 28, 2020

FYI I just created a draft PR for what the implementation could look like at #12526

@mergify mergify bot closed this as completed in #12526 Sep 30, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 2, 2022