Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(blockifier_reexecution): serialize reexecution data to file #1762

Merged
merged 4 commits into from
Nov 7, 2024

Conversation

aner-starkware
Copy link
Contributor

No description provided.

@aner-starkware aner-starkware self-assigned this Nov 3, 2024
@reviewable-StarkWare
Copy link

This change is Reviewable

Copy link

github-actions bot commented Nov 3, 2024

Artifacts upload triggered. View details here

Copy link

codecov bot commented Nov 3, 2024

Codecov Report

Attention: Patch coverage is 0% with 97 lines in your changes missing coverage. Please review.

Project coverage is 41.71%. Comparing base (e3165c4) to head (50ad0f4).
Report is 238 commits behind head on main.

Files with missing lines Patch % Lines
crates/blockifier_reexecution/src/main.rs 0.00% 62 Missing ⚠️
...s/blockifier_reexecution/src/state_reader/utils.rs 0.00% 31 Missing ⚠️
crates/blockifier/src/state/cached_state.rs 0.00% 3 Missing ⚠️
...ution/src/state_reader/reexecution_state_reader.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1762      +/-   ##
==========================================
+ Coverage   40.10%   41.71%   +1.60%     
==========================================
  Files          26      197     +171     
  Lines        1895    23369   +21474     
  Branches     1895    23369   +21474     
==========================================
+ Hits          760     9748    +8988     
- Misses       1100    13164   +12064     
- Partials       35      457     +422     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Nov 3, 2024

Artifacts upload triggered. View details here

Copy link

github-actions bot commented Nov 3, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier/src/state/cached_state.rs line 302 at r1 (raw file):

        Ok(self.cache.borrow().initial_reads.clone())
    }
}

can you solve this by injecting more code into the test state reader?
i.e. every get_X should check if this key was read before, and if not, populate self? like you do with compiled classes?

I do not want to touch blockifier code; this is only useful for reexecution

Code quote:

#[cfg(any(feature = "testing", test))]
impl<S: StateReader> CachedState<S> {
    pub fn get_initial_reads(&self) -> StateResult<StateMaps> {
        Ok(self.cache.borrow().initial_reads.clone())
    }
}

crates/blockifier_reexecution/src/main.rs line 104 at r1 (raw file):

        } => {
            let directory_path = directory_path
                .unwrap_or(format!("./crates/blockifier_reexecution/tmp/block_{block_number}/"));
  1. why give a temporary name?
  2. I suggest putting this in a resources dir; like the read_from_json function expects. blockifier has such a directory for artifacts

Code quote:

tmp

crates/blockifier_reexecution/src/main.rs line 121 at r1 (raw file):

            let transactions_next_block = next_block_state_reader.get_all_txs_in_block().unwrap();

            let blockifier_transactions_next_block = &last_block_state_reader

why are the next txs taken from the last block?

Code quote:

let blockifier_transactions_next_block = &last_block_state_reader

crates/blockifier_reexecution/src/main.rs line 156 at r1 (raw file):

                "RPC replies required for reexecuting block {block_number} written to json file."
            );
        }

code duplication with the non-dump command... please extract to a function that accepts the input data + boolean named dump_mode

Code quote:

        } => {
            let directory_path = directory_path
                .unwrap_or(format!("./crates/blockifier_reexecution/tmp/block_{block_number}/"));
            let config = RpcStateReaderConfig {
                url: node_url,
                json_rpc_version: JSON_RPC_VERSION.to_string(),
            };

            let ConsecutiveTestStateReaders { last_block_state_reader, next_block_state_reader } =
                ConsecutiveTestStateReaders::new(BlockNumber(block_number - 1), Some(config), true);

            let block_info_next_block = next_block_state_reader.get_block_info().unwrap();

            let starknet_version = next_block_state_reader.get_starknet_version().unwrap();

            let state_diff_next_block = next_block_state_reader.get_state_diff().unwrap();

            let transactions_next_block = next_block_state_reader.get_all_txs_in_block().unwrap();

            let blockifier_transactions_next_block = &last_block_state_reader
                .api_txs_to_blockifier_txs(transactions_next_block.clone())
                .unwrap();

            let mut transaction_executor = last_block_state_reader
                .get_transaction_executor(
                    next_block_state_reader.get_block_context().unwrap(),
                    None,
                )
                .unwrap();

            transaction_executor.execute_txs(blockifier_transactions_next_block);

            let block_state = transaction_executor.block_state.unwrap();
            let initial_reads = block_state.get_initial_reads().unwrap();

            let contract_class_mapping =
                block_state.state.get_contract_class_mapping_dumper().unwrap();

            let serializable_offline_reexecution_data = SerializableOfflineReexecutionData {
                state_maps: initial_reads.into(),
                block_info_next_block,
                starknet_version,
                transactions_next_block,
                contract_class_mapping,
                state_diff_next_block,
            };

            serializable_offline_reexecution_data
                .write_to_file(&directory_path, "reexecution_data.json")
                .unwrap();

            println!(
                "RPC replies required for reexecuting block {block_number} written to json file."
            );
        }

Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 4 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)


crates/blockifier_reexecution/src/main.rs line 104 at r1 (raw file):

Previously, dorimedini-starkware wrote…
  1. why give a temporary name?
  2. I suggest putting this in a resources dir; like the read_from_json function expects. blockifier has such a directory for artifacts

Done. It's just a question of the behavior for the default path - should it ben written in a "real" location or a temporary one?


crates/blockifier_reexecution/src/main.rs line 121 at r1 (raw file):

Previously, dorimedini-starkware wrote…

why are the next txs taken from the last block?

See my answer in previous PR.


crates/blockifier_reexecution/src/main.rs line 156 at r1 (raw file):

Previously, dorimedini-starkware wrote…

code duplication with the non-dump command... please extract to a function that accepts the input data + boolean named dump_mode

I added a TODO. I think it's better to delay this refactoring until this PR (when it's ready): https://reviewable.io/reviews/starkware-libs/sequencer/1764.
The reason is that the command there has even more similar code, so this would avoid double refactoring. Also, this is not exactly code duplication, so I think it's better that the function corresponds to the other two commands, which are much more similar.

@aner-starkware aner-starkware force-pushed the aner/serialize_reexecution_data branch 2 times, most recently from 0b22118 to 77ed6ee Compare November 4, 2024 09:44
Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

@aner-starkware
Copy link
Contributor Author

crates/blockifier/src/state/cached_state.rs line 302 at r1 (raw file):

Previously, dorimedini-starkware wrote…

can you solve this by injecting more code into the test state reader?
i.e. every get_X should check if this key was read before, and if not, populate self? like you do with compiled classes?

I do not want to touch blockifier code; this is only useful for reexecution

Switched the flag, as we agreed.

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier/Cargo.toml line 13 at r3 (raw file):

[features]
blockifier_reexecution = []

you are in the blockifier crate: no need for this prefix in the feature name imo

Suggestion:

reexecution

crates/blockifier_reexecution/src/main.rs line 121 at r1 (raw file):

Previously, aner-starkware wrote…

See my answer in previous PR.

still relevant (right?)


crates/blockifier_reexecution/src/main.rs line 156 at r1 (raw file):

Previously, aner-starkware wrote…

I added a TODO. I think it's better to delay this refactoring until this PR (when it's ready): https://reviewable.io/reviews/starkware-libs/sequencer/1764.
The reason is that the command there has even more similar code, so this would avoid double refactoring. Also, this is not exactly code duplication, so I think it's better that the function corresponds to the other two commands, which are much more similar.

I think the other way around:

  1. open a PR to extract common logic to a separate function (replacing most of the Command::RpcTest implementation)
  2. change this current PR to use the shared code
  3. in PR 1764 also use common code

@aner-starkware aner-starkware reopened this Nov 4, 2024
@aner-starkware aner-starkware changed the base branch from aner/serialize_reexecution_data to main November 4, 2024 14:36
Copy link

github-actions bot commented Nov 4, 2024

Artifacts upload triggered. View details here

@aner-starkware
Copy link
Contributor Author

Previously, dorimedini-starkware wrote…

why do I see diffs from here

Is it fixed?

Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 5 files at r9, 2 of 2 files at r12, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @AvivYossef-starkware)

a discussion (no related file):

Previously, aner-starkware wrote…

Is it fixed?

yes, thanks


Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 2 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)

a discussion (no related file):
please rebase


Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

aner-starkware and others added 3 commits November 7, 2024 14:02
* chore(blockifier_reexecution): offline reexecution from file

* chore(blockifier_reexecution): use pre_process_and_create
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 2 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)

a discussion (no related file):

Previously, dorimedini-starkware wrote…

please rebase

Done.


Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r14, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,

Suggestion:

        // Directory path to json files. Default:
        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        // TODO: None should indicate using the bucket. Maybe need a commit_id arg.
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,
    },

    // Reexecutes the block from JSON files.
    ReexecuteBlock {
        /// Block number.
        #[clap(long, short = 'b')]
        block_number: u64,

        // Directory path to json files. Default:
        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        // TODO: None should indicate using the bucket. Maybe need a commit_id arg.
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,

crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

            let full_file_path = directory_path.unwrap_or(format!(
                "./crates/blockifier_reexecution/resources/block_{block_number}"
            )) + "/reexecution_data.json";

Suggestion:

            let full_file_path = directory_path.unwrap_or(format!(
                "./crates/blockifier_reexecution/resources/block_{block_number}/reexecution_data.json"
            ));

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

        // "./crates/blockifier_reexecution/resources/block_{block_number}".
        #[clap(long, short = 'd', default_value = None)]
        directory_path: Option<String>,

Why? None just indicates using the default directory; The bucket will be downloaded to this directory using the script.


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

            let full_file_path = directory_path.unwrap_or(format!(
                "./crates/blockifier_reexecution/resources/block_{block_number}"
            )) + "/reexecution_data.json";

It's different functionality - the first way you only need to specify the directory, this way you need to also need to specify the file name

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

Previously, aner-starkware wrote…

Why? None just indicates using the default directory; The bucket will be downloaded to this directory using the script.

what if I have the files already, and I want to point to that location?
I was thinking there would be two (mutually exclusive) options:

  1. provide an explicit path, and the CLI will assume the relevant file already exists there
  2. provide a git commit, and the CLI will fetch the files from the bucket (to /tmp or something)

wdyt?


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, aner-starkware wrote…

It's different functionality - the first way you only need to specify the directory, this way you need to also need to specify the file name

wdym? reexecution_data.json is still part of the string

Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 9 files reviewed, 2 unresolved discussions (waiting on @AvivYossef-starkware and @dorimedini-starkware)


crates/blockifier_reexecution/src/main.rs line 64 at r14 (raw file):

Previously, dorimedini-starkware wrote…

what if I have the files already, and I want to point to that location?
I was thinking there would be two (mutually exclusive) options:

  1. provide an explicit path, and the CLI will assume the relevant file already exists there
  2. provide a git commit, and the CLI will fetch the files from the bucket (to /tmp or something)

wdyt?

Done.


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, dorimedini-starkware wrote…

wdym? reexecution_data.json is still part of the string

In the second option, if you specify the location yourself, you also need to specify the file name.

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r16, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, aner-starkware wrote…

In the second option, if you specify the location yourself, you also need to specify the file name.

ah, yes
if you specify a path, I'm assuming you're not downloading a file - the user can (and should) supply the full path in this case

@aner-starkware
Copy link
Contributor Author

crates/blockifier_reexecution/src/main.rs line 141 at r14 (raw file):

Previously, dorimedini-starkware wrote…

ah, yes
if you specify a path, I'm assuming you're not downloading a file - the user can (and should) supply the full path in this case

Done.

#1814)

* chore(blockifier_reexecution): refactor cli to reduce code duplication

* chore(blockifier_reexecution): explicitly state command args

* chore(blockifier_reexecution): separate old block hash

* chore(blockifier_reexecution): move reexecution to seperate function

* test(blockifier_reexecution): reexecution test on blocks
Copy link

github-actions bot commented Nov 7, 2024

Artifacts upload triggered. View details here

Copy link
Collaborator

@dorimedini-starkware dorimedini-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r17, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aner-starkware and @AvivYossef-starkware)


crates/blockifier_reexecution/src/state_reader/test_state_reader.rs line 90 at r17 (raw file):

impl SerializableOfflineReexecutionData {
    pub fn write_to_file(&self, full_file_path: &str) -> ReexecutionResult<()> {
        let file_path = full_file_path.rsplit_once('/').expect("Invalid file path.").0;

consider using Path and PathBuf objects, we have examples in code - abstracts this kind of logic (non-blocking)

Code quote:

 let file_path = full_file_path.rsplit_once('/').expect("Invalid file path.").0;

Copy link
Contributor Author

@aner-starkware aner-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @AvivYossef-starkware)


crates/blockifier_reexecution/src/state_reader/test_state_reader.rs line 90 at r17 (raw file):

Previously, dorimedini-starkware wrote…

consider using Path and PathBuf objects, we have examples in code - abstracts this kind of logic (non-blocking)

I tried - it's longer and not really any clearer.

@aner-starkware aner-starkware merged commit 9569d8d into main Nov 7, 2024
12 checks passed
@aner-starkware aner-starkware deleted the aner/serialize_block_to_file_poc branch November 7, 2024 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants