Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time based last_id for banks #1405

Closed
aeyakovenko opened this issue Sep 29, 2018 · 12 comments
Closed

Time based last_id for banks #1405

aeyakovenko opened this issue Sep 29, 2018 · 12 comments
Assignees
Milestone

Comments

@aeyakovenko
Copy link
Member

aeyakovenko commented Sep 29, 2018

The bank's last_id queue is based on number or entries that have passed through the bank's pipeline. Because of stalls its really easy to generate more entries then there are slots in the queue, and because of stalls the queue no longer represents a uniform passage of time.
This is pretty obvious in just the banking stage bencher

    bencher.iter(move || {
        for v in verified.chunks(verified.len() / NUM_THREADS) {
            verified_sender.send(v.to_vec()).unwrap();
        }
        check_txs(&signal_receiver, txes);
        bank.clear_signatures();
        // make sure the transactions are still valid
        bank.register_entry_id(&mint.last_id());
    });

The verified chunks loop can easily blow through the last_id queue before all the transactions have been processed.

I think that the bank's last_ids should only come from PoH Tick events, and not data entries. Which means that verifiers that are replaying the ledger should only register entries that represent Tick events.

Proposal 1

pub enum Entry {
    Data {
        /// The number of hashes since the previous Entry ID.
        pub num_hashes: u64,

        /// The SHA-256 hash `num_hashes` after the previous Entry ID.
        pub id: Hash,

        /// An unordered list of transactions that were observed before the Entry ID was
        /// generated. They may have been observed before a previous Entry ID but were
        /// pushed back into this list to ensure deterministic interpretation of the ledger.
        pub transactions: Vec<Transaction>,
    },
    Tick {
        /// The number of hashes since the previous Entry ID.
        pub num_hashes: u64,

        /// The SHA-256 hash `num_hashes` after the previous Entry ID.
        pub id: Hash,
    },
}

Proposal 2

enum EntryData {
    /// An unordered list of transactions that were observed before the Entry ID was
    /// generated. They may have been observed before a previous Entry ID but were
    /// pushed back into this list to ensure deterministic interpretation of the ledger.
    Transactions(Vec<Transaction>),
    Tick,
}

pub struct Entry {
        /// The number of hashes since the previous Entry ID.
        pub num_hashes: u64,

        /// The SHA-256 hash `num_hashes` after the previous Entry ID.
        pub id: Hash,

        pub entry_data: EntryData,
    },
}

@garious @rob-solana what do you think?

@garious
Copy link
Contributor

garious commented Sep 29, 2018

Proposal 2 looks great. Also consider defining zero transactions to mean "tick".

@rob-solana
Copy link
Contributor

My (perhaps naive) concern is that this may prevent an EntryData from representing a batch of transactions that can be executed in parallel.

@garious
Copy link
Contributor

garious commented Oct 1, 2018

Seems like a reasonable concern. If it's time to rotate leaders, I think the current leader would prefer to split the tx batch to squeeze them in and take the fees. Otherwise, I think the leader would simply write the Tick and then write full batch just after.

@rob-solana
Copy link
Contributor

I wasn't thinking economically, I was just saying that between this issue, mt_bank, and the proposal in #950 I don't understand how transactions are batched for parallel verification in validators. Perhaps this is a non-issue, now?

@aeyakovenko
Copy link
Member Author

@rob-solana an entry cannot contain non parallizable transactions. #950 ensures that as well

@rob-solana
Copy link
Contributor

Proposal 2 looks great. Also consider defining zero transactions to mean "tick".

proposal 2 with an empty transaction vector is exactly the definition of Entry today, right?

@aeyakovenko
Copy link
Member Author

aeyakovenko commented Oct 1, 2018

@rob-solana yea, it would be super easy to just use transactions.is_empty() as the flag.

@rob-solana
Copy link
Contributor

"the flag", do you mean the predicate for whether to call register_entry_id?

@rob-solana
Copy link
Contributor

another concern: the meaning of "bank.last_id()" is used in tests everywhere, and we'd be changing that?

@aeyakovenko
Copy link
Member Author

  1. @rob-solana yea, in bank.process_entries only call bank.register_entry_id if entry.transactions.is_empty()
  2. @rob-solana we can fix those.

@rob-solana
Copy link
Contributor

means that those chaining tests need to add their own calls to register_entry_id(), because there's no banking stage...

@garious garious added this to the v0.10 Pillbox milestone Oct 2, 2018
@garious
Copy link
Contributor

garious commented Oct 11, 2018

Fixed in #1441

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants