Problem: gen-tx don't run in parallel for single node #1645
Solution:
- use multiprocessing library to do parallel tx gen
Signed-off-by: yihuang <[email protected]>
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
testground/benchmark/benchmark/utils.py (1)
176-181: LGTM! Consider a more specific function name.
The `split` function is well-implemented and efficiently splits a range into n parts. The logic is correct and handles uneven divisions properly. The docstring and type hints improve readability and maintainability.
Consider renaming the function to something more specific, like `split_range` or `partition_range`, to avoid potential confusion with the built-in `split` method for strings and to better describe its purpose.
    -def split(a: int, n: int):
    +def split_range(a: int, n: int):
CHANGELOG.md (1)
7-8: LGTM! Consider fixing the PR link format.
The addition of parallel test transaction generation for single nodes is a valuable improvement that should enhance testing efficiency.
Consider updating the format of the second entry to match the first one:
    -* (testground)[#1644](https://github.com/crypto-org-chain/cronos/pull/1644) load generator retry with backoff on error.
    +* [#1644](https://github.com/crypto-org-chain/cronos/pull/1644) load generator retry with backoff on error.
This will make the PR link consistent with the other entries in the changelog.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~7-~7: Possible missing comma found.
Context: ...-chain/cronos/pull/1645) Gen test tx in parallel even in single node. * (testground)[#16...(AI_HYDRA_LEO_MISSING_COMMA)
testground/benchmark/benchmark/transaction.py (2)
68-69: Use the `logging` module for thread-safe output in multiprocessing
Using `print` from multiple processes can interleave output on `stdout`. The `logging` module is thread-safe and provides better control over log messages.
Consider replacing the `print` call with the `logging` module:
    +    import logging
    +
    +    # Configure logging at the beginning of your script or module
    +    logging.basicConfig(level=logging.INFO)
    +
         for acct, txs in zip(accounts, acct_txs):
             for nonce in range(job.num_txs):
                 txs.append(acct.sign_transaction(job.create_tx(nonce)).rawTransaction.hex())
                 total += 1
                 if total % 1000 == 0:
    -                print("generated", total, "txs for node", job.global_seq)
    +                logging.info(f"Generated {total} txs for node {job.global_seq}")
This change ensures that log messages are properly managed and thread-safe during multiprocessing.
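As a rough illustration of the suggestion above, the following sketch logs progress from a worker-style loop with the standard `logging` module. The logger name `gen_txs`, the helpers `configure_logging` and `sign_all`, and the counting loop that stands in for real signing are all assumptions made for this example, not code from the PR. Note that `logging` guarantees thread safety within one process; writing to a single file from several processes is not serialized by the module, so the sketch logs to the default stream and tags each line with the process name.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("gen_txs")


def configure_logging() -> None:
    """Call once per process; %(processName)s shows which worker logged."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(processName)s %(levelname)s %(message)s",
    )


def sign_all(num_accounts: int, num_txs: int, node: int, every: int = 1000) -> int:
    """Stand-in for the signing loop: counts txs and logs progress periodically."""
    total = 0
    for _ in range(num_accounts):
        for _nonce in range(num_txs):
            # The real code would sign and collect a transaction here.
            total += 1
            if total % every == 0:
                # Lazy %-style arguments: the string is built only if INFO is enabled.
                logger.info("generated %d txs for node %d", total, node)
    return total
```

Passing the values as arguments instead of an f-string avoids formatting messages that end up filtered out by the log level.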
1-5: Remove unused imports if not necessary
The imports `asyncio` and `aiohttp` are used for asynchronous operations later in the file, but please ensure that all imported modules are necessary for this script.
If any imports are not used, consider removing them to keep the code clean.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- CHANGELOG.md (1 hunks)
- testground/benchmark/benchmark/transaction.py (3 hunks)
- testground/benchmark/benchmark/utils.py (1 hunks)
🔇 Additional comments (1)
testground/benchmark/benchmark/utils.py (1)
176-181: Verify the usage of the new function in the codebase.
The new `split` function looks good, but it's important to ensure it's being used correctly throughout the project.
Let's check for its usage:
This will help us understand how and where the function is being used, ensuring it's implemented correctly in other parts of the codebase.
✅ Verification successful
Adjusting the search to target `.py` files explicitly.
The usage of the new `split` function has been verified and is correctly utilized in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
    #!/bin/bash
    # Description: Search for usage of the split function in Python files.
    # Test: Look for import statements or direct usage of the split function
    rg --type python -e "from.*utils.*import.*split" -e "utils\.split\(" -e "split\([^\"']" -C 3
Length of output: 127
Script:
    #!/bin/bash
    # Description: Search for usage of the split function in Python files by targeting .py extensions.
    # Test: Look for import statements or direct usage of the split function
    rg -g "*.py" -e "from\s+.*utils\s+import\s+split" -e "utils\.split\(" -e "split\([^\"']" -C 3
Length of output: 2769
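For readers who want a concrete picture of what a range-splitting helper like the one reviewed here might look like, the sketch below divides `range(0, a)` into `n` contiguous half-open intervals whose sizes differ by at most one. It uses the reviewer's suggested name `split_range`; the body is an assumption based on the description in the review (a range split into n parts, with uneven divisions handled), not the code from the PR.

```python
from typing import List, Tuple


def split_range(a: int, n: int) -> List[Tuple[int, int]]:
    """Split range(0, a) into n contiguous (start, end) half-open parts.

    The first `a % n` parts get one extra element, so sizes differ by at most 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k, m = divmod(a, n)  # k: base size, m: number of parts that get +1
    return [(i * k + min(i, m), (i + 1) * k + min(i + 1, m)) for i in range(n)]


parts = split_range(10, 3)
print(parts)  # [(0, 4), (4, 7), (7, 10)]
```

When `n` exceeds `a`, the trailing parts are empty intervals such as `(2, 2)`, which a caller may want to skip before dispatching work.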
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
testground/benchmark/benchmark/transaction.py (1)
74-89: LGTM with a minor suggestion: Parallel transaction generation
The refactoring of the `gen` function to use multiprocessing is well-implemented and should significantly improve performance for transaction generation. The approach of splitting work into chunks based on CPU count is appropriate.
One minor suggestion: when combining results using `itertools.zip_longest`, you might introduce `None` values if job results have different lengths. Consider filtering out these `None` values:
    all_txs = []
    for txs in itertools.zip_longest(*itertools.chain(*acct_txs)):
        all_txs.extend(filter(None, txs))
This change ensures that only valid transactions are added to `all_txs`.
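To see the padding behaviour the suggestion refers to, here is a small self-contained sketch. The helper name `interleave` and the sample data are made up for illustration, and the input is assumed to be an already flattened list of per-account transaction lists. It drops padding with an explicit `is not None` check; `filter(None, ...)` as suggested above also works for non-empty hex strings, but it would discard any falsy value, not only `None`.

```python
import itertools
from typing import List


def interleave(per_account_txs: List[List[str]]) -> List[str]:
    """Round-robin merge of per-account tx lists, skipping zip_longest padding."""
    merged: List[str] = []
    for batch in itertools.zip_longest(*per_account_txs):
        # zip_longest pads shorter lists with None; keep only real entries.
        merged.extend(tx for tx in batch if tx is not None)
    return merged


sample = [["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]
print(interleave(sample))  # ['a0', 'b0', 'c0', 'a1', 'c1', 'a2']
```

If every account produces the same number of transactions, no padding occurs and the filter is a no-op.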
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- testground/benchmark/benchmark/transaction.py (3 hunks)
🧰 Additional context used
🔇 Additional comments (3)
testground/benchmark/benchmark/transaction.py (3)
2-5: LGTM: New imports for parallel processing
The added imports (`itertools`, `multiprocessing`, `os`, and `namedtuple` from `collections`) are appropriate for implementing parallel transaction generation. These additions align well with the PR's objective of enabling parallel execution for gen-tx.
55-57: LGTM: Well-structured Job namedtuple
The `Job` namedtuple is well-defined with all necessary fields for parallel transaction generation. This structure will help in organizing and passing job-related data efficiently between processes.
Line range hint 1-91: Overall: Excellent implementation of parallel transaction generation
The changes made to this file successfully implement parallel transaction generation, which aligns perfectly with the PR's objective. Key points:
- The use of multiprocessing should significantly improve performance, especially for large numbers of transactions.
- The code structure is clean and maintainable, with clear separation of concerns between job definition, execution, and result aggregation.
- The core logic of transaction creation remains intact, minimizing the risk of introducing new bugs.
These changes should result in a substantial performance improvement for the gen-tx process when running on a single node with multiple CPU cores.
To further validate the improvements:
This script will help verify that the parallel implementation indeed provides a performance boost.
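Independently of the review's own script, one way to check for a speed-up could look like the sketch below. It is a hypothetical harness, not code from the PR: `sign_account` is a CPU-bound stand-in that hashes strings instead of signing transactions, and `gen_sequential`, `gen_parallel`, and `timed` are names invented for this example. It relies only on `multiprocessing.Pool.map`, which returns results in input order.

```python
import hashlib
import multiprocessing
import time
from typing import Callable, List, Tuple


def sign_account(work: Tuple[int, int]) -> List[str]:
    """CPU-bound stand-in for signing: NOT real transaction signing."""
    account, num_txs = work
    return [
        hashlib.sha256(f"{account}:{nonce}".encode()).hexdigest()
        for nonce in range(num_txs)
    ]


def gen_sequential(num_accounts: int, num_txs: int) -> List[List[str]]:
    """Baseline: process every account in the calling process."""
    return [sign_account((acct, num_txs)) for acct in range(num_accounts)]


def gen_parallel(num_accounts: int, num_txs: int, processes=None) -> List[List[str]]:
    """Same work spread over a process pool (defaults to one worker per CPU)."""
    work = [(acct, num_txs) for acct in range(num_accounts)]
    with multiprocessing.Pool(processes) as pool:
        return pool.map(sign_account, work)  # results keep input order


def timed(fn: Callable, *args) -> Tuple[object, float]:
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

To compare the two, call `timed(gen_sequential, ...)` and `timed(gen_parallel, ...)` with the same arguments under an `if __name__ == "__main__":` guard, and confirm that both return identical lists before comparing elapsed times. For small workloads the cost of starting processes can outweigh the gain, so the comparison is only meaningful with enough accounts and transactions.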