-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(package): Enable replica set for the MongoDB results cache and configure it when starting the package. #632
base: main
Are you sure you want to change the base?
Conversation
…en starting the package.
Warning Rate limit exceeded@junhaoliao has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 23 minutes and 53 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
WalkthroughThe pull request introduces modifications to support MongoDB replica set initialization in a Python utility script and updates the MongoDB configuration file. The changes enable automatic replica set configuration during the index creation process for a stream collection. The script now checks the replica set status and initializes it if not already configured, ensuring proper MongoDB replication setup. Changes
Sequence DiagramsequenceDiagram
participant Script as Create Results Cache Indices Script
participant MongoDB as MongoDB Client
Script->>MongoDB: Establish Connection
Script->>MongoDB: Check Replica Set Status
alt Replica Set Not Initialized
Script->>MongoDB: Initialize Replica Set
MongoDB-->>Script: Replica Set Configured
else Replica Set Already Initialized
Script-->>MongoDB: Continue Processing
end
Script->>MongoDB: Create Indexes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
An issue is found when in
However, such issue is only observed on By looking into the community Mongo server source code, it was found they have two ways to check whether a node is up before the server configures itself as a replica node:
Why
|
You may also see from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (2)
19-33
: Consider parametrising the replica set host and port.Although this initialisation flow works well for the default port (27017) on localhost, it may not be sufficient if the user wants to set up a replica set on a non-standard port or host. Passing in the host and port as parameters will improve flexibility and will align with the PR’s objective of making replica set configuration more adaptable.
47-49
: Avoid creating two separate MongoClient connections.Re-using the same
MongoClient
after invokinginitialize_replica_set
can be more efficient and simpler. If you do require a new client connection for index creation, consider adding a short comment explaining why a fresh connection is needed, so it is clear to maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/create-results-cache-indices.py
(3 hunks)components/package-template/src/etc/mongo/mongod.conf
(1 hunks)
🔇 Additional comments (2)
components/clp-py-utils/clp_py_utils/create-results-cache-indices.py (1)
6-6
: Good use of the OperationFailure import.
This import from pymongo.errors
is essential for handling replica set status checks and ensures that any non-initialised replica set is properly caught and configured.
components/package-template/src/etc/mongo/mongod.conf (1)
3-4
: Replica set configuration approved.
Enabling the replica set by specifying replSetName
is a straightforward and effective strategy. If you later find a need to run multiple replica sets on the same machine, you may consider making rs0
a configurable parameter in the environment or in a generated config file.
Quick question: for Result cache host switching validation, I assumed you also ensured the search in webui is still working with different IPs and ports? |
Discussed with Junhao offline and clarified the two points:
Imo, we should make these two points clear in the code. We have agreed on renaming some replica related functions and adding docstrings so developer who only reads code can also have the context above. |
@@ -491,6 +491,7 @@ def start_results_cache(instance_id: str, clp_config: CLPConfig, conf_dir: pathl | |||
cmd = [ | |||
"docker", "run", | |||
"-d", | |||
"--network", "host", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, do you know exactly why we need --network host? is there any IP other than results_cache.port that will be accessed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using --network host
with MongoDB in Docker ensures proper replica set configuration because Docker's network containerization interferes with MongoDB's isSelf
checks when using non-localhost-&-27017 ports. By exposing the host network to the MongoDB container, we can avoid this issue.
More details of the investigation can be found at #632 (comment) Let me know if I can elaborate on any part
Description
host
and specify--bind-ip
and--port
when starting the mongo image.Validation performed
Search query performance compassion
With & Without the replica set enabled, performed a search query in the WebUI with a screen recording of 30fps, and extract below times:
The speed performance does not seem to significantly differ with and without replica-set enabled.
Result cache host switching
Previous attempts to enable replica set failed because there were host reachability issues observed. Below validations ensure the
clp-config.yml
/results_cache host and port are configurable without affecting the result cache's avaialbility.build/clp-package/lib/python3/site-packages/clp_py_utils/create-results-cache-indices.py
Built the clp-package.
clp-package/sbin/start-clp.sh
with the default config and observed below logs:Changed the
result_cache
'shost
andport
config inclp-package/etc/clp-config.yml
to belocalhost:27018
. e.g.,clp-package/sbin/start-clp.sh
and observed below logs:Used
ip addr
to find another address that belongs to the clp-package host (i got172.22.78.194
). Changed theresult_cache
'shost
andport
config inclp-package/etc/clp-config.yml
to be172.22.78.194:27018
.clp-package/sbin/start-clp.sh
and observed below logs:Changed the
result_cache
'shost
andport
config inclp-package/etc/clp-config.yml
to belocalhost:27017
.clp-package/sbin/start-clp.sh
and observed below logs: