[Bug]: Failing --number-of-docs parameter for create-workload action #658

Closed
ngc4579 opened this issue Sep 30, 2024 · 2 comments

ngc4579 commented Sep 30, 2024

Describe the bug

When creating a workload from existing indices and limiting the number of documents using the --number-of-docs parameter of the create-workload action, the command fails with an exception:

$ opensearch-benchmark create-workload ... --indices=index-1,index-2 --number-of-docs="index-1:1000 index-2:1000"

2024-09-30 09:29:19,940 -not-actor-/PID:251 osbenchmark.workload_generator.workload_generator INFO Extracted index settings and mappings from [[Index(name='index-1', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={}), Index(name='index-2', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={})]]

2024-09-30 09:29:19,944 -not-actor-/PID:251 osbenchmark.benchmark ERROR A fatal error occurred while running subcommand [create-workload].
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/benchmark.py", line 940, in dispatch_sub_command
    workload_generator.create_workload(cfg)
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/workload_generator.py", line 73, in create_workload
    index_corpora = corpus_extractor.extract_documents(index.name, index.number_of_docs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/extractors.py", line 174, in extract_documents
    documents_to_extract = total_documents if not documents_limit else min(total_documents, documents_limit)
                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'dict' and 'int'
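
The failing expression can be reproduced in isolation. The sketch below reuses the variable names from the traceback and the dict shown in the log; the value of total_documents is an assumption made for illustration. Because documents_limit is the whole index/count dict rather than a single number, min() ends up comparing a dict with an int:

```python
# Variable names come from the traceback; total_documents is an assumed value.
total_documents = 5000
# The entire index/count dict, as seen in the log, instead of one per-index count.
documents_limit = {"index-1": "1000", "index-2": "1000"}

try:
    documents_to_extract = (
        total_documents
        if not documents_limit
        else min(total_documents, documents_limit)
    )
except TypeError as exc:
    # min() cannot order a dict against an int.
    error_message = str(exc)
    print(error_message)
```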

To reproduce

Try creating a workload from an existing index while limiting the number of documents using --number-of-docs.

Expected behavior

Workload should be created as specified without the command crashing.

Host / Environment

K8s 1.29, OSB 1.9.1 running in Pod

Additional context

It seems that in helpers.py, the function process_indices assigns the entire index-to-count dict to each Index element instead of extracting the document count for that specific index.
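
A minimal sketch of the per-index lookup described above, not the actual OSB code: the Index dataclass only mirrors the fields visible in the log, and build_indices is a hypothetical stand-in for process_indices, whose real signature is not shown in this issue. It also converts the counts to int, since the log shows them as strings:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Index:
    # Fields mirror the Index repr in the log; this is not OSB's actual class.
    name: str
    document_frequency: int = 0
    number_of_docs: Optional[int] = None
    settings_and_mappings: dict = field(default_factory=dict)


def build_indices(index_names, number_of_docs):
    """Hypothetical stand-in for process_indices: each Index gets its own limit."""
    indices = []
    for name in index_names:
        raw_limit = number_of_docs.get(name)  # None when no limit was given
        limit = int(raw_limit) if raw_limit is not None else None
        indices.append(Index(name=name, number_of_docs=limit))
    return indices


indices = build_indices(
    ["index-1", "index-2", "index-3"],
    {"index-1": "1000", "index-2": "1000"},
)
for index in indices:
    print(index.name, index.number_of_docs)
```

With a single int (or None) per index, the expression in extract_documents compares two numbers, or falls back to the total document count when no limit is set.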

Relevant log output

2024-09-30 09:29:19,940 -not-actor-/PID:251 osbenchmark.workload_generator.workload_generator INFO Extracted index settings and mappings from [[Index(name='index-1', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={}), Index(name='index-2', document_frequency=0, number_of_docs={'index-1': '1000', 'index-2': '1000'}, settings_and_mappings={})]]
2024-09-30 09:29:19,941 -not-actor-/PID:251 py.warnings WARNING /usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py:1099: InsecureRequestWarning: Unverified HTTPS request is being made to host 'opensearch-nodes.opensearch.svc'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings
  warnings.warn(

2024-09-30 09:29:19,944 -not-actor-/PID:251 osbenchmark.benchmark ERROR A fatal error occurred while running subcommand [create-workload].
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/benchmark.py", line 940, in dispatch_sub_command
    workload_generator.create_workload(cfg)
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/workload_generator.py", line 73, in create_workload
    index_corpora = corpus_extractor.extract_documents(index.name, index.number_of_docs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/osbenchmark/workload_generator/extractors.py", line 174, in extract_documents
    documents_to_extract = total_documents if not documents_limit else min(total_documents, documents_limit)
                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'dict' and 'int'

ngc4579 added the bug (Something isn't working) and untriaged labels Sep 30, 2024
IanHoang self-assigned this Sep 30, 2024

IanHoang (Collaborator) commented

Was able to reproduce. Will put out a fix shortly.

IanHoang (Collaborator) commented

Got a fix working for it. Addressing in a PR.

(.venv) hoangia@80a9971b1103 opensearch-benchmark % opensearch-benchmark create-workload --target-hosts=XXXXXX --client-options=basic_auth_user:'XXXXXX',basic_auth_password:'XXXXXX' --indices=movies-1000,movies-2000,nyc_taxis  --output-path=~/Desktop/ --workload=test-workload --number-of-docs="movies-2000:1500 nyc_taxis:1500"

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Connected to OpenSearch cluster [69622e766ec7eb17f038aed664796847] version [2.5.0].

A workload already exists at /Users/hoangia/Desktop/test-workload. Would you like to remove it? (y/n): y
[INFO] Removing workload of the same name.
Extracting documents for index [movies-1000] for test mode... 1000/1000 docs [100.0% done]
Extracting documents for index [movies-1000]...               1000/1000 docs [100.0% done]
Extracting documents for index [movies-2000] for test mode... 1000/1000 docs [100.0% done]
Extracting documents for index [movies-2000]...               1500/1500 docs [100.0% done]
Extracting documents for index [nyc_taxis] for test mode...   1000/1000 docs [100.0% done]
Extracting documents for index [nyc_taxis]...                 1500/1500 docs [100.0% done]

[INFO] Workload test-workload has been created. Run it with: opensearch-benchmark --workload-path=/Users/hoangia/Desktop/test-workload

-------------------------------
[INFO] SUCCESS (took 4 seconds)
-------------------------------
