MongoDB ingest fails when primary node not available #5637

Closed
KulykDmytro opened this issue Aug 12, 2022 · 6 comments · Fixed by #5650
Labels
bug Bug report

Comments

@KulykDmytro
Contributor

Describe the bug
Unable to ingest from a MongoDB replica set whose primary node is closed to external access (only a secondary is reachable).

Setting the readPreference option to secondary, secondaryPreferred, or nearest has no effect on this behavior.

Screenshots
Only one node is open for reads, to keep security tight.

[2022-08-12, 14:29:49 UTC] {pipeline.py:127} ERROR - No replica set members match selector "Primary()", Timeout: 30s,
Topology Description: <TopologyDescription id: 62f663bf7ea043e606f7d507, topology_type: ReplicaSetNoPrimary,
  servers: [
    <ServerDescription ('192.168.150.16', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('192.168.150.16:27017: timed out')>,
    <ServerDescription ('192.168.150.26', 27017) server_type: RSSecondary, rtt: 0.0169131452171132>,
    <ServerDescription ('192.168.150.84', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('192.168.150.84:27017: timed out')>
  ]
>

Additional context
The MongoDB cluster is accessible through secondary nodes only.
The ingestion runs from Python code (within Airflow).

@KulykDmytro added the bug (Bug report) label on Aug 12, 2022
@KulykDmytro changed the title from "MongoDB ingest fails when Primary not available" to "MongoDB ingest fails when primary node not available" on Aug 12, 2022
@KulykDmytro
Contributor Author

KulykDmytro commented Aug 12, 2022

@hsheth2 this seems to be the root of the issue:

self.mongo_client.admin.command("ismaster")

It needs to be handled some other way, e.g. by using the ping command.

@hsheth2
Collaborator

hsheth2 commented Aug 16, 2022

Thanks for opening this issue @KulykDmytro - you're right that the ping command seems more appropriate, so we'll change it.

@KulykDmytro
Contributor Author

@hsheth2, thanks for the fast response.
Unfortunately, it did not help.
I changed the command to "ping" directly in the module code and reran; I get the same error.

So please reopen.

@KulykDmytro
Contributor Author

KulykDmytro commented Aug 17, 2022

Even with this line (the connection test call) commented out and readPreference: secondary set in the ingestion config, the run fails with the same error.
It seems the read preference parameter is not passed to the client.

debug stacktrace

[2022-08-17 21:24:15,493] DEBUG    {datahub.entrypoints:168} - Stackprinter failed:
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/stackprinter/formatting.py", line 171, in format_exc_info
    whole_stack = format_stack(frameinfos, style=style,
TypeError: format_stack() got an unexpected keyword argument 'suppressed_vars'

So here is your original traceback at least:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/entrypoints.py", line 149, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 343, in wrapper
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 295, in wrapper
    res = func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 191, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 154, in run_func_check_upgrade
    ret = await the_one_future
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 145, in run_pipeline_async
    return await loop.run_in_executor(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 135, in run_pipeline_to_completion
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 121, in run_pipeline_to_completion
    pipeline.run()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 263, in run
    for wu in itertools.islice(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/datahub/ingestion/source/mongodb.py", line 301, in get_workunits
    database_names: List[str] = self.mongo_client.list_database_names()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1839, in list_database_names
    return [doc["name"] for doc in self.list_databases(session, nameOnly=True, comment=comment)]
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1812, in list_databases
    res = admin._retryable_read_command(cmd, session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/database.py", line 843, in _retryable_read_command
    return self.__client._retryable_read(_cmd, read_preference, session)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/_csot.py", line 105, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1413, in _retryable_read
    server = self._select_server(read_pref, session, address=address)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1229, in _select_server
    server = topology.select_server(server_selector)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/topology.py", line 272, in select_server
    server = self._select_server(selector, server_selection_timeout, address)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/topology.py", line 261, in _select_server
    servers = self.select_servers(selector, server_selection_timeout, address)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/topology.py", line 223, in select_servers
    server_descriptions = self._select_servers_loop(selector, server_timeout, address)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pymongo/topology.py", line 238, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: No replica set members match selector "Primary()", Timeout: 30s, Topology Description: <TopologyDescription id: 62fd3230b3ad462bddfff9bf, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('192.168.150.16', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('192.168.150.16:27017: timed out')>, <ServerDescription ('192.168.150.26', 27017) server_type: RSSecondary, rtt: 0.039533382311999765>, <ServerDescription ('192.168.150.84', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('192.168.150.84:27017: timed out')>]>

@KulykDmytro
Contributor Author

KulykDmytro commented Aug 17, 2022

It seems the command does not pick up the client's defaults (including read_preference).
The following call passes read_preference from the client explicitly instead of relying on the defaults, which allows connecting in readPreference: secondary mode while the primary is not available:

self.mongo_client.admin.command("ping", read_preference=self.mongo_client.read_preference)

BTW: the ismaster command also works without an exception.
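
The workaround above can be sketched as a small helper. This is only an illustration, not the connector's actual code: the function name check_connection is hypothetical, and it assumes a PyMongo-style client exposing admin.command() and a read_preference attribute. PyMongo's Database.command() documents that it defaults to the primary when no read_preference is given, which is why the preference is forwarded explicitly here.

```python
from typing import Any, Mapping


def check_connection(mongo_client: Any) -> Mapping[str, Any]:
    """Ping the deployment using the client's own read preference.

    Hypothetical helper: `mongo_client` is assumed to behave like a
    pymongo.MongoClient (has `.admin.command()` and `.read_preference`).
    """
    # Database.command() targets the primary unless read_preference is
    # passed, so forward the client's configured preference explicitly.
    return mongo_client.admin.command(
        "ping", read_preference=mongo_client.read_preference
    )
```

With a client created with readPreference=secondary, this sends the ping to a secondary instead of waiting for a primary that is not reachable.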

@hsheth2
Collaborator

hsheth2 commented Aug 18, 2022

@KulykDmytro that seems pretty unexpected - we're passing the config options directly into the MongoClient (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/mongodb.py#L235-L240), and the mongo client docs (https://pymongo.readthedocs.io/en/stable/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient) say that the readPreference attribute should be respected.

Outside of the datahub connector, can you get a MongoClient instance working e.g. in just a python script?
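
A standalone check along these lines might look like the sketch below. It is an assumption-laden illustration: build_uri and try_ping are hypothetical names, the 5000 ms timeout is arbitrary, and the script needs pymongo installed plus a reachable replica set member. It pings twice, once relying on the command's default and once passing the client's read preference explicitly, so the two behaviors can be compared.

```python
from urllib.parse import urlencode


def build_uri(hosts, read_preference="secondary", timeout_ms=5000):
    """Build a MongoDB URI that carries readPreference as a query option."""
    query = urlencode(
        {"readPreference": read_preference, "serverSelectionTimeoutMS": timeout_ms}
    )
    return f"mongodb://{','.join(hosts)}/?{query}"


def try_ping(uri):
    """Ping with and without an explicit read preference; return both results."""
    # Imported lazily so build_uri() stays usable without pymongo installed.
    from pymongo import MongoClient
    from pymongo.errors import ServerSelectionTimeoutError

    client = MongoClient(uri)
    results = {}
    try:
        # Shows whether the URI option reached the client at all.
        print("client read preference:", client.read_preference)
        attempts = {
            "default": {},  # command() falls back to the primary
            "explicit": {"read_preference": client.read_preference},
        }
        for label, kwargs in attempts.items():
            try:
                client.admin.command("ping", **kwargs)
                results[label] = "ok"
            except ServerSelectionTimeoutError as exc:
                results[label] = f"failed: {exc}"
    finally:
        client.close()
    return results
```

For example, try_ping(build_uri(["192.168.150.26:27017"])) would target the secondary that appears as reachable in the log above; the address is used here purely for illustration.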
