Add support for Amazon OpenSearch Serverless #269
Hi @sameercaresu, thanks for bringing this up. This is a known issue with OpenSearch Serverless: the Hadoop client makes requests (such as cluster version detection) that Serverless does not support.
Hi, I haven't heard of any changes to Serverless to address this API gap.
We have a use case to connect to OpenSearch Serverless from Apache Spark, and I am running into a similar issue. Is there a workaround to connect to OpenSearch Serverless from Apache Spark?
There is still no known workaround. If you do figure out a way, please share it here or propose a PR so we can patch the client.
Not sure if everyone is doing the same as what I was trying to do, but it works for me: I am using opensearch-hadoop (Java) to connect to OpenSearch Serverless (deployed in a VPC) through a VPC endpoint.
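The commenter's actual snippet was not captured, but a configuration along those lines might look like the sketch below. This is an assumption based on the connector settings quoted elsewhere in this thread; the endpoint URL and region are placeholders.

```java
import java.util.Map;

public class ServerlessWriteOptions {
    // Sketch only: connector options for reaching a Serverless collection
    // through its VPC endpoint. The endpoint URL and region are placeholders.
    public static Map<String, String> options() {
        return Map.of(
            "opensearch.nodes", "https://xxxx.us-east-1.aoss.amazonaws.com",
            "opensearch.nodes.wan.only", "true",        // single endpoint; no node discovery
            "opensearch.aws.sigv4.enabled", "true",
            "opensearch.aws.sigv4.region", "us-east-1",
            "opensearch.aws.sigv4.service.name", "aoss" // Serverless signs as "aoss", not "es"
        );
    }
}
```

In Spark, a map like this would be passed through `.options(...)` on the DataFrame reader or writer.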
I've done some quick investigation into this, and it's more extensive than just the version check.
The bigger issue I then hit trying to do a read is that the client calls _search_shards, which Serverless does not expose.
@Xtansia It looks like _search_shards is getting called even when the setting os.nodes.client.only is set to true. In that scenario _search_shards is useless and shouldn't execute, since no shards will map to non-data nodes. That means this should be a no-op: mr/src/main/java/org/opensearch/hadoop/rest/RestRepository.java, line 279 at commit c9a6a1c.
It's not quite as simple as just not calling it, as the client uses the shards to determine how to partition the job within Spark for parallelisation, and Serverless doesn't expose any shard information. It may be possible to work around this by hard-coding 1, or a configurable number of, partitions for Serverless, but I haven't dug into it far enough to know whether that's feasible if other parts of the code expect an actual shard ID.
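In outline, the configurable-partitions fallback floated above might look like the following sketch. The setting name `opensearch.aoss.partitions` is invented here purely for illustration; it is not a real connector setting.

```java
import java.util.Map;
import java.util.function.IntSupplier;

public class PartitionFallback {
    // Sketch of the fallback discussed above: when the target is Serverless
    // (which has no _search_shards API), skip shard discovery entirely and
    // split the job into a fixed, configurable number of partitions.
    public static int partitionCount(Map<String, String> settings,
                                     boolean serverless,
                                     IntSupplier shardCount) {
        if (serverless) {
            // No shard metadata available: fall back to a configured count
            // (hypothetical setting name, defaulting to a single partition).
            return Integer.parseInt(
                settings.getOrDefault("opensearch.aoss.partitions", "1"));
        }
        // Normal path: one partition per shard, as the connector does today.
        return shardCount.getAsInt();
    }
}
```

As noted above, whether something like this is feasible depends on whether other code paths expect a real shard ID for each partition.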
@Xtansia How does the Hadoop client use these APIs? I am exploring a solution to return some dummy/empty response for them in Serverless to support backward compatibility. But without understanding how the client uses these APIs, returning a dummy response would be of no use.
@dblock I have seen you pinned this issue about using Hadoop with OpenSearch Serverless. Can I ask what has been solved? If, for example, I want to use Glue to transfer documents from one OpenSearch Serverless collection to another, could I do that now? Thanks in advance.
Is your feature request related to a problem?
I am trying to connect to an OpenSearch Serverless collection from Databricks. I can connect to an OpenSearch managed cluster using this library. However, while trying to connect to a Serverless collection, I keep getting this error:
OpenSearchHadoopIllegalArgumentException: Cannot detect OpenSearch version - typically this happens if the network/OpenSearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'opensearch.nodes.wan.only'
Caused by: OpenSearchHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[https://xxx.aoss.amazonaws.com:9200]]
I have tried the following configuration:
"pushdown" -> "true",
"opensearch.nodes" -> "https://xxx.aoss.amazonaws.com",
"opensearch.nodes.wan.only" -> "true",
"opensearch.aws.sigv4.region" -> "us-east-1",
"opensearch.aws.sigv4.service.name" -> "aoss",
"opensearch.aws.sigv4.enabled" -> "true"
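For reference, that configuration restated as a map, together with how it would typically be applied to a Spark read. This is a sketch only; the index name is a placeholder, and as this thread discusses, the read still fails against Serverless because version detection and _search_shards are unsupported.

```java
import java.util.Map;

public class ServerlessReadOptions {
    // The configuration quoted above, restated one entry per line.
    public static Map<String, String> options() {
        return Map.of(
            "pushdown", "true",
            "opensearch.nodes", "https://xxx.aoss.amazonaws.com",
            "opensearch.nodes.wan.only", "true",
            "opensearch.aws.sigv4.region", "us-east-1",
            "opensearch.aws.sigv4.service.name", "aoss",
            "opensearch.aws.sigv4.enabled", "true"
        );
    }
    // Applied to a Spark read roughly as:
    //   Dataset<Row> df = spark.read().format("opensearch")
    //       .options(ServerlessReadOptions.options())
    //       .load("my-index");   // "my-index" is a placeholder
}
```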
What solution would you like?
Is it already possible to connect to OpenSearch Serverless? If yes, could you please point me to the correct set of configuration options? If not, then I would like to request this feature.
What alternatives have you considered?
I used elasticsearch-hadoop, but that doesn't work with OpenSearch Serverless either.
Do you have any additional context?
No.