[BUG] OpenSearch v1.2.4 Docker image is broken while v1.2.3 just works #1529
Comments
From what I understand, it looks like the file path config for SSL coming from the Security plugin is not configured.
@q2dg I can't reproduce with docker, running
Docker command output
I am going to install
I setup an EC2 instance running the AMI
Image ID matches what I saw when I started up docker on my windows machine
@q2dg we don't have much more action to take at this point, as I cannot reproduce the issue you had. Please run those
Thanks a lot for your patience. I'm running a freshly updated Fedora 35 system (kernel v5.15.15, podman v3.4.4). If I run podman container run --rm --name=puto -e "discovery.type=single-node" -p 9200:9200 opensearchproject/opensearch:latest I get:
The output of podman image ls is:
BUT if I run podman container run --rm --name=puto -e "discovery.type=single-node" -p 9200:9200 opensearchproject/opensearch:1.2.3 I get this, instead:
The output of podman image ls now is:
Thanks a lot again!!
Details from my ubuntu machine
OpenSearch 1.2.4 container startup
When I look at a diff between the outputs of the different images, after the line opensearch-build/docker/release/config/opensearch/opensearch-docker-entrypoint.sh Lines 72 to 73 in eb7c932
The failure is on the left and the operational one on the right. I think you'll need to check whether the cat / sed / tee commands are failing, which would abort the rest of the script execution. Adding @peterzhuamazon in case you have any debugging advice.
Well, you're right! I've entered my functional v1.2.3 container (by executing podman exec -it puto /bin/bash ) and run the cat->sed->tee pipeline manually. Running just cat or cat->sed gives the same result: there's no difference in the output shown (that is, sed doesn't delete any line, because there's no line with the "plugins.security.disabled" string in the original opensearch.yml file). So far, so good... BUT when sed's output is piped into the tee command, this output disappears!! There's no output when running the entire pipeline... tee swallows it! In fact, it's worse, because tee overwrites the original "opensearch.yml" file and so erases all its content (resulting in an empty file). If I give another name for the file written by tee, everything is fine: the output is shown on screen and written to the final file; but if this final file is the same as the original, its content is destroyed. Anyway, this behaviour happens in my functional container, so I don't know if this is specifically the reason why v1.2.4 doesn't work... Thanks a lot again!!
Does that mean you can start the 1.2.4 container? We did change how the container updates the config file, as there were reported issues with sed reading while tee was writing, but we might not have fixed this at all. Check out #1130, as there is more detail in that pull request. Previous startup script: opensearch-build/docker/release/config/opensearch/opensearch-docker-entrypoint.sh Lines 71 to 72 in 780c28d
No, no, I can't start the v1.2.4 container. My tests were done in the v1.2.3 one. That's why I say that I don't know to what extent this may or may not be the source of the error, but it is still an interesting investigation anyway. In fact, I've tried the previous startup script and I can tell you that it does preserve the content of the opensearch.yml file (that is, it doesn't empty it). If this can help... Thanks again!
@q2dg I'm out for the weekend. On Monday I'll check with some of our other folks who might have a better idea of how to A) reproduce this and B) resolve this issue. We might have some experimental docker images for you to try if you are willing.
#1458
I am confused: this should not happen regardless of the host system.
Thanks.
I am using Fedora 35 and have updated the kernel to the latest. Since 5.15.15 is not available, I am using the latest, 5.15.16.
Podman allows me to choose from these images:
test log 1.2.4
test log latest
I am not seeing any issues running this. Weird. The only thing I can think of is that tee somehow runs before cat, and so empties the file before cat can read anything. I am thinking about using > directly, since we are no longer using sed to read the file; that should behave very similarly to tee, just without the output, and the inode should not change either. I think I need more information on your setup, @q2dg, as so far I cannot reproduce your situation on exactly the same system. Thanks.
@q2dg can you run this line and let me know what is the output?
I want to understand whether the Thanks.
Sure! What I get is this:
Doing a diff against the reference file you linked, I get no output, so it seems both files are exactly equal. My setup is a VirtualBox machine with standard options (the disk is a VDI one with a simulated SATA connection). Sorry for disturbing you so much, and thanks again.
And are you running on an x86_64 or ARM64 host?
My host is an x86_64 machine (in fact, it's another Fedora 35 Workstation system). lscpu's output as seen from the VM is this:
Thanks!
I will try the Workstation version later, as I only downloaded the Server version.
I am experiencing the same problem: I cannot get 1.2.4 to start, and I see the same security configuration error as in the original report. To try to debug I took a copy of the docker image that failed to start using This issue seems to be intermittent: sometimes it works fine, but most of the time it fails. I've not been able to tie it down to anything more concrete than that. I'm running on Amazon Linux (kernel 4.14.232-177.418.amzn2.x86_64).
I've reproduced my issue of the config being wiped in the Dockerfile below, which replicates the cat > sed > tee command:
Build with
Here is the output of running it 4 times. Note that on the 4th execution the config file has been wiped and is now empty.
I don't know if this is of any help?
Could it be that the order of the execution of the If it does get executed out of order then the config file will be wiped.
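To illustrate the ordering question: a shell starts all stages of a pipeline concurrently, and tee truncates its output file as soon as it opens it. The sketch below is not the entrypoint code. It uses a scratch file and an artificial sleep (both assumptions made purely for illustration) to force the unlucky ordering in which tee opens the file before cat has read it.

```shell
#!/bin/sh
# Illustrative only: demo.yml is a scratch file, not the real opensearch.yml.
demo_truncation() {
    workdir=$(mktemp -d) || return 1
    conf="$workdir/demo.yml"
    printf 'cluster.name: demo\nnode.name: n1\n' > "$conf"

    # All pipeline stages start concurrently. tee opens (and truncates)
    # "$conf" right away, while the left-hand side is delayed by the sleep,
    # so cat ends up reading a file that is already empty.
    (sleep 1; cat "$conf") | sed '/plugins.security.disabled/d' | tee "$conf" > /dev/null

    # Print the number of bytes left in the file.
    wc -c < "$conf" | tr -d ' '
    rm -rf "$workdir"
}
```

Without the sleep, the result depends on which process the scheduler runs first, which would be consistent with the intermittent failures reported above.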
@peterzhuamazon What do you think about resolving these issues once and for all with a migration to python? I've created a separate task that we might be able to bang out relatively quickly to mitigate these issues.
Maybe I'm not fully understanding the issue, but isn't part of the problem that the entrypoint is directly modifying the original config file? When a config file is mounted into the docker container, a user probably wouldn't expect the config file on their local machine to get modified (or even deleted!). Moreover, the config file may be mounted read-only, in which case the code will probably fail when it is unable to write to the config. I'm not sure how rewriting in python addresses this issue. Maybe it would be better to take a copy of the config file, modify that and then run OpenSearch using the copy? That said, I don't believe there is a way of specifying an alternative config path to OpenSearch.
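A minimal sketch of the "copy, then modify the copy" idea, assuming a scratch directory the container can write to. The function name, its arguments and the directory layout are hypothetical; grep -v -F stands in for the sed delete expression so that the key is matched literally.

```shell
#!/bin/sh
# Hypothetical helper: copy a config directory and edit only the copy.
#   $1 = source config dir (may be a read-only bind mount)
#   $2 = scratch dir that receives the editable copy
#   $3 = setting key whose lines are dropped from opensearch.yml
prepare_config_copy() {
    src_dir=$1
    dest_dir=$2
    key=$3

    mkdir -p "$dest_dir" || return 1
    # Copy the directory contents; the source is never opened for writing.
    cp -R "$src_dir"/. "$dest_dir"/ || return 1

    # Recreate opensearch.yml in the copy by filtering the original.
    # Input and output are different files, so nothing is read while
    # it is being overwritten.
    rm -f "$dest_dir/opensearch.yml"
    grep -v -F -- "$key" "$src_dir/opensearch.yml" > "$dest_dir/opensearch.yml"
    # grep exits 1 when no lines are left to print; only >1 is an error.
    [ $? -le 1 ]
}
```

Whether OpenSearch can then be started against the copy depends on it accepting an alternative config directory. An override such as OPENSEARCH_PATH_CONF would be the thing to verify for the version in use; it is an assumption here, not something confirmed in this thread.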
Hi @jgough, thanks for the investigation. It seems this is not an ideal approach at this point. The reason we went as far as using cat/sed/tee is that sed -i creates a new inode, and a file mounted into a docker container does not allow that. I would love to have some more opinions on how this can be changed to a better approach. @unhipzippo You have helped us identify the issue before, thanks for that; do you have any take on this? Thanks.
Using Thanks.
We can, however, save the output in a var, then echo the var into the file in a second line, but that is quite a messy script going forward.
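A minimal sketch of that two-step approach, under stated assumptions: the function name and arguments are hypothetical, and grep -v -F is used in place of the sed delete expression so that the key is treated as a fixed string. It is not the script that was eventually shipped.

```shell
#!/bin/sh
# Hypothetical helper: drop every line mentioning a setting key, in place.
#   $1 = config file, $2 = key (matched as a fixed string, not a regex)
remove_setting_in_place() {
    file=$1
    key=$2

    # Step 1: read and filter the whole file into a variable. The file is
    # fully read before anything is written back.
    content=$(grep -v -F -- "$key" "$file")
    [ $? -le 1 ] || return 1   # grep exits 1 if no lines remain; >1 is an error

    # Step 2: write the buffered text back with a plain redirect, which
    # truncates and rewrites the existing file instead of replacing it,
    # so the inode stays the same.
    printf '%s\n' "$content" > "$file"
}
```

Usage would look like `remove_setting_in_place /path/to/opensearch.yml plugins.security.disabled`. Note that command substitution strips trailing newlines and printf adds exactly one back, so a file that ends up with no lines becomes a single empty line rather than zero bytes.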
I think there are several ways we can think about resolving this issue:
Please note that, following semver, we cannot re-release. We would need to either wait for the next release, do a patch release fixing this, or have a workaround.
Thinking about it more, I think I agree with @jgough -- As an end-user, I would generally expect that config files that I bind into the container are my config, and they won't be modified in any way by the program at runtime (unless the program has communicated this to users up front and received implicit buy-in). I wonder whether a better solution wouldn't just be to update whatever code is consulting plugins.security.disabled from opensearch.yml and have it consult the environment variable instead -- Then set the environment variable on startup as needed. I.e., you end up setting the environment variable based on the config file, rather than setting the config file based on the environment variable. :) The code in opensearch-docker-entrypoint.sh could change to something like:
This would save you from needing to modify the config file at runtime at all. The downside is that you now need to go through the code and find anywhere that directly consults plugins.security.disabled, and have it pay attention to the environment var instead.
Are we trying to overcomplicate this? If I run
Yep -- that might be even simpler; I hadn't checked in the code to see that was a possibility.
@unhipzippo @jgough the entrypoint will try to figure out whether any env var matches a configuration setting and apply it during startup: We can probably deprecate these made-up variables and just ask people to use OpenSearch settings directly in ENV. However, one of the reasons we introduced these new ENV VARs is that some of them were present in ODFE back in the day, so people still want to have that BC.
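The entrypoint snippet referenced in this comment is not reproduced here. As a rough illustration of the idea only, and not the project's actual code, the sketch below turns environment entries whose names look like dotted, lowercase settings into -E command-line options. The function name and the exact name filter are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read NAME=value lines on stdin (the format printed
# by `env`) and print one -Ename=value option for each name that looks
# like a dotted, lowercase setting. Multi-line values are not handled.
settings_from_env_lines() (
    LC_ALL=C   # make the a-z range below byte-based; scoped to this subshell
    while IFS= read -r line; do
        case $line in
            *=*) ;;            # has a name and a value
            *) continue ;;     # e.g. continuation of a multi-line value
        esac
        name=${line%%=*}
        value=${line#*=}
        [ -n "$value" ] || continue   # skip settings with an empty value
        case $name in
            *[!a-z0-9_.]*) ;;  # uppercase or other characters: not a setting
            .*|*.) ;;          # must not start or end with a dot
            *.*) printf '%s\n' "-E${name}=${value}" ;;
        esac
    done
)
```

A caller might run `env | settings_from_env_lines` and append the printed options to the OpenSearch command line. With a filter like this, cluster.name=demo would be passed through while HOME or PATH would be ignored, and no config file needs to be edited.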
And especially for Dashboards, you need to completely uninstall the security FE plugin and replace every HTTPS with HTTP in the config file, as it assumes OpenSearch has the security BE plugin installed by default. That is why we have all these issues where we need to change the config file on the fly. I do, however, appreciate any ideas on how to resolve this without all the live patches to the file.
Will create a PR for a temp fix based on #1529 (comment).
@jgough @unhipzippo @qmonitoring @deng47 We have staging images here if you are willing to try them out; let us know whether the issue is fixed for now.
Thanks.
@peterzhuamazon I can confirm that I've upgraded a 1.2.3 OpenSearch cluster with the opensearchstaging/opensearch:1.2.4-testfix image successfully and that has fixed the issue we were having. I can't easily test the opensearch-dashboards image.
@peterzhuamazon - I can confirm from my side that none of the versions work for me when I try to exchange the config (latest, 1.2.3, 1.2.4-testfix and so on). An exchange of the The The docker-compose looks like this:

version: '3'
services:
  opensearch-node:
    # image: opensearchproject/opensearch:latest
    image: opensearchstaging/opensearch:1.2.4-testfix
    container_name: opensearch-node
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node
      - discovery.seed_hosts=opensearch-node
      - cluster.initial_master_nodes=opensearch-node
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data:/usr/share/opensearch/data
      - ./custom_opensearch.yml:/usr/share/opensearch/config/opensearch.yml
    ports:
      - 9200:9200
      - 9600:9600
    networks:
      - opensearch-net
volumes:
  opensearch-data:
networks:
  opensearch-net:

Am I missing something?
Hi @lerdt, do you have any logs showing what is going on with your errors? Thanks.
We have officially re-released 1.2.4 OpenSearch and 1.2.0 Dashboards with the above fixes as well as new OS level patches.
@lerdt I will close this issue for now, as the fix has resolved the majority of the cases where 1.2.3 runs but 1.2.4 can't. Thanks.
This command works
But this one doesn't
Errors are shown below