Affected Version
This problem was first reported against 0.21.0-rc1. It also exists on the master branch.
Description
When starting a Druid cluster in Docker with the docker-compose file (distribution/docker/docker-compose.yml), ALL Druid service nodes fail to start with messages like the ones below:
Inside the container, listing the owners of all directories under /opt/druid shows that the var directory belongs to root instead of druid. Since the process inside the container is launched by user druid, it has no permission to create directories under var.
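The ownership check described above can be reproduced with a small shell sketch. The helper names owner_uid and check_writable are hypothetical and not part of Druid or Docker; the sketch only assumes a POSIX shell with ls and awk available.

```shell
# Hypothetical helpers (not part of Druid): inspect who owns a directory
# and whether the current user may create entries inside it.

# Print the numeric UID that owns a directory (ls -n shows IDs, not names).
owner_uid() {
  ls -ldn "$1" | awk '{print $3}'
}

# Report whether the current user can create entries under a directory.
check_writable() {
  if [ -w "$1" ] && [ -x "$1" ]; then
    echo "writable"
  else
    echo "not writable"
  fi
}
```

Run inside a container as the druid user, owner_uid /opt/druid/var would print 0 if the directory is owned by root, and check_writable /opt/druid/var would then be expected to report "not writable", which matches the startup failure described here.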
Analysis
This problem was introduced by #10506. Consider the Dockerfile instructions after #10506.
First, we create the /opt/druid/var directory and change the owner of /opt and all of its sub-directories to druid. This instruction looks OK.
But the following command, COPY --chown=druid:druid --from=builder /opt /opt, replaces the entire /opt, including its sub-directory /opt/druid/var, which means the directory no longer exists inside the image.
Since /opt/druid/var is declared as a VOLUME, Docker is responsible for creating the directory when the cluster is brought up. Because Docker runs as root on the user's machine, var ends up owned by root instead of by druid as we expect.
Before #10506 there was no such problem: /opt/druid/var was created after COPY, so the directory existed inside the image after the build.
I'm not sure why this problem didn't show up in some other environments. I guess it has something to do with VOLUME. I'm not familiar with that, so this is only a guess: since the volume also lives on the host, if the directory already exists there (say, created by a previous image), the var directory won't be created as root.
Fix
The fix I can come up with is to put mkdir -p /opt/druid/var after the COPY command.
As for what #10506 set out to solve, the change I propose only creates a new directory and does not modify any files, so it won't double the image size.
In my test environment, the image size is 547MiB.
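The ordering effect can be illustrated with a scratch-directory simulation. This is only a sketch of the outcome as this issue reports it (COPY leaving /opt/druid as a symlink with no var underneath), not a model of Docker's real COPY semantics. The function names simulate_build and replace_opt are hypothetical, and the replacement step is emulated with rm and cp.

```shell
# Hypothetical simulation: does var survive, depending on whether it is
# created before or after /opt is replaced from the builder stage?
# The function body runs in a subshell so its variables stay local.
simulate_build() (
  order="$1"                                   # "mkdir-first" or "copy-first"
  root="$(mktemp -d)"
  builder="$root/builder"
  final="$root/final"

  # Builder stage: a versioned directory plus a druid symlink pointing at it.
  mkdir -p "$builder/opt/apache-druid-0.21.0"
  ln -s apache-druid-0.21.0 "$builder/opt/druid"
  mkdir -p "$final"

  # Emulates the reported effect of COPY --from=builder /opt /opt
  # (assumption: the whole of /opt is replaced, symlinks are kept as symlinks).
  replace_opt() {
    rm -rf "$final/opt"
    cp -RP "$builder/opt" "$final/opt"
  }

  case "$order" in
    mkdir-first) mkdir -p "$final/opt/druid/var"; replace_opt ;;
    copy-first)  replace_opt; mkdir -p "$final/opt/druid/var" ;;
    *) echo "unknown order: $order" >&2; rm -rf "$root"; exit 2 ;;
  esac

  if [ -d "$final/opt/druid/var" ]; then echo "present"; else echo "missing"; fi
  rm -rf "$root"
)
```

Under these assumptions, simulate_build mkdir-first reports that var is missing, while simulate_build copy-first reports that it is present, which is the reason for moving mkdir after COPY.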
@FrankChen021 this might be specific to docker for mac and how it mounts volumes. There might also be some issue with how COPY handles the existing var directory, complicated by the fact that /opt/druid is a symlink.
This might cause things to work a little differently across docker versions and OSes.
A workaround is to always mount a volume on the docker run command line. At least in my testing, this makes /opt/druid/var take on the correct druid:druid ownership, whereas it ends up owned by root when the volume is left unset.
Docker volume directory was accidentally removed due to reordering of statements.
This causes ownership and permissions on the volume directory to be reset, preventing startup.
fixes #11166
Signed-off-by: frank chen <[email protected]>
Some proof
To track down the problem, I added an "ls" command to the Dockerfile to observe the directories and their owners during the image build.
Before COPY, druid is the directory we created with the RUN command.
After COPY, druid has become the symbolic link we created at the beginning of the Dockerfile (/opt/apache-druid-0.21.0). Note that there is NO var directory.
cc @jihoonson @gianm