oc cluster with oc 1.5.0.alpha.2 fails when persistent data directory used. #12602

Closed
GrahamDumpleton opened this issue Jan 22, 2017 · 27 comments
Labels
component/cluster-up kind/bug Categorizes issue or PR as related to a bug. priority/P2


@GrahamDumpleton

When using the --host-data-dir option with oc cluster up under oc 1.5.0.alpha.2, I get the error:

oc cluster up --host-data-dir "/C/Users/Graha/PowerShift/profiles/oc15/data" --host-config-dir "/C/Users/Graha/PowerShift/profiles/oc15/config" --use-existing-config
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0-alpha.2 image ...
   Pulling image openshift/origin:v1.5.0-alpha.2
   Pulled 1/3 layers, 36% complete
   Pulled 1/3 layers, 70% complete
   Pulled 2/3 layers, 83% complete
   Pulled 3/3 layers, 100% complete
   Extracting
   Image pull complete
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 10.0.75.2 as the server IP
-- Starting OpenShift container ...
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     Last 10 lines of "origin" container log:
     2017-01-15 02:43:53.883049 I | etcdserver: name = openshift.local
     2017-01-15 02:43:53.883125 I | etcdserver: data dir = /var/lib/origin/openshift.local.etcd
     2017-01-15 02:43:53.883146 I | etcdserver: member dir = /var/lib/origin/openshift.local.etcd/member
     2017-01-15 02:43:53.883163 I | etcdserver: heartbeat = 100ms
     2017-01-15 02:43:53.883174 I | etcdserver: election = 1000ms
     2017-01-15 02:43:53.883185 I | etcdserver: snapshot count = 10000
     2017-01-15 02:43:53.883201 I | etcdserver: advertise client URLs = https://10.0.75.2:4001
     2017-01-15 02:43:53.883265 I | etcdserver: initial advertise peer URLs = https://10.0.75.2:7001
     2017-01-15 02:43:53.883288 I | etcdserver: initial cluster = openshift.local=https://10.0.75.2:7001
     2017-01-15 02:43:53.897919 C | etcdserver: create wal error: rename /var/lib/origin/openshift.local.etcd/member/wal.tmp /var/lib/origin/openshift.local.etcd/member/wal: permission denied
Version

oc 1.5.0.alpha.2

Steps To Reproduce

Use oc cluster up with --host-data-dir option.

Current Result

Fails on startup.

Expected Result

Should startup.

@csrwng
Contributor

csrwng commented Jan 23, 2017

@GrahamDumpleton Please try using Windows-style paths for the directory arguments.

@csrwng
Contributor

csrwng commented Jan 23, 2017

So this looks like an issue with version v1.4.0 and newer of the origin images.
Using --version=v1.3.0 works ok for me. Needs more investigation.

@GrahamDumpleton
Author

Correct, works up to 1.3.3.

Under 1.4.0 you wouldn't be able to tell unless you first fix #12601, as you likely can't get far enough into startup to encounter it.

@GrahamDumpleton
Author

Using Windows-style paths doesn't help. I get the same error about the wal file.

@csrwng
Contributor

csrwng commented Jan 24, 2017

@GrahamDumpleton this looks like an issue with etcd and Windows file systems: etcd-io/etcd#5852.
A fix was delivered upstream, but I'm not sure that we've picked it up.

As a workaround for your wrapper for now, would it be possible to just use a path inside the vm for storing profiles (e.g. /var/run/profile/blah)?

@csrwng
Contributor

csrwng commented Jan 24, 2017

@smarterclayton ^^ v3.1.0 of etcd was just released. Any idea when we'd pick it up?

@csrwng
Contributor

csrwng commented Jan 24, 2017

nm, I see @smarterclayton merged the release version of etcd yesterday. I will test with the latest build of the images and see if this issue has been resolved.

@csrwng
Contributor

csrwng commented Jan 24, 2017

#12600

@csrwng
Contributor

csrwng commented Jan 24, 2017

Hmm, so the latest version of the code also failed. And I see why... etcd has a unix version of the rename code (https://github.com/coreos/etcd/blob/master/wal/wal_unix.go) and a windows version (https://github.com/coreos/etcd/blob/master/wal/wal_windows.go), which works great when you're running a windows version of the binary. But in this case we're running on linux while accessing a windows file system. I'm not sure that I have an easy solution to this.

@csrwng
Contributor

csrwng commented Jan 24, 2017

@GrahamDumpleton @jorgemoralespou another option for your wrapper on Windows would be to run with the unix file system and copy the data to/from the user's file system when starting/stopping the cluster.
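A rough sketch of what such a wrapper could do (the `datasync` container name, busybox image, and all paths here are assumptions for illustration, not part of any existing tool):

```shell
# Hypothetical wrapper sketch: live data stays on the Docker VM's linux
# filesystem; a backup is synced to/from the Windows host only around
# cluster start/stop. All names and paths below are illustrative.
PROFILE=${PROFILE:-default}
VM_DATA=/var/lib/profiles/$PROFILE/data
HOST_BACKUP=/c/Users/Graha/profiles/$PROFILE/backup

cluster_up() {
    # Restore any previous backup into the VM-local directory first.
    docker create --name=datasync -v /var:/var busybox
    docker cp "$HOST_BACKUP/." "datasync:$VM_DATA" || true
    docker rm datasync
    # Start against the VM-local path so etcd never touches the
    # Windows mount.
    oc cluster up --host-data-dir "$VM_DATA"
}

cluster_down() {
    oc cluster down
    # Copy the VM-local directory back out to the Windows host.
    docker create --name=datasync -v /var:/var busybox
    docker cp "datasync:$VM_DATA/." "$HOST_BACKUP"
    docker rm datasync
}
```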

@jorgemoralespou

@csrwng luckily the bash version does not support Windows, but in any case I don't think this is a good solution, just a poor man's hack around something you had not considered.
Why is it renaming the wal on startup?

@csrwng
Contributor

csrwng commented Jan 24, 2017

@jorgemoralespou I didn't dig deep enough into the etcd code to understand why it needs to do this initial rename. However, it looks like after the data is initialized, no other rename occurs. Another hack that occurred to me was to do an initial run to initialize the data directory inside the container, stop the container and copy the data directory out to the windows host directory. Then run with the --host-data-dir argument as usual.
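Spelled out, that hack might look something like this (the `origin` container name comes from the log above; the Windows host path is an illustrative assumption):

```shell
# 1. Initial run with the etcd data kept inside the container, so the
#    WAL rename happens on a linux filesystem.
oc cluster up

# 2. Stop the container (rather than 'oc cluster down', which would
#    remove it) and copy the initialized data out to the Windows host.
docker stop origin
docker cp origin:/var/lib/origin/openshift.local.etcd/. /c/Users/Graha/data

# 3. Run with --host-data-dir as usual; once the data directory is
#    initialized, etcd no longer performs the failing rename.
oc cluster up --host-data-dir /c/Users/Graha/data
```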

@jorgemoralespou

jorgemoralespou commented Jan 24, 2017 via email

@rokkanen

rokkanen commented Jan 24, 2017

With oc 1.5.0.alpha.2 on Windows 10 / Docker 1.13.0,
I've just read the comments above!
I can confirm this:
oc cluster up --host-data-dir=/c/mydata --version=v1.4.0
I get the same error ==> etcdserver: create wal error: rename ... permission denied

If I do this:
1/ drop the content in c:\mydata
2/ oc cluster up --host-data-dir=/c/mydata --version=v1.3.2
=> OK
3/ oc cluster down
4/ oc cluster up --host-data-dir=/c/mydata --version=v1.4.0
=> OK, no error anymore!

@GrahamDumpleton
Author

Problem also occurs in 1.4.1 after backport for path issue in #12601.

So Windows is also unsupported by 1.4.x if you want to use persistent directories, and there is still no choice but to use 1.3.x if you want to use Windows.

@csrwng
Contributor

csrwng commented Jan 30, 2017

@GrahamDumpleton maybe it's worth reconsidering the approach to persistent storage on Windows. Even if we get a fix from etcd so that the linux version of etcd is able to work on a windows ntfs file system, other apps that you run on OpenShift may not work on an ntfs mount (a db for example). What about running with directories on the Docker vm and only copying data to the Windows file system afterwards for backup?

@GrahamDumpleton
Author

I have a few concerns/questions about going down that path, based on ignorance of how things work more than anything else.

The first is: how do I specify a directory in the VM? Right now I use a path like /c/Users/... and, because that directory is mounted through my Docker setup, it uses the local disk. Is it enough to just specify a directory outside of that so it lands on the VM? Or does it have to be under a certain path prefix to land on the VM, otherwise landing in some container?

The next is: where on the VM would be an appropriate place to store the PV directories? For our use case, we would need the location to be qualified by the profile name so PVs for different profiles do not clash.

And finally, how can I delete the PV directories on the VM when I am destroying a profile? For docker-machine I know how to ssh into the VM host, but I don't know how to when using Docker for Windows.

As an addendum to the last question: for the builtin support you have for setting up a set of PVs, how is someone meant to clean up a PV on the VM when oc cluster down is run and the cluster was not persistent?

@csrwng
Contributor

csrwng commented Jan 31, 2017

The first is how do I specify a directory in the VM?

You need to specify a directory that is not a windows directory (which is pretty much anything that's not /c/blah). In non-windows Docker, it's easy to just mount the root fs of the vm into a container and explore. However, on Windows, it looks like mounting the root doesn't work, so using /var is a safe bet. How can you explore it? Simply run a container that mounts it and take a look at what's there:

docker run -ti --entrypoint=/bin/bash -v /var:/var openshift/origin:latest
$ ls -al /var

Is it enough to just specify a directory outside of that and it lands on the VM?

Pretty much yes, as long as it's not root ('/') as I mentioned above. I tried -v /foo:/foo and it simply creates that directory on the vm

where on the VM would be an appropriate place to store the PV directories?

I'd follow the pattern we already use by default, something under '/var/lib/', like: '/var/lib/profiles/[profile]/pv'

And finally, how can I delete the PV directories on the VM when I am destroying a profile?

You can just run a container that mounts the parent directory and remove that directory. For example:

docker run --entrypoint=/bin/bash --rm -v /var:/var openshift/origin:latest -c "rm -rf /var/lib/profiles/myprofile"

Same for copying it to the host drive... just create a container with a given name that mounts the directory, use docker cp to copy the contents, and then delete the container:

docker create --name=mycontainer -v /var:/var openshift/origin:latest
docker cp mycontainer:/var/lib/profiles/myprofile/pv c:\users\graham\profiles\myprofile
docker rm mycontainer

@pweil- pweil- added component/composition kind/bug Categorizes issue or PR as related to a bug. priority/P1 labels Feb 2, 2017
@csrwng
Contributor

csrwng commented Feb 3, 2017

This is an issue that was introduced in v3.0 of etcd when they changed how they initialize their data.
They already have an issue open for it, which they have labeled as an enhancement request. etcd-io/etcd#6984
In the meantime, we can document that you will need to use a linux vm directory for persistence.
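In practice, that documentation would amount to something like the following (the profile path is an illustrative assumption):

```shell
# Create a data directory on the Docker VM's own (linux) filesystem by
# mounting /var into a throwaway container; the exact path is assumed.
docker run --rm -v /var:/var busybox mkdir -p /var/lib/oc-profiles/default/data

# Start the cluster against the VM-local directory, so etcd's WAL
# rename happens on a linux filesystem instead of a Windows mount.
oc cluster up --host-data-dir /var/lib/oc-profiles/default/data
```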

@GrahamDumpleton
Author

@csrwng Your workaround highlights a further obscure bug.

If I do:

docker run --rm -v /var:/var busybox mkdir -p /var/lib/powershift/profiles/default/data

so that the directory exists in the VM, and then run:

oc cluster up --host-data-dir "/var/lib/powershift/profiles/default/data" --host-config-dir "/C/Users/Graha/PowerShift/profiles/default/config" --use-existing-config --forward-ports=false

so that the host data directory is using that directory from the VM, but the host config directory still comes from the Windows file system via /C/..., you get the error:

PS C:\Users\Graha> oc cluster up --host-data-dir "/var/lib/powershift/profiles/default/data" --host-config-dir "/C/Users/Graha/PowerShift/profiles/default/config" --use-existing-config --forward-ports=false
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 10.0.75.2 as the server IP
-- Starting OpenShift container ... ERROR: Error reading next tar header: API error (500): {"message":"mkdir /C: file exists"}
ERROR: Error extracting tar stream

   Creating initial OpenShift configuration
FAIL
   Error: could not create OpenShift configuration
   Caused By:
     Error: cannot start container df07360e260a083c91e8b06a34b45e158bab33a045a72e55995bd43aea3f420f
     Caused By:
       Error: API error (500): {"message":"mkdir /C: file exists"}

So it doesn't like mixing data and config directories that come from different file systems.

I was going to leave the config directory on the Windows host for now, but can't because of this. I'll have to make even more changes to have it inside the VM as well, since I can't just copy special scripts into the config directory on the Windows host and then run them from inside the container; I'll have to inject them into the container instead.

@jorgemoralespou

@GrahamDumpleton maybe it's time to start just pushing minishift. All these problems will not exist there, and there's little push from our engineering, apart from Cesar, to get fixes into oc cluster.

@csrwng
Contributor

csrwng commented Feb 23, 2017

@GrahamDumpleton that seems like a bug in our code, will take a look

@LeoLanceb

@csrwng Hi, is there any progress on this? I'm trying to run openshift via "oc cluster up" on my Win10 laptop but having the same problem. I'm using oc v1.5.1+7b451fc. Luckily, the "--version=v1.3.2" workaround solves the wal copy permissions problem.

@GrahamDumpleton
Author

@LeoLanceb I would suggest you look at my wrapper for oc cluster up. It runs on Windows and takes care of this issue when wanting to have persistent profiles.

For installation of the wrapper see:

Some details of the commands around oc cluster up can be found at:

It could do with some more documentation, which I have in email somewhere, but once installed, use:

powershift cluster up

to start up the cluster and:

powershift cluster down

to stop it.

You can have multiple saved profiles, but can only run one at a time.

@csrwng
Contributor

csrwng commented Jun 27, 2017

I am closing this issue since we don't plan to work around the root etcd issue.

@csrwng csrwng closed this as completed Jun 27, 2017
@glennodickson

glennodickson commented Aug 17, 2017

I'm using:

VirtualBox 5.1.26
Kubernetes v1.5.2+43a9be4
openshift v1.5.0+031cbe4

Didn't work for me using --host-data-dir (and others) :

oc cluster up  --logging=true --metrics=true --docker-machine=openshift --use-existing-config=true --host-data-dir=/vm/data --host-config-dir=/vm/config --host-pv-dir=/vm/pv --host-volumes-dir=/vm/volumes

With output:

-- Checking OpenShift client ... OK
-- Checking Docker client ...
   Starting Docker machine 'openshift'
   Started Docker machine 'openshift'
-- Checking Docker version ...
   WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using docker-machine IP 192.168.99.100 as the host IP
   Using 192.168.99.100 as the server IP
-- Starting OpenShift container ...
   Starting OpenShift using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     Last 10 lines of "origin" container log:
     github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4202a1600, 0x42b94c0, 0x1f, 0xc4214d9f08, 0x2, 0x2)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16a
     github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.newBackend(0xc4209f84c0, 0x33, 0x5f5e100, 0x2710, 0xc4214d9fa8)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:106 +0x341
     github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.NewDefaultBackend(0xc4209f84c0, 0x33, 0x461e51, 0xc421471200)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:100 +0x4d
     github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer.func1(0xc4204bf640, 0xc4209f84c0, 0x33, 0xc421079a40)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:272 +0x39
     created by github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:274 +0x345

OpenShift writes to the /vm/... directories (also defined in VirtualBox) but won't start successfully.

@bparees or @GrahamDumpleton: just wondering if you have any ideas, as I've seen a couple of your conversations that touched on this problem?

Thanks.

@Sawon90

Sawon90 commented May 13, 2018

@glennodickson I faced the same problem with vagrant/virtualbox. I solved it by not using any shared directory (e.g. Windows file system / autosync folder). Just use a folder inside your virtual machine (I use my home directory: /home/vagrant/etcd-data).

9 participants