oc cluster with oc 1.5.0.alpha.2 fails when persistent data directory used. #12602

Closed
GrahamDumpleton opened this issue Jan 22, 2017 · 27 comments
Labels
component/cluster-up kind/bug Categorizes issue or PR as related to a bug. priority/P2


@GrahamDumpleton

When using the --host-data-dir option with oc cluster up under oc 1.5.0.alpha.2, I get the error:

oc cluster up --host-data-dir "/C/Users/Graha/PowerShift/profiles/oc15/data" --host-config-dir "/C/Users/Graha/PowerShift/profiles/oc15/config" --use-existing-config
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0-alpha.2 image ...
   Pulling image openshift/origin:v1.5.0-alpha.2
   Pulled 1/3 layers, 36% complete
   Pulled 1/3 layers, 70% complete
   Pulled 2/3 layers, 83% complete
   Pulled 3/3 layers, 100% complete
   Extracting
   Image pull complete
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 10.0.75.2 as the server IP
-- Starting OpenShift container ...
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     Last 10 lines of "origin" container log:
     2017-01-15 02:43:53.883049 I | etcdserver: name = openshift.local
     2017-01-15 02:43:53.883125 I | etcdserver: data dir = /var/lib/origin/openshift.local.etcd
     2017-01-15 02:43:53.883146 I | etcdserver: member dir = /var/lib/origin/openshift.local.etcd/member
     2017-01-15 02:43:53.883163 I | etcdserver: heartbeat = 100ms
     2017-01-15 02:43:53.883174 I | etcdserver: election = 1000ms
     2017-01-15 02:43:53.883185 I | etcdserver: snapshot count = 10000
     2017-01-15 02:43:53.883201 I | etcdserver: advertise client URLs = https://10.0.75.2:4001
     2017-01-15 02:43:53.883265 I | etcdserver: initial advertise peer URLs = https://10.0.75.2:7001
     2017-01-15 02:43:53.883288 I | etcdserver: initial cluster = openshift.local=https://10.0.75.2:7001
     2017-01-15 02:43:53.897919 C | etcdserver: create wal error: rename /var/lib/origin/openshift.local.etcd/member/wal.tmp /var/lib/origin/openshift.local.etcd/member/wal: permission denied
Version

oc 1.5.0.alpha.2

Steps To Reproduce

Use oc cluster up with --host-data-dir option.

Current Result

Fails on startup.

Expected Result

Should startup.

@csrwng
Contributor

csrwng commented Jan 23, 2017

@GrahamDumpleton Please try using Windows-style paths for the directory arguments.

@csrwng
Contributor

csrwng commented Jan 23, 2017

So this looks like an issue with version v1.4.0 and newer of the origin images.
Using --version=v1.3.0 works ok for me. Needs more investigation.

@GrahamDumpleton
Author

Correct, works up to 1.3.3.

Under 1.4.0 you wouldn't be able to tell unless you first fix #12601, as you likely can't get far enough into startup to encounter it.

@GrahamDumpleton
Author

Using Windows-style paths doesn't help. I get the same error about the wal file.

@csrwng
Contributor

csrwng commented Jan 24, 2017

@GrahamDumpleton this looks like an issue with etcd and Windows file systems: etcd-io/etcd#5852.
A fix was delivered upstream, but I'm not sure that we've picked it up.

As a workaround for your wrapper for now, would it be possible to just use a path inside the vm for storing profiles (e.g. /var/run/profile/blah)?

@csrwng
Contributor

csrwng commented Jan 24, 2017

@smarterclayton ^^ v3.1.0 of etcd was just released. Any idea when we'd pick it up?

@csrwng
Contributor

csrwng commented Jan 24, 2017

nm, I see @smarterclayton merged the release version of etcd yesterday. I will test with the latest build of the images and see if this issue has been resolved.

@csrwng
Contributor

csrwng commented Jan 24, 2017

#12600

@csrwng
Contributor

csrwng commented Jan 24, 2017

Hmm, so the latest version of the code also failed. And I see why... etcd has a unix version of the rename code (https://github.com/coreos/etcd/blob/master/wal/wal_unix.go) and a windows version (https://github.com/coreos/etcd/blob/master/wal/wal_windows.go), which works great when you're running a windows version of the binary. But in this case we're running on linux while accessing a windows file system. I'm not sure that I have an easy solution to this.

@csrwng
Contributor

csrwng commented Jan 24, 2017

@GrahamDumpleton @jorgemoralespou another option for your wrapper on Windows would be to run with the unix file system and copy the data to/from the user's file system when starting/stopping the cluster.
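A rough sketch of what such a wrapper could do (the `datasync` container name, busybox image, and all paths here are assumptions for illustration, not part of any existing tool):

```shell
# Hypothetical wrapper sketch: live data stays on the Docker VM's linux
# filesystem; a backup is synced to/from the Windows host only around
# cluster start/stop. All names and paths below are illustrative.
PROFILE=${PROFILE:-default}
VM_DATA=/var/lib/profiles/$PROFILE/data
HOST_BACKUP=/c/Users/Graha/profiles/$PROFILE/backup

cluster_up() {
    # Restore any previous backup into the VM-local directory first.
    docker create --name=datasync -v /var:/var busybox
    docker cp "$HOST_BACKUP/." "datasync:$VM_DATA" || true
    docker rm datasync
    # Start against the VM-local path so etcd never touches the
    # Windows mount.
    oc cluster up --host-data-dir "$VM_DATA"
}

cluster_down() {
    oc cluster down
    # Copy the VM-local directory back out to the Windows host.
    docker create --name=datasync -v /var:/var busybox
    docker cp "datasync:$VM_DATA/." "$HOST_BACKUP"
    docker rm datasync
}
```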

@jorgemoralespou

@csrwng luckily the bash version does not support Windows, but in any case I don't think this is a good solution, just a poor man's hack around something you had not considered.
Why is it renaming the wal on startup?

@csrwng
Contributor

csrwng commented Jan 24, 2017

@jorgemoralespou I didn't dig deep enough into the etcd code to understand why it needs to do this initial rename. However, it looks like after the data is initialized, no other rename occurs. Another hack that occurred to me was to do an initial run to initialize the data directory inside the container, stop the container and copy the data directory out to the windows host directory. Then run with the --host-data-dir argument as usual.
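Spelled out, that hack might look something like this (the `origin` container name comes from the log above; the Windows host path is an illustrative assumption):

```shell
# 1. Initial run with the etcd data kept inside the container, so the
#    WAL rename happens on a linux filesystem.
oc cluster up

# 2. Stop the container (rather than 'oc cluster down', which would
#    remove it) and copy the initialized data out to the Windows host.
docker stop origin
docker cp origin:/var/lib/origin/openshift.local.etcd/. /c/Users/Graha/data

# 3. Run with --host-data-dir as usual; once the data directory is
#    initialized, etcd no longer performs the failing rename.
oc cluster up --host-data-dir /c/Users/Graha/data
```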

@jorgemoralespou

jorgemoralespou commented Jan 24, 2017 via email

@rokkanen

rokkanen commented Jan 24, 2017

With oc 1.5.0.alpha.2 on Windows 10 / Docker 1.13.0,
I've just read the comments above!
I can confirm this:
oc cluster up --host-data-dir=/c/mydata --version=v1.4.0
I get the same error ==> etcdserver: create wal error: rename ... permission denied

If I do this:
1/ drop the content in c:\mydata
2/ oc cluster up --host-data-dir=/c/mydata --version=v1.3.2
=> OK
3/ oc cluster down
4/ oc cluster up --host-data-dir=/c/mydata --version=v1.4.0
=> OK, no error anymore!

@GrahamDumpleton
Author

Problem also occurs in 1.4.1 after backport for path issue in #12601.

So Windows is also unsupported by 1.4.x if you want to use persistent directories, and there is still no choice but to use 1.3.x if you want to use Windows.

@csrwng
Contributor

csrwng commented Jan 30, 2017

@GrahamDumpleton maybe it's worth reconsidering the approach to persistent storage on Windows. Even if we get a fix from etcd so that the linux version of etcd is able to work on a windows ntfs file system, other apps that you run on OpenShift may not work on an ntfs mount (a db for example). What about running with directories on the Docker vm and only copying data to the Windows file system afterwards for backup?

@GrahamDumpleton
Author

I have a few concerns/questions about going down that path, based on ignorance of how things work more than anything else.

The first is: how do I specify a directory in the VM? Right now I use a path like /c/Users/... and, because that directory is mounted through my Docker setup, it uses the local disk. Is it enough to just specify a directory outside of that so it lands on the VM? Or does it have to be under a certain path prefix to land on the VM, otherwise landing in some container?

The next is: where on the VM would be an appropriate place to store the PV directories? For our use case, we would need the location to be qualified by the profile name so PVs for different profiles do not clash.

And finally, how can I delete the PV directories on the VM when I am destroying a profile? For docker-machine I know how to ssh into the VM host, but I don't know how to when using Docker for Windows.

As an addendum to the last question: for the builtin support you have for setting up a set of PVs, how is someone meant to clean up a PV on the VM when oc cluster down is run and the cluster was not persistent?

@csrwng
Contributor

csrwng commented Jan 31, 2017

The first is how do I specify a directory in the VM?

You need to specify a directory that is not a windows directory (which is pretty much anything that's not /c/blah). In non-windows Docker, it's easy to just mount the root fs of the vm into a container and explore. However, on Windows, it looks like mounting the root doesn't work, so using /var is a safe bet. How can you explore it? Simply run a container that mounts it and take a look at what's there:

docker run -ti --entrypoint=/bin/bash -v /var:/var openshift/origin:latest
$ ls -al /var

Is it enough to just specify a directory outside of that and it lands on the VM?

Pretty much yes, as long as it's not root ('/') as I mentioned above. I tried -v /foo:/foo and it simply creates that directory on the vm

where on the VM would be an appropriate place to store the PV directories?

I'd follow the pattern we already use by default, something under '/var/lib/', like: '/var/lib/profiles/[profile]/pv'

And finally, how can I delete the PV directories on the VM when I am destroying a profile?

You can just run a container that mounts the parent directory and remove that directory. For example:

docker run --entrypoint=/bin/bash --rm -v /var:/var openshift/origin:latest -c "rm -rf /var/lib/profiles/myprofile"

Same for copying it to the host drive... just create a container with a given name that mounts the directory, use docker cp to copy the contents, and then delete the container:

docker create --name=mycontainer -v /var:/var openshift/origin:latest
docker cp mycontainer:/var/lib/profiles/myprofile/pv c:\users\graham\profiles\myprofile
docker rm mycontainer

@pweil- pweil- added component/composition kind/bug Categorizes issue or PR as related to a bug. priority/P1 labels Feb 2, 2017
@csrwng
Contributor

csrwng commented Feb 3, 2017

This is an issue that was introduced in v3.0 of etcd when they changed how they initialize their data.
They already have an issue open for it, which they have labeled as an enhancement request. etcd-io/etcd#6984
In the meantime, we can document that you will need to use a linux vm directory for persistence.
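In practice, that documentation would amount to something like the following (the profile path is an illustrative assumption):

```shell
# Create a data directory on the Docker VM's own (linux) filesystem by
# mounting /var into a throwaway container; the exact path is assumed.
docker run --rm -v /var:/var busybox mkdir -p /var/lib/oc-profiles/default/data

# Start the cluster against the VM-local directory, so etcd's WAL
# rename happens on a linux filesystem instead of a Windows mount.
oc cluster up --host-data-dir /var/lib/oc-profiles/default/data
```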

@GrahamDumpleton
Author

@csrwng Your workaround highlights a further obscure bug.

If I do:

docker run --rm -v /var:/var busybox mkdir -p /var/lib/powershift/profiles/default/data

so that the directory exists in the VM, and then run:

oc cluster up --host-data-dir "/var/lib/powershift/profiles/default/data" --host-config-dir "/C/Users/Graha/PowerShift/profiles/default/config" --use-existing-config --forward-ports=false

so that the host data directory is using that directory from the VM, but the host config directory still comes from the Windows file system via /C/..., you get the error:

PS C:\Users\Graha> oc cluster up --host-data-dir "/var/lib/powershift/profiles/default/data" --host-config-dir "/C/Users/Graha/PowerShift/profiles/default/config" --use-existing-config --forward-ports=false
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 10.0.75.2 as the server IP
-- Starting OpenShift container ... ERROR: Error reading next tar header: API error (500): {"message":"mkdir /C: file exists"}
ERROR: Error extracting tar stream

   Creating initial OpenShift configuration
FAIL
   Error: could not create OpenShift configuration
   Caused By:
     Error: cannot start container df07360e260a083c91e8b06a34b45e158bab33a045a72e55995bd43aea3f420f
     Caused By:
       Error: API error (500): {"message":"mkdir /C: file exists"}

So it doesn't like mixing data and config directories that come from different file systems.

I was going to leave the config directory on the Windows host for now, but can't because of this. I'll have to make even more changes to have it inside the VM as well, since I can't just copy special scripts into the config directory on the Windows host and then run them from inside the container; I'll have to inject them into the container instead.

@jorgemoralespou

@GrahamDumpleton maybe it's time to start just pushing minishift. All these problems will not exist there, and there's little push from our engineering, apart from Cesar, to get fixes into oc cluster.

@csrwng
Contributor

csrwng commented Feb 23, 2017

@GrahamDumpleton that seems like a bug in our code, will take a look

@LeoLanceb

@csrwng Hi, is there any progress on this? I'm trying to run openshift via "oc cluster up" on my Win10 laptop but having the same problem. I'm using oc v1.5.1+7b451fc. Luckily, the "--version=v1.3.2" workaround solves the wal copy permissions problem.

@GrahamDumpleton
Author

@LeoLanceb I would suggest you look at my wrapper for oc cluster up. It runs on Windows and takes care of this issue when wanting to have persistent profiles.

For installation of the wrapper see:

Some details of the commands around oc cluster up can be found at:

It could do with some more documentation, which I have in email somewhere, but once installed, use:

powershift cluster up

to start up the cluster and:

powershift cluster down

to stop it.

You can have multiple saved profiles, but can only run one at a time.

@csrwng
Contributor

csrwng commented Jun 27, 2017

I am closing this issue since we don't plan to work around the root etcd issue.

@csrwng csrwng closed this as completed Jun 27, 2017
@glennodickson

glennodickson commented Aug 17, 2017

I'm using:

VirtualBox 5.1.26
Kubernetes v1.5.2+43a9be4
openshift v1.5.0+031cbe4

Didn't work for me using --host-data-dir (and others) :

oc cluster up  --logging=true --metrics=true --docker-machine=openshift --use-existing-config=true --host-data-dir=/vm/data --host-config-dir=/vm/config --host-pv-dir=/vm/pv --host-volumes-dir=/vm/volumes

With output:

-- Checking OpenShift client ... OK
-- Checking Docker client ...
   Starting Docker machine 'openshift'
   Started Docker machine 'openshift'
-- Checking Docker version ...
   WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using docker-machine IP 192.168.99.100 as the host IP
   Using 192.168.99.100 as the server IP
-- Starting OpenShift container ...
   Starting OpenShift using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     Last 10 lines of "origin" container log:
     github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4202a1600, 0x42b94c0, 0x1f, 0xc4214d9f08, 0x2, 0x2)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16a
     github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.newBackend(0xc4209f84c0, 0x33, 0x5f5e100, 0x2710, 0xc4214d9fa8)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:106 +0x341
     github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.NewDefaultBackend(0xc4209f84c0, 0x33, 0x461e51, 0xc421471200)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:100 +0x4d
     github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer.func1(0xc4204bf640, 0xc4209f84c0, 0x33, 0xc421079a40)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:272 +0x39
     created by github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:274 +0x345

OpenShift writes to the /vm/... directories (also defined in VirtualBox) but won't start successfully.

@bparees or @GrahamDumpleton: just wondering if you have any ideas, as I've seen a couple of your conversations that touched on this problem?

Thanks.

@Sawon90

Sawon90 commented May 13, 2018

@glennodickson I faced the same problem with vagrant/virtualbox. I solved it by not using any shared directory (e.g. Windows file system / autosync folder). Just use a folder inside your virtual machine (I use my home directory: /home/vagrant/etcd-data).

9 participants