I wanted to run the examples/v1/mnist_with_summaries code on Azure.
The TFJob was created successfully, but the pod stays in the Pending state. I believe there is an issue with the storage. After applying the .yaml files in tfevent-volume, the persistent volume is defined but not Bound, and the worker pod never starts.
$ k -n kubeflow get pv tfevent-volume
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
tfevent-volume   10Gi       RWX            Retain           Available           standard                22m
The PVC for tfevent-volume stays in the Pending state too:
$ k -n kubeflow get pvc
NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
...
tfevent-volume   Pending                                      default        19m
...
And there seem to be errors with the access mode:
$ k -n kubeflow describe pvc tfevent-volume
Name:          tfevent-volume
Namespace:     kubeflow
StorageClass:  default
Status:        Pending
Volume:
Labels:        app=tfjob
               type=local
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/azure-disk
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    mnist-tf-worker-0
Events:
  Type     Reason              Age                  From                         Message
  ----     ------              ----                 ----                         -------
  Warning  ProvisioningFailed  116s (x13 over 14m)  persistentvolume-controller  Failed to provision volume with StorageClass "default": invalid AccessModes [ReadWriteMany]: only AccessModes [ReadWriteOnce] are supported
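Reading the output above: the PV is in StorageClass "standard" while the PVC ended up in "default", so the claim does not bind to the pre-created volume. Instead, the controller tries to provision a new one through kubernetes.io/azure-disk, which rejects ReadWriteMany. A minimal sketch of one possible workaround follows. It assumes the cluster exposes an Azure Files backed StorageClass named "azurefile" (Azure Files supports ReadWriteMany). This is a hypothetical variant of tfevent-pvc.yaml, not the file from the repository, and it has not been verified against this example:

```yaml
# Hypothetical variant of tfevent-volume/tfevent-pvc.yaml (not the repository file).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfevent-volume
  namespace: kubeflow
  labels:
    app: tfjob
spec:
  # Assumption: a StorageClass named "azurefile" exists in the cluster.
  # Check the actual names with: kubectl get storageclass
  storageClassName: azurefile
  accessModes:
    - ReadWriteMany   # azure-disk only accepts ReadWriteOnce, per the event above
  resources:
    requests:
      storage: 10Gi   # matches the capacity of the existing tfevent-volume PV
```

With a dynamically provisioned claim like this, the manually created tfevent-volume PV would presumably no longer be needed. The alternative is to keep the PV and give the PVC the same storageClassName as the PV ("standard") so the two can bind.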
Here are the tfevent-volume/tfevent-pvc.yaml and tfevent-volume/tfevent-pv.yaml (same as in the repository):
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.