HTTP Error 500 + "Could not load roles" #558

Closed
euh2 opened this issue May 13, 2024 · 8 comments
euh2 commented May 13, 2024

I could use some pointers on how to debug this further. I'm trying to deploy this chart, but I'm stuck at "Internal Server Error". It seems to be connected to the storageusers driver being set to s3ng: if I set it to ocis and redeploy the chart from scratch, I can log in.
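
For reference, here is the toggle in question, excerpted from the full values.yaml further down:

services:
  storageusers:
    storageBackend:
      driver: s3ng   # login fails with 500; with "ocis" it works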

Now some logs, followed by my full config:

ingress-nginx

12.34.56.78 - - [13/May/2024:21:16:18 +0000] "GET /ocs/v1.php/cloud/user HTTP/2.0" 500 0 "https://ocis.domain.tld/web-oidc-callback?code=qgKy4tNlcX7EYY_n0NJAXT9IKvhzfYuI&scope=openid%20profile%20email&session_state=eeff02d8d4a7dbce9d755cd127eee89380992734c126ee12ca0ae2632fe6fe15.cXW_U1gVwcSYs4BQGfzw9V4IkK6YRbNh33OcWs-i838&state=5e0962bc8abd431cbd7a09b6ed7d3259" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36" 1258 0.073 [ocis-proxy-9200] [] 10.233.182.96:9200 0 0.074 500 6310a8cf-ecac-41ab-94bf-fb6e65cf7189

proxy-..-pod:

{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"00380f3e-52b6-4dc2-a8db-7d5cb2ab88da","traceid":"503282f7958ea7331a359732389d547e","remote-addr":"12.34.56.78","method":"GET","status":200,"path":"/ocs/v1.php/cloud/capabilities","duration":3.193941,"bytes":3105,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"f9c20b29dd32806513a4b7693748717e","traceid":"67c9e2f9d84664033b44c34b48e7fcb5","remote-addr":"xxx.xxx.63.230","method":"GET","status":200,"path":"/konnect/v1/userinfo","duration":37.805854,"bytes":265,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}
{"level":"error","service":"proxy","error":"{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}","time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/userroles/defaultrole.go:38","message":"Could not load roles"}
{"level":"error","service":"proxy","error":"{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}","time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/account_resolver.go:140","message":"Could not get user roles"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"6310a8cf-ecac-41ab-94bf-fb6e65cf7189","traceid":"0a15fd98788bd0f25562bf66bd5b62dd","remote-addr":"12.34.56.78","method":"GET","status":500,"path":"/ocs/v1.php/cloud/user","duration":70.088662,"bytes":0,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"64f0ace2a798d5b14ec4bfd177c2af50","traceid":"f3ba7e9868dedf663bcb42f7b76267f9","remote-addr":"12.34.56.78","method":"GET","status":200,"path":"/themes/owncloud/assets/logo.svg","duration":0.923335,"bytes":9896,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}

Here it looks like a Gateway error (502, "circuit breaker is open" from com.owncloud.api.settings), which results in "Could not load roles".

My values.yaml:

externalDomain: ocis.domain.tld
ingress:
  enabled: true
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "issuer"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # nginx.ingress.kubernetes.io/backend-protocol: HTTP
    # nginx.ingress.kubernetes.io/default-backend: "proxy"
    nginx.ingress.kubernetes.io/custom-http-errors: "404,415,431"
    nginx.ingress.kubernetes.io/client-body-buffer-size: 512k
    nginx.ingress.kubernetes.io/client-header-buffer-size: 512k
    nginx.ingress.kubernetes.io/http2-max-header-size: 512k
    nginx.ingress.kubernetes.io/large-client-header-buffers: 4 1m
    nginx.ingress.kubernetes.io/proxy_buffers: 4 256k
    nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
    nginx.ingress.kubernetes.io/proxy-busy-buffers-size: 256k
    nginx.ingress.kubernetes.io/proxy-read-timeout: "5000"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "5000"
    nginx.ingress.kubernetes.io/server-snippet: |
      client_header_buffer_size 512k;
      large_client_header_buffers 4 1m;
  tls:
    - hosts:
        - ocis.domain.tld
      secretName: ocis-tls
services:
  idm:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
  nats:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
  search:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
  storagesystem:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
      accessModes:
        - ReadWriteMany
  storageusers:
    storageBackend:
      # -- Configures the storage driver. Possible values are "ocis" and "s3ng".
      # The oCIS driver stores all data in the persistent volume if persistence is enabled.
      # The S3NG driver stores all metadata in the persistent volume and uploads blobs to s3 if persistence is enabled.
      driver: s3ng
      driverConfig:
        s3ng:
          # -- Maximum number of concurrent go-routines. Higher values can potentially get work done faster but will also cause more load on the system.
          maxConcurrency: 100
          # -- Metadata backend to use for the S3NG storage driver. Valid values are: "messagepack", "xattrs".
          metadataBackend: messagepack

          # The S3NG driver needs an existing S3 bucket with following permissions:
          # {
          #     "Version": "2012-10-17",
          #     "Statement": [
          #         {
          #             "Sid": "ListBucket",
          #             "Effect": "Allow",
          #             "Action": [
          #                 "s3:ListBucket"
          #             ],
          #             "Resource": [
          #                 "arn:aws:s3:::bucket-name"
          #             ]
          #         },
          #         {
          #             "Sid": "ActionsInBucketContext",
          #             "Effect": "Allow",
          #             "Action": [
          #                 "s3:*Object",
          #                 "s3:*MultipartUpload",
          #                 "s3:ListMultipartUploadParts"
          #             ],
          #             "Resource": [
          #                 "arn:aws:s3:::bucket-name/*"
          #             ]
          #         }
          #     ]
          # }
          # -- S3 endpoint to use for the S3NG driver. Only used if driver is set to "s3ng".
          endpoint: https://minio.minio-hot.svc
          # -- S3 region to use for the S3NG driver. Only used if driver is set to "s3ng".
          region: default
          # -- S3 bucket to use for the S3NG driver. Only used if driver is set to "s3ng".
          bucket: ocis-storageusers
          # -- Disable sending content sha256 when copying objects to S3.
          disableContentSHA256: false
          putObject:
            # -- Disable multipart uploads when copying objects to S3
            disableMultipart: false
            # -- Send a Content-MD5 header when copying objects to S3.
            sendContentMD5: true
            # -- Always precreate parts when copying objects to S3.
            concurrentStreamParts: true
            # -- Number of concurrent uploads to use when copying objects to S3.
            numThreads: 4
            # -- Part size for concurrent uploads to S3.
            partSize: 0
    persistence:
      enabled: true
      size: 20Gi
      storageClassName: hot
      accessModes:
        - ReadWriteMany
  thumbnails:
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: hot
  web:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot

secretRefs:
  s3CredentialsSecretRef: hot-obst-keys

There are quite a few annotations. That's because I first got HTTP Error 431 (header too large) and managed to get past that. But now I'm stuck at 500 / 502, which, as mentioned, is probably connected to the storageusers driver setting.

One thing I wonder: is there an insecure setting for the s3ng driver? Maybe the 502 error comes from oCIS not being able to verify the MinIO certificate?
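
What I'd try next to narrow that down (just a sketch; the namespace and deployment name are placeholders for my actual release) is checking from inside the storageusers pod whether that certificate verifies at all:

kubectl exec -n ocis deployments/storageusers -- \
  sh -c 'curl -v https://minio.minio-hot.svc 2>&1 | grep -iE "certificate|tls|ssl"'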

Any hints?

wkloucek (Contributor) commented:

Can you please check if you see a line like the following on the settings pods when you try to log in?

settings-576f978c9f-ld5c8 settings {"level":"error","service":"ocis","error":"error: not found: create container: error: not found: f1bdd61a-da7c-49fc-8203-0558109d1b4f!f1bdd61a-da7c-49fc-8203-0558109d1b4f/settings","time":"2024-05-16T05:50:52Z","line":"github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70","message":"error initializing metadata client"}
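
Something like this should surface it (the namespace and deployment name are assumptions, adjust them to your install):

kubectl logs -n ocis deployments/settings --tail=200 | grep "error initializing metadata client"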

euh2 commented May 16, 2024

Yes, I do!

2024-05-16T07:13:01Z ERR error initializing metadata client error="internal error: error creating space" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis

euh2 commented May 16, 2024

And after a few minutes of doing nothing, these show up in the settings pod:

2024-05-16T07:18:22Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis
2024-05-16T07:18:22Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis
2024-05-16T07:18:22Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis
2024-05-16T07:19:07Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis

wkloucek (Contributor) commented:

In my case I reset the data on the storagesystem-data PVC. If you do that, you'd also need to reset the storagesystem cache. If you use an external NATS installation, e.g. one deployed like this:

- name: nats
  namespace: ocis-nats
  chart: nats/nats
  version: 1.1.11
  values:

you can purge that cache like this:

kubectl exec -n ocis-nats deployments/nats-box -- nats stream prune --all KV_storage-system

euh2 commented May 16, 2024

Using the deployment example with external NATS brought me a bit further: now I am able to log in. But I'm having trouble listing storage spaces:

2024-05-16T13:35:18Z ERR failed to list storage spaces error="error reading user index: stat /var/lib/ocis/storage/users/indexes/by-user-id/6b28f78c-ef67-4ff9-9e2f-a1c90cc48875.mpk: no such file or directory" filters=[{"Term":{"SpaceType":"project"},"type":4},{"Term":{"User":{"idp":"https://ocis.domain.tld","opaque_id":"6b28f78c-ef67-4ff9-9e2f-a1c90cc48875"}},"type":6}] line=github.com/cs3org/reva/[email protected]/internal/grpc/services/storageprovider/storageprovider.go:575 pkg=rgrpc service=storage-users status={"code":15,"message":"error listing spaces","trace":"866fa6adf2ff392bba234d7b5429d0de"} traceid=866fa6adf2ff392bba234d7b5429d0de

When I look at the directory, it is indeed empty:

~ $ ls -la /var/lib/ocis/storage/users/indexes/by-user-id/
total 8
drwx------    2 ocis-use ocis-gro      4096 May 16 13:32 .
drwx------    5 ocis-use ocis-gro      4096 May 16 13:32 ..
~ $

Inside Web I also see no files, probably because oCIS cannot list storage spaces.
[screenshot]

In my case I reset the data on the storagesystem-data PVC

What do you mean by that? I deploy oCIS from scratch, i.e. I destroy everything and deploy again with some changed values. I shouldn't need to reset. Or do I?


Also, when I use Monitor > Trace > S3 inside MinIO, I don't see anything coming from oCIS, although the connection details are correct. I can, for example, do a curl from the storageusers pod:

~ $ curl --verbose http://minio.minio-hot.svc.cluster.local
* Host minio.minio-hot.svc.cluster.local:80 was resolved.
* IPv6: (none)
* IPv4: 10.96.121.235
*   Trying 10.96.121.235:80...
* Connected to minio.minio-hot.svc.cluster.local (10.96.121.235) port 80
> GET / HTTP/1.1
> Host: minio.minio-hot.svc.cluster.local
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Accept-Ranges: bytes
< Content-Length: 254
< Content-Type: application/xml
< Server: MinIO
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Vary: Origin
< Vary: Accept-Encoding
< X-Amz-Id-2: 0a0da9bb6b6d3c38dfeab2cc7e8e6ed0fc8ccf27dcf29c14fd96819b66189b4d
< X-Amz-Request-Id: 17CFFCA1BA8B8480
< X-Content-Type-Options: nosniff
< X-Xss-Protection: 1; mode=block
< Date: Thu, 16 May 2024 13:55:54 GMT
<
<?xml version="1.0" encoding="UTF-8"?>
* Connection #0 to host minio.minio-hot.svc.cluster.local left intact
<Error><Code>AccessDenied</Code><Message>Access Denied.</Message><Resource>/</Resource><RequestId>17CFFCA1BA8B8480</RequestId><HostId>0a0da9bb6b6d3c38dfeab2cc7e8e6ed0fc8ccf27dc
~ $

The curl above shows up in the MinIO trace, but nothing else from oCIS is showing up. Hmmm!

euh2 commented May 16, 2024

When I combine the NATS example with the S3 example, it works! I will dig a bit deeper tomorrow.

wkloucek (Contributor) commented:

Switching from the built-in NATS to the external NATS removes all state (thus acting like the purge command I provided).

In my case I reset the data on the storagesystem-data PVC

What do you mean by that? I deploy oCIS from scratch, i.e. I destroy everything and deploy again with some changed values. I shouldn't need to reset. Or do I?

You shouldn't. But if you assume that you're deploying from scratch, you should verify that that's really the case. E.g. you should make sure that the PersistentVolumes are gone, and if you use Minikube with the hostpath provisioner, you actually need to clean up manually (minikube ssh 'sudo rm -rf /tmp/hostpath-provisioner/*').
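
For example (the namespace is an assumption), something like this should show whether anything from the previous install survived:

kubectl get pvc --all-namespaces | grep ocis
kubectl get pv | grep ocis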

euh2 commented May 17, 2024

I think I found the root cause of these issues in my deployment. I run Longhorn, and its RWX access mode uses NFS. My StorageClass was configured with these NFS options: nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5". I removed them and use the Longhorn defaults instead. That seems to work!
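
For reference, roughly what the StorageClass looks like now (a sketch; the non-NFS parameters are placeholders, the only relevant change is the dropped nfsOptions):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hot
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"          # placeholder
  staleReplicaTimeout: "2880"    # placeholder
  # nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"   # removed; Longhorn defaults apply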

Thanks @wkloucek for your assistance!
