HTTP Error 500 + "Could not load roles" #558

Closed
euh2 opened this issue May 13, 2024 · 8 comments
euh2 commented May 13, 2024

I could use some pointers on how to debug this further. I'm trying to deploy this chart, but I'm stuck at "Internal Server Error". It seems to be connected to the storageusers driver being set to s3ng: if I set it to ocis and redeploy the chart from scratch, I can log in.
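
For reference, here is the toggle in question, excerpted from the full values.yaml further down:

services:
  storageusers:
    storageBackend:
      driver: s3ng   # login fails with 500; with "ocis" it works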

Now some logs, followed by my full config:

ingress-nginx

12.34.56.78 - - [13/May/2024:21:16:18 +0000] "GET /ocs/v1.php/cloud/user HTTP/2.0" 500 0 "https://ocis.domain.tld/web-oidc-callback?code=qgKy4tNlcX7EYY_n0NJAXT9IKvhzfYuI&scope=openid%20profile%20email&session_state=eeff02d8d4a7dbce9d755cd127eee89380992734c126ee12ca0ae2632fe6fe15.cXW_U1gVwcSYs4BQGfzw9V4IkK6YRbNh33OcWs-i838&state=5e0962bc8abd431cbd7a09b6ed7d3259" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36" 1258 0.073 [ocis-proxy-9200] [] 10.233.182.96:9200 0 0.074 500 6310a8cf-ecac-41ab-94bf-fb6e65cf7189

proxy-..-pod:

{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"00380f3e-52b6-4dc2-a8db-7d5cb2ab88da","traceid":"503282f7958ea7331a359732389d547e","remote-addr":"12.34.56.78","method":"GET","status":200,"path":"/ocs/v1.php/cloud/capabilities","duration":3.193941,"bytes":3105,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"f9c20b29dd32806513a4b7693748717e","traceid":"67c9e2f9d84664033b44c34b48e7fcb5","remote-addr":"xxx.xxx.63.230","method":"GET","status":200,"path":"/konnect/v1/userinfo","duration":37.805854,"bytes":265,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}
{"level":"error","service":"proxy","error":"{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}","time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/userroles/defaultrole.go:38","message":"Could not load roles"}
{"level":"error","service":"proxy","error":"{\"id\":\"com.owncloud.api.settings\",\"code\":502,\"detail\":\"circuit breaker is open\",\"status\":\"Bad Gateway\"}","time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/account_resolver.go:140","message":"Could not get user roles"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"6310a8cf-ecac-41ab-94bf-fb6e65cf7189","traceid":"0a15fd98788bd0f25562bf66bd5b62dd","remote-addr":"12.34.56.78","method":"GET","status":500,"path":"/ocs/v1.php/cloud/user","duration":70.088662,"bytes":0,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}
{"level":"info","service":"proxy","proto":"HTTP/1.1","request-id":"64f0ace2a798d5b14ec4bfd177c2af50","traceid":"f3ba7e9868dedf663bcb42f7b76267f9","remote-addr":"12.34.56.78","method":"GET","status":200,"path":"/themes/owncloud/assets/logo.svg","duration":0.923335,"bytes":9896,"time":"2024-05-13T21:16:18Z","line":"github.com/owncloud/ocis/v2/services/proxy/pkg/middleware/accesslog.go:34","message":"access-log"}

Here it looks like a Gateway error (502, "circuit breaker is open" from com.owncloud.api.settings), which results in "Could not load roles".

My values.yaml:

externalDomain: ocis.domain.tld
ingress:
  enabled: true
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "issuer"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # nginx.ingress.kubernetes.io/backend-protocol: HTTP
    # nginx.ingress.kubernetes.io/default-backend: "proxy"
    nginx.ingress.kubernetes.io/custom-http-errors: "404,415,431"
    nginx.ingress.kubernetes.io/client-body-buffer-size: 512k
    nginx.ingress.kubernetes.io/client-header-buffer-size: 512k
    nginx.ingress.kubernetes.io/http2-max-header-size: 512k
    nginx.ingress.kubernetes.io/large-client-header-buffers: 4 1m
    nginx.ingress.kubernetes.io/proxy_buffers: 4 256k
    nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
    nginx.ingress.kubernetes.io/proxy-busy-buffers-size: 256k
    nginx.ingress.kubernetes.io/proxy-read-timeout: "5000"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "5000"
    nginx.ingress.kubernetes.io/server-snippet: |
      client_header_buffer_size 512k;
      large_client_header_buffers 4 1m;
  tls:
    - hosts:
        - ocis.domain.tld
      secretName: ocis-tls
services:
  idm:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
  nats:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
  search:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
  storagesystem:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot
      accessModes:
        - ReadWriteMany
  storageusers:
    storageBackend:
      # -- Configures the storage driver. Possible values are "ocis" and "s3ng".
      # The oCIS driver stores all data in the persistent volume if persistence is enabled.
      # The S3NG driver stores all metadata in the persistent volume and uploads blobs to s3 if persistence is enabled.
      driver: s3ng
      driverConfig:
        s3ng:
          # -- Maximum number of concurrent go-routines. Higher values can potentially get work done faster but will also cause more load on the system.
          maxConcurrency: 100
          # -- Metadata backend to use for the S3NG storage driver. Valid values are: "messagepack", "xattrs".
          metadataBackend: messagepack

          # The S3NG driver needs an existing S3 bucket with following permissions:
          # {
          #     "Version": "2012-10-17",
          #     "Statement": [
          #         {
          #             "Sid": "ListBucket",
          #             "Effect": "Allow",
          #             "Action": [
          #                 "s3:ListBucket"
          #             ],
          #             "Resource": [
          #                 "arn:aws:s3:::bucket-name"
          #             ]
          #         },
          #         {
          #             "Sid": "ActionsInBucketContext",
          #             "Effect": "Allow",
          #             "Action": [
          #                 "s3:*Object",
          #                 "s3:*MultipartUpload",
          #                 "s3:ListMultipartUploadParts"
          #             ],
          #             "Resource": [
          #                 "arn:aws:s3:::bucket-name/*"
          #             ]
          #         }
          #     ]
          # }
          # -- S3 endpoint to use for the S3NG driver. Only used if driver is set to "s3ng".
          endpoint: https://minio.minio-hot.svc
          # -- S3 region to use for the S3NG driver. Only used if driver is set to "s3ng".
          region: default
          # -- S3 bucket to use for the S3NG driver. Only used if driver is set to "s3ng".
          bucket: ocis-storageusers
          # -- Disable sending content sha256 when copying objects to S3.
          disableContentSHA256: false
          putObject:
            # -- Disable multipart uploads when copying objects to S3
            disableMultipart: false
            # -- Send a Content-MD5 header when copying objects to S3.
            sendContentMD5: true
            # -- Always precreate parts when copying objects to S3.
            concurrentStreamParts: true
            # -- Number of concurrent uploads to use when copying objects to S3.
            numThreads: 4
            # -- Part size for concurrent uploads to S3.
            partSize: 0
    persistence:
      enabled: true
      size: 20Gi
      storageClassName: hot
      accessModes:
        - ReadWriteMany
  thumbnails:
    persistence:
      enabled: true
      size: 10Gi
      storageClassName: hot
  web:
    persistence:
      enabled: true
      size: 1Gi
      storageClassName: hot

secretRefs:
  s3CredentialsSecretRef: hot-obst-keys

There are quite a few annotations. That's because I first got HTTP Error 431 (header too large) and managed to get past that. But now I'm stuck at 500 / 502, which, as mentioned, is probably connected to the storageusers driver setting.

One thing I wonder: is there an insecure setting for the s3ng driver? Maybe the 502 error comes from oCIS not being able to verify the MinIO certificate?
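
What I'd try next to narrow that down (just a sketch; the namespace and deployment name are placeholders for my actual release) is checking from inside the storageusers pod whether that certificate verifies at all:

kubectl exec -n ocis deployments/storageusers -- \
  sh -c 'curl -v https://minio.minio-hot.svc 2>&1 | grep -iE "certificate|tls|ssl"'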

Any hints?

wkloucek (Contributor) commented:

Can you please check if you see a line like the following on the settings pods when you try to log in?

settings-576f978c9f-ld5c8 settings {"level":"error","service":"ocis","error":"error: not found: create container: error: not found: f1bdd61a-da7c-49fc-8203-0558109d1b4f!f1bdd61a-da7c-49fc-8203-0558109d1b4f/settings","time":"2024-05-16T05:50:52Z","line":"github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70","message":"error initializing metadata client"}
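
Something like this should surface it (the namespace and deployment name are assumptions, adjust them to your install):

kubectl logs -n ocis deployments/settings --tail=200 | grep "error initializing metadata client"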

euh2 commented May 16, 2024

Yes, I do!

2024-05-16T07:13:01Z ERR error initializing metadata client error="internal error: error creating space" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis

euh2 commented May 16, 2024

And after a few minutes of doing nothing, these show up in the settings pod:

2024-05-16T07:18:22Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis
2024-05-16T07:18:22Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis
2024-05-16T07:18:22Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis
2024-05-16T07:19:07Z ERR error initializing metadata client error="internal error: delete:msgpack : invalid code 0 decoding map" line=github.com/owncloud/ocis/v2/services/settings/pkg/store/metadata/store.go:70 service=ocis

wkloucek (Contributor) commented:

In my case I reset the data on the storagesystem-data PVC. If you do that, you'd also need to reset the storagesystem cache. If you use an external NATS installation, e.g. one deployed like this:

- name: nats
  namespace: ocis-nats
  chart: nats/nats
  version: 1.1.11
  values:

you can purge that cache like this:

kubectl exec -n ocis-nats deployments/nats-box -- nats stream prune --all KV_storage-system

euh2 commented May 16, 2024

Using the deployment example with external NATS brought me a bit further: now I am able to log in. But I'm having trouble listing storage spaces:

2024-05-16T13:35:18Z ERR failed to list storage spaces error="error reading user index: stat /var/lib/ocis/storage/users/indexes/by-user-id/6b28f78c-ef67-4ff9-9e2f-a1c90cc48875.mpk: no such file or directory" filters=[{"Term":{"SpaceType":"project"},"type":4},{"Term":{"User":{"idp":"https://ocis.domain.tld","opaque_id":"6b28f78c-ef67-4ff9-9e2f-a1c90cc48875"}},"type":6}] line=github.com/cs3org/reva/[email protected]/internal/grpc/services/storageprovider/storageprovider.go:575 pkg=rgrpc service=storage-users status={"code":15,"message":"error listing spaces","trace":"866fa6adf2ff392bba234d7b5429d0de"} traceid=866fa6adf2ff392bba234d7b5429d0de

When I look at the directory, it is indeed empty:

~ $ ls -la /var/lib/ocis/storage/users/indexes/by-user-id/
total 8
drwx------    2 ocis-use ocis-gro      4096 May 16 13:32 .
drwx------    5 ocis-use ocis-gro      4096 May 16 13:32 ..
~ $

Inside Web I also see no files, probably because oCIS cannot list storage spaces.
[screenshot]

In my case I reset the data on the storagesystem-data PVC

What do you mean by that? I deploy oCIS from scratch, i.e. I destroy everything and deploy again with some changed values. I shouldn't need to reset. Or do I?


Also, when I use Monitor > Trace > S3 inside MinIO, I don't see anything coming from oCIS, although the connection details are correct. I can, for example, do a curl from the storageusers pod:

~ $ curl --verbose http://minio.minio-hot.svc.cluster.local
* Host minio.minio-hot.svc.cluster.local:80 was resolved.
* IPv6: (none)
* IPv4: 10.96.121.235
*   Trying 10.96.121.235:80...
* Connected to minio.minio-hot.svc.cluster.local (10.96.121.235) port 80
> GET / HTTP/1.1
> Host: minio.minio-hot.svc.cluster.local
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< Accept-Ranges: bytes
< Content-Length: 254
< Content-Type: application/xml
< Server: MinIO
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Vary: Origin
< Vary: Accept-Encoding
< X-Amz-Id-2: 0a0da9bb6b6d3c38dfeab2cc7e8e6ed0fc8ccf27dcf29c14fd96819b66189b4d
< X-Amz-Request-Id: 17CFFCA1BA8B8480
< X-Content-Type-Options: nosniff
< X-Xss-Protection: 1; mode=block
< Date: Thu, 16 May 2024 13:55:54 GMT
<
<?xml version="1.0" encoding="UTF-8"?>
* Connection #0 to host minio.minio-hot.svc.cluster.local left intact
<Error><Code>AccessDenied</Code><Message>Access Denied.</Message><Resource>/</Resource><RequestId>17CFFCA1BA8B8480</RequestId><HostId>0a0da9bb6b6d3c38dfeab2cc7e8e6ed0fc8ccf27dc
~ $

The curl above shows up in the MinIO trace, but nothing else from oCIS is showing up. Hmmm!

euh2 commented May 16, 2024

When I combine the NATS example with the S3 example, it works! I will dig a bit deeper tomorrow.

wkloucek (Contributor) commented:

Switching from the built-in NATS to the external NATS removes all state (thus acting like the purge command I provided).

In my case I reset the data on the storagesystem-data PVC

What do you mean by that? I deploy oCIS from scratch, i.e. I destroy everything and deploy again with some changed values. I shouldn't need to reset. Or do I?

You shouldn't. But if you assume that you're deploying from scratch, you should verify that that's really the case. E.g. you should make sure that the PersistentVolumes are gone, and if you use Minikube with the hostpath provisioner, you actually need to clean up manually (minikube ssh 'sudo rm -rf /tmp/hostpath-provisioner/*').
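
For example (the namespace is an assumption), something like this should show whether anything from the previous install survived:

kubectl get pvc --all-namespaces | grep ocis
kubectl get pv | grep ocis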

euh2 commented May 17, 2024

I think I found the root cause of these issues in my deployment. I run Longhorn, and its RWX access mode uses NFS. My StorageClass was configured with these NFS options: nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5". I removed them and use the Longhorn defaults instead. That seems to work!
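
For reference, roughly what the StorageClass looks like now (a sketch; the non-NFS parameters are placeholders, the only relevant change is the dropped nfsOptions):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hot
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"          # placeholder
  staleReplicaTimeout: "2880"    # placeholder
  # nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"   # removed; Longhorn defaults apply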

Thanks @wkloucek for your assistance!
