[Bug]: file is locked, existing lock on file: exclusive on big files upload #1928
Comments
I think we have the same issue on multiple NC versions (NC 22.2.3), the latest being NC 24.0.2. Setup:
It is possible to see that the files are correctly stored on storage before this happens. From the nextcloud.log in debug mode and with
Other threads can be found with only one workaround, which is disabling file locking or raising the file lock TTL (the latter did not work for us). Some other threads I've found: Update:
didn't change anything... |
I've also noticed that the desktop client (Linux version) and |
Still exists on 24.0.6 |
The issue with the MOVE command that fails after 60 seconds has to be the ingress (in our case nginx).
Which seems to improve the situation, notably the speed of the uploads, but with the caveat that we could not upload files bigger than 10 GB so far with our config (so surely this config is still not optimal). However, with the MOVE command from the browser, the pod is busy and using all its resources for that, and Apache fails to serve status.php, which in our case resulted in a killed pod (failing readiness probes). As a workaround, we increased the timeouts in the probes. |
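For anyone hitting the same 60-second cutoff: the exact annotations used above aren't reproduced here, but assuming ingress-nginx, raising the relevant limits looks roughly like this (the values are only examples, not the ones used above):

```yaml
metadata:
  annotations:
    # Allow arbitrarily large request bodies (chunk PUTs).
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # Stream uploads to the pod instead of buffering them in the ingress.
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
    # Give the final MOVE (chunk assembly) time to finish.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```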
Hi, my installation was exactly like the one above, with NC 24.0.5; I was experiencing the problem and stumbled upon this issue. On 24.0.5 I tried to upload a 2.2 GB file and got the locking error mentioned above. The file was correctly uploaded though (checked: I downloaded it again and it matched the original). I upped the locking TTL to 12h (as suggested above), but the locking error after uploading still persists. As suggested, I upgraded my NC to 25.0.3 (Helm chart version 3.5.0), but the problem is still the same: upload a 2.2 GB file, lock error, but the file is successfully uploaded. |
Sounds like you are running into some kind of timeout. Can you make sure that all of the mentioned values in https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/big_file_upload_configuration.html are correctly applied? |
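The page linked above mostly covers PHP limits; as a rough illustration of the kind of values involved (the numbers are placeholders, not recommendations, and the values actually used in this thread are not reproduced here):

```ini
; php.ini / conf.d override - values are illustrative only
upload_max_filesize = 16G
post_max_size = 16G
max_input_time = 3600
max_execution_time = 3600
memory_limit = 512M
```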
These are my current values:
And also:
These values seem enough for a 2.2 GB upload, right? It does not seem like a timeout to me... the upload finishes successfully (but shows the locking error), and the file appears in the web interface afterwards. Indeed, if I download the file, the content matches the original byte for byte. |
Did you also adjust web server or proxy timeouts? |
My proxy is an nginx Ingress in Kubernetes, but the timeouts that can be configured there are for establishing connections, not for limiting the duration of those connections. Again, the upload finishes successfully (HTTP-wise), so there do not seem to be any timeout or maximum data size problems in the data path...? |
What we discovered on k8s with almost unlimited timeout settings:
|
This is exactly my failure scenario. |
Some more tests on this front: I installed a temporary VM with:
With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading. After doing this, I modified the installation above:
I retried the same 2.2GB upload, and again it worked flawlessly, with no locking errors shown after upload. |
And another new test: another temporary VM with:
With this setup, the same upload as indicated above (2.2GB) works flawlessly, and shows no lock errors after uploading. So:
Clues so far point at some specific problem with the Helm/Kubernetes deployment. One of my teammates suggested it may have something to do with Redis being deployed in cluster mode in Kubernetes? There are several pods for it, while in a VM or AIO install only a single Redis instance is deployed... I'll test and report back. |
New test: deployed in our K8S staging cluster, adding a Redis configuration. The error still happens with a single-node Redis deployment. |
So basically it looks like the docker image or helm chart might have an issue, right? |
Well, according to my tests (see previous comments), the Docker image is fine, it installs and runs with no errors on a regular instance (Rocky VM). Error seems to be specific to K8S deployment. |
All right. Shall I move over to the helm chart repo then? |
That's my current line of investigation, yes. BTW, @Doriangaensslen said that the problem happened also with the Docker AIO image, but I did not reproduce it there. |
Ah, you tried the AIO image instead of the "normal" image? That could still lead to the docker image being the culprit here as AIO is using its own image. Moving to the docker repo then for now. |
I did not know there were 2 docker images, now I know, thanks :-) |
New test: deployed the Nextcloud Helm chart with Redis disabled, and retried the 2.2 GB upload. An error about locking appears, though different from the previous one, and more terse: `"TEST20_3.bin" is locked`. As with the previous errors, if you refresh, the file has been correctly uploaded and appears in the web UI. So it seems we can discard Redis as the source of the problem; it is somewhere else. |
It's also what we experienced. We also tried the other caching methods, no luck there. |
We noticed restarts because the readiness probe failed while the file was being stitched together, because the pod was busy processing it. But even after resolving that, we still have this problem. |
I have the same issue "file locked: exclusive" using IONOS as host. |
I have intermittent issues on a "normal" (non-docker, non-k8s) setup, also using Redis. It didn't occur in 22.x, but since I installed 25 I get it every now and then, without a meaningful way to replicate it. It often happens when doing the auto-backup of photos via the iOS app, which is interesting as the files do not exist in the first place. I am tempted to disable file locking, as this is almost a single-user instance (well, the other user is a family member, and it's really only for auto-backups of photos). Still, it'd be better if this helped detect whatever the underlying issue is. Happy to help, as always. |
I am experiencing essentially the same problem, running Nextcloud 27.0.1 directly on an Apache instance, without any containers. I have kept the upload and post sizes at their low defaults of 50M, due to resource limitations on the server. Files are stored in S3. Appearing in the log file...
Similarly, shown as a message popup on the browser page...
|
Hi there, we have the same problem with the 27.0.1-apache image. We're using a K8S cluster hosted at IONOS. We use an nginx-ingress controller in front of the pod with the following annotations:
|
I think this is an issue in the Files client embedded in the server (the web UI). It appears the reason the client reported in this thread as being okay (the desktop client) is that it retries. The ones that fail (web, iOS, Android) don't seem to, from a cursory look, but I haven't dug deeply enough to say with certainty yet:
Open questions:
Assuming treating 423 as a soft failure is at least one part of the correct answer, changes/confirmation should be coordinated across all clients (other than Desktop). |
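As a minimal sketch of what "423 as a soft failure" could look like on the client side (a hypothetical helper, not taken from any existing client; it just retries the final MOVE with a growing delay):

```python
import time
import requests

def move_with_retry(session: requests.Session, source: str, destination: str,
                    attempts: int = 5, backoff: float = 2.0) -> requests.Response:
    """Issue a WebDAV MOVE, retrying while the server answers 423 (Locked)."""
    response = None
    for attempt in range(attempts):
        response = session.request("MOVE", source, headers={"Destination": destination})
        if response.status_code != 423:   # anything other than "Locked" is final
            response.raise_for_status()
            return response
        # Target is still locked (e.g. chunks are being assembled): wait and retry.
        time.sleep(backoff * (attempt + 1))
    response.raise_for_status()            # still locked after all attempts
    return response
```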
I am under the impression that iOS will retry up to a point (or at least used to), but recently I found a set of files that were supposed to have been automatically backed up but weren't and I had to force a full photo library upload. Might not be related at all, but since I still see 423 blocking things every now and then I thought I'd chime in. |
You may be on the right track wrt storage latency: the (most) problematic installations seem to be on Docker, Kubernetes, and using S3 storage. All of those have higher (or much higher!) latency than local disks. I hope you can get this sorted out soon. Good luck :-) |
Wrong. This is a proxy, web server and (most likely) configuration issue, albeit a subtle one. Manually removing file locking will result in messed-up data. tl;dr — increase web server timeouts and/or get rid of slow storage, at least for caching.

There is a lot to unpack. There's chunked file upload v2: it's a way for NC to upload large files. By default, NC uses a reasonable chunk size (~10M). Even in the worst case, with buffering enabled in the web server (e.g. nginx) AND buffering for FastCGI, a chunk will first end up in PHP's temporary location. Once all the pieces are in place, the client issues the MOVE request that tells the server to assemble the chunks into the final file.
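A rough sketch of that flow, assuming the documented chunked-upload endpoints (server URL, credentials, chunk size and chunk naming are placeholders; real clients send additional headers):

```python
import uuid
import requests

SERVER = "https://cloud.example.com"            # placeholder
USER, APP_PASSWORD = "alice", "app-password"    # placeholder credentials
CHUNK_SIZE = 10 * 1024 * 1024                   # ~10 MiB, the default mentioned above

def chunked_upload(local_path: str, remote_path: str) -> None:
    auth = (USER, APP_PASSWORD)
    upload_url = f"{SERVER}/remote.php/dav/uploads/{USER}/upload-{uuid.uuid4().hex}"
    destination = f"{SERVER}/remote.php/dav/files/{USER}/{remote_path}"

    # 1. Create a temporary upload collection for the chunks.
    requests.request("MKCOL", upload_url, auth=auth).raise_for_status()

    # 2. PUT the file piece by piece.
    index = 0
    with open(local_path, "rb") as fh:
        while chunk := fh.read(CHUNK_SIZE):
            index += 1
            requests.put(f"{upload_url}/{index:05d}", data=chunk, auth=auth).raise_for_status()

    # 3. The MOVE asks the server to assemble the chunks into the final file;
    #    this is the request that runs into timeouts and 423 in this thread.
    requests.request("MOVE", f"{upload_url}/.file",
                     headers={"Destination": destination}, auth=auth).raise_for_status()
```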
So where and why can this process be broken? First and foremost, all that server-side byte shuffling after the MOVE takes time, and it has to finish within the web server and PHP time limits.

I reiterate: overall, this seemingly fragile process works as expected, and locking works as expected. Modern storage can easily deliver 300 MB/s sequential I/O, so assembling, say, a 4G file will fit nicely even in the default humble 30s timeout. If the NC data folder lives on a medium with scarce IOPS (NFS, Longhorn, whatever), then even trivial operations that would normally stay within the boundaries of a single disk become network calls, so moving chunks back and forth will turn uploads of anything larger than 1G into a PITA. To fix this problem and make efficient use of all available bandwidth, one needs to raise those timeouts and keep the data directory (or at least the upload/temp area) on fast local storage; see the sketch after this comment.
With all these tweaks NC can easily saturate a 1 Gbps link, even on modest hardware. It just needs to be configured properly. Unfortunately, the default configuration isn't the best. P.S. The mobile client defaults to small chunks with no way to configure them. That's why video uploads on 300 Mbps LTE are painfully slow. |
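If the ~10M default chunk size of the web client is a factor for a given setup, it can be raised server-side; a hedged example (the 100 MiB value is only illustrative):

```sh
# Larger chunks mean fewer requests and less shuffling during assembly.
occ config:app:set files max_chunk_size --value 104857600
```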
Thanks for the thoughtful and detailed response, I really appreciate your time and effort! Just wanted to say that I saw it on a single very small file (a few-hundred-kilobyte sfx for a game I'm working on) out of a folder of much bigger multi-GB files. I copy-pasted the file elsewhere, simply deleted the file, then added it again, and it worked first time, so I'm not convinced it's an IOPS problem; I think it's a bug of some description. I've been running Nextcloud a few years now and never encountered this issue, even uploading much larger files (like backups/snapshots!). So whilst I agree with a lot of what you're saying, I don't think it's necessarily applicable to every instance of people seeing file locking. Nothing in the log either that shines any light on it. |
Thank you for all the helpful hints. However... I come bearing bad news. I have NFS-backed storage, yes, it's not the fastest. However: no timeouts in the log, only 405 and 423 (Locked), btw. |
I have this issue on a fixed set of rather small files (music files, so 20-100MB), while I had no issues uploading several multi-gigabyte files. I'm running the 28.0.2-apache image in k8s behind ingress-nginx (deployed with nextcloud/helm) with user files on ceph-rgw via S3. I've significantly increased the timeouts on the ingress, but that did not change anything. |
Also: when I copy one of the affected files, the copy is uploaded without issues. |
Deleting the /chunking-v2 keys in redis gets rid of this shared lock issue (which seems odd), but the desktop client still tries to upload all affected files (~200). During that sync, new /chunking-v2 keys show up which will lead to failed uploads due to a shared lock once again, however not as many. |
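For anyone wanting to try the same workaround, a rough sketch with redis-cli (the key pattern is a guess based on the comment above; list first and double-check before deleting anything):

```sh
# Inspect the chunking-v2 related keys first ...
redis-cli --scan --pattern '*chunking-v2*'
# ... then remove them once you are sure they are the right ones.
redis-cli --scan --pattern '*chunking-v2*' | xargs -r redis-cli del
```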
I am experiencing the same issue. I did some tests and I can reliably reproduce this with any file that is larger than exactly 10 MB using the web UI or the Linux client. Up to 10 MB it's quite fast, but larger files upload very slowly, show HTTP 500 responses, and eventually lead to the lock error. I also observe that my instance is often very slow even when listing the few files that I have stored. I therefore conclude that @Orhideous's summary is correct in my case. I will get in touch with IONOS support. |
@henrythasler Any solution from IONOS? I think I have the same issue, but only with the Windows client; Linux works fine. |
@DaDirnbocher No, not yet. I talked to a very helpful and technically skilled support guy last week, explained the problem and provided some additional info/logs. I received some feedback yesterday that the "performance of my Nextcloud was increased" and that I should try again. Unfortunately, not much had changed; the UI was a bit less sluggish but the error is still there. For what it's worth for anyone else, here is a log entry that shows up with the error: lock.json I should mention that this error only occurs when overwriting an already existing file. Fresh uploads always work fine. |
I'm beginning to suspect that this issue here is somehow connected to nextcloud/server#6029 where people also report very bad server performance when incoming federated shares are not reachable (as is the case on my instance). |
Btw, a small update from me: I switched from my Ceph S3 to CephFS (everything else is the same) and now all issues are gone. To me it seems like the latency of the S3 backend might be too high for some reason. I don't think nextcloud/server#6029 is related. |
It seems the nextcloud-client has some influence on the multipart upload. As a workaround/hack I have set
Bug description
Getting error `"zeros_1gb.new.dd" is locked, existing lock on file: exclusive` when uploading large files (1 GB+) with the browser.

Setup:
- Nextcloud installation (`/var/www/html`) stored in an RWO volume; 10 pods use this volume on one node.
- `filelocking.debug` set to `true`.

Logs viewed through Kibana show (pod_name - log_message; notice the MOVE request; the second lock attempt is done by a different pod, if it matters):

From the browser the process looks like the following: a `"MOVE /remote.php"` request is made, which completes with a `423 (Locked)` response (I suspect this code making the request), and the file is left with a `.part` suffix.

When using the desktop client to sync the same file, the server-side logs display the same sequence, but the client (probably) does retry this `MOVE` request, which succeeds.

Steps to reproduce
Expected behavior
Even large files are uploaded successfully.

Solution (?): locking is retried on the server side when executing the `MOVE` request.

Installation method
Official Docker image
Operating system
Debian/Ubuntu
PHP engine version
PHP 8.0
Web server
Nginx
Database engine version
PostgreSQL
Is this bug present after an update or on a fresh install?
Fresh Nextcloud Server install
Are you using the Nextcloud Server Encryption module?
Encryption is Disabled
What user-backends are you using?
Configuration report
List of activated Apps
Nextcloud Signing status
No response
Nextcloud Logs
No response
Additional info
No response