[6.0.3] Files uploaded directly to SWIFT storage don't show up in oC #8633

Closed
ser72 opened this issue May 19, 2014 · 38 comments · Fixed by #10885

Comments

@ser72

ser72 commented May 19, 2014

Steps

  1. Configure an External Storage mount of type SWIFT
  2. Upload a file directly to the SWIFT storage, not through oC
  3. View the SWIFT directory in oC: the file from step 2 isn't there

Screenshots:
[Screenshot 1] SWIFT storage: the last file was uploaded directly to SWIFT. The other two were uploaded via oC.
[Screenshot 2] oC: notice the missing file.

@PVince81
Contributor

As discussed, this might be an issue with the mtime of the SWIFT root.
From what I remember, the root mtime is always the creation time of the bucket and it never updates itself.
We might need to find another way to detect changes there.

@PVince81
Contributor

CC @icewind1991

@icewind1991
Contributor

IIRC we decided not to support this use case for the SWIFT backends, since SWIFT doesn't handle nested directories itself and we therefore need to emulate them by storing additional metadata.

@PVince81
Contributor

I checked the other headers and haven't found yet how to detect whether ANY change has been done on the SWIFT server.

As for the directory structure, the only thing that isn't emulated is the propagation of mtime from a subdirectory to parent directories. This would be an additional issue.

@PVince81
Contributor

Looking through the docs I have the feeling that it's not possible to get a timestamp or date for the bucket itself (the root).

The alternative would be to do a full rescan, or a cron job, or something that would sync it.

As for the scanning, we might be able to use range requests on the object list. I saw in the API that it's possible to retrieve the list in a paginated manner. The scanner wouldn't need to make calls for every subdir but just take the whole list at once... still might not be good for performance and memory. Also might be a much bigger change.
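A minimal sketch of the paginated listing idea, assuming the semantics of the SWIFT container listing's `limit` and `marker` query parameters (return at most `limit` names that sort after `marker`). The server side is simulated by a hypothetical in-memory `list_page` helper so the loop runs on its own; the scanner walks the whole flat list page by page instead of making one call per subdirectory.

```python
from typing import List, Optional

# Hypothetical in-memory stand-in for a SWIFT container listing: mimics the
# `limit`/`marker` semantics by returning up to `limit` names sorted after `marker`.
def list_page(objects: List[str], marker: Optional[str], limit: int) -> List[str]:
    names = sorted(objects)
    if marker is not None:
        names = [n for n in names if n > marker]
    return names[:limit]

def list_all(objects: List[str], limit: int = 2) -> List[str]:
    """Fetch the whole flat listing page by page, using the last name as the next marker."""
    result: List[str] = []
    marker: Optional[str] = None
    while True:
        page = list_page(objects, marker, limit)
        result.extend(page)
        if len(page) < limit:
            break  # a short (or empty) page means the listing is exhausted
        marker = page[-1]
    return result
```

This sketch still collects the entire listing in memory, which is exactly the performance and memory concern mentioned in the comment; a real scanner could process each page as it arrives instead.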

I'll continue my research.

@PVince81
Contributor

I've asked on IRC #openstack-swift and was told that it isn't currently possible.
They suggested reloading the whole list and comparing the mtimes to the known ones, which is inconvenient.
And then, they said it would be cool if it was possible to sort results (the object list) by mtime, which isn't possible at the moment.
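The approach suggested on IRC, reloading the whole list and comparing it against the known mtimes, amounts to a snapshot diff. A minimal sketch, assuming each snapshot is a hypothetical dict mapping object name to last-modified timestamp, built from a full listing:

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: object name -> last-modified timestamp.
Snapshot = Dict[str, float]

def diff_snapshots(known: Snapshot, current: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, changed) object names between two full listings."""
    added = set(current) - set(known)
    removed = set(known) - set(current)
    # Objects present in both listings whose mtime differs were modified.
    changed = {name for name in set(known) & set(current) if known[name] != current[name]}
    return added, removed, changed
```

The cost is that every check needs the complete current listing, no matter how little actually changed.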

@PVince81
Contributor

It was suggested that I write a SWIFT middleware that would detect changes on the server (not sure how) and then send notifications.
If that were possible, after the initial scan we could rely on the notifications to update the file cache. But that seems like a lot of work and not at all a simple solution.

@PVince81
Contributor

One possible workaround for the mtime issue: provide a "root" field and let users put their files into a specific folder on the SWIFT server. We should be able to detect the mtime of that folder since it has an entry in the object store.
One prerequisite, however, is that whichever tool is used to upload files must also automatically update the mtime of the folders; only then can ownCloud detect changes.

@ser72
Author

ser72 commented May 22, 2014

@PVince81

We created a subfolder in the SWIFT storage in ownCloud.
Then we uploaded a file directly to the SWIFT storage (bypassing ownCloud) into that subfolder.
Going back into ownCloud, the file did not show up.

@PVince81
Copy link
Contributor

@ser72 ok, thanks for the info.
This needs more investigation. I guess it should be at least possible to make the detection work in a subfolder.

@PVince81
Contributor

I can't test this because the Rackspace UI doesn't accept folders, only files.
From the screenshot it looks like they are using Nectar, another provider.

If the folder case doesn't work, it is likely to be because whichever program or UI they use to upload a file, that program doesn't automatically update the folder's mtime, which makes it impossible to detect changes.

For example, if the object store contains these entries:

file_in_root.txt
subdir/
subdir/file_in_subdir.txt

When uploading a file into the subdir, it will simply add a new entry called "subdir/newfile_in_subdir.txt" and will probably not change the mtime of the "subdir/" entry, which our code checks to find whether there are changes.
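A toy model of the example above that makes the failure concrete. It assumes, as described, that the change check looks only at the mtime of the `subdir/` marker object; `store`, `upload` and `folder_has_updated` are hypothetical names for illustration, not ownCloud code.

```python
from typing import Dict

# Hypothetical flat object store: each key maps to its own last-modified time.
# "subdir/" is just another object (a directory marker); nothing links it to
# the keys that share its prefix.
store: Dict[str, int] = {
    "file_in_root.txt": 100,
    "subdir/": 100,
    "subdir/file_in_subdir.txt": 100,
}

def upload(store: Dict[str, int], key: str, now: int) -> None:
    """Upload directly to the store: only the uploaded key gets a new mtime."""
    store[key] = now

def folder_has_updated(store: Dict[str, int], folder: str, cached_mtime: int) -> bool:
    """Change check that looks only at the folder marker's own mtime."""
    return store[folder] > cached_mtime

cached = store["subdir/"]
upload(store, "subdir/newfile_in_subdir.txt", now=200)
print(folder_has_updated(store, "subdir/", cached))  # False: the new file goes unnoticed
```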

I have the feeling that detecting changes is close to impossible with SWIFT interfaces without resorting to either getting notifications from the object store itself (needs architectural additions / development of object store specific plugins) OR often doing a full rescan.

Doing a full rescan might be possible but the performance is likely to be very bad.
A partial rescan could be possible based on the currently accessed folder, but requires checking all files in subfolders. When accessing root, the whole store would be rescanned.
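A sketch of what a partial rescan of the currently accessed folder would have to cover, assuming a hypothetical in-memory listing keyed by object name. Because directories are only emulated, every object whose name starts with the folder prefix must be checked, at any depth, and the root prefix `""` matches the whole store. SWIFT's container listing also accepts a `prefix` query parameter, so the same filter could be applied server-side.

```python
from typing import Dict, List

def objects_under(listing: Dict[str, float], folder: str) -> List[str]:
    """Names a partial rescan of `folder` would need to check: every object whose
    key starts with the folder prefix, at any depth. The root ("") matches all."""
    prefix = folder if folder == "" or folder.endswith("/") else folder + "/"
    return sorted(name for name in listing if name.startswith(prefix))
```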

@PVince81
Contributor

@icewind1991 would it be possible to trigger a rescan specifically for a given storage using a cron job?

@icewind1991
Contributor

Not possible at the moment; it would be fairly easy to write an app for it.

@ser72
Author

ser72 commented May 27, 2014

@PVince81

Any updates on this issue?

@PVince81
Contributor

The only possibility I see for now is to poll the server for changes (bad performance), i.e. to trigger a rescan regularly.
But as @icewind1991 pointed out, that doesn't seem possible at the moment.
Currently, triggering a rescan will, I believe, rescan all storages, not only the SWIFT one.

I need support from @icewind1991 for the rescan/cron job part.

@PVince81
Contributor

@ser72 I guess you already tried running ./occ files:scan and the updates were still not there?
Maybe we can add an option to force a full rescan, so the admin can create a system cron job to run that command (as a quick fix).

@ser72
Author

ser72 commented May 30, 2014

@PVince81 Don't recall, off hand, if we did that. Just requested it be done and will let you know the results.

@ser72
Author

ser72 commented May 30, 2014

@PVince81 Did the file scan and the updates were still not there...

@PVince81
Contributor

Actually it should be ./occ files:scan --all.
I discovered today that if you don't pass any arguments, there is no output at all, and nothing happens either.

But based on what I found out today about etag propagation, there is only a minor chance that this works.

@ser72
Author

ser72 commented Jun 2, 2014

./occ files:scan --all made the files visible.

Need to cron that as a workaround.

@ser72
Author

ser72 commented Jun 10, 2014

@PVince81

Any updates on a permanent fix?

@PVince81
Contributor

No updates. SWIFT is the most tricky ext storage.

One idea, as suggested before by @icewind1991, is to make hasUpdated of the ext storage implementation always return true, so that the scanner believes the storage has always changed.
This is probably worse than having files:scan --all in a cron job, because it would trigger a full rescan on every access over the web UI.
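A sketch of why the always-true `hasUpdated` idea is expensive. This is a hypothetical Python analogue, not ownCloud's PHP storage API: `has_updated` stands in for `hasUpdated`, and `on_web_access` stands in for the scanner check that runs when a folder is opened in the web UI, with a counter in place of the actual rescan.

```python
class Storage:
    """Hypothetical stand-in for an external storage backend."""

    def __init__(self) -> None:
        self.full_scans = 0

    def has_updated(self, path: str, since: float) -> bool:
        # Normal behaviour: trust the (stale) mtime, so direct uploads are missed.
        return False

class AlwaysUpdatedStorage(Storage):
    def has_updated(self, path: str, since: float) -> bool:
        # The proposed hack: always claim that something changed.
        return True

def on_web_access(storage: Storage, path: str, last_scan: float) -> None:
    """Hypothetical scanner hook run whenever a folder is opened in the web UI."""
    if storage.has_updated(path, last_scan):
        storage.full_scans += 1  # counts the rescan that would be triggered here

s = AlwaysUpdatedStorage()
for _ in range(3):
    on_web_access(s, "/", last_scan=0.0)
print(s.full_scans)  # 3: one rescan per page view
```

A cron job running files:scan --all, by contrast, bounds the number of rescans by the schedule rather than by user traffic.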

@jacobgardiner

I've witnessed the same behaviour with S3 as the external storage; I'm assuming it's the same problem.

@craigpg craigpg added this to the ownCloud 7.0.1 milestone Jun 24, 2014
@PVince81 PVince81 mentioned this issue Jun 26, 2014
@butonic
Member

butonic commented Jun 27, 2014

After writing the objectstore and having had a look at the issue, I agree with @PVince81 that we cannot easily detect whether something has changed on the SWIFT/S3 side. They do not propagate mtime changes up the pseudo file hierarchy (because it's an object store ... duh) and likely never will. For SWIFT, the proper solution IMHO would be to implement a SWIFT middleware that notifies ownCloud of changes. I don't see that kind of solution for S3, though.

In the meantime, a cron job that scans and compares the mtimes of all files may be a viable solution. Depending on how long that takes, files will show up after some delay. It will, however, create considerable traffic and load on the DB.

@craigpg craigpg added this to the 2014-sprint-03-next milestone Sep 2, 2014
@MTRichards
Contributor

Add a release note for ownCloud 7, as this is true for SWIFT and S3 external storage: users can't bypass ownCloud in these situations, because it doesn't detect such changes automatically.

@MTRichards
Contributor

Follow on thought: the config.php setting in oC 7 does allow external storage to set a scanfile action when external storage is accessed. Does this solve this problem?

@PVince81
Copy link
Contributor

PVince81 commented Sep 5, 2014

I'm not aware of any such addition.
One workaround for some storages like SMB was to manually setup a cron job that would run "./occ files:scan --all".

@PVince81
Copy link
Contributor

PVince81 commented Sep 5, 2014

Ah, maybe you mean "filesystem_check_changes" set to 2 ?
I have never tried that option. Back when that issue was there the option was not available.
One drawback is probably that it will happen on ALL storages, not only the SWIFT one.

@icewind1991
Contributor

'filesystem_check_changes' => 2 will not solve this issue

@icewind1991
Contributor

#10885 on the other hand... :)

@ser72
Author

ser72 commented Sep 18, 2014

From the user who found this issue:
FYI: the alternative fix that was put in this month (as #10885) actually works and properly fixes the issue below.

That is to say, it all works fine on the top-level directory. As soon as you allow ownCloud to build a subdirectory structure within the secondary storage OpenStack Swift space, inconsistencies appear, with (just as an example) the OpenStack Swift dashboard not showing the contents of subdirectories if that content was placed there by ownCloud.

@ser72 ser72 reopened this Sep 18, 2014
@butonic
Member

butonic commented Oct 10, 2014

That is a known limitation of the current implementation. SWIFT and S3 only emulate directories. As a result, an mtime change on a file is not propagated up the directory tree, which we would need in order to have a cheap way (one HTTP call) to detect changes in the external storage. I see two solutions:

  1. always do a full scan of the external storage when it is accessed
  2. periodically do a full scan of the external storage in the background

Solution 1 is not viable because we have to iterate over all files in SWIFT/S3, which would take too long to do on demand.
Solution 2 still takes several HTTP calls to iterate over all objects in SWIFT/S3, but allows for eventual consistency, depending on the rescan frequency.

Unfortunately, solution 2 comes at a cost for S3 users, because they have to pay for the resulting traffic. But I still think it is our best bet. Also note that #11375 now makes all users reuse the same filecache for S3 mounts, which makes rescanning unnecessary when all access happens through ownCloud.
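A minimal sketch of solution 2 (periodic background scan), assuming a hypothetical `PeriodicRescan` job whose `tick` method is called regularly by a cron or background-job runner and which wraps whatever performs the full scan. The interval is the knob that trades freshness against listing traffic (and, for S3, cost).

```python
from typing import Callable, Optional

class PeriodicRescan:
    """Hypothetical background job: run a full scan at most once per `interval` seconds.
    Files uploaded directly to SWIFT/S3 appear roughly one interval (plus scan time)
    after upload, at the cost of listing every object on each run."""

    def __init__(self, full_scan: Callable[[], None], interval: float) -> None:
        self.full_scan = full_scan
        self.interval = interval
        self.last_run: Optional[float] = None

    def tick(self, now: float) -> bool:
        """Called by the job runner with the current time; returns True if a scan ran."""
        if self.last_run is not None and now - self.last_run < self.interval:
            return False  # too soon since the previous scan
        self.full_scan()
        self.last_run = now
        return True
```

Throttling inside the job, rather than relying only on the cron schedule, keeps the scan frequency bounded even if the runner fires more often than intended.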

@butonic
Member

butonic commented Oct 10, 2014

cc @PVince81 @icewind1991 for an opinion on the background scanning of external SWIFT/S3 storages

@PVince81
Contributor

At some point I asked the OpenStack guys on IRC and they said one should implement a SWIFT middleware that would detect changes and then notify OC about them. But I have no idea how such a middleware could be written... it is likely that the middleware itself would still need to scan the SWIFT storage to find out about changes, unless it's possible to have hooks.

So for now it seems the only solution is to do a full rescan periodically.

@butonic
Member

butonic commented Oct 20, 2014

@icewind1991 What about a scanner that scans all objects at once, instead of descending the tree. That would certainly improve scan performance on s3 and swift. It could then be executed as a background job or after a configurable timeout. I'll look into it tomorrow, but your input is very welcome!
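A sketch of the flat-scan idea, assuming a hypothetical `build_tree` step fed by one complete flat listing of object names (fetched page by page). Folders are inferred from the `/` separators in the names, so the directory structure is rebuilt without one request per folder, including folders that exist only implicitly, with no marker object.

```python
from typing import Dict, Iterable, Set

def build_tree(keys: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each folder path ("" is the root) to the paths of its direct children,
    built in a single pass over a flat list of object names."""
    tree: Dict[str, Set[str]] = {"": set()}
    for key in keys:
        is_dir_marker = key.endswith("/")  # e.g. "subdir/" emulates a folder
        parts = [p for p in key.split("/") if p]
        parent = ""
        for i, part in enumerate(parts):
            path = f"{parent}/{part}" if parent else part
            tree[parent].add(path)
            if i < len(parts) - 1 or is_dir_marker:
                # Intermediate components (and marker objects) are folders.
                tree.setdefault(path, set())
            parent = path
    return tree

keys = ["file_in_root.txt", "subdir/", "subdir/file_in_subdir.txt", "other/deep/x.txt"]
tree = build_tree(keys)
print(sorted(tree[""]))  # ['file_in_root.txt', 'other', 'subdir']
```

Per-object mtimes from the same listing could then be compared against the file cache to find what changed.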

@MorrisJobke
Contributor

Needs additional discussion in #11797

@DeepDiver1975 DeepDiver1975 modified the milestones: ownCloud 7 backlog, backlog Jan 8, 2015
@carlaschroder

added to release notes

@RobinMcCorkell
Member

Given that release notes have been updated, I'm marking this as a duplicate of #11797

@MorrisJobke MorrisJobke removed this from the backlog milestone Mar 20, 2015
@lock lock bot locked as resolved and limited conversation to collaborators Aug 12, 2019