Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration of further functionality #1

Open
ml31415 opened this issue Jul 22, 2022 · 6 comments
Open

Integration of further functionality #1

ml31415 opened this issue Jul 22, 2022 · 6 comments

Comments

@ml31415
Copy link

ml31415 commented Jul 22, 2022

I recently had all sorts of issues converting a py2 db with 50GB of compressed grown data, including some unreadable pickles, compression errors and so on, into something readable for py3. In the end I had do patch quite some parts in zodbupdate and other tools in order to get the job done. The result is in the link below. I don't think it would be worth being published as a separate project, but it might be worth being integrated in your zodbtools. What do you think?

https://gist.github.com/ml31415/b4d00251e76afe35358070564864287a

@navytux
Copy link
Member

navytux commented Aug 1, 2022

Good day, Michael.

Thanks for sharing, this should be a useful functionality when one needs to migrate / verify such a database.

I had a quick look at the code and, as I understand, it provides the following functionality:

  1. FTFileStorage that amends std FileStorage to:
    a) ignore format-level problems, such as non-zero version length, and
    b) ignore unloadable objects (treating them as not having references) when doing GC.

  2. an FT storage iterator that skips objects that fail to load (do not exist, zlib error, IO error, whatever ...)

  3. command to verify object records for being loadable/unpicklable ok.

  4. command to fix unloadable/non-unpicklable object records by replacing them with PersistentBroken.

  5. command to rebuild an index for FileStorage.

  6. command to pack a database.

  7. FTUpdater that amends zodbupdate.Updater to create transactions with more informative notes, and that can replace objects, that fail to load, with PeristentBroken + associated command to convert a database via FTUpdater.


This is a mixture of functionality, that in my view, would be better integrated into appropriate places in parts. If that is ok with you, here is what I would suggest:

What do you think? Is that ok with you?
If yes - then we can proceed with creating corresponding merge-requests step-by-step.

Kirill

P.S. regarding ignoring non-zero version length I'm not so sure it is ok to ignore FileStorage format problems. Non-zero version length might represent either corruption of data.fs, or actually an old record with version information present. If it is corruption of data.fs, then we should be checking and repairing other format errors as well, e.g. broken transaction record headers, broken data record headers, inconsistencies in between those, etc. For the reference FileStorage/go does that automatically and also there is fs1 verify command to explicitly run this check - see e.g. https://lab.nexedi.com/kirr/neo/commit/5a26fb31 .

But if it is actually an old record with version information present - then just ignoring non-zero version length would lead to loading object data payload from wrong position (and thus getting "bad" data for the object). Why it is like this should be clear from the following link:

zopefoundation/ZODB@eef59eb35610#diff-25a69183c0b022dbd72bc0ff163682d82339b53c4ff0435b1cabddbf2145c413

So, I believe, if you actually hit those non-zero version lengths, it might be that many objects - that could be preserved - were turned to be Broken unnecessary.

I, of course, might be missing something, but that what comes to mind after checking quickly.

/cc @perrinjerome

@ml31415
Copy link
Author

ml31415 commented Aug 3, 2022

Hi @navytux I agree, that this functionality ideally should be split up and be integrated in the according tools. I just put all together in one file / program as there is quite some overlap for the tools in order to make the object iteration fault tolerant. I'm not familiar enough with ZODB, if the used iteration approach can be used for any kind of ZODB storage, so that a general fix: ... wrapper can be implemented. So far I used it for zlib and ordinary file storages.

About fixing objects: I was mainly dealing with broken zlib objects. So it was unrecoverable data garbage for me, when unpacking failed, and I wanted just anything marking an error. Providing a hook for what to do with broken objects would probably be a good idea. In the ZODB the versioning feature had never been used and was just corrupted. So you're right, it should be covered in a better way if that's a concern.

What do you think? Is that ok with you?

All good for me 👍

@navytux
Copy link
Member

navytux commented Aug 8, 2022

Hi @ml31415, thanks for feedback.

I understand that such kind of tools are usually one-shot, and that, after database is migrated, the interest to enhance the tool is usually no longer there. Still if maybe you want to proceed with upstreaming a bit I could give you some (imho) useful direction:

  1. first I would try to go with easier parts: things like zodb pack, zodb fs1 reindex, zodbupdate verbosity, and, maybe zodb verify,
  2. then, if "1" works ok, I would suggest to study ZODB.IStorage and related interfaces to see how/if FixStorage could be possible and done.

If that is ok, we could try to proceed further step-by-step in pull-requests. But, once again, I will understand that if the migration is already done and the tool is no longer used, there is not so much interest to continue working on it, and that you already tried to share your work via posting the gist and this issue.

Kirill

@ml31415
Copy link
Author

ml31415 commented Aug 8, 2022

Hi @navytux I'm not sure how any changes would make sense without having the fault-tolerant FixStorage wrapper first. IMHO that's the main improvement, that changes to the other tools could make use of. Otherwise you'd just duplicate code for every tool separately I guess.

I wasn't planning on writing PRs myself right now, as you assumed correctly, I'm done with my week of grief, to get that stuff working at all, and I still got enough to do with the project I wrote the converter for. Might come back to that later, though. But as I said, welcome to take any code snippets you like. :)

@navytux
Copy link
Member

navytux commented Aug 8, 2022

Michael, I see; thanks for feedback.

@perrinjerome
Copy link
Member

@ml31415 thanks for sharing this script, we will also soon be working on converting some ZODBs to python 3 and if we encounter problems I'll remember that there is this code.

@navytux I agree with everything you said and I also believe that these would be good additions to zodbtools that we can add if we find the need and/or the time.

PS: it's totally of topic but since we are discussing about wrapper storages and storage migration to python3, did you try to write a wrapper storage that would include zodbupdate logic to do the python3 conversion on the fly ? The idea would be to migrate to python3 without much downtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants