initial implementation of 3to4 migration #39

whyrusleeping · 2016-06-25T21:05:02Z

No description provided.

kevina · 2016-06-28T04:44:32Z

@whyrusleeping I am not sure what stage this is at, but as far as transferring the blocks I can think of two important optimizations that may be worth implementing:

1. It should be possible to tell if a block key is not mangled by checking whether the key is the expected length. If it was mangled in some way, it would be shorter. This will avoid the cost of recomputing the hash.
2. Although this will break abstraction, the file representing a block could simply be renamed. When combined with (1), this could lead to a large (at least an order of magnitude) speed up.
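A minimal sketch of the length check in point (1), assuming intact block files are named with the lowercase hex encoding of a sha2-256 multihash (the bytes 0x12 0x20 followed by a 32-byte digest) plus a ".data" suffix. `looksIntact` and `intactHexLen` are hypothetical names, not part of the migration code:

```go
package main

import (
	"encoding/hex"
	"strings"
)

// Assumption: an intact name is hex(0x12, 0x20, 32-byte digest) + ".data",
// i.e. 68 hex characters before the suffix.
const intactHexLen = 2 * (2 + 32)

// looksIntact reports whether a flatfs file name has the full length and the
// "1220" multihash prefix expected of an unmangled key. Mangling only drops
// bytes, so mangled names come out shorter (or lose the prefix).
func looksIntact(name string) bool {
	base := strings.TrimSuffix(name, ".data")
	if base == name || len(base) != intactHexLen {
		return false // no .data suffix, or wrong length
	}
	if !strings.HasPrefix(base, "1220") {
		return false // not a sha2-256 multihash prefix
	}
	_, err := hex.DecodeString(base)
	return err == nil
}
```

A name that passes says only that the key survived; it says nothing about whether the block data itself is intact.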

I could probably implement these in a day if you want and you think they are worth implementing. Let me know.

whyrusleeping · 2016-06-28T05:51:23Z

> It should be possible to tell if a block key is not mangled by checking whether the key is the expected length. If it was mangled in some way, it would be shorter. This will avoid the cost of recomputing the hash.

I'm not worried about that. I don't really mind the cost of re-hashing all the blocks, as it may help catch any potential corruption issues that might have occurred.

> Although this will break abstraction, the file representing a block could simply be renamed. When combined with (1), this could lead to a large (at least an order of magnitude) speed up.

I considered that. The issue with breaking the abstractions is that we have a lot of edge casing in the code (especially related to renaming files across different platforms and filesystem boundaries).

kevina · 2016-06-28T06:31:14Z

>> Although this will break abstraction, the file representing a block could simply be renamed. When combined with (1), this could lead to a large (at least an order of magnitude) speed up.

> I considered that. The issue with breaking the abstractions is that we have a lot of edge casing in the code (especially related to renaming files across different platforms and filesystem boundaries).

Maybe we can provide a '--fast' option that could abort if an edge case is detected? In any case this is something that could be done a bit later if the need arises. For now I would agree it is more important to get something simple that works.

As a side note (and no need to respond), avoiding the renaming is also one of the reasons I originally wanted to preserve the hex encoding in the flatfs file names.

whyrusleeping · 2016-06-29T05:11:24Z

@Kubuxu the tests are failing right now because ipfs init with 0.4.3-dev is printing 'Error: EOF' for some reason. I haven't looked too deeply, but I'm tired, and maybe you can figure it out while I'm sleeping.

Kubuxu · 2016-06-29T11:48:26Z

@whyrusleeping I am away for the bigger part of today.

whyrusleeping · 2016-06-29T15:55:33Z

@Kubuxu no worries then

kevina · 2016-06-29T21:27:20Z

@whyrusleeping for completeness, I imagine we should make sure that a few of the datastore entries migrated in the test are affected by the key.Clean bug. Apologies if this was already done.

This might be more important if we decide to implement an optimized migration path as I outlined above.

whyrusleeping · 2016-06-29T21:28:32Z

@kevina good point. One easy way to do that would be to manually add one we know to have that bug. I'll try and find a small file that has the issue.
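One way to confirm that a candidate key really exhibits the bug is to check whether path cleaning would change it. A sketch, assuming the mangling comes from passing the raw binary key, prefixed with "/", through Go's `path.Clean`; `cleanAltersKey` is a hypothetical helper:

```go
package main

import "path"

// cleanAltersKey reports whether cleaning the raw binary key as a path
// (with a leading slash) changes it. Keys containing a trailing 0x2f ('/'),
// repeated 0x2f bytes, or "." / ".." path elements are rewritten, which is
// the class of keys the key.Clean bug mangles.
func cleanAltersKey(raw []byte) bool {
	p := "/" + string(raw)
	return path.Clean(p) != p
}
```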

kevina · 2016-06-29T21:40:41Z

@whyrusleeping here are a few from ipfs/kubo#994, thanks to @Cleric-K's comment:

- Slash at the end. Test with: $ echo 10 | ipfs add
- Two or more slashes (converted to a single slash). Test with: $ echo 00000259 | ipfs add
- /../ removes the part before that token. Test with: $ echo 0243397916 | ipfs add. This creates the file
  d83220a26feced8c55f3ede2819a7f0a84ce.data
  instead of
  12200bf25b81d5a2ab229b652f2e2e2fd83220a26feced8c55f3ede2819a7f0a84ce.data
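The /../ case can be reproduced directly. A sketch, assuming the datastore key is built by prefixing the raw key with "/" and calling Go's `path.Clean`, and that flatfs names the file after the hex encoding of the cleaned key without its leading slash; `mangledFileName` is hypothetical:

```go
package main

import (
	"encoding/hex"
	"path"
	"strings"
)

// mangledFileName reproduces the on-disk name of a key hit by the
// key.Clean bug: decode the intact hex key, clean it as a path, then
// re-encode what is left (minus the leading "/") as hex + ".data".
func mangledFileName(hexKey string) (string, error) {
	raw, err := hex.DecodeString(hexKey)
	if err != nil {
		return "", err
	}
	cleaned := path.Clean("/" + string(raw))
	return hex.EncodeToString([]byte(strings.TrimPrefix(cleaned, "/"))) + ".data", nil
}
```

Fed the intact key above, it returns d83220a26feced8c55f3ede2819a7f0a84ce.data: the 12 bytes before 2f2e2e2f (the bytes of "/../") form one path element, and path.Clean removes it together with the "..".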

I imagine the test cases will dig up some more.

whyrusleeping · 2016-06-29T21:44:59Z

The strings ba, bbd and cdbd currently trigger the bug when added as files.

whyrusleeping · 2016-06-29T21:45:52Z

As well as aabdb, adbdc, bccac and dacab (I can keep going, but this should be plenty).

whyrusleeping · 2016-06-29T22:23:02Z

@kevina any other edge cases you think we should test?

kevina · 2016-06-29T22:31:54Z

@whyrusleeping, not off hand.

whyrusleeping · 2016-07-01T00:22:00Z

@kevina I changed the block rewriting to just rename when it can.

ghost · 2016-07-01T00:28:09Z

Any estimate of how long this will take on a certain 16 TB spinning-disk host?

whyrusleeping · 2016-07-01T00:29:05Z

@lgierth much less time now :P

With the current rename changes, it should probably take within 10x of the time an 'ipfs refs local' takes.

kevina · 2016-07-01T01:27:41Z

@whyrusleeping looks good

kevina · 2016-07-01T07:07:49Z

@whyrusleeping, one other thing I thought of: since this is something that could take a while, what would happen if the process were interrupted? Will it just continue where it left off when restarted?

If not, although I don't see how this could corrupt the repo, I could see it leaving the repo in a bad state that would require special care to fix.

whyrusleeping · 2016-07-01T16:22:57Z

@kevina if the process gets interrupted everything should be fine. We only transfer keys that match the format we're expecting, so any keys that have already been moved will be skipped.

kevina · 2016-07-01T16:58:20Z

@whyrusleeping I see now that this is what the test for the "1220" prefix is for.

Note that a key can be so badly mangled that it doesn't start with "1220" and will thus be skipped. For example, echo 0243397916 | ipfs add, from an earlier comment, would create the file "d83220a26feced8c55f3ede2819a7f0a84ce.data".

I would add a pass at the end to do two things:

1. Verify that all files in the datastore are now base32 encoded (try to decode the key) and the correct length, and report any errors.
2. Remove empty directories.

whyrusleeping · 2016-07-01T17:04:33Z

@kevina ah! Interesting. The binary key for that value contains the sequence /../

I'll add a test for that.

whyrusleeping · 2016-07-01T18:58:36Z

@kevina there we go, I added that as a test case. We no longer assume anything about the key. We will only skip a key if it matches the exact base32 format, or if it doesn't have a .data suffix.
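The skip rule fits in a small predicate, and it is also what makes an interrupted run safe to restart: names already in the target format are left alone. A sketch, assuming the target format is unpadded standard base32 of a 34-byte multihash plus ".data"; `needsMigration` and `b32` are hypothetical:

```go
package main

import (
	"encoding/base32"
	"strings"
)

// Assumed target format: unpadded standard base32 of a 34-byte multihash.
var b32 = base32.StdEncoding.WithPadding(base32.NoPadding)

// needsMigration applies the rule above: skip anything without a ".data"
// suffix and anything already in the exact base32 format. Every other name
// is treated as an old-format (possibly mangled) hex key.
func needsMigration(name string) bool {
	base := strings.TrimSuffix(name, ".data")
	if base == name {
		return false // not a block file
	}
	if len(base) == b32.EncodedLen(34) {
		if raw, err := b32.DecodeString(base); err == nil && len(raw) == 34 {
			return false // already migrated
		}
	}
	return true
}
```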

kevina · 2016-07-01T19:36:43Z

ipfs-3-to-4/migration/migration.go

+		}
+
+		cdir := filepath.Join(dir, c.Name())
+		blocks, err := ioutil.ReadDir(cdir)


nit: File.Readdirnames would be a better choice. ioutil.ReadDir will perform a stat on each directory entry and then sort them by name. Not sure if it really matters, so feel free to ignore.

There are only a few files in most dirs at a time, but using that instead wouldn't hurt in case of big repos.

@kevina I actually don't think it calls stat individually on each one. Take a look at the Go source in os/dir_unix.go.

@whyrusleeping the file you pointed me to only contains an implementation of readdirnames; readdir is implemented in os/file_unix.go, and it actually calls Readdirnames first and then calls lstat for each entry. I don't see how it could be implemented any other way, because of how the related system calls on Unix work.

Kubuxu · 2016-07-01T20:46:33Z

LGTM

whyrusleeping · 2016-07-01T20:59:28Z

alright, choo choo

initial implementation of 3to4 migration

0fa3ae4

whyrusleeping mentioned this pull request Jun 26, 2016

encode keys to datastore with base32 standard encoding ipfs/kubo#2903

Merged

4 tasks

whyrusleeping added 2 commits June 25, 2016 23:11

fix tests (no more 0.4.0-dev)

b5a024a

finish initial implementation of 3 to 4 migration

7a4bd1f

whyrusleeping added 3 commits June 28, 2016 09:56

add in deps

e25affb

fixes to make 0.4.3 install and migration work properly

2136df7

add new test for 3-to-4

4a4f038

whyrusleeping added 3 commits June 29, 2016 12:54

make tests for 3-to-4 not use docker (and work)

f15036d

add in pollEndpoint dep

72492a6

fix pollEndpoint builds

7b724a7

explicitly add buggy path clean refs

8885034

whyrusleeping added 2 commits June 29, 2016 15:35

cleanup

5d60b7c

optimize to rename instead of rewrite

ba93f55

handle more heavily mangled keys

da31d7c

clean up empty flatfs directories after migration

5f00c7f

kevina reviewed Jul 1, 2016

whyrusleeping merged commit fc46b58 into master Jul 1, 2016

whyrusleeping deleted the feat/3-to-4 branch July 1, 2016 20:59

kevina mentioned this pull request Oct 4, 2018

Add a Rename method? ipfs/go-datastore#100

Open
