The problem here is not just extending the protocol, but also storing and finding all the file hashes in the DHT. This could easily multiply the number of Mainline DHT entries dozens of times over, which may force other clients to ignore and drop this kind of entry for efficiency's sake.
Hi there!
I would like to discuss only version 1 torrents, from the original specification, up to the latest version of libtorrent 1.
Please limit responses to those specifications, thank you.
Currently (as has always been the case), if we are downloading multiple torrents that contain the same file(s), we download multiple copies of those files. Why don't we take advantage of the overlap?
Given a scenario where we are downloading three torrents with the same file(s), the first thing we have to do is ask...
Currently, BitTorrent clients only use the file names and sizes to determine whether files are the same.
I have thought of some tests to ensure that they are, in fact, the same.
I have tried to find the answers to the following questions, but my google-fu is poo :(
I have found that since 2015 (maybe even earlier), torrents created by some client software
have files that are piece-aligned, but what was the situation prior to that? Was it possible that
the first piece of a file was a few bits or bytes short, and that those missing bits or bytes were
in the previous piece of the torrent? Or does the first piece always have the file start somewhere
in there?
That would make comparing files from different torrents pretty difficult :(
How does a client even know where a file starts in a non-piece-aligned torrent?
How (non-)trivial is it to take a 16k or 32k block from a file in one torrent, and do a
"sliding window" bit comparison against a file from another torrent?
Can we even find the first piece of a file if files aren't piece-aligned?
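On the "where does a file start" question, here is a minimal sketch of my understanding of the v1 layout: the files are treated as one long byte stream in the order they are listed, so a file's start is just the sum of the lengths of the files before it. The function name `file_spans` and its inputs are made up for illustration; it assumes the file lengths have already been decoded from the metainfo.

```python
def file_spans(file_lengths, piece_length):
    """Return (start_offset, first_piece, offset_in_piece) for each file.

    file_lengths: sizes in bytes, in the order the torrent lists the files
    (hypothetical, already-decoded input).
    """
    spans = []
    start = 0
    for length in file_lengths:
        # Which piece the file's first byte falls in, and how far into it.
        first_piece, offset_in_piece = divmod(start, piece_length)
        spans.append((start, first_piece, offset_in_piece))
        start += length
    return spans


# Three files with a 256 KiB piece size; a file is piece-aligned
# exactly when its offset_in_piece is 0.
for span in file_spans([100_000, 300_000, 50_000], 262_144):
    print(span)
```

If that is right, the offset is always a whole number of bytes and can be computed from the metainfo alone, without downloading anything.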
Test 1, Hash Comparison 1: Where files are piece-aligned, and all three torrents use the
same piece size.
Do we need to download any pieces for this one? If not, simply compare hashes,
otherwise...
Get the first one or two pieces of the file from the first two torrents, roll your own
hashes, and compare them.
Repeat for the first and third torrents, then for the second and third if the first and second fail.
Output: 0 - Files are not the same.
1 - Files are the same.
2 - Bit offset for torrent A not required.
3 - Bit offset for torrent B not required.
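A rough sketch of the "simply compare hashes" branch of Test 1, assuming both torrents use the same piece size and the file is piece-aligned in both. It relies on v1 torrents storing one 20-byte SHA-1 digest per piece; the function names and the tuple layout are my own, and `make_pieces` only exists to build demo data.

```python
import hashlib

HASH_LEN = 20  # one 20-byte SHA-1 digest per piece in a v1 torrent


def make_pieces(data, piece_length):
    """Build a v1-style concatenated hash string for a payload (demo helper)."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def whole_piece_hashes(pieces, piece_length, file_start, file_length):
    """Hashes of the pieces that lie entirely inside the file."""
    first = -(-file_start // piece_length)            # first piece starting at or after the file start
    end = (file_start + file_length) // piece_length  # one past the last fully contained piece
    return [pieces[i * HASH_LEN:(i + 1) * HASH_LEN] for i in range(first, end)]


def compare_by_hashes(a, b):
    """a and b are (pieces, piece_length, file_start, file_length) tuples.

    Returns True or False when the stored hashes settle it, None when they cannot.
    """
    if a[3] != b[3]:
        return False  # different sizes, so not the same file
    if a[1] != b[1] or a[2] % a[1] or b[2] % b[1]:
        return None   # different piece size, or not piece-aligned
    ha, hb = whole_piece_hashes(*a), whole_piece_hashes(*b)
    return ha == hb if ha else None


shared = bytes(range(48))
torrent_a = (make_pieces(shared + b"x" * 10, 16), 16, 0, 48)
torrent_b = (make_pieces(b"y" * 32 + shared, 16), 16, 32, 48)
print(compare_by_hashes(torrent_a, torrent_b))
```

Note that this only covers pieces wholly inside the file. A final piece that runs past the end of the file also hashes whatever follows it, so the tail would still need a download to check.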
Test 2, Hash Comparison 2: Where files are piece-aligned, and the piece sizes are
1MB, 512KB, and 32KB.
Download the first 1MB of the file from each torrent, create a hash for each, and compare
them. Adjust according to the largest piece size. Repeat until you are satisfied that the files
are, or are not the same.
Output: 0 - Files are not the same.
1 - Files are the same.
2 - Bit offset for torrent A not required.
3 - Bit offset for torrent B not required.
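Test 2 could look something like the sketch below. The `read_a` and `read_b` callables are hypothetical stand-ins for "give me `length` bytes of this file starting at `offset`, from this torrent"; how a real client would supply them is not something I know. The chunk size follows the largest piece size, as described above.

```python
import hashlib


def files_match(read_a, read_b, file_length, piece_lengths, max_chunks=4):
    """Compare two copies of a file chunk by chunk, hashing each chunk locally.

    Returns False at the first mismatch, True if every checked chunk agreed.
    max_chunks is the "until you are satisfied" knob (an arbitrary default).
    """
    chunk = max(piece_lengths)  # adjust to the largest piece size
    offset = 0
    for _ in range(max_chunks):
        if offset >= file_length:
            break
        n = min(chunk, file_length - offset)
        digest_a = hashlib.sha1(read_a(offset, n)).digest()
        digest_b = hashlib.sha1(read_b(offset, n)).digest()
        if digest_a != digest_b:
            return False
        offset += n
    return True


# Demo with in-memory "files" standing in for downloaded data.
copy_a = bytes(range(256)) * 64
copy_b = bytes(copy_a)
reader = lambda data: (lambda off, n: data[off:off + n])
print(files_match(reader(copy_a), reader(copy_b), len(copy_a), [4096, 2048, 128]))
```

A True result here only means that no difference was found in the chunks that were checked, not that the whole file has been verified.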
Test 3, "Sliding Windows" Bit comparison: Where files in one or more torrents are not
piece-aligned, and the piece sizes are A: 1MB, B: 512KB, and C: 32KB.
Download the first 2MB of the file from each torrent.
To get an exact bit offset, take the first 16KB of the file from torrent C, and do "sliding window"
bit comparisons along the 2MB of the file from torrent B.
When that fails, move the "sliding window" on torrent C across by 1 bit, and compare against
the 2MB of the same file.
Continue sliding the window on torrent C by one bit and testing until you get a match.
Remember the bit offset from each file, (you will need them later).
When you get a match, take the next 16KB from torrent C, and test against the next 16KB of
the same file. If that succeeds, you can pick up the pace a bit and test 32KB, then 64KB.
Repeat until you are satisfied whether they are or are not the same.
Output: 0 - Files are not the same.
1 - Files are the same.
2 - Torrent C bit offset (Where the file actually starts in the first piece).
3 - Torrent B bit offset (Where the file actually starts in the first piece).
Repeat the test, comparing torrents C and A (starting with the first 16KB from C and
comparing to A).
If torrent C's files are piece-aligned, then you can use the first piece for comparison, rather
than taking the first 16KB, and you don't need the bit offset of torrent C.
Basically, you want to use the smallest piece sized torrent, and if it is not piece-aligned, start
with the first half of the first piece of the file you want, and if it is piece-aligned, start with the
first piece.
If the tests for C+B, and C+A fail, test B+A using the same 16KB "sliding windows".
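Here is a minimal sketch of the sliding-window idea from Test 3, with one deliberate change: it searches byte by byte rather than bit by bit, on the assumption that v1 file lengths are counted in whole bytes and so a file can only start on a byte boundary. The names `locate` and `verify` are made up, and the inputs are plain byte strings standing in for the downloaded data.

```python
WINDOW = 16 * 1024  # the 16KB starting window from the description above


def locate(probe_src, haystack, window=WINDOW):
    """Byte offset in haystack where the first `window` bytes of probe_src occur, or -1."""
    probe = probe_src[:window]
    if len(probe) < window:
        return -1  # not enough data to form a full window
    return haystack.find(probe)


def verify(probe_src, haystack, offset, window=WINDOW, max_window=64 * 1024):
    """After a hit, keep comparing in growing steps: 16KB, 32KB, then 64KB."""
    pos, size = 0, window
    while pos < len(probe_src) and offset + pos < len(haystack):
        n = min(size, len(probe_src) - pos, len(haystack) - offset - pos)
        if probe_src[pos:pos + n] != haystack[offset + pos:offset + pos + n]:
            return False
        pos += n
        size = min(size * 2, max_window)
    return True


# Demo: the file sits 12345 bytes into the data taken from the other torrent.
file_data = b"FILE" + bytes(i % 251 for i in range(128 * 1024))
other = bytes(12345) + file_data
offset = locate(file_data, other)
print(offset, verify(file_data, other, offset))
```

`bytes.find` does the sliding for us, so a per-bit loop is only needed if files could really start mid-byte, which I don't think they can in a v1 torrent.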
Please note that this requires no more communication between clients than, "can I have..." "Please" and "Thank you". Comparisons are done locally, and results will be used locally.
I realise that my ignorance is on display for all to see, so if you could tell me where my thinking is wrong, and - most importantly - why, it would be greatly appreciated.
Take care,
&
Have fun!
Radar =8^)