Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Zeppelin - 683][WIP] BittorrentNotebookRepo #1231

Closed
wants to merge 2 commits into from

Conversation

onkarshedge
Copy link

What is this PR for?

To share notes and download via a magnet link. Trackerless, only via DHT.

What type of PR is it?

[Feature]

Todos

  • - Task

What is the Jira issue?

How should this be tested?

Outline the steps to test the PR here.

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update?
  • Is there breaking changes for older versions?
  • Does this needs documentation?

@onkarshedge
Copy link
Author

@bzz Design help needed. Connect all the pieces...

TorrentEngine - singleton enum . has methods to start,stop a session. download a torrent from a magnet link and seed a given input file. It also has a alertListener which receives alert updates and a TorrentEngineListener.

BittorrentNotebookRepo - has a torrentEngine Instance and a list of TorrentSocket connections, To receive messages and send to each websocket. It implements the TorrentEngineListener.

Should this be separate or included in NotebookServer OnMessage

TorrentSocket- has an BittorrentNotebookRepo Instance. How to pass an instance to it ? . It could be passed via WebSocketCreator.I read here but then BittorrentNotebookRepo would have to extend WebSocketServlet and it already extends VFSNotebookRepo.

@bzz
Copy link
Member

bzz commented Jul 28, 2016

Thank you for sharing the progress! Let me look into this and get back to you

@onkarshedge
Copy link
Author

@bzz ping.

@bzz
Copy link
Member

bzz commented Aug 1, 2016

Sorry, was quite a busy week.

Great progress so far, from the first glance and on the formal side of the contribution:

  1. CI is failing, could you please post here the reason of failure? You can find it by browsing the logs of TravisCI.

    If you are sure that the reason of failure is not your code - please search project JIRA and check if that is something already known. (Issues with label flaky-test would be good candidates to check in case CI fails on some test)

  2. Every new file that you add must have an Apache 2.0 licence header at the beginning, same as all others - please add it

  3. Every dependency you add pom.xml need to be verified to have Apache 2.0 compatible license

  4. Every dependency you add pom.xml AND it's transitive dependencies which you can see with mvn dependency:tree will be part of the final release convenience binary so it's licenses must be recorded under zeppelin-distribution/src/bin-license/LICENSE

Please address this feedback and I will look more into the design tommorow and let you know!

}

private static void loadLibrary() throws IOException {
String path = LIBTORRENT_OS_LIBRARY_PATH();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about checking this return value in case it's null, would it make sense?

Copy link
Member

@bzz bzz Aug 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onkarshedge please make sure you follow the project java style guide regarding static members

@khalidhuseynov
Copy link
Contributor

@onkarshedge thanks for this contribution. as i can see, the design issue on passing BittorrentNotebookRepo to TorrentSocket was solved by passing through constructor. also better have torrent websocket message handling separate from NotebookServer as you did. any other design issues left non-mentioned?

String path = LIBTORRENT_OS_LIBRARY_PATH();
if (path == null) {
LOG.error("Unknown Operating system");
throw new IOException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's provide the message for auser on the reason of failure though the constructor argument here

@onkarshedge
Copy link
Author

@khalidhuseynov thanks for helping.

To actually see TorrentSocket in action I have to create aWebSocketCreator and a Servlet, then add this to jetty server , at this line

webApp.addServlet(new ServletHolder("ws-events",new TorrentServlet()),"/download");

So then should it be like this ...

public class BittorrentNotebookRepo extends VFSNotebookRepo implements TorrentEngineListener,
    WebSocketCreator {
@Override
public Object createWebSocket(ServletUpgradeRequest servletUpgradeRequest,
                 ServletUpgradeResponse servletUpgradeResponse) {
    return new TorrentSocket(this);
  }
}

And the TorrentServlet class

public class TorrentServlet extends WebSocketServlet {
  @Override
  public void configure(WebSocketServletFactory webSocketServletFactory) {
    webSocketServletFactory.setCreator(new BittorrentNotebookRepo());
  }
}

Here is the problem as I cannot instantiate the BittorrentNotebookRepo here , it requires conf parameter. Also the NotebookRepos' are instantiated in NotebookRepoSync.

private void setupDir() {
if (!torrentHomeDir.exists())
torrentHomeDir.mkdir();
if (!torrentDir.exists())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure you follow the project java style guide regarding usage of braces

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

@bzz
Copy link
Member

bzz commented Aug 2, 2016

@onkarshedge on the architecture side - could you please elaborate on the design that you choose, especially regarding TorrentServlet - why do you need a new servlet?

Initial idea the we discussed was that BittorrentNotebookRepo encapsulates all the things that are required in order to get notebooks.

Could you please clarify, what are the parts that communicate to each other here? A diagram (event ASCI one) would help here.

On the code side - please read the project style guide for Java language carefully and make sure you follow the conventions described there.

@onkarshedge
Copy link
Author

onkarshedge commented Aug 2, 2016

@bzz . I will post a diagram of communication for clear view.
BittorrentNotebookRepo does not implement/override any NotebookRepo methods. It uses all VFSNotebookRepo methods.

Note get(String noteId, AuthenticationInfo subject);
List<NoteInfo> list(AuthenticationInfo subject);
void save(Note note, AuthenticationInfo subject);

Only two key methods are

public String shareNote(String noteId); // take the note.json file and start seeding, return magnet link
public void download(String magnetLink); // download the files in the torrent directory.

Initial idea the we discussed was that BittorrentNotebookRepo encapsulates all the things that are required in order to get notebooks.

Could you please elaborate this point.
Once the design and goal is clear, it will help me implement faster.
Give me some time I will post a diagram..

@bzz
Copy link
Member

bzz commented Aug 2, 2016

@onkarshedge

Initial idea the we discussed was that BittorrentNotebookRepo encapsulates all the things that are required in order to get notebooks.

Could you please elaborate this point.

Sure. The idea with P2P notebook storage implementation, be that Bittorrent or Dat or any other protocol is to have:

  • a self-contained implementation (that possibly can be extracted to maven submodule later)
  • of a single class, that implements NotebookRepo protocol
  • and is pluggable, instead or on the side with current VFSNotebookRepo (for now - pluggable though a simple configuration in zeppelin-env.sh, later I can imagine that would be also possible though GUI)

In case only some parts of VFSNotebookRepo are required (not sure if that is the case or not) - we can extract them to the skeletal implementation AbstractFileSystemNotebookRepo and make BittorrentNotebookRepo extends AbstractFileSystemNotebookRepo.

For more details on this approach to design object oriented architecture please read infamous Joshua Bloch "Effective Java, 2nd edition" Item18: Prefer interfaces to abstract classes p.93


Waiting for the diagram of current architecture and code improvements according to styleguide - please ping me and I'll be happy to make another pass tomorrow.

@onkarshedge
Copy link
Author

@bzz here is the diagram.
torrent 1
Link to drive

torrentclass
Link to drive

@onkarshedge
Copy link
Author

onkarshedge commented Aug 3, 2016

The TorrentEngineListener as you can see is similar to NotebookEventListener, the Observer Pattern I read about it.
The NotebookServer has a list of socket connections, similar way BittorrentNotebookRepo has a list of TorrentSocket connections

And also a note on directories..
1] torrent
notes/
17fa195f6c85f72d197d63896beeee15f456e624/note.json
282f928f9429828e5ef0a422f13d3263e671a2d3/note.json
resume/
torrents/
settings.dat

The torrent directory is created which will contain the following directories. When the user wants to share a note via DHT he will give the noteId , the note.json file is copied to torrent/notes//
and then seeded by adding to TorrentEngine.
The reason to copy before seeding is if note is changed or updated after added to seeding, the pieces hash will change and hence corrupt. The torrent state will change from seeding --> downloading because it doesn't have that file.

@khalidhuseynov
Copy link
Contributor

khalidhuseynov commented Aug 3, 2016

@onkarshedge conf object can be obtained on the fly from current configurations by ZeppelinConfiguration.create(). However, as you mentioned, all repository storages are currently being initialized in NotebookRepoSync class and it may not be good idea to initialize it in TorrentServlet. Also the way you mentioned, TorrentServlet will always be initialized regardless of whether corresponding conf was selected. I think it would be more logical to do all the initializations possible in the constructor/initialization of BittorrentNotebookRepo class.

@bzz
Copy link
Member

bzz commented Aug 3, 2016

@onkarshedge thank you for drawing very nice diagram.

Let me be very clear - you did a great job and quite possible that it is me, as a mentor, who under communicate the expectations about the architecture for this project. But..

But in order to make your work useful for the project in the nearest future, this PR must not introduce new client-facing APIs, servlets or change the existing filesystem structure a lot.

So we need to refactor your implementation and simplify, so it has a single entry point BittorentNotebookRepo, a class that implements NotebookRepo, that is both a server and a client for bittorrent protocol, and that has all the code necessary for:

  1. listing existing notes that shared through Bittorrent (local FS cache of the latest version?)
  2. checkpointing AKA creating revisions (and how use Mutable Items to store them in DHT)
  3. listing revisions for each note, stored in DHT
  4. importing new note by magnet URL (should work similar to current "import by URL")

Lifecycle of all objects that are needed to perform this task (i.e TorrentEngine and all others) must be tied to NotebookRepo lifecycle:

  • all connections must be opened from it's constructor
  • all connections must be closed from .close()

Please let me know if that makes sense to you.

Let's start by finding answers to the questions on how to do 1-4 on DHT.
Let's write answers down in English first, here or on the GDoc, before moving on to coding\refactoring them in a programming language.

Please think about it and let me know if you have question - we can schedule a call.

@onkarshedge
Copy link
Author

onkarshedge commented Aug 9, 2016

@bzz I read the DHT specification and here are some questions and answers.

Answer to point 2) checkpointing and 3) get Revisions
Mutable items can be stored with help of public and private key. dht.mput(publicKey, privateKey, new Entry(data)); and retrieved via dht.mget(publicKey). This allows only to store only one item/note against your public key. On retrieval the latest revision is retrieved, highest seq number. We can store multiple notes also if we use salt, the key in DHT would be hash(publicKey+salt), the salt could be noteId. But still we can only get only latest revision as per the protocol.

If we use Immutable items to be stored in DHT dht.put(data) returns SHA hash and we can get the item back from DHT via dht.get(hash). So we can keep track of all revisions by storing hash against the noteID. noteID -> [hash of rev1,hash of rev2] in some Map<String,List> .

point 4) Magnet Link.
magnet link are used to receive the metadata(info) of the .torrent file.Once the metadata part is received it downloads the file. In order for the peer to receive the file it needs seeders. Why use magnet link to import ? as we are storing notes directly in DHT we can just use hash(returned from dht.put("data") ) to get it from DHT.

point 1) Listing existing notes
will this be VFSNotebookRepo's list() method ? I didn't get this one.

--Edited---
The DHT supports storing of arbitrary data with less than 1000 bytes.

@bzz
Copy link
Member

bzz commented Aug 16, 2016

Thank you for your explanation, what's your progress on refactoring current impelemtation?

Let me try to re-visit points 1-5. Also, let's make sure that behavior of BittorentRepo is consistent with other implementations, like i.e S3, so it could be a drop-in replacement for the same use-cases where S3NotebookRerpo is used by a single line of configuration change.

  1. listing existing notes that shared through Bittorrent

    Could we use Mutable item to save a list of hashes of all notebooks, stored by this user\publicKey ?

  2. checkpointing AKA creating revisions

    Can we store Note as Immutable item, so each "revision" of the Note will look like a separate Note with it's own Hash ?

  3. listing revisions for each note

    Can each Note have a publicKey, so we could store a list of all revisions for each note as a Mutable item?

  4. importing new note by hashcode (should work similar to current "import by URL")

    Hashcode should be good indeed, but this might still require changes that are discussed in Added ipfsnotebookrepo [ Zeppelin-683 ] #989

Let me know what you think!

@onkarshedge
Copy link
Author

@bzz What should I do now ? As you read the answer posted by gubatron should I now consider breaking the note into multiple partitions ?
I am using frostwire-jlibtorrent. If you are expecting to code the entire DHT from scratch, it's too tough for me. I could give it a try but it would take time and might not be upto mark.

@bzz
Copy link
Member

bzz commented Aug 19, 2016

If you are expecting to code the entire DHT from scratch, it's too tough for me

@onkarshedge That is not an expectation here of course.

Let's think about alternatives - another way is to distribute note.jspn though regular torrents, right? So we can assume that there is going to be a tracker one day, that lists all notebook that are shared and each Zeppelin instance will be just seeding it's own fraction of notebooks.

Then each revision of the note need to be stored as a separate file, as it will create a new torrent file\magnet link, right?

Do you think this will work?

  1. listing existing notes that shared through Bittorrent

    This will include listing local copies from filesystem i.e under /notebook/bittorent or /notebook-bittorent not to interfere \w existing VFSNotebookRepo impl.

  2. checkpointing AKA creating revisions
    Each revision will result in a new file i.e note-revisionId.json + a new .torrent/magnet link to share it

  3. listing revisions for each note
    This should be simple, given the assumptions from above. And each revision can be shared independently

  4. importing new note by magnet link (should work similar to current "import by URL")

But again, the code to do all this have to be as much as possible hidden/composed behind the single entry-point of BittorentNotebookRepo implements NotebookRepo.

What do you think?

@onkarshedge
Copy link
Author

Yes, this can be done.

@bzz
Copy link
Member

bzz commented Aug 19, 2016

@onkarshedge thank you, that is good to know :) I would rather expect you coming up with such suggestion and then posting an explanation like above though.

Could you also please update PR description according to your further plans?
I.e in case you decide to pursue suggested architecture - what would be the smaller steps\refactorings needed in order to get there?

@bzz
Copy link
Member

bzz commented Nov 9, 2016

@onkarshedge would you be willing to address the comments on the code style and rebase it?

@asfgit asfgit closed this in c38a0a0 May 9, 2018
asfgit pushed a commit that referenced this pull request May 9, 2018
close #83
close #86
close #125
close #133
close #139
close #146
close #193
close #203
close #246
close #262
close #264
close #273
close #291
close #299
close #320
close #347
close #389
close #413
close #423
close #543
close #560
close #658
close #670
close #728
close #765
close #777
close #782
close #783
close #812
close #822
close #841
close #843
close #878
close #884
close #918
close #989
close #1076
close #1135
close #1187
close #1231
close #1304
close #1316
close #1361
close #1385
close #1390
close #1414
close #1422
close #1425
close #1447
close #1458
close #1466
close #1485
close #1492
close #1495
close #1497
close #1536
close #1545
close #1561
close #1577
close #1600
close #1603
close #1678
close #1695
close #1739
close #1748
close #1765
close #1767
close #1776
close #1783
close #1799
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants