Using HDFS to backup and restore notebook #1600
Conversation
@aspen01 thank you for the contribution! Could you please double-check that
As for copyright notices, could you please help to understand if your contribution contains any third-party works, or is it all code developed for the ASF? That would determine if the existing copyright notice in the files should be moved to the root NOTICE file or not. Thanks!
public synchronized void save(Note note, AuthenticationInfo subject) throws IOException {
  super.save(note, subject);
  if (this.enableWebHDFS)
    uploadNoteListToHDFS(note);
what do you think about:
- whether it makes sense to pass user from subject instead of getting the user from the environment inside HDFSCommand
- from this function name it seems like the whole list is uploaded instead of a single note
- If this feature saved a notebook to HDFS directly, I think your opinion would be right. But for now the feature just backs up the notebook to HDFS, so I tried to think of the two functions (connecting to HDFS and saving the notebook) separately.
- I'll fix it. Thanks.
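The first suggestion above (taking the user from subject rather than from the environment) could be sketched as below. AuthenticationInfo here is a minimal stand-in for Zeppelin's class, and the HADOOP_USER_NAME fallback is an assumption, not necessarily what the PR does:

```java
// Minimal stand-in for Zeppelin's AuthenticationInfo (hypothetical).
class AuthenticationInfo {
  private final String user;

  AuthenticationInfo(String user) {
    this.user = user;
  }

  String getUser() {
    return user;
  }
}

public class HdfsUserResolution {
  // Prefer the authenticated subject; fall back to the environment
  // (HADOOP_USER_NAME) and finally to the JVM's user.name property.
  static String resolveUser(AuthenticationInfo subject) {
    if (subject != null && subject.getUser() != null) {
      return subject.getUser();
    }
    String envUser = System.getenv("HADOOP_USER_NAME");
    return envUser != null ? envUser : System.getProperty("user.name");
  }

  public static void main(String[] args) {
    System.out.println(resolveUser(new AuthenticationInfo("alice")));
  }
}
```

This keeps per-request identity intact in multi-user setups, where the process environment is shared by all users.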
String notebookStatus = this.hdfsCmd.runCommand(this.hdfsCmd.getFileStatus, notebook, null);
fileStatus = gson.fromJson(notebookStatus, OneFileStatus.class);
} catch (Exception e) {
  logger.warn("Warning: ", e);
maybe some description could be given here
  }
}
catch (Exception e) {
  logger.error("Exception: ", e);
a bit more description would help here as well
I'll add more description. Thanks.
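One way to make the quoted catch blocks more descriptive, as requested above, is to build a message that names the operation and the notebook instead of a bare "Warning: " prefix. The method and message wording here are illustrative, not the PR's final text:

```java
public class NotebookRepoLogging {
  // Sketch: a log message that says what failed and for which notebook,
  // so an operator can act on it without reading the stack trace first.
  static String statusFailureMessage(String notebook) {
    return "Failed to read WebHDFS file status for notebook " + notebook
        + "; falling back to the local copy";
  }

  public static void main(String[] args) {
    // Hypothetical notebook id, for illustration only.
    System.out.println(statusFailureMessage("2A94M5J1Z"));
  }
}
```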
@@ -0,0 +1,241 @@
/**
 * Copyright 2015 NAVER Corp. All rights Reserved.
Is this file licensed for use in an Apache project?
It is the wrong license. I'll remove it.
Please make sure all the code is distributed under one of the ASF category-a licenses
@aspen01 I wonder if there is any progress on this one?
</property>

<property>
  <name>hdfs.url</name>
I think it is better to add the prefix zeppelin.notebook, because hdfs is used in many places. Adding the prefix can help users understand that this is for notebook storage.
HDFSCommand already uses the hdfs.url property, so I used the same property name.
If adding a prefix helps users understand, I'll add the prefix.
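A prefixed configuration might look like this in zeppelin-site.xml. The property names and values below are illustrative, following the reviewer's suggestion, not necessarily what the PR finally uses:

```xml
<!-- Illustrative only: notebook-storage properties with the suggested
     zeppelin.notebook prefix; host, port, and paths are hypothetical. -->
<property>
  <name>zeppelin.notebook.hdfs.url</name>
  <value>http://namenode-host:50070/webhdfs/v1</value>
  <description>WebHDFS endpoint used for notebook backup</description>
</property>
<property>
  <name>zeppelin.notebook.hdfs.dir</name>
  <value>/user/zeppelin/notebook</value>
  <description>HDFS directory where notebooks are backed up</description>
</property>
```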
@@ -42,6 +42,18 @@
</dependency>

<dependency>
  <groupId>${project.groupId}</groupId>
I don't think it is a good idea to make zengine to depend on one specific interpreter.
I agree with you. But to use the HDFSCommand functions, I had to add the dependency.
Could you give me advice on how to solve this?
Indeed, Zengine should not have such a dependency.
@aspen01 if HDFSCommand has to be shared:
- one option is to move it to zeppelin-interpreter, as both file and zeppelin-zengine already depend on it
- another option to consider would be to extract a new maven sub-module out of zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/, move HDFSCommand there, and make the other modules depend on it.
The second option is more involved and deserves a separate issue with discussion. The first one looks more feasible.
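Under the first option above, the zengine pom would replace the interpreter-specific dependency with one on zeppelin-interpreter, where the shared HDFSCommand would live. A hypothetical fragment (artifact ids assumed from the module names in this thread):

```xml
<!-- Hypothetical zeppelin-zengine/pom.xml fragment: depend only on
     zeppelin-interpreter once HDFSCommand is moved there. -->
<dependency>
  <groupId>${project.groupId}</groupId>
  <artifactId>zeppelin-interpreter</artifactId>
  <version>${project.version}</version>
</dependency>
```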
@bzz Thank you for the advice. But the build fails even though I fixed the license and checkstyle issues.
I see, thank you for taking care of it! This looks like something failing on the TravisCI side, though. Could you please rebase this branch on the latest master and force-push it here again? This will trigger CI with the latest code fixes from master.
Looks like you're missing a hotfix commit:
Thank you @aspen01, @placeybordeaux. In order to be merged, this branch has to be rebased on top of master, which itself already includes the CI fixes, i.e. 1c7d8fb. This branch must not include any other commits except the ones that implement "HDFS to backup and restore notebook"; edae7bc and d94eb16 need to be removed.
  return "/" + newUrl.replaceAll("^hdfs://", "").split("/", 2)[1];
}
else
  return newUrl;
Please make sure all the code follows the project style guide.
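Following the style comment, the quoted branch could be written with braces on both arms. The surrounding condition is not shown in the diff, so the hdfs:// scheme check and the method name here are assumptions for illustration:

```java
public class HdfsPathUtil {
  // Sketch of the quoted snippet with braces on both branches, per the
  // usual Java style guide. The scheme check is assumed; the diff only
  // shows the branch bodies.
  static String stripScheme(String newUrl) {
    if (newUrl.startsWith("hdfs://")) {
      // Drop "hdfs://host:port" and keep the absolute path.
      return "/" + newUrl.replaceAll("^hdfs://", "").split("/", 2)[1];
    } else {
      return newUrl;
    }
  }

  public static void main(String[] args) {
    System.out.println(stripScheme("hdfs://namenode:8020/user/zeppelin/notebook"));
  }
}
```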
@placeybordeaux Thank you.
add header for uploading file using REST API
I rebased this branch on the latest master and force-pushed it.
Not sure why the CI hasn't kicked in, but I tried compiling this locally and I am getting some test failures.
Not sure if these are related to the PR at all.
@placeybordeaux How did you build a package?
I built with
Is this ready to be merged?
Seems overcomplicated for
@EronWright Thank you for your feedback. When I first developed the feature, I assumed that HDFS would not be used as the default storage, because a remote FS failure could affect Zeppelin usage. So I extended VFSNotebookRepo to keep the default storage space on the local FS, with HDFS as backup storage for failover. This keeps the implementation simple, but it may have created complexity.
What is this PR for?
This PR supports using HDFS to back up and restore notebooks.
It is similar to #1479.
However, this PR just uses the WebHDFS API, so we don't need to care about Hadoop libraries' dependencies.
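WebHDFS addresses HDFS over plain HTTP as http://&lt;host&gt;:&lt;port&gt;/webhdfs/v1/&lt;path&gt;?op=&lt;OPERATION&gt;, which is why no Hadoop client jars are needed. A small sketch of building such a URL; host, port, and path are hypothetical:

```java
public class WebHdfsUrls {
  // Sketch: WebHDFS request URLs have the form
  //   http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>
  // e.g. op=LISTSTATUS to list a directory, op=OPEN to read a file.
  static String buildUrl(String host, int port, String path, String op) {
    return String.format("http://%s:%d/webhdfs/v1%s?op=%s", host, port, path, op);
  }

  public static void main(String[] args) {
    // Hypothetical namenode host/port and notebook directory.
    System.out.println(buildUrl("namenode", 50070, "/user/zeppelin/notebook", "LISTSTATUS"));
  }
}
```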
What type of PR is it?
Improvement
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1515
How should this be tested?
Set the variables in zeppelin-site.xml.
After the Zeppelin daemon starts, check the notebook directories in hdfs.notebook.dir.
Screenshots (if appropriate)
Questions: