[BBS-266] Adapt Getting Started for DVC remote from Zenodo #267
For the record, the issue mentioned here is semi-reproducible, in the sense that it is due to the non-deterministic behavior of a component of SpaCy which uses |
Thank you very much @pafonta, this was really great work!
Really nice job! 🎉 Thanks a lot for doing this Zenodo thing @pafonta! I just left a small comment, but it can be ignored!
Third, keep track of the path to the working directory, the repository
directory, and the data and models directory.
Maybe it could be better to say "keep track of the paths of the working directory, the repository directory, and the data and models directory". However, it is really a minor comment and I am not confident about what I am suggesting!
The sentence is coming from #229 (see here lines 220-221).
@Stannislav was the author of that PR. Maybe he knows what is valid English?
Could you have a look at this @Stannislav?
To me either way sounds fine :)
Let's keep it that way :)
# Commented to allow the use of a DVC remote from a different type than SSH.
# ssh_check
# Not usable in README as it works only when inside the `bbs_` containers.
# dvc_pull_models
Could you please remind me how the server is going to find the NER models now?
The decision taken by the team yesterday to comment out these lines puts the mining server in the same situation as the search server: `dvc pull` needs to have been run beforehand. In the README, there is already a `dvc pull` happening before the mining server is launched. Commenting out the two lines therefore doesn't break anything. For further discussion on the subject, there is the ticket BBS-306.
Yes, I was trying to make sure I understand correctly how the path to the models is set. The mining server finds the models by reading the `BBS_DATA_AND_MODELS_DIR` environment variable. Here's my current understanding of how this variable is set; could you please check whether it's correct?
With `dvc_pull_models`, the models would be pulled somewhere locally in the container, so the variable needs to point to a local path, e.g.:
BBS_DATA_AND_MODELS_DIR=/src/data_and_models
After commenting out `dvc_pull_models`, the models will be provided externally. So we need to `docker run` with a mount of the volume where these models are, e.g. with the `-v /raid:/raid` flag, and then configure the environment variable to point to a location on that volume:
BBS_DATA_AND_MODELS_DIR=/raid/<...>/projects/search/data_and_models
Is this understanding correct?
Yes.
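For illustration only, here is a minimal sketch of how a server process could check that `BBS_DATA_AND_MODELS_DIR` points to a usable directory in either setup described above. The function name `resolve_data_and_models_dir` and the error messages are hypothetical and are not taken from the repository; only the environment variable name comes from this discussion.

```python
import os
from pathlib import Path

# Name of the environment variable discussed in this thread.
ENV_VAR = "BBS_DATA_AND_MODELS_DIR"


def resolve_data_and_models_dir(environ=None):
    """Return the directory named by BBS_DATA_AND_MODELS_DIR.

    Hypothetical helper: `environ` defaults to the process environment,
    but any mapping can be passed, which makes the check easy to test.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR)
    if not value:
        raise RuntimeError(f"{ENV_VAR} is not set")
    path = Path(value)
    if not path.is_dir():
        # Typical causes: the volume was not mounted into the container,
        # or the data and models were not pulled beforehand.
        raise FileNotFoundError(f"{ENV_VAR}={path} is not a directory")
    return path
```

The same check applies whether the variable points to a path inside the container or to a location on a mounted volume; failing early gives a clearer error than a missing model file later on.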
Good job!!
Fixes BBS-266.
TODO
Currently blocked by BBS-306.
Otherwise, after the tests are successful:
Description
Adapt the Getting Started from the README to use the DVC remote from Zenodo.
How to test?
1 - Getting Started
Follow the Getting Started from the README after
The notebook at the end should be usable for search and mining.
2 - DVC
a) Setup
b) Pulling
should retrieve all DVC tracked data and models successfully.
c) pipelines/sentence_embedding
After
the command
should show no change.
d) pipelines/ner
After
the command
should show no change but it will show these changes
This issue isn't introduced by the current PR. This is already the situation on master.
Indeed, the issue is semi-reproducible (see #267 (comment)) outside this PR like this
Checklist
whatsnew.rst updated.