First-pass thoughts on the CLI etc #16
@mmcky it would be very helpful if you or others on the QE team could take similar passes over the jupyter-cache CLI. I think that this functionality is most relevant to the QuantEcon use case, so it'll be important that you all are comfortable with the API/CLI/etc. |
Cheers 😄 Yeh obviously it's still at a relatively early stage, with the CLI (currently) aimed at showcasing the full functionality, rather than necessarily being 'user-friendly'. These comments will start helping to achieve that.
As you say,
Then I also agree there is currently a cognitive dissonance in their CLI use. This is because it'll take some 'CLI wizardry' to get it to work the same way, which I haven't got round to coding yet; i.e. it would be better to have:
$ jcache stage-nb --assets asset1 asset2 PATH/TO/NB.ipynb
But at present you need:
$ jcache stage-nb --assets asset1 asset2 --- PATH/TO/NB.ipynb
For the API, I've made specifying assets very general, so you specify each one individually. But for practical use cases, and for a more automated approach, you probably want a simpler heuristic, e.g. that all assets (if any) are in a folder with the same name as the notebook, and are read automatically. Do you think this makes sense?
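A minimal sketch of the folder heuristic described above, assuming the assets live in a sibling folder named after the notebook's stem (PATH/TO/NB.ipynb → PATH/TO/NB/). The function name find_assets is hypothetical and is not part of the jupyter-cache API.

```python
from pathlib import Path
from typing import List


def find_assets(nb_path: str) -> List[Path]:
    """Return every file in the folder named after the notebook (hypothetical heuristic).

    For PATH/TO/NB.ipynb this looks in PATH/TO/NB/ and returns all files
    found there (recursively), sorted for a stable order. If that folder
    does not exist, the notebook is assumed to have no assets.
    """
    nb = Path(nb_path)
    asset_dir = nb.parent / nb.stem
    if not asset_dir.is_dir():
        return []
    return sorted(p for p in asset_dir.rglob("*") if p.is_file())
```

With something like this, staging a notebook would not need each asset listed after --assets, which would also sidestep the argument-separator problem above.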
On both these points, I would envisage the 'final' CLI having a nested structure, i.e. there would be top-level commands/groups, e.g.:
$ jcache config commit-limit
$ jcache stage list
$ jcache stage nbs
This keeps the CLI 'tidier', with the only con being that you have to go through this extra layer to get to certain commands. Does that sound reasonable? Note with
Sounds reasonable; what information would you expect to receive from this command? Also, given the tab completion I mentioned above, it would be better to call this something else that doesn't 'clash' with
👍 I'm also feeling that Yeh I couldn't think of a good 4-letter word. But at the same time, you can just tab complete anyway. I'll probably add click-completions, which also makes tab completion available for all sub-commands and arguments. ... |
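The nested layout proposed above (jcache config commit-limit, jcache stage list, jcache stage nbs) can be sketched with two levels of sub-command parsers. This uses Python's standard-library argparse purely for illustration (the thread mentions click, which has its own group mechanism); the integer argument to commit-limit and the paths argument to stage nbs are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical `jcache` parser with a two-level `group command` layout."""
    parser = argparse.ArgumentParser(prog="jcache")
    groups = parser.add_subparsers(dest="group", required=True)

    # `jcache config ...`: rarely used settings live one level down.
    config = groups.add_parser("config", help="Commands for configuring the cache.")
    config_cmds = config.add_subparsers(dest="command", required=True)
    limit = config_cmds.add_parser("commit-limit", help="Set the cache size limit.")
    limit.add_argument("limit", type=int)  # assumed: a single integer

    # `jcache stage ...`: everything to do with staging notebooks.
    stage = groups.add_parser("stage", help="Commands for staging notebooks to be executed.")
    stage_cmds = stage.add_subparsers(dest="command", required=True)
    stage_cmds.add_parser("list", help="List staged notebooks.")
    nbs = stage_cmds.add_parser("nbs", help="Stage one or more notebooks.")
    nbs.add_argument("paths", nargs="+")

    return parser
```

For example, build_parser().parse_args(["stage", "nbs", "a.ipynb", "b.ipynb"]) yields group="stage", command="nbs" and paths=["a.ipynb", "b.ipynb"]. The extra layer costs one more word per command, but keeps the top-level help short.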
Ah I didn't realise empty cells didn't get an execution count; that can be fixed. The idea of the $ jcache commit-nb --no-validate
We'll have to encapsulate this issue in a unit test, then I can fix it 😄
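A minimal sketch of such a unit test, working directly on the notebook's JSON structure (plain dicts, as stored in an .ipynb file). The helper name unexecuted_cells is hypothetical; it assumes a code cell only needs an execution count if its source contains something other than whitespace, since empty cells are skipped during execution and never receive one.

```python
from typing import Any, Dict, List


def _source_text(cell: Dict[str, Any]) -> str:
    """Notebook JSON stores `source` either as one string or as a list of lines."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def unexecuted_cells(nb: Dict[str, Any]) -> List[int]:
    """Hypothetical helper: indices of non-empty code cells with no execution count.

    Whitespace-only code cells are ignored, so they cannot fail validation.
    """
    return [
        i
        for i, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and _source_text(cell).strip()
        and cell.get("execution_count") is None
    ]


def test_empty_cell_does_not_fail_validation():
    nb = {
        "cells": [
            {"cell_type": "code", "source": "print(1)", "execution_count": 1, "outputs": []},
            {"cell_type": "code", "source": ["  \n"], "execution_count": None, "outputs": []},
            {"cell_type": "markdown", "source": "# Title"},
        ]
    }
    assert unexecuted_cells(nb) == []
```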
All notebooks should only be committed if they are in a successfully executed state. It doesn't necessarily have to have been executed via
Yep 👍 Also with #14, I envisage additional reporting being stored (including failed notebooks). So, after execution, all the notebooks will still be staged, but doing
This probably wouldn't be a generally used feature. It is just to print out the content of an artefact (see explanation below), with the PK being the pointer to the committed notebook and the
Yes. I do explain this a little in the README, but it is essentially any file present in the notebook's folder after execution (that wasn't there before), e.g. if you've used
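The definition of an artefact above (any file present in the notebook's folder after execution that wasn't there before) amounts to a before/after snapshot of the folder. A minimal sketch, assuming a recursive file listing is enough; the function names are hypothetical, and this version deliberately ignores files that existed before but were modified.

```python
from pathlib import Path
from typing import Set


def snapshot(folder: str) -> Set[str]:
    """Hypothetical helper: relative (POSIX-style) paths of all files under `folder`."""
    root = Path(folder)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def new_artifacts(folder: str, before: Set[str]) -> Set[str]:
    """Hypothetical helper: files present now that were absent from `before`."""
    return snapshot(folder) - before
```

In use, one would call before = snapshot("PATH/TO") before executing PATH/TO/NB.ipynb, then new_artifacts("PATH/TO", before) afterwards to collect e.g. saved figures or data files.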
Yes
Yep good point re:
So here staged actually has nothing to do with commit. Perhaps if
This may be to do with the above points. Will address this after they have been cleared up. |
And here's the extension for saving failed notebooks, mentioned in #14. Failed notebooks will be removed when the staged DB record is removed or re-executed. |
Was trying the CLI tool and needed a few clarifications as a dense user :) :- It seems like there are two ways of saving an … I feel that if we are following the git philosophy, then it's better to have one flow, where it goes through the staging phase? While saving the same notebook through these two different flows, I found that the DB can have duplicate entries, i.e. entries with the same URI, which creates confusion. Wonderful tool btw @chrisjsewell |
These are the intended use cases, yes; generally you shouldn't need to add directly to the cache. To clarify, the DB will have duplicate URIs: every time you change a notebook and re-run it, the old run will still be there until it is cleaned. The URI does not denote unique notebooks; it is just there as a record of where the notebook came from. After entry, notebooks are matched by hash and the URI is not taken into account. (FYI, I'm literally about to take off, so won't be replying for a while!) |
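To illustrate why the same URI can appear more than once: if records are matched by a hash of the notebook's content and the URI is kept only as provenance, each edited-and-rerun version gets its own record. A minimal sketch, assuming (for illustration only) that the hash covers just the code-cell sources and that adding an already-stored hash is a no-op; jupyter-cache's actual hashing rules may include more, such as kernel metadata. ToyCache and code_hash are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def code_hash(nb: Dict[str, Any]) -> str:
    """Assumption: hash only code-cell sources, so outputs and the URI are ignored."""
    sources = []
    for c in nb.get("cells", []):
        if c.get("cell_type") == "code":
            src = c.get("source", "")
            sources.append("".join(src) if isinstance(src, list) else src)
    return hashlib.sha256(json.dumps(sources).encode("utf-8")).hexdigest()


class ToyCache:
    """Hypothetical in-memory cache: records are looked up by hash, never by URI."""

    def __init__(self) -> None:
        self.records: List[Dict[str, str]] = []

    def add(self, nb: Dict[str, Any], uri: str) -> None:
        h = code_hash(nb)
        if not any(r["hash"] == h for r in self.records):
            # The URI is stored only as a note of where the notebook came from.
            self.records.append({"hash": h, "uri": uri})

    def match(self, nb: Dict[str, Any]) -> Optional[Dict[str, str]]:
        h = code_hash(nb)
        return next((r for r in self.records if r["hash"] == h), None)
```

Adding two different versions of nb.ipynb produces two records with the same URI, while a copy of the first version read from another path matches the existing record.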
Taking another pass now, will give my thoughts in the same long-form way as before!
So again trying to be naive about the workflow, this is what I'm intuiting from the CLI:
OK now after having looked at the docs a bit more, I now think this is the expected use-case:
Does this sound more correct? I'm leaving the first set of (incorrect) steps here so you can see where I was getting tripped up. And I know you were explaining a bit of this above, but again I'm being intentionally dense to show where stuff might not be intuitive. But in general, this is definitely an improvement, I was able to get this working and figure out what's happening with the CLI much faster 👍 |
😀
Yep, that can be done. Similarly, this could be done for staging
Yes, the staging solves two 'problems'
No, I think this is too restrictive and would make the
If anything, this would be more sensible IMO
Yes. I could add a shortcut like this, whereby there would be a prompt to confirm that this will wipe the currently staged notebooks, then add this one, before executing. But obviously this will increase the complexity of the
No, you can't mix CLI groups with CLI commands. Think of groups (like
Yeh that's possible
No. As I mentioned to @AakashGfude above, successfully executed notebooks are automatically cached. I will add a message(s) in … The staged notebook records are just references to the source files:
(a) To inspect what is in the cache, and related execution statistics
As mentioned above, they can inspect the cache for execution statistics etc.
I can't be accountable for people not reading the docs properly 😜
Again, it's important to note that there is no physical store of 'staged' notebooks; they are just references to the source file URIs.
Generally yes though 👍 |
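The point that staged notebooks are referenced rather than stored can be sketched as follows. This is a hypothetical model, not jupyter-cache's schema: staging records only the resolved path, and the file is read when execution happens, so edits made after staging are picked up. StageTable and its methods are illustrative names.

```python
from pathlib import Path
from typing import List


class StageTable:
    """Hypothetical stage: holds references (paths), never copies of notebooks."""

    def __init__(self) -> None:
        self.uris: List[str] = []

    def stage(self, path: str) -> None:
        uri = str(Path(path).resolve())
        if uri not in self.uris:  # staging the same file twice is a no-op
            self.uris.append(uri)

    def unstage(self, path: str) -> None:
        uri = str(Path(path).resolve())
        if uri in self.uris:
            self.uris.remove(uri)

    def read_for_execution(self) -> List[str]:
        """Read each staged file now; nothing was copied at staging time."""
        return [Path(uri).read_text(encoding="utf-8") for uri in self.uris]
```

Because only the path is kept, unstaging never touches the notebook itself, and the content that gets executed (and later cached) is whatever is on disk at execution time.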
my 2c. I think the confusion with if we adopted a I think this is what @AakashGfude had in mind with removing direct entry into the |
Yep, it does now specifically state this in the CLI:
$ jcache -h
...
stage Commands for staging notebooks to be executed.
Well yes, that's basically the current design? A subtle difference I note though is that you are implying that notebooks that executed correctly should be removed from the stage phase.
Yeh possibly, although equally you shouldn't be forced to go via |
Could that confusion be improved by my suggestion to put the |
@chrisjsewell giving another shot at figuring out the merging etc behavior. Thanks again for putting this package together, I think it's pretty handy! I think I have enough context to play around w/ the jupyter-book implementation. Here are a few questions/feedback/etc from me: I've managed to get multiple copies of a file in my cache:
Is this expected behavior, because this is how we keep track of multiple copies of a file over time? From a UI perspective I found it a bit confusing, I was expecting one entry per file. Now using the Python API:
I hope these comments help guide API etc design, and let me know if you'd like me to open issues for some specific ones. In general this is looking great, many thanks for your hard work in getting this ready for testing. |
Yes, the URI is only a record of the original URI that the notebook was read from. After the read, notebooks are only distinguished by their hash; the URI is just there to give some context. Perhaps if the table header were something like "Origin URI", that would make that clearer?
👍 (I should probably make this less deep into the package, but I recall there may be an issue to overcome with cyclic dependency)
I don't really see what this improves?
Yep, can just upstream the CLI function into an API method 👍
👍
👌
👍 👌 😄 🎉 |
These are the TODOs so far from this discussion:
Any more? |
None that I can think of! I will try building this into myst-nb tomorrow, so maybe that'll uncover some other thoughts. I think the question about "should it be a direct path to the cache or to the parent folder?" is something I feel 50/50 on. As I mentioned above, I was mostly just trying to document what I intuitively assumed vs. what the reality turned out to be. I agree that we should assume documentation can clear these things up, but we also might as well make the API match intuition as much as possible. Perhaps @mmcky or others could weigh in on that particular design decision; I think it's worth just having more people try it out. |
Hey @choldgraf, here is a minimal implementation of this being built into myst-nb: executablebooks/MyST-NB#55. Sorry for the late PR; I got caught up in university life 😅. @chrisjsewell, regarding #29, you already have an easy util function to access it (https://github.com/ExecutableBookProject/jupyter-cache/blob/master/jupyter_cache/cli/utils.py#L12), so the issue might not be necessary? |
See 14f7a90 for fixes to some of the issues brought up |
See https://jupyter-cache.readthedocs.io/en/latest/using/api.html for a full rundown of the API, and let me know any feedback 😄 |
Is anyone free to propose a new logo (I just quickly found that one on Google)? Also, @choldgraf, the left sidebar seems to behave poorly on a number of the pages; is this an issue to be raised with the |
Nice - I'll try giving another run-through on the API docs soon! Let's think on a logo; copypasta from Google is fine in the meantime :-) And re: the sidebar thing, I think that is pydata/pydata-sphinx-theme#118 - I'm hoping somebody in the pandas world with more bootstrap-fu than I have can figure it out... |
@DrDrij, do you have time to work on a logo for this project? You can get a sense of what it's about from the docs. |
Sure thing :) If there are any preferences in terms of colour palette, shapes, or logos anyone really likes that could serve as a guide, please speak up :D |
Well I guess what we maybe want is new logos for all the EBP packages 😁 with a consistent theme, colour palette, etc: |
Would love your help with this, @DrDrij (funded by NumFOCUS of course). For context, these projects form the meat and potatoes of EBP, a shiny new version of jupinx. |
I've found the jupyter brand guide to be a helpful way of thinking about design in the jupyter ecosystem https://github.com/jupyter/design/tree/master/brandguide |
I'm going to have a go at trying out the CLI and the docs, and will report back thoughts or confusions as they come up. I am going to be sort-of intentionally dense here, because that'll be the perspective of users when they first start using the tool (apologies to new users if they come across this statement lol)
In general, I think that it's moving in the right direction. I'm impressed at the machinery under the hood and excited to get the API sorted a bit. Most of my comments are around clarity and UX for the CLI (I think we should do something similar for the Python API as well).
here are my very intentionally verbose comments :-)
What's the difference between stage-nb and stage-nbs? It seems like these should be doing the same thing, but one of them allows asset paths and the other doesn't? Is there a way that we could allow for a single stage command that could handle both the single- and multi-notebook cases? **edit: it looks like stage-nb is in fact a significantly different use case? In that case we should call the verb something different. I was confused that stage-nb takes as its first argument a list of paths to assets, while stage-nbs takes paths to notebooks.
To that extent, I feel like it would be less cognitively burdensome to have three verbs at the user's disposal:
stage
execute
commit
without "-nbs" in there (and where we use one verb for both the single- and multi-notebook cases). Unless we expect that this will ever be used to "stage" a thing that is not also a notebook?
It feels awkward to have commit-limit as one of the top-level user-facing commands. Do we expect that users will often use this verb? If not, or if we imagine similar functionality where we want users to control the global behavior using the CLI, perhaps another verb could be jcache config <verbs-to-control-config>?
A convenience verb like jcache status would be helpful. I found myself naturally wanting to type it since we are using so much git terminology.
The execute docstring should be "execute staged and outdated notebooks", right?
I'm also feeling that jcache is a lot of letters to type if I am doing a lot of typing etc... I wonder if there is a more short-hand entrypoint to the CLI we can think of.
If I execute a notebook with an empty cell, then I get the following error because the cell wasn't executed:
After then deleting those empty cells and re-running jcache execute, I am now getting this much more confusing error:
Running jcache execute again now seems not to give any errors. I think this is because the second time I ran it, it didn't try to execute the notebook. The error happens again if I change the notebook's code cell.
Does commit-nb strictly require that a notebook be executed? Or can you do a straight jcache stage-nb myntbk.ipynb -> jcache commit-nb?
jcache execute should say how many notebooks it skipped and executed in the "finished" statement. Something like: Finished executing 3 notebooks (5 non-stale notebooks skipped)
I'm not sure what cat-artifact does. The docstring doesn't disambiguate what PK or ARTIFACT_RPATH is. I'm assuming PK is "Primary Key", but that is database speak, not human speak :-)
diff-nb - I'm not sure when a notebook is officially "cached" or not. Is it when I run jcache execute? Or do I have to explicitly do the 'commit'?
commit-nb - based on the logic of previous steps, I was assuming that this would just "commit everything that is staged right now". I think this is because we are overloading git language. Another suggestion from me: we might change this word to cache rather than commit, to make it explicit when caching is happening, and to avoid confusion with git.
commit-nb and commit-nbs again... that is a confusing one...
TypeError: the JSON object must be str, bytes or bytearray, not int, and any time I then try to commit the notebook, it warns me that the notebook wasn't executed.
OK, I hope that this is helpful; I'm happy to step through subsequent iterations!
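The suggested "finished" message (Finished executing 3 notebooks (5 non-stale notebooks skipped)) could be produced along these lines. A sketch under stated assumptions: each staged notebook is represented by the hash of its current content, a notebook counts as stale when that hash is not already in the cache, and execute_stale and run are hypothetical names rather than jupyter-cache functions.

```python
from typing import Callable, Dict, Set


def execute_stale(
    staged: Dict[str, str],
    cached_hashes: Set[str],
    run: Callable[[str], None],
) -> str:
    """Hypothetical: run only staged notebooks whose hash is not cached.

    `staged` maps each notebook URI to the hash of its current content.
    Returns the summary line suggested above.
    """
    stale = [uri for uri, h in staged.items() if h not in cached_hashes]
    for uri in stale:
        run(uri)  # e.g. execute the notebook and cache it on success
    skipped = len(staged) - len(stale)
    return f"Finished executing {len(stale)} notebooks ({skipped} non-stale notebooks skipped)"
```

With eight staged notebooks, five of which already have a cached hash, only the remaining three are run and the message matches the example above.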