
PR: Update all screenshots to high-res versions from Spyder 3.2.8 and Win10, and add new ones #22

Closed · wants to merge 10 commits
Conversation

CAM-Gerlach
Member

@CAM-Gerlach CAM-Gerlach commented Apr 22, 2018

Pull Request Checklist

  • Read and followed this repo's Contributing Guidelines
  • Based your PR on the latest version of the correct branch (master or 3.x)
  • Checked your writing carefully for correct English spelling, grammar, etc
  • Described your changes and the motivation for them below
  • Noted what issue(s) this pull request resolves, creating one if needed

Description of Changes

Adds over 200 new, consistent, high-resolution screenshots of modern Spyder (3.2.8) running on a clean Windows 10 test machine, replacing all the existing old screenshots and adding a few new ones where appropriate and called for by the existing text, along with minor layout alterations to the text and theme to accommodate the new images. Also removes the obsolete legacy console docs (eliminating the need to update the now-nonexistent image) and updates links to point to the IPython Console, merging a short paragraph from the former into the latter.

Hopefully this can get reviewed promptly, as I can't really get started on whipping the existing text/content into some semblance of shape to address #13 and #15 (updating/correcting out-of-date/inaccurate content, copyediting, adding low-hanging fruit, etc.) until this gets merged. I tried to make it as straightforward and non-controversial as possible: just direct replacements of the existing screenshots, a few very judicious new ones as previously discussed, and essentially no real changes to the actual body text itself, so hopefully that can happen quickly. Thanks.

Issue(s) Resolved

Fixes #14

@ccordoba12
Member

@CAM-Gerlach, I think we should use Git LFS in this repo to handle images, given the sheer number of them you want to commit in this PR. So my advice is to close this PR, remove/recreate your local clone (so you can't inadvertently add these images in another PR), and start again with Git LFS and a PR that only adds the updated images required by our current docs.

After that, please open new PRs, each with a single (no matter if short or long) change to our docs and the new images required by it. Does that sound like a good plan?
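For reference, the setup being proposed here would look roughly like the following. This is only a sketch: it assumes `git-lfs` is installed, and the `doc/images/` path is illustrative, not necessarily this repo's actual layout.

```shell
git lfs install                       # enable the LFS filter for this clone
git lfs track "doc/images/*.png"      # store matching files as LFS pointers
git add .gitattributes                # the tracking rule is recorded here
git add doc/images/
git commit -m "Add updated screenshots via Git LFS"
```

The `git lfs track` step simply appends a `filter=lfs` rule to `.gitattributes`; from then on, matching files are committed as small pointer files while the binary content lives on the LFS server.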

@CAM-Gerlach
Member Author

CAM-Gerlach commented May 3, 2018

TL;DR: You might know more than I do, but from the information I was able to gather, for this application Git LFS seems like a solution in search of a problem. I know you said I shouldn't use humor, but obligatory xkcd is obligatory.

I think we should use Git LFS in this repo to handle images, given the sheer number of them you want to commit in this PR.

I'm probably just not as experienced with Git LFS as you are, but I'm really not sure I understand the reason, use case, or benefits for doing this. From what I've read and heard about it, Git LFS (Large File Storage) seems designed to handle versioning of very large individual files, up to multiple gigabytes: it works around repo size limitations, allows better versioning support for such files, and avoids inefficiency in git operations with such large individual blobs. It isn't really aimed at large numbers of small files. However, the images here are only 5-100 KB each (averaging ~30 KB, within the same order of magnitude as the docs themselves), all 200 of them together are only 7 MB, and they aren't likely to change anytime soon (at least until Spyder 4, due to the feature freeze), nor to grow much further, since I've already covered pretty much everything I can think of. It's also not as if they make the repo unwieldy; the main Spyder repo is around 10x the size of the images at 70 MB, and even our fairly minimal website is 30 MB.

Furthermore, it adds extra complexity to the system, time and work to set up and maintain, additional things that can go wrong, and added requirements/confusion/sources of error for new contributors. There have also been a number of complaints about Git LFS: the free plan is quite limited, it can apparently even break some repos, and it currently has 215 open issues. On the flip side, I'm not sure I see any of the actual substantial benefits for our use case, given that our files are not at all large, don't take up much space in aggregate, and shouldn't change very frequently at all (and when they do, it would be a complete rewrite). On top of all that, it will suck up even more contributor time to redo this PR, and push back the formal docs release even further, especially with the lengthy review cycle. Maybe you could explain more what you see as the benefits that supersede these downsides?

remove/recreate your local clone (so you can't inadvertently add these images in another PR)

I'm a little confused as to the point of this step too, sorry: if they're still hanging around in my local repo, then they'll still be on my fork that I re-clone the repo from, unless I'm silly enough to just leave them untracked in the working dir for some reason. What am I missing?

a PR that only adds the updated images required by our current docs

If the other images aren't in the repo, and are instead siloed away somewhere on my disk or on Google Drive, anyone other than me who wants to add to, enhance, or fix issues with the docs won't have access to them. That would result in either duplicated effort to take likely less consistent, lower-quality screenshots, or other users' contributions lacking them altogether and having to manually go through me or another bespoke, siloed-off source to get them, when they could easily just be stored right in the repo that needs them, for anyone to use and reference.

Furthermore, from a conceptual point of view, they were all created from the same snapshot in time of Spyder (and the OS, dependencies, etc.), so committing them together at the same time makes that heritage clear, and allows us to keep easier track within git of how much they (and their successors, if we update them as needed) have aged, rather than putting them in piecemeal at different times, dates, and even years. This also allows us to establish the full organization scheme up front, rather than making people guess at how we'd want things organized.

After that, please open new PRs, each with a single (no matter if short or long) change to our docs and the new images required by it.

If you're referring to only updating the currently inserted images in this PR: I actually went back and forth about whether to add a few additional ones directly suggested by the current text (and the full-window screenshots as discussed) in this PR, or save them for either the next PR doing the baseline copyediting, correctness, and consistency pass, or the appropriate following one(s) adding basic low-hanging-fruit content to specific parts of the docs.

Other than what was directly necessary to replace the screenshots properly and address #14 (i.e. removing the non-IPython console doc, as the relevant feature no longer existed for me to screenshot, and making very basic changes to the layout to actually fit the new screenshots), I very strictly kept myself from touching anything else, despite strong temptation to do so. I almost held off adding any new images at all, but ended up adding a few obvious ones as commits at the very end (so they could be rolled back easily if not wanted in this PR), since they were much more within the scope of this one than of any immediately following ones, and would make this a little less of a blocker to follow-up PRs if it had come to that (and it very well might have, if I hadn't happened to have a bevy of final projects and exams during the week and a half I've had to wait).

However, if you want me to roll back the few images I added that didn't replace existing ones (and modify #14 to not include that aspect, creating a new issue just for the new ones), including the overview alternate-layout ones we discussed previously and the couple directly suggested by the existing text (i.e. the last two commits before the most recent one), I can do so very easily, as I planned for that possibility. If so, be aware that at the recent rate of getting things approved and merged (around a week or more per PR), it could be weeks until they get merged, since they'll have to wait behind the baseline copyediting/correctness/consistency pass before I can even really start on them. That pass is necessarily a file-crosscutting one (its changes are tightly linked within a conceptual and practical scope, and within a single issue, but not file-wise), and it is a hard requirement for the proper public release of our docs, as otherwise we're basically promoting heavily outdated and inaccurate information. From there, once that gets merged, things should mostly proceed file by file as needed, with adding the images (and expanding the bullet points) of the overview first up, followed by adding the extra images elsewhere when addressing those pages in turn.

@ccordoba12
Member

From what I've read and heard about it, git LFS (Large File Storage) seems to be designed to handle versioning very large individual files

The thing is this: git-lfs is not only good for handling large files, but for binary files in general, so that they don't become part of git history.

That way, fetching, cloning and pushing to this repo will take much less time, so it'll be easier to work with it.

I'm a little confused as to the point of this step too, sorry—if they're still hanging around on my local repo, then they'll still be on my fork that I reclone the repo from, unless I'm silly enough to just leave them untracked in the working dir for some reason. What am I missing?

The problem is that you could add this PR to git history by mistake, and that would make git-lfs useless.

If the other images aren't in the repo and instead siloed away somewhere on my disk or on Google Drive, anyone else but me who wants to add to, enhance or fix issues with the docs won't have access to them, resulting in either duplicated effort to take likely less consistent/high quality screenshots, or other users' contribution lacking them altogether and having to manually go through me or a different bespoke/siloed off source to get them, when they could easily just be stored right in the repo that needs them for anyone to use and reference.

What I said is that you should add new images only with an associated PR that requires them. Otherwise, they could end up never being used.

If you're referring to only updating the currently inserted images in this PR, I actually went back and forth about whether or not to add a few additional ones directly suggested by the current text (and the full window screenshots as discussed) in this PR, or save it for either the next one doing the baseline copyediting, correctness and consistency pass, or the appropriate following one(s) adding basic low-hanging fruit content to specific parts of the docs.

I meant that, for an initial PR, you should only replace our current (and outdated) images. Then, when new text content is added to our docs, you could add more images (as the new content requires them).

@CAM-Gerlach
Member Author

The thing is this: git-lfs is not only good for handling large files, but for binary files in general, so that they don't become part of git history.

Right, but doesn't this only really matter for such small files if we version and update them a lot, since they only increase repo size when they are modified? Otherwise, they don't grow substantially larger than anything else, and don't need to get passed back and forth often. Even assuming we put in all 200 now and update every single one of them every two years (fairly unlikely, given the amount of work and the lack of need to update all of them every time), after ten years that's only 35 MB total, approximately the size of the spyder-website repo on my disk right now and half the size of the core Spyder repo. Furthermore, any growth is linear at best, while network bandwidth grows essentially exponentially (as does storage, though the total is far below any conceivable size that would be remotely noticeable on that end).
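As a rough sanity check, the growth estimate above works out as follows (using only the approximate figures quoted in this thread, not freshly measured values):

```python
# Back-of-envelope repo growth from storing screenshots in plain git.
full_set_mb = 7                  # all ~200 screenshots together, as quoted
refresh_every_years = 2          # assume every image is replaced this often
horizon_years = 10

# Number of full sets that would accumulate in git history over the horizon
sets_in_history = horizon_years // refresh_every_years   # 5
total_mb = full_set_mb * sets_in_history
print(total_mb)                  # 35 (MB), matching the figure above
```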

That way, fetching, cloning and pushing to this repo will take much less time, so it'll be easier to work with it.

I did some tests to investigate (origin is my fork with the screenshots branch; upstream is the main repo without it). I'm on an emergency backup internet connection running at 3G speeds due to my main one being out, and it is further degraded (by nearly an order of magnitude) at the moment due to a major technical issue, so I only have about 7 Mb/s down and 1 Mb/s up: far slower than the average American, much less the rest of the world (in fact, the upload speed, the most critical for this analysis, is slower than both the average mobile and fixed upload speeds for every single one of the 135 countries listed in the Speedtest Global Index). An average user testing under typical conditions would experience much less impact, especially on pushes, which depend on upstream bandwidth.

Initial push to origin on a new branch based off this branch with screenshots: 8s
Initial push to origin on a new branch based off master, without screenshots: 7s
Initial push on a new branch on the website repo: 5s

Follow-up push on ^ with screenshots: 6s
Follow-up push on ^ without screenshots: 9s
Follow-up push on ^ on website: 4s

Git fetch origin on spyder-docs (no changes): 2s
Git fetch upstream on spyder-docs (no changes): 2.5s
Git fetch upstream on spyder-docs (with changes): 3s
Git fetch upstream on website (no changes): 2s

Git clone on origin: 10s
Git clone on upstream: 5s

Cloning showed the most consistent difference, but it was only a few seconds, only has to be done essentially once per contributor, and in any case would actually be slower with Git LFS, since it would need to hit the separate server for each file and then download the same file anyway. Fetches, relying on downlink, saw essentially no impact, and would be similarly slower if any LFS files changed, while pushes varied from 1-5s faster in this essentially worst case, without any overhead from actual LFS. Furthermore, the tiny fixed time cost (cloning and initial push) would almost certainly be overwhelmed by each contributor figuring out, initializing, and possibly installing Git LFS. On top of that, the files still need to be downloaded and uploaded to and from the LFS server when they are first copied or when they change, and pushes (and fetches, if necessary) need to check with the GitHub LFS server for each file, further increasing latency, especially for larger numbers of relatively small files.

Therefore, even in the absolute best case for Git LFS, making the following best case assumptions (to the point of being unrealistic):

  • It only takes an hour to delete my repo, re-clone it, recreate everything, set up git-lfs, get everything working, commit the files, and redo the many hours of work in this PR (I've already spent more than that just reading about git-lfs and figuring it out, and I can't get that time back)
  • All 200+ files are recommitted as git lfs, whether or not they are used (committing only the currently used files makes the scenario an order of magnitude worse for git lfs)
  • 500 pushes per year, implying thousands of commits
  • Every contributor has internet as slow as my 3G-level backup connection, and internet speed does not increase over time
  • The maximum observed 5 seconds difference per push, every push, without any overhead from git lfs
  • Zero extra time per contributor figuring out, installing, initializing, configuring and fixing any problems with git lfs
  • Zero time needed to download/upload the files to and from git-lfs
  • Zero time inputting any extra commands needed to maintain git-lfs and update the files on it
  • Zero extra time per transaction having to hit the git lfs server as well as the remote
  • Zero time resolving bugs, problems, changes etc with git lfs
  • Zero time responding to any would-be contributors trying to resolve an issue on their end

Under those incredibly favorable assumptions, Git LFS would take around two years to barely break even on time; if even a few of them are substantially less favorable (and many of them could be, by an order of magnitude), we'll both be dead before it does, even assuming we update the files regularly.
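The break-even arithmetic under those assumptions can be sketched like this (all figures are the ones stated in the list above; the result lands in the ballpark of the "around two years" quoted once any real per-push LFS overhead is added back in):

```python
# Time saved per year by Git LFS vs. the one-time cost of redoing the PR,
# per the best-case assumptions listed above.
setup_cost_s = 1 * 3600               # one hour to redo the PR with LFS
pushes_per_year = 500
saving_per_push_s = 5                 # maximum observed difference, every push

saved_per_year_s = pushes_per_year * saving_per_push_s   # 2500 s/year
break_even_years = setup_cost_s / saved_per_year_s
print(round(break_even_years, 2))     # 1.44 years, before any LFS overhead
```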

Furthermore, Git LFS (particularly on Github) has the following significant downsides, aside from any time-related issues mentioned above:

  • Most seriously, users not on GitHub Enterprise cannot push new Git LFS objects (e.g. any new or updated screenshots) to public forks; only users with push access to the main repo can do so, and even then only to the main repo itself, meaning they'd have to create a new branch here
  • 1 GB per month of bandwidth (and 1 GB of total object storage), which counts the clones, fetches/pulls, and pushes of any user, even just to their own fork; probably not an issue in the short term, but it could become one as this scales in number of users and file size, which is the whole use case for Git LFS in the first place
  • Added time, complexity and possible frustration to successfully start contributing to the project, increasing the barrier to entry for new contributors (one of the very issues this was supposed to solve); users who don't have git lfs for various reasons can't meaningfully contribute at all other than small textual changes
  • Over 200 bugs open on the git lfs repo, and many others closed as wontfix.
  • As you mention, easy to make a mistake and commit files accidentally to the main git repo without an easy undo, especially for new contributors
  • Added dependence on the Git LFS server for many operations: if it goes down, is slow, or a bug/attack/user error mangles or removes our files, a lot of things could break
  • Requires a successful connection to the Github LFS server to do any pushing, pulling etc. even locally

Furthermore, though more of a near-term problem, it will certainly push back our initial docs release, possibly by multiple weeks if the present latency for getting PRs approved continues, particularly considering that updating the screenshots was one of the two "highest priority" items you assigned me a few weeks ago.

The problem is you could add (by mistake) this PR to git history and that would make git-lfs useless.

Well, we could still git lfs migrate them, I think, but I suppose a cynical individual like myself could take this as another sign of the fragility and ease of misuse that erode the advantages of Git LFS. Further, my question still remains: how does this make a real difference, since if the files were still in my local repo somewhere, they would still be in my remote when I re-clone from it?

I meant that, for an initial PR, you should only replace our current (and outdated) images. Then, when new text content is added to our docs, you could add more images (as the new content requires them).

Okay, thanks for clarifying; just to be clear there is no new text content whatsoever added in this PR (excepting a small still-relevant existing chunk moved from the removed non-IPython console doc).

However, more to the point, if we don't include the full set of screenshots, all the downsides mentioned above result, and any actual benefits of Git LFS shrink to basically nothing, at least for now; meanwhile, with Git LFS there is essentially not even the small extra push-latency downside to keeping them there for contributors to use when working on the relevant section. Again, I'm not sure I understand the substantial tangible benefits of siloing them away somewhere else, other than saving a few extra seconds a few times per day at most, relative to the numerous downsides mentioned above.

@ccordoba12
Member

I don't have much time for these lengthy discussions. So my final proposal is this: let's not use Git LFS; close this PR and start a new PR that only replaces our current images. Then open new PRs to add more images when new content requires them.

Is that ok for you @CAM-Gerlach?

@ccordoba12
Member

The thing is, it would be easier to review and merge PRs that add new images if we see them in the context of new content being added.

@CAM-Gerlach
Member Author

CAM-Gerlach commented May 6, 2018

So my final proposal is this: let's not use Git LFS, close this PR and start a new PR that only replaces our current images.

Okay, I guess that's acceptable for now. That still leaves unresolved the issue of how to make the rest of the comprehensive, standardized screenshot set easily accessible to collaborators other than myself, so they don't duplicate effort for a less consistent result (or fail to include screenshots entirely when existing assets would help illustrate a concept), and I'll need to document the process of finding, selecting, and adding them; but we can finish hashing that out after we get the initial release out the door, as I don't want to hold things up any longer.

My suggestion here, if we don't want them in the repo, is to recreate the directory hierarchy and upload the screenshots to a subdirectory of the official Google Drive folder that @goanpeca created, which we could share with view/download access for collaborators interested in working on the docs. They'd then need to preview them there, and download and copy them into the corresponding subdirectory of images if they wanted to add them to a content section; we'd need to add information on this process to the Contributing Guide. Relative to just sticking them in the repo, this makes for an overall more tedious and time-intensive workflow and an extra layer of complexity, for a gain of at most a few seconds when pushing, but it's still better than Git LFS in both regards, with less potential for more serious and intractable problems.

@ccordoba12
Member

My suggestion here, if we don't want them in the repo, is to recreate the directory hierarchy and upload the screenshots in a subdirectory of our official Google Drive folder that @goanpeca created

There's no problem with having them here. In fact, I think it's better. I was just worried about that making it difficult to work with the repo. But that doesn't seem to be the case.

@CAM-Gerlach
Member Author

There's no problem with having them here. In fact, I think it's better. I was just worried about that making it difficult to work with the repo. But that doesn't seem to be the case.

Okay, cool beans. So, with that being the case, then, do we want to:

  1. Re-open this one after all and go from there, since that's essentially what this PR does (fastest solution and allows easy adding of images to the visible doc content, which could then be reviewed in context and in smaller chunks, as you requested, in the more narrowly scoped future PRs that actually use them in the text)
  2. Rebase it (here or in a copy of this branch/PR) to only add the images that are actually used with the current content, and then add the rest in a followup PR (modest time needed, and could be deferred until after the initial public release, but it would get messy if we wanted to add any other new images not in this one until after it is merged)
  3. Just do the first part of 2. and not add the images until they are needed (your previous request, which I was just starting to implement, but that would contradict what you just said about wanting to include them in the repo after all as the best option, given the disadvantages I cited)

Let me know which one you'd like; they're arranged in order of preference, but I'll go right ahead with whichever you think best. Just say the word (err, the number). Thanks!

@ccordoba12
Member

Using this PR will mess with git history (due to the large addition and removal of files), so please create a new PR instead.

@ccordoba12
Member

A new PR from scratch, I meant.

@CAM-Gerlach
Member Author

CAM-Gerlach commented May 7, 2018

So which option? Option 2, in a new branch and a new PR based off this branch with the extra files interactively rebased out (i.e. replacing the first commit from scratch with just the needed files, and then replaying the other commits on top of that, so that the others never existed in that branch)? And then another follow-up PR with the rest of the files, which can then be reviewed in detail, as you requested, when they actually get added to user-visible content in individual content-specific PRs?
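Concretely, the rebase being described would look roughly like this (a sketch only: the `replace-current-images` branch name and the `doc/images/unused-*.png` path are illustrative, while `update-screenshots` is the branch this PR is from):

```shell
# Rebuild the branch so its first commit adds only the images the current
# docs actually use; the remaining commits are replayed on top unchanged.
git checkout -b replace-current-images update-screenshots
git rebase -i master              # in the todo list, mark the first commit "edit"
# ...then, at the rebase stop:
git rm --cached doc/images/unused-*.png   # un-stage the extra images
git commit --amend --no-edit
git rebase --continue
```

`git rm --cached` removes the files from the commit without deleting them from the working directory, so they stay on disk for the follow-up PR.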

@CAM-Gerlach
Member Author

CAM-Gerlach commented May 7, 2018

To be honest, I'm still not sure what we really gain from adding the rest of the files in a separate PR, other than extra work rebasing and reorganizing everything, a bit of conceptual separation, and the possibility of deferring that merge a bit longer... but if that's how you want it, I can do it that way.

@ccordoba12
Member

So which option? Option 2, in a new branch and a new PR based off this branch with the extra files interactively rebased out (i.e. replacing the first commit from scratch with just the needed files, and then replaying the other commits on top of that, so that the others never existed in that branch)?

Yes, that's it.

To be honest I'm not still sure what we really gain from adding the rest of the files all in a separate PR

A couple of things:

  1. For us reviewers, we'll see what images are going to be added for new content.
  2. To not leave unused images here. Maybe you'll get busy in the future and won't have time to write all the new content you have planned right now.

@CAM-Gerlach
Member Author

Yes, that's it.

Sorry for the continued confusion, but I think there was a miscommunication somewhere, since both of the valid concerns you list under

A couple of things

are only directly addressed by Option 3, since Option 2 (as noted in both my previous posts) would then add all the rest of the images to the repo in a separate follow-up PR (as mentioned previously, Option 3 is basically just your "final proposal" above). I was going to implement the latter, but then, in reply to my suggestion of where to put the full set of screenshots for others to use since we weren't putting them in the repo, you said,

There's no problem with having them here. In fact, I think it's better.

...which is what options 1 and 2 propose, whereas option 3 (your previous proposal) would essentially require the full set of screenshots to be stored somewhere external and then manually copied into the correct hierarchy whenever another collaborator wants to use them, with the disadvantages that entails, as mentioned in my previous post that you were responding to.

For us reviewers, we'll see what images are going to be added for new content.

True. But won't you see them either way, and in context, when you're reviewing the changes that actually add them?

To not leave unused images here.

Indeed, and it could be some work tracking down which are used or not.

Maybe you get busy in the future, and you don't have time to write all the new content you have planned right now.

This is actually the main reason for adding them to the repo: so they're readily accessible to contributors other than myself. Otherwise, we have to set up a bespoke system to store them (like the Google Drive solution I suggested) and individually invite interested collaborators, which is more of a burden and increases the barrier to entry for potential new contributors.

So, at least as I see it, it basically comes down to a cost-benefit tradeoff between maintaining a separate shared Google Drive folder with the full set of screenshots (plus any more that aren't used right away) versus just including them in the repo. The former keeps the repo itself somewhat faster and cleaner (at least until most of them are added) and makes reviewing potentially a little easier, at the cost of more time and tedium for collaborators both up front and on an ongoing basis, as well as another potential hurdle for new contributors.

@ccordoba12
Member

@CAM-Gerlach, I really can't keep discussing this, sorry. Please don't take this the wrong way; it's just that I see things very simply, but it seems I'm unable to convey to you what I want :-)

So I'll do a PR for you, committing on your behalf, and I'll submit it as soon as I can (a couple of days at most).

@CAM-Gerlach
Member Author

I can just go ahead with what you previously described as your "final proposal" in this comment, as that part was indeed clear and simple, and it seems that's still what you want as a first step. We can worry about where to host the rest of the images afterward; it seems that's where the confusion/misunderstanding came in.

@ccordoba12
Member

Thanks a lot for understanding! I think our purpose for an initial public release is to iterate as quickly as we can until we have something decent to show to the world (like we did with the homepage).

After that, we can discuss the fine details of future additions.

@CAM-Gerlach
Member Author

That's the plan!

@CAM-Gerlach CAM-Gerlach deleted the update-screenshots branch May 11, 2018 04:41