-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] 2Vec SaveLoad improvements #2892
Conversation
Everything currently in this is probably a good/safe change, but the initial hope was to knock down some related issues in this PR as well, such as some or all of:
|
Alright. I'll see what I can add in my PR. Please merge my PR gojomo#11 into this PR, to avoid divergence (I already had to resolve some conflicts). |
I know your PR-against-a-PR is trying to help, but your changes would work much better in their own cleanup PR. My intent creating this PR was to early-apply some of the CI machine auto-testing as a convenience against changes I thought safe, but might've had subtle other problems. It was marked I also find it much easier to handle core functionality/coverage issues totally separately from superficial cleanup like docstring-formatting or py2-to-py3 changes - it keeps the diffs focused & easy-to-skim, without mixed-in other changes, and avoids confusion over what might be to blame for any test-result/functionality changes. Also, reviewing-for-integration someone else's merge-from-develop is an inefficient distraction. I could rebase this atop any number of changes on develop quite easily, because I knew exactly the intent of this PR & the couple-dozen lines changed/moved. But your ~38 commits & 65-files-changed merge-from-develop is a giant interleaving of what I was focused on, with a bunch of miscellany: https://github.com/gojomo/gensim/pull/11/files. It's far harder to review for interactions. If it was just preamble to the one-newline-and-two-backticks of gojomo@49b35b7, those can and should come another way. Even if I just blindly trusted the merge, as you've almost certainly done the right conflict-resolutions, it would leave this PR less focused and reviewable for its original purpose. I would prefer to rebase this on |
I understand your preference for "private" PRs, but right now I'd prefer to finish up things that can be finished, in standard "shared" collaborative PRs. To unblock the release. I cannot edit your PR, so please either finish it yourself, or I'll close this and merge my branch directly (hopefully after implementing some of your other suggestions from your checklist above). The gojomo#11 diff only shows commits from |
It's not so much "private" as both "focused" and "amenable to experimentation". And why on earth should a PR with the title "2Vec SaveLoad Improvements" get unrelated py2-to-py3 housekeeping edit mixed in? I don't quite understand the reference to "blocking" 4.0 either. I haven't heard a target release date; I haven't heard a target set of features. I've not noticed any answers to questions I keep asking about whether it's OK to start dropping before-gensim-3.8.3 backward-compatibility. Rather than declaring this little low-priority, nothing-functional-is-broken PR as somehow "critical path" for a release, and sending what are, in my opinion, superficial & unrelated changes into it that I should attend to before proceeding with its original goals, it would have been more efficient just to ignore it until/unless it became ready. Do other touchup/polish elsewhere. If such changes land in Or alternatively, if your review (and/or my report) deemed this code as both ready & urgent for merging - just merge it. Don't complicate it with a PR-on-a-PR process, or defocus with other layers of changes that have to be reconciled-then-reconciled. I'd rather have to catch up to a trunk that's changed, for good reason, only when I think my changes are ready - rather than be put on the spot to both accept side-changes promptly before resuming my own priorites, and still not fall further behind any trunk changes. Efficient batching of attention/effort is really crucial. I'll try merging your PR to see what Github's interfaces do, but if it winds up hiding my ability to see just the narrow intended changes of this branch/PR, I'm going to roll-back to my original plan. |
b5794ee
to
02ebe60
Compare
Tried the Github default 'squash & merge' (as the option to 'retain all commits' was "not enabled for this repository", & 'rebase & merge' seemed worse). Alas, that left this PR/branch appearing to have changed all 65 files with all the changes of the ~40 commits in Github's 'files changed' interface -- which would make this PR/branch useless. Github's 'revert' button appeared to create only an additive undo PR. So I force-pushed my working branch ( So I suggest you ignore the gojomo#11 to here (#2892) to Regarding the other potential-next-steps items listed at #2892 (comment) - I wasn't in-progress or imminently expecting to address any of them, so feel free to tackle any of them. (I suspect the |
Github's "squash" = rebase, AFAIU. Anything but plain merge is bound to create trouble on collaborative PRs. The outstanding tickets & PRs for 4.0.0 are marked in the 4.0.0 milestone. That's what I'm working off of – @gojomo @mpenkov if you see anything else to be included in 4.0.0, please add it there. |
Continued in #2939, closing here. |
The 2Vec classes were doing serialization fixups in save()/load() that should really, by
utils.SaveLoad
design, be in_save_specials()
/_load_specials()
- to better handle cases when oneSaveLoad
contains another.Also looks like there may be no existing functional tests of loading of
gensim-3.8.3
FastText
model, so it's possible that functionality is broken-with-an-unhelpful-failure. If so, I'll add a test & fix for that here, if possible.(This builds atop #2891, which can & should be applied asap.)