-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runtime error in phrases.py #3031
Comments
OK, I can see the issue. Phrases employs The proper fix will be to get rid of this unwanted mutation – either by replacing @thalishsajeed are you up for it? |
@piskvorky Yep, I vote for replacing defaultdict. If you agree with that, I'll make a PR with the change. (after checking for any other unwanted issues) |
Yes, please go ahead. You can re-use #3030 , no need to start a new PR. |
@piskvorky Is it okay to replace instances of the following code pattern
with
I don't know of any way to do updates in-place in a regular dict. There's also this line - which needs to be modified for similar reasons and making sure generator isn't called twice if you use .get method. Still comfortable moving to dict right? Also I read some SO posts that defaultdict is more performant that dict, wondering if that is still the case and needs to be considered for this change. |
Yes, that's the way to do it! And yes, I'd expect defaultdict to be (slightly) more performant than dict. But having correct, readable code is more important. If we ever optimize Phrases, it'll be by translating its code to Cython / C, for a proper 10x+ boost. Not chasing a few percent here and there with defaultdict. |
@piskvorky Thanks! Well, I'm done with the changes, can you point me to how I can run the unit test suite locally? |
Hmm. @mpenkov will |
I usually do You may need to do something like |
@mpenkov Thanks! Do you have anything for me where I can read regarding the current tests which are passing? Also I need some help with writing test cases, why is the test case called testExportPhrases present but with another function i.e find phrases being called inside.? Do i need to create a separate test case or is there an existing test case for export_phrases that I need to fix. |
Yes, add a new test case please. I don't know why a test called |
Other than the developer wiki, which @piskvorky mentioned above, no, I'm not aware of any unit-test related documentation. Looks like export_phrases got renamed to find_phrases, but the test case wasn't renamed. Renaming the test case should resolve the problem. |
@thalishsajeed did you manage? |
Hi yes, I'll work on this tomorrow and update you :) |
@piskvorky : Soo, just a quick update. The current implementation of analyze_sentence seems to modify the vocabulary (the count is zero naturally because defaultdict) So doing something like -
Will modify the vocabulary and so that bigams.vocab would end up looking like this -
This then seems to be affecting I guess my question is - is this expected behavior? P.S - I hope I was able to explain the issue otherwise , let me know if you want like a more detailed write up of this behaviour. |
Yes, not desirable, as discussed above. Can you replace the |
@piskvorky Yes, I've done that bit. I stumbled upon this while trying to create a test case. My test cases kept failing because the phrase scores were not matching up when I compared the currrent branch to the BugFix branch and it took some time to figure out what was happening. |
Wait, this still happens after getting rid of defaultdict? Something else must be afoot then. |
No no. Let me explain. So after getting rid of default dict I tried to make sure everything else works exactly the same. That's when I stumbled upon this phenomenon where the scores for phrases were different between the develop branch and the bugfix branch. Naturally I assumed that I was making some mistake while getting rid of defaultdict which is why the scores were different. After diving a little deeper in the debug mode I realized the root cause i.e the vocab being changed when calling analyze_sentence in the develop branch so just wanted to be sure that I'm not messing with some expected behavior. |
No at all, you found a nasty hidden bug – well done! |
@piskvorky Hi, what is the workflow for closing this issue? |
The issue will be automatically closed once its corresponding PR gets merged. |
* fix typo * fix test cases for test_export_phrases * add test cases for test_find_phrases * Fix #3031 Runtime error in phrases.py * remove unused variable reference * fix newline to end of file * fix formattingpy * Update CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Michael Penkov <[email protected]>
Problem description
Trying to use export_phrases function on a phrases model.
Instead getting Runtime error
Steps/code/corpus to reproduce
Versions
Please provide the output of:
The text was updated successfully, but these errors were encountered: