In telemetry data, hash all imports, not just known ones, so that new packages can be found in the future #3555
Comments
For #4852

- [x] Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
- [x] Title summarizes what is changing
- [x] Has a [news entry](https://github.com/Microsoft/vscode-python/tree/master/news) file (remember to thank yourself!)
- [ ] Has sufficient logging.
- [x] Has telemetry for enhancements.
- [x] Unit tests & system/integration tests are added/updated
- [ ] [Test plan](https://github.com/Microsoft/vscode-python/blob/master/.github/test_plan.md) is updated as appropriate
- [ ] [`package-lock.json`](https://github.com/Microsoft/vscode-python/blob/master/package-lock.json) has been regenerated by running `npm install` (if dependencies have changed)
- [ ] The wiki is updated with any design decisions/details.
Some thoughts on perf:
Proposal
Impact:
https://github.com/Microsoft/vscode-python/blob/master/news/3%20Code%20Health/4852.md Is this correct, can we …
There's no unhashing; it's comparing against known hashes, like the unit test does.
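For concreteness, here's a minimal sketch of what "comparing against known hashes" could look like; the names (`hashName`, `KNOWN_PACKAGE_HASHES`) and the package list are hypothetical, not the extension's actual code:

```typescript
import { createHash } from 'crypto';

function hashName(name: string): string {
    return createHash('sha256').update(name).digest('hex');
}

// Hypothetical table: hash of a known package name -> the readable name.
const KNOWN_PACKAGE_HASHES = new Map<string, string>(
    ['numpy', 'pandas', 'jupyter'].map(name => [hashName(name), name] as [string, string])
);

// Telemetry only ever carries the hash; a unit test (or offline analysis)
// recovers the package by comparing against hashes of known names - no unhashing.
function lookupKnownPackage(importHash: string): string | undefined {
    return KNOWN_PACKAGE_HASHES.get(importHash);
}
```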
We should measure the perf impact of this before we go off and design something more complicated. It only looks at the first 1K lines.
Text is already in memory as far as I can tell, so we don't add any extra overhead by reading it. Spawning a Python process may take more CPU time than applying a regex to 1000 lines of text. I'll measure it to find out, though.
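A sketch of the in-process approach under discussion, assuming the document text is already in memory; the regex and the 1000-line cap are illustrative, not the extension's exact implementation:

```typescript
// Illustrative regex: grabs the top-level module name from `import X` / `from X import ...`.
const IMPORT_RE = /^\s*(?:import|from)\s+(\w+)/;

function extractImports(text: string, maxLines = 1000): string[] {
    const found = new Set<string>();
    // The document text is already in memory; only the first `maxLines` lines are scanned.
    for (const line of text.split(/\r?\n/, maxLines)) {
        const match = IMPORT_RE.exec(line);
        if (match) {
            found.add(match[1]);
        }
    }
    return Array.from(found);
}
```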
@DonJayamanne debouncing sounds like a good idea. No need to queue up a ton of these if the file is just being saved repeatedly. I'll add that too while I look at the perf.
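A hedged sketch of the debounce idea, so a burst of saves collapses into a single scan; the per-file map, function name, and 5-second window are assumptions, not the extension's actual values:

```typescript
// One pending timer per file; saves within the window reset it, so repeated
// saves result in a single import scan instead of a queue of them.
const pendingScans = new Map<string, ReturnType<typeof setTimeout>>();
const DEBOUNCE_MS = 5000; // assumed window, not necessarily the extension's value

function scheduleImportScan(filePath: string, scan: () => void): void {
    const pending = pendingScans.get(filePath);
    if (pending) {
        clearTimeout(pending);
    }
    pendingScans.set(
        filePath,
        setTimeout(() => {
            pendingScans.delete(filePath);
            scan();
        }, DEBOUNCE_MS)
    );
}
```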
You might want to use the …
The key difference is that it's out of proc, i.e. the extension process's time isn't used. Chewing up extension process time is where we need to be careful.
I didn't mean the time to run the Python code; I meant simply the time to call proc.spawn (and hook up the stdio handlers and read from them). Iterating over 1000 lines is really damn fast, maybe faster than the amount of code that runs when calling proc.spawn. I'm profiling now, and for a single save of a 3000K-line file the profiler doesn't even register the time for the import. I have to save 3 or 4 times to get anything to show up for the import in the profile. Our code lens parsing is a much bigger part of on-save. And this is with debug bits; minimized bits are likely even faster, but it's hard to tell what the CPU time is from.
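For anyone wanting to reproduce this kind of comparison, a rough sketch of how the two costs could be timed; this is illustrative scaffolding, not the profiling that was actually run:

```typescript
import { performance } from 'perf_hooks';
import { spawn } from 'child_process';

// Synchronous cost of calling spawn and hooking up the stdio handlers - the
// overhead being compared against the in-process scan, not the Python runtime.
function timeSpawnCall(): number {
    const start = performance.now();
    const proc = spawn('python', ['-c', 'pass']);
    proc.stdout.on('data', () => { /* reader hooked up; output ignored here */ });
    proc.stderr.on('data', () => { /* reader hooked up; output ignored here */ });
    return performance.now() - start;
}

// Cost of doing the work in-process (e.g. the regex scan sketched above).
function timeInProcess(work: () => void): number {
    const start = performance.now();
    work();
    return performance.now() - start;
}
```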
Here's a detailed analysis (of debug bits). Essentially, most of the time doing this work (on 1000 lines) is spent generating the hashes, and it's not much time, just 5 ms. We could do the hash generation in the Python code, but it doesn't seem worth it. Reparsing code lenses averages around 24 ms per save on this same file, and it seems there's already a Python spawn in the profile; spawn itself takes 9 ms. So this is actually faster. I'll do the debounce though.
Right now we return the package name for an import only if it comes from a known list. That doesn't work if we want to discover new packages in the future. Instead, this should report a hash of the package name.
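A minimal sketch of the change being asked for, assuming SHA-256 via Node's `crypto` module; the function name is illustrative rather than the extension's real telemetry API:

```typescript
import { createHash } from 'crypto';

// Instead of filtering imports down to a known list, hash every import name
// and send the hashes. Unknown packages can then be discovered later by
// hashing candidate names offline and matching them against the telemetry.
function hashImportNames(importNames: string[]): string[] {
    return importNames.map(name => createHash('sha256').update(name).digest('hex'));
}
```

Since the hash is one-way, raw package names never need to leave the user's machine; analysis hashes a candidate list of package names offline and joins on the hash, which is also how new packages can be identified later.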