How to sort out extension showing huge amount of "pending changes" #5861
Comments
Have you tried running |
Hi Matt, thanks for looking at this. To be fair, I had not done a pull just before starting it, so I did one and got the number down to 108K. My git status shows nothing, so this is all status added by DVC. You are raising a good point though: some of these pipelines are maintained only by some teams, and ideally other teams don't have to pull the data from all other pipelines if they don't need it. Cheers, |
Also, shouldn't these all be ignored since I specified a specific folder through |
I would guess that there are duplicates between uncommitted and not in cache as there is an unknown uncommitted status that things fall into when they are not in the cache (I forget the history on this), you can search the DVC issues if you want to know more. For focused projects, these relate to - "A subset of paths to the workspace's available DVC projects. Using this option will override project auto-discovery." I am guessing that you've set this to a pipeline instead of a project, so it failed validation and bypassed the option. Try using the "Select Project(s) to Focus (set dvc.focusedProjects)" quick pick to select a project to focus. This should give a list of valid options. |
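To make the project versus pipeline distinction concrete, here is a minimal sketch of how one might check candidate paths before putting them in `dvc.focusedProjects`. It assumes only that a DVC project root is a directory containing a `.dvc` folder (created by `dvc init`), while a pipeline folder merely contains a `dvc.yaml`. The function names are hypothetical and this is not the extension's actual validation code.

```python
from pathlib import Path
from typing import Dict, List


def is_dvc_project_root(path: Path) -> bool:
    """True when `path` contains a `.dvc` directory, i.e. `dvc init` was run there."""
    return (path / ".dvc").is_dir()


def is_pipeline_dir(path: Path) -> bool:
    """True when `path` contains a `dvc.yaml` pipeline definition."""
    return (path / "dvc.yaml").is_file()


def classify(workspace: Path, candidates: List[str]) -> Dict[str, str]:
    """Label each workspace-relative path as a project, a pipeline folder, or neither."""
    labels = {}
    for rel in candidates:
        target = workspace / rel
        if is_dvc_project_root(target):
            labels[rel] = "project"
        elif is_pipeline_dir(target):
            labels[rel] = "pipeline only"
        else:
            labels[rel] = "neither"
    return labels
```

Calling `classify(Path("."), ["project_1/pipeline_1"])` from the workspace root would report "pipeline only" for a folder that has a `dvc.yaml` but no `.dvc` directory, which matches the situation guessed at above.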
@dibus2 thank you for reporting this! |
@dibus2 thanks for creating the issue! a few questions:
|
For instance in project_1/data/
Now most of these are usually not being pulled because they are not used anymore or not yet and maybe that's a problem to keep the .dvc files around? |
I actually do not find the quick pick to select the project option @mattseddon |
@dibus2 sorry, just to clarify. I specifically mean
yes, if you don't need them - better to drop them I guess (you can always recover them from the Git history - that's the beauty of DVC). |
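Before dropping anything, it can help to see where the `.dvc` files are concentrated. The following is a rough sketch that counts `*.dvc` files per top-level folder; the helper name `count_dvc_files` is hypothetical, and skipping the internal `.dvc` and `.git` directories is an assumption about what is worth counting.

```python
from collections import Counter
from pathlib import Path


def count_dvc_files(repo_root: Path) -> Counter:
    """Count tracked `*.dvc` files, grouped by top-level folder of the repo."""
    counts = Counter()
    for path in repo_root.rglob("*.dvc"):
        if not path.is_file():
            continue  # the `.dvc` directory itself also matches the pattern
        rel = path.relative_to(repo_root)
        parents = rel.parts[:-1]
        if ".dvc" in parents or ".git" in parents:
            continue  # ignore DVC and Git internals
        top = parents[0] if parents else "."
        counts[top] += 1
    return counts


if __name__ == "__main__":
    for folder, n in count_dvc_files(Path(".")).most_common():
        print(f"{folder}: {n}")
```

Run from the repository root, this prints the folders holding the most `.dvc` files first, which gives a starting point for deciding what to remove.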
ah sorry, yes I have only one in the top folder @shcheklein |
so, if the projects are independent - can we consider making them subprojects? I think that might help DVC, the extension, and Studio a lot (each DVC command won't be analyzing all the existing pipelines, it can "focus" on a single one at a time). |
actually let me explore that option. How do I do that? |
I think you can just try to do |
Hi @shcheklein, However, I did clean up the repo, removing all the .dvc files that were not needed in the current commit, and I got it down to only 14 pending files, which I think can be ignored. However, I still can't do anything with the extension. It's getting stuck into I'm not sure None of the commands are responding. I did notice that the data dvc status takes about 40 seconds, and the extension seems to run it quite often, so I wonder if it's just spinning its wheels on this? |
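To put a number on how long the status commands take outside the extension, a small timing sketch like the one below could be used. The helper `time_command` is hypothetical, and the assumption is that `dvc` is on the PATH and the script is run from the repository root; it measures wall-clock time only and says nothing about how often the extension invokes the command.

```python
import subprocess
import time
from typing import List, Optional


def time_command(cmd: List[str], cwd: Optional[str] = None) -> float:
    """Run `cmd` once, discard its output, and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit code is still a valid timing sample
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: both commands are available in the installed DVC version.
    for cmd in (["dvc", "status"], ["dvc", "data", "status"]):
        try:
            print(" ".join(cmd), f"{time_command(cmd):.1f}s")
        except FileNotFoundError:
            print("dvc was not found on PATH")
```

If a single run takes tens of seconds in the terminal as well, the slowness is likely in DVC itself rather than in the extension.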
I guess util files are fine (Python files?) Datasets - it depends a bit. It's not usually a problem in DVC to duplicate it (if cache is shared and you use symlinks / reflinks / hardlinks - there will be no impact on space or anything). I see that plots diff also takes quite a long time. I suspect that even collecting a full dag is probably an expensive operation - could you run I see also that even |
So @shcheklein, regarding the time to run dvc status, see the screenshot. Regarding the git log, I'm not sure where you see it in the log; it doesn't take any time in the CLI. Regarding the plots/ I noticed that we are actually tracking output folders in a lot of the pipelines / and that led to not being able to show them through the show plots in the extension (at least I assume so, because it says there are no plots to show) |
I suspect that processing the data for these plots is blocking the extension host thread. We could add something like a focus pipeline option but that would be at least a couple of days of work. |
@mattseddon do we run Since It seems it just takes time to get all the DVC files together. How many |
When starting the VS Code extension, it shows 110K Pending Files.
The application is extremely laggy, to the point where it makes using the EC2 machine impossible.
The DVC section in the Git tab shows nothing (I suspect because there are too many files).
The repo is set up as follows:
I also tried to restrict the DVC extension to a single pipeline, but it doesn't seem to help. This is what I have in my settings.json:
"dvc.focusedProjects": [
  "project_1/pipeline_1"
]
Note that pipeline_1 folder contains the dvc.yaml file.
Any help on sorting this out is appreciated.
Thanks.