Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement hacky variant of jj purge #4171

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

0xdeafbeef
Copy link
Member

It's a hacky variant to retrieve a list of files that are too large to be snapshotted, but I haven't found a better way to do it. If you think it's acceptable, I can further polish it. Testing it on #3616 (comment) gives this result:

The following files are too large to be added to the working copy:
  /tmp/tmp.CisahnuEM7/rand_10m
Removed 1 files totaling 10.0MiB

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@0xdeafbeef 0xdeafbeef requested a review from yuja July 28, 2024 10:39
@yuja
Copy link
Contributor

yuja commented Jul 28, 2024

If we plan to turn "files too large" error into warning (either by default, or by command-line flag), it makes sense that snapshot() collects a list of large files. Maybe snapshot() can return SnapshotStats { Vec<NewFileTooLarge> .. }, and the caller will translate it to an error or warning?

Regarding the UI, I don't think jj purge should be the command to purge large files by default. Maybe this behavior should be behind a flag like jj purge --untracked-large-files.

Regarding purge() API, it might have be a method of LockedWorkingCopy, and cmd_purge() might call it like this:

let stat = locked_wc.snapshot();
let matcher = if args.untracked_large_files {
    FilesMatcher::new(stat.files_too_large);
} else {
    EverythingMatcher // all untracked files will be purged?
};
locked_wc.purge(&matcher);

@0xdeafbeef 0xdeafbeef force-pushed the 0xdeafbeef/push-rwlyqszrwzqs branch from 1895759 to 3154e5e Compare August 10, 2024 20:46
@0xdeafbeef
Copy link
Member Author

If we plan to turn "files too large" error into warning (either by default, or by command-line flag), it makes sense that snapshot() collects a list of large files. Maybe snapshot() can return SnapshotStats { Vec<NewFileTooLarge> .. }, and the caller will translate it to an error or warning?

Regarding the UI, I don't think jj purge should be the command to purge large files by default. Maybe this behavior should be behind a flag like jj purge --untracked-large-files.

Regarding purge() API, it might have be a method of LockedWorkingCopy, and cmd_purge() might call it like this:

let stat = locked_wc.snapshot();
let matcher = if args.untracked_large_files {
    FilesMatcher::new(stat.files_too_large);
} else {
    EverythingMatcher // all untracked files will be purged?
};
locked_wc.purge(&matcher);

Snapshot of LockedWorkingCopy returns MergedTreeId, so it's unclear how to add stats to it.
https://github.com/martinvonz/jj/blob/aa0fbd9f3f78ab23e35ce64a5af1a657c9c7476f/lib/src/working_copy.rs#L99

I can change it to

struct Snapshotresults{
    tree_id: MergedTreeId,
    stats: SnapshotStats
}

with deref() to MergedTreeId

snapshot() is used in not so many places, so it shouldn't break code

@martinvonz
Copy link
Member

Snapshot of LockedWorkingCopy returns MergedTreeId, so it's unclear how to add stats to it.

https://github.com/martinvonz/jj/blob/aa0fbd9f3f78ab23e35ce64a5af1a657c9c7476f/lib/src/working_copy.rs#L99

I can change it to

struct Snapshotresults{
    tree_id: MergedTreeId,
    stats: SnapshotStats
}

Sounds good.

with deref() to MergedTreeId

I think it's better to have the callers explicitly get the tree id and the stats. See https://doc.rust-lang.org/std/ops/trait.Deref.html#when-to-implement-deref-or-derefmut

@0xdeafbeef 0xdeafbeef force-pushed the 0xdeafbeef/push-rwlyqszrwzqs branch from 3154e5e to 4f861bd Compare August 18, 2024 17:23
@0xdeafbeef
Copy link
Member Author

0xdeafbeef commented Aug 18, 2024

@yuja @martinvonz I'm not sure what to do when we can't snapshot files.
I see such flow:

jj status or any other command fails to snapshot →
hint to user that we can't snapshot files, so he can either delete or tweak snapshot.max-new-file-size
proceed with command.

Also, maybe jj purge should be left in util? Because I can't find any other use case for it than removing files that failed to snapshot. I'll add purge to the backend later, also not so sure if it should be here, because file removal doesn't have anything in common with the backend. And if we can't snapshot a file, it means the backend doesn't know about it :)

UPD. I forgot about deleting ignored files. But it seems like backend wouldn't know about them anyway

@martinvonz
Copy link
Member

@yuja @martinvonz I'm not sure what to do when we can't snapshot files. I see such flow:

jj status or any other command fails to snapshot → hint to user that we can't snapshot files, so he can either delete or tweak snapshot.max-new-file-size → proceed with command.

Yes, printing a warning and a hint and then proceeding sounds good to me. However, I think there's some risk of losing data then. In particular, I think we would lose the file in this scenario:

  1. Add a small file in commit X
  2. Modify the small file in commit Y
  3. Make the file large in the working copy
  4. jj new X

We are careful not to overwrite files when updating between commits, but only if the file was added. If the file existed in the old commit (i.e. the working-copy commit on top of Y in this case, which was snapshotted at the start of jj new X), we don't check that it has not changed.

I suppose we could be more careful in this case and not update files if their mtime etc don't match our expectation.

Oh, I suppose we can also fix it for the large-files case by making the snapshotting remove files from the snapshotted tree (but not from the working copy) if they are too large. I think that would make logical sense too.

Also, maybe jj purge should be left in util? Because I can't find any other use case for it than removing files that failed to snapshot. I'll add purge to the backend later, also not so sure if it should be here, because file removal doesn't have anything in common with the backend. And if we can't snapshot a file, it means the backend doesn't know about it :)

UPD. I forgot about deleting ignored files. But it seems like backend wouldn't know about them anyway

Maybe that will need to be implemented by a different function on the LockedWorkingCopy trait, or maybe it would be a flag in SnapshotOptions. It would probably involve a similar process as snapshotting. Always returning a list of all ignored files is probably not an option because there can be very many, and the information would usually be ignored.

@yuja
Copy link
Contributor

yuja commented Aug 19, 2024

Yes, printing a warning and a hint and then proceeding sounds good to me. However, I think there's some risk of losing data then. In particular, I think we would lose the file in this scenario:

  1. Add a small file in commit X
  2. Modify the small file in commit Y
  3. Make the file large in the working copy
  4. jj new X

If the file has already been tracked, I think it will be snapshotted at 4.

It might be good to disable working-copy mutation if snapshot emitted a warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants