Implement hacky variant of `jj purge` #4171

0xdeafbeef · 2024-07-28T10:38:42Z

It's a hacky variant to retrieve a list of files that are too large to be snapshotted, but I haven't found a better way to do it. If you think it's acceptable, I can further polish it. Testing it on #3616 (comment) gives this result:

The following files are too large to be added to the working copy:
  /tmp/tmp.CisahnuEM7/rand_10m
Removed 1 files totaling 10.0MiB

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

yuja · 2024-07-28T13:57:06Z

If we plan to turn "files too large" error into warning (either by default, or by command-line flag), it makes sense that snapshot() collects a list of large files. Maybe snapshot() can return SnapshotStats { Vec<NewFileTooLarge> .. }, and the caller will translate it to an error or warning?

Regarding the UI, I don't think jj purge should be the command to purge large files by default. Maybe this behavior should be behind a flag like jj purge --untracked-large-files.

Regarding purge() API, it might have be a method of LockedWorkingCopy, and cmd_purge() might call it like this:

let stat = locked_wc.snapshot();
let matcher = if args.untracked_large_files {
    FilesMatcher::new(stat.files_too_large);
} else {
    EverythingMatcher // all untracked files will be purged?
};
locked_wc.purge(&matcher);

0xdeafbeef · 2024-08-18T14:42:24Z

If we plan to turn "files too large" error into warning (either by default, or by command-line flag), it makes sense that snapshot() collects a list of large files. Maybe snapshot() can return SnapshotStats { Vec<NewFileTooLarge> .. }, and the caller will translate it to an error or warning?

Regarding the UI, I don't think jj purge should be the command to purge large files by default. Maybe this behavior should be behind a flag like jj purge --untracked-large-files.

Regarding purge() API, it might have be a method of LockedWorkingCopy, and cmd_purge() might call it like this:
let stat = locked_wc.snapshot();
let matcher = if args.untracked_large_files {
    FilesMatcher::new(stat.files_too_large);
} else {
    EverythingMatcher // all untracked files will be purged?
};
locked_wc.purge(&matcher);

Snapshot of LockedWorkingCopy returns MergedTreeId, so it's unclear how to add stats to it.
https://github.com/martinvonz/jj/blob/aa0fbd9f3f78ab23e35ce64a5af1a657c9c7476f/lib/src/working_copy.rs#L99

I can change it to

struct Snapshotresults{
    tree_id: MergedTreeId,
    stats: SnapshotStats
}

with deref() to MergedTreeId

snapshot() is used in not so many places, so it shouldn't break code

martinvonz · 2024-08-18T16:16:42Z

Snapshot of LockedWorkingCopy returns MergedTreeId, so it's unclear how to add stats to it.

https://github.com/martinvonz/jj/blob/aa0fbd9f3f78ab23e35ce64a5af1a657c9c7476f/lib/src/working_copy.rs#L99

I can change it to
struct Snapshotresults{
    tree_id: MergedTreeId,
    stats: SnapshotStats
}

Sounds good.

with deref() to MergedTreeId

I think it's better to have the callers explicitly get the tree id and the stats. See https://doc.rust-lang.org/std/ops/trait.Deref.html#when-to-implement-deref-or-derefmut

0xdeafbeef · 2024-08-18T17:50:24Z

@yuja @martinvonz I'm not sure what to do when we can't snapshot files.
I see such flow:

jj status or any other command fails to snapshot →
hint to user that we can't snapshot files, so he can either delete or tweak snapshot.max-new-file-size →
proceed with command.

Also, maybe jj purge should be left in util? Because I can't find any other use case for it than removing files that failed to snapshot. I'll add purge to the backend later, also not so sure if it should be here, because file removal doesn't have anything in common with the backend. And if we can't snapshot a file, it means the backend doesn't know about it :)

UPD. I forgot about deleting ignored files. But it seems like backend wouldn't know about them anyway

martinvonz · 2024-08-18T20:16:54Z

@yuja @martinvonz I'm not sure what to do when we can't snapshot files. I see such flow:

jj status or any other command fails to snapshot → hint to user that we can't snapshot files, so he can either delete or tweak snapshot.max-new-file-size → proceed with command.

Yes, printing a warning and a hint and then proceeding sounds good to me. However, I think there's some risk of losing data then. In particular, I think we would lose the file in this scenario:

Add a small file in commit X
Modify the small file in commit Y
Make the file large in the working copy
jj new X

We are careful not to overwrite files when updating between commits, but only if the file was added. If the file existed in the old commit (i.e. the working-copy commit on top of Y in this case, which was snapshotted at the start of jj new X), we don't check that it has not changed.

I suppose we could be more careful in this case and not update files if their mtime etc don't match our expectation.

Oh, I suppose we can also fix it for the large-files case by making the snapshotting remove files from the snapshotted tree (but not from the working copy) if they are too large. I think that would make logical sense too.

Also, maybe jj purge should be left in util? Because I can't find any other use case for it than removing files that failed to snapshot. I'll add purge to the backend later, also not so sure if it should be here, because file removal doesn't have anything in common with the backend. And if we can't snapshot a file, it means the backend doesn't know about it :)

UPD. I forgot about deleting ignored files. But it seems like backend wouldn't know about them anyway

Maybe that will need to be implemented by a different function on the LockedWorkingCopy trait, or maybe it would be a flag in SnapshotOptions. It would probably involve a similar process as snapshotting. Always returning a list of all ignored files is probably not an option because there can be very many, and the information would usually be ignored.

yuja · 2024-08-19T02:46:08Z

Yes, printing a warning and a hint and then proceeding sounds good to me. However, I think there's some risk of losing data then. In particular, I think we would lose the file in this scenario:

Add a small file in commit X

Modify the small file in commit Y

Make the file large in the working copy

jj new X

If the file has already been tracked, I think it will be snapshotted at 4.

It might be good to disable working-copy mutation if snapshot emitted a warning.

0xdeafbeef requested a review from yuja July 28, 2024 10:39

0xdeafbeef force-pushed the 0xdeafbeef/push-rwlyqszrwzqs branch from 1895759 to 3154e5e Compare August 10, 2024 20:46

Implement hacky variant of jj purge

4f861bd

0xdeafbeef force-pushed the 0xdeafbeef/push-rwlyqszrwzqs branch from 3154e5e to 4f861bd Compare August 18, 2024 17:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement hacky variant of `jj purge` #4171

Implement hacky variant of `jj purge` #4171

0xdeafbeef commented Jul 28, 2024

yuja commented Jul 28, 2024

0xdeafbeef commented Aug 18, 2024

martinvonz commented Aug 18, 2024

0xdeafbeef commented Aug 18, 2024 •

edited

Loading

martinvonz commented Aug 18, 2024

yuja commented Aug 19, 2024

Implement hacky variant of jj purge #4171

Are you sure you want to change the base?

Implement hacky variant of jj purge #4171

Conversation

0xdeafbeef commented Jul 28, 2024

Checklist

yuja commented Jul 28, 2024

0xdeafbeef commented Aug 18, 2024

martinvonz commented Aug 18, 2024

0xdeafbeef commented Aug 18, 2024 • edited Loading

martinvonz commented Aug 18, 2024

yuja commented Aug 19, 2024

Implement hacky variant of `jj purge` #4171

Implement hacky variant of `jj purge` #4171

0xdeafbeef commented Aug 18, 2024 •

edited

Loading