Binary snapshots #489
This fixes the trailing_crlf_inline test.
I like this in theory. What might make this whole thing more robust would be to say that the following has to be true:
It would be interesting to also add a |
I've made some adjustments: As you suggested I've changed it so the contents are first written to a temporary file with the At the start of the I had to change the This problem could also be solved in different ways: Also the Maybe I can add the quick-look thing for macos soon, but I'll have to borrow a mac from a colleague to test it. Also I'm not sure if opening both files in quick-look makes sense. Maybe two different shortcuts makes more sense. |
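The write-to-a-temporary-file-then-move idea mentioned in this thread can be sketched roughly as follows. This is only an illustration using the standard library: `tmp_path_for` and `write_then_rename` are hypothetical names, not functions from insta, and unlike the closure in this PR the callback here returns an `io::Result` so that errors can be propagated.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: the temporary sibling of `final_path`,
/// formed by appending `.tmp` to the full file name.
fn tmp_path_for(final_path: &Path) -> PathBuf {
    let mut os = final_path.as_os_str().to_owned();
    os.push(".tmp");
    PathBuf::from(os)
}

/// Hypothetical helper: lets `write` fill a temporary file and only then
/// renames it to `final_path`, so a crash in the middle of writing never
/// leaves a half-written file under the final name.
fn write_then_rename(
    final_path: &Path,
    write: &mut dyn FnMut(&mut File) -> io::Result<()>,
) -> io::Result<()> {
    let tmp = tmp_path_for(final_path);
    let mut file = File::create(&tmp)?;
    let result = write(&mut file).and_then(|_| file.flush());
    // Close the handle before renaming or removing the file.
    drop(file);
    if let Err(err) = result {
        // Best-effort cleanup; the original error is the one worth reporting.
        let _ = fs::remove_file(&tmp);
        return Err(err);
    }
    fs::rename(&tmp, final_path)
}
```

A caller would pass something like `&mut |f: &mut File| f.write_all(bytes)`. The rename is only cheap and atomic when both paths are on the same filesystem, which holds here because the temporary file sits next to the final one.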
insta/src/snapshot.rs
Outdated
let mut this = File::open(this).unwrap();
let mut other = File::open(other).unwrap();

file_eq(&mut this, &mut other).unwrap()
Historically we've had issues when running multiple tests at the same time; files can disappear under insta. Can we handle that a bit better here rather than blindly panicking? For example, at least with `.expect` with a message so we can debug?
I've changed it to return `Result`s with `Box<dyn Error>` to be more consistent with the rest of the code. It's not quite optimal for debugging because the origin of the error is missing, but I thought it would be better to be consistent.
Variants in a new error enum might be an alternative. Those could then also be returned as `dyn Error`s. Or just using `expect` as you suggested. I'm not quite sure what is best here.
Yes agree there are tradeoffs, and just ignoring the errors might not be ideal.
A more detailed approach would be to think through when errors are really errors. For example, if a pending snapshot goes missing, that's fine — another process might have removed it. In contrast, if a target snapshot isn't available that's probably unexpected.
But so far ignoring errors with reading files seems to have been OK in practice; possibly we could print a message to protect against confusing results (though maybe others will disagree, ofc welcome).
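One way to make the distinction above concrete is sketched below, assuming the files are small enough to read into memory. `Comparison`, `compare_snapshot_files` and `with_path` are hypothetical names for illustration, not part of insta. A missing pending file is reported as a normal outcome, while any other failure (including a missing target) is returned as an error that names the offending path, which addresses the "origin of the error is missing" concern.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical result of comparing a target snapshot with a pending one.
#[derive(Debug, PartialEq, Eq)]
enum Comparison {
    Equal,
    Different,
    /// The pending file vanished, e.g. another process removed it.
    PendingMissing,
}

/// Wraps an I/O error so that its message names the file it came from.
fn with_path(err: io::Error, path: &Path) -> io::Error {
    io::Error::new(err.kind(), format!("{}: {}", path.display(), err))
}

/// Compares two files byte for byte, tolerating a missing pending file
/// but treating every other failure as a real error.
fn compare_snapshot_files(target: &Path, pending: &Path) -> io::Result<Comparison> {
    let pending_bytes = match fs::read(pending) {
        Ok(bytes) => bytes,
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            return Ok(Comparison::PendingMissing);
        }
        Err(err) => return Err(with_path(err, pending)),
    };
    let target_bytes = fs::read(target).map_err(|err| with_path(err, target))?;
    Ok(if target_bytes == pending_bytes {
        Comparison::Equal
    } else {
        Comparison::Different
    })
}
```

Whether `PendingMissing` should be silent or print a message is the open question from the comment above; the sketch only makes the case distinguishable for the caller.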
Also fixes missing call to memoize_snapshot_file in the binary case.
While we wait for a more informed review, one thought (not confident): I notice we add new enums — would there be any case for collapsing "Inline | Named" enums and the new "String | Binary" enums into a single enum? Every Binary snapshot must be Named. But I'm not sure if that makes the structure simpler... Edit: one low-hanging fruit as an example — the proposed code has |
I suppose the

type SnapshotName<'a> = Option<Cow<'a, str>>;

pub enum SnapshotValue<'a> {
    String {
        name: SnapshotName<'a>,
        content: &'a str,
    },
    InlineString {
        reference_content: &'a str,
        content: &'a str,
    },
    Binary {
        name: SnapshotName<'a>,
        write: &'a mut dyn FnMut(&mut File),
        extension: &'a str,
    },
}

or maybe

type SnapshotName<'a> = Option<Cow<'a, str>>;

pub enum ReferenceValue<'a> {
    Named(SnapshotName<'a>),
    Inline(&'a str),
}

pub enum SnapshotValue<'a> {
    String {
        refval: ReferenceValue<'a>,
        content: &'a str,
    },
    Binary {
        name: SnapshotName<'a>,
        write: &'a mut dyn FnMut(&mut File),
        extension: &'a str,
    },
}

I'm a bit less sure about |
Yes those both seem reasonable, though on reflection not obvious changes either. (I note we're handling the contents differently between named string and named binary snapshots — string snapshots we hold the value in memory while binary snapshots we hold only the path, which leads to some of the split. Maybe that's because we expect the binary snapshots to be larger?)
(edit) — I tried creating a |
Yes, the reason for storing them directly in files is the size. I was concerned about large files, especially when multiple tests are running concurrently. But this is definitely a tradeoff since passing around a |
Yeah. And IIUC we are storing them in memory when writing them at the moment. (+ agree given they're in git, the likelihood of multi-GB values seems very remote...) |
@lasernoises I guess we're still waiting on a review. Not a verdict, but on further reflection, I do think simplifying the code to load snapshots into memory would make sense. The prospect of multi-GB sized snapshots seems quite unlikely — they're stored in git, the computational cost of diffing snapshots of that size will likely bind before the memory size becomes an issue. Even before your changes, the code handling different types of snapshots is a bit complicated. (But ofc no need to make changes based on my opinion) |
Sorry for replying so late.
Because it's a callback that gets an
I think you're right and that for real use cases you'll want to minimize size anyways. In my PDF case I avoid embedding fonts for example and use the builtin fonts instead. I think I'll try to take some time next week to try it with the binaries in memory. |
Thanks for resolving the merge conflicts! That must have been a decent amount of work... One note — I've recently spent lots of time simplifying the internals, and I'd be very hesitant to merge the approach that uses files rather than in-memory values — I think that brings another layer of complication without much benefit. (at least — I want to understand whether extremely large snapshots are really being used, such that it's worth maintaining the code for two different approaches)... |
Yes, my plan is still to try it with the binary in memory (it just took me way longer to get to it than I thought..). I just decided to still try basing it on the previous work on this branch because after looking over the changes again it seems that a bunch of them will still be needed. Although, maybe for a clean history and preventing unnecessary changes I now think I'll try to start clean. (Also because there's already more conflicts...) |
OK great! Hopefully it could use at least some of the existing code... One note is that a couple of outstanding PRs change the We could adjust that to make I'd like to merge that ASAP, though waiting on feedback from @mitsuhiko |
This is the somewhat simplified version of mitsuhiko#489 where we keep the full files in-memory during the assert and when reviewing.
I ended up getting relatively far today. I currently have the new version at: https://github.com/lasernoises/insta/tree/in-memory-binary-snapshots. There's still some smaller stuff I need to take care of and then I'll probably have some questions about some of the details. (I'll also need to take a more detailed look at #581 and see how compatible that one is with my changes.) But first: should I create a new PR and close this one or reset this branch to the other one? |
Totally as you wish! |
I decided to open a new PR because the implementation is sufficiently different and it might be nice to be able to easily compare them. |
The biggest difference to the previous attempt in #489 is that the snapshot contents are now kept in-memory instead of being written into a temporary file directly.

For `SnapshotContents` I decided to make it an enum with two variants like in the previous attempt. I first attempted to change it to three variants so it would also replace the `SnapshotKind` enum, but I realized that a lot of logic is still shared between inline and file text snapshots. That approach would definitely also work though, and it's perhaps also a question of how much the logic between the two is expected to diverge in the future.

In `runtime::assert_snapshot` I replaced the combination of the `refval` and `new_snapshot_value` with an enum. In that case it is a three-variant enum with some `From` impls for two-value tuples for the macros. Maybe it would be better to make it a two-variant enum there too for consistency and keep the `ReferenceValue` enum for the text case. That might be a bit weird because both binary and non-inline text snapshots have a name, and that would then be in two different enums.

I should probably also add some tests in `cargo-insta`. I'll do that as soon as possible, but I need to figure out how the testing there works.

---------

Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: Frank Elsinga <[email protected]>
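As a rough illustration of the shape described in this commit message, the sketch below pairs a two-variant contents enum with a three-variant value enum. The type names echo the ones used in this thread, but the variants, fields and methods are assumptions made for the example and are not insta's actual definitions.

```rust
/// Hypothetical in-memory contents: text or raw bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SnapshotContents {
    Text(String),
    Binary(Vec<u8>),
}

/// Hypothetical three-variant value handed to the assert function.
enum SnapshotValue<'a> {
    FileText {
        name: Option<&'a str>,
        content: &'a str,
    },
    InlineText {
        reference_content: &'a str,
        content: &'a str,
    },
    Binary {
        name: Option<&'a str>,
        content: Vec<u8>,
        extension: &'a str,
    },
}

impl SnapshotValue<'_> {
    /// Both text variants share one code path; only binary differs.
    fn contents(&self) -> SnapshotContents {
        match self {
            SnapshotValue::FileText { content, .. }
            | SnapshotValue::InlineText { content, .. } => {
                SnapshotContents::Text((*content).to_owned())
            }
            SnapshotValue::Binary { content, .. } => {
                SnapshotContents::Binary(content.clone())
            }
        }
    }

    /// Inline snapshots have no name; the other two variants may.
    fn name(&self) -> Option<&str> {
        match self {
            SnapshotValue::FileText { name, .. }
            | SnapshotValue::Binary { name, .. } => *name,
            SnapshotValue::InlineText { .. } => None,
        }
    }
}
```

The `name` method shows the awkwardness raised above: the name lives in two of the three variants, so a two-variant split would have to put it in two different enums.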
This is a relatively basic implementation of binary snapshots. I decided to adapt my implementation mentioned in #196 (comment) to handle binary files in general and remove the whole thing that prints images to the terminal.
The basic approach is to store a file next to the snapshot file with the provided extension appended. The snapshot macro gets passed a closure with an `&mut File` parameter. Just using a `Vec<u8>` would be somewhat simpler in implementation, but I would be worried about memory usage if there's a bunch of tests running with large files.

I decided not to return a `Result` from the closure, because if there's an I/O error inside it probably makes sense just to panic there because we're inside a test anyway. But maybe it would still make sense to return a `Result` there just so there is no need for `.unwrap()` inside the closure?

Currently it directly writes the file at the location where it will end up at the beginning of the `assert_snapshot` function. This means the binary file is created as `snap.new.$extension` much sooner than the metadata is saved, because we need the file to be able to compare it. What is a bit weird is that if there is a crash somewhere between overwriting the binary file and storing the .snap file, it will leave the two files in an inconsistent state. Creating it as `.snap.new.$extension.tmp` and then moving it when saving the metadata might be a good alternative approach.

There is also a problem if the extension changes between two runs, where the old pending binary file will stay around. It doesn't seem to be a problem in practice though, because when running with `cargo insta test`, `cargo-insta` removes the old pending snapshots first. Maybe it would make sense to also do that generally at the start of `assert_snapshot`, or alternatively clean up all `.snap.new.*` somewhere.

The review TUI looks like this:
The open action opens both files (or just one if there's no existing snapshot, or the existing one is text) in an external application using the `open` crate.

The file paths are using the OSC-8 escape sequence (described here) to link to the files. Here it says that `file:///` without a hostname should be avoided, but at least my terminal (WezTerm) seems to support it without a hostname and so I left that out for now since it would mean an extra dependency on a crate to get the hostname. As an alternative we could also just print the path including `file://` to the terminal since many terminals seem to support clicking on those as well.
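For reference, an OSC-8 link wraps the visible label between an opening sequence that carries the URI and a closing sequence with an empty URI. The sketch below builds such a string without a hostname, as described above. `osc8_file_link` is a hypothetical helper rather than the code in this PR; it assumes an absolute Unix-style path and does no percent-encoding, so paths with spaces or other special characters would need extra handling.

```rust
use std::path::Path;

/// Hypothetical helper: wraps `label` in an OSC-8 hyperlink that points at
/// `path` through a `file://` URI with an empty hostname.
///
/// Layout: ESC ] 8 ; ; URI ESC \  label  ESC ] 8 ; ; ESC \
fn osc8_file_link(path: &Path, label: &str) -> String {
    format!(
        "\x1b]8;;file://{}\x1b\\{}\x1b]8;;\x1b\\",
        path.display(),
        label
    )
}
```

Printing the result of `osc8_file_link(Path::new("/tmp/a.snap.new.png"), "a.snap.new.png")` shows only the label in terminals that support OSC-8, with the label clickable.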