
Ignore duplicates if those are hard links #234

Merged: 1 commit merged into qarmin:master on Feb 20, 2021

Conversation

@blob79 (Contributor) commented Jan 30, 2021

This is a proof of concept.

```
$ echo a > hardlinks/a
$ cp hardlinks/{a,b}
$ ln hardlinks/{a,c}
$ cargo run --bin czkawka_cli dup -m 1 --directories $(pwd)/hardlinks -f /dev/stderr > /dev/null
-------------------------------------------------Files with same hashes-------------------------------------------------
Found 1 duplicated files which in 1 groups which takes 2 B.

---- Size 2 B (2) - 2 files
/home/thomas/Development/czkawka/hardlinks/a
/home/thomas/Development/czkawka/hardlinks/b
```

Open:

- Windows support
- Probably this should be a CLI option

@blob79 force-pushed the hard_links branch 2 times, most recently from 4ffe4ae to 2669688 on January 30, 2021 at 11:53
Comment on lines 912 to 915:

```rust
let dups = self.filter_hard_links(vec_file_entry);
if dups.len() > 1 {
    self.files_with_identical_hashes.entry(size).or_insert_with(Vec::new);
    self.files_with_identical_hashes.get_mut(&size).unwrap().push(vec_file_entry);
    self.files_with_identical_hashes.get_mut(&size).unwrap().push(dups);
```
qarmin (Owner):

This should be moved into the check_file_size function, because for now it only works for full file hashing. Filtering the files before hashing would allow dropping some records before doing such a resource-heavy operation.

@qarmin added the enhancement (New feature or request) label on Feb 5, 2021
@Sbgodin (Contributor) commented Feb 5, 2021

Thanks for this improvement. I guess it would also be nice to be able to choose whether hard links are ignored or not.

@blob79 force-pushed the hard_links branch 3 times, most recently from f7efde7 to ba705df on February 20, 2021 at 09:34
@qarmin qarmin merged commit 1e94587 into qarmin:master Feb 20, 2021
@qarmin (Owner) commented Feb 20, 2021

Thanks!

LJason77 pushed a commit to LJason77/czkawka that referenced this pull request Feb 20, 2021
Labels
enhancement New feature or request
3 participants