Hard links are counted as duplicates #230

Closed
blob79 opened this issue Jan 25, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

Contributor

blob79 commented Jan 25, 2021

Hard linked files shouldn't be counted as duplicates as they do not take up space on disk.

$ echo a > a
$ ln a b
$ cp a c
$ stat a b c  | grep Inode
Device: 10302h/66306d	Inode: 25463856    Links: 2
Device: 10302h/66306d	Inode: 25463856    Links: 2
Device: 10302h/66306d	Inode: 25463858    Links: 1

$ cargo run dup  --directories ~/Development/czkawka/czkawka_cli -m1 -f /dev/stdout | grep -A99 "Files with"
-------------------------------------------------Files with same hashes-------------------------------------------------
Found 2 duplicated files which in 1 groups which takes 4 B.

---- Size 2 B (2) - 3 files
/home/thomas/Development/czkawka/czkawka_cli/a
/home/thomas/Development/czkawka/czkawka_cli/c
/home/thomas/Development/czkawka/czkawka_cli/b

$ rm c
$ cargo run dup  --directories ~/Development/czkawka/czkawka_cli -m1 -f /dev/stdout | grep -A99 "Files with"
-------------------------------------------------Files with same hashes-------------------------------------------------
Found 1 duplicated files which in 1 groups which takes 2 B.

---- Size 2 B (2) - 2 files
/home/thomas/Development/czkawka/czkawka_cli/a
/home/thomas/Development/czkawka/czkawka_cli/b

It also looks like the duplicate disk space is miscounted. In the first case it should be 2. Even if you double count the hard link, it should be 3, not 4.

qarmin added the enhancement (New feature or request) label Jan 26, 2021

TANDEXX commented Feb 10, 2021

The ls command and any file manager see hard links as normal files. A program can't check whether a file is a hard link; to a program it's a regular file, so this program can't check it either.

Contributor

Sbgodin commented Feb 10, 2021

@TANDEXX, not exactly. The ls command knows and shows when a file is hard-linked (that is, when there are several paths to the same file) by displaying its link count:

$ echo "hello" > file_1

$ ls -l file_1
-rw-r--r-- 1 user group 6 11 févr. 00:03 file_1

$ ln file_1 file_2
$ ls -l file_1 file_2
-rw-r--r-- 2 user group 6 11 févr. 00:03 file_1
-rw-r--r-- 2 user group 6 11 févr. 00:03 file_2

The two files now have a link count of 2.

$ find . -samefile file_1
./file_2
./file_1

The find command knows it too.
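
A program can read the same link count that ls -l prints. Here is a minimal Rust sketch, assuming a Unix target; the helper name link_count is made up for illustration and is not taken from the project:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: adds nlink() to Metadata
use std::path::Path;

/// Hypothetical helper: returns the number of hard links to the file
/// at `path`, the same figure `ls -l` shows in its second column.
fn link_count(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.nlink())
}
```

A count above 1 only says that another name for the file exists somewhere on the same filesystem. It does not say whether that other name lies inside the scanned directories.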

Contributor Author

blob79 commented Feb 11, 2021

You can look at the inode with:

❯ ls -i
13644513 a
13644513 b

ls man page:
-i, --inode
print the index number of each file
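
To show how the inode could be used when counting duplicates, here is a rough Rust sketch, assuming a Unix target. It is not claimed to be what the actual fix does; the function names unique_by_inode and reclaimable_bytes are invented for this example. Paths that share a (device, inode) pair are treated as one file, so only distinct inodes beyond the first count as wasted space.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: adds dev() and ino() to Metadata
use std::path::{Path, PathBuf};

/// Hypothetical helper: keeps one path per (device, inode) pair, so hard
/// links to the same file collapse into a single entry. The first path
/// seen for each inode is the one that is kept.
fn unique_by_inode<P: AsRef<Path>>(paths: &[P]) -> io::Result<Vec<PathBuf>> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for path in paths {
        let meta = fs::metadata(path)?;
        // insert() returns false when this (dev, ino) pair was already seen.
        if seen.insert((meta.dev(), meta.ino())) {
            unique.push(path.as_ref().to_path_buf());
        }
    }
    Ok(unique)
}

/// Hypothetical helper: bytes that could be freed from a group of
/// identical files of `size` bytes each, counting only the distinct
/// inodes beyond the first.
fn reclaimable_bytes<P: AsRef<Path>>(group: &[P], size: u64) -> io::Result<u64> {
    let distinct = unique_by_inode(group)?.len() as u64;
    Ok(distinct.saturating_sub(1) * size)
}
```

With the files from the report (a and b hard-linked, c a copy, 2 B each), this sketch would give 2 B for the group a, c, b and 0 B once c is removed.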

#234 implements the fix.

qarmin closed this as completed Feb 24, 2021
4 participants