Windows 32/64 binary, 64 bit-HW accelerated

@fcorbelli fcorbelli released this 12 Jul 12:23
4b612bc

Fixed a small but nasty bug in t for big files

Example in this thread
Introduced during refactoring: sorting used only the first 10 characters instead of 40. It does not actually invalidate anything, but it is still unpleasant

Automagically add files

Every promise is a debt
zpaqfranz a z:\58_5 -key pippo => if a ./58_5 file or folder exists, it is automagically added to the archive

New fasttxt switch / format

zpaqfranz can now automagically calculate the CRC-32 of the archive (without, of course, re-reading it from the filesystem) and write it to the file archivename_crc32.txt

C:\zpaqfranz>zpaqfranz a z:\1.zpaq *.cpp -fasttxt
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:-fasttxt -hw
Creating z:/1.zpaq at offset 0 + 0                                  )
Add 2023-07-12 14:00:13        27         89.286.021 (  85.15 MB) 32T (0 dirs)
27 +added, 0 -removed.

0 + (89.286.021 -> 16.670.812 -> 2.069.455) = 2.069.455 @ 57.38 MB/s
62655: CRC-32 EXPECTED E948770C
62682: Updating fasttxt z:/1_crc32.txt :OK

1.500 seconds (000:00:01) (all OK)

You get something like this

C:\zpaqfranz>type z:\1_crc32.txt
$zpaqfranz fasttxt|1|2023-07-12 14:00:14|z:/1.zpaq
E948770C 8293084830611972 0 [2.069.455] (0)

In this example the first field (E948770C) is the (expected) CRC-32 of the archive.
The second, 8293084830611972, is the "quick" hash; the third (0 in this case) is the initial CRC-32; then come the file sizes.
The "quick hash" is the heuristic hash introduced a few releases earlier.

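As a rough illustration of the layout just described, here is a minimal Python sketch that parses such a file. The field order and types are inferred only from the single sample above, so treat them as assumptions; `parse_fasttxt` and `FastTxtRecord` are hypothetical names, not part of zpaqfranz.

```python
import re
from typing import NamedTuple


class FastTxtRecord(NamedTuple):
    crc32: int          # expected CRC-32 of the whole archive (hex in the file)
    quick_hash: str     # heuristic "quick" hash, kept as text (encoding not assumed)
    initial_crc32: str  # initial CRC-32, kept as text ("0" in the sample)
    size: int           # archive size in bytes (dots are thousands separators)
    trailing: str       # last parenthesised field, kept as text


# ASSUMPTION: layout guessed from one sample line, not from a format spec.
_DATA_LINE = re.compile(
    r"^([0-9A-Fa-f]{8})\s+(\S+)\s+(\S+)\s+\[([\d.]+)\]\s+\((.*)\)$"
)


def parse_fasttxt(text: str) -> FastTxtRecord:
    """Return the first data line found in the text of a _crc32.txt file."""
    for line in text.splitlines():
        if line.startswith("$zpaqfranz fasttxt"):
            continue  # header line: marker|number|timestamp|archive name
        match = _DATA_LINE.match(line.strip())
        if match:
            crc, quick, initial, size, trailing = match.groups()
            return FastTxtRecord(
                crc32=int(crc, 16),
                quick_hash=quick,
                initial_crc32=initial,
                size=int(size.replace(".", "")),
                trailing=trailing,
            )
    raise ValueError("no fasttxt data line found")
```

Fed the two lines shown above, this returns a record with crc32 equal to 0xE948770C and size equal to 2069455.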
Using the versum command with -fasttxt, the archive can be checked very quickly

C:\zpaqfranz>zpaqfranz versum z:\1.zpaq -fasttxt
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:versum                                    | - command
franz:-fasttxt -hw
66764: Test CRC-32 of .zpaq against _crc32.txt
87163: Bytes to be checked 2.069.455 (1.97 MB) in files 1

66323: OK CRC-32: z:/1.zpaq
====================================================================
66356: TOTAL          1
66357: OK             1
66358: WARN           0
66359: ERROR          0

0.016 seconds (00:00:00) (all OK)
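Conceptually, a check like the one above means streaming the archive once and comparing the result with the stored value. A minimal Python sketch of that idea follows, assuming the standard CRC-32 implemented by zlib; `file_crc32` and `matches_expected` are hypothetical helper names, and this is not the zpaqfranz code.

```python
import zlib


def file_crc32(path: str, chunk_size: int = 1 << 20) -> int:
    """Stream a file through zlib.crc32 without loading it all into memory."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Passing the previous value continues the running CRC.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's CRC-32 with a hex string such as 'E948770C'."""
    return file_crc32(path) == int(expected_hex, 16)
```

No password is involved at any point: only the raw bytes of the .zpaq file are read.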

with -quick in (almost) no time

C:\zpaqfranz>zpaqfranz versum z:\1.zpaq -fasttxt -quick
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:versum                                    | - command
franz:-quick -fasttxt -hw
66764: Test QUICK of .zpaq against _crc32.txt
87163: Bytes to be checked 2.069.455 (1.97 MB) in files 1

66323: OK QUICK: z:/1.zpaq
====================================================================
66356: TOTAL          1
66357: OK             1
66358: WARN           0
66359: ERROR          0

0.031 seconds (00:00:00) (all OK)

You can run it even with *.zpaq (on Linux "*.zpaq")

zpaqfranz versum *.zpaq -fasttxt

Why this "thing", so similar to -checktxt?

Because the CRC-32 calculation is performed while writing to disk, so it has minimal impact in terms of time and CPU, and it is performed ONLY on the added part
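The reason only the added part needs to be processed is that a CRC-32 can be continued from a previously stored value. A small Python sketch of that property, using zlib's standard CRC-32 (the byte strings are made-up placeholders, and this is not the zpaqfranz code):

```python
import zlib

old_part = b"existing archive bytes"          # already on disk from earlier runs
new_part = b"bytes appended by this update"   # what the current run writes

# CRC-32 remembered from the previous run.
crc_before = zlib.crc32(old_part)

# Continue from the stored value, feeding only the appended bytes.
crc_after = zlib.crc32(new_part, crc_before)

# Same result as re-reading the whole file from scratch.
assert crc_after == zlib.crc32(old_part + new_part)
```

So the cost of keeping the checksum up to date grows with the size of the update, not with the size of the archive.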

Let's take a concrete example, otherwise it is difficult to understand the incredible usefulness (in certain scenarios, of course)

Suppose you make a backup with a certain tool (e.g. 7z, rar, tar) of a certain folder.
Suppose the archive is 500GB in size and resides (as normal) on a slow device, e.g. a NAS with magnetic disks, used by many others

Suppose you want to transfer it to another device (as normal), e.g. with rsync.
This will require reading all 500GB (locally, maybe painfully slow), calculating the relevant checksums (for rsync they are basically md5, high CPU usage), remotely sending all 500GB (=saturating all bandwidth), remotely calculating 500GB (=high I/O and CPU) of md5 hashes, and comparing them.

Now you are paranoid: your archive is full of precious data, therefore you launch a local CRC-32 (for the .7z, rar, tar...) AND a remote CRC-32, just to be sure

So far, so good: zpaqfranz pays the same "cost" (for the FIRST run)

Backups, however, are typically repeated, say daily or even more often (at night is the typical case)

On the 2nd run, with tar, 7z, rar etc., you will be in exactly the same situation
Suppose the new archive is 501GB (2GB changed in the source folder)
You create (aka write) a giant 501GB file, read everything back, calculate the MD5, calculate (remotely, by rsync) 500GB, and so on and on: hours locally, hours remotely, a LOT of local I/O, a LOT of CPU

  1. local: Read 2GB
  2. local: Write 1GB
  3. local: MD5 of 501GB
  4. local: Send ~1GB
  5. remote: MD5 of 500GB
  6. remote: Write of 1GB

With zpaqfranz 58.4 and checktxt...

  1. local: Read 2GB
  2. local: Write 1GB
  3. local: MD5 of 501GB
  4. local: Send 1GB
  5. remote: Write of 1GB
  6. remote: MD5 of 501GB

With zpaqfranz 58.5 and fasttxt...

  1. local: Read 2GB
  2. local: Write 1GB
  3. local: Send 1GB
  4. remote: Write of 1GB
  5. remote: CRC-32 of 501GB

In a future release, step 5 will become "CRC-32 of 1GB"
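To make the difference easier to see, the hashing steps of the three lists can be added up. The numbers below are copied from the lists above; the scenario labels are mine.

```python
# GB read back for hashing in each scenario, taken from the step lists above.
hashed_gb = {
    "tar/7z/rar + rsync": {"local": 501, "remote": 500},
    "zpaqfranz 58.4 -checktxt": {"local": 501, "remote": 501},
    "zpaqfranz 58.5 -fasttxt": {"local": 0, "remote": 501},
}

# Total amount of data that has to be re-read just to compute checksums.
totals = {name: sum(sides.values()) for name, sides in hashed_gb.items()}

for name, total in totals.items():
    print(f"{name:28s} {total:5d} GB hashed")
```

With -fasttxt the local hashing pass disappears entirely, because the CRC-32 is computed while the archive is being written.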

Real-world Windows example

So here is a little (!) Windows batch file

Suppose you want to back up "something" (some Windows data) to a remote server (a Linux box), using a local encryption password

Since you are lazy, you want not only the local copy to be verified, but also the CRC-32 of the remote copy compared with the local one, and you want a different e-mail depending on the result of the verification (error or not), BUT YOU DO NOT WANT TO SEND THE PASSWORD TO THE REMOTE SERVER

Since you use an FTTH connection you really want to send only the minimum amount of changed data, and you do NOT want to run rsync on huge files (hundreds of GB), which can take hours

We have a key-based authentication (for ssh, then rsync-over-ssh)

First step: make the archive, in this example k:\franco\test\zpaqfranz_pippo.zpaq,
from the two folders c:\zpaqfranz and c:\stor,
with password (key) pippo,
support for paths longer than 255 characters (-longpath),
CRC-32 for the later cloud test (-fasttxt),
no ETA (this is a batch file after all, who cares: -noeta),
and a BIG confirmation (-big), easier to spot in e-mails

@echo off

date /t  >c:\stor\result.txt
time /t >>c:\stor\result.txt

c:\stor\bin\zpaqfranz a k:\franco\test\zpaqfranz_pippo.zpaq c:\zpaqfranz c:\stor -longpath -key pippo -fasttxt -noeta -big >>c:\stor\result.txt

Now we want to list all the versions, just to make sure the update is done (few things are worse than a backup update that does not update anything)

c:\stor\bin\zpaqfranz i k:\franco\test\zpaqfranz_pippo.zpaq -key pippo -noeta                               >>c:\stor\result.txt

Now we want to (locally) test the archive.
Please note: locally. The password "pippo" is NOT sent over the internet

c:\stor\bin\zpaqfranz t k:\franco\test\zpaqfranz_pippo.zpaq -key pippo -noeta -big                          >>c:\stor\result.txt

OK, we do the same thing for a second archive file (just an example), k:\franco\test\nz_pippo.zpaq

c:\stor\bin\zpaqfranz a k:\franco\test\nz_pippo.zpaq c:\nz -longpath -key pippo  -fasttxt -big >>c:\stor\result.txt
c:\stor\bin\zpaqfranz i k:\franco\test\nz_pippo.zpaq -key pippo -noeta                         >>c:\stor\result.txt
c:\stor\bin\zpaqfranz t k:\franco\test\nz_pippo.zpaq -key pippo -noeta -big                    >>c:\stor\result.txt

Now we upload everything with --append
Only the data changed since the last run will be sent via rsync (over ssh) to the remote Linux box
This usually takes minutes

c:\stor\bin\rsync -e "c:\stor\bin\ssh.exe -p 22 -i c:\stor\bin\thekey"  -I -r --append --partial --progress              --chmod=a=rwx,Da+x /k/franco/test/ [email protected]:/home/theuser/copie/test/ >>c:\stor\result.txt

Now we force the upload of the *.txt files (to "refresh" the *_crc32.txt) with --checksum

c:\stor\bin\rsync -e "c:\stor\bin\ssh.exe -p 22 -i c:\stor\bin\thekey"  -I -r --include="*.txt" --exclude="*" --checksum --chmod=a=rwx,Da+x /k/franco/test/ [email protected]:/home/theuser/copie/test/ >>c:\stor\result.txt

Now we get the size of the /home/theuser folder, and the free space, with the command s
BEWARE: you may need something like /usr/local/bin/zpaqfranz, it depends on the PATH

c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] zpaqfranz s /home/theuser          >>c:\stor\result.txt

Run some other remote command, for example an ls of everything (or zpool status, df -h, whatever; it is just an example)

echo --------- >>c:\stor\result.txt
c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] ls -l '/home/theuser/copie/test/*' >>c:\stor\result.txt

And now remotely test (by CRC-32) the uploaded *.zpaq, with the _crc32.txt, NO PASSWORD sent

c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] zpaqfranz versum '/home/theuser/copie/test/*.zpaq' -fasttxt -noeta -big >>c:\stor\result.txt

Now we'll do a very dirty trick: counting the OKs in the output log with grep
In this example there should be 5
Beware: you need the very latest zpaqfranz here (58.5m+)
We make two counts, one for the body and one for the attachment of the e-mail

echo ==================================== >>c:\stor\result.txt
echo ============ COUNT OK    =========== >>c:\stor\result.txt
echo ==================================== >>c:\stor\result.txt
echo 5 >c:\stor\countok.txt
echo 5 >c:\stor\countbody.txt
c:\stor\bin\egrep "#     # ###!" c:\stor\result.txt -c >>c:\stor\countok.txt
c:\stor\bin\egrep "#     # ###!" c:\stor\result.txt -c >>c:\stor\countbody.txt
c:\stor\bin\zpaqfranz last2 c:\stor\countok.txt -big >>c:\stor\result.txt
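For readers who prefer to see the logic of the trick outside of batch, here is a small Python sketch of the same idea: count the log lines containing the marker (as `egrep -c` does) and compare the count with the expected number. The function names are hypothetical, and reading last2 as "compare the last two lines of the file" is my assumption from how it is used here.

```python
def count_marker_lines(log_text: str, marker: str) -> int:
    """Count the lines that contain the marker, like `egrep -c`."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def backup_ok(log_text: str, marker: str, expected: int) -> bool:
    """True when the number of marker lines equals the expected count."""
    return count_marker_lines(log_text, marker) == expected
```

The marker in the batch file contains no regular-expression metacharacters, so a plain substring test matches the same lines.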

Pack the report with 7z (reports can become very BIG in case of errors)

date /t >>c:\stor\result.txt
time /t >>c:\stor\result.txt

del c:\stor\report.7z
c:\stor\bin\7z a c:\stor\report.7z c:\stor\result.txt

Now make another result file (for the e-mail body)

echo ==================================== >c:\stor\body.txt
echo ========== COUNT OK BODY =========== >>c:\stor\body.txt
echo ==================================== >>c:\stor\body.txt
c:\stor\bin\zpaqfranz last2 c:\stor\countbody.txt -big >>c:\stor\body.txt

Finally, send one of two different e-mails (usually you will even change the -to to your primary e-mail address in case of error)

if not errorlevel 1 goto va
if errorlevel 1 goto nonva

:nonva
c:\stor\bin\mailsend -t [email protected] -cc [email protected] -f [email protected] -starttls -port 587 -auth -smtp smtp.mymail.com -sub "***ERROR *** Backup (theuser)" -user [email protected] -pass mygoodpassword -mime-type "application/x-7z-compressed" -enc-type "base64" -aname "report.7z" -attach "c:\stor\report.7z" -mime-type "text/plain" -disposition "inline"    -attach "c:\stor\body.txt"
goto fine

:va
c:\stor\bin\mailsend -t [email protected] -cc [email protected] -f [email protected] -starttls -port 587 -auth -smtp smtp.mymail.com -sub "Backup (theuser)" -user [email protected] -pass mygoodpassword -mime-type "application/x-7z-compressed" -enc-type "base64" -aname "report.7z" -attach "c:\stor\report.7z" -mime-type "text/plain" -disposition "inline"    -attach "c:\stor\body.txt"
:fine

On *nix it is not possible to do a synchronous t (test) over ssh; it depends on how the shell is created (it takes long to explain, I would say that is enough for now).
On Windows, however, you can

Short version (!)

You can check that a local and a remote file match, through CRC-32, by "paying" only the cost of the CRC-32 calculation on the remote computer. The remote CRC-32 calculation can also be run, for example, from a crontab, for multiple archives, by using wildcards ("*.zpaq").
With the -quick switch you can make heuristic checks (i.e., on the start, middle, and end of the files), so you can be fairly safe against a mismatch caused by rsync's --append switch, in a few milliseconds (if you don't want the entire MD5 or CRC-32 of the remote file to be recalculated; backup files can be hundreds of gigabytes in size).
If you are paranoid instead, you can use -checktxt, which uses MD5 (default) or XXH3 (optional).
This, however, can get "expensive" for very large backups.
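To give an idea of why a start/middle/end check takes milliseconds regardless of file size, here is a minimal Python sketch of such a heuristic. It is NOT the algorithm used by zpaqfranz; `quick_fingerprint` and the sample size are assumptions made only for illustration.

```python
import os
import zlib


def quick_fingerprint(path: str, sample: int = 65536) -> int:
    """Hash the file size plus three samples: start, middle and end."""
    size = os.path.getsize(path)
    crc = zlib.crc32(str(size).encode())  # a different length changes the result
    offsets = (0, max(0, size // 2 - sample // 2), max(0, size - sample))
    with open(path, "rb") as handle:
        for offset in offsets:
            handle.seek(offset)
            crc = zlib.crc32(handle.read(sample), crc)
    return crc & 0xFFFFFFFF
```

At most three samples are read, whatever the file size. The price is that a change outside the sampled regions goes unnoticed, which is why this is only a heuristic and the full CRC-32 remains the real check.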

In the future, of course, this will become zpaqfranz-over-TCP
