archive.extracted downloads the file even if checksums are the same #55443
**Labels**

- Feature — new functionality including changes to functionality and code refactors, etc.
- ZRelease-Sodium — retired label
**Description of Issue**
Right now `archive.extracted` looks for a cached copy of the file on the minion, and caches it (downloads the file) if it's not present in the minion's local cache, in order to run an archive list and check the archive's files against the destination directory. I think this is overkill in some cases. Example:
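The example state itself did not survive extraction; based on the walkthrough below, it would look roughly like the following. The state ID and the destination path are invented placeholders — only the source URLs and `keep_source: False` come from the text:

```yaml
# Hypothetical reconstruction -- state ID and destination path are assumptions.
unpack_archive:
  archive.extracted:
    - name: /opt/dest                              # assumed destination directory
    - source: http://path/to/archive.gz
    - source_hash: http://path/to/archive.gz.md5
    - keep_source: False                           # drop the cached archive after extraction
```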
What is going to happen in the example above the first time you run it:

1. It looks for `http://path/to/archive.gz.hash` in the local minion cache. There is none.
2. It downloads `http://path/to/archive.gz` into the minion cache and creates a `http://path/to/archive.gz.hash` file with the checksum of the file.
3. It uses the `archive.list` module to compare the archive contents with the destination directory, and if there is a difference (there is, in our case) it extracts the archive.
4. With `keep_source` set to `False`, the local cached file `http://path/to/archive.gz` will be removed, yet `http://path/to/archive.gz.hash` won't be.

So far so good. Now let's see what is going to happen if we run this state once again, assuming neither `archive.gz` nor `archive.gz.md5` changed:

1. It looks for `http://path/to/archive.gz.hash` in the local minion cache. This time the file exists.
2. It compares it against `http://path/to/archive.gz.md5`. They match.
3. It downloads `http://path/to/archive.gz` into the minion cache anyway, since the archive itself is gone due to `keep_source: False`.
4. It uses the `archive.list` module to compare the archive contents with the destination directory (there is no difference, in our case).
5. With `keep_source` set to `False`, the local cached file `http://path/to/archive.gz` will be removed, yet `http://path/to/archive.gz.hash` won't be.

And this re-download of `archive.gz` will happen over and over again. Now imagine it's 1000 nodes and the archives are 1 GB+.

What is the alternative? `keep_source: True`. It solves the issue at the cost of keeping a file cache on all minions that will never be cleaned, since #34369 is not implemented. And again, imagine it's 1000 nodes and the archives are 1 GB+.

So right now it's a choice between re-downloading the archive every time OR keeping all archives in the cache without any ability to purge it efficiently.
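The wasteful loop described above can be sketched as follows. This is a minimal illustration of the described behavior, not Salt's actual implementation; the file layout and the `downloads` counter are invented for the sketch:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "archive.gz")
hashfile = os.path.join(tmp, "archive.gz.hash")

downloads = 0  # count how many times the archive is fetched


def download(path):
    """Stand-in for fetching the remote archive into the minion cache."""
    global downloads
    downloads += 1
    with open(path, "w") as f:
        f.write("archive-bytes")


def run_state(keep_source):
    """One run of the behavior described in the walkthrough."""
    # Even when the cached .hash matches the remote .md5, the archive
    # itself is still needed for archive.list -- and keep_source: False
    # deleted it last time, so it gets fetched again.
    if not os.path.exists(archive):
        download(archive)                # re-download on every run
    with open(hashfile, "w") as f:
        f.write("d41d8cd9...")           # the .hash file survives the run
    # ... archive.list comparison and extraction would happen here ...
    if not keep_source:
        os.remove(archive)               # keep_source: False drops the archive


for _ in range(3):                       # three highstate runs, nothing changed
    run_state(keep_source=False)

print(downloads)                         # the archive was fetched every single run
```

Running this prints `3`: three runs, three downloads, even though nothing changed between them.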
My proposal is to add an optional argument like `trust_source_hash: BOOL` which would change the behavior to this: if `source_hash` is provided, download it and check it against the local copy of the `.hash` file, if one exists; if they match, trust that the destination is still in sync and skip re-downloading the archive. This will significantly improve time and traffic efficiency for anyone willing to commit to relying on `source_hash`.
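A minimal sketch of how the proposed `trust_source_hash` check could work. Everything here is hypothetical — the function name and its parameters are invented for illustration, not part of Salt's API:

```python
import os


def archive_needs_refresh(remote_hash, local_hash_path, trust_source_hash):
    """Decide whether the archive must be (re)downloaded and re-listed.

    remote_hash       -- checksum fetched from the source_hash URL
    local_hash_path   -- path to the cached .hash file on the minion
    trust_source_hash -- the proposed opt-in flag
    """
    if not trust_source_hash:
        return True                      # current behavior: always fetch
    if not os.path.exists(local_hash_path):
        return True                      # first run: no cached hash yet
    with open(local_hash_path) as f:
        cached = f.read().strip()
    # Hashes match -> trust that the destination is still in sync
    # and skip the download entirely.
    return cached != remote_hash


# First run: no cached .hash file, so the archive must be downloaded.
print(archive_needs_refresh("abc123", "/nonexistent/.hash", True))  # True
```

Only the small `.hash` file is ever re-fetched on a no-change run; the multi-gigabyte archive transfer happens only when the hashes actually differ.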