Helper script to run your data exports. It works somewhat like the *tee* command, but:
- a: writes output atomically
- r: supports retrying command
- c: supports compressing output
You can read more on how it’s used here.
Many things are common to all data exports, regardless of the source. In the vast majority of cases, you want to fetch some data, save it in a file (e.g. JSON) along with a timestamp, and potentially compress it.
This script aims to minimize the common boilerplate:
- `path` argument allows easy ISO8601 timestamping and guarantees atomic writing, so you’d never end up with corrupted exports.
- `--compression` allows compressing simply by passing the extension. No more `tar -zcvf`!
- `--retries` allows easy exponential backoff in case the service you’re querying is flaky.
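To illustrate the first point, here is a minimal sketch of placeholder expansion and atomic writing using only the Python standard library. It is not arctee's actual code (the script relies on `atomicwrites`); the helper names `expand_path` and `write_atomically` are hypothetical, and the values used for `{hostname}` and `{platform}` are assumptions.

```python
import os
import platform
import socket
import tempfile
from datetime import datetime, timezone


def expand_path(template: str) -> str:
    """Fill borg-style placeholders in a path template (hypothetical helper)."""
    # Compact ISO8601 UTC timestamp, e.g. 20200102T170015Z
    utcnow = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return template.format(
        utcnow=utcnow,
        hostname=socket.gethostname(),   # assumed meaning of {hostname}
        platform=platform.platform(),    # assumed meaning of {platform}
    )


def write_atomically(path: str, data: bytes) -> None:
    """Write to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # A rename within one filesystem is atomic, so readers see either
        # the old file or the complete new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up so a failed export leaves nothing behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temp file is created in the same directory as the target on purpose: a rename across filesystems would not be atomic.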
Example:
arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py
- runs `/soft/export/rememberthemilk.py`, retrying it up to three times if it fails
  The script is expected to dump its result in stdout; stderr is simply passed through.
- once the data is fetched it’s compressed as `zstd`
- timestamp is computed and compressed data is written to `/exports/rtm/20200102T170015Z.ical.zstd`
- why not use `date` command for timestamps?
  passing `$(date -Iseconds --utc).json` as `path` works, however I need it for most of my exports; so it ends up polluting my crontabs.
Next, I want to do several things one after another here. That sounds like a perfect candidate for pipes, right? Sadly, there are serious caveats:
- pipe errors don’t propagate. If one part of your pipe fails, it doesn’t fail everything.
  That’s a major problem that often leads to unexpected behaviours.
  In bash you can fix this by setting `set -o pipefail`. However:
  - default cron shell is `/bin/sh`. Ok, you can change it to `SHELL=/bin/bash`, but
  - you can’t set it to `/bin/bash -o pipefail`
    You’d have to prepend all of your pipes with `set -o pipefail`, which is quite boilerplaty
- you can’t use pipes for retrying; you need some wrapper script anyway
  E.g. similar to how you need a wrapper script when you want to stop your program on timeout.
- it’s possible to use pipes for atomically writing output to a file, however I haven’t found any existing tools to do that
  E.g. I want something like `curl https://some.api/get-data | tee --atomic /path/to/data.json`.
  If you know any existing tool please let me know!
- it’s possible to pipe compression
  However due to the above concerns (timestamping/retrying/atomic writing), it has to be part of the script as well.
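As an illustration of why retrying needs a wrapper rather than a pipe, below is a rough sketch of such a wrapper with exponential backoff. It uses only the standard library as a stand-in; arctee itself depends on the `backoff` library for this, and the function name `run_with_retries` and its `base_delay` and `sleep` parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_with_retries(command, tries=3, base_delay=1.0, sleep=time.sleep):
    """Run command and return its stdout; retry with exponential backoff on failure."""
    for attempt in range(1, tries + 1):
        # stdout is captured for the export; stderr is inherited and passes through.
        result = subprocess.run(command, stdout=subprocess.PIPE)
        if result.returncode == 0:
            return result.stdout
        if attempt == tries:
            raise RuntimeError(
                f"{command!r} failed {tries} times, last exit code {result.returncode}"
            )
        # Wait 1x, 2x, 4x, ... the base delay between attempts.
        sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    # Stand-in for a real export command: a child Python process printing JSON.
    data = run_with_retries([sys.executable, "-c", "print('{}')"])
    sys.stdout.buffer.write(data)
```

Because the wrapper checks the exit code itself, a failing command never produces a half-written export, which is the guarantee a plain `sh` pipe cannot give.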
It feels that cron isn’t a suitable tool for my needs due to pipe handling and the need for retries, however I haven’t found a better alternative. If you think any of these things can be simplified, I’d be happy to know and remove them in favor of more standard solutions!
This can be installed with pip by running: pip3 install --user git+https://github.com/karlicoss/arctee
You can also manually install this by installing atomicwrites (`pip3 install atomicwrites`) and downloading and running `arctee.py` directly.
- `pip3 install --user backoff`
  backoff is a library to simplify backoff and retrying. Only necessary if you want to use `--retries`.
- `apt install atool`
  atool is a tool to create archives in any format. Only necessary if you want to use compression.
usage: arctee [-h] [-r RETRIES] [-c COMPRESSION] path

Wrapper for automating boilerplate for reliable and regular data exports.

Example: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py --user "[email protected]"

Arguments past '--' are the actuall command to run.

positional arguments:
  path                  Path with borg-style placeholders. Supported: {utcnow}, {hostname}, {platform}.
                        Example: '/exports/pocket/pocket_{utcnow}.json'
                        (see https://manpages.debian.org/testing/borgbackup/borg-placeholders.1.en.html)

optional arguments:
  -h, --help            show this help message and exit
  -r RETRIES, --retries RETRIES
                        Total number of tries, 1 (default) means only try once. Uses exponential backoff.
  -c COMPRESSION, --compression COMPRESSION
                        Set compression format. See 'man apack' for list of supported formats. In addition, 'zstd' is also supported.
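For readers wondering how a command line of this shape can be handled, here is one possible sketch: split the arguments at the first `--`, parse the left part with `argparse`, and treat the rest as the command to run. This mirrors the usage text above but is not taken from arctee's source; the function name `parse_cli` is hypothetical.

```python
import argparse


def parse_cli(argv):
    """Split argv at the first '--': options on the left, the command on the right."""
    if "--" in argv:
        split = argv.index("--")
        own_args, command = argv[:split], argv[split + 1:]
    else:
        own_args, command = argv, []
    parser = argparse.ArgumentParser(prog="arctee")
    parser.add_argument("path", help="path with placeholders such as {utcnow}")
    parser.add_argument("-r", "--retries", type=int, default=1,
                        help="total number of tries, 1 means only try once")
    parser.add_argument("-c", "--compression", default=None,
                        help="compression format")
    options = parser.parse_args(own_args)
    return options, command
```

Splitting before parsing means flags that belong to the export command (such as `--user`) are never interpreted as the wrapper's own options.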