karlicoss/arctee

Helper script to run your data exports. It works somewhat like the *tee* command, but:

  • a: writes output atomically
  • r: supports retrying command
  • c: supports compressing output

You can read more on how it’s used here.

Motivation

Many things are common to all data exports, regardless of the source. In the vast majority of cases, you want to fetch some data, save it to a file (e.g. JSON) along with a timestamp, and possibly compress it.

This script aims to minimize the common boilerplate:

  • path argument allows easy ISO8601 timestamping and guarantees atomic writing, so you’d never end up with corrupted exports.
  • --compression lets you compress the output simply by passing the extension. No more tar -zcvf!
  • --retries allows easy exponential backoff in case the service you’re querying is flaky.

Example:

arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py
  1. runs /soft/export/rememberthemilk.py, trying up to three times in total if it fails

    The script is expected to dump its result in stdout; stderr is simply passed through.

  2. once the data is fetched it’s compressed as zstd
  3. timestamp is computed and compressed data is written to /exports/rtm/20200102T170015Z.ical.zstd
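The three steps above can be sketched in Python roughly as follows. This is a minimal illustration and not arctee's actual implementation: the function names (run_with_retries, write_atomically, export) and the base_delay parameter are hypothetical, and gzip from the standard library stands in for zstd so that the sketch has no third-party dependencies.

```python
import gzip
import os
import subprocess
import tempfile
import time
from datetime import datetime, timezone


def run_with_retries(cmd, tries=3, base_delay=1.0):
    """Run cmd and return its stdout; on failure, retry with exponential backoff.

    `tries` is the total number of attempts, so tries=1 means no retries.
    """
    for attempt in range(tries):
        # stdout is captured; stderr is inherited, i.e. simply passed through.
        proc = subprocess.run(cmd, stdout=subprocess.PIPE)
        if proc.returncode == 0:
            return proc.stdout
        if attempt + 1 < tries:
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"{cmd!r} failed after {tries} tries")


def write_atomically(path, data):
    """Write to a temporary file next to the target, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The rename is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def export(path_template, cmd, tries=3, base_delay=1.0):
    """Fetch, compress, timestamp, and write; returns the final path."""
    data = gzip.compress(run_with_retries(cmd, tries, base_delay))  # gzip stands in for zstd
    utcnow = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = path_template.format(utcnow=utcnow)
    write_atomically(path, data)
    return path
```

Because the data only reaches its final name through the rename, a reader of the export directory sees either the complete file or no file at all, never a partially written one.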

Do you really need a special script for that?

  • why not use date command for timestamps?

    Passing $(date -Iseconds --utc).json as the path works; however, I need it for most of my exports, so it ends up polluting my crontabs.

Next, I want to do several things one after another here. That sounds like a perfect candidate for pipes, right? Sadly, there are serious caveats:

  • pipe errors don’t propagate. If one part of your pipe fails, the pipeline as a whole doesn’t fail

    That’s a major problem that often leads to unexpected behaviour.

    In bash you can fix this with set -o pipefail. However:

    • default cron shell is /bin/sh. Ok, you can change it to SHELL=/bin/bash, but
    • you can’t set it to /bin/bash -o pipefail

      You’d have to prepend set -o pipefail to every one of your pipelines, which is quite boilerplaty

  • you can’t use pipes for retrying; you need some wrapper script anyway

    E.g. similar to how you need a wrapper script when you want to stop your program on timeout.

  • it’s possible to use pipes for atomically writing output to a file, however I haven’t found any existing tools to do that

    E.g. I want something like curl https://some.api/get-data | tee --atomic /path/to/data.json.

    If you know any existing tool please let me know!

  • it’s possible to pipe compression

    However due to the above concerns (timestamping/retrying/atomic writing), it has to be part of the script as well.
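The first caveat in the list above (a failing stage of a pipe going unnoticed) can be illustrated with a small Python sketch. It is only an illustration under assumptions: run_pipeline, PRODUCER, and CONSUMER are hypothetical names, and both stages are tiny Python one-liners so that the example does not depend on any particular shell.

```python
import subprocess
import sys

# A producer that fails with exit code 3, and a consumer that copies stdin to stdout.
PRODUCER = [sys.executable, "-c", "import sys; sys.exit(3)"]
CONSUMER = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def run_pipeline(producer, consumer):
    """Run `producer | consumer`; return both exit codes and the consumer's output."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # lets the producer get SIGPIPE if the consumer exits early
    out, _ = p2.communicate()
    p1.wait()
    return p1.returncode, p2.returncode, out


producer_code, consumer_code, output = run_pipeline(PRODUCER, CONSUMER)
print(producer_code, consumer_code, output)  # prints: 3 0 b''
```

A shell without pipefail reports only the status of the last stage, which is 0 here even though the producer failed and no data was produced. A wrapper script can inspect every exit code and refuse to write the output file when any stage fails.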

It feels like cron isn’t a suitable tool for my needs because of its pipe handling and the need for retries; however, I haven’t found a better alternative. If you think any of these things can be simplified, I’d be happy to know and remove them in favor of more standard solutions!

Installation

This can be installed with pip by running: pip3 install --user git+https://github.com/karlicoss/arctee

You can also install it manually: install atomicwrites (pip3 install atomicwrites), then download and run arctee.py directly.

Optional Dependencies

  • pip3 install --user backoff

    backoff is a library to simplify backoff and retrying. Only necessary if you want to use --retries.

  • apt install atool

    atool is a tool to create archives in any format. Only necessary if you want to use compression.

Usage

usage: arctee [-h] [-r RETRIES] [-c COMPRESSION] path

Wrapper for automating boilerplate for reliable and regular data exports.

Example: arctee '/exports/rtm/{utcnow}.ical.zstd' --compression zstd --retries 3 -- /soft/export/rememberthemilk.py --user "[email protected]"

Arguments past '--' are the actual command to run.

positional arguments:
  path                  Path with borg-style placeholders. Supported: {utcnow}, {hostname}, {platform}.

                        Example: '/exports/pocket/pocket_{utcnow}.json'

                        (see https://manpages.debian.org/testing/borgbackup/borg-placeholders.1.en.html)

optional arguments:
  -h, --help            show this help message and exit
  -r RETRIES, --retries RETRIES
                        Total number of tries, 1 (default) means only try once. Uses exponential backoff.
  -c COMPRESSION, --compression COMPRESSION
                        Set compression format.

                        See 'man apack' for list of supported formats. In addition, 'zstd' is also supported.
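As a rough illustration of how the placeholders in the path argument might be filled in, here is a minimal sketch. It is not taken from arctee's source: expand_placeholders is a hypothetical name, the timestamp format is inferred from the 20200102T170015Z example earlier in this README, and using platform.system() for {platform} is an assumption.

```python
import platform
import socket
from datetime import datetime, timezone


def expand_placeholders(template, now=None):
    """Fill {utcnow}, {hostname} and {platform} in a path template."""
    # Convert to UTC so the trailing 'Z' is accurate; a naive datetime
    # is interpreted as local time by astimezone().
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return template.format(
        utcnow=now.strftime("%Y%m%dT%H%M%SZ"),
        hostname=socket.gethostname(),
        platform=platform.system(),  # assumption: e.g. 'Linux' or 'Darwin'
    )


fixed = datetime(2020, 1, 2, 17, 0, 15, tzinfo=timezone.utc)
print(expand_placeholders("/exports/pocket/pocket_{utcnow}.json", fixed))
# prints: /exports/pocket/pocket_20200102T170015Z.json
```

Note that str.format treats any other braces in the template as placeholders too, so a real implementation would need to handle literal braces in paths.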