Factotum Server User Guide


User guide

Prerequisites

In order to operate Factotum Server, you need a local copy of the Factotum binary and a Consul agent running on the same server.
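
For local testing, the snippet below shows one way to satisfy these prerequisites; the dev-mode Consul agent and the Factotum binary path are only illustrative assumptions.

# Start a local Consul agent in dev mode (suitable for testing only)
consul agent -dev &

# Assumption: the Factotum binary has been downloaded to a known path,
# e.g. /usr/local/bin/factotum, to be passed to the server via --factotum-bin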

CLI Arguments

The only required argument is --factotum-bin, which defines the path to the Factotum binary.

All other arguments are optional and have the predefined defaults listed in the table below.

| Arg | Default | Description |
| --- | --- | --- |
| factotum-bin=<path> | None (must be provided) | Path to the Factotum binary file. |
| ip=<address> | 0.0.0.0 | Binding IP address. |
| port=<number> | 3000 | Port number. |
| log-level=<level> | WARN | Logging level. |
| max-jobs=<size> | 1000 | Maximum size of the job request queue. |
| max-workers=<size> | 20 | Maximum number of workers. |
| webhook=<url> | None | Factotum arg to post updates on job execution to the specified URL. |
| no-colour | false | Factotum arg to turn off ANSI terminal colours/formatting in output. |
| consul-name=<name> | factotum | Node name of the Consul server agent. |
| consul-ip=<address> | 127.0.0.1 | IP address of the Consul server agent. |
| consul-port=<number> | 8500 | Port number of the Consul server agent. |
| consul-namespace=<namespace> | com.snowplowanalytics/factotum | Namespace of job references stored in Consul persistence. |
| max-stdouterr-size=<bytes> | 10000 | Maximum size (in bytes) of the individual stdout/stderr sent via the webhook for job updates. |
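
As a sketch only, a typical invocation might look like the following; the binary name factotum-server, the long --flag=value form for every option, and the chosen values are assumptions for illustration.

# Illustrative invocation (binary name, flag form, and values are assumptions)
factotum-server --factotum-bin=/usr/local/bin/factotum \
  --ip=0.0.0.0 \
  --port=3000 \
  --log-level=INFO \
  --consul-ip=127.0.0.1 \
  --consul-port=8500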

Server Modes

The server currently has two modes of operation, run and drain (see the example of switching modes after this list):

  • run mode: normal behaviour where job requests are validated, accepted, and queued for processing
  • drain mode: stops accepting any further job requests, but continues to work on anything that is queued
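
For example, using the /settings endpoint described in the REST API section below (and assuming the default host and port), the mode can be switched at runtime:

# Put the server into drain mode (stop accepting new job requests)
curl http://localhost:3000/settings -X POST -d '{"state":"drain"}'

# Return to normal operation
curl http://localhost:3000/settings -X POST -d '{"state":"run"}'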

Job Requests

Jobs are scheduled by sending job requests in a JSON format that specifies the factfile location and any additional arguments that would normally be passed to the Factotum CLI. Here is a simple example:

{
    "jobName": "echotest",
    "factfilePath": "/factotum/samples/echo.factfile",
    "factfileArgs": [ "--tag", "foo,bar", "--no-colour" ]
}

An additional jobId field is generated in the same way as Factotum's job reference: a hash of the factfile contents with any tags appended. This id, unique to that particular job, is used to ensure that no duplicate jobs are queued or running at the same time; saving state in Consul makes this check possible.
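
The exact hashing scheme is internal to Factotum, so the line below is only a loose illustration of the idea; it assumes a SHA-256 digest over the factfile contents with the tag string appended, which may not match Factotum's actual job reference.

# Illustration only: Factotum computes the real job reference itself
{ cat /factotum/samples/echo.factfile; printf 'foo,bar'; } | sha256sum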

Consul Persistence

Consul is used as a persistence layer to track the state of job requests in a way that works across a distributed system. Keyed by the job id, an entry exists in storage for each unique job that has been sent:

{
    "state": "DONE",
    "jobRequest": {
        "jobId": "6ef5cf55f2815a098695861ec5f8e9c7122dd2e5339cf954d7e2cb2e761dd583",
        "jobName": "echotest",
        "factfilePath":"/vagrant/sleep.factfile",
        "factfileArgs": ["--tag", "foo,bar", "--no-colour"]
    },
    "lastRunFrom": "factotum",
    "lastOutcome": "SUCCEEDED"
}

In addition to the original job request, the other fields associated with an entry are:

  • state: One of QUEUED, WORKING, or DONE
    • If an entry is QUEUED or WORKING, then any new requests for that particular job will not be accepted by the server
    • Only jobs that have completed (DONE) or have never run before are valid to be scheduled
  • lastRunFrom: The Consul server node that last updated the entry, i.e. where the job was last run
  • lastOutcome: Outcome of the last run: one of WAITING, RUNNING, SUCCEEDED, or FAILED
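
Entries can also be inspected directly in Consul's key/value store. As a sketch, assuming entries are stored under the configured consul-namespace keyed by job id:

# Read a job entry straight from Consul's KV HTTP API
# (the key layout under the namespace is an assumption)
curl 'http://127.0.0.1:8500/v1/kv/com.snowplowanalytics/factotum/6ef5cf55f2815a098695861ec5f8e9c7122dd2e5339cf954d7e2cb2e761dd583?raw'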

Scheduling

A "worker queue" model is used for scheduling:

  • New requests are validated and appended to a queue of job requests (state == QUEUED)
  • The next available worker is notified to check the queue
  • The worker processes the job by executing Factotum in a separate thread (state == WORKING)
  • The worker persists outcome on completion (state == DONE)
  • The worker checks the queue again for any further requests

REST API

Overview

  • GET /status: General statistics on server and queue
    • version
    • state
    • start time, uptime
    • workers
    • queue size
    • curl http://localhost:3000/status?pretty=1
  • GET /check: Check the status of a job request
    • retrieves job entry from persistence storage by id
    • curl 'http://localhost:3000/check?pretty=1&id=6ef5cf55f2815a098695861ec5f8e9c7122dd2e5339cf954d7e2cb2e761dd583'
  • POST /settings: Update server settings and behaviour
    • run: all normal behaviour
    • drain: refuse any more job requests
    • curl http://localhost:3000/settings -X POST -d '{"state":"run"}'
  • POST /submit: Submit a job with a DAG to the queue
    • validates job request
    • performs dry run
    • checks job is not already running
    • adds request to queue
    • curl http://localhost:3000/submit -X POST -d '{"jobName":"echotest","factfilePath":"/tmp/echo.factfile","factfileArgs":["--tag","foo,bar","--no-colour"]}'