Add change in import to usage section and list lib. features (apache#58)
* Add change in import to usage section and list lib. features

Signed-off-by: wslulciuc <[email protected]>

* Remove whitespace

Signed-off-by: wslulciuc <[email protected]>

* List run-level metadata collected

Signed-off-by: wslulciuc <[email protected]>
wslulciuc authored Sep 9, 2020
1 parent eb5390c commit 513f142
Showing 1 changed file with 39 additions and 1 deletion.
40 changes: 39 additions & 1 deletion README.md
@@ -9,9 +9,28 @@

A library that integrates [Airflow `DAGs`]() with [Marquez](https://github.com/MarquezProject/marquez) for automatic metadata collection.

## Features

**Metadata**

* Task lifecycle
* Task parameters
* Task runs linked to **versioned** code
* Task inputs / outputs

**Lineage**

* Track inter-DAG dependencies

**Built-in**

* SQL parser
* Link to code builder (e.g., **GitHub**)
* Metadata extractors

## Status

- This library is under active development at [Datakin](https://twitter.com/DatakinHQ).
+ This library is under active development with a rapidly evolving API and we'd love your help!

## Requirements

@@ -24,11 +43,14 @@ This library is under active development at [Datakin](https://twitter.com/Dataki
```bash
$ pip3 install marquez-airflow
```

> **Note:** You can also add `marquez-airflow` to your `requirements.txt` for Airflow.

To install from source, run:

```bash
$ python3 setup.py install
```

## Settings

### Pointing to your Marquez service
@@ -58,6 +80,22 @@ It's important to understand the inputs and outputs are lists and relate directl

## Usage

To begin collecting Airflow DAG metadata with Marquez, use:

```diff
- from airflow import DAG
+ from marquez_airflow import DAG
```

When enabled, the library will:

1. On DAG **start**, collect metadata for each task using an `Extractor` (the library falls back to a _default_ extractor if none is defined for the task)
2. Collect task input / output metadata (`source`, `schema`, etc.)
3. Collect task run-level metadata (execution time, state, parameters, etc.)
4. On DAG **complete**, also mark the task as _complete_ in Marquez
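
The steps above can be pictured with a small, self-contained simulation. This is only a sketch under stated assumptions: `TaskMetadata`, `DefaultExtractor`, `SqlExtractor`, `EXTRACTORS`, and `collect` are hypothetical names invented for illustration and are not part of the `marquez_airflow` API. The code does not talk to Airflow or Marquez; it only mimics the order in which metadata would be gathered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskMetadata:
    """Hypothetical container for what an extractor reports about a task."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


class DefaultExtractor:
    """Hypothetical fallback extractor: records only the task name."""

    def extract(self, task: dict) -> TaskMetadata:
        return TaskMetadata(name=task["task_id"])


class SqlExtractor(DefaultExtractor):
    """Hypothetical extractor that also reports inputs and outputs."""

    def extract(self, task: dict) -> TaskMetadata:
        return TaskMetadata(
            name=task["task_id"],
            inputs=list(task.get("reads", [])),
            outputs=list(task.get("writes", [])),
        )


# Hypothetical registry mapping an operator type to its extractor.
EXTRACTORS: Dict[str, DefaultExtractor] = {"sql": SqlExtractor()}


def collect(tasks: List[dict]) -> List[Tuple]:
    """Return the ordered list of events a run of the DAG would produce."""
    events: List[Tuple] = []
    # Steps 1-3: on DAG start, pick an extractor per task (default if none
    # is registered) and record inputs, outputs, and run-level parameters.
    for task in tasks:
        extractor = EXTRACTORS.get(task["operator"], DefaultExtractor())
        meta = extractor.extract(task)
        params = task.get("params", {})
        events.append(("START", meta.name, meta.inputs, meta.outputs, params))
    # Step 4: on DAG complete, mark every task as complete.
    for task in tasks:
        events.append(("COMPLETE", task["task_id"]))
    return events


if __name__ == "__main__":
    demo = [
        {"task_id": "load", "operator": "sql", "reads": ["raw"], "writes": ["clean"]},
        {"task_id": "notify", "operator": "bash", "params": {"retries": 1}},
    ]
    for event in collect(demo):
        print(event)
```

The registry lookup with a fallback mirrors the "default extractor" behavior described in step 1; a real deployment relies on the library's own extractors rather than anything defined here.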

## Example

```python
from datetime import datetime
from marquez_airflow import DAG
# ...
```
