Merge pull request #3 from AutoResearch/carlosg/struct
refactor: Restructure the repo to accommodate both training and inference
carlosgjs authored Dec 4, 2023
2 parents 32d23e9 + 2b5102b commit 600b53f
Showing 25 changed files with 328 additions and 89 deletions.
7 changes: 7 additions & 0 deletions .flake8
@@ -0,0 +1,7 @@
[flake8]
max-line-length = 120
output-file = flake8_log.txt
tee = True
extend-ignore= E203, BLK100
#BLK100 designates where Black would make changes
#E203 Whitespace rule relating to ':' that directly conflicts with black standards
21 changes: 21 additions & 0 deletions .github/actions/deps/action.yaml
@@ -0,0 +1,21 @@
name: Setup
description: Setup Dependencies for testing
inputs:
  python-version:
    required: true
    description: Python version to install
runs:
  using: composite
  steps:
    - name: Setup Python
      uses: actions/[email protected]
      with:
        python-version: ${{inputs.python-version}}
    - name: Setup pip
      shell: sh
      run: |
        python3 -m ensurepip
        python3 -m pip install --upgrade pip
    - name: Install project
      shell: sh
      run: pip install ".[dev,train]"
10 changes: 2 additions & 8 deletions .github/workflows/build-documentation.yml
@@ -20,16 +20,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v4
- name: Setup Dependencies
uses: ./.github/actions/deps
with:
python-version: '3.10'
- name: Install dependencies
run: |
sudo apt-get update
python -m pip install --upgrade pip
if [ -f docs/requirements.txt ]; then pip install -r docs/requirements.txt; fi
pip install .
- name: Install notebook requirements
run: |
sudo apt-get install pandoc
11 changes: 2 additions & 9 deletions .github/workflows/linting.yml
@@ -18,17 +18,10 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
- name: Setup Dependencies
uses: ./.github/actions/deps
with:
python-version: '3.10'
- name: Install dependencies
run: |
sudo apt-get update
python -m pip install --upgrade pip
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Analyze code with linter
uses: psf/black@stable
with:
11 changes: 2 additions & 9 deletions .github/workflows/pre-commit-ci.yml
@@ -16,17 +16,10 @@ jobs:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v4
- name: Setup Dependencies
uses: ./.github/actions/deps
with:
python-version: '3.10'
- name: Install dependencies
run: |
sudo apt-get update
python -m pip install --upgrade pip
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- uses: pre-commit/[email protected]
with:
extra_args: --from-ref ${{ github.event.pull_request.base.sha }} --to-ref ${{ github.event.pull_request.head.sha }}
11 changes: 2 additions & 9 deletions .github/workflows/smoke-test.yml
@@ -24,17 +24,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
- name: Setup Dependencies
uses: ./.github/actions/deps
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
sudo apt-get update
python -m pip install --upgrade pip
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: List dependencies
run: |
pip list
13 changes: 3 additions & 10 deletions .github/workflows/testing-and-coverage.yml
@@ -19,19 +19,12 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
- name: Setup Dependencies
uses: ./.github/actions/deps
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
sudo apt-get update
python -m pip install --upgrade pip
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Run unit tests with pytest
run: |
python -m pytest tests --cov=autora_doc --cov-report=xml
python -m pytest tests/* --cov=autora --cov-report=xml
- name: Upload coverage report to codecov
uses: codecov/codecov-action@v3
11 changes: 2 additions & 9 deletions .github/workflows/type-checking.yml
@@ -22,17 +22,10 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
- name: Setup Dependencies
uses: ./.github/actions/deps
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
sudo apt-get update
python -m pip install --upgrade pip
pip install .
pip install .[dev]
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Analyze code with mypy

run: |
3 changes: 3 additions & 0 deletions .gitignore
@@ -145,3 +145,6 @@ tmp/
# Airspeed Velocity performance results
_results/
_html/

# mlflow output
mlruns/
10 changes: 10 additions & 0 deletions .mypy.ini
@@ -0,0 +1,10 @@
[mypy]
strict = True
mypy_path = src/.
explicit_package_bases = True

[mypy-transformers.*]
ignore_missing_imports = True

[mypy-mlflow.*]
ignore_missing_imports = True
6 changes: 0 additions & 6 deletions .pre-commit-config.yaml
@@ -79,12 +79,6 @@ repos:
language: system
types: [python]
files: ^(src|tests)/
args:
[

"--strict", # Use mypy strict mode to enforce type hints

]


# Run unit tests, verify that they pass. Note that coverage is run against
58 changes: 47 additions & 11 deletions README.md
@@ -1,30 +1,25 @@
# autora-doc
# AutoDoc

[![Template](https://img.shields.io/badge/Template-LINCC%20Frameworks%20Python%20Project%20Template-brightgreen)](https://lincc-ppt.readthedocs.io/en/latest/)

[![PyPI](https://img.shields.io/pypi/v/autora-doc?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/autora-doc/)
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/AutoResearch/autora-doc/smoke-test.yml)](https://github.com/AutoResearch/autora-doc/actions/workflows/smoke-test.yml)
[![codecov](https://codecov.io/gh/AutoResearch/autora-doc/branch/main/graph/badge.svg)](https://codecov.io/gh/AutoResearch/autora-doc)
[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/AutoResearch/autodoc/smoke-test.yml)](https://github.com/AutoResearch/autodoc/actions/workflows/smoke-test.yml)
[![codecov](https://codecov.io/gh/AutoResearch/autodoc/branch/main/graph/badge.svg)](https://codecov.io/gh/AutoResearch/autodoc)
[![Read the Docs](https://img.shields.io/readthedocs/autora-doc)](https://autora-doc.readthedocs.io/)

This project was automatically generated using the LINCC-Frameworks
[python-project-template](https://github.com/lincc-frameworks/python-project-template).

A repository badge was added to show that this project uses the python-project-template, however it's up to
you whether or not you'd like to display it!

For more information about the project template see the
[python-project-template](https://github.com/lincc-frameworks/python-project-template). For more information about the project template see the
[documentation](https://lincc-ppt.readthedocs.io/en/latest/).

## Dev Guide - Getting Started

Before installing any dependencies or writing code, it's a great idea to create a
virtual environment. LINCC-Frameworks engineers primarily use `conda` to manage virtual
virtual environment. We recommend using `conda` to manage virtual
environments. If you have conda installed locally, you can run the following to
create and activate a new environment.

```
>> conda create -n <env_name> python=3.10
>> conda create -n <env_name> python=3.8
>> conda activate <env_name>
```

@@ -47,3 +42,44 @@ Notes:
into documentation for ReadTheDocs works as expected. For more information, see
the Python Project Template documentation on
[Sphinx and Python Notebooks](https://lincc-ppt.readthedocs.io/en/latest/practices/sphinx.html#python-notebooks)


## Running AzureML pipelines

This repo contains the evaluation and training pipelines for AutoDoc.

### Prerequisites

[Install Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)

Add the ML extension:
```
az extension add --name ml
```

Configure the CLI:

```
az login
az account set --subscription "<your subscription name>"
az configure --defaults workspace=<aml workspace> group=<resource group> location=<location, e.g. westus3>
```

### Uploading data

Example:
```sh
az storage blob upload --account-name <account> --container <container> --file data/data.jsonl -n data/sweetpea/data.jsonl
```

### Running jobs

Prediction
```sh
az ml job create -f azureml/predict.yml --set display_name="Test prediction job" --web
```

Notes:
- `--name` sets the mlflow run id
- `--set display_name="..."` sets the name shown in the experiment dashboard
- `--web` opens a browser window for tracking the job.
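
The prediction command above can also be assembled from a script, for example when submitting several jobs with different display names. The following is a minimal sketch, not part of this commit: `build_job_command` is a hypothetical helper, and it only uses the `-f`, `--set display_name=...`, and `--web` options shown in this README.

```python
import shlex
from typing import List, Optional


def build_job_command(
    job_file: str, display_name: Optional[str] = None, web: bool = False
) -> List[str]:
    """Assemble the argument list for `az ml job create` (hypothetical helper)."""
    cmd = ["az", "ml", "job", "create", "-f", job_file]
    if display_name is not None:
        # Overrides the display_name field of the job YAML.
        cmd += ["--set", f"display_name={display_name}"]
    if web:
        # Asks the CLI to open the job page in a browser.
        cmd.append("--web")
    return cmd


cmd = build_job_command(
    "azureml/predict.yml", display_name="Test prediction job", web=True
)
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with display names that contain spaces.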
18 changes: 18 additions & 0 deletions azureml/conda.yml
@@ -0,0 +1,18 @@
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
    - azureml-core
    - typer
    - jsonlines
    - accelerate>=0.24.1
    - bitsandbytes>=0.41.2.post2
    - transformers>=4.35.2
    - xformers
    - scipy
    # This works, while installing pytorch and cuda from conda does not
    - torch==2.0.1
25 changes: 25 additions & 0 deletions azureml/predict.yml
@@ -0,0 +1,25 @@
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python -m autora.doc.pipelines.main predict ${{inputs.data_dir}}/data.jsonl ${{inputs.model_dir}}/llama-2-7b-chat-hf
code: ../src
inputs:
  data_dir:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/data/sweetpea/
  model_dir:
    type: uri_folder
    path: azureml://datastores/workspaceblobstore/paths/base_models
# using a curated environment doesn't work because we need additional packages
environment: # azureml://registries/azureml/environments/acpt-pytorch-2.0-cuda11.7/versions/21
  image: mcr.microsoft.com/azureml/curated/acpt-pytorch-2.0-cuda11.7:21
  # These didn't work
  # image: mcr.microsoft.com/aifx/acpt/stable-ubuntu2004-cu117-py38-torch201:biweekly.202310.3
  # image: mcr.microsoft.com/azureml/curated/acpt-pytorch-1.13-cuda11.7:latest
  # image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04
  # image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
  # image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04
  # image: nvcr.io/nvidia/pytorch:23.10-py3
  conda_file: conda.yml
display_name: autodoc_prediction
compute: azureml:v100cluster
experiment_name: autodoc_prediction
description: |
3 changes: 3 additions & 0 deletions data/data.jsonl
@@ -0,0 +1,3 @@
{"instruction": "from sweetpea import *\n\ncolor = Factor('color', ['red', 'green', 'blue', 'yellow'])\nword = Factor('word', ['red', 'green', 'blue', 'yellow'])\n\ndef is_congruent(word, color):\n return (word == color)\n\ndef is_not_congruent(word, color):\n return not is_congruent(word, color)\n\ncongruent = DerivedLevel('congruent', WithinTrial(is_congruent, [word, color]))\nincongruent = DerivedLevel('incongruent', WithinTrial(is_not_congruent, [word, color]))\n\ncongruency = Factor('congruency', [congruent, incongruent])\n\nconstraints = [MinimumTrials(48)]\ndesign = [word, color, congruency]\ncrossing = [word, congruency]\n\nblock = CrossBlock(design, crossing, constraints)\n\nexperiment = synthesize_trials(block, 1)\n\nsave_experiments_csv(block, experiment, 'code_1_sequences/seq')", "output": "There are two regular factors: color and word. The color factor consists of four levels: \"red\", \"green\", \"blue\", and \"yellow\". The word factor also consists of the four levels: \"red\", \"green\", \"blue\", and \"yellow\". There is another derived factor referred to as congruency. The congruency factor depends on the regular factors word and color and has two levels: \"congruent\" and \"incongruent\". A trial is considered \"congruent\" if the word matches the color, otherwise, it is considered \"incongruent\". We counterbalanced the word factor with the congruency factor. All experiment sequences contained at least 48 trials."}
{"instruction": "from sweetpea import *\nfrom sweetpea.primitives import *\nfrom sweetpea.constraints import *\n\nletter = Factor('letter', ['b', 'c', 'd', 'e'])\n\ndef is_target(letter):\n return letter[0] == letter[2]\ndef is_not_target(letter):\n return not is_target(letter)\n\ntarget_true = DerivedLevel(1, window(is_target, [letter], 3, 1), 2)\ntarget_false = DerivedLevel(0, window(is_not_target, [letter], 3, 1), 3)\n\ntarget = Factor('target', [target_true, target_false])\n\n\nblock = CrossBlock([letter, target], [letter, target], [])\n\nexperiment = synthesize_trials(block, 1)\n\nsave_experiments_csv(block, experiment, 'code_1_sequences/seq')", "output": "There is one regular factor: letter. The letter factor consists of the four letters: \"b\", \"c\", \"d\", and \"e\". There is another derived factor referred to as target. The target factor has two levels: 1 and 0. It depends on a moving trial window (for the last three trials) defined by the letter factor. If the letter on the current trial matches the letter two trials back, then the trial is 1. Conversely, if the letter on the current trial does not match the letter two trials back, then the trial is 0. The ratio between the 1 level and the 0 level was 2 to 3. We counterbalanced the letter factor with the target factor."}
{"instruction": "from sweetpea import *\nfrom sweetpea.primitives import *\n\nnumber_list = [125, 132, 139, 146, 160, 167, 174, 181]\nletter_list = ['b', 'd', 'f', 'h', 's', 'u', 'w', 'y']\n\nnumber = Factor(\"number\", number_list)\nletter = Factor(\"letter\", letter_list)\ntask = Factor(\"task\", [\"number task\", \"letter task\", \"free choice task\"])\n\n\ndef is_forced_trial_switch(task):\n return (task[-1] == \"number task\" and task[0] == \"letter task\") or \\\n (task[-1] == \"letter task\" and task[0] == \"number task\")\n\n\ndef is_forced_trial_repeat(task):\n return (task[-1] == \"number task\" and task[0] == \"number task\") or \\\n (task[-1] == \"letter task\" and task[0] == \"letter task\")\n\n\ndef is_free_trial_transition(task):\n return task[-1] != \"free choice task\" and task[0] == \"free choice task\"\n\n\ndef is_free_trial_repeat(task):\n return task[-1] == \"free choice task\" and task[0] == \"free choice task\"\n\n\ndef is_not_relevant_transition(task):\n return not (is_forced_trial_repeat(task) or is_forced_trial_switch(task) or is_free_trial_repeat(\n task) or is_free_trial_transition(task))\n\n\ntransit = Factor(\"task transition\", [\n DerivedLevel(\"forced switch\", transition(is_forced_trial_switch, [task]), 3),\n DerivedLevel(\"forced repeat\", transition(is_forced_trial_repeat, [task])),\n DerivedLevel(\"free transition\", transition(is_free_trial_transition, [task]), 4),\n DerivedLevel(\"free repeat\", transition(is_free_trial_repeat, [task]), 4),\n DerivedLevel(\"forced first\", transition(is_not_relevant_transition, [task]), 4)\n])\ndesign = [letter, number, task, transit]\ncrossing = [[letter], [number], [transit]]\nconstraints = [MinimumTrials(256)]\n\nblock = MultiCrossBlock(design, crossing, constraints)\n\nexperiment = synthesize_trials(block, 1)\n\nsave_experiments_csv(block, experiment, 'code_1_sequences/seq')\n", "output": "There are three regular factors: number, letter, and task. The number factor consists of eight levels: 125, 132, 139, 146, 160, 167, 174, and 181. The letter factor consists of eight levels: \"b\", \"d\", \"f\", \"h\", \"s\", \"u\", \"w\", and \"y\". The task factor consists of three levels: \"number task\", \"letter task\", and \"free choice task\". There is another derived factor referred to as task transition. The task transition factor depends on the transition of the task factor and has five levels: \"forced switch\", \"forced repeat\", \"free transition\", \"free repeat\", and \"forced first\". If the task on the current trial is \"number task\" and the task on the previous trial is \"letter task\" or if the task on the current trial is \"letter task\" and the task on the previous trial is \"number task\", then the trial is considered a \"forced switch\". If the task on the current trial is \"number task\" and the task on the previous trial is \"number task\" or if the task on the current trial is \"letter task\" and the task on the previous trial is \"letter task\", then the trial is considered a \"forced repeat\". If the task on the current trial is \"free choice task\" and the task on the previous trial is not \"free choice task\", then the trial is considered a \"free transition\". If the task on the current trial is \"free choice task\" and the task on the previous trial is \"free choice task\", then the trial is considered a \"free repeat\". If the task on the current trial is not \"free choice task\" and the task on the previous trial is \"free choice task\", then the trial is considered a \"forced first\". The ratio between the \"forced switch\" level and the \"forced repeat\" level was 3 to 1. The ratio between the \"free transition\" level and the \"free repeat\" level was 4 to 4. The ratio between the \"forced first\" level and the \"free repeat\" level was 4 to 4. The factors letter, number and task transition were counterbalanced individually. All experiment sequences contained at least 256 trials."}
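
Each record in `data/data.jsonl` is a single-line JSON object with an `instruction` field (SweetPea code) and an `output` field (its prose description). The sketch below shows one way such a file could be read and checked before upload. It is an illustration only: `read_records` is a hypothetical helper that uses the standard library rather than the `jsonlines` package, and the sample record is a shortened stand-in, not actual project data.

```python
import io
import json
from typing import Dict, Iterator, TextIO

REQUIRED_FIELDS = {"instruction", "output"}


def read_records(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-empty line, checking the expected fields (hypothetical helper)."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing fields {sorted(missing)}")
        yield record


# Shortened stand-in record; real records hold full SweetPea scripts.
sample = io.StringIO(
    '{"instruction": "from sweetpea import *", "output": "There is one regular factor."}\n'
)
records = list(read_records(sample))
print(len(records), records[0]["output"])
```

For the real file, open it with `open("data/data.jsonl", encoding="utf-8")` and pass the handle to `read_records`. A record that is split across two lines will fail in `json.loads`, which makes this a quick sanity check.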