Commit

Major refactor to align with rprojroot and here
jamesmyatt committed Sep 14, 2020
1 parent 9fa6093 commit a315c57
Showing 9 changed files with 295 additions and 107 deletions.
60 changes: 47 additions & 13 deletions README.md
@@ -1,6 +1,12 @@
# Find relative paths from a project root directory
# Project-oriented workflow in Python

Finding project directories in Python (data science) projects, just like the R [`here`][here] and [`rprojroot`][rprojroot] packages.
Finding project directories in Python (data science) projects.

This library aims to provide both
the programmatic functionality from the R [`rprojroot`][rprojroot] package
and the interactive functionality from the R [`here`][here] package.

## Motivation

**Problem**: I have a project that has a specific folder structure,
for example, one mentioned in [Noble 2009][noble2009] or something similar to [this project template][project-template],
@@ -11,60 +17,86 @@ and I want to be able to:
3. Reference datasets from a root directory when using a jupyter notebook because every time I use a jupyter notebook,
the working directory changes to the location of the notebook, not where I launched the notebook server.

**Solution**: `pyprojroot` finds the root working directory for your project as a `pathlib` object.
**Solution**: `pyprojroot` finds the root working directory for your project as a `pathlib.Path` object.
You can now use the `here` function to pass in a relative path from the project root directory
(no matter which working directory within the project you are in),
and you will get a full path to the specified file.
That is, in a jupyter notebook,
you can write something like `pandas.read_csv(here('./data/my_data.csv'))`
you can write something like `pandas.read_csv(here('data/my_data.csv'))`
instead of `pandas.read_csv('../data/my_data.csv')`.
This allows you to restructure the files in your project without having to worry about changing file paths.

Great for reading and writing datasets!
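The core idea behind `here()` can be sketched with the standard library alone: start from a directory, walk upward until a directory containing a marker is found, then join the relative path onto it. This is a minimal illustration, not pyprojroot's implementation; `find_marker_root` and `here_sketch` are hypothetical names, and the marker list (`.git`, `pyproject.toml`) is an assumption chosen for the example.

```python
from pathlib import Path
from typing import Iterable, Optional, Union


def find_marker_root(
    markers: Iterable[str], start: Optional[Union[str, Path]] = None
) -> Path:
    """Return the first directory, starting at `start` and moving upward,
    that contains any of the given marker files or directories."""
    start_path = Path.cwd() if start is None else Path(start).resolve()
    marker_list = list(markers)
    # Check the start directory itself, then each of its parents in turn.
    for candidate in (start_path, *start_path.parents):
        if any((candidate / m).exists() for m in marker_list):
            return candidate
    raise RuntimeError("Project root not found.")


def here_sketch(
    relative_path: str = "", start: Optional[Union[str, Path]] = None
) -> Path:
    """Join `relative_path` onto the detected project root (hypothetical helper)."""
    # Assumed marker set for this sketch only.
    root = find_marker_root([".git", "pyproject.toml"], start=start)
    return root / relative_path if relative_path else root
```

Called from `<root>/notebooks`, `here_sketch("data/my_data.csv")` would return `<root>/data/my_data.csv`, provided `<root>` holds one of the markers and no directory in between does.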

Further reading:

* [Project-oriented workflows](https://www.tidyverse.org/articles/2017/12/workflow-vs-script/)
* [Stop the working directory insanity](https://gist.github.com/jennybc/362f52446fe1ebc4c49f)
* [Ode to the here package](https://github.com/jennybc/here_here)

## Installation

### pip

```bash
pip install pyprojroot
python -m pip install pyprojroot
```

### conda

https://anaconda.org/conda-forge/pyprojroot

```bash
conda install -c conda-forge pyprojroot
```

## Usage
## Example Usage

### Interactive

This is based on the R [`here`][here] library.

```python
from pyprojroot import here
from pyprojroot.here import here

here()
```

### Example
### Programmatic

This is based on the R [`rprojroot`][rprojroot] library.

```python
import pyprojroot

base_path = pyprojroot.find_root(pyprojroot.has_dir(".git"))
```
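In the programmatic API a criterion is ultimately a predicate on a directory. The sketch below shows one way to normalise the loose inputs such a function might accept (a marker name, a path, a callable, or a list mixing them) into a single predicate that passes when any item matches. `as_predicate` is a hypothetical name, not part of pyprojroot. Note that plain `str` has to be checked explicitly, because `isinstance("x", os.PathLike)` is `False`: strings do not implement `__fspath__`.

```python
import os
from pathlib import Path
from typing import Callable, Union

Predicate = Callable[[Path], bool]


def as_predicate(criterion: Union[str, os.PathLike, Predicate, list]) -> Predicate:
    """Normalise a marker name, path, callable, or list of these into one predicate."""
    if callable(criterion):
        return criterion
    # Wrap a single marker in a list; str must be listed because it is not os.PathLike.
    if isinstance(criterion, (str, os.PathLike)):
        criterion = [criterion]
    items = list(criterion)

    def predicate(path: Path) -> bool:
        for item in items:
            if isinstance(item, (str, os.PathLike)):
                # A marker matches any existing file or directory with that name.
                if (path / item).exists():
                    return True
            elif item(path):
                return True
        return False

    return predicate
```

Without the explicit `str` check, `list(".git")` would split the marker into single characters and each character would then be called as a function.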

## Demonstration

Load the packages

```
In [1]: from pyprojroot import here
In [1]: from pyprojroot.here import here
In [2]: import pandas as pd
```

The current working directory is the "notebooks" folder

```
In [3]: !pwd
/home/dchen/git/hub/scipy-2019-pandas/notebooks
```

In the notebooks folder, I have all my notebooks

```
In [4]: !ls
01-intro.ipynb 02-tidy.ipynb 03-apply.ipynb 04-plots.ipynb 05-model.ipynb Untitled.ipynb
```

If I wanted to access data in my notebooks I'd have to use `../data`

```
In [5]: !ls ../data
billboard.csv country_timeseries.csv gapminder.tsv pew.csv table1.csv table2.csv table3.csv table4a.csv table4b.csv weather.csv
@@ -73,8 +105,9 @@ billboard.csv country_timeseries.csv gapminder.tsv pew.csv table1.csv table
However, with the `here` function, I can access all my data from the project root.
This means if I move the notebook to another folder or subfolder I don't have to change the path to my data.
Only if I move the data to another folder would I need to change the path in my notebook (or script).

```
In [6]: pd.read_csv(here('./data/gapminder.tsv'), sep='\t').head()
In [6]: pd.read_csv(here('data/gapminder.tsv'), sep='\t').head()
Out[6]:
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
@@ -84,9 +117,10 @@ Out[6]:
4 Afghanistan Asia 1972 36.088 13079460 739.981106
```

By the way, you get a `pathlib` object path back!
By the way, you get a `pathlib.Path` object back!

```
In [7]: here('./data/gapminder.tsv')
In [7]: here('data/gapminder.tsv')
Out[7]: PosixPath('/home/dchen/git/hub/scipy-2019-pandas/data/gapminder.tsv')
```
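The diff shows both `here('./data/gapminder.tsv')` and `here('data/gapminder.tsv')`. Because `pathlib` collapses single-dot components when joining, the two spellings produce the same path, whereas `..` components are kept until the path is resolved. A quick standard-library check, using the project root from the demonstration output purely as an example string (no filesystem access is needed):

```python
from pathlib import PurePosixPath

# Example root string taken from the demonstration output; nothing is read from disk.
root = PurePosixPath("/home/dchen/git/hub/scipy-2019-pandas")

with_dot = root / "./data/gapminder.tsv"
without_dot = root / "data/gapminder.tsv"

# Single "." components are collapsed, so both spellings compare equal.
print(with_dot == without_dot)  # True
print(with_dot)  # /home/dchen/git/hub/scipy-2019-pandas/data/gapminder.tsv

# ".." is kept, because collapsing it could change the meaning of a path with symlinks.
print(root / "notebooks" / "../data")  # /home/dchen/git/hub/scipy-2019-pandas/notebooks/../data
```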

5 changes: 2 additions & 3 deletions pyprojroot/__init__.py
```diff
@@ -1,3 +1,2 @@
-from .pyprojroot import *
-
-__version__ = "0.2.0"
+from .criterion import *
+from .root import find_root, find_root_with_reason
```
81 changes: 81 additions & 0 deletions pyprojroot/criterion.py
@@ -0,0 +1,81 @@
```python
"""
This module is inspired by the `rprojroot` library for R.
See https://github.com/r-lib/rprojroot.
It is intended for interactive or programmatic use.
"""

import pathlib as _pathlib
import typing
from os import PathLike as _PathLike

# TODO: It would be nice to have a class that encapsulates these checks,
# so that we can implement operators such as |, ~, &, ^

# TODO: Refactor in a way that allows creation of reasons


def as_root_criterion(criterion) -> typing.Callable:
    """
    Convert a callable, a path-like marker, or a collection of these
    into a single check that passes when any item matches.
    """
    if callable(criterion):
        return criterion

    # criterion must be a Collection, rather than just Iterable.
    # Plain strings are not os.PathLike, so they are checked explicitly.
    if isinstance(criterion, (str, _PathLike)):
        criterion = [criterion]
    criterion = list(criterion)

    def f(path: _pathlib.Path) -> bool:
        for c in criterion:
            if isinstance(c, (str, _PathLike)):
                if (path / c).exists():
                    return True
            else:
                if c(path):
                    return True
        return False

    return f


def has_file(file: _PathLike) -> typing.Callable:
    """
    Check that specified file exists in path.
    Note that a directory with that name will not match.
    """

    def f(path: _pathlib.Path) -> bool:
        return (path / file).is_file()

    return f


def has_dir(file: _PathLike) -> typing.Callable:
    """
    Check that specified directory exists.
    Note that a regular file with that name will not match.
    """

    def f(path: _pathlib.Path) -> bool:
        return (path / file).is_dir()

    return f


def matches_glob(pat: str) -> typing.Callable:
    """
    Check that glob has at least one match.
    """

    def f(path: _pathlib.Path) -> bool:
        matches = path.glob(pat)
        try:
            # Only need to get one item from generator
            next(matches)
        except StopIteration:
            return False
        else:
            return True

    return f
```
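The first TODO in `criterion.py` asks for a class that encapsulates these checks so that criteria can be combined with operators. A minimal sketch of that idea follows. `Criterion` and the `has_file`/`has_dir` variants returning it are hypothetical, not part of this commit; negation is spelled `~` (`__invert__`), since Python has no overloadable `!` operator. The `description` string is an assumption about how a "reason" might later be reported.

```python
from pathlib import Path
from typing import Callable


class Criterion:
    """Hypothetical wrapper around a `Path -> bool` check that supports |, &, ^ and ~."""

    def __init__(self, check: Callable[[Path], bool], description: str) -> None:
        self.check = check
        # Human-readable label, which could double as the "reason" for a match.
        self.description = description

    def __call__(self, path: Path) -> bool:
        return bool(self.check(path))

    def __or__(self, other: "Criterion") -> "Criterion":
        return Criterion(lambda p: self(p) or other(p), f"({self.description} | {other.description})")

    def __and__(self, other: "Criterion") -> "Criterion":
        return Criterion(lambda p: self(p) and other(p), f"({self.description} & {other.description})")

    def __xor__(self, other: "Criterion") -> "Criterion":
        return Criterion(lambda p: self(p) != other(p), f"({self.description} ^ {other.description})")

    def __invert__(self) -> "Criterion":
        return Criterion(lambda p: not self(p), f"~{self.description}")

    def __repr__(self) -> str:
        return f"Criterion({self.description})"


def has_file(name: str) -> Criterion:
    """Match directories containing a regular file called `name`."""
    return Criterion(lambda p: (p / name).is_file(), f"has_file({name!r})")


def has_dir(name: str) -> Criterion:
    """Match directories containing a subdirectory called `name`."""
    return Criterion(lambda p: (p / name).is_dir(), f"has_dir({name!r})")
```

Because every operator returns another callable `Criterion`, an expression such as `has_dir(".git") & ~has_dir("build")` would still pass the `callable(criterion)` check in `as_root_criterion` unchanged.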
55 changes: 55 additions & 0 deletions pyprojroot/here.py
@@ -0,0 +1,55 @@
```python
"""
This module is inspired by the `here` library for R.
See https://github.com/r-lib/here.
It is intended for interactive use only.
"""

import pathlib as _pathlib
import warnings as _warnings
from os import PathLike as _PathLike

from . import criterion
from .root import find_root, find_root_with_reason

CRITERIA = [
    criterion.has_file(".here"),
    criterion.has_dir(".git"),
    criterion.matches_glob("*.Rproj"),
    criterion.has_file("requirements.txt"),
    criterion.has_file("setup.py"),
    criterion.has_dir(".dvc"),
    criterion.has_dir(".spyproject"),
    criterion.has_file("pyproject.toml"),
    criterion.has_dir(".idea"),
    criterion.has_dir(".vscode"),
]


def get_here():
    # TODO: This should only find_root once per session
    start = _pathlib.Path.cwd()
    path, reason = find_root_with_reason(CRITERIA, start=start)
    return path, reason


# TODO: Implement set_here


def here(relative_project_path: _PathLike = "", warn_missing=False) -> _pathlib.Path:
    """
    Returns the path relative to the project's root directory.
    :param relative_project_path: relative path from project root
    :param warn_missing: warn user if path does not exist (default=False)
    :return: pathlib.Path object
    """
    path, reason = get_here()
    # TODO: Show reason when requested

    if relative_project_path:
        path = path / relative_project_path

    if warn_missing and not path.exists():
        _warnings.warn(f"Path doesn't exist: {path!s}")
    return path
```
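`get_here()` carries a TODO to find the root only once per session. One way to do that, sketched here under the assumption that a per-directory memo is acceptable, is to cache the lookup with `functools.lru_cache`. `cached_project_root`, `get_here_once` and the `MARKERS` tuple are hypothetical; they stand in for the `CRITERIA` list above rather than reuse it.

```python
import functools
from pathlib import Path

# Hypothetical marker set for this sketch only.
MARKERS = (".here", ".git", "pyproject.toml", "setup.py")


@functools.lru_cache(maxsize=None)
def cached_project_root(start: str) -> Path:
    """Walk up from `start` to the first directory containing a marker.

    Results are memoised per start directory, so repeated calls skip the
    filesystem walk. Failed lookups raise and are not cached.
    """
    start_path = Path(start).resolve()
    for candidate in (start_path, *start_path.parents):
        if any((candidate / m).exists() for m in MARKERS):
            return candidate
    raise RuntimeError("Project root not found.")


def get_here_once() -> Path:
    """Resolve the root from the current working directory, cached per directory."""
    return cached_project_root(str(Path.cwd()))
```

A string key is used because `lru_cache` needs hashable arguments and a string makes the cache key explicit. Calling `cached_project_root.cache_clear()` resets the memo if marker files are added or removed during the session.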
52 changes: 0 additions & 52 deletions pyprojroot/pyprojroot.py

This file was deleted.

66 changes: 66 additions & 0 deletions pyprojroot/root.py
@@ -0,0 +1,66 @@
```python
"""
This module is inspired by the `rprojroot` library for R.
See https://github.com/r-lib/rprojroot.
It is intended for interactive or programmatic use.
"""

import pathlib as _pathlib
import typing as _typing
from os import PathLike as _PathLike

from .criterion import as_root_criterion as _as_root_criterion


def as_start_path(start: _typing.Optional[_PathLike]) -> _pathlib.Path:
    if start is None:
        return _pathlib.Path.cwd()
    if not isinstance(start, _pathlib.Path):
        start = _pathlib.Path(start)
    # TODO: consider `start = start.resolve()`
    return start


def find_root_with_reason(
    criterion, start: _typing.Optional[_PathLike] = None
) -> _typing.Tuple[_pathlib.Path, str]:
    """
    Find directory matching root criterion with reason.
    Recursively search parents of start path for directory
    matching root criterion with reason.
    """
    # TODO: Implement reasons

    # Prepare inputs
    criterion = _as_root_criterion(criterion)
    start = as_start_path(start)

    # Check start
    if start.is_dir() and criterion(start):
        return start, "Pass"

    # Iterate over all parents
    # TODO: Consider adding maximum depth
    # TODO: Consider limiting depth to path (e.g. "if p == stop: raise")
    for p in start.parents:
        if criterion(p):
            return p, "Pass"

    # Not found
    raise RuntimeError("Project root not found.")


def find_root(
    criterion, start: _typing.Optional[_PathLike] = None, **kwargs
) -> _pathlib.Path:
    """
    Find directory matching root criterion.
    Recursively search parents of start path for directory
    matching root criterion.
    """
    # A RuntimeError from find_root_with_reason propagates unchanged.
    root, _ = find_root_with_reason(criterion, start=start, **kwargs)
    return root
```
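`find_root_with_reason` has TODOs about a maximum search depth and a stop directory. A sketch of both bounds follows; the `find_root_bounded` name and its `max_depth` and `stop` parameters are assumptions for illustration, and `criterion` is any `Path -> bool` callable.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def find_root_bounded(
    criterion: Callable[[Path], bool],
    start: Union[str, Path],
    max_depth: Optional[int] = None,
    stop: Optional[Union[str, Path]] = None,
) -> Path:
    """Search `start` and its parents for a directory matching `criterion`.

    max_depth: number of parent levels above `start` that may be examined
        (0 checks only `start`; None means no limit).
    stop: directory above which the search must not go; it is still checked itself.
    """
    start_path = Path(start).resolve()
    stop_path = Path(stop).resolve() if stop is not None else None
    # depth 0 is the start directory, depth 1 its parent, and so on.
    for depth, candidate in enumerate((start_path, *start_path.parents)):
        if max_depth is not None and depth > max_depth:
            break
        if criterion(candidate):
            return candidate
        if stop_path is not None and candidate == stop_path:
            break
    raise RuntimeError("Project root not found.")
```

Checking `stop` after the criterion means the stop directory itself can still be reported as the root, which matches the TODO's `if p == stop: raise` only for directories strictly above it.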