
Ruff #175

Merged: 7 commits, Feb 7, 2024

1 change: 0 additions & 1 deletion .github/pull_request_template.md
@@ -2,5 +2,4 @@
### Checklist
- [ ] Consider if documentation (like in `docs/`) needs to be updated
- [ ] Consider if tests should be added
- [ ] Run pylint if you're making changes beyond adding studies
- [ ] Update template repo if there are changes to study configuration
4 changes: 2 additions & 2 deletions .github/workflows/ci.yaml
@@ -47,10 +47,10 @@ jobs:
- name: Run sqlfluff on jinja templates
run: |
sqlfluff lint
- name: Run black
- name: Run ruff
if: success() || failure() # still run black if above checks fails
run: |
black --check --verbose .
ruff
mikix marked this conversation as resolved.
regression:
runs-on: ubuntu-22.04
permissions:
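For context on the bare `ruff` invocation in this step: at the v0.2.x pin used by this PR it ran the linter directly, but later ruff releases deprecate the bare form in favor of explicit subcommands. A hypothetical updated step (not part of this PR) might look like:

```yaml
# Hypothetical ci.yaml step: later ruff releases split linting from
# format checking into explicit subcommands.
- name: Run ruff
  if: success() || failure()  # still run ruff if above checks fail
  run: |
    ruff check .           # lint (what bare `ruff` did at v0.2.x)
    ruff format --check .  # verify formatting without rewriting files
```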
15 changes: 5 additions & 10 deletions .pre-commit-config.yaml
@@ -1,15 +1,10 @@
repos:
- repo: https://github.com/psf/black
#this version is synced with the black mentioned in .github/workflows/ci.yml
rev: 23.10.0
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.2.1
hooks:
- id: black
entry: bash -c 'black "$@"; git add -u' --
# It is recommended to specify the latest version of Python
# supported by your project here, or alternatively use
# pre-commit's default_language_version, see
# https://pre-commit.com/#top_level-default_language_version
language_version: python3.9
- id: ruff
args: [--fix]
Contributor:

Ah, here's the philosophical rub. We've talked about how appropriate it is to automatically fix bad code for the developer in a way that doesn't bring them into the loop so they can learn to do better.

This does bring them into the loop a little bit, by not auto-committing the results at least. So I'd like to keep it that way, if/when we make the auto-formatting auto-committed.

I'd personally prefer to keep such checks out of the commit hooks altogether because of my WIP-commit-heavy workflow. I assume your workflow is different.

Anyway, we don't need to agree on this kind of thing. Just raising the point again.

Contributor Author:

OK, per this and our chat off-PR about this last night, I did some digging and found that pre-commit supports hooks at basically every lifecycle stage, so I've moved this check to pre-push. It doesn't look like that hook type is installed by default, so you have to specify the stage with `pre-commit install --hook-type pre-push`, but it then does the linting only on push to the remote.
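The arrangement described here can be sketched as follows (a hypothetical config fragment adapted from the ruff hooks in this PR, not the PR's actual file; note that older pre-commit versions spell the stage `push` rather than `pre-push`):

```yaml
# Hypothetical .pre-commit-config.yaml fragment: run the ruff hooks only
# at the pre-push stage, so WIP commits are not linted.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.2.1
    hooks:
      - id: ruff
        args: [--fix]
        stages: [pre-push]  # spelled `push` on older pre-commit releases
      - id: ruff-format
        stages: [pre-push]
```

The plain `pre-commit install` only installs the commit-stage hook script; the push-stage script needs `pre-commit install --hook-type pre-push`.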

Contributor:

Interesting... OK: we have the pre-commit instructions in our CONTRIBUTING.md docs usually (I think that's what we named them); maybe those need updating.

It's annoying that pre-commit isn't more turnkey, but fine 😄

Contributor Author:

Ah yep, meant to do that and spaced out; bumped the docs.

- id: ruff-format
Contributor:

On the other hand, I find it awkward to push formatting in front of the dev; they didn't really do anything wrong. So this I'd like to see return to auto-commit status at some point, once we trust it (which, off-PR, we talked about as the reason for this change).


- repo: https://github.com/sqlfluff/sqlfluff
rev: 2.3.4
13 changes: 7 additions & 6 deletions cumulus_library/base_table_builder.py
@@ -3,12 +3,11 @@
import pathlib
import re
import sys

from abc import ABC, abstractmethod
from typing import final

from cumulus_library.databases import DatabaseCursor
from cumulus_library import base_utils
from cumulus_library.databases import DatabaseCursor


class BaseTableBuilder(ABC):
@@ -70,8 +69,8 @@ def execute_queries(
)

table_name = table_name[0]
# if it contains a schema, remove it (usually it won't, but some CTAS
# forms may)
# if it contains a schema, remove it (usually it won't, but some
# CTAS forms may)
if "." in table_name:
table_name = table_name.split(".")[1].replace('"', "")
table_names.append(table_name)
@@ -94,7 +93,7 @@ def execute_queries(

self.post_execution(cursor, schema, verbose, drop_table, *args, **kwargs)

def post_execution(
def post_execution( # noqa: B027 - this looks like, but is not, an abstract method
self,
cursor: DatabaseCursor,
schema: str,
Expand Down Expand Up @@ -122,7 +121,9 @@ def comment_queries(self, doc_str=None):
commented_queries.pop()
self.queries = commented_queries

def write_queries(self, path: pathlib.Path = pathlib.Path.cwd() / "output.sql"):
def write_queries(self, path: pathlib.Path | None = None):
if path is None:
path = pathlib.Path.cwd() / "output.sql"
"""writes all queries constructed by prepare_queries to disk"""
dogversioning marked this conversation as resolved.
path.parents[0].mkdir(parents=True, exist_ok=True)
with open(path, "w", encoding="utf-8") as file:
14 changes: 6 additions & 8 deletions cumulus_library/base_utils.py
@@ -1,11 +1,9 @@
""" Collection of small commonly used utility functions """

import datetime
import os
import json

import os
from contextlib import contextmanager
from typing import List

from rich import progress

@@ -15,16 +13,16 @@ def filepath(filename: str) -> str:


def load_text(path: str) -> str:
with open(path, "r", encoding="UTF-8") as fp:
with open(path, encoding="UTF-8") as fp:
return fp.read()


def load_json(path: str) -> dict:
with open(path, "r", encoding="UTF-8") as fp:
with open(path, encoding="UTF-8") as fp:
return json.load(fp)


def parse_sql(sql_text: str) -> List[str]:
def parse_sql(sql_text: str) -> list[str]:
commands = []

for statement in sql_text.split(";"):
@@ -36,11 +34,11 @@ def parse_sql(sql_text: str) -> List[str]:
return filter_strip(commands)


def filter_strip(commands) -> List[str]:
def filter_strip(commands) -> list[str]:
return list(filter(None, [c.strip() for c in commands]))


def list_coding(code_display: dict, system=None) -> List[dict]:
def list_coding(code_display: dict, system=None) -> list[dict]:
as_list = []
for code, display in code_display.items():
if system:
25 changes: 11 additions & 14 deletions cumulus_library/cli.py
@@ -7,18 +7,15 @@
import sys
import sysconfig


from typing import Dict, List, Optional

import rich

from cumulus_library import (
__version__,
base_utils,
cli_parser,
databases,
enums,
errors,
base_utils,
protected_table_builder,
study_parser,
upload,
@@ -60,8 +57,8 @@ def update_transactions(self, prefix: str, status: str):

def clean_study(
self,
targets: List[str],
study_dict: Dict,
targets: list[str],
study_dict: dict,
*,
stats_clean: bool,
prefix: bool = False,
@@ -106,7 +103,7 @@ def clean_and_build_study(
target: pathlib.Path,
*,
stats_build: bool,
continue_from: str = None,
continue_from: str | None = None,
) -> None:
"""Recreates study views/tables

@@ -176,7 +173,7 @@ def run_single_table_builder(
parser=self.db.parser(),
)

def clean_and_build_all(self, study_dict: Dict, stats_build: bool) -> None:
def clean_and_build_all(self, study_dict: dict, stats_build: bool) -> None:
"""Builds views for all studies.

NOTE: By design, this method will always exclude the `template` study dir,
Expand Down Expand Up @@ -206,7 +203,7 @@ def export_study(self, target: pathlib.Path, data_path: pathlib.Path) -> None:
studyparser = study_parser.StudyManifestParser(target, data_path)
studyparser.export_study(self.db, data_path)

def export_all(self, study_dict: Dict, data_path: pathlib.Path):
def export_all(self, study_dict: dict, data_path: pathlib.Path):
"""Exports all defined count tables to disk"""
for key in study_dict.keys():
self.export_study(study_dict[key], data_path)
@@ -254,7 +251,7 @@ def create_template(path: str) -> None:
dest_path.write_bytes(source_path.read_bytes())


def get_study_dict(alt_dir_paths: List) -> Optional[Dict[str, pathlib.Path]]:
def get_study_dict(alt_dir_paths: list) -> dict[str, pathlib.Path] | None:
"""Gets valid study targets from ./studies/, and any pip installed studies

:returns: A list of Path objects
@@ -264,11 +261,12 @@ def get_study_dict(alt_dir_paths: List) -> Optional[Dict[str, pathlib.Path]]:

# first, we'll get any installed public studies
with open(
pathlib.Path(cli_path, "./module_allowlist.json"), "r", encoding="utf-8"
pathlib.Path(cli_path, "./module_allowlist.json"), encoding="utf-8"
) as study_allowlist_json:
study_allowlist = json.load(study_allowlist_json)["allowlist"]
site_packages_dir = sysconfig.get_path("purelib")
site_packages_dir = "".join(sysconfig.get_path("purelib"))
dogversioning marked this conversation as resolved.
for study, subdir in study_allowlist.items():
print(site_packages_dir)
dogversioning marked this conversation as resolved.
study_path = pathlib.Path(site_packages_dir, subdir)
if study_path.exists():
manifest_studies[study] = study_path
@@ -295,7 +293,7 @@ def get_studies_by_manifest_path(path: pathlib.Path) -> dict:
return manifest_paths


def run_cli(args: Dict):
def run_cli(args: dict):
"""Controls which library tasks are run based on CLI arguments"""
if args["action"] == "create":
create_template(args["create_dir"])
@@ -312,7 +310,6 @@ def run_cli(args: Dict):
runner.verbose = True
print("Testing connection to database...")
runner.cursor.execute("SHOW DATABASES")

study_dict = get_study_dict(args["study_dir"])
if "prefix" not in args.keys():
if args["target"]:
18 changes: 11 additions & 7 deletions cumulus_library/cli_parser.py
@@ -29,10 +29,10 @@ def add_study_dir_argument(parser: argparse.ArgumentParser) -> None:
action="append",
help=(
"Optionally add one or more directories to look for study definitions in. "
"Default is in project directory and CUMULUS_LIBRARY_STUDY_DIR, if present, "
"followed by any supplied paths. Target, and all its subdirectories, "
"are checked for manifests. Overriding studies with the same namespace "
"supersede earlier ones."
"Default is in project directory and CUMULUS_LIBRARY_STUDY_DIR, "
"if present, followed by any supplied paths. Target, and all its "
"subdirectories, are checked for manifests. Overriding studies with the"
" same namespace supersede earlier ones."
),
)

@@ -87,10 +87,14 @@ def add_db_config(parser: argparse.ArgumentParser) -> None:
)
group.add_argument(
"--database",
# In Athena, we use this as the schema_name (which is also called a Database in their UX).
# In Athena, we use this as the schema_name (which is also called a Database
# in their UX).
#
# In DuckDB, we use this as the path to the filename to store tables.
# Since we started as an Athena-centric codebase, we mostly keep referring to this as
# name "schema_name". But to the user, both uses are still conceptually a "database".
#
# Since we started as an Athena-centric codebase, we mostly keep referring to
# this as name "schema_name". But to the user, both uses are still conceptually
# a "database".
dest="schema_name",
help="Database name (for Athena) or file (for DuckDB)",
)
41 changes: 24 additions & 17 deletions cumulus_library/databases.py
@@ -14,7 +14,7 @@
import os
import sys
from pathlib import Path
from typing import Optional, Protocol, Union
from typing import Protocol

import cumulus_fhir_support
import duckdb
@@ -31,13 +31,13 @@ class DatabaseCursor(Protocol):
def execute(self, sql: str) -> None:
pass

def fetchone(self) -> Optional[list]:
def fetchone(self) -> list | None:
pass

def fetchmany(self, size: Optional[int]) -> Optional[list[list]]:
def fetchmany(self, size: int | None) -> list[list] | None:
pass

def fetchall(self) -> Optional[list[list]]:
def fetchall(self) -> list[list] | None:
pass


@@ -131,6 +131,7 @@ def execute_as_pandas(self, sql: str) -> pandas.DataFrame:
def parser(self) -> DatabaseParser:
"""Returns parser object for interrogating DB schemas"""

@abc.abstractmethod
def close(self) -> None:
"""Clean up any resources necessary"""

@@ -174,6 +175,9 @@ def execute_as_pandas(self, sql: str) -> pandas.DataFrame:
def parser(self) -> DatabaseParser:
return AthenaParser()

def close(self) -> None:
return self.connection.close()


class AthenaParser(DatabaseParser):
def validate_table_schema(
@@ -190,7 +194,7 @@ def __init__(self, db_file: str):
super().__init__("main")
self.connection = duckdb.connect(db_file)
# Aliasing Athena's as_pandas to duckDB's df cast
setattr(duckdb.DuckDBPyConnection, "as_pandas", duckdb.DuckDBPyConnection.df)
duckdb.DuckDBPyConnection.as_pandas = duckdb.DuckDBPyConnection.df

# Paper over some syntax differences between Athena and DuckDB
self.connection.create_function(
@@ -238,22 +242,24 @@ def __init__(self, db_file: str):
)

def insert_tables(self, tables: dict[str, pyarrow.Table]) -> None:
"""Ingests all ndjson data from a folder tree (often the output folder of Cumulus ETL)"""
"""Ingests all ndjson data from a folder tree.

This is often the output folder of Cumulus ETL"""
for name, table in tables.items():
self.connection.register(name, table)

@staticmethod
def _compat_array_join(
value: Optional[list[Optional[str]]], delimiter: str
) -> Optional[str]:
value: list[str | None] | None, delimiter: str
) -> str | None:
Comment on lines -247 to +254
Contributor:

It can be None alright?! Geeze...

Contributor Author:

This was my least favorite addition. We could disable it?

Contributor:

That wouldn't help here exactly, because these were explicit Optionals, not implicit.

But I can see the logic for disallowing implicit optional: if you change the default value to not be None anymore, it's a little surprising for that to also change the type of the arg itself. But that's the kind of argument for a library with a strict API, not just devs making code.

I'm fine either way. Disabling individual rules is a little annoying, config wise, but maybe it's a win overall.

if value is None:
return None
return delimiter.join(v for v in value if v is not None)

@staticmethod
def _compat_date(
value: Union[str, datetime.datetime, datetime.date, None]
) -> Optional[datetime.date]:
value: str | datetime.datetime | datetime.date | None,
) -> datetime.date | None:
if value is None:
return None
elif isinstance(value, str):
Expand All @@ -266,14 +272,14 @@ def _compat_date(
raise ValueError("Unexpected date() argument:", type(value), value)

@staticmethod
def _compat_to_utf8(value: Optional[str]) -> Optional[datetime.date]:
def _compat_to_utf8(value: str | None) -> datetime.date | None:
"""See the create_function() call for to_utf8 for more background"""
return value

@staticmethod
def _compat_from_iso8601_timestamp(
value: Optional[str],
) -> Optional[datetime.datetime]:
value: str | None,
) -> datetime.datetime | None:
if value is None:
return None

@@ -329,12 +335,13 @@ def read_ndjson_dir(path: str) -> dict[str, pyarrow.Table]:
"""Loads a directory tree of raw ndjson into schema-ful tables.

:param path: a directory path
:returns: dictionary of table names (like 'documentreference') to table data (with schema)
:returns: dictionary of table names (like 'documentreference') to table
data (with schema)
"""
all_tables = {}

# Manually specify the list of resources because we want to create each table even if the
# folder does not exist.
# Manually specify the list of resources because we want to create each table
# even if the folder does not exist.
resources = [
"AllergyIntolerance",
"Condition",
@@ -364,7 +371,7 @@ def read_ndjson_dir(path: str) -> dict[str, pyarrow.Table]:
# Read all ndjson directly into memory
rows = []
for filename in filenames:
with open(filename, "r", encoding="utf8") as f:
with open(filename, encoding="utf8") as f:
for line in f:
rows.append(json.loads(line))
