display ID registration prep, formatter registration fixes, index updates, dtype handlers, expanded data generators #16

Merged
merged 128 commits into from
Aug 21, 2022
f2383e6
add random dataframe generator for convenience
shouples Jul 29, 2022
57c9b3b
add logging for renderables
shouples Jul 29, 2022
99a4e60
more callout options; set display ID
shouples Jul 29, 2022
311168c
update black/flake8 configs
shouples Jul 29, 2022
338aba9
version bump; add pyarrow
shouples Jul 29, 2022
89d031d
register display ID; update black configs
shouples Jul 29, 2022
e1404cf
caffeine fever dream that needs a lot of cleanup later once prototypi…
shouples Jul 29, 2022
18df06d
add success!
shouples Jul 29, 2022
d4d568b
logging/callout updates
shouples Jul 30, 2022
f7f3816
callout update
shouples Jul 30, 2022
cd0b331
flatten multiindex series better; pass metadata in ipydisplay
shouples Aug 3, 2022
4da05d1
add settings.LOG_LEVEL; fix stringify columns
shouples Aug 3, 2022
f76eba6
version bump; add helper for generating random dataframe
shouples Aug 4, 2022
729933f
rename "default" display mode to "plain"
shouples Aug 4, 2022
a578b56
add More Info user help section after truncating dataframe
shouples Aug 4, 2022
1e505ec
ensure temporary dx.display calls revert properly
shouples Aug 4, 2022
672d27d
Merge branch 'djs/1.1.3' into djs/display-id-registration
shouples Aug 4, 2022
dfbb7f7
Merge branch 'main' into djs/display-id-registration
shouples Aug 8, 2022
332b645
hotfix
shouples Aug 8, 2022
48704df
remove old code
shouples Aug 8, 2022
eee047c
remove cell_id tracking
shouples Aug 9, 2022
4b953c6
add poetry.toml
shouples Aug 9, 2022
6bf0273
add loggers
shouples Aug 10, 2022
e4939c3
fix bug with .sample() resetting index
shouples Aug 10, 2022
e1a3715
cleanup
shouples Aug 10, 2022
d0257ef
don't mess with the original dataframe object during registration
shouples Aug 10, 2022
de29b31
remove structlog; update loggers
shouples Aug 10, 2022
e29aa04
SQL instead of parquet/pyarrow
shouples Aug 10, 2022
82a083d
more attempts at fixing column dtype wrangling
shouples Aug 10, 2022
3c96172
move filtering out of main formatter
shouples Aug 11, 2022
fda965c
check for dataframe subset associations before display ID registration
shouples Aug 11, 2022
f23b53c
push-down filter override for max display rows
shouples Aug 11, 2022
16eda13
fix set_option reference
shouples Aug 11, 2022
80c7a47
add comms config
shouples Aug 15, 2022
036b8ac
add geopandas as extra install
shouples Aug 15, 2022
d8c7fd8
move utils out of formatters dir
shouples Aug 15, 2022
9a62877
handle circular imports; add subset filter tracking; extra logging
shouples Aug 15, 2022
5614052
separate sampling from utils
shouples Aug 15, 2022
4270d0c
remove column handling from sampling
shouples Aug 15, 2022
71a9bdb
add convenience function for adding renderable
shouples Aug 15, 2022
6bcd6e7
fix reference for testing
shouples Aug 15, 2022
8c69fb2
handle missing key
shouples Aug 15, 2022
935c643
fix ref
shouples Aug 15, 2022
46a095b
disable re-rendering for updates; pass applied filters to frontend
shouples Aug 15, 2022
b143dd9
pull filters from comm msg
shouples Aug 15, 2022
b8edbbd
fix subset filter tracking; update logging
shouples Aug 15, 2022
0b3a7b7
update to default random sampling
shouples Aug 15, 2022
7ef82b0
allow passing ipython shell for registering/getting display IDs
shouples Aug 15, 2022
ef36dcf
pass new display id into sampling if needed
shouples Aug 15, 2022
8848669
update applied_filters assignment
shouples Aug 15, 2022
0b173bb
add settings_context; switch renderables to Set instead of List
shouples Aug 17, 2022
e53ee71
move sampling tests to their own file
shouples Aug 17, 2022
532a39d
add renderables test
shouples Aug 17, 2022
05957b9
remove truncating/sampling tests, fix custom index test
shouples Aug 17, 2022
e9fe281
assign unique name for unassigned variable tracking
shouples Aug 17, 2022
59f5c7f
check setting before index resets
shouples Aug 17, 2022
bdada92
flake8
shouples Aug 17, 2022
395bb4c
adjust pandas options for display/schema changes
shouples Aug 18, 2022
b6c843c
add structlog back in
shouples Aug 18, 2022
c4ba755
pull display ID registration out of get_display_id(); update logging
shouples Aug 18, 2022
eaa85f3
update display mode with settings_context; add log level changes
shouples Aug 18, 2022
e942d9e
pass index flag to dataframe_info metadata; update logging
shouples Aug 18, 2022
0dbb0f6
updates for logging
shouples Aug 18, 2022
392d131
use settings_context; don't display callout with unassigned dataframe…
shouples Aug 18, 2022
94dad9e
use update_display for user query
shouples Aug 18, 2022
e2447ed
remove hyphens from df uuid to enable sqlite filtering
shouples Aug 18, 2022
696883d
add top margin
shouples Aug 18, 2022
0a6413a
update logging; handle display ID register after rendering; update us…
shouples Aug 18, 2022
499d2e3
fix index and column stringifcation for multiindexing
shouples Aug 19, 2022
7d3ac24
more docstrings
shouples Aug 19, 2022
95ef2df
clean up settings
shouples Aug 19, 2022
d1ee209
add pandas option transfer on row/column validation
shouples Aug 19, 2022
4299b05
add flatten_index; handle index/column flattening behind settings
shouples Aug 19, 2022
0ce78bf
add docstring
shouples Aug 19, 2022
2fccd3b
don't reset multiindex level names
shouples Aug 19, 2022
60a23ca
more explicit multiindex handling
shouples Aug 19, 2022
ecbde74
add media type prefixes back to settings so env vars don't overwrite …
shouples Aug 19, 2022
9a64a77
check for custom index before normalizing
shouples Aug 19, 2022
365884a
ugh multiindex.
shouples Aug 19, 2022
2c514c2
remove import
shouples Aug 19, 2022
6e47f41
fix log message
shouples Aug 19, 2022
3ffeb76
update .gitignore
shouples Aug 19, 2022
0aca3c8
break apart utils/helpers into more readable structure to handle new …
shouples Aug 19, 2022
868eef5
remove helpers
shouples Aug 19, 2022
b19be6c
remove function imports
shouples Aug 19, 2022
0eec800
store datetime string format
shouples Aug 19, 2022
501d6f8
move geopandas check out of config
shouples Aug 19, 2022
1b56327
add faker as extra
shouples Aug 19, 2022
97bf0f6
remove comment
shouples Aug 19, 2022
723216f
handle flattening/expanding lists/sets/tuples
shouples Aug 19, 2022
1b71cfc
handle extra dtype cleaning
shouples Aug 20, 2022
4308007
fix for mixed dtypes
shouples Aug 20, 2022
b4742ed
updates for dtype generation and testing
shouples Aug 20, 2022
ce89bbd
change log time format
shouples Aug 20, 2022
4d93a26
check default index earlier; don't generate hash over and over
shouples Aug 20, 2022
4d70ef1
add ENABLE_DATALINK setting to toggle off all the tracking/hashing/etc
shouples Aug 20, 2022
962020a
remove get_applied_filters
shouples Aug 20, 2022
f39fc09
separate cleaning functions between build_table_schema/hash_pandas_ob…
shouples Aug 20, 2022
99da8b4
fix cleaning; remove extra hashing calls; update docstrings
shouples Aug 20, 2022
5d0bc58
add geopandas and faker dev dependencies for dtype testing; add isort…
shouples Aug 20, 2022
a798dc2
make sure we can toggle datalink setting on/off without errors
shouples Aug 20, 2022
0103541
generate display ID if not passed (datalink enabled)
shouples Aug 20, 2022
6924800
clean columns before rendering with datalink disabled
shouples Aug 20, 2022
41eab9a
update random_dataframe columns and testing
shouples Aug 20, 2022
e34857f
verbose unit testing in github workflow
shouples Aug 20, 2022
0345d26
handle no args, test for default data
shouples Aug 20, 2022
a8bb72e
enable datalink setting by default
shouples Aug 20, 2022
fa586fa
use settings context
shouples Aug 21, 2022
ef51366
trigger html.table_schema pandas changes on settings changes
shouples Aug 21, 2022
2b94859
remove config.py
shouples Aug 21, 2022
945d5ac
refactor display formatter registration
shouples Aug 21, 2022
c97f547
remove configs
shouples Aug 21, 2022
55b9752
remove configs
shouples Aug 21, 2022
006af14
fix settings tests
shouples Aug 21, 2022
9b7be70
fix registering tests
shouples Aug 21, 2022
e2e177b
refactor dx media type formatter registration; remove configs
shouples Aug 21, 2022
d086b4a
updates for testing to remove redundant mediatype nesting
shouples Aug 21, 2022
1eb1c78
more debug logging
shouples Aug 21, 2022
834de26
remove flatten_index and fix index/column flattening logic
shouples Aug 21, 2022
1f8013f
comms config behind datalink setting
shouples Aug 21, 2022
ba90c9b
bump up log level, disable datalink, be done with this PR
shouples Aug 21, 2022
cc2bd04
ugh patch this when datalink is disabled
shouples Aug 21, 2022
a1a7371
disable logging auto-config so it doesn't start showing other loggers
shouples Aug 21, 2022
f1cbfa5
fix log message
shouples Aug 21, 2022
ea7c4f5
don't update other loggers levels to INFO
shouples Aug 21, 2022
491c07b
turn logging back on
shouples Aug 21, 2022
95e8eaf
changelog
shouples Aug 21, 2022
04c6a77
this needs more work with the new display formatter registration
shouples Aug 21, 2022
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests.yaml
@@ -28,4 +28,4 @@ jobs:
           poetry install
       - name: Pytest - Unit tests
         run: |
-          poetry run pytest dx/tests -x
+          poetry run pytest dx/tests -xv
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,4 +4,6 @@ __pycache__/
dist/

.pytest_cache
.python-versions
.venv
.vscode
27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,33 @@
All notable changes will be documented here.

---
## Unreleased
_2022-08-21_

### Added
- `pd.Series` as a default renderable type (to go with the existing `pd.DataFrame` and `np.ndarray` types)
- Support for the following data types inside `pd.DataFrame` columns:
  - `type` and `np.dtype`
  - `shapely.geometry` objects
  - `pd.Timedelta` and `datetime.timedelta`
  - `pd.Period`
  - `pd.Interval`
  - `complex` numbers
  - `ipaddress.IPv4Address` and `.IPv6Address`
- Extra dataset generation functions for development/testing under `dx.utils.datatypes`
  - `dx.quick_random_dataframe(n_rows, n_columns)` to get a `pd.DataFrame` of 0.0-1.0 floats (convenience wrapper for `pd.DataFrame(np.random.rand(n_rows, n_columns))`)
  - `dx.random_dataframe()` with different boolean values to enable based on available datatypes (`dx.DX_DATATYPES`)
- `settings_context` context manager to allow temporarily changing a setting (or multiple)
- Logging via `structlog` (default level: `logging.WARNING`)

### Changed
- Default sampling method changed from `outer` to `random`

### Fixed
- Displaying a dataframe with an out-of-order index (like with `.sample()`) no longer resets the index before sending data to the frontend.
- Index/column flattening and string-formatting is behind settings and is handled more explicitly
- `dx` should no longer interfere with other media type / mime bundles (e.g. matplotlib) formatted by the existing IPython display formatter

## `1.1.3`
_2022-08-05_
### Added
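The column types listed under "Added" in this changelog entry (`complex`, `datetime.timedelta`, `ipaddress` addresses, `type`) cannot be serialized by Python's `json` module as-is, so they presumably need converting before a table payload is sent to a frontend. The sketch below is not the `dx` implementation; `to_json_safe` is a hypothetical helper that only illustrates one stdlib-only way such values could be normalized.

```python
import datetime
import ipaddress
import json


def to_json_safe(value):
    """Hypothetical helper: convert a few non-JSON-serializable values to plain types."""
    if isinstance(value, complex):
        # e.g. complex(1, 2) becomes the string "(1+2j)"
        return str(value)
    if isinstance(value, datetime.timedelta):
        # store durations as a float number of seconds
        return value.total_seconds()
    if isinstance(value, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return str(value)
    if isinstance(value, type):
        # e.g. int becomes "int"
        return value.__name__
    if isinstance(value, (list, tuple, set)):
        # convert nested sequences element by element
        return [to_json_safe(v) for v in value]
    return value


row = {
    "c": complex(1, 2),
    "td": datetime.timedelta(minutes=1, seconds=30),
    "ip": ipaddress.ip_address("192.168.0.1"),
    "t": int,
}
safe_row = {key: to_json_safe(val) for key, val in row.items()}
payload = json.dumps(safe_row)
```

Calling `json.dumps(row)` directly would raise `TypeError`; after the conversion the row serializes cleanly.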
8 changes: 5 additions & 3 deletions dx/__init__.py
@@ -1,9 +1,11 @@
-from .config import *
+from .comms import *
 from .dx import *
 from .formatters import *
-from .helpers import *
+from .loggers import *
 from .settings import *
+from .utils import *

-__version__ = "1.1.3"
+__version__ = "1.2.0"

+configure_logging()
 set_display_mode("simple")
37 changes: 37 additions & 0 deletions dx/comms.py
@@ -0,0 +1,37 @@
import structlog
from IPython import get_ipython

from dx.settings import get_settings

settings = get_settings()
logger = structlog.get_logger(__name__)


# ref: https://jupyter-notebook.readthedocs.io/en/stable/comms.html#opening-a-comm-from-the-frontend
def target_func(comm, open_msg):
    @comm.on_msg
    def _recv(msg):
        from dx.filtering import update_display_id

        data = msg["content"]["data"]
        if "display_id" in data:
            update_display_id(
                display_id=data["display_id"],
                pandas_filter=data.get("pandas_filter"),
                sql_filter=data.get("sql_filter"),
                filters=data.get("filters"),
                output_variable_name=data.get("output_variable_name"),
                limit=data["limit"],
            )

    comm.send({"connected": True})


ipython_shell = get_ipython()
if (
    ipython_shell is not None
    and getattr(ipython_shell, "kernel", None)
    and settings.ENABLE_DATALINK
):
    COMM_MANAGER = ipython_shell.kernel.comm_manager
    COMM_MANAGER.register_target("datalink", target_func)
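To make the message flow in `dx/comms.py` easier to follow outside a running kernel, here is a minimal sketch that drives the same handler shape with a stand-in comm object. `FakeComm` and `fake_update_display_id` are hypothetical names invented for illustration; a real comm comes from the kernel's comm manager, and the real handler calls `dx.filtering.update_display_id`.

```python
class FakeComm:
    """Hypothetical stand-in for a Jupyter comm object (illustration only)."""

    def __init__(self):
        self.sent = []
        self._handler = None

    def on_msg(self, func):
        # store the callback so incoming messages can be dispatched to it
        self._handler = func
        return func

    def send(self, data):
        # record outgoing messages instead of sending them to a frontend
        self.sent.append(data)

    def receive(self, msg):
        # simulate a message arriving from the frontend
        self._handler(msg)


calls = []


def fake_update_display_id(**kwargs):
    # stand-in for dx.filtering.update_display_id; just records its arguments
    calls.append(kwargs)


def target_func(comm, open_msg):
    @comm.on_msg
    def _recv(msg):
        data = msg["content"]["data"]
        # only messages that carry a display_id trigger a filter update
        if "display_id" in data:
            fake_update_display_id(
                display_id=data["display_id"],
                sql_filter=data.get("sql_filter"),
                limit=data.get("limit"),
            )

    comm.send({"connected": True})


comm = FakeComm()
target_func(comm, open_msg={})
comm.receive(
    {"content": {"data": {"display_id": "abc", "sql_filter": "SELECT * FROM {table_name}", "limit": 10}}}
)
comm.receive({"content": {"data": {"ping": True}}})  # ignored: no display_id
```

One difference from the diff: the sketch reads `limit` with `.get`, whereas the diff uses `data["limit"]`, which would raise `KeyError` for a message that has a `display_id` but no `limit`.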
33 changes: 0 additions & 33 deletions dx/config.py

This file was deleted.

8 changes: 3 additions & 5 deletions dx/dx.py
@@ -5,7 +5,7 @@
 from IPython.core.interactiveshell import InteractiveShell
 from IPython.display import display as ipydisplay

-from dx.settings import set_display_mode, settings
+from dx.settings import settings_context
 from dx.types import DXDisplayMode


@@ -29,11 +29,9 @@ def display(
             raise ValueError(f"Unsupported file type: `{path.suffix}`")

     df = pd.DataFrame(data)
+    with settings_context(display_mode=mode, ipython_shell=ipython_shell):
+        ipydisplay(df)

-    orig_mode = settings.DISPLAY_MODE.value
-    set_display_mode(mode, ipython_shell=ipython_shell)
-    ipydisplay(df)
-    set_display_mode(orig_mode, ipython_shell=ipython_shell)
     return
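The change in `dx/dx.py` replaces a manual save/set/restore of the display mode with a `settings_context` block. As a rough sketch of the general pattern (not the actual `dx.settings` code; the `Settings` dataclass and its default values here are assumptions made for illustration), a context manager of this kind can be built with `contextlib`:

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Settings:
    # hypothetical stand-in for dx's settings object; defaults are made up
    DISPLAY_MODE: str = "simple"
    DISPLAY_MAX_ROWS: int = 100


settings = Settings()


@contextmanager
def settings_context(**overrides):
    """Temporarily override one or more settings, restoring them on exit."""
    originals = {name: getattr(settings, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(settings, name, value)
        yield settings
    finally:
        # runs even if the body raises, unlike a manual save/restore sequence
        for name, value in originals.items():
            setattr(settings, name, value)


with settings_context(DISPLAY_MODE="plain", DISPLAY_MAX_ROWS=5):
    inside = (settings.DISPLAY_MODE, settings.DISPLAY_MAX_ROWS)
after = (settings.DISPLAY_MODE, settings.DISPLAY_MAX_ROWS)
```

The `try`/`finally` is the main design point: with the removed code, an exception raised during `ipydisplay(df)` would have left the temporary display mode in place.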


88 changes: 88 additions & 0 deletions dx/filtering.py
@@ -0,0 +1,88 @@
from typing import Optional

import pandas as pd
import structlog
from IPython.display import update_display

from dx.formatters.callouts import display_callout
from dx.settings import get_settings, settings_context
from dx.utils.formatting import expand_sequences
from dx.utils.tracking import (
    DATAFRAME_HASH_TO_VAR_NAME,
    DISPLAY_ID_TO_DATAFRAME_HASH,
    SUBSET_TO_DATAFRAME_HASH,
    generate_df_hash,
)

logger = structlog.get_logger(__name__)

settings = get_settings()


SUBSET_FILTERS = {}


def update_display_id(
    display_id: str,
    sql_filter: str,
    pandas_filter: Optional[str] = None,
    filters: Optional[dict] = None,
    output_variable_name: Optional[str] = None,
    limit: Optional[int] = None,
) -> None:
    """
    Filters the dataframe in the cell with the given display_id.
    """
    from dx.utils.tracking import sql_engine

    global SUBSET_FILTERS

    row_limit = limit or settings.DISPLAY_MAX_ROWS
    df_hash = DISPLAY_ID_TO_DATAFRAME_HASH[display_id]
    df_name = DATAFRAME_HASH_TO_VAR_NAME[df_hash]
    table_name = f"{df_name}__{df_hash}"

    query_string = sql_filter.format(table_name=table_name)
    logger.debug(f"sql query string: {query_string}")
    new_df = pd.read_sql(query_string, sql_engine)

    # in the event there were nested values stored,
    # try to expand them back to their original datatypes
    for col in new_df.columns:
        new_df[col] = new_df[col].apply(expand_sequences)

    # this is associating the subset with the original dataframe,
    # which will be checked when the DisplayFormatter.format() is called
    # during update_display(), which will prevent re-registering the display ID to the subset
    new_df_hash = generate_df_hash(new_df)

    # store filters to be passed through metadata to the frontend
    logger.debug(f"applying {filters=}")
    filters = filters or []
    SUBSET_FILTERS[new_df_hash] = filters

    logger.debug(f"assigning subset {new_df_hash} to parent {df_hash=}")
    SUBSET_TO_DATAFRAME_HASH[new_df_hash] = df_hash

    # allow temporary override of the display limit
    with settings_context(DISPLAY_MAX_ROWS=row_limit):
        logger.debug(f"updating {display_id=} with {min(row_limit, len(new_df))}-row resample")
        update_display(new_df, display_id=display_id)

    # we can't reference a variable type to suggest to users to perform a `df.query()`
    # type operation since it was never declared in the first place
    if not df_name.startswith("unk_dataframe_"):
        # TODO: replace with custom callout media type
        output_variable_name = output_variable_name or "new_df"
        filter_code = f"""{output_variable_name} = {df_name}.query("{pandas_filter.format(df_name=df_name)}", engine="python")"""
        filter_msg = f"""Copy the following snippet into a cell below to save this subset to a new dataframe:
        <pre style="background-color:white; padding:0.5rem; border-radius:5px;">{filter_code}</pre>
        """
        display_callout(
            filter_msg,
            header=False,
            icon="info",
            level="success",
            display_id=display_id + "-primary",
            update=True,
        )
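`update_display_id` fills a `{table_name}` placeholder in the incoming SQL filter with `f"{df_name}__{df_hash}"` before querying. The sketch below reproduces just that templating step against an in-memory SQLite database using the standard library. The table contents, the use of `uuid4` for the hash, and the filter text are made-up values for illustration; the real code reads through `pd.read_sql` and a `sql_engine` defined in `dx.utils.tracking`.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

df_name = "df"
# hyphens are stripped so the result is usable as an unquoted SQLite identifier
df_hash = str(uuid.uuid4()).replace("-", "")
table_name = f"{df_name}__{df_hash}"

# made-up table standing in for a registered dataframe
conn.execute(f"CREATE TABLE {table_name} (name TEXT, score INTEGER)")
conn.executemany(
    f"INSERT INTO {table_name} VALUES (?, ?)",
    [("a", 1), ("b", 5), ("c", 9)],
)

# the frontend is assumed to send a template with a {table_name} placeholder
sql_filter = "SELECT * FROM {table_name} WHERE score > 3 ORDER BY score LIMIT 10"
query_string = sql_filter.format(table_name=table_name)
rows = conn.execute(query_string).fetchall()
conn.close()
```

A hash containing hyphens would make the unquoted table name invalid SQL, which is consistent with the commit "remove hyphens from df uuid to enable sqlite filtering" in this PR.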
22 changes: 18 additions & 4 deletions dx/formatters/callouts.py
@@ -2,7 +2,7 @@
 import uuid
 from typing import Optional

-from IPython.display import HTML, display
+from IPython.display import HTML, display, update_display
 from pydantic import BaseModel


@@ -37,11 +37,18 @@ def html(self):
             callout_classes.append(f"bp3-icon-{self.icon.value}-sign")
         callout_class_str = " ".join(callout_classes)

+        msg = self.message
         if self.use_header:
             heading_html = f"<h6 class='bp3-heading'>{self.level.value.title()}</h6>"
-            return f"""<div class="{callout_class_str}" style="margin-bottom: 0.5rem">{heading_html}{self.message}</div>"""
+            msg = f"{heading_html}{self.message}"

-        return f"""<div class="{callout_class_str}" style="margin-bottom: 0.5rem">{self.message}</div>"""
+        style = ";".join(
+            [
+                "margin-bottom: 0.5rem",
+                "margin-top: 0.5rem",
+            ]
+        )
+        return f"""<div class="{callout_class_str}" style="{style}">{msg}</div>"""


def display_callout(
@@ -50,6 +57,7 @@ def display_callout(
     header: bool = True,
     icon: Optional[CalloutIcon] = None,
     display_id: str = None,
+    update: bool = False,
 ) -> None:
     callout = Callout(
         message=message,
@@ -61,4 +69,10 @@ def display_callout(

     # TODO: coordinate with frontend to replace this with a standalone media type
     # instead of rendering HTML with custom classes/styles
-    display(HTML(callout.html), display_id=display_id)
+    if update:
+        update_display(HTML(callout.html), display_id=display_id)
+    else:
+        display(
+            HTML(callout.html),
+            display_id=display_id,
+        )