Add guide to indexing and slicing #225

Merged
merged 7 commits into from
Feb 17, 2022
Binary file added docs/assets/guide/data_structures/img_0.jpg
Binary file added docs/assets/guide/data_structures/img_1.jpg
Binary file added docs/assets/guide/data_structures/img_2.jpg
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -5,4 +5,5 @@ recommonmark
toml
pydata_sphinx_theme
ipython
jupyter-sphinx
sphinx-panels
7 changes: 0 additions & 7 deletions docs/source/apidocs/modules.rst

This file was deleted.

7 changes: 7 additions & 0 deletions docs/source/conf.py
@@ -44,10 +44,14 @@
"sphinx.ext.coverage",
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinx.ext.todo",
"sphinx_rtd_theme",
"nbsphinx",
"recommonmark",
"sphinx_panels",
"IPython.sphinxext.ipython_directive",
"IPython.sphinxext.ipython_console_highlighting",
"jupyter_sphinx",
]

# Add any paths that contain templates here, relative to this directory.
@@ -65,6 +69,7 @@
#
html_theme = "pydata_sphinx_theme"
html_logo = "../assets/meerkat_banner_padded.svg"
html_favicon = "../assets/meerkat_logo.png"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
@@ -82,3 +87,5 @@
"tabs-color-label-active": "rgb(108,72,232)",
"tabs-color-label-inactive": "rgba(108,72,232,0.5)",
}

todo_include_todos = True
9 changes: 9 additions & 0 deletions docs/source/display.py
@@ -0,0 +1,9 @@
import meerkat as mk


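def display_dp(dp: mk.DataPanel, name: str):
    # Render the DataPanel as HTML and inline the stylesheet after the first newline.
    body_html = dp._repr_html_()
    with open("source/html/display/datapanel.css", "r") as f:
        css = f.read()
    body_html = body_html.replace("\n", f"\n <style> {css} </style>", 1)
    # Write the styled HTML so the docs can include it with a raw directive.
    with open(f"source/html/display/{name}.html", "w") as f:
        f.write(body_html)
    return dp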
9 changes: 9 additions & 0 deletions docs/source/guide/column_types.rst
@@ -0,0 +1,9 @@

Overview of Column Types
=========================

.. todo::

Fill in this stub.
Envisioning an overview of common column types, and when to use them.

2 changes: 1 addition & 1 deletion docs/source/guide/copying.rst
@@ -236,7 +236,7 @@ copies of the the columns in ``dp2``
``EntityDataPanel``) Same as “View” above.

Behavior when Indexing
~~~~~~~~~~~~~~~~~~~~~~~

Indexing rows
--------------
87 changes: 80 additions & 7 deletions docs/source/guide/data_structures.rst
@@ -3,24 +3,97 @@ Introduction to Data Structures
================================

Meerkat provides two data structures, the column and the datapanel, that together help
you build, manage, and explore machine learning datasets. Everything you do with Meerkat will
involve one or both of these data structures, so we begin this user guide with their
high-level introduction.


Column
-------
A column is a sequential data structure (analogous to a `Series <https://pandas.pydata.org/docs/reference/api/pandas.Series.html>`_ in Pandas or a `Vector <https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Simple-manipulations-numbers-and-vectors>`_ in R).
Meerkat supports a diverse set of column types (*e.g.* :class:`~meerkat.NumpyArrayColumn`,
:class:`~meerkat.ImageColumn`), each intended for a different kind of data. For a
list of the core column types and their capabilities, see :doc:`column_types`.

Below we create a simple column to hold a set of images stored on disk. To create it,
we simply pass filepaths to the :class:`~meerkat.ImageColumn` constructor.

.. ipython:: python

    import os
    import meerkat as mk
    img_col = mk.ImageColumn(
        ["img_0.jpg", "img_1.jpg", "img_2.jpg"],
        base_dir="assets/guide/data_structures"
    )
    img_col

    @suppress
    from display import display_dp
    @suppress
    display_dp(img_col, "simple_column")

.. raw:: html
    :file: ../html/display/simple_column.html

All Meerkat columns are subclasses of :class:`~meerkat.AbstractColumn` and share a common
interface, which includes :meth:`~meerkat.AbstractColumn.__len__`, :meth:`~meerkat.AbstractColumn.__getitem__`, :meth:`~meerkat.AbstractColumn.__setitem__`, :meth:`~meerkat.AbstractColumn.filter`, :meth:`~meerkat.AbstractColumn.map`, and :meth:`~meerkat.AbstractColumn.concat`. Below we get the length of the column we just created.

.. ipython:: python

    len(img_col)


Certain column types may expose additional functionality. For example,
:class:`~meerkat.NumpyArrayColumn` inherits most of the functionality of an
`ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`_.

.. ipython:: python

    id_col = mk.NumpyArrayColumn([0, 1, 2])
    id_col.sum()
    id_col == 1

To see the full list of methods available to a column type,

If you don't know which column type to use, you can pass a familiar data
structure such as a ``list``, ``np.ndarray``, ``pd.Series``, or ``torch.Tensor`` to
:meth:`~meerkat.AbstractColumn.from_data`, and Meerkat will automatically pick an
appropriate column type.

.. ipython:: python

    import torch
    tensor = torch.tensor([1, 2, 3])
    mk.AbstractColumn.from_data(tensor)

DataPanel
----------
A :class:`DataPanel` is a collection of equal-length columns (analogous to a `DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`_ in Pandas or R).
DataPanels in Meerkat are used to manage datasets and per-example artifacts (*e.g.* model predictions and embeddings).

Below we combine the columns we created above into a single DataPanel. We also add an
additional column containing labels for the images. Note that we can pass non-Meerkat data
structures like ``list``, ``np.ndarray``, ``pd.Series``, and ``torch.Tensor`` directly to the
DataPanel constructor and Meerkat will infer the column type. We do not need to first
convert to a Meerkat column.

.. ipython:: python

    dp = mk.DataPanel(
        {
            "img": img_col,
            "label": ["boombox", "truck", "dog"],
            "id": id_col,
        }
    )
    dp

    @suppress
    from display import display_dp
    @suppress
    display_dp(dp, "simple_dp")

.. raw:: html
    :file: ../html/display/simple_dp.html
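
To make the "collection of equal-length columns" idea concrete, here is a minimal sketch in plain Python. It is not Meerkat's implementation: ``SimplePanel`` and its ``row`` method are hypothetical names introduced only for illustration, and the sketch assumes that a length mismatch should be rejected at construction time.

```python
from typing import Any, Dict, Sequence


class SimplePanel:
    """Hypothetical toy container (not Meerkat's DataPanel)."""

    def __init__(self, columns: Dict[str, Sequence[Any]]):
        # Every column must contribute exactly one value per row.
        lengths = {len(col) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError(
                f"columns must have equal length, got lengths {sorted(lengths)}"
            )
        self._columns = {name: list(col) for name, col in columns.items()}

    def __len__(self) -> int:
        # The number of rows is the shared column length (0 if there are no columns).
        return len(next(iter(self._columns.values()), []))

    def row(self, index: int) -> Dict[str, Any]:
        # A row is one value taken from each column at the same position.
        return {name: col[index] for name, col in self._columns.items()}


panel = SimplePanel({"label": ["boombox", "truck", "dog"], "id": [0, 1, 2]})
first_row = panel.row(0)

# Columns of different lengths cannot be lined up into rows.
try:
    SimplePanel({"label": ["boombox", "truck"], "id": [0, 1, 2]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Because all columns share one length, any row index selects exactly one value from each column, which is what lets per-example artifacts sit alongside the data they describe.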

Read on to learn how we access the data in Columns and DataPanels.
8 changes: 8 additions & 0 deletions docs/source/guide/dataset_creation.rst
@@ -0,0 +1,8 @@

Creating Machine Learning Datasets with Meerkat
================================================

.. todo::

Fill in this stub.

7 changes: 7 additions & 0 deletions docs/source/guide/error_analysis.rst
@@ -0,0 +1,7 @@

Performing Error Analysis with Meerkat
=======================================

.. todo::

Fill in this stub.
22 changes: 21 additions & 1 deletion docs/source/guide/guide.rst
@@ -7,11 +7,31 @@ Meerkat Basics
:maxdepth: 2

data_structures
slicing
lambda
column_types
io
map
ops
visualization
patterns

Common Use Cases
----------------

.. toctree::
:maxdepth: 2

dataset_creation
model_training
model_evaluation
error_analysis

Advanced Topics
----------------

.. toctree::
:maxdepth: 2

copying
subclass
33 changes: 33 additions & 0 deletions docs/source/guide/io.rst
@@ -0,0 +1,33 @@
I/O
====

Writing to Disk
----------------

.. todo::

Fill in this stub.


Reading from Disk
-------------------

.. todo::

Fill in this stub.


Importing into Meerkat
-----------------------

.. todo::

Fill in this stub.


Exporting from Meerkat
-----------------------

.. todo::

Fill in this stub.
46 changes: 46 additions & 0 deletions docs/source/guide/lambda.rst
@@ -0,0 +1,46 @@

Lambda Columns and Lazy Selection
==================================

Lambda Columns
--------------

If you check out the implementation of :class:`~meerkat.ImageColumn`, you'll notice that it's a super simple subclass of :class:`~meerkat.LambdaColumn`.

*What's a LambdaColumn?* In Meerkat, high-dimensional data types like images and videos are typically stored in a :class:`~meerkat.LambdaColumn`. A :class:`~meerkat.LambdaColumn` wraps around another column and applies a function to its content as it is indexed.

Consider the following example, where we create a simple Meerkat column...

.. ipython:: python

    import meerkat as mk

    col = mk.NumpyArrayColumn([0, 1, 2])
    col[1]


...and wrap it in a lambda column.

.. ipython:: python

    lambda_col = col.to_lambda(fn=lambda x: x + 10)
    lambda_col[1]  # the function is only called at this point!


Critically, the function inside a lambda column is only called at the time the column is indexed! This is very useful for columns with large data types that we don't want to load all into memory at once. For example, we could create a :class:`~meerkat.LambdaColumn` that lazily loads images...

.. ipython:: python
    :verbatim:

    filepath_col = mk.PandasSeriesColumn(["path/to/image0.jpg", ...])
    img_col = filepath_col.to_lambda(fn=load_image)


An :class:`~meerkat.ImageColumn` is just a :class:`~meerkat.LambdaColumn` like this one, with a few more bells and whistles!
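
The lazy behavior described above can be illustrated without Meerkat at all. The following is a minimal sketch in plain Python, not Meerkat's actual implementation: ``LazyColumn``, ``calls``, and ``tracked_add`` are hypothetical names introduced only for illustration. It shows that wrapping the data does no work, and that the function runs only for the element being indexed.

```python
from typing import Any, Callable, Sequence


class LazyColumn:
    """Hypothetical stand-in for a lambda column (not Meerkat's class)."""

    def __init__(self, data: Sequence[Any], fn: Callable[[Any], Any]):
        # Only references are stored here; fn is not called.
        self._data = data
        self._fn = fn

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index: int) -> Any:
        # fn is applied at access time, and only to the requested element.
        return self._fn(self._data[index])


calls = []  # records every argument the function is invoked with


def tracked_add(x: int) -> int:
    calls.append(x)
    return x + 10


lazy_col = LazyColumn([0, 1, 2], fn=tracked_add)
calls_after_wrapping = list(calls)  # snapshot taken before any indexing
value = lazy_col[1]  # indexing is what triggers the function
calls_after_indexing = list(calls)
```

If ``tracked_add`` were replaced by a function that reads an image from disk, only the images that are actually indexed would ever be loaded, which is the property the paragraph above relies on.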

Lazy Selection
--------------

.. todo::

Fill in this stub.
26 changes: 26 additions & 0 deletions docs/source/guide/map.rst
@@ -0,0 +1,26 @@

Map, Filter, and Update
========================

Map
----

.. todo::

Fill in this stub.


Filter
-------

.. todo::

Fill in this stub.

Update
-------

.. todo::

Fill in this stub.

7 changes: 7 additions & 0 deletions docs/source/guide/model_evaluation.rst
@@ -0,0 +1,7 @@

Evaluating Models with Meerkat
===============================

.. todo::

Fill in this stub.
7 changes: 7 additions & 0 deletions docs/source/guide/model_training.rst
@@ -0,0 +1,7 @@

Training Models with Meerkat
=============================

.. todo::

Fill in this stub.
26 changes: 26 additions & 0 deletions docs/source/guide/ops.rst
@@ -0,0 +1,26 @@

Operations
==================

Merge
------

.. todo::

Fill in this stub.


Concat
------

.. todo::

Fill in this stub.


GroupBy
-------

.. todo::

Fill in this stub.
7 changes: 7 additions & 0 deletions docs/source/guide/patterns.rst
@@ -0,0 +1,7 @@

Patterns and Anti-patterns
===========================

.. todo::

Fill in this stub.