
[WIP] Changing to MIME based rendering #216

Merged: 18 commits merged into vega:master on Jul 4, 2017
Conversation

@ellisonbg (Collaborator) commented Sep 29, 2016

Initial prototype that works with JupyterLab.

Closes #294 and #172.

@jakevdp (Collaborator) commented Oct 12, 2016

Is this close to being ready to merge? I'm excited to see Altair working with JupyterLab!

@sanga commented Oct 29, 2016

Any news on this? I've mostly switched to jupyterlab at this point but I'm sorely missing Altair.

@jakevdp (Collaborator) commented Oct 29, 2016

You must be the earliest of early-adopters 😄

@gnestor (Collaborator) commented Jan 17, 2017

I just checked this out and tested it with jupyterlab_vega, and it's working for me!

(screenshots: rendering in JupyterLab and in the classic notebook)

altair/api.py Outdated
from vega import VegaLite
display(VegaLite(self.to_dict()))
spec = self.to_dict()
data = {'application/vnd.vegalite+json': spec}
Collaborator:

I suggest adding fallback mimetypes:

data = {
    'application/vnd.vegalite+json': spec,
    'application/json': spec,
    'text/plain': '<altair.VegaLite object>'
}

Collaborator (Author):

We should add the text/plain one, but because the spec includes the full data, I don't think we should ship it twice...

Collaborator:

Ok:

data = {
    'application/vnd.vegalite.v1+json': spec,
    'text/plain': '<altair.VegaLite object>'
}
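
For reference, this is how such a raw MIME bundle gets published from the kernel (a minimal sketch using only IPython.display.display; the example spec is made up):

from IPython.display import display

# A stand-in for the dict returned by self.to_dict()
spec = {'mark': 'point', 'data': {'values': [{'x': 1, 'y': 2}]}}

bundle = {
    'application/vnd.vegalite.v1+json': spec,
    'text/plain': '<altair.VegaLite object>'
}

# raw=True publishes the dict as-is as a MIME bundle instead of building a
# display object from it; the frontend then renders the richest MIME type
# it understands and falls back to text/plain otherwise.
display(bundle, raw=True)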

@ellisonbg (Collaborator, Author):
Before we merge this PR, other things to add:

  • Bring over utils.py from ipyvega
  • Depend on jupyterlab_vega instead of ipyvega

@gnestor (Collaborator) commented Jan 20, 2017

☑️ Bring over utils.py from ipyvega: altair-viz/jupyter_vega#8

@gnestor (Collaborator) commented Mar 11, 2017

Ok, last thing to do is move utils.py from jupyterlab_vega: https://github.com/altair-viz/jupyterlab_vega/blob/master/jupyterlab_vega/utils.py

From my understanding, the need for utils.py is to sanitize DataFrames before displaying. It does this by (roughly sketched after the list):

  • Copying the DataFrame
  • Raising ValueError if it has a hierarchical index
  • Converting categoricals to strings
  • Converting np.int dtypes to Python int objects
  • Converting floats to objects and replacing NaN with None
  • Converting DateTime dtypes into appropriate string representations
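
Roughly, those steps look like the following (a simplified sketch for illustration, not the actual jupyterlab_vega/Altair code):

import numpy as np
import pandas as pd

def sanitize_dataframe_sketch(df):
    """Simplified illustration of the sanitization steps listed above."""
    df = df.copy()

    if isinstance(df.index, pd.MultiIndex):
        # hierarchical indices are not supported
        raise ValueError('Hierarchical indices not supported')

    for col_name, dtype in df.dtypes.items():
        if str(dtype) == 'category':
            # convert categoricals to strings
            df[col_name] = df[col_name].astype(str)
        elif np.issubdtype(dtype, np.integer):
            # convert np.int dtypes to plain Python ints
            df[col_name] = [int(v) for v in df[col_name]]
        elif np.issubdtype(dtype, np.floating):
            # convert floats to objects and replace NaN with None
            df[col_name] = [None if pd.isnull(v) else float(v) for v in df[col_name]]
        elif str(dtype).startswith('datetime'):
            # convert datetimes to ISO-8601 strings
            df[col_name] = df[col_name].apply(lambda v: v.isoformat())
    return df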

Another option is:

data = df.to_json(orient='records')

This should sanitize the DataFrame as well, and it wouldn't require us to maintain the sanitization code ourselves going forward.
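
One wrinkle with that approach (a small illustration; the variable names here are mine): to_json returns a JSON string, so embedding the records back into the spec dict would need a round-trip through json.loads:

import json
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [0.5, float('nan')]})

# to_json already handles NaN -> null and numpy/datetime dtypes,
# but it returns a string, so the records have to be parsed back
# before they can be embedded in the spec dict:
values = json.loads(df.to_json(orient='records'))
spec_data = {'values': values}   # -> {'values': [{'x': 1, 'y': 0.5}, {'x': 2, 'y': None}]}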

Thoughts?

@ellisonbg (Collaborator, Author) commented Mar 11, 2017 via email

@gnestor (Collaborator) commented Mar 11, 2017

Ok, regarding the classes: there isn't much in them that's relevant to Altair. Here is the Vega (base) class:

class Vega():
    """A display class for displaying Vega visualizations in the Jupyter Notebook and IPython kernel.
    
    Vega expects a spec (a JSON-able dict) and data (dict) argument
    not already-serialized JSON strings.
    Scalar types (None, number, string) are not allowed, only dict containers.
    """
    
    # wrap data in a property, which warns about passing already-serialized JSON
    _spec = None
    _data = None
    _read_flags = 'r'
    
    def __init__(self, spec=None, data=None, url=None, filename=None, metadata=None):
        """Create a Vega display object given raw data.
        Parameters
        ----------
        spec : dict
            Vega spec. Not an already-serialized JSON string.
        data : dict
            A dict of Vega datasets where the key is the dataset name and the 
            value is the data values. Not an already-serialized JSON string.
            Scalar types (None, number, string) are not allowed, only dict
            or list containers.
        url : unicode
            A URL to download the data from.
        filename : unicode
            Path to a local file to load the data from.
        metadata: dict
            Specify extra metadata to attach to the json display object.
        """
        
        if spec is not None and isinstance(spec, str):
            if spec.startswith('http') and url is None:
                url = spec
                filename = None
                spec = None
            elif _safe_exists(spec) and filename is None:
                url = None
                filename = spec
                spec = None
        
        self.spec = spec
        self.data = data
        self.metadata = metadata
        self.url = url
        self.filename = filename
        
        self.reload()
        self._check_data()

    def reload(self):
        """Reload the raw spec from file or URL."""
        if self.filename is not None:
            with open(self.filename, self._read_flags) as f:
                self.spec = json.loads(f.read())
        elif self.url is not None:
            try:
                # Deferred import
                from urllib.request import urlopen
                response = urlopen(self.url)
                self.spec = response.read()
                # extract encoding from header, if there is one:
                encoding = None
                for sub in response.headers['content-type'].split(';'):
                    sub = sub.strip()
                    if sub.startswith('charset'):
                        encoding = sub.split('=')[-1].strip()
                        break
                # decode spec, if an encoding was specified
                if encoding:
                    self.spec = self.spec.decode(encoding, 'replace')
            except:
                self.spec = None
    
    def _check_data(self):
        if self.spec is not None and not isinstance(self.spec, dict):
            raise TypeError("%s expects a JSONable dict, not %r" % (self.__class__.__name__, self.spec))
        if self.data is not None and not isinstance(self.data, dict):
            raise TypeError("%s expects a dict, not %r" % (self.__class__.__name__, self.data))

    @property
    def spec(self):
        return self._spec
        
    @property
    def data(self):
        return self._data
        
    @spec.setter
    def spec(self, spec):
        if isinstance(spec, str):
            # warnings.warn("%s expects a JSONable dict, not %r" % (self.__class__.__name__, spec))
            spec = json.loads(spec)
        self._spec = spec

    @data.setter
    def data(self, data):
        if isinstance(data, str):
            # warnings.warn("%s expects a dict, not %r" % (self.__class__.__name__, data))
            data = json.loads(data)
        self._data = data
        
    def _ipython_display_(self):
        bundle = {
            'application/vnd.vega.v2+json': prepare_vega_spec(self.spec, self.data),
            'text/plain': '<jupyterlab_vega.Vega object>'
        }
        display(bundle, raw=True) 

Altair already handles URLs and data validation, and pandas can load data from files, so it seems to me that the only relevant part is _ipython_display_.

If we take a look at utils.py where prepare_vegalite_spec is imported from:

def prepare_vegalite_spec(spec, data=None):
    """Prepare a Vega-Lite spec for sending to the frontend.
    This allows data to be passed in either as part of the spec
    or separately. If separately, the data is assumed to be a
    pandas DataFrame or object that can be converted to a DataFrame.
    Note that if data is not None, this modifies spec in-place
    """

    if isinstance(data, pd.DataFrame):
        # We have to do the isinstance test first because we can't
        # compare a DataFrame to None.
        data = sanitize_dataframe(data)
        spec['data'] = {'values': data.to_dict(orient='records')}
    elif data is None:
        # Data is either passed in spec or error
        if 'data' not in spec:
            raise ValueError('No data provided')
    else:
        # As a last resort try to pass the data to a DataFrame and use it
        data = pd.DataFrame(data)
        data = sanitize_dataframe(data)
        spec['data'] = {'values': data.to_dict(orient='records')}
    return spec

Given that Altair only stores data in a DataFrame and .to_dict() returns data in the expected format ({'values': data}), the only relevant line is data = sanitize_dataframe(data).
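
For context, the inline-data shape that ends up in the spec looks like this (an illustrative hand-written example, not actual Altair output):

spec = {
    'mark': 'point',
    'encoding': {
        'x': {'field': 'x', 'type': 'quantitative'},
        'y': {'field': 'y', 'type': 'quantitative'}
    },
    'data': {'values': [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]}
}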

Now, .to_dict() uses ToDict from utils/visitors.py and it looks like:

class ToDict(Visitor):
    """Crawl object structure to output dictionary"""
    def generic_visit(self, obj, *args, **kwargs):
        return obj

    def visit_list(self, obj, *args, **kwargs):
        return [self.visit(o) for o in obj]

    def visit_BaseObject(self, obj, *args, **kwargs):
        D = {}
        for k in obj.traits():
            if k in obj and k not in obj.skip:
                v = getattr(obj, k)
                if v is not None:
                    D[k] = self.visit(v)
        return D

    def _visit_with_data(self, obj, data=True):
        D = self.visit_BaseObject(obj)
        if data:
            if isinstance(obj.data, schema.Data):
                D['data'] = self.visit(obj.data)
            elif isinstance(obj.data, pd.DataFrame):
                values = sanitize_dataframe(obj.data).to_dict(orient='records')
                D['data'] = self.visit(schema.Data(values=values))
        else:
            D.pop('data', None)
        return D

    visit_Chart = _visit_with_data
    visit_LayeredChart = _visit_with_data
    visit_FacetedChart = _visit_with_data

Notice values = sanitize_dataframe(obj.data).to_dict(orient='records')! sanitize_dataframe is already in utils/core.py.

In conclusion, I don't think we need to migrate anything from jupyterlab_vega because sanitize_dataframe is already being used in the display method:

def _ipython_display_(self):
    from IPython.display import display
    spec = self.to_dict()
    data = {
        'application/vnd.vegalite.v1+json': spec,
        'text/plain': '<altair.VegaLite object>'
    }
    display(data, raw=True)

@gnestor (Collaborator) commented Mar 12, 2017

TL;DR:

Altair's _ipython_display_ calls spec = self.to_dict(), which calls ToDict, which calls sanitize_dataframe(obj.data).to_dict(orient='records'), so Altair's display method already sanitizes DataFrames. The rest of jupyterlab_vega's display classes is a reimplementation of IPython.display.DisplayObject, which involves a) loading data from a file path or URL, b) validating data, and c) getting/setting data, all of which Altair already does (except loading from a file path).

@gnestor (Collaborator) commented Apr 20, 2017

@jakevdp Any idea why this task is failing? https://travis-ci.org/altair-viz/altair/jobs/223802066

@jakevdp (Collaborator) commented Apr 20, 2017

Looks like it's trying to install IPython 6.0 on Python 2.7. I think the fix is to update the travis.yml so that it runs pip install pip --upgrade before installing the other dependencies.

@jakevdp (Collaborator) commented Apr 20, 2017

Hopefully this will be fixed by #319.

@jakevdp (Collaborator) commented Apr 20, 2017

OK – if you pull in those commits from master, things should work now.

@gnestor (Collaborator) commented Apr 20, 2017

It worked! Thanks @jakevdp! This is ready for review. Feel free to check out and test with JupyterLab.

@jakevdp (Collaborator) commented Apr 20, 2017

Looks good! The only issues I see reading through the changes are that requirements.txt and INSTALL_REQUIRES in setup.py need to be changed to reflect the updated dependency.

I'll try to give it a test run in JupyterLab today.

@gnestor (Collaborator) commented Apr 20, 2017

Good catch! Updated.

@jakevdp (Collaborator) commented Apr 20, 2017

Oops, I should have thought of that... travis uses the contents of requirements.txt to install the requirements, so they have to be up-to-date on PyPI and conda-forge before this will work.

That brings me to altair-viz/jupyter_vega#32, which we need to think about before moving forward here.

@ellisonbg (Collaborator, Author):
I have picked this work up again in preparation for the JupyterLab beta, which is planned for this Friday. We have decided that JupyterLab will ship with Vega/Vega-Lite rendering built in. Because of that, the jupyterlab_vega package won't be needed.

Based on that, I have done the following:

  • Added enable_mime_rendering and disable_mime_rendering functions that turn the MIME-bundle logic on and off for jupyterlab/nteract (see the usage sketch after this list).
  • That logic requires the latest release of IPython.
  • In the classic notebook, things should continue to work with ipyvega, but that package is not required for jupyterlab/nteract.
  • I have moved the various utilities around to prepare for the day when ipyvega is not needed.
  • This adds public Vega and VegaLite classes to Altair that work with lower-level dict objects representing the raw specs.
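
A rough sketch of the intended usage (assuming enable_mime_rendering is importable from the top-level altair namespace, as the doc example later in this thread suggests):

from altair import Chart, load_dataset, enable_mime_rendering

# Switch Altair's display to the vegalite MIME bundle
# (needed in JupyterLab/nteract, not in the classic notebook)
enable_mime_rendering()

cars = load_dataset('cars')
Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
)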

I am finishing this up, but things are already working well:

(screenshot dated 2017-06-20)

@ellisonbg (Collaborator, Author):
@rgbkrk this PR should allow Altair to work again in nteract. Just need to call enable_mime_rendering.

@gnestor

doc/index.rst Outdated

# load built-in dataset as a pandas DataFrame
cars = load_dataset('cars')

# Needed for rendering in JupyterLab and nteract; omit in the classic Jupyter Notebook
enable_mime_rendering()
Collaborator:

Since this is an irreversible operation, we might comment it out for people who do a mass copy-paste and run the code before reading it.

Collaborator (Author):

Fixed

@jakevdp (Collaborator) commented Jul 1, 2017

I've not had a chance to try it yet, but the code looks good on a read-through. I'm happy to have this merged.

@ellisonbg (Collaborator, Author):
Going ahead with the merge! Once jupyterlab/jupyterlab#2578 is merged, everything should work fine with master of both.

@ellisonbg merged commit e669ab9 into vega:master on Jul 4, 2017
@gnestor (Collaborator) commented Jul 4, 2017

lgeiger added a commit to lgeiger/nteract that referenced this pull request Jul 5, 2017
rgbkrk pushed a commit to nteract/nteract that referenced this pull request Jul 5, 2017