Commit

Added figure to InferenceData tutorial (#510)
* Added figure to InferenceData tutorial

Slight rewording of some comments as well

* Remove html for proper github rendering

* Update per feedback

* PyMC3 and PyStan

* and a few lines below where it says "the inspiration between InferenceData" I guess it should be "the inspiration for InferenceData"
canyon289 authored and aloctavodia committed Jan 11, 2019
1 parent 72070a4 commit 144c6d9
Showing 2 changed files with 9 additions and 6 deletions.
Binary file added doc/notebooks/InferenceDataStructure.png
15 changes: 9 additions & 6 deletions doc/notebooks/XarrayforArviZ.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to xarray, InferenceData, and netCDF for ArviZ"
"# Introduction to xarray, InferenceData, and NetCDF for ArviZ"
]
},
{
@@ -42,9 +42,12 @@
"\n",
"\n",
"## Why not Pandas Dataframes or Numpy Arrays?\n",
"Data from probabilistic programming is naturally high dimensional. To add to the complexity ArviZ must handle the data generated from multiple Bayesian Modeling libraries, such as pymc3 and pystan. This is an application that the *xarray* package handles quite well. The xarray package lets users manage high dimensional data with human readable dimensions and coordinates quite easily.\n",
"Data from probabilistic programming is naturally high dimensional. To add to the complexity ArviZ must handle the data generated from multiple Bayesian Modeling libraries, such as PyMC3 and PyStan. This is an application that the *xarray* package handles quite well. The xarray package lets users manage high dimensional data with human readable dimensions and coordinates quite easily.\n",
"\n",
"Although seemingly more complex at a glance the Arviz devs believe that the usage of *xarray*, *InferenceData*, and *NetCDF* will simplify the handling, referencing, and serialization of data generated by MCMC runs."
"![InferenceData Structure](InferenceDataStructure.png) \n",
"\n",
"Above is a visual representation of the data structures and their relationships. Although this seems more complex at a glance, the ArviZ devs believe that using *xarray*, *InferenceData*, and *NetCDF* will simplify the handling, referencing, and serialization of data generated during Bayesian analysis. \n",
"\n"
]
},
{
@@ -171,15 +174,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It should be noted that the observed dataset contains only 8 data variables and doesn't have a chain and draw dimension or coordinates unlike posterior. This difference in sizes is the motivating reason behind *InferenceData*. Rather than force multiple different sized arrays into one array, or force users to manage multiple objects corresponding to different datasets, it is easier to hold references to each *xarray.Dataset* in an *InferenceData* object."
"It should be noted that the observed dataset contains only 8 data variables and, unlike posterior, doesn't have chain and draw dimensions or coordinates. This difference in sizes is the motivating reason behind *InferenceData*. Rather than force multiple differently sized arrays into one array, or have users manage multiple objects corresponding to different datasets, it is easier to hold references to each *xarray.Dataset* in an *InferenceData* object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NetCDF\n",
"[NetCDF](https://www.unidata.ucar.edu/software/netcdf/) is a standard for referencing array oriented files. In other words while, *xarray.Dataset*s, and by extension *InferenceData*, are convenient for accessing arrays in Python memory, *NetCDF* provides a convenient mechanism for persistence of model data on disk. In fact the NetCDF dataset was the inspiration between *InferenceData* as NetCDF4 supports the concepts of groups. *InferenceData* merely wraps xarray.Dataset with the same functionality,\n",
"[NetCDF](https://www.unidata.ucar.edu/software/netcdf/) is a standard for referencing array oriented files. In other words, while *xarray.Dataset*s, and by extension *InferenceData*, are convenient for accessing arrays in Python memory, *NetCDF* provides a convenient mechanism for persistence of model data on disk. In fact the NetCDF dataset was the inspiration for *InferenceData*, as NetCDF4 supports the concept of groups. *InferenceData* merely wraps xarray.Dataset with the same functionality.\n",
"\n",
"Most users will not have to concern themselves with the *NetCDF* standard but for completeness it is good to make its usage transparent. It is also worth noting that the NetCDF4 file standard is interoperable with HDF5 which may be familiar from other contexts.\n",
"\n",
@@ -256,7 +259,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.5.6"
}
},
"nbformat": 4,
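
The reworded cell in the third hunk explains that the posterior group has chain and draw dimensions while the observed data does not, and that InferenceData holds a reference to each dataset rather than forcing them into one array. A minimal plain-Python sketch of that idea follows. It is not the ArviZ or xarray API: `Group`, `Container`, the variable names, and the sizes are all hypothetical and chosen only for illustration.

```python
# Plain-Python stand-in for the idea described in the notebook text.
# This is NOT the ArviZ or xarray API; Group and Container are hypothetical
# names, and the sizes below are made up for illustration.
from dataclasses import dataclass, field


@dataclass
class Group:
    """Variables stored with their own dimension names and sizes."""
    dims: dict    # variable name -> tuple of dimension names
    shapes: dict  # variable name -> tuple of sizes, in the same order as dims


@dataclass
class Container:
    """Holds a reference to each group instead of merging them into one array."""
    groups: dict = field(default_factory=dict)

    def add(self, name, group):
        self.groups[name] = group

    def dims_of(self, group_name, var_name):
        return self.groups[group_name].dims[var_name]


# Sampled values carry chain and draw dimensions.
posterior = Group(dims={"theta": ("chain", "draw", "school")},
                  shapes={"theta": (4, 500, 8)})
# Observations have no chain or draw dimension.
observed = Group(dims={"obs": ("school",)}, shapes={"obs": (8,)})

data = Container()
data.add("posterior", posterior)
data.add("observed_data", observed)

# Each group keeps its own shape; nothing is padded or reshaped to fit the other.
print(data.dims_of("posterior", "theta"))
print(data.dims_of("observed_data", "obs"))
```

The container stores references only, so each group can be read, replaced, or serialized on its own, which mirrors the grouping concept the notebook attributes to NetCDF4.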

0 comments on commit 144c6d9
