
Inconsistency between docstring and implementation in explanation_dashboard_input.py #1270

Closed
smrnvdhy opened this issue Mar 15, 2022 · 4 comments


@smrnvdhy

The docstring says the dataset must have fewer than 10000 rows and fewer than 1000 columns. However, the code checks for 100000 rows and 1000 columns.

This should be an easy fix. I'd fork and open a PR, but I don't know which value is intended. I'm slightly leaning towards 10000, because it also appears in the explanation_dashboard.py docstring.
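
For reference, a minimal sketch of the kind of check in question (hypothetical names and structure, not the actual raiwidgets source; the constants just illustrate the docstring/implementation mismatch):

```python
# Hypothetical sketch only -- not the actual explanation_dashboard_input.py code.
# The point is that the validation constant and the docstring disagree.

NUM_ROWS_LIMIT = 100000  # what the implementation actually checks
NUM_COLS_LIMIT = 1000

def _validate_dataset(dataset):
    """Dataset must have fewer than 10000 rows and fewer than 1000 columns."""
    # ^ docstring promises 10000 rows, while the check below uses 100000
    n_rows, n_cols = dataset.shape
    if n_rows > NUM_ROWS_LIMIT:
        raise ValueError(
            f"Dataset has {n_rows} rows; the limit is {NUM_ROWS_LIMIT}")
    if n_cols > NUM_COLS_LIMIT:
        raise ValueError(
            f"Dataset has {n_cols} columns; the limit is {NUM_COLS_LIMIT}")
```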

@imatiach-msft
Contributor

@smrnvdhy great find! I think we should make it 100,000 rows, since we can sometimes handle 10k-20k rows for certain datasets. However, past 15k rows the visualization usually becomes very slow to load in the browser, and past 20k rows it usually fails with an error. Our team has some ideas on how we might lift these restrictions: specifically, doing more of the calculations/clustering in the backend and using streaming so that we don't load all of the data into the UI. I'll send a PR to update the docstrings.
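
In the meantime, a possible workaround is to downsample to roughly 15k rows before computing the explanation and launching the dashboard. The sketch below is only illustrative: it assumes a pandas DataFrame and numpy labels, and the 15000 threshold is just the rough "still loads reasonably" figure mentioned above, not an official limit.

```python
# Sketch of a possible workaround, not part of this fix: randomly downsample to
# ~15k rows before computing the explanation and launching the dashboard.
import numpy as np

MAX_DASHBOARD_ROWS = 15000  # rough practical threshold, not an official limit

def downsample(X, y, max_rows=MAX_DASHBOARD_ROWS, seed=0):
    """Return a random subset of (X, y) with at most max_rows rows."""
    if len(X) <= max_rows:
        return X, y
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=max_rows, replace=False)
    # Assumes X is a pandas DataFrame and y is a numpy array.
    return X.iloc[idx], y[idx]

# X_small, y_small = downsample(X_test, y_test)
# ...compute the explanation on X_small, then pass X_small/y_small to the dashboard.
```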

@smrnvdhy
Author

@imatiach-msft Thank you for addressing this so quickly! I also see the long load times you mentioned (40 minutes of wall time for 14k rows). Could you share which functionality/calculation makes the UI slow to load? Is there an easy way to make that part optional? My team loves that we can easily investigate model performance and feature importance by subgroup, but the load time is a major concern.

@imatiach-msft
Contributor

imatiach-msft commented Mar 16, 2022

@smrnvdhy
"which functionality/calculation is making the UI slow to load"
This is mostly due to loading all of the datapoints into the TypeScript (compiled to JavaScript) code; the browser is just not able to handle that much data. The solution seems to be to perform aggregations in the backend and load only summaries in the frontend, similar to how the ErrorAnalysisDashboard can handle a lot of data as long as a sample is passed for the dataset explorer (which may also be resolved in the future with similar aggregations or by streaming specific datapoints, depending on the scenario):

https://github.com/microsoft/responsible-ai-toolbox/blob/main/raiwidgets/tests/test_error_analysis_dashboard.py#L70

And also in the notebook (where the sample_dataset parameter doesn't need to be passed since the original explanation was downsampled):
https://github.com/microsoft/responsible-ai-toolbox/blob/main/notebooks/individual-dashboards/erroranalysis-dashboard/erroranalysis-interpretability-dashboard-census.ipynb
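
Roughly, the pattern from the linked test looks like the sketch below. This is my reading of the test, not a verified snippet; parameter names such as sample_dataset and true_y_dataset should be checked against the installed raiwidgets version.

```python
# Sketch based on my reading of the linked test: the explanation is computed on
# a sample, the full dataset is still available to the error analysis views,
# and only the sample is loaded into the dataset explorer.
from raiwidgets import ErrorAnalysisDashboard

ErrorAnalysisDashboard(
    global_explanation,            # explanation computed on the sampled rows
    model,
    dataset=X_test_full,           # full dataset for the tree/heatmap views
    true_y=y_test_sample,          # labels matching the explanation sample
    true_y_dataset=y_test_full,    # labels for the full dataset
    sample_dataset=X_test_sample)  # only this sample goes to the dataset explorer
```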

I even have a POC of the Error Analysis Dashboard running on Spark (millions of rows) for the tree view.
However, the Explanation Dashboard currently loads all of the data in the dataset explorer, so the browser simply starts to fail at some point. Browsers have much lower limits than the Python environment, and TypeScript code is also slower and less efficient. It's considered a bad pattern to have the frontend do heavy calculations anyway.

@imatiach-msft
Contributor

Closing issue as the PR has been merged:
#1271
