What causes visualization to be blank? #22

ifshine · 2023-09-18T09:18:58Z

Thanks for your excellent work! Question in title. Here is my result.

HYLcool · 2023-09-18T09:30:38Z

Thanks for your question!

If you run the diversity analysis on the default demo dataset, you might need to use a smaller count threshold (e.g. 1 or 2) in the upper right corner, because the demo dataset contains only a few samples.

We will lower the default value of the count threshold to avoid this misleading result~

Please help with this question, thanks~ @zhijianma

ifshine · 2023-09-18T10:00:47Z

Thank you for your quick response!

I used my own data (221 examples), and a smaller count threshold (3) does solve the problem.

However, for the default demo dataset, a smaller count threshold (1) makes no difference.

I still do not understand the reason. Can you explain it?

HYLcool · 2023-09-18T11:48:34Z

Umm... That's weird. I tried several times and got the same result as below. Maybe you can restart the app and try again.

The diversity analysis is mostly for post-tuning data. It analyzes the verb-noun diversity of the given samples and clusters them by verb-noun category. Only clusters whose number of samples is ≥ the count threshold are kept in the final analysis result. Thus, setting the count threshold to 1 means that every sample containing any verb-noun phrase is kept.
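To illustrate why a high count threshold can leave the chart blank on a small dataset, here is a minimal sketch of the filtering step described above. It is not Data-Juicer's actual implementation: the function name `filter_verb_noun_clusters` and the sample pairs are hypothetical, and the verb-noun extraction itself is assumed to have already happened.

```python
from collections import Counter


def filter_verb_noun_clusters(pairs, count_threshold):
    """Keep only verb-noun clusters with at least `count_threshold` samples.

    `pairs` holds one (verb, noun) tuple per sample (hypothetical input;
    the real tool extracts these from the text itself).
    """
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= count_threshold}


# Hypothetical verb-noun pairs from a tiny dataset of four samples.
pairs = [
    ("write", "essay"),
    ("write", "essay"),
    ("explain", "concept"),
    ("write", "code"),
]

print(filter_verb_noun_clusters(pairs, 1))  # all three clusters survive
print(filter_verb_noun_clusters(pairs, 2))  # only ("write", "essay") survives
print(filter_verb_noun_clusters(pairs, 3))  # {} : nothing left to plot
```

With only a handful of samples, no cluster reaches a larger threshold, so the filtered result is empty and there is nothing to visualize.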

BTW, the default value is set to 0 in the latest PR. You can also try to pull the latest code and try again~

ifshine · 2023-09-18T13:52:48Z

😃Thank you for your explanation. I have learned a lot from it.

zhijianma · 2023-09-18T15:26:13Z

You can download the diversity analysis result for your dataset in CSV format to see more details.

ifshine · 2023-09-19T01:00:13Z

After I pulled the latest code, an error occurs (for both the default demo and my own data).

ifshine · 2023-09-19T01:51:38Z

I reran the following commands:

git clone -n https://github.com/alibaba/data-juicer
cd data-juicer/
git checkout e221d06

The diversity visualization runs correctly (for both the default demo dataset and my own data).

HYLcool · 2023-09-19T03:34:25Z

Sorry for this problem. It was caused by conflicts between wget and streamlit, which were introduced in the last few PRs 😅

We have reverted these modifications in the latest PR #24 and merged it into the main branch. You can pull and try again now. Thanks for your report!

ifshine · 2023-09-19T05:26:49Z

It works now lol!

HYLcool assigned HYLcool and zhijianma and unassigned HYLcool Sep 18, 2023

HYLcool mentioned this issue Sep 18, 2023

Feature/preparation for wheel & Docker #23

Merged

ifshine closed this as completed Sep 19, 2023

HYLcool added the bug Something isn't working label Oct 26, 2023
