What causes visualization to be blank? #22

Closed
ifshine opened this issue Sep 18, 2023 · 9 comments

ifshine commented Sep 18, 2023

Thanks for your excellent work! Question in title. Here is my result.
20230918-171819

@HYLcool HYLcool assigned HYLcool and zhijianma and unassigned HYLcool Sep 18, 2023
Collaborator

HYLcool commented Sep 18, 2023

Thanks for your question!

If you run the diversity analysis on the default demo dataset, you may need to use a smaller count threshold (e.g. 1 or 2) in the upper right corner, because the demo dataset contains only a few samples.

We will set a smaller default value for the count threshold to avoid this misleading result~

@zhijianma, please help with this question, thanks~

Author

ifshine commented Sep 18, 2023

Thank you for your quick response!

I used my own data (221 examples), and a smaller count threshold (3) does solve the problem.

However, for the default demo dataset, a smaller count threshold (1) makes no difference.

I still do not understand the reason. Can you explain it?

Collaborator

HYLcool commented Sep 18, 2023

Umm... That's weird. I tried several times and got the same result as below. Maybe you can restart the app and try again.

image

The diversity analysis is mostly for post-tuning data. It analyzes the verb-noun diversity of the given samples and clusters them by verb-noun category. Only clusters whose number of samples is ≥ the count threshold are kept in the final analysis result. Thus, setting the count threshold to 1 means that all samples containing any verb-noun phrase will be kept.
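
To make the thresholding rule concrete, here is a minimal sketch, not the project's actual implementation. It assumes the verb-noun pairs have already been extracted (one pair per sample); the function name `filter_by_count_threshold` and the sample pairs are hypothetical and used only for illustration.

```python
from collections import Counter


def filter_by_count_threshold(pairs, count_threshold):
    """Group samples by (verb, noun) pair and keep only the groups
    whose sample count is >= count_threshold.

    Hypothetical helper for illustration; not part of data-juicer.
    """
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= count_threshold}


# Hypothetical, already-extracted (verb, noun) pairs, one per sample.
pairs = [
    ("write", "poem"),
    ("write", "poem"),
    ("explain", "concept"),
    ("translate", "sentence"),
]

# On a tiny dataset, a high threshold filters out every cluster,
# which would leave nothing to draw.
kept_high = filter_by_count_threshold(pairs, 5)

# With a threshold of 1, every cluster that exists is kept.
kept_low = filter_by_count_threshold(pairs, 1)
```

In this sketch `kept_high` is empty while `kept_low` holds all three clusters, which is consistent with a blank plot appearing when the threshold is too high for a small dataset.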

BTW, the default value is set to 0 in the latest PR. You can also try to pull the latest code and try again~

Author

ifshine commented Sep 18, 2023

😃Thank you for your explanation. I have learned a lot from it.

Collaborator

zhijianma commented Sep 18, 2023

You can download the diversity analysis result of your dataset in CSV format to see more details.

image
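
Once the CSV is downloaded, it can be inspected outside the app. The following is a rough sketch using only the Python standard library. The column names (`verb`, `noun`, `count`) and the sample rows are assumptions for illustration; the real export may use different headers, so check the first line of your file.

```python
import csv
import io


def rows_at_or_above(csv_text, count_threshold, count_column="count"):
    """Return the CSV rows whose count is >= count_threshold.

    The column name is an assumption; adjust it to match the
    header of the exported file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row[count_column]) >= count_threshold]


# Hypothetical export contents, standing in for the downloaded file.
sample_csv = "verb,noun,count\nwrite,poem,2\nexplain,concept,1\n"

# Rows that would survive a count threshold of 2.
surviving = rows_at_or_above(sample_csv, 2)
```

For a real file, read its text with `open(path, newline="", encoding="utf-8").read()` and pass that in place of `sample_csv`.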

Author

ifshine commented Sep 19, 2023

After I pulled the latest code, an error occurred (for both the default demo and my own data).

image

Author

ifshine commented Sep 19, 2023

I reran the following commands:

git clone -n https://github.com/alibaba/data-juicer
cd data-juicer/
git checkout e221d06

With this commit, the diversity visualization runs correctly (for both the default demo dataset and my own data).

Collaborator

HYLcool commented Sep 19, 2023

Sorry for this problem. It was caused by some conflicts between wget and streamlit that were introduced in the last few PRs 😅

We have reverted these modifications in the latest PR #24 and merged it into the main branch. You can pull and try again now. Thanks for your report!

Author

ifshine commented Sep 19, 2023

It works now lol!

@ifshine ifshine closed this as completed Sep 19, 2023
@HYLcool HYLcool added the bug Something isn't working label Oct 26, 2023