Release of Training Data #3

Open
stabilize-ai opened this issue Jun 29, 2023 · 1 comment

Comments

@stabilize-ai

Hi, could you please release the training data as well, to enable further research into the model's behavior? Other projects, such as EleutherAI's Pythia, have done so, which has helped drive more interest in and usage of those models.

@tianxie-9

tianxie-9 commented Jun 29, 2023

Sorry, we are not able to release the training data. Most of it can be found at https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T, https://pile.eleuther.ai/ and https://huggingface.co/datasets/wikipedia. We used https://github.com/google-research/text-to-text-transfer-transformer#c4 to get more C4 data.
