06 Dec 10:22

echen102

bddd1ae

Release v2.76

This repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus (SARS-CoV-2) and COVID-19, the disease it causes. Collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we publicly release only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 12/03/21.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
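Since only Tweet IDs are published, the Tweets themselves must be retrieved ("hydrated") through Twitter's API. The sketch below covers only the preparation step and rests on stated assumptions: it assumes an ID file with one Tweet ID per line (check the README for the actual file layout), and it groups IDs into batches of at most 100 because Twitter's Tweet lookup endpoints accepted up to 100 IDs per request. The helper names `read_tweet_ids` and `batches` are hypothetical, and no API call is made.

```python
from typing import Iterable, Iterator, List

# Assumption: per-request ID limit of Twitter's Tweet lookup endpoints.
BATCH_SIZE = 100


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return unique Tweet IDs in first-seen order, skipping blank or non-numeric lines."""
    seen = set()
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id.isdigit() or tweet_id in seen:
            continue
        seen.add(tweet_id)
        ids.append(tweet_id)
    return ids


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

For example, `batches(read_tweet_ids(open(path)))` yields lists of ID strings that can be joined with commas for a lookup request. Keeping IDs as strings avoids precision loss in tools that parse them as floating-point numbers.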

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking Social Media Discourse About the {COVID-19} Pandemic: Development of a Public Coronavirus {Twitter} Data Set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  doi={10.2196/19273},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.76)

Number of Tweets: 2,093,224,111

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,330,421,045	63.56%
Spanish	es	249,300,099	11.91%
Portuguese	pt	87,052,605	4.16%
French	fr	67,932,116	3.25%
Indonesian	in	65,410,124	3.12%
Undefined	und	62,054,427	2.96%
German	de	41,428,977	1.98%
Thai	th	30,051,730	1.44%
Japanese	ja	28,489,295	1.36%
Italian	it	22,499,003	1.07%
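
The "% of total Tweets" column appears to be each language's count divided by the release's total number of Tweets, expressed as a percentage and rounded to two decimals. A short check of that reading against the v2.76 figures above (the rounding rule is inferred from the table, not documented):

```python
# Counts copied from the v2.76 statistics summary.
TOTAL_TWEETS = 2_093_224_111
LANGUAGE_COUNTS = {
    "en": 1_330_421_045,
    "es": 249_300_099,
    "pt": 87_052_605,
    "fr": 67_932_116,
    "in": 65_410_124,
    "und": 62_054_427,
    "de": 41_428_977,
    "th": 30_051_730,
    "ja": 28_489_295,
    "it": 22_499_003,
}


def share_percent(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Per-language shares, and the combined share of the ten listed languages.
shares = {lang: share_percent(n, TOTAL_TWEETS) for lang, n in LANGUAGE_COUNTS.items()}
top10_share = share_percent(sum(LANGUAGE_COUNTS.values()), TOTAL_TWEETS)
```

Under this reading, `top10_share` shows that the ten listed languages together cover about 94.8% of the v2.76 Tweets.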

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

30 Nov 03:37

echen102

v2.75

011bd26

Release v2.75

This release contains Tweet IDs collected from 1/21/20 - 11/27/21.

Statistics Summary (v2.75)

Number of Tweets: 2,071,000,563

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,315,960,560	63.54%
Spanish	es	247,500,661	11.95%
Portuguese	pt	86,526,929	4.18%
French	fr	66,941,970	3.23%
Indonesian	in	65,064,902	3.14%
Undefined	und	61,324,482	2.96%
German	de	40,543,545	1.96%
Thai	th	29,449,521	1.42%
Japanese	ja	28,224,198	1.36%
Italian	it	22,209,720	1.07%

22 Nov 08:49

echen102

v2.74

b104dbe

Release v2.74

This release contains Tweet IDs collected from 1/21/20 - 11/20/21.

Statistics Summary (v2.74)

Number of Tweets: 2,054,212,671

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,306,366,348	63.59%
Spanish	es	245,795,086	11.97%
Portuguese	pt	86,051,098	4.19%
French	fr	65,821,832	3.20%
Indonesian	in	64,873,345	3.16%
Undefined	und	60,749,324	2.96%
German	de	39,395,508	1.92%
Thai	th	29,314,755	1.43%
Japanese	ja	28,011,234	1.36%
Italian	it	21,891,287	1.07%

16 Nov 09:56

echen102

v2.73

1923513

Release v2.73

This release contains Tweet IDs collected from 1/21/20 - 11/13/21.

Statistics Summary (v2.73)

Number of Tweets: 2,039,036,465

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,297,522,412	63.63%
Spanish	es	244,463,886	11.99%
Portuguese	pt	85,637,948	4.20%
French	fr	64,968,453	3.19%
Indonesian	in	64,661,773	3.17%
Undefined	und	60,264,272	2.96%
German	de	38,145,947	1.87%
Thai	th	29,198,926	1.43%
Japanese	ja	27,798,699	1.36%
Italian	it	21,581,176	1.06%

08 Nov 10:52

echen102

v2.72

3e3f515

Release v2.72

This release contains Tweet IDs collected from 1/21/20 - 11/05/21.

Statistics Summary (v2.72)

Number of Tweets: 2,022,616,140

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,287,621,418	63.66%
Spanish	es	243,092,674	12.02%
Portuguese	pt	85,215,116	4.21%
Indonesian	in	64,416,015	3.18%
French	fr	64,047,986	3.17%
Undefined	und	59,715,860	2.95%
German	de	37,063,273	1.83%
Thai	th	28,965,691	1.43%
Japanese	ja	27,560,641	1.36%
Italian	it	21,279,090	1.05%

03 Nov 11:55

echen102

v2.71

daa4544

Release v2.71

This release contains Tweet IDs collected from 1/21/20 - 10/30/21.

Statistics Summary (v2.71)

Number of Tweets: 2,010,791,913

Language breakdown to come.

26 Oct 09:07

echen102

v2.70

a1b2983

Release v2.70

This release contains Tweet IDs collected from 1/21/20 - 10/22/21.

Statistics Summary (v2.70)

Number of Tweets: 1,995,620,333

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,270,563,955	63.67%
Spanish	es	240,616,667	12.06%
Portuguese	pt	84,372,958	4.23%
Indonesian	in	63,990,726	3.21%
French	fr	62,917,614	3.15%
Undefined	und	58,806,041	2.95%
German	de	35,952,285	1.80%
Thai	th	28,586,629	1.43%
Japanese	ja	27,197,679	1.36%
Italian	it	20,773,113	1.04%

18 Oct 21:36

echen102

v2.69

b91c2a7

Release v2.69

This release contains Tweet IDs collected from 1/21/20 - 10/16/21.

Statistics Summary (v2.69)

Number of Tweets: 1,982,052,421

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,261,270,916	63.63%
Spanish	es	239,467,959	12.08%
Portuguese	pt	83,893,107	4.23%
Indonesian	in	63,790,340	3.22%
French	fr	62,461,555	3.15%
Undefined	und	58,396,916	2.95%
German	de	35,611,395	1.80%
Thai	th	28,477,270	1.44%
Japanese	ja	26,998,837	1.36%
Italian	it	20,553,189	1.04%

11 Oct 07:20

echen102

v2.68

64e3343

Release v2.68

This release contains Tweet IDs collected from 1/21/20 - 10/08/21.

Statistics Summary (v2.68)

Number of Tweets: 1,965,744,001

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,250,162,534	63.60%
Spanish	es	237,982,144	12.11%
Portuguese	pt	83,343,587	4.24%
Indonesian	in	63,470,314	3.23%
French	fr	61,909,151	3.15%
Undefined	und	57,884,590	2.94%
German	de	35,220,638	1.79%
Thai	th	28,375,733	1.44%
Japanese	ja	26,739,925	1.36%
Italian	it	20,350,682	1.04%

04 Oct 07:34

echen102

v2.67

e452705

Release v2.67

This release contains Tweet IDs collected from 1/21/20 - 10/01/21.

Statistics Summary (v2.67)

Number of Tweets: 1,949,917,968

Language breakdown of the 10 most prevalent languages:

Language	ISO code	No. of Tweets	% of total Tweets
English	en	1,239,637,331	63.57%
Spanish	es	236,595,593	12.13%
Portuguese	pt	82,690,780	4.24%
Indonesian	in	63,142,927	3.24%
French	fr	61,324,095	3.14%
Undefined	und	57,335,774	2.94%
German	de	34,835,676	1.79%
Thai	th	28,184,598	1.45%
Japanese	ja	26,479,368	1.36%
Italian	it	20,184,433	1.04%
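
Comparing the totals reported in consecutive releases gives the number of Tweet IDs added by each update. A minimal sketch using the totals copied from the statistics summaries of v2.67 through v2.76; the collection windows between consecutive releases range from six to eight days, so these are per-release increments, not weekly rates. The helper name `release_increments` is hypothetical.

```python
# Total Tweet counts from the "Statistics Summary" of each release, oldest first.
RELEASE_TOTALS = {
    "v2.67": 1_949_917_968,
    "v2.68": 1_965_744_001,
    "v2.69": 1_982_052_421,
    "v2.70": 1_995_620_333,
    "v2.71": 2_010_791_913,
    "v2.72": 2_022_616_140,
    "v2.73": 2_039_036_465,
    "v2.74": 2_054_212_671,
    "v2.75": 2_071_000_563,
    "v2.76": 2_093_224_111,
}


def release_increments(totals: dict) -> dict:
    """Difference in reported total between each release and the one before it.

    Relies on dict insertion order, so releases must be listed oldest first.
    """
    versions = list(totals)
    return {
        curr: totals[curr] - totals[prev]
        for prev, curr in zip(versions, versions[1:])
    }


increments = release_increments(RELEASE_TOTALS)
```

Summing the increments recovers the overall difference between the first and last totals, which is a quick consistency check when transcribing figures from several releases.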

Releases: echen102/COVID-19-TweetIDs
