14 Feb 03:57

echen102

254275a

Release v2.86

This repository contains an ongoing collection of Tweet IDs associated with COVID-19 and the novel coronavirus that causes it (SARS-CoV-2); collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we publicly release only the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 02/11/22.

Please refer to the README for more details regarding the data, its organization, and the data usage agreement.
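Because only Tweet IDs are published, an analysis typically starts by reading the ID files and then rehydrating them (retrieving the full Tweets) through Twitter's API in line with its Terms of Service. The Python sketch below covers only the first step. It assumes, hypothetically, that the released files are plain text with one numeric Tweet ID per line; check the README for the actual layout. `load_tweet_ids` is an illustrative name, not part of the dataset.

```python
from pathlib import Path


def load_tweet_ids(root: Path, pattern: str = "*.txt") -> set[int]:
    """Collect unique Tweet IDs from every file under `root` matching `pattern`.

    Assumption (hypothetical layout): one numeric Tweet ID per line, and blank
    lines may appear. IDs are parsed as Python ints, which are arbitrary
    precision, so 64-bit Tweet IDs are never rounded as they would be if
    parsed as 64-bit floats.
    """
    ids: set[int] = set()
    # Walk subdirectories too, in a stable order.
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines
                    ids.add(int(line))
    return ids
```

Returning a set removes any ID that appears in more than one file; sort the result if the downstream rehydration tool expects a stable order.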

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking Social Media Discourse About the {COVID-19} Pandemic: Development of a Public Coronavirus {Twitter} Data Set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.86)

Number of Tweets: 2,337,034,553

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,488,684,033	63.7%
Spanish	es	273,379,641	11.7%
Portuguese	pt	95,006,612	4.07%
French	fr	81,220,479	3.48%
Undefined	und	69,293,682	2.97%
Indonesian	in	68,739,707	2.94%
German	de	48,558,554	2.08%
Thai	th	33,440,747	1.43%
Japanese	ja	31,313,014	1.34%
Italian	it	25,806,176	1.1%
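Each percentage is a language's Tweet count divided by the release total. The following Python sketch recomputes the shares from the v2.86 figures copied from the table above; `V286_TOTAL`, `V286_COUNTS`, and `percent_of_total` are illustrative names, not part of the dataset.

```python
# Figures copied from the v2.86 statistics summary above.
V286_TOTAL = 2_337_034_553
V286_COUNTS = {
    "en": 1_488_684_033,
    "es": 273_379_641,
    "pt": 95_006_612,
    "fr": 81_220_479,
    "und": 69_293_682,
    "in": 68_739_707,
    "de": 48_558_554,
    "th": 33_440_747,
    "ja": 31_313_014,
    "it": 25_806_176,
}


def percent_of_total(count: int, total: int) -> float:
    """Share of `total` represented by `count`, as an unrounded percentage."""
    return 100.0 * count / total


if __name__ == "__main__":
    for code, count in V286_COUNTS.items():
        print(f"{code:>3}  {percent_of_total(count, V286_TOTAL):6.2f}%")
    # Combined share of the ten listed languages; the rest is spread over other languages.
    top10 = percent_of_total(sum(V286_COUNTS.values()), V286_TOTAL)
    print(f"top 10 combined: {top10:.2f}%")
```

The recomputed values agree with the reported percentages to within rounding, and the ten listed languages together account for roughly 94.8% of the Tweets in this release.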

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC
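When analyzing Tweet volume over time, it can help to flag timestamps that fall inside a known gap. The Python sketch below is a minimal version under stated assumptions: each row is a half-open [start, end) interval in UTC, "24:00" means the following midnight, and the 3/2/2020 row, which reports intermittent connectivity issues rather than hours, is treated conservatively as covering the whole day. `KNOWN_GAPS` and `in_known_gap` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def _utc(year: int, month: int, day: int, hour: int = 0) -> datetime:
    """Midnight UTC on the given day plus `hour` hours (hour=24 is the next midnight)."""
    return datetime(year, month, day, tzinfo=timezone.utc) + timedelta(hours=hour)


# Known gaps transcribed from the table above as half-open [start, end) UTC intervals.
# Assumption: the 3/2/2020 row (intermittent connectivity, no hours given)
# is treated as covering that entire day.
KNOWN_GAPS = [
    (_utc(2020, 2, 1, 4), _utc(2020, 2, 1, 9)),
    (_utc(2020, 2, 8, 6), _utc(2020, 2, 8, 7)),
    (_utc(2020, 2, 22, 21), _utc(2020, 2, 22, 24)),
    (_utc(2020, 2, 23, 0), _utc(2020, 2, 23, 24)),
    (_utc(2020, 2, 24, 0), _utc(2020, 2, 24, 4)),
    (_utc(2020, 2, 25, 0), _utc(2020, 2, 25, 3)),
    (_utc(2020, 3, 2, 0), _utc(2020, 3, 2, 24)),
    (_utc(2020, 5, 14, 7), _utc(2020, 5, 14, 8)),
]


def in_known_gap(ts: datetime) -> bool:
    """True if the timezone-aware timestamp `ts` falls inside a known collection gap."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware, e.g. tzinfo=timezone.utc")
    return any(start <= ts < end for start, end in KNOWN_GAPS)
```

A binned time series could, for example, mark hourly bins for which `in_known_gap` is true, so that dips caused by collection outages are not mistaken for drops in discussion.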

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

12 Feb 00:28

echen102

v2.85

4f7b3e1

Release v2.85

This release contains Tweet IDs collected from 1/21/20 - 02/04/22.

Statistics Summary (v2.85)

Number of Tweets: 2,318,609,222

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,476,995,754	63.7%
Spanish	es	271,868,132	11.73%
Portuguese	pt	94,461,488	4.07%
French	fr	80,350,906	3.47%
Undefined	und	68,746,369	2.96%
Indonesian	in	68,209,440	2.94%
German	de	47,914,328	2.07%
Thai	th	33,290,413	1.44%
Japanese	ja	30,988,812	1.34%
Italian	it	25,531,113	1.1%

01 Feb 21:50

echen102

v2.84

5e16408

Release v2.84

This release contains Tweet IDs collected from 1/21/20 - 01/28/22.

Statistics Summary (v2.84)

Number of Tweets: 2,298,001,975

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,463,784,358	63.7%
Spanish	es	269,988,767	11.75%
Portuguese	pt	93,677,010	4.08%
French	fr	79,290,257	3.45%
Undefined	und	68,103,522	2.96%
Indonesian	in	67,805,415	2.95%
German	de	47,189,414	2.05%
Thai	th	33,074,693	1.44%
Japanese	ja	30,670,492	1.33%
Italian	it	25,257,379	1.1%

24 Jan 22:45

echen102

v2.83

ca90773

Release v2.83

This release contains Tweet IDs collected from 1/21/20 - 01/21/22.

Statistics Summary (v2.83)

Number of Tweets: 2,274,495,340

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,448,937,464	63.7%
Spanish	es	267,616,396	11.77%
Portuguese	pt	92,674,329	4.07%
French	fr	77,933,247	3.43%
Indonesian	in	67,478,087	2.97%
Undefined	und	67,412,827	2.96%
German	de	46,409,477	2.04%
Thai	th	32,909,554	1.45%
Japanese	ja	30,265,418	1.33%
Italian	it	24,941,983	1.1%

20 Jan 01:55

echen102

v2.82

b2c4ce3

Release v2.82

This release contains Tweet IDs collected from 1/21/20 - 01/14/22.

Statistics Summary (v2.82)

Number of Tweets: 2,249,024,368

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,433,099,880	63.72%
Spanish	es	264,621,956	11.77%
Portuguese	pt	91,562,086	4.07%
French	fr	76,506,268	3.4%
Indonesian	in	67,212,573	2.99%
Undefined	und	66,643,040	2.96%
German	de	45,669,312	2.03%
Thai	th	32,768,124	1.46%
Japanese	ja	29,898,454	1.33%
Italian	it	24,549,747	1.09%

11 Jan 10:23

echen102

v2.81

de83fb9

Release v2.81

This release contains Tweet IDs collected from 1/21/20 - 01/07/22.

Statistics Summary (v2.81)

Number of Tweets: 2,220,262,501

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,414,872,792	63.73%
Spanish	es	261,048,589	11.76%
Portuguese	pt	90,375,467	4.07%
French	fr	74,958,733	3.38%
Indonesian	in	66,989,008	3.02%
Undefined	und	65,792,780	2.96%
German	de	44,988,982	2.03%
Thai	th	32,471,291	1.46%
Japanese	ja	29,584,545	1.33%
Italian	it	24,133,694	1.09%

05 Jan 03:19

echen102

v2.80

056a2c8

Release v2.80

This release contains Tweet IDs collected from 1/21/20 - 01/02/22.

Statistics Summary (v2.80)

Number of Tweets: 2,199,480,456

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,402,175,455	63.75%
Spanish	es	258,616,718	11.76%
Portuguese	pt	89,404,493	4.06%
French	fr	73,609,067	3.35%
Indonesian	in	66,773,807	3.04%
Undefined	und	65,221,555	2.97%
German	de	44,512,962	2.02%
Thai	th	32,013,748	1.46%
Japanese	ja	29,403,936	1.34%
Italian	it	23,861,808	1.08%

27 Dec 13:07

echen102

v2.79

597dcd0

Release v2.79

This release contains Tweet IDs collected from 1/21/20 - 12/25/21.

Statistics Summary (v2.79)

Number of Tweets: 2,169,119,102

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,382,003,379	63.71%
Spanish	es	255,498,974	11.78%
Portuguese	pt	88,690,950	4.09%
French	fr	71,586,712	3.3%
Indonesian	in	66,437,264	3.06%
Undefined	und	64,375,522	2.97%
German	de	43,875,662	2.02%
Thai	th	31,624,912	1.46%
Japanese	ja	29,193,402	1.35%
Italian	it	23,391,776	1.08%

21 Dec 04:31

echen102

v2.78

6443bd9

Release v2.78

This release contains Tweet IDs collected from 1/21/20 - 12/17/21.

Statistics Summary (v2.78)

Number of Tweets: 2,138,323,554

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,361,303,634	63.66%
Spanish	es	252,730,072	11.82%
Portuguese	pt	88,014,234	4.12%
French	fr	70,047,868	3.28%
Indonesian	in	65,950,419	3.08%
Undefined	und	63,445,020	2.97%
German	de	43,062,373	2.01%
Thai	th	30,838,246	1.44%
Japanese	ja	28,951,690	1.35%
Italian	it	23,040,630	1.08%

14 Dec 01:25

echen102

v2.77

4416923

Release v2.77

This release contains Tweet IDs collected from 1/21/20 - 12/11/21.

Statistics Summary (v2.77)

Number of Tweets: 2,117,640,323

Language breakdown of the 10 most prevalent languages:

Language	ISO	No. of Tweets	% of total Tweets
English	en	1,346,671,958	63.59%
Spanish	es	251,209,175	11.86%
Portuguese	pt	87,591,811	4.14%
French	fr	69,184,327	3.27%
Indonesian	in	65,671,705	3.1%
Undefined	und	62,815,762	2.97%
German	de	42,407,168	2.0%
Thai	th	30,562,585	1.44%
Japanese	ja	28,774,420	1.36%
Italian	it	22,807,287	1.08%
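Because every release reports a cumulative "Number of Tweets", the net number of Tweet IDs added by each release can be recovered by differencing consecutive totals. The Python sketch below uses the totals copied from the v2.77 to v2.86 summaries above; `RELEASE_TOTALS` and `increments` are illustrative names. The result is a net change, since the release notes do not say whether IDs are ever removed between releases.

```python
# Cumulative "Number of Tweets" per release, copied from the summaries above,
# ordered oldest to newest.
RELEASE_TOTALS = {
    "v2.77": 2_117_640_323,
    "v2.78": 2_138_323_554,
    "v2.79": 2_169_119_102,
    "v2.80": 2_199_480_456,
    "v2.81": 2_220_262_501,
    "v2.82": 2_249_024_368,
    "v2.83": 2_274_495_340,
    "v2.84": 2_298_001_975,
    "v2.85": 2_318_609_222,
    "v2.86": 2_337_034_553,
}


def increments(totals: dict[str, int]) -> dict[str, int]:
    """Net Tweets added by each release relative to the one before it.

    Relies on dict insertion order being oldest to newest; the first
    release has no predecessor and is therefore omitted.
    """
    names = list(totals)
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(names, names[1:])}


if __name__ == "__main__":
    for name, added in increments(RELEASE_TOTALS).items():
        print(f"{name}: +{added:,}")
```

For example, v2.86 added 18,425,331 IDs over v2.85, and the increments across these nine releases sum to 219,394,230. Note that the releases cover collection windows of different lengths, so the raw increments are not directly comparable week to week.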

Releases: echen102/COVID-19-TweetIDs
