27 Sep 08:56

echen102

62ea340

Release v2.66

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 9/25/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.66)

Number of Tweets : 1,935,461,654

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,229,797,818	63.54%
Spanish	es	235,389,409	12.16%
Portuguese	pt	82,055,923	4.24%
Indonesian	in	62,900,576	3.25%
French	fr	60,793,256	3.14%
Undefined	und	56,879,037	2.94%
German	de	34,556,126	1.79%
Thai	th	28,050,015	1.45%
Japanese	ja	26,231,565	1.36%
Italian	it	20,013,955	1.03%
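
The percentage column can be reproduced from the counts and the total reported above. The following is a minimal sketch in Python, not part of the repository's tooling; the names TOTAL_TWEETS, TOP_10 and share are illustrative, and rounding to two decimals is an assumption that matches the values shown in the table.

```python
# Counts copied from the v2.66 statistics summary; names are illustrative.
TOTAL_TWEETS = 1_935_461_654

TOP_10 = {
    "en": 1_229_797_818,
    "es": 235_389_409,
    "pt": 82_055_923,
    "in": 62_900_576,
    "fr": 60_793_256,
    "und": 56_879_037,
    "de": 34_556_126,
    "th": 28_050_015,
    "ja": 26_231_565,
    "it": 20_013_955,
}


def share(count, total=TOTAL_TWEETS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentage share per language code.
shares = {iso: share(n) for iso, n in TOP_10.items()}

# Tweets in languages outside the top 10.
other = TOTAL_TWEETS - sum(TOP_10.values())

print(shares["en"])  # 63.54
print(other)         # 98793974
```

The remainder shows that the ten listed languages do not account for every Tweet in the release.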

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC
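
When analysing Tweet volume over time, it can help to flag timestamps that fall inside one of these gaps. The sketch below is one possible way to do that in Python and is not taken from the repository. Two assumptions are made: the 3/2/2020 entry, which lists no hours, is treated as covering the whole day, and each interval is treated as excluding its end time. The names KNOWN_GAPS, gap_intervals and in_known_gap are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# (date, start hour, end hour) in UTC, copied from the Known Gaps table.
# Assumption: 3/2/2020 has no hours listed, so the whole day is flagged.
KNOWN_GAPS = [
    ("2020-02-01", 4, 9),
    ("2020-02-08", 6, 7),
    ("2020-02-22", 21, 24),
    ("2020-02-23", 0, 24),
    ("2020-02-24", 0, 4),
    ("2020-02-25", 0, 3),
    ("2020-03-02", 0, 24),
    ("2020-05-14", 7, 8),
]


def gap_intervals():
    """Yield (start, end) pairs as timezone-aware UTC datetimes."""
    for day, start_h, end_h in KNOWN_GAPS:
        midnight = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        yield midnight + timedelta(hours=start_h), midnight + timedelta(hours=end_h)


def in_known_gap(ts):
    """Return True if the aware datetime ts falls inside a known gap."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    # Assumption: intervals are half-open, so the end time itself is not in the gap.
    return any(start <= ts < end for start, end in gap_intervals())


print(in_known_gap(datetime(2020, 2, 23, 12, tzinfo=timezone.utc)))  # True
print(in_known_gap(datetime(2020, 5, 14, 8, tzinfo=timezone.utc)))   # False
```

Note that the entries for 2/22/2020 through 2/24/2020 form one continuous gap from 21:00 UTC on 2/22 to 4:00 UTC on 2/24.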

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


20 Sep 06:19

echen102

v2.65

e973462

Release v2.65

This release contains Tweet IDs collected from 1/21/20 - 9/17/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.65)

Number of Tweets : 1,914,366,526

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,215,653,703	63.50%
Spanish	es	233,530,872	12.20%
Portuguese	pt	81,021,521	4.23%
Indonesian	in	62,518,811	3.27%
French	fr	60,136,898	3.14%
Undefined	und	56,212,980	2.94%
German	de	34,095,122	1.78%
Thai	th	27,811,052	1.45%
Japanese	ja	25,896,253	1.35%
Italian	it	19,769,849	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


13 Sep 10:37

echen102

v2.64

9f89bfd

Release v2.64

This release contains Tweet IDs collected from 1/21/20 - 9/10/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.64)

Number of Tweets : 1,894,238,969

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,201,698,992	63.44%
Spanish	es	231,932,975	12.24%
Portuguese	pt	80,462,952	4.25%
Indonesian	in	62,027,976	3.27%
French	fr	59,487,845	3.14%
Undefined	und	55,546,141	2.93%
German	de	33,696,996	1.78%
Thai	th	27,602,331	1.46%
Japanese	ja	25,529,999	1.35%
Italian	it	19,532,350	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


07 Sep 02:06

echen102

v2.63

75d80eb

Release v2.63

This release contains Tweet IDs collected from 1/21/20 - 9/03/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.63)

Number of Tweets : 1,873,386,736

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,187,304,119	63.38%
Spanish	es	230,112,594	12.28%
Portuguese	pt	80,114,319	4.28%
Indonesian	in	61,598,475	3.29%
French	fr	58,711,326	3.13%
Undefined	und	54,853,149	2.93%
German	de	33,237,111	1.77%
Thai	th	27,403,338	1.46%
Japanese	ja	25,130,739	1.34%
Italian	it	19,276,706	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


30 Aug 10:48

echen102

v2.62

ab82092

Release v2.62

This release contains Tweet IDs collected from 1/21/20 - 8/27/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.62)

Number of Tweets : 1,851,213,185

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,172,546,804	63.34%
Spanish	es	228,313,132	12.33%
Portuguese	pt	79,559,336	4.30%
Indonesian	in	60,991,610	3.29%
French	fr	57,818,458	3.12%
Undefined	und	54,152,050	2.93%
German	de	32,760,460	1.77%
Thai	th	27,034,124	1.46%
Japanese	ja	24,601,430	1.33%
Italian	it	19,027,965	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


24 Aug 01:14

echen102

v2.61

4ab56c7

Release v2.61

This release contains Tweet IDs collected from 1/21/20 - 8/20/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.61)

Number of Tweets : 1,827,636,629

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,156,672,793	63.29%
Spanish	es	226,543,006	12.40%
Portuguese	pt	78,933,436	4.32%
Indonesian	in	60,299,364	3.30%
French	fr	56,843,328	3.11%
Undefined	und	53,391,578	2.92%
German	de	32,298,372	1.77%
Thai	th	26,523,789	1.45%
Japanese	ja	24,002,709	1.31%
Italian	it	18,785,256	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


17 Aug 00:26

echen102

v2.60

b3853b7

Release v2.60

This release contains Tweet IDs collected from 1/21/20 - 8/13/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.60)

Number of Tweets : 1,804,849,738

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,141,811,458	63.26%
Spanish	es	224,449,439	12.44%
Portuguese	pt	78,191,172	4.33%
Indonesian	in	59,375,752	3.29%
French	fr	55,949,478	3.10%
Undefined	und	52,682,527	2.92%
German	de	31,913,310	1.77%
Thai	th	26,164,607	1.45%
Japanese	ja	23,397,567	1.30%
Italian	it	18,568,926	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


10 Aug 08:14

echen102

v2.59

1858af0

Release v2.59

This release contains Tweet IDs collected from 1/21/20 - 8/06/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.59)

Number of Tweets : 1,778,540,842

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,124,617,850	63.23%
Spanish	es	222,128,766	12.49%
Portuguese	pt	77,502,077	4.36%
Indonesian	in	57,858,778	3.25%
French	fr	54,877,300	3.09%
Undefined	und	51,879,620	2.92%
German	de	31,463,626	1.77%
Thai	th	25,689,631	1.44%
Japanese	ja	22,794,876	1.28%
Italian	it	18,361,175	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


03 Aug 09:34

echen102

v2.58

b825d4e

Release v2.58

This release contains Tweet IDs collected from 1/21/20 - 7/30/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.58)

Number of Tweets : 1,751,332,815

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,107,430,343	63.23%
Spanish	es	219,874,566	12.55%
Portuguese	pt	76,893,807	4.39%
Indonesian	in	56,364,329	3.22%
French	fr	53,647,348	3.06%
Undefined	und	50,998,585	2.91%
German	de	31,037,488	1.77%
Thai	th	24,606,326	1.41%
Japanese	ja	21,988,760	1.26%
Italian	it	18,147,908	1.04%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


26 Jul 09:54

echen102

v2.57

ce88396

Release v2.57

This release contains Tweet IDs collected from 1/21/20 - 7/23/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.57)

Number of Tweets : 1,724,887,123

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	1,090,993,772	63.25%
Spanish	es	217,446,068	12.61%
Portuguese	pt	76,417,549	4.43%
Indonesian	in	54,749,221	3.17%
French	fr	52,430,570	3.04%
Undefined	und	50,144,493	2.91%
German	de	30,599,613	1.77%
Thai	th	23,587,217	1.37%
Japanese	ja	21,425,852	1.24%
Italian	it	17,875,073	1.04%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset, please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.


Releases: echen102/COVID-19-TweetIDs
