10 May 09:02

echen102

b96714d

Release v2.46

The repository contains an ongoing collection of Tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2); the collection commenced on January 28, 2020. To comply with Twitter’s Terms of Service, we are only publicly releasing the Tweet IDs of the collected Tweets. The data is released for non-commercial research use.

This release contains Tweet IDs collected from 1/21/20 - 5/07/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0). By using this dataset, you agree to abide by the stipulations in the license, remain in compliance with Twitter’s Terms of Service, and cite the following manuscript:

Chen E, Lerman K, Ferrara E
Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set
JMIR Public Health and Surveillance 2020;6(2):e19273
DOI: 10.2196/19273
PMID: 32427106

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.46)

Number of Tweets : 1,471,398,830

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	944,612,909	64.2%
Spanish	es	190,216,727	12.93%
Portuguese	pt	64,028,667	4.35%
French	fr	44,746,214	3.04%
Undefined	und	42,006,372	2.85%
Indonesian	in	36,474,544	2.48%
German	de	26,472,204	1.8%
Japanese	ja	17,297,706	1.18%
Italian	it	15,882,067	1.08%
Thai	th	15,587,319	1.06%
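
As a quick sanity check, the percentage column can be recomputed from the counts and the total above. The sketch below is illustrative only: the names TOTAL_TWEETS, COUNTS and percent_of_total are hypothetical and not part of the repository, and it assumes the published percentages are rounded to two decimal places.

```python
# Counts copied from the v2.46 statistics table; all names are illustrative.
TOTAL_TWEETS = 1_471_398_830
COUNTS = {"en": 944_612_909, "es": 190_216_727, "pt": 64_028_667}


def percent_of_total(count, total=TOTAL_TWEETS):
    """Return count as a percentage of total, rounded to two decimals (assumed)."""
    return round(100 * count / total, 2)


for iso, count in COUNTS.items():
    # Print each language code next to its recomputed share of all Tweets.
    print(iso, percent_of_total(count))
```

The same function can be pointed at any other release by passing that release's total as the second argument.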

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC
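
When analyzing Tweet volume over time, it may help to flag timestamps that fall inside one of the gaps listed above. The following is a minimal sketch, not part of the repository: the names KNOWN_GAPS, gap_intervals and in_known_gap are hypothetical, each interval is treated as half-open (start inclusive, end exclusive), and the 3/2/2020 entry is omitted because it has no time range.

```python
from datetime import datetime, timedelta, timezone

# (date, start hour, end hour) in UTC, copied from the Known Gaps table.
# The 3/2/2020 "intermittent connectivity" entry has no time range and is
# left out; handle that day separately if it matters for your analysis.
KNOWN_GAPS = [
    ("2020-02-01", 4, 9),
    ("2020-02-08", 6, 7),
    ("2020-02-22", 21, 24),
    ("2020-02-23", 0, 24),
    ("2020-02-24", 0, 4),
    ("2020-02-25", 0, 3),
    ("2020-05-14", 7, 8),
]


def gap_intervals(gaps=KNOWN_GAPS):
    """Convert table rows into (start, end) pairs of timezone-aware datetimes."""
    intervals = []
    for day, start_hour, end_hour in gaps:
        midnight = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        # Adding 24 hours rolls "24:00" over to 00:00 of the following day.
        intervals.append(
            (midnight + timedelta(hours=start_hour),
             midnight + timedelta(hours=end_hour))
        )
    return intervals


def in_known_gap(ts):
    """Return True if the UTC-aware timestamp ts falls inside a listed gap."""
    return any(start <= ts < end for start, end in gap_intervals())
```

The function expects timezone-aware datetimes; comparing a naive datetime against these intervals raises a TypeError, so convert timestamps to UTC first.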

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

03 May 21:08

echen102

v2.45

9b4bd7d

Release v2.45

This release contains Tweet IDs collected from 1/21/20 - 4/30/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.45)

Number of Tweets : 1,443,871,621

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	928,225,493	64.29%
Spanish	es	186,880,167	12.94%
Portuguese	pt	62,398,113	4.32%
French	fr	44,097,563	3.05%
Undefined	und	41,140,188	2.85%
Indonesian	in	35,683,876	2.47%
German	de	25,970,256	1.8%
Japanese	ja	16,865,989	1.17%
Italian	it	15,697,293	1.09%
Turkish	tr	14,931,506	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

25 Apr 09:00

echen102

v2.44

3c2a80e

Release v2.44

This release contains Tweet IDs collected from 1/21/20 - 4/23/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.44)

Number of Tweets : 1,414,688,248

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	910,623,293	64.37%
Spanish	es	183,218,600	12.95%
Portuguese	pt	61,323,722	4.33%
French	fr	43,331,836	3.06%
Undefined	und	40,246,654	2.84%
Indonesian	in	35,124,918	2.48%
German	de	25,422,091	1.8%
Japanese	ja	16,464,737	1.16%
Italian	it	15,482,583	1.09%
Turkish	tr	14,628,218	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

20 Apr 03:52

echen102

v2.43

22ea5d2

Release v2.43

This release contains Tweet IDs collected from 1/21/20 - 4/16/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.43)

Number of Tweets : 1,386,739,774

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	894,590,447	64.51%
Spanish	es	179,004,778	12.91%
Portuguese	pt	60,224,354	4.34%
French	fr	42,580,344	3.07%
Undefined	und	39,377,778	2.84%
Indonesian	in	34,598,688	2.49%
German	de	24,732,799	1.78%
Japanese	ja	16,160,779	1.17%
Italian	it	15,244,729	1.1%
Turkish	tr	14,292,400	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

12 Apr 02:36

echen102

v2.42

62aba04

Release v2.42

This release contains Tweet IDs collected from 1/21/20 - 4/09/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.42)

Number of Tweets : 1,359,591,254

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	880,753,098	64.78%
Spanish	es	174,102,000	12.81%
Portuguese	pt	58,372,922	4.29%
French	fr	41,588,189	3.06%
Undefined	und	38,495,608	2.83%
Indonesian	in	34,129,886	2.51%
German	de	23,860,891	1.76%
Japanese	ja	15,868,441	1.17%
Italian	it	14,966,016	1.1%
Turkish	tr	13,980,342	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

05 Apr 12:12

echen102

v2.41

4cfcbfd

Release v2.41

This release contains Tweet IDs collected from 1/21/20 - 4/02/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.41)

Number of Tweets : 1,333,369,189

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	867,854,670	65.09%
Spanish	es	168,263,876	12.62%
Portuguese	pt	56,402,776	4.23%
French	fr	40,737,413	3.06%
Undefined	und	37,743,919	2.83%
Indonesian	in	33,774,499	2.53%
German	de	23,004,891	1.73%
Japanese	ja	15,621,384	1.17%
Italian	it	14,690,727	1.1%
Turkish	tr	13,667,429	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

30 Mar 03:34

echen102

v2.40

2d14136

Release v2.40

This release contains Tweet IDs collected from 1/21/20 - 3/26/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.40)

Number of Tweets : 1,308,519,411

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	855,574,399	65.38%
Spanish	es	162,884,466	12.45%
Portuguese	pt	54,469,049	4.16%
French	fr	39,402,060	3.01%
Undefined	und	37,005,463	2.83%
Indonesian	in	33,470,927	2.56%
German	de	22,167,543	1.69%
Japanese	ja	15,410,569	1.18%
Italian	it	14,345,605	1.1%
Turkish	tr	13,396,516	1.02%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

22 Mar 20:37

echen102

v2.39

13b0acd

Release v2.39

This release contains Tweet IDs collected from 1/21/20 - 3/19/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.39)

Number of Tweets : 1,282,780,680

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	842,745,290	65.7%
Spanish	es	158,218,582	12.33%
Portuguese	pt	51,482,588	4.01%
French	fr	38,253,997	2.98%
Undefined	und	36,268,761	2.83%
Indonesian	in	33,145,083	2.58%
German	de	21,185,219	1.65%
Japanese	ja	15,189,910	1.18%
Italian	it	14,016,989	1.09%
Turkish	tr	13,171,045	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

15 Mar 08:34

echen102

v2.38

4bd700e

Release v2.38

This release contains Tweet IDs collected from 1/21/20 - 3/12/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement / How to Cite

BibTeX:

@article{chen2020tracking,
  title={Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set},
  author={Chen, Emily and Lerman, Kristina and Ferrara, Emilio},
  journal={JMIR Public Health and Surveillance},
  volume={6},
  number={2},
  pages={e19273},
  year={2020},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

Statistics Summary (v2.38)

Number of Tweets : 1,258,560,216

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	830,077,559	65.95%
Spanish	es	154,673,839	12.29%
Portuguese	pt	48,537,828	3.86%
French	fr	37,196,393	2.96%
Undefined	und	35,528,197	2.82%
Indonesian	in	32,743,567	2.6%
German	de	20,421,380	1.62%
Japanese	ja	14,968,129	1.19%
Italian	it	13,610,339	1.08%
Turkish	tr	12,972,715	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

09 Mar 04:40

echen102

v2.37

f795ac2

Release v2.37

This release contains Tweet IDs collected from 1/21/20 - 3/05/21.

Please refer to the README for more details regarding data, data organization and data usage agreement.

Data Usage Agreement

Statistics Summary (v2.37)

Number of Tweets : 1,235,254,351

Language breakdown of top 10 most prevalent languages :

Language	ISO	No. tweets	% total Tweets
English	en	815,941,794	66.05%
Spanish	es	151,726,357	12.28%
Portuguese	pt	46,535,632	3.77%
French	fr	36,428,149	2.95%
Undefined	und	34,895,731	2.82%
Indonesian	in	32,368,060	2.62%
German	de	19,805,917	1.6%
Japanese	ja	14,705,198	1.19%
Italian	it	13,200,110	1.07%
Turkish	tr	12,762,669	1.03%

Known Gaps

Date	Time
2/1/2020	4:00 - 9:00 UTC
2/8/2020	6:00 - 7:00 UTC
2/22/2020	21:00 - 24:00 UTC
2/23/2020	0:00 - 24:00 UTC
2/24/2020	0:00 - 4:00 UTC
2/25/2020	0:00 - 3:00 UTC
3/2/2020	Intermittent Internet Connectivity Issues
5/14/2020	7:00 - 8:00 UTC

Inquiries

If you have technical questions about the data collection, please contact Emily Chen at echen920[at]usc[dot]edu.

If you have any further questions about this dataset please contact Dr. Emilio Ferrara at emiliofe[at]usc[dot]edu.

Releases: echen102/COVID-19-TweetIDs
