GitHub Pull Request Review Comments

Download link.

25.3 million pull request review comments posted on GitHub from January 2015 through December 2018.

Format

xz-compressed CSV, with columns:

COMMENT_ID - identifier of the comment in the parent dataset, GH Archive
COMMIT_ID - hash of the commit to which the review comment is attached
URL - path to the GitHub pull request the comment comes from
AUTHOR - GitHub username of the comment's author
CREATED_AT - creation date of the comment
BODY - raw content of the comment
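
Before processing the whole archive, it can help to confirm that the file has the columns listed above. The sketch below assumes the CSV starts with a header row (the sample code's use of csv.DictReader suggests this); the helper name has_documented_columns is hypothetical and not part of this repository. It compares column names as a set, so it makes no assumption about their order in the file.

```python
import csv
import lzma

# Column names as documented in the Format section.
EXPECTED_COLUMNS = {"COMMENT_ID", "COMMIT_ID", "URL", "AUTHOR", "CREATED_AT", "BODY"}


def has_documented_columns(source):
    """Return True if the xz-compressed CSV has exactly the documented columns.

    `source` is a file name or a binary file object (hypothetical helper).
    Assumes the first CSV row is a header row.
    """
    with lzma.open(source, "rt", encoding="utf-8", newline="") as textf:
        header = next(csv.reader(textf))
    return set(header) == EXPECTED_COLUMNS
```

Only the first row is decompressed and parsed, so the check is cheap even on the full archive, for example has_documented_columns("review_comments.csv.xz").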

Sample code

Python:

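# too big for pandas.read_csv, so stream the records one at a time
import csv
import lzma

# Text mode decodes UTF-8 on the fly; newline="" lets csv handle line breaks inside BODY.
with lzma.open("review_comments.csv.xz", "rt", encoding="utf-8", newline="") as archf:
    reader = csv.DictReader(archf)
    for record in reader:
        print(record)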
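
Printing every record is rarely what you want with 25.3 million rows. As a rough sketch of a streaming aggregation, the function below counts comments per AUTHOR without holding the dataset in memory. The function name count_comments_by_author and the max_rows parameter are illustrative assumptions, not part of this repository.

```python
import csv
import lzma
from collections import Counter
from itertools import islice


def count_comments_by_author(source, max_rows=None):
    """Count review comments per AUTHOR while streaming the archive.

    `source` is a file name or a binary file object; `max_rows` optionally
    caps how many records are read (None reads everything).
    """
    counts = Counter()
    with lzma.open(source, "rt", encoding="utf-8", newline="") as archf:
        reader = csv.DictReader(archf)
        # islice stops early when max_rows is set, which is handy for a quick look.
        for record in islice(reader, max_rows):
            counts[record["AUTHOR"]] += 1
    return counts
```

For example, count_comments_by_author("review_comments.csv.xz", max_rows=100000).most_common(10) would list the most active commenters among the first 100,000 records.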

Origin

The dataset was generated from GH Archive in the notebook PR_review_comments_generation.ipynb. Comments larger than Python's default csv.field_size_limit of 128 KB were discarded (about 10 comments).
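
To illustrate the size filter mentioned above, here is a minimal sketch of such a check. It is not the notebook's actual code: the helper name fits_in_csv_field is hypothetical, and it measures length in characters, which is an assumption about how the original filter counted size.

```python
import csv


def fits_in_csv_field(body, limit=None):
    """Return True if `body` is no longer than the csv module's field size limit.

    With limit=None the current value of csv.field_size_limit() is used,
    which is 131072 (128 KB) unless it has been changed.
    """
    if limit is None:
        limit = csv.field_size_limit()  # called without arguments, it only reads the limit
    return len(body) <= limit
```

When reading other data with longer fields, the limit can be raised by calling csv.field_size_limit(new_limit) before creating the reader.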

We gathered some statistics about the dataset in the notebook PR_review_comments_stats.ipynb.

License

Open Data Commons Open Database License (ODbL)
