Two "original" submissions are essentially identical #3
Comments
In general, 13 and 15 are very close also in
According to your paper, honesty of original answers is assumed. However, I would remove submissions from 15 from the dataset (removing 15 has the advantage of not leaving any numerical gaps), because they are obviously not the result of independent development. Nobody types a byte-wise perfect copy with just 1 changed line by accident. Especially not 5 times, and always coinciding with the same colleague's answers.
Hi Manuel, Thank you for pointing this out! We ensured that students 13 and 15 wrote their solutions independently, and thus consider this a case of coincidental similarity. Two students who think alike can write the same code for simple tasks. Nevertheless, you can exclude one of the two from your analysis if you think that is not the case. In research, we are open to many perspectives and arguments, so long as they are backed up with justifications :)
I was very happy to find your open academic plagiarism dataset (as you know, they are not common). Many plagiarism-detection tools focus on reporting overall similarity, which, as you note, can be misleading; mine is instead based on visualizing those similarities in context. In the context of the other submissions in the dataset, the 13-15 pair really stands out. In many years as a CS teacher, I have found this to be a very reliable indication of non-originality. This is a histogram of similarities (0 = identical, 1 = totally dissimilar) in case-07 (it is similar in the 4 other instances noted above). Note the large gap between the 13-15 similarity and all others; more on AC's histograms and visualizations here (somewhat outdated). Feel free to close the issue, but were those two my students, I would have a very hard time believing them.
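For readers who want to reproduce the kind of gap described above, here is a minimal sketch. It is not AC's actual algorithm: it uses Python's standard `difflib.SequenceMatcher` as a stand-in distance, scaled so that 0 means identical and 1 means totally dissimilar, matching the histogram's orientation. The `toy` submissions and the helper names `distance` and `pairwise_distances` are hypothetical and only illustrate the idea; they are not taken from the dataset.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a: str, b: str) -> float:
    """Stand-in distance: 0.0 = identical, 1.0 = totally dissimilar."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def pairwise_distances(submissions: dict[str, str]) -> list[tuple[float, str, str]]:
    """Return (distance, id_a, id_b) for every pair, closest pair first."""
    ids = sorted(submissions)
    return sorted(
        (distance(submissions[a], submissions[b]), a, b)
        for a, b in combinations(ids, 2)
    )


# Hypothetical toy submissions (NOT from the dataset); "13" and "15"
# differ only by one extra space.
toy = {
    "02": "int total(int[] v) { int t = 0; int i = 0; "
          "while (i < v.length) { t = t + v[i]; i++; } return t; }",
    "09": "static int add(int[] n) { return java.util.Arrays.stream(n).sum(); }",
    "13": "int sum(int[] a) { int s = 0; for (int x : a) s += x; return s; }",
    "15": "int sum(int[] a) { int s = 0; for (int x : a) s += x;  return s; }",
}

pairs = pairwise_distances(toy)

# Gap between the closest pair and the next-closest pair: a large value
# means the closest pair stands apart from the rest of the distribution.
gap_after_closest = pairs[1][0] - pairs[0][0]

for d, a, b in pairs:
    print(f"{a}-{b}: {d:.3f}")
print(f"gap after closest pair: {gap_after_closest:.3f}")
```

The point of sorting all pairs, rather than thresholding a single similarity score, is that a pair is judged against how similar the other submissions to the same task are to each other.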
Hi Manuel, Thank you for the explanation. Although we ensured they were not copying one another, I acknowledge that there is still a possibility that these students copied one another. I will put a note on the readme file about this.
Submissions 13 and 15 for case-07/non-plagiarized are identical save for spacing, author name, file-name, and class-name. Found while testing the detection capabilities of my own plagiarism detector (https://github.com/manuel-freire/ac2). Screenshot from this tool below: