Two "original" submissions are essentially identical #3

Closed
manuel-freire opened this issue May 10, 2023 · 4 comments

Comments

@manuel-freire

Submissions 13 and 15 for case-07/non-plagiarized are identical except for spacing, author name, file name, and class name. I found this while testing the detection capabilities of my own plagiarism detector (https://github.com/manuel-freire/ac2). A screenshot from that tool is below:

image
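For readers without the screenshot, a check of this kind can be sketched in a few lines of Python. This is only an illustration, not how ac2 works: the helper names `normalize` and `differing_lines` are hypothetical, and the normalization (collapsing whitespace and dropping blank lines) is an assumption about what "identical except for spacing" should mean.

```python
import difflib


def normalize(source: str) -> list[str]:
    """Collapse runs of whitespace on each line and drop blank lines."""
    return [" ".join(line.split()) for line in source.splitlines() if line.strip()]


def differing_lines(a: str, b: str) -> list[tuple[list[str], list[str]]]:
    """Return the line groups that still differ once spacing is ignored."""
    lines_a, lines_b = normalize(a), normalize(b)
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep both sides of every non-matching region.
            changes.append((lines_a[i1:i2], lines_b[j1:j2]))
    return changes
```

If two submissions differ only in spacing, author name, file name, and class name, the returned list should contain just the author comment and the class declaration.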

@manuel-freire
Author

In general, 13 and 15 are also very close in

  • case-02 (2 lines swapped)
  • case-03 (1 extraneous comment in a line)
  • case-04 (the same extraneous comment in a line)
  • case-06 (changes identical to those in 03 and 04; see the screenshot below)

image

According to your paper, honesty of original answers is assumed. However, I would remove 15's submissions from the dataset (removing 15 has the advantage of not leaving any numerical gaps), because they are obviously not the result of independent development. Nobody types a byte-wise perfect copy with just 1 changed line by accident, especially not 5 times, and always coinciding with the same colleague's answers.

@oscarkarnalim
Owner

Hi Manuel,

Thank you for pointing this out! We ensured that students 13 and 15 wrote their solutions independently and thus consider this a case of coincidental similarity. Two students who are close can write the same code for simple tasks if they think alike.

Nevertheless, you can exclude one of the two from your analysis if you think that is not the case. In research, we are open to many perspectives and arguments so long as they are backed up with justifications :)

@manuel-freire
Author

I was very happy to find your open academic plagiarism dataset (as you know, they are not common). Many plagiarism-detection tools focus on reporting overall similarity, which, as you note, can be misleading, while mine is based on visualizing those similarities in context. In the context of the other submissions in the dataset, the 13-15 pair really stands out. In many years as a CS teacher, I have found this to be a very reliable indication of non-originality. This is a histogram of pairwise similarity scores (0 = identical, 1 = totally dissimilar) in case-07; the histograms for the 4 other cases noted above look much the same. Note the large gap between the 13-15 score and all others; more on AC's histograms and visualizations here (somewhat outdated).

image
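The "gap" argument can be made concrete with a minimal sketch. This is not AC's algorithm: the distance below is simply 1 minus `difflib.SequenceMatcher.ratio()`, chosen as a stand-in because it uses the same 0 = identical, 1 = totally dissimilar scale, and the function names are hypothetical. It ranks all pairs by distance and reports how far the closest pair sits from the next-closest one.

```python
import difflib
from itertools import combinations


def distance(a: str, b: str) -> float:
    """Stand-in distance: 0.0 for identical texts, approaching 1.0 as they diverge."""
    return 1.0 - difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def pairwise_distances(submissions: dict[str, str]) -> list[tuple[float, str, str]]:
    """All (distance, id_a, id_b) triples, sorted from closest to farthest."""
    return sorted(
        (distance(submissions[x], submissions[y]), x, y)
        for x, y in combinations(sorted(submissions), 2)
    )


def closest_pair_gap(submissions: dict[str, str]) -> tuple[tuple[str, str], float, float]:
    """Closest pair, its distance, and the gap to the second-closest pair.

    Needs at least 3 submissions so that a second pair exists.
    """
    ranked = pairwise_distances(submissions)
    (d0, x, y), (d1, _, _) = ranked[0], ranked[1]
    return (x, y), d0, d1 - d0
```

A closest pair with a distance near 0 and a large gap to the rest of the ranking is the pattern described above. What counts as "large" is a judgment call that depends on the assignment.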

Feel free to close the issue, but were those two my students, I would have a very hard time believing them.

@oscarkarnalim
Owner

Hi Manuel,

Thank you for the explanation. Although we ensured they were not copying one another, I acknowledge that it is still possible that they did. I will add a note about this to the readme file.
