multiple users reporting duplicate images while signed in #1069
Comments
Update: I've just checked the metadata in the subject download from the project and I can't find any duplicates in the image name, so I don't think this is a duplicate-subject issue. |
Any talk links that indicate they have seen duplicate images? |
I don't think so, yet... the mods posted on Talk but didn't link to specific images. Will ask them to do so in the future. |
@vrooje closing for now, please re-open if you have more info. |
https://www.zooniverse.org/#/projects/vrooje/galaxy-zoo-bar-lengths/talk/21/280?page=1&comment=1515 From @Capella05:
|
Also found the subject comment thread, which I suspect is redundant but am including it just in case. |
I've now got more information about this: Strangely I don't have a duplicate in the classification export for Capella05 on subject id 465197, despite the fact that my export was requested more than 24 hours after the reported duplicate. I do have a different duplicate for that user, however:
And I have a total of 159 duplicates from various users. That's about 1% of the classifications. Examples (with dummy usernames):
thisuser has done 30 classifications total. And, more recently, from a user who has done 18 classifications total:
I also have duplicates from users who have done many more classifications overall, though not proportionally more duplicates. And not everyone who has done hundreds of classifications has done duplicate classifications. |
More details:
I've isolated the duplicates from the full export and grouped them together and I'm happy to send that over if the full data would be helpful. |
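A minimal sketch of how duplicates like these could be isolated from a classification export, grouping rows by user and subject. The column names (`user_name`, `subject_id`, `created_at`) and the sample rows are assumptions made up for illustration; the real export may use different headers.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows, user_key="user_name", subject_key="subject_id"):
    """Group rows by (user, subject) and keep groups with more than one row.

    Column names are hypothetical; adjust them to match the actual export.
    """
    groups = defaultdict(list)
    for row in rows:
        user = row.get(user_key, "")
        if not user:
            # Rows without a user name (not-logged-in) cannot be attributed.
            continue
        groups[(user, row[subject_key])].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Made-up sample data with dummy user names and subject ids.
sample = io.StringIO(
    "user_name,subject_id,created_at\n"
    "thisuser,1001,2000-01-01 10:00:00\n"
    "thisuser,1001,2000-01-01 10:10:00\n"
    "thisuser,1002,2000-01-01 10:11:00\n"
    ",1001,2000-01-01 10:12:00\n"
)
duplicates = find_duplicates(csv.DictReader(sample))
for (user, subject), group in duplicates.items():
    print(user, subject, [row["created_at"] for row in group])
```

Not-logged-in rows are skipped here because they cannot be tied to a single person; whether that is the right choice depends on how the export records anonymous sessions.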
Non-logged in, is a won't fix issue - see the conversation here #1127 - that's how the system was designed.
The 60s thing on my ticket #1128 isn't important, there are classifications made in between.
Merging from the info in my ticket since
Yeah, got a very recent one: subject 487143, dweilant, 2015-07-12 23:43:49 and 2015-07-12 23:45:07 - this is after is live for correctly accounting for the subjects - now subject has seen almost 50% of the live subjects if that matters, but that still leaves about 4000 subjects to select from.
A user who's on the other spectrum of classifiers from the above in this post: Audriusa, for subject 484145, 2015-07-12 07:24:10 and 2015-07-12 07:25:32.
Notice the same thing for nathalieg69 on subject 484066, 2015-07-01 03:55:37 and 2015-07-01 04:05:37.
Worth noting this is still occurring after the fix for updating subject classification counts/retirement. |
Thanks @camallen |
Linked to #1176 Update, we've had no luck tracing this to any direct implemented behaviour (in code) but it could be the result of message ordering between clients and busy / fast API end points where subjects are not dequeued before the next request for subjects comes in. We will be modifying the dequeue behaviour to happen directly after you've been served subjects and not wait till you submit a classification. This should make any race condition harder to come by as the timing between these "race" messages will be much greater. I'm going to leave this open. If any more duplicate reports come in (especially after #1176 is fixed) then please make us aware on this issue. |
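A toy sketch of the suspected race, not the Panoptes implementation: the class and function names are hypothetical. It models one user's queue and compares dequeuing when a classification is submitted with dequeuing as soon as subjects are served, for the case where a second request arrives before the first classification is submitted.

```python
class SubjectQueue:
    """Hypothetical per-user queue of subject ids awaiting classification."""

    def __init__(self, subject_ids, dequeue_on_serve):
        self.pending = list(subject_ids)
        self.dequeue_on_serve = dequeue_on_serve

    def serve(self, n):
        """Return the next n subjects; optionally dequeue them immediately."""
        batch = self.pending[:n]
        if self.dequeue_on_serve:
            self.pending = self.pending[n:]
        return batch

    def classify(self, subject_id):
        """Record a classification; dequeue the subject if still pending."""
        if subject_id in self.pending:
            self.pending.remove(subject_id)


def two_requests_before_classification(dequeue_on_serve):
    """Simulate a second subject request arriving before any classification."""
    queue = SubjectQueue([1, 2, 3, 4], dequeue_on_serve)
    first = queue.serve(2)
    second = queue.serve(2)  # arrives before the first batch is classified
    for subject_id in first:
        queue.classify(subject_id)
    return first, second


print(two_requests_before_classification(dequeue_on_serve=False))
print(two_requests_before_classification(dequeue_on_serve=True))
```

With dequeue-on-classify the second request is handed the same subjects as the first; with dequeue-on-serve it is not. In a real concurrent system the window is narrowed rather than necessarily closed, which matches the "harder to come by" wording above.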
Thanks for the update, and thanks for working on tracking it down. @camallen If you can tell me a time stamp when this fix should be online, I'll check it over the next few days after that and report back on P4: Terrains |
Thanks - will keep you posted if this keeps happening. If all we can do is minimize it instead of fix it we should make sure @ggdhines knows to explicitly check for duplicates in the data aggregation phase. |
@vrooje pretty sure he already is as we always serve subjects to users in panoptes (we mark them seen / retired, etc) but allow them to keep classifying. |
@camallen though that status is not marked in the raw csv dump - if duplicates and retired images seen again can be marked that would be something handy to include in the csv for those doing their data reduction |
@mschwamb can you open a separate issue for this? Perhaps reopen #1086 after reading zooniverse/Panoptes-Front-End#368 |
@camallen I can't reopen. I'm not a collaborator on this repo. If the powers that be can add me I can do that, or if you want to reopen it I'll comment. |
Just comment and @ mention me |
@camallen Still happening, I think. I'm just looking at duplicates from 2015-07-30 00:00:00 on P4T:
- Uganalandia viewed subject 491799 at 2015-07-30 12:11:50 and 2015-07-30 12:14:46
- gaga7 saw subject 491810 at 2015-07-30 15:41:52 and 2015-07-30 16:08:10
- 4thplanet4444 saw subject 486319 at 2015-07-31 07:39:39 and 2015-07-31 08:25:08 |
I'm going to close this issue as I think we've got this sorted. In summary our queuing code had some bugs and some legacy use case was causing the queues to grow very very large, which allowed dups in. If anyone finds duplicate classifications happening since the latest date quoted above please comment here / re-open the issue asap. |
+1
|
Sadly, I have to re-open this. There were a handful of duplicates in GZ:BL between the closing of this issue and the sending of a newsletter to recruit people to GZ:BL, but since the recruitment there have been about 1,150, out of 36,000, so a duplicate percentage of about 3%. 800 of the duplicates were from not-logged-in users. I looked into this because another top user reported duplicates, but (as with Capella05's report previously) I couldn't find any recent duplicates in the database from this user. Happy to provide more info as needed... |
@vrooje 800 from non-logged in is by design in the api, i've got an issue open in the front end to be smarter about ignoring these, zooniverse/Panoptes-Front-End#1427. So the final 350 i'll need some extra information. Are these all from power users in the long tail? Since commit 8693854 we may get to a situation where we have nothing in the queue and just select something to show the user, they should have seen a banner saying they've seen this before though....seems there are still some bugs to iron out on this one. |
Two more users on Talk have reported duplicate images: https://www.zooniverse.org/talk/18/115/?comment=19991 |
Also been noticing this problem when annotating any subject set with more than 10 members, while logged in. |
@bruggsy, do you have reports / talk subject id's, screenshots? Can you please confirm that these are real duplicates and not the expected behaviour, that is the api will return a set of subjects when you have classified all of them. The client should mark the images as retired / seen before in the browser, e.g. |
Still getting occasional comments from users about duplicates. I think this is due to a failure to submit the classification the first time rather than a duplicate registered classification, but this still makes me uncomfortable because we don't really know how often this is happening, right? |
Oh, so this isn't just a WildCam issue? zooniverse/wildcam-gorongosa#192 |
@camallen Sorry for not responding, been busy with other projects. Not sure what you mean exactly, when I do an API request for a subject set / single subject I don't see those fields. The subjects I was having problems with were 1037498 - 1037697, and subject sets 2422 - 2440. The exact problem was getting "already seen" subjects before I had seen the whole set, as far as I can tell I wasn't getting the same image multiple times with no banner the latter times. |
@vrooje i'm looking into the dup reports for GZ Bars. @bruggsy the system should be showing you unseen un retired images first, then falls back to unseen retired, then once you have completed them all it'll just pick some at random to show to you. Can you please check again and provide me some workflow / subject set id's to reproduce for your account? |
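The fallback order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the API's selection code: `select_subjects` and its parameters are hypothetical names, and seen and retired state are modelled as plain sets.

```python
import random


def select_subjects(all_ids, seen, retired, n, rng=random):
    """Pick up to n subjects using the described fallback order.

    1. unseen and unretired subjects first,
    2. then unseen but retired subjects,
    3. then, once everything has been seen, any subjects at random.
    """
    unseen_unretired = [s for s in all_ids if s not in seen and s not in retired]
    if unseen_unretired:
        return rng.sample(unseen_unretired, min(n, len(unseen_unretired)))

    unseen_retired = [s for s in all_ids if s not in seen and s in retired]
    if unseen_retired:
        return rng.sample(unseen_retired, min(n, len(unseen_retired)))

    # Everything has been seen: fall back to a random pick from the whole set.
    pool = list(all_ids)
    return rng.sample(pool, min(n, len(pool)))


subjects = [1, 2, 3, 4]
print(select_subjects(subjects, seen={1}, retired={2}, n=2))
print(select_subjects(subjects, seen={1, 3, 4}, retired={2}, n=2))
print(select_subjects(subjects, seen={1, 2, 3, 4}, retired={2}, n=2))
```

Under this model an "already seen" subject should only be returned once the first two pools are empty, so a seen subject arriving before the set is exhausted would point to stale seen-tracking rather than to the selection order itself.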
@camallen Sure, the workflow is 861 and the subject set associated with it is 2592, project is 292. |
I set up a project today for Intro2Astro using a very small subject set with 22 subjects, and @JulieAnnKU reported this issue. She was served a subject with the already seen flag before having seen all 22. Project id |
I've got another lead on real duplicates for users. Seems some of the background workers haven't been running and creating the req'd tracking details for what they've seen. Linked to #1517, ensuring that these workers run as soon as possible after failure decreases the window for duplicates to emerge. |
Happening on Comet Hunters too - https://www.zooniverse.org/projects/mschwamb/comet-hunters/talk/84/25164?comment=51262 |
@camallen this seems to still be popping up on Wildcam Gorongosa - had a whole bunch reported mid/late December, and at least one person reported this week: https://www.zooniverse.org/projects/zooniverse/wildcam-gorongosa/talk/79/7125?page=3 |
thanks, there was a bug showing the proper already seen / retired tags from the API that got fixed in SGL. Also some of the issues in SGL would be leading to duplicates showing under heavy api loads. We've got new code out to fix these and some more to get out that will hopefully alleviate this. |
@camallen just wanted to let you know this is still happening: https://www.zooniverse.org/projects/zooniverse/wildcam-gorongosa/talk/79/7125?comment=74235&page=4 |
Hmm - seems the messaging about already seen / retired was removed at the end of last month. See zooniverse/Panoptes-Front-End#2170. I can only hope we aren't messaging correctly. I'm checking this user's details to see what's happening. |
This seems to still be happening - example on Planet Four Terrains user_id number 1328372 classified subject 1328372 twice within a ten minute span. This was on February 9th. |
This is still happening. A volunteer is seeing the 'already seen' flag on and off, but there are ~6000 new images on Comet Hunters. See the volunteer's description here |
Closing this in favour of #1640 since we "are" labelling the duplicates as already seen; please put all reports onto the "Already Seens" issue from now on. |
Our 2 moderators on GZ Bars have both reported seeing duplicate images. I'm going to check the subject list for duplicates and will update but just wanted to flag this in case it's related to #787 or something else that isn't duplicate subjects.