Duplicated locations when submitting via ODK #792

Closed
dpalomino opened this issue Oct 5, 2016 · 21 comments

@dpalomino

Steps to reproduce the error

The steps to reproduce are not clear yet. This happened once while testing the repeat group feature, although we think it is unrelated to that feature.

We were submitting a form with:

  • one party
  • two locations (location_repeat group)
  • one single tenure_type for both locations
  • several select_multiple fields
  • several image resources
  • one AUDIO amr file (which caused an error 400 when submitting). This seems to be the root cause of the issue, but we are not totally sure about that.

However, these locations were actually submitted, and indeed appeared duplicated several times.

Interestingly, trying to access the resources section returns a 502 error.

Link to the project:
https://platform-staging.cadasta.org/organizations/david-org-second-org/projects/testing-repeats-multiple-locations-minus-tenure/resources/

Approx timestamp of the issue: 20161005 - 184500 CEST

I don't know whether this is a corner case or something more serious. Has anyone seen something like this?

@dpalomino dpalomino added the bug label Oct 5, 2016
@amplifi
Contributor

amplifi commented Oct 5, 2016

The 502 error is due to the same Unicode error as #770; Django cannot handle the character encoding and uWSGI terminates.

@dpalomino
Author

Thanks @amplifi. The difference with #770 is that I have permissions to see resources in that project.

However, the main issue is how those locations have been duplicated. I am not able to reproduce this again, but it'd be nice to know whether this is a corner case or something we should be worried about.

I tried to reproduce this again many times but I didn't see it. Maybe if you can take a quick look at the logs at that timestamp (20161005 - 184500 CEST)? (not at the 502 error, but when the locations were duplicated).

Thanks!

@dpalomino
Author

Ok, it happened again.

These are the steps I've followed:

  1. Create a new project from scratch using the attached questionnaire.
  2. Create a new location through the web interface (that was ok)
  3. Via GeoODK, submit a form containing one party plus two locations, including all the information regarding resources (photos), but not the audio field.
  4. Go back to the platform and check the number of locations and parties. There should be 3 locations and 2 parties, but we have 7 locations and 4 parties. So it seems that the party and the locations submitted from ODK were stored 3 times.
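
The counts in step 4 can be checked with a few lines of arithmetic. This is only a sanity check, and it assumes (this is not stated explicitly above) that the project held 1 location and 1 party before the ODK submission; `expected_counts` is a hypothetical helper name.

```python
def expected_counts(baseline, submission, times_stored):
    """Return (locations, parties) after one form is stored `times_stored` times."""
    base_locations, base_parties = baseline
    sub_locations, sub_parties = submission
    return (
        base_locations + sub_locations * times_stored,
        base_parties + sub_parties * times_stored,
    )


# Assumption: 1 location and 1 party existed before the ODK submission.
baseline = (1, 1)
submission = (2, 1)  # two locations and one party per submitted form

print(expected_counts(baseline, submission, 1))  # (3, 2): the expected totals
print(expected_counts(baseline, submission, 3))  # (7, 4): the observed totals
```

Under that assumption, the observed totals match a form that was stored exactly three times.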

Link to the project:
https://platform-staging.cadasta.org/organizations/david-org-second-org/projects/testing-repeats-multiple-locations-minus-tenure-2/

You can see two clusters of 3 locations each. Those are exact replicas of the locations actually submitted.

I will keep testing this. Setting this as high-priority.

@linzjax, I will let you know if I find something else...

multiple_location_minus_tenure_questionnaire_0.2.xlsx

Timestamp: 20161006 - 113600 CEST (approx)

@dpalomino
Author

100% reproducible. I've followed exactly the same steps as before in a new project with the same results (parties and locations submitted via ODK stored 3 times in the project).

A second submission, again with one party and 2 locations, resulted in 3 parties and 6 locations being added to the project.

Link to the project:
https://platform-staging.cadasta.org/organizations/david-org-second-org/projects/testing-repeats-multiple-locations-minus-tenure-3/

@amplifi
Contributor

amplifi commented Oct 6, 2016

@dpalomino The similarities with #770 were only in relation to the 502 error. There were no log entries aside from those posted in #platform-errors. We're not currently tracking errors from ODK.

@linzjax
Contributor

linzjax commented Oct 6, 2016

So after poking this quite a bit, it looks like for whatever reason, ODK is sending multiple submissions per xform submission... I could try and come up with something to prevent this, but I don't know that it would A) be any good or B) be done by the end of this week.

@linzjax
Contributor

linzjax commented Oct 6, 2016

@wonderchook didn't we run into this issue during testing for the first release?

@wonderchook
Contributor

@linzjax I think so but couldn't track it down. I believe this is a known bug with ODK that others have had issues with as well. I'm inclined to say we need to come up with something to prevent this but not by the end of the week.

@dpalomino I think this should be in sprint 10, what are your thoughts?

@dpalomino
Author

Hi @wonderchook, @linzjax

I was trying to find the ODK bug in their GitHub repository, but I didn't find it. Does anyone have a link to that issue or to some place where we can check the steps to reproduce?

I think it'd be important to know how this can be reproduced: whether it has to do with timeouts (because of several attached resources, for instance), with repeat groups, or with a combination of both. I will do more testing today and try to provide more feedback later on.

@dpalomino
Author

Hi,

After some more testing we've found out that:

  • It is confirmed that it has nothing to do with repeat groups.
  • It happens when the form includes several resources (tested with pics). For some unknown reason the GeoODK app is sending duplicated information.
  • I haven't seen differences when attaching the resources to a party, a location, or a relation
  • I haven't seen any issues attaching more than one resource to an entity (i.e. 2 pics attached to a party, etc)
  • On my device it starts failing from 4 resources, but I suspect that this depends on the file size.

My guess (only my guess) is that some timeout is expiring, which makes the app resend the form.

In any case I think this is important: taking 4 pictures or more when collecting data is not uncommon.

You can see more details about the testing done here.

Does anyone have an idea about what the root cause could be, and how to work around it?

@dpalomino
Author

It seems that the root cause is having attached resources that total more than 10MB. In that case the submission is automatically split (see the discussion in this thread).

Adding an instanceId to the form, as suggested here, should help de-duplicate the submissions on the platform side.
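
To illustrate why a shared instanceId makes de-duplication possible, here is a minimal sketch of a client splitting one form's attachments into parts of at most 10MB. It is not GeoODK's actual algorithm: the greedy packing, the `SubmissionPart` type and the `split_submission` function are assumptions made for illustration, and the size of the form XML itself is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_PART_BYTES = 10 * 1024 * 1024  # assumed threshold, from the 10MB figure above


@dataclass
class SubmissionPart:
    instance_id: str  # identical in every part of one form instance
    attachments: Dict[str, int] = field(default_factory=dict)  # file name -> bytes


def split_submission(instance_id: str, attachments: Dict[str, int],
                     max_bytes: int = MAX_PART_BYTES) -> List[SubmissionPart]:
    """Greedily pack attachments into parts whose total size stays within max_bytes.

    A single file larger than max_bytes still gets a part of its own.
    """
    parts = [SubmissionPart(instance_id)]
    used = 0
    for name, size in attachments.items():
        # Start a new part when the current one is non-empty and would overflow.
        if parts[-1].attachments and used + size > max_bytes:
            parts.append(SubmissionPart(instance_id))
            used = 0
        parts[-1].attachments[name] = size
        used += size
    return parts


mb = 1024 * 1024
photos = {f"photo{i}.jpg": 3 * mb for i in range(1, 6)}  # five 3MB pictures
parts = split_submission("uuid:example-instance", photos)
for part in parts:
    print(part.instance_id, sorted(part.attachments))
```

Every part carries the same instance identifier, so a server that treats each part as an independent form would create the same entities once per part, while a server that keys on the identifier can recognise the parts as one submission.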

Thoughts? Do you think it'd be feasible to implement this "de-duplication" process when receiving and processing the forms? @linzjax @bjohare

@amplifi
Contributor

amplifi commented Oct 13, 2016

If we're going to allow submissions over 10MB to be split, we should seriously consider adding a per-file size limit for resources. There's real potential for project resources to exceed what the platform can display back to the user, particularly when slow/rural connection speeds are at play. It could prevent page load altogether.

@wonderchook
Contributor

Perhaps we should have limits for mobile submissions specifically? I can think of bigger files being necessary in some situations, so I don't think we should disable this completely.
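
A rough sketch of what source-specific limits could look like, returning an explicit message when a file is rejected. The limit values, the `SIZE_LIMITS` table and the `check_resource_size` function are all hypothetical and not part of the platform.

```python
from typing import Optional

MB = 1024 * 1024
# Hypothetical limits, chosen only for illustration.
SIZE_LIMITS = {"mobile": 5 * MB, "web": 50 * MB}


def check_resource_size(size_bytes: int, source: str) -> Optional[str]:
    """Return None if the file is acceptable, otherwise a message for the user."""
    limit = SIZE_LIMITS[source]
    if size_bytes <= limit:
        return None
    return (f"File is {size_bytes / MB:.1f}MB; the limit for {source} "
            f"uploads is {limit // MB}MB.")


print(check_resource_size(3 * MB, "mobile"))  # accepted
print(check_resource_size(8 * MB, "mobile"))  # rejected with a message
print(check_resource_size(8 * MB, "web"))     # accepted under the larger web limit
```

Keeping the limits in one table per submission source would let bigger files through the web interface while still capping mobile submissions.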

@bjohare
Contributor

bjohare commented Oct 13, 2016

@dpalomino we could add the instanceid to the XFormSubmission and check for existing submissions before creating new domain entities: https://github.com/Cadasta/cadasta-platform/blob/master/cadasta/xforms/models.py#L8
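
A minimal sketch of that check, using an in-memory store instead of the real XFormSubmission model. The `SubmissionStore` class and its method names are hypothetical; in Django the same idea would probably map to a unique instance id field on the model, but that is an assumption about a future implementation, not a description of the current code.

```python
from typing import Dict, List


class SubmissionStore:
    """In-memory stand-in for a table of submissions keyed by instance id."""

    def __init__(self) -> None:
        self.attachments_by_instance: Dict[str, List[str]] = {}
        self.entities_created = 0

    def receive(self, instance_id: str, attachments: List[str]) -> bool:
        """Process one POST; return True only if new domain entities were created."""
        is_new = instance_id not in self.attachments_by_instance
        if is_new:
            self.attachments_by_instance[instance_id] = []
            self.entities_created += 1  # placeholder for creating party, location, etc.
        # Attachments from every part are kept; repeated file names are ignored.
        known = self.attachments_by_instance[instance_id]
        for name in attachments:
            if name not in known:
                known.append(name)
        return is_new


store = SubmissionStore()
print(store.receive("uuid:abc", ["a.jpg", "b.jpg"]))  # True: first part creates entities
print(store.receive("uuid:abc", ["c.jpg"]))           # False: second part only adds files
print(store.receive("uuid:abc", ["c.jpg"]))           # False: a resend changes nothing
print(store.entities_created, store.attachments_by_instance["uuid:abc"])
```

The entities are created once per instance id, while attachments arriving in later parts are still attached to the existing submission.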

@dpalomino
Author

That would be great @bjohare. Do you think this could be something reasonable for Sprint 10? Meanwhile we would advise partners not to include many resources.

@amplifi @wonderchook, I think we can ask partners to use low resolution photos etc. I think a file size limit could confuse users (they probably won't know why it's failing and they might blame the platform). IMO I would wait until we are operating at a bigger scale before setting up these limits... what do you think?

@wonderchook
Contributor

@dpalomino I don't think we can require low resolution photos. I think we can suggest it, but there are going to be situations where high resolution is required.

@bjohare
Contributor

bjohare commented Oct 13, 2016

@dpalomino yes, should be ok for Sprint 10

@seav
Contributor

seav commented Oct 13, 2016

If we're going to allow submissions over 10MB to be split, we should seriously consider adding a per-file size limit for resources. There's real potential for project resources to exceed what the platform can display back to the user, particularly when slow/rural connection speeds are at play. It could prevent page load altogether.

At least for resources, this should not be a problem for the platform at all. Any links to resource files go directly to S3. The only possible problem for the platform is large images: since we generate thumbnails for them, uploading a gigantic image may result in a timeout. Other file types should be no problem.

@dpalomino
Author

Yes @wonderchook. I meant recommending lower-resolution images for the most typical cases (taking a picture of a deed, of the landowner, etc). Not making this mandatory of course, but recommending that partners configure their smartphones not to use the highest resolution possible.

@wonderchook
Contributor

Perhaps @bethschechter can add a section to the documentation regarding smartphone settings for this.

@dpalomino dpalomino added this to the Sprint 10 milestone Oct 14, 2016
@dpalomino
Author

Assigning tentatively for Sprint 10 (thanks @bjohare!).
