-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize street and house numbers #468
Conversation
Log
Good results (187)
Okayish results (2)
Not so good results (3)
Postal code (5, previously not logged)
|
Quite some regex magic in use here, but I think the results turned out way better than I expected. I won't be sad if we don't want to merge this. But just let me know what you think :) Some things I'd do if these changes work for you:
It would also be possible to handle the edge cases listed in the comment above but I don't think that would be worth it. |
a038d82
to
4e43759
Compare
4e43759
to
2e8b69e
Compare
Am I correct in my assumption, that the main benefit of sanitizing street and housenumbers is that the actual geocoding sanitization of coordinates works better? |
Initially, while developing the backend we had discussions about the address format. Because there is quite a lot of complexity the current format probably would not be able to cover addresses outside of Germany. French people for example put the house number befor the street (or used to). My opinion back then was to just use a single string to represent addresses, because it will be hopeless to support ANY address worldwide. We decided to go with the current solution because we focused on Bavaria. I don't have experience with addresses or how it is for example in border areas around germany. But my feeling is that we will have problems again in the future with our data/address format. |
Yes, that should be the main benefit/was the reasoning behind the issue. However, it does not really make a big difference and provides better coordinates for only two stores I think :/. I think there are a few smaller benefits like removal of wrong/duplicated information from the address/separation of additional hints, e.g.
|
Isn't this only a problem if we have a berechtigungskarte ouside of germany? In the app we also just have a single street string (street + house number), location and postal code are still handled separately. |
Another thing that this MIGHT be important for (did not look into it yet): we have lots of duplicates in our data, i.e. multiple stores in the data for the same actual stores. If we have sanitized sanitized addresses, it might be easier to decide which duplicates are redundant. |
What we might also do (although a bit unrelated to this specific PR): |
I would say yes, but we already have the first Akzeptanzstelle in Spain :D Ofc this is an extreme corner case, but worth discussing :)
@Schedulaar can you create a User Story for this? I think we actually do not have an issue about this feature which is kind of possible to do now. |
@maxammann I created the brain-storming issue #473 . |
As discussed yesterday in the call I'll merge this. This (slightly) improves sanitation with geocoding and has the benefit of removing stuff from the addresses that should not be part of the address. It works really well for german addresses and stores outside of germany have to be handled differently anyway at some point anyway if we remove filtering (#447). @maxammann @Schedulaar should I rename the steps? e.g. Map -> Sanitize and Sanitize -> Geocode? Or should we leave it like it currently is? |
…ocation-house # Conflicts: # backend/src/main/kotlin/app/ehrenamtskarte/backend/stores/importer/steps/Map.kt
Hm I think this requires some cleanup in naming yes. So I think we should strive for small and fine-grained steps in the pipeline. Proposal:
If we suppose that AcceptingStore is immutable then we need to recreate AcceptingStores. Should be done in a separate task though. |
backend/src/main/kotlin/app/ehrenamtskarte/backend/stores/importer/types/AcceptingStore.kt
Show resolved
Hide resolved
@maxammann I'll do the renaming/splitting up of steps in a separate PR |
What do you mean by this? |
Just that if AcceptingStore is an immutable data class, then you can not modify it in a pipeline step. This means when changing a field of an AcceptingStore you have to "deep-copy" it |
I think we do this already using the |
Yes, feel free to merge. That comment was not important. |
Fixes #467