Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial start on auto-generating taginfo JSON #2754

Closed
wants to merge 37 commits into from

Conversation

peternewman
Copy link
Collaborator

@peternewman peternewman commented Apr 16, 2021

I'm not expecting any serious code reviews yet, more just sort of getting it out there a bit, and I guess starting to look at workflows.

TODO:

  • Combine the function names (e.g. add versus updateCheckDateForKey) - "Whether add or modify doesn't really matter, if it tags it, it tags it."

@westnordost
Copy link
Member

Good start! Though, I can't say much about this yet because I haven't looked closer at how the file would need to look like for taginfo to work with it properly.

I'd just say that this should go into an own script.

I wonder if it makes more sense to generate this file once every release or as a hook like @FloEdelmann 's wiki-script. If the former, the proper location would be buildSrc/src/main/java/TaginfoJasonTask.kt and then register a task in the project's build.gradle.kts

The old taginfo json from @goldfndr listed the "quest fill" and "quest filter" separately. For a new version, I think it is better to only keep the "quest fill" (=how the app tags something) part, because obviously, for the filter, the app sometimes interprets deprecated, outdated, ambiguous, not-really-used and "wrong" tags, just so that the quest is not shown for too many elements. These kind of tags should then not appear in taginfo as "StreetComplete uses these tags" because this paints a false picture of the situation:

  • It may make it look like StreetComplete is propagating the use of this tag
  • It may look like a tag is "accepted and used" because StreetComplete uses it

@matkoniecz
Copy link
Member

If you reach stage when it starts generating things and you are unaware about bugs... You can publish output as gist and I can look at some quests that I know well. Maybe I will spot some bugs this way.

And it is really cool that you are doing this!

@peternewman
Copy link
Collaborator Author

Good start! Though, I can't say much about this yet because I haven't looked closer at how the file would need to look like for taginfo to work with it properly.

Thanks. It's just the same as the existing one, I was initially trying to just automate what @goldfndr had done manually. Most of the fields are just free text, so obviously his hand-crafted explanations are likely to be better, but only if it's kept up to date all the time, hopefully we can be good enough via some automation:
https://wiki.openstreetmap.org/wiki/Taginfo/Projects

I'd just say that this should go into an own script.

Yes, although it shares lots of common requirements with @FloEdelmann 's code, such as extracting quest names and lists of quests, so unless we're going to have a helper library too... I was going to add some command line arguments (if Kotlin does them, so you would decide which output to produce).

I wonder if it makes more sense to generate this file once every release or as a hook like @FloEdelmann 's wiki-script. If the former, the proper location would be buildSrc/src/main/java/TaginfoJasonTask.kt and then register a task in the project's build.gradle.kts

Possibly, we can automate the generation like @FloEdelmann 's so it just gets built automagically each time there is a release, the benefit then is you don't need to do anything. I guess with both there's an argument that it's beneficial to have the data available sooner (i.e. as soon as a quest is released), either because people might be running their own builds of the code, either forks or nightly releases or whatever, and also because as soon as its been committed it means it's going to appear at some point soon, so in the wiki case it would mean the page could be updated before it's released, so it's immediately current when people checkout the new version, and in the Taginfo case, an accessible map designer for example might choose to prioritise adding a feature to their app knowing SC supports it and therefore data will exist and grow. Essentially once the code has been written, it's going to appear so having a little head start can't hurt.

The old taginfo json from @goldfndr listed the "quest fill" and "quest filter" separately. For a new version, I think it is better to only keep the "quest fill" (=how the app tags something) part, because obviously, for the filter, the app sometimes interprets deprecated, outdated, ambiguous, not-really-used and "wrong" tags, just so that the quest is not shown for too many elements. These kind of tags should then not appear in taginfo as "StreetComplete uses these tags" because this paints a false picture of the situation:

I guess it depends what people are doing with the data. Avoiding the query part is certainly likely to make the code far simpler, as I don't have to parse and understand the Overpass type stuff, just some basic Kotlin methods. Presumably it's mostly irrelevant as for the majority of current tags and ones we're "filling", we're likely to be filtering on them too, to ensure they aren't already set. Are there actually many which we only filter on?

* It may make it look like StreetComplete is propagating the use of this tag

* It may look like a tag is "accepted and used" because StreetComplete uses it

Yep, understood. Although we could add some descriptions about those usages.

@peternewman
Copy link
Collaborator Author

If you reach stage when it starts generating things and you are unaware about bugs... You can publish output as gist and I can look at some quests that I know well. Maybe I will spot some bugs this way.

Still on my list is to actually write the JSON file and commit it to the repo, it would be in the runner output, but the runner doesn't currently run apart from on a release due to the workflow config. I might tweak that to aid development. Thanks for the offer anyway. As long as a quest isn't setting things in multiple different ways (which I think a few are), it's likely most bugs will be fairly obvious as it will complain it can't process the data.

And it is really cool that you are doing this!

Thanks. I thought it seemed like an interesting little challenge with a useful outcome, without me having to get into proper Android and GUI development! I'm still a bit worried it might end up rather brittle and fragile, but probably less hassle than using a full Kotlin lexer to do it properly!

@matkoniecz
Copy link
Member

matkoniecz commented Apr 17, 2021

Are there actually many which we only filter on?

From just implemented https://github.com/streetcomplete/StreetComplete/pull/2753/files

sets barrier (several values), filters on barrier=yes and:

         and !man_made
         and !historic
         and !military
         and !power
         and !tourism
         and !attraction
         and !amenity
         and !leisure

(as there are rare tagging mistakes of leisure=playground barrier=yes type that could result in mapper selecting "not existing" option for question about specifying barrier type. This filtering should reduce this anyway rare cases to something basically not happening)

Or building levels quest with

!building:levels and !height and !building:height
         and !man_made and location != underground and ruins != yes

filter (though here only building:height is a bad value, other would be kind of useful to list)

I was going to add some command line arguments (if Kotlin does them, so you would decide which output to produce).

It may be preferable to have only actually used code (as unused code triggered by arguments is by definition not used but still requires maintenance/review etc)

probably less hassle than using a full Kotlin lexer to do it properly!

Though after some times of doing this I am nowadays really preferring starting from a proper parser if at all available :)

@westnordost
Copy link
Member

Are you still working on this?

@peternewman
Copy link
Collaborator Author

Are there actually many which we only filter on?

(as there are rare tagging mistakes of leisure=playground barrier=yes type that could result in mapper selecting "not existing" option for question about specifying barrier type. This filtering should reduce this anyway rare cases to something basically not happening)

Hmm, as @westnordost mentions, it's how we flag those things without making it look like they are good/relevant tags, not errors.

filter (though here only building:height is a bad value, other would be kind of useful to list)

I think the other thing is parsing the query stuff is probably even harder than the changes stuff.

I was going to add some command line arguments (if Kotlin does them, so you would decide which output to produce).

It may be preferable to have only actually used code (as unused code triggered by arguments is by definition not used but still requires maintenance/review etc)

It is used code, some bits are common to the wiki stuff and the JSON, some are only used by one or the other. It should perhaps be in a separate library file if that's not excessive for a little script. Take a look now and it should make more sense.

probably less hassle than using a full Kotlin lexer to do it properly!

Though after some times of doing this I am nowadays really preferring starting from a proper parser if at all available :)

Yeah, I'm still a bit undecided, it seems to be mostly working. Some bits would be a bit easier with a lexer/parser, but I'm not sure it will magically fix the issues, as I'll still need to process all the tokens it returns. In C++ land I'd probably do something clever with some macros and ifdefs so you could compile the code in two different ways to be able to get a list of changes it would apply or something.

@peternewman
Copy link
Collaborator Author

Are you still working on this?

I had mostly forgotten, or been doing other things.

I've now picked it up and it covers most stuff I think. There are a few fiddly edge cases, but I wonder if it's worth merging as is/after a small tidy up, and perhaps having both JSON files in parallel. E.g. @goldfndr manual one still covers all the query stuff, which this doesn't touch at all yet, but equally this has already picked up say the bollard and CCTV camera quests with no input from anyone, so it's doing useful things already. I'm pretty confident it's not lying, and while it's incomplete in some ways, so is the existing one. I believe TagInfo is quite happy with having multiple JSON files for the same project, so there's no major downside there.

@peternewman
Copy link
Collaborator Author

peternewman commented Jun 23, 2021

Remember to update https://github.com/streetcomplete/StreetComplete/blob/master/CONTRIBUTING.md#improving-documentation when this is in!

Edit: currently removed in 3cd0e73 to be tidied as appropriate depending on how this PR progresses.

@peternewman peternewman changed the title Initial basic start on auto-generating taginfo JSON Initial start on auto-generating taginfo JSON Jun 23, 2021
@matkoniecz
Copy link
Member

@peternewman Are you still working on this one? Or is it waiting for someone to adopt it?

when (task) {
"wiki" -> generateWikiCsv(questNames, questFiles)
"json" -> generateTaginfoJson(questNames, questFiles)
else -> { // Note the block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this comment is referring to? { }?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think it's possibly because if I didn't put the block in, it didn't print or something?

"https://github.com/streetcomplete/StreetComplete",
"https://wiki.openstreetmap.org/wiki/StreetComplete",
"https://raw.githubusercontent.com/streetcomplete/StreetComplete/master/app/src/main/res/mipmap-xhdpi/ic_launcher.png",
"Peter Newman"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that you want to be contacted directly about maintaining this script?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly not, a bit of a placeholder at least. I think the spec suggested it should be an individual... 🤷

@matkoniecz
Copy link
Member

matkoniecz commented Dec 13, 2021

General question: would it be easier to parse Kotlin code into some AST? For example with https://github.com/kotlinx/ast

I did may share of "parsing is complicated, I will solve it with bunch of simple regexps". And after several months I wished that I did it properly with parsing - as I kept encountering one edge case after another and ended with monster regexp which was uneditable nightmare.

End result was dynamically build regexp that was incomprehensible large and really poor in editing (at least I had tests).

If that would be regexp - then it really needs some tests, otherwise (I know from experience) this will be irritating source of regressions if this will be ever changed.

@smichel17
Copy link
Member

would it be easier to parse Kotlin code into some AST?

I was going to suggest the same thing. At the very least, we already have code for parsing a string to an ElementFilterExpression (EFE); it shouldn't be too much work* to have an EFE spit out an equivalent tagInfo for it.

For more complicated parsing, perhaps we could take an approach something like a javadoc: embed a comment above the getApplicableElements function with the relevant information. That would still require manual maintenance, but at least it would be right in your face when you're changing the code, rather than off in a separate repo.

*famous last words, I know

@peternewman
Copy link
Collaborator Author

#3202 was caught, but as it was nothing fatal and obnoxious to fix the relevant scripts were commented out - in hope that Sophox situation will be fixed on Sophox side

Yeah, it just avoids that mad rush finding loads of stuff is broken just as you're about to release.

Should we really combine all those (e.g. add versus updateCheckDateForKey)?

Yes, I think yes. Whether add or modify doesn't really matter, if it tags it, it tags it.

Okay, I'll add to the TODO list, it's vaguely useful during debug to know where it's come from.

Regarding check date, it should probably be interpreted like what it tags. F.e. the taginfo JSON should show that the opening hours quest tags opening_hours but also check_date:opening_hours.

I think it already did this before you commented, it definitely does now anyway.

@peternewman Are you still working on this one? Or is it waiting for someone to adopt it?

On and off, although mostly off unfortunately, too many other bits to do too. I wouldn't hugely object if someone wanted to take it on, I think I've done most of the easy bits! I've just updated it all to cope with all the changes due to #3665 .

General question: would it be easier to parse Kotlin code into some AST? For example with https://github.com/kotlinx/ast

From a quick look at that, I'm not sure it actually solves the hard part of this, which is dealing with the logic and enums etc. It would avoid the regex, and some potential errors there, but probably in exchange for more complicated/less readable means of processing the results

e.g. given this code:

private enum class Side(val value: String) {
LEFT("left"), RIGHT("right"), BOTH("both")
}
private fun applyCyclewayAnswerTo(cycleway: Cycleway, side: Side, dir: Int, tags: Tags)
{
val directionValue = when {
dir > 0 -> "yes"
dir < 0 -> "-1"
else -> null
}
val cyclewayKey = "cycleway:" + side.value
when (cycleway) {
NONE, NONE_NO_ONEWAY -> {
tags[cyclewayKey] = "no"
}
EXCLUSIVE_LANE, ADVISORY_LANE, UNSPECIFIED_LANE -> {
tags[cyclewayKey] = "lane"
if (directionValue != null) {
tags["$cyclewayKey:oneway"] = directionValue

We need to produce the JSON version of these statements; that to my mind is the hard bit:

tags["left"] = "no"
tags["left"] = "lane"
tags["left:oneway"] = "yes"
tags["left:oneway"] = "-1"
tags["right"] = "no"
tags["right"] = "lane"
tags["right:oneway"] = "yes"
tags["right:oneway"] = "-1"
tags["both"] = "no"
tags["both"] = "lane"
tags["both:oneway"] = "yes"
tags["both:oneway"] = "-1"

As I mentioned, I could possibly conceive you might be able to do some preprocessor magic in C/C++, I don't know if Kotlin offers equivalents, or if it would even help.

Maybe if everything could be switched to code similar to these (which in fairness the Cycleway is already):

fun BarrierType.applyTo(tags: Tags) {
tags["barrier"] = this.osmValue
when (this) {
BarrierType.STILE_SQUEEZER -> {
tags["stile"] = "squeezer"
}
BarrierType.STILE_LADDER -> {
tags["stile"] = "ladder"
}
BarrierType.STILE_STEPOVER_WOODEN -> {
tags["stile"] = "stepover"
tags["material"] = "wood"
}
BarrierType.STILE_STEPOVER_STONE -> {
tags["stile"] = "stepover"
tags["material"] = "stone"
}
}

Then we'd just need to write code which feeds in all combinations of inputs (i.e. Cycleway x Side x dir) and captures all the output; but then in this case, dir is currently an Int so has thousands of values; we'd need to do the Int to enum conversion first to make it manageable.

I did may share of "parsing is complicated, I will solve it with bunch of simple regexps". And after several months I wished that I did it properly with parsing - as I kept encountering one edge case after another and ended with monster regexp which was uneditable nightmare.

Yeah I've found a few quirks here which I've caught and fixed, some of which is inconsistency in the codebase (e.g. func = func2 versus func { func2 } etc).

If that would be regexp - then it really needs some tests, otherwise (I know from experience) this will be irritating source of regressions if this will be ever changed.

Agreed, I've started collecting the examples so some tests can be written.

would it be easier to parse Kotlin code into some AST?

I was going to suggest the same thing. At the very least, we already have code for parsing a string to an ElementFilterExpression (EFE); it shouldn't be too much work* to have an EFE spit out an equivalent tagInfo for it.

Remember we're only interested in the tagging side, not the selection side (@westnordost preference, but I can see the logic so we don't flag e.g. roundabout=yes in our road quest or some other random/obscure tag). So I don't think any EFE stuff will help.

For more complicated parsing, perhaps we could take an approach something like a javadoc: embed a comment above the getApplicableElements function with the relevant information. That would still require manual maintenance, but at least it would be right in your face when you're changing the code, rather than off in a separate repo.

Again, we don't care about getApplicableElements just applyAnswerTo. I think @westnordost didn't want JSON in the source folders, we talked about some JSON outside of the main source. I'd imagine he may feel the same about Javadoc style commenting.

@westnordost
Copy link
Member

This has been open as draft since almost a year. Is this still being worked on?

@westnordost
Copy link
Member

Closing as this seems to be abandoned.

@westnordost
Copy link
Member

westnordost commented Apr 10, 2022

If you intend to finish it, please only open a new PR when it is ready for review or if you need help (then as draft).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants