Skip to content
This repository has been archived by the owner on Jan 12, 2024. It is now read-only.

Nested regex can cause extremely slow generation times and longer xml #213

Open
smsm1 opened this issue Dec 13, 2012 · 13 comments
Open

Nested regex can cause extremely slow generation times and longer xml #213

smsm1 opened this issue Dec 13, 2012 · 13 comments

Comments

@smsm1
Copy link

smsm1 commented Dec 13, 2012

As a particularly noticeable example is the TileMill example open-streets-dc where converting two sections with the widths to regex, or even just changing a few lines from = to =~ gives significantly different results.
https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L119 is an example line to change to get the difference.

I'm using Carto 0.9.4.

@tmcw
Copy link
Contributor

tmcw commented Dec 13, 2012

This is probably unavoidable.

We can make judgments about zoom level filters and other relative filters, like =, < and so on, so we can say that if you have rules that overrule each other we can eliminate lots of possible combinations.

We can't do that with regex filters unless we have some kind of zany meta-regex-evaluator which compares their DFAs and derives new ones that are combinations and exclusions of each other.

@springmeyer
Copy link

@smsm1 - would you mind pasting a few before/after of the Mapnik XML based on a given change?

@smsm1
Copy link
Author

smsm1 commented Dec 17, 2012

Before: http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal.xml
After: http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal-1.xml

Changed about line 119 from:
[type='motorway'],
[type='trunk'] {

To:
[type=~'motorway|trunk'] {

@springmeyer
Copy link

thank you @smsm1

@springmeyer
Copy link

ugh, the diff is larger than either xml. And many of the differences seem not directly related to the change: http://cl.ly/code/0l1V2S0O0W2n. @tmcw - are you positive this bug is "unavoidable" ?

@tmcw
Copy link
Contributor

tmcw commented Dec 17, 2012

@tmcw - are you positive this bug is "unavoidable" ?

I'm positive that the regex-makes-more-rules bug is unavoidable unless carto or mapnik changes its approach, but it sounds like you're seeing a different bug than that.

@smsm1
Copy link
Author

smsm1 commented Dec 17, 2012

http://shaunmcdonald.me.uk/tmp/carto/open-streets-dc-orginal-2.xml is an even simpler change to (note the extra tilde)

[type=~'motorway'],
[type='trunk'] {
I wonder if having another syntax for the most common reasons for needing regexes currently may be an alternate solution?

@springmeyer
Copy link

It seems like the next step is to try to create a very small testcase (small stylesheet, only one layer...) and to use it to try to tease out if there are multiple bugs.

@smsm1
Copy link
Author

smsm1 commented Feb 25, 2013

I've got a case where 4 levels of indentation (levels of filters) with no regex can cause generation of the mapnik xml to jump, even so it makes the document simpler to read with less duplication.

@tmcw
Copy link
Contributor

tmcw commented Feb 25, 2013

@smsm1 can you create & post a testcase for this? the effect of the filter nesting is entirely dependent on the filters themselves.

@stevage
Copy link

stevage commented May 8, 2014

Hmm, I've just started using regexes because they're so much more concise:

  [shop=~"bicycle|supermarket|convenience|general|bakery"],
  [amenity=~"cafe|toilets|restaurant|fast_food|shelter"],
  [tourism=~"viewpoint|attraction"]

This bug is a bit alarming. I'd agree with @smsm1's comment that maybe just syntax for the most important use case would be enough? For me, it would definitely be this one - I could very easily live without even wildcard matches, but the ability to concisely express a set of possible things to match against is very valuable.

Maybe even: [shop in 'bakery,convenience,general'] { } ? (Commas are more readable than pipes, but maybe there's some other reason to avoid them...)

@tmcw tmcw mentioned this issue Jan 14, 2016
@nebulon42 nebulon42 added bug and removed osm labels Mar 2, 2016
@nebulon42
Copy link
Collaborator

So far I'm not seeing problems with a minimal example like

#world {
    [type=~'motorway|trunk'] {
        [zoom=12] { line-width: 1.2 + 2; }
        [zoom=13] { line-width: 2 + 2; }
        [zoom=14] { line-width: 4 + 2; }
        [zoom=15] { line-width: 6 + 2; }
        [zoom=16] { line-width: 9 + 3; }
        [zoom=17] { line-width: 13 + 3; }
        [zoom>17] { line-width: 15 + 3; }
    }
}

Also

#motorways::case[zoom>=6][zoom<=12],
#mainroads::case[zoom>=10][zoom<=12],
#roads::case[zoom>=13][tunnel!=1][bridge!=1],
#tunnels::case[zoom>=13][tunnel=1],
#bridges::case[zoom>=13][bridge=1] {
    line-cap:round;
    line-join:round;
    line-width:0;

    [type=~'motorway|trunk'] {
        [zoom=12] { line-width: 1.2 + 2; }
        [zoom=13] { line-width: 2 + 2; }
        [zoom=14] { line-width: 4 + 2; }
        [zoom=15] { line-width: 6 + 2; }
        [zoom=16] { line-width: 9 + 3; }
        [zoom=17] { line-width: 13 + 3; }
        [zoom>17] { line-width: 15 + 3; }
    }
}

is ok.

But with the open-streets-dc style I see an increase in 3200 lines. Challenge seems to be to find a "minimal" example that is not too minimal.

@nebulon42
Copy link
Collaborator

nebulon42 commented Jan 20, 2017

Right now I could not confirm that there is an interaction of bugs. Starting with https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L82-L160 and condensing it further seems to bring the number of lines closer together for the regex and non-regex case. Until it reaches an equal number of lines when there is only one sub selector.

Writing #world or https://github.com/tilemill-project/tilemill/blob/master/examples/open-streets-dc/highway.mss#L82-L86 made no difference. Seems to me like a combinatorial explosion problem.

@tmcw is probably right with what he wrote in #213 (comment).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants