Format semicolon-delimited value lists in labels #666

1ec5 · 2023-01-05T00:59:40Z

Whenever a feature’s name contains multiple values separated by semicolons, the label now puts each value on a separate line or separates them with an elegant bullet, for up to nine values, honoring the ;; escape sequence.

This implementation includes a general-purpose polyfill for a string replacement expression operator. We can use it for any purpose in an expression or filter but need to be careful about avoiding a combinatoric explosion in the size of an expression’s code. I’ve included a comprehensive battery of unit tests.

Kaser and New Square, New York

Machahe/Caiyuan/Liucandong/Houyukou/Wangshilou/Lichahe/Chahexin/Fukangxin/Qianyukou villages in Liangshan County, Jining City, Shandong, China

Viet Heritage Garden, San José, California

Columbus Cincinnati Road/Cincinnati–Columbus Road in West Chester Township, Butler County, Ohio

Southbound Interstate 275 at exits 63A/B in Union Township, Clermont County, Ohio

Fixes #665.

1ec5 · 2023-01-05T01:01:03Z

src/constants/label.js

+/**
+ * Maximum number of values in a semicolon-delimited list of values.
+ *
+ * Increasing this constant deepens recursion for replacing delimiters in the
+ * list, potentially affecting style loading performance.
+ */
+const maxValueListLength = 9;


Please compare this branch to main and note any perceptible degradation in style loading speed, tile loading speed, panning and zooming performance, or memory or CPU usage. If it’s severe enough, we can tune this limit downward until we’re satisfied with the balance between handling enough features and maintaining reasonable performance.

1ec5 · 2023-01-05T01:06:13Z

src/constants/label.js

+  // Replace the ;; escape sequence with a placeholder sequence unlikely to
+  // legitimately occur inside a value or separator.
+  const objReplacementChar = "\x91\ufffc\x92"; // https://overpass-turbo.eu/s/1pJx


It’s amazing how much junk there is in name that can impersonate a strip marker, but this sequence does the trick for now. If someone opens a pub for text processing geeks and names it “The �� & ;”, I’ll kindly ask someone to offer up a prized password as sacrifice.

1ec5 · 2023-01-05T01:08:44Z

src/constants/label.js

@@ -217,7 +338,7 @@ export const localizedNameWithLocalGloss = [
      ["var", "localizedCollator"],
    ],
    // ...just pick one.
-    ["var", "localizedName"],
+    ["format", listValuesExpression(["var", "localizedName"], "\n")],


The return value of this method is so complex that the style-spec package (used by the unit tests) is apparently unable to upcast it to a format expression, leading to an error when it sees the direct format expression in the else slot below. Interestingly, GL JS has no such problem at runtime. This explicit cast reassures the expression parser that each branch of the case expression will resolve to a format expression.

1ec5 · 2023-01-05T01:11:08Z

test/spec/label.js

      if (typeof evaluated === "string") {
-        return [evaluated, null];
+        return [evaluated];
      }
-      return [evaluated.sections[0].text, evaluated.sections[4].text];
+      return [evaluated.sections[0].text, evaluated.sections[4]?.text];


As a result of the manual cast above, this expression no longer returns a literal string, always a format object.

ZeLonewolf · 2023-01-05T01:27:27Z

Test location below.

Performance measurement in Chrome on my laptop (multiple refreshes based on network tab in the Chrome debugger console
Current master (remote github): 1.9-2.1s
Current master (localhost): 1.9-2.1s
This branch (localhost): 2.6-2.9s

Approximately 37% slowdown.

Current/New:

(localhost link)

Cursory testing seems to indicate that these numbers are independent of location (high zoom on an ocean tile yields the same results).

I'm curious to understand if the performance difference is due to execution or building the style.json.

1ec5 · 2023-01-05T01:44:14Z

Cursory testing seems to indicate that these numbers are independent of location (high zoom on an ocean tile yields the same results).

Yes, the expression is only evaluated at runtime when loading a tile or reevaluating a symbol layer, so the impact is uniform everywhere. That said, it would be possible to limit this semicolon replacement to certain layers that are more likely to have semicolons in them. Or we could create a separate copy of each layer prefiltered to just the features that contain semicolons in them, but at the cost of additional layers that participate in symbol collision.

I'm curious to understand if the performance difference is due to execution or building the style.json.

The new code probably isn’t a significant contributor to the time it takes to build the JSON. (It happens in memory; there’s no file involved.) However, the step right after, loading the JSON, requires more memory than before. Under the hood, GL JS compiles the expression to a type-safe, object-oriented data structure, which becomes more expensive as the expression becomes more deeply nested. Whenever a new tile needs to be displayed, a Web Worker needs to evaluate more massive expressions. The case statements should allow the evaluator to short-circuit most of the expression unless there are semicolons, but it still requires work to ferry around such a large data structure.

With a built-in string replacement operator, the performance concern would essentially go away. The operator’s performance would be similar to the upcase and downcase operators, which no one bats an eye about. Deduplicating values would require slightly more involved string splitting and array joining operators, but those probably wouldn’t have this performance impact either.

ZeLonewolf · 2023-01-05T15:43:38Z

Pending anyone else's review of the code changes, I think it's a good idea to proceed with the proposed change to handle semicolon-delimited lists. I'm concerned about the 1/3 performance drop for processing up to 9 semi-colons, though I understand that future improvements to MapLibre may make it possible for use to be more efficient about this. Perhaps we could reduce the 9 down to a number that captures the vast majority of the sane delimiter use cases while ignoring the silly ones that are probably tagging errors.

claysmalley · 2023-01-05T16:29:29Z

Should spaces before and after semicolons be trimmed, or is that considered a tagging error?

1ec5 · 2023-01-05T16:49:19Z

Should spaces before and after semicolons be trimmed, or is that considered a tagging error?

I believe that’s a tagging error. In fact, editors such as iD automatically remove the space when you touch the name.

zekefarwell · 2023-01-05T17:38:11Z

Should spaces before and after semicolons be trimmed, or is that considered a tagging error?

The wiki mentions a single space after a semicolon being valid:
https://wiki.openstreetmap.org/wiki/Semi-colon_value_separator#Space_character_padding

I'm sure handling either ; or ; would make for more complex code, but it could be a strategically good choice if it's doable. On maps that don't replace semicolons with something nicer looking, mappers will probably find "First Name; Second Name" a bit more tolerable than "First Name;Second Name". Although they may not like the look of the semicolon, at least there is also a space separating the two names.

1ec5 · 2023-01-05T17:41:19Z

Editors strip out the space in general, but preserve it for certain syntaxes like opening_hours and *:conditional. At one point, it was common to intentionally leave a space after the semicolon, but removing it has been standard practice for many years at this point.

zekefarwell · 2023-01-05T17:44:25Z

Editors strip out the space in general

Well iD does anyway. I didn't have any problem including a space here with JOSM: https://www.openstreetmap.org/way/971815925

ZeLonewolf · 2023-01-05T17:49:54Z

For place nodes, stats on prevalence:

# Semi-colons	Prevalence
1+ semicolon	444
2+ semicolons	9
3+ semicolons	6
4+ semicolons	3
5+ semicolons	2
6+ semicolons	1

1ec5 · 2023-01-05T19:39:07Z

For reference, the semicolon delimiter was first documented (without a space) in 2007, when it replaced a recommendation to use a comma.¹ There continued to be some uncertainty about the delimiter until 2008.

Editors strip out the space in general

Well iD does anyway. I didn't have any problem including a space here with JOSM: https://www.openstreetmap.org/way/971815925

You can do anything you want in JOSM. 😉 But it too prefers a semicolon whenever it constructs a value list for you. The main reason space-padded semicolons appear in the database today is that Potlatch used to insert this delimiter when merging two ways. It was usually a mistake caused by its confusing shortcut for merging ways. Based on that behavior, mappers (including yours truly) used to manually insert the same delimiter for consistency, but it has been rare for a long time: openstreetmap/iD#941.

Documentation about space padding is pretty stale, dating back to when the same page pointed out that characters other than a semicolon were being used as value delimiters. This page is the product of past edit-warring and also flamewarring on the tagging list, so people have probably been hesitant to do much cleanup on it since. (The controversy was not about the choice of delimiter but rather about multiple-value lists in the first place.) I updated the documentation to reflect longstanding reality.

If there’s a strong need to support the space after the semicolon, then there will be an additional performance hit as long as we build upon this magnificent workaround for the lack of a string replacement operator. This extra padding is more noticeable with the inline treatment than with the newline treatment. We could replace the bullet with a series of spaces, which would make the imbalance imperceptible, but at the cost of making it even more likely for a dual-named label to get collided out.

Given this complexity, I’d suggest tracking support for space-padded semicolons as tail work, which I would consider blocked by a string replacement operator. I’m not opposed to it in principle, since a value’s leading and trailing spaces are always extraneous, but the cost-benefit ratio isn’t superb.

For place nodes, stats on prevalence:

Thanks, this argues strongly in favor of setting the limit to just one or two semicolons for now. I suppose it would be difficult to gather similar statistics about ad-hoc delimiters that would be candidates for conversion to semicolons – which is the point of using semicolons in the first place. But out of curiosity, is there a similar distribution on places if we naïvely assume all space-padded slashes and hyphens are intended to be value delimiters?

A comma continues to be used very commonly in addr:housenumber to this day as part of the Karlsruhe schema, authored by the same mapper who originally documented the comma as a more general delimiter. ↩

zekefarwell · 2023-01-05T19:59:41Z

You can do anything you want in JOSM. 😉

Indeed 💪💪

Given this complexity, I’d suggest tracking support for space-padded semicolons as tail work, which I would consider blocked by a string replacement operator. I’m not opposed to it in principle, since a value’s leading and trailing spaces are always extraneous, but the cost-benefit ratio isn’t superb.

Sounds good to me. Even without explicitly handling the extra space after a semicolon, the bullet delimiter is far superior to to displaying the raw value with the semicolon. I don't really find the extra space on one side of the bullet all that noticeable. I just didn't think we ought to necessarily consider a space after a semicolon to be a tagging error. At least not while the raw, semicolon delimited values are displayed on the osm.org standard layer.

ZeLonewolf · 2023-01-05T20:28:40Z

Query for all types of objects this time.

1 semi-colon 31,953
2 semi-colons 2,163
3 semi-colon 539
4 semi-colon 34
5 semi-colon 10
6+ semi-colon 6

zekefarwell

This looks good to me. I did some performance profiling and there's definitely an increase to JS execution time and jank on initial load and while panning/zooming around the map. However, performance is already not great and without profiling I'm not sure the increase from this changes is really all that noticeable (at least on my hardware). Given the stats on numbers of semicolons it seems like dropping maxValueListLength to 4 would be perfectly reasonable if that would improve performance.

Added a function that returns an expression to find and replace a fixed number of occurrences of a substring. Added a function to format a list of semicolon-delimited tag values. Separate multiple names by newlines or bullets as necessary.

1ec5 · 2023-01-06T08:23:03Z

src/constants/label.js

@@ -175,7 +175,7 @@ export function replaceExpression(
 * Increasing this constant deepens recursion for replacing delimiters in the
 * list, potentially affecting style loading performance.
 */
-const maxValueListLength = 9;
+const maxValueListLength = 3;


Three values separated by two semicolons is plenty enough for the names currently using semicolons. In the future, with a more efficient string replacement implementation, we can add support for more semicolons, which will make this option more attractive to mappers currently using informal delimiters like spaces.

1ec5 · 2023-01-11T01:57:40Z

Should spaces before and after semicolons be trimmed, or is that considered a tagging error?

#670 collapses up to one space following a semicolon. If a mapper does intentionally use ; to make the label look nicer in some other renderers, it’ll still work fine in this style. However, iD does strip it out, so if this becomes an important tagging style, iD will need a one-line fix.

Preserve newlines in labels in legend

0e4adb3

1ec5 added enhancement New feature or request internationalization labels Jan 5, 2023

1ec5 self-assigned this Jan 5, 2023

1ec5 commented Jan 5, 2023

View reviewed changes

zekefarwell approved these changes Jan 6, 2023

View reviewed changes

1ec5 added 3 commits January 6, 2023 00:20

Format name lists in labels

527f2fc

Added a function that returns an expression to find and replace a fixed number of occurrences of a substring. Added a function to format a list of semicolon-delimited tag values. Separate multiple names by newlines or bullets as necessary.

Format highway exit number lists

2f96926

Limit to 3 values separated by 2 semicolons

cf6c351

1ec5 force-pushed the 1ec5-name-semicolon-665 branch from 2091677 to cf6c351 Compare January 6, 2023 08:20

1ec5 commented Jan 6, 2023

View reviewed changes

1ec5 merged commit 00d067b into main Jan 6, 2023

1ec5 deleted the 1ec5-name-semicolon-665 branch January 6, 2023 08:23

This was referenced Jan 6, 2023

Missing bullet delimiter in waterway labels #668

Closed

Deduplicate names in gloss; gloss towns #670

Merged

Split localized name on semicolons before gloss #673

Merged

1ec5 mentioned this pull request Jan 12, 2023

Raise limit on semicolons in lists #680

Open

This was referenced Jan 31, 2023

Add Asian Highway Network shields #749

Draft

Add split and join expressions maplibre/maplibre-gl-js#2064

Closed

1ec5 mentioned this pull request Apr 7, 2023

Fontstack lacks punctuation characters #850

Closed

1ec5 mentioned this pull request Aug 12, 2024

Make Dutch highway shields more recognizable (spacing, font?) #1148

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Format semicolon-delimited value lists in labels #666

Format semicolon-delimited value lists in labels #666

1ec5 commented Jan 5, 2023 •

edited

Loading

1ec5 Jan 5, 2023

1ec5 Jan 5, 2023

1ec5 Jan 5, 2023 •

edited

Loading

1ec5 Jan 5, 2023

ZeLonewolf commented Jan 5, 2023

1ec5 commented Jan 5, 2023 •

edited

Loading

ZeLonewolf commented Jan 5, 2023

claysmalley commented Jan 5, 2023

1ec5 commented Jan 5, 2023

zekefarwell commented Jan 5, 2023 •

edited

Loading

1ec5 commented Jan 5, 2023 •

edited

Loading

zekefarwell commented Jan 5, 2023

ZeLonewolf commented Jan 5, 2023

1ec5 commented Jan 5, 2023 •

edited

Loading

zekefarwell commented Jan 5, 2023

ZeLonewolf commented Jan 5, 2023

zekefarwell left a comment

1ec5 Jan 6, 2023

1ec5 commented Jan 11, 2023

Format semicolon-delimited value lists in labels #666

Format semicolon-delimited value lists in labels #666

Conversation

1ec5 commented Jan 5, 2023 • edited Loading

1ec5 Jan 5, 2023

Choose a reason for hiding this comment

1ec5 Jan 5, 2023

Choose a reason for hiding this comment

1ec5 Jan 5, 2023 • edited Loading

Choose a reason for hiding this comment

1ec5 Jan 5, 2023

Choose a reason for hiding this comment

ZeLonewolf commented Jan 5, 2023

1ec5 commented Jan 5, 2023 • edited Loading

ZeLonewolf commented Jan 5, 2023

claysmalley commented Jan 5, 2023

1ec5 commented Jan 5, 2023

zekefarwell commented Jan 5, 2023 • edited Loading

1ec5 commented Jan 5, 2023 • edited Loading

zekefarwell commented Jan 5, 2023

ZeLonewolf commented Jan 5, 2023

1ec5 commented Jan 5, 2023 • edited Loading

Footnotes

zekefarwell commented Jan 5, 2023

ZeLonewolf commented Jan 5, 2023

zekefarwell left a comment

Choose a reason for hiding this comment

1ec5 Jan 6, 2023

Choose a reason for hiding this comment

1ec5 commented Jan 11, 2023

1ec5 commented Jan 5, 2023 •

edited

Loading

1ec5 Jan 5, 2023 •

edited

Loading

1ec5 commented Jan 5, 2023 •

edited

Loading

zekefarwell commented Jan 5, 2023 •

edited

Loading

1ec5 commented Jan 5, 2023 •

edited

Loading

1ec5 commented Jan 5, 2023 •

edited

Loading