Out of memory with ndjson-reduce. A streaming ndjson-merge for the rescue #38

Closed
ilyabo opened this issue Feb 13, 2019 · 1 comment

ilyabo commented Feb 13, 2019

I ran into an out-of-memory error when processing the US counties shapefile from census.gov with the following workflow:

geo2topo $OUTPUT_OBJECT_NAME=<(\
  ndjson-join  <(shp2json -n $INPUT_SHAPEFILE) <(dbf2json -n $INPUT_DBF) \
  | ndjson-map 'Object.assign(d[0], { properties: d[1] })' \
  | ndjson-map -r d3geo=d3-geo 'd.properties.centroid = d3geo.geoCentroid(d), d' \
  | ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}' \
) \
| toposimplify -f -s $SIMPLIFICATION_THRESHOLD \
| topoquantize 1e5

ndjson-reduce does not stream here: the reducer pushes every feature into a single in-memory object and only writes it out once the input ends, and that is what exhausts the memory.
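A rough sketch of why a reduce step of this shape has to hold everything in memory. This is not ndjson-cli's actual source; `reduceLines` is a hypothetical stand-in that applies the same reducer expression to an array of NDJSON lines:

```javascript
// Hypothetical stand-in for the reduce step (NOT ndjson-cli's real code).
// It parses each line, folds it into the accumulator `p`, and can only
// serialize the result after the last line has been consumed.
function reduceLines(lines, reducer, initial) {
  let p = initial;
  for (const line of lines) {
    const d = JSON.parse(line);
    p = reducer(p, d);
  }
  // Nothing can be written before this point: the whole collection
  // lives in `p` until the input is exhausted.
  return JSON.stringify(p);
}

const lines = ['{"type":"Feature","id":1}', '{"type":"Feature","id":2}'];
const out = reduceLines(
  lines,
  (p, d) => (p.features.push(d), p), // same expression as in the pipeline
  {type: 'FeatureCollection', features: []}
);
console.log(out);
```

With a large shapefile, `p.features` grows to hold every parsed feature at once, on top of the final serialized string.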

I could work around the issue by replacing the ndjson-reduce step with my own ndjson-merge utility as below:

| ./ndjson-merge '{"type": "FeatureCollection", "features": [' ',' ']}'

This is my implementation of ndjson-merge:

#!/usr/bin/env node
// Streams stdin to stdout as: <prefix>line<separator>line...<suffix>
// Lines are passed through verbatim, so nothing is parsed or buffered.
const readline = require('readline');

if (process.argv.length < 5) {
  // Print usage to stderr so it never ends up in a downstream pipe.
  console.error('Usage: ndjson-merge <prefix> <separator> <suffix>');
  process.exit(1);
}

const [prefix, separator, suffix] = process.argv.slice(2);

process.stdout.write(prefix);
let count = 0;
readline.createInterface({
  input: process.stdin,
  output: null, // read-only: lines are never echoed back
}).on('line', function(line) {
  // Emit the separator before every line except the first.
  if (count++ > 0) {
    process.stdout.write(separator);
  }
  process.stdout.write(line);
}).on('close', function() {
  process.stdout.write(suffix);
}).on('error', (err) => {
  console.error(err);
  process.exit(1);
});
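As a quick sanity check of the prefix/separator/suffix logic, here is a minimal sketch that runs the same steps over an in-memory list of lines instead of stdin. The generator `mergeChunks` and the two sample features are hypothetical and only there for illustration:

```javascript
// Hypothetical helper mirroring the script's logic on an in-memory iterable.
// It yields the same chunks the script would write to stdout, in order.
function* mergeChunks(lines, prefix, separator, suffix) {
  yield prefix;
  let count = 0;
  for (const line of lines) {
    // Separator goes before every line except the first.
    if (count++ > 0) yield separator;
    yield line;
  }
  yield suffix;
}

const merged = Array.from(mergeChunks(
  ['{"type":"Feature","id":1}', '{"type":"Feature","id":2}'],
  '{"type": "FeatureCollection", "features": [', ',', ']}'
)).join('');

// The concatenated chunks should parse as one FeatureCollection.
const collection = JSON.parse(merged);
console.log(collection.features.length);

// An empty input still yields well-formed JSON: prefix directly followed by suffix.
const emptyMerged = Array.from(mergeChunks([], '[', ',', ']')).join('');
console.log(emptyMerged);
```

Because each line is written through as-is, the result is valid JSON only if every input line is itself a valid JSON value; a blank line in the input would produce an empty array slot.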

Would it make sense to add a command like this one to ndjson-cli? If so, I'd try to put together a proper PR.

ilyabo commented Feb 20, 2019

I realized there is a simpler way to avoid the out-of-memory error: the merge step where it occurs is unnecessary, because geo2topo accepts newline-delimited JSON directly when run with -n (see this issue).

geo2topo -n $OUTPUT_OBJECT_NAME=<(\
  ndjson-join  <(shp2json -n $INPUT_SHAPEFILE) <(dbf2json -n $INPUT_DBF) \
  | ndjson-map 'Object.assign(d[0], { properties: d[1] })' \
  | ndjson-map -r d3geo=d3-geo 'd.properties.centroid = d3geo.geoCentroid(d), d' \
) \
| toposimplify -f -s $SIMPLIFICATION_THRESHOLD \
| topoquantize 1e5
