[sample data] update web log geo.src field to match country code of geo.coordinates #110885

Merged
merged 3 commits into from
Sep 7, 2021

@nreese nreese commented Sep 1, 2021

Fixes #40401

The web logs sample data has inconsistencies between geo.src, geo.dest, and geo.coordinates: in most cases, geo.coordinates is not contained in either of the countries denoted by geo.src or geo.dest. This inconsistency makes understanding the web logs samples impossible and diminishes the usefulness of the web logs dashboard and maps.

When the web logs were originally generated, geo.coordinates were assigned by picking randomly from US airport locations. This assignment did not verify that the src or dest country was US.

This PR fixes the problem by assigning geo.src for each row to "US". That way, the source country and the geo.coordinates are always consistent. Below is the script used to update the data set.

'use strict';

const fs = require('fs');
const readline = require('readline');

async function load () {
  let failures = 0;
  const logs = [];
  const readStream = fs.createReadStream('./logs.json');

  const rl = readline.createInterface({
    input: readStream,
    crlfDelay: Infinity
  });

  for await (const line of rl) {
    try {
      const log = JSON.parse(line);
      // {"srcdest":"IN:US","src":"IN","dest":"US","coordinates":{"lat":39.41042861,"lon":-88.8454325}}
      log.geo.src = 'US';
      log.geo.srcdest = 'US:' + log.geo.dest;
      logs.push(log);
    } catch (err) {
      // Reached for invalid JSON and for rows that have no geo object.
      failures++;
      console.log(`Unable to process line, error: ${err.message}`);
    }
  }

  const writeStream = fs.createWriteStream('./updated_logs.json');
  logs.forEach(log => {
    // Write one JSON document per line so the output stays newline-delimited like the input.
    writeStream.write(JSON.stringify(log) + '\n');
  });
  writeStream.end();
  
  console.log(`success: ${logs.length}, failures: ${failures}`);
}

load().catch(console.log);
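
The per-row change made by the script can be isolated as a pure function, which makes it easy to check against a single record. This is only a minimal sketch: `setSourceToUS` is a hypothetical name that does not appear in the PR, and the sample record is the one shown in the script's comment.

```javascript
'use strict';

// Hypothetical helper (not part of the PR): applies the same per-row update
// as the script, but returns a new object instead of mutating its input.
function setSourceToUS(log) {
  return {
    ...log,
    geo: {
      ...log.geo,
      src: 'US',
      srcdest: 'US:' + log.geo.dest,
    },
  };
}

// Sample record taken from the comment in the script above.
const before = {
  geo: {
    srcdest: 'IN:US',
    src: 'IN',
    dest: 'US',
    coordinates: { lat: 39.41042861, lon: -88.8454325 },
  },
};

const after = setSourceToUS(before);
console.log(JSON.stringify(after.geo));
```

Only geo.src and geo.srcdest change; geo.dest and geo.coordinates are carried over untouched, which matches the intent of leaving the coordinates alone.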

Why not update geo.coordinates to match the original geo.src country? Maps has a tutorial where geo.coordinates are reverse geocoded with United States Census Bureau Combined Statistical Areas (CSA). Changing geo.coordinates would have broken this tutorial. Also, changing geo.coordinates to match the original geo.src country would have been harder to code.

There were a few visualizations that used geo.src. Since geo.src is now always "US", these charts had to be updated to provide more meaningful visualizations. For example, the old Sankey diagram is not very informative when geo.src is always the same value:
[Screenshot: old Sankey diagram]

Updated charts:
[Screenshot: updated charts]
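
To illustrate why charts keyed on geo.src stop being informative after this change, here is a small sketch that counts rows per field value. The rows are made-up illustrative records, not taken from the sample data set, and `countBy` is a hypothetical helper.

```javascript
'use strict';

// Hypothetical helper: counts how many rows map to each key.
function countBy(rows, pick) {
  const counts = new Map();
  for (const row of rows) {
    const key = pick(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Illustrative rows only; after the update every row has geo.src === 'US'.
const rows = [
  { geo: { src: 'US', dest: 'CN' } },
  { geo: { src: 'US', dest: 'IN' } },
  { geo: { src: 'US', dest: 'CN' } },
];

const bySrc = countBy(rows, (row) => row.geo.src);
const byDest = countBy(rows, (row) => row.geo.dest);

console.log(`distinct src: ${bySrc.size}, distinct dest: ${byDest.size}`);
```

A breakdown by geo.src collapses into a single bucket, while a breakdown by geo.dest still separates the rows.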

@nreese added the Feature:Add Data, [Deprecated-Use Team:Presentation]Team:Geo, v8.0.0, release_note:skip, and v7.16.0 labels Sep 1, 2021
@nreese requested review from a team as code owners September 1, 2021 20:03
@elasticmachine

Pinging @elastic/kibana-gis (Team:Geo)

@timroes timroes left a comment

Tested the sample data dashboard; it seems consistent and the visualizations all make sense. Thanks for taking care of it. LGTM

nreese commented Sep 7, 2021

@elasticmachine merge upstream

@thomasneirynck thomasneirynck left a comment

❤️

@kibanamachine

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@nreese added the auto-backport label Sep 7, 2021
@nreese merged commit daf860c into elastic:master Sep 7, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Sep 7, 2021
…eo.coordinates (elastic#110885)

* [sample data] update web log geo.src field to match country code of geo.coordinates

* fix functional tests

Co-authored-by: Kibana Machine <[email protected]>
@kibanamachine

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Sep 7, 2021
…eo.coordinates (#110885) (#111434)

* [sample data] update web log geo.src field to match country code of geo.coordinates

* fix functional tests

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Nathan Reese <[email protected]>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Sep 8, 2021
chrisronline pushed a commit to chrisronline/kibana that referenced this pull request Sep 8, 2021
…eo.coordinates (elastic#110885)

* [sample data] update web log geo.src field to match country code of geo.coordinates

* fix functional tests

Co-authored-by: Kibana Machine <[email protected]>
Successfully merging this pull request may close these issues.

Kibana Sample Data is inconsistent