Add Sample Data Set to Kibana #16473

Closed
4 tasks done
alexfrancoeur opened this issue Feb 2, 2018 · 25 comments · Fixed by #17807
Assignees
Labels
discuss Feature:Add Data Add Data and sample data feature on Home Team:Platform-Design Team Label for Kibana Design Team. Support the Analyze group of plugins. Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@alexfrancoeur

alexfrancoeur commented Feb 2, 2018

Updated on May 3, 2018
To improve the getting-started experience, we'd like to add sample data to Kibana so users can take Kibana for a test ride.

Sample Data

  • We will only have one dataset and set of dashboards, visualizations, etc. to work with. This will not be use case specific and will be as generic as possible
  • It should include geo data, time series data, and whatever else is needed to best show "Kibana at a glance"
  • We will likely package Kibana with a small CSV file for this data and utilize a script that will make the data current, possibly adding past / future data (similar to makelogs)
  • Once the tour is defined, we can choose a data set that helps tell this story best. We'll then need to build the appropriate dashboards and visualizations to show off functionality. In order to best define this dataset, we must consider features that will be here in 6.x/7.0
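The "make the data current" idea can be sketched as a single offset applied to every packaged timestamp. This is only an illustration in Python, not the script Kibana ships; the function name `shift_to_now` and the `future_days` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def shift_to_now(timestamps, now=None, future_days=0):
    """Shift every timestamp by one shared offset.

    The newest record lands `future_days` after `now`, so the relative
    spacing between records is preserved.
    """
    if not timestamps:
        return []
    now = now or datetime.now(timezone.utc)
    offset = now + timedelta(days=future_days) - max(timestamps)
    return [ts + offset for ts in timestamps]


# Hypothetical packaged records, as they might be read from the CSV file.
packaged = [
    datetime(2018, 4, 10, 12, 0, tzinfo=timezone.utc),
    datetime(2018, 4, 12, 15, 57, tzinfo=timezone.utc),
]
install_time = datetime(2018, 5, 3, tzinfo=timezone.utc)
shifted = shift_to_now(packaged, now=install_time)
```

Passing a positive `future_days` would leave some records in the future, which matches the "past / future data" idea above.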

To do

  • Define data set, size of the CSV file we want to package with Kibana and size of the index
  • Create script and define how many days of data we want in the past and future
  • Create dashboards, visualizations, ML jobs, etc.
  • Provide a way to remove sample data and all associated saved objects
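One possible shape for the removal step, assuming every sample saved object carries a marker tag. The tag name `sample`, the `tags` field, and the dict layout are all assumptions made for this sketch; none of them is a Kibana API.

```python
SAMPLE_TAG = "sample"  # hypothetical marker name


def split_sample_objects(saved_objects, tag=SAMPLE_TAG):
    """Return (to_remove, to_keep) based on whether an object carries the tag."""
    to_remove = [o for o in saved_objects if tag in o.get("tags", [])]
    to_keep = [o for o in saved_objects if tag not in o.get("tags", [])]
    return to_remove, to_keep


# Hypothetical saved objects: two installed with the sample data, one user-made.
saved_objects = [
    {"id": "flights-dashboard", "type": "dashboard", "tags": ["sample"]},
    {"id": "flights-delays", "type": "visualization", "tags": ["sample"]},
    {"id": "my-dashboard", "type": "dashboard"},
]
to_remove, to_keep = split_sample_objects(saved_objects)
```

Anything the user created stays untouched because it never received the marker.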

Related to #10813
WIP PR: #17807

@alexfrancoeur alexfrancoeur added discuss Feature:Add Data Add Data and sample data feature on Home :Sharing labels Feb 2, 2018
@uric

uric commented Feb 14, 2018

subscribe

@deanpeters

The idea of the separate tab is appealing. All the more so as I would suspect its visibility could be potentially governed by usage rights.

And now that I think about it, it also leaves us room to grow along various categories within that tab.

@deanpeters

Hmmm ... now that I think about it ... if y'all do go with a separate tab for samples, does it make sense to drive it from entries, contributions made to the git repo?

@alexfrancoeur
Author

alexfrancoeur commented Mar 12, 2018

This concept continues to evolve. My initial thoughts of having separate use case sample data or separate sample data sets in general may be replaced by a simple opt-in process.

Notes from today's call

  • Rather than focus on specific use case data, we will likely add a single generic data set that can be consumed by all, regardless of use case
  • This may be presented through a simple UI that prompts you to add sample data and take a tour. This would only surface if no data (excluding internal indices) is available from the Elasticsearch cluster Kibana uses
  • If you opt in to sample data, we could take you on a tour that essentially routes you to different aspects of Kibana that show the sample data. This would require a new generic EUI component
  • We would need a quick and easy way to remove all sample data indices and saved objects. We briefly discussed using a system or sample tag to mark sample data components.
  • While this is more of an implementation detail, we hope to package Kibana with a CSV file. If a user opts into adding sample data, we will use a script that ingests past, current and future data as well as visualizations / dashboards.
  • At some point, I'd like to figure out how we can tie X-Pack features into the sample data components and product tour. This may be a second phase. ML would require a lot of data and APM is rather specific.
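The "only surface if no data" rule from the notes above could look roughly like this. It assumes internal indices are the ones whose names start with a dot (such as `.kibana`); that convention and both function names are assumptions of this sketch.

```python
def has_user_data(index_names):
    """True if any index is not an internal (dot-prefixed) index."""
    return any(not name.startswith(".") for name in index_names)


def should_prompt_for_sample_data(index_names):
    """Offer sample data only when the cluster holds no user data."""
    return not has_user_data(index_names)


fresh_cluster = [".kibana"]
used_cluster = [".kibana", "logstash-2018.04.12"]
```

Here `should_prompt_for_sample_data(fresh_cluster)` is true and `should_prompt_for_sample_data(used_cluster)` is false.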

Next steps

  • Design will put together some quick concepts of what this experience could look like
  • We will work with Product Marketing on the data set and example dashboards
  • We plan to meet again next week on next steps and may iterate through different flows with usability testing

@archanid @AlonaNadler @cchaos @snide

@snide snide self-assigned this Mar 12, 2018
@rhoboat rhoboat self-assigned this Mar 15, 2018
@rhoboat

rhoboat commented Mar 22, 2018

/cc @nreese Would you be interested in tag-teaming the engineering on this? We've had two meetings, but we can get you up to speed with the current design/direction.

@snide
Contributor

snide commented Mar 22, 2018

Some conversation designs can be found at http://snid.es/0S1j2u1o3a1g . As usual, these are designs from a moment in time and likely to change significantly as this project progresses.

@alexfrancoeur
Author

Description and title updated based on the past few conversations. I'll provide a Google Doc tomorrow to start brainstorming tour steps

cc: @AlonaNadler @archanid @snide @cchaos @asawariS @jamiesmith

@alexfrancoeur alexfrancoeur changed the title Add sample data sets to the Add Data UI Add Sample Data Sets and Product Tour to Kibana Mar 23, 2018
@deanpeters

I like the idea of a user-guided tour using a small portion of their own data to walk their own small use case.

I guess the real trick here would be answering the question "what guardrails do we put in place" to ensure a positive user experience ... and to avoid individuals floundering with issues in their own data that they perceive as bugs?

This in turn could lead to individuals/evangelists creating their own "how-to" repos with their own data rather than weighing down the product itself.

@alexfrancoeur alexfrancoeur added Team:Platform-Design Team Label for Kibana Design Team. Support the Analyze group of plugins. and removed :Sharing labels Apr 3, 2018
@alexfrancoeur alexfrancoeur changed the title Add Sample Data Sets and Product Tour to Kibana Add Sample Data Set and Product Tour to Kibana Apr 3, 2018
@nreese
Contributor

nreese commented Apr 9, 2018

@stacey-gammon and I had a conversation about this. We thought the simplest approach would be to break the sample data set and tour into separate tasks. The first task should be to create a sample data set with some visualizations and dashboards and make that accessible via the existing Add Data section. Then, at a later time, build a guided tour around the data set.

@stacey-gammon suggested using anonymized Elasticon enrollment data. It has the following benefits.

  • it is non-technical
  • it is not tied to any security, logging, or metrics use case.
  • it is easy to understand without a lot of background knowledge in a particular field.
  • it is owned by Elastic
  • it has geo and other interesting dimensions to filter with and explore. It would be interesting to show where everyone came from. It would also be interesting to show if price increases skew the registration date histogram.

@alexfrancoeur
Author

alexfrancoeur commented Apr 9, 2018

@nreese that was my initial thought as well (adding to Add Data directly). As I understand it, the design team had planned to introduce a Product Tour component as soon as we can commit engineering to work on this feature. Happy to discuss splitting up these tasks.

Anonymized Elasticon data is an interesting idea; it definitely checks a number of boxes for the types of visualizations we can portray. Even if it is anonymized, I don't believe we could share geo points for privacy reasons, these would need to be completely stubbed or randomized. I know @asawariS and @jamiesmith had some ideas around data sets as well. Would be interested in hearing what they've thought about.

@nreese would love to sync up on this, let's see if we can touch base tomorrow. I'll add a quick 30 min slot

@stacey-gammon
Contributor

Even if it is anonymized, I don't believe we could share geo points for privacy reasons, these would need to be completely stubbed or randomized.

If we used locations of Elasticon tour stops, that would give us a geo location that wouldn't cause privacy issues.

Funny thing is that I actually thought that was the proposal as I started skimming this issue with sample data - I saw the word "tour" and thought that was the sample data set proposed. :) hehe.

@alexfrancoeur
Author

I'd like to propose a sample data set idea - mock flight data. This will provide us with a data set that includes geo data, time series data and a variety of metrics. Flights are also globally understood in just about every culture. This is loosely based off of https://transtats.bts.gov/DL_SelectFields.asp. In order to index a usable and recent amount of data while limiting the size of data we package with Kibana (<10MB), I imagine we'll need to rely heavily on a script to provide mocked data. We can have some fun here with carrier names, reasons for delays, etc. and really "Elasticize" this data set.
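A rough sketch of what a mock-data script might do, written in Python purely for illustration. The field names follow the example record in this comment; the carrier names other than "Kibana Air", the delay choices, and the function name `mock_flight` are invented here.

```python
import random
from datetime import datetime, timedelta, timezone

# Only "Kibana Air" appears in the thread; the other names are made up.
CARRIERS = ["Kibana Air", "Logstash Airways", "Beats Express"]
AIRPORTS = ["BOS", "SFO"]  # the two airport IDs used in the example record


def mock_flight(rng, day):
    """Build one mocked flight departing on `day` (a UTC midnight datetime)."""
    origin, dest = rng.sample(AIRPORTS, 2)  # two distinct airports
    scheduled = day + timedelta(minutes=rng.randrange(0, 24 * 60, 5))
    delay_s = rng.choice([0, 0, 0, 900, 2700, 4500])  # most flights on time
    return {
        "Carrier": rng.choice(CARRIERS),
        "FlightNum": f"KA{rng.randrange(10000, 100000)}",
        "OriginAirportID": origin,
        "DestAirportID": dest,
        "ScheduledDeptTime": scheduled.isoformat(),
        "ActualDeptTime": (scheduled + timedelta(seconds=delay_s)).isoformat(),
        "DepDelay": delay_s,  # seconds
    }


rng = random.Random(42)  # seeded so the same mock data is produced each run
day = datetime(2018, 4, 12, tzinfo=timezone.utc)
flights = [mock_flight(rng, day) for _ in range(5)]
```

Generating records this way keeps the packaged file small, since only lookup lists such as carriers and airports need to ship.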

As far as visualizations go, this will open the door to coordinate maps, region maps, heatmaps, point series charts, pie charts, metrics / gauges, vega, nested input controls, TSVB, etc.

I threw this example data set together pretty quickly below. Let me know what you think.

{
  "_index": "flights-2018.04.12",
  "_id": "_P7KlWIBe31SPbqmS60L",
  "_version": 1,
  "_score": null,
  "_source": {
    "index": "flights-2018.04.12",
    "@timestamp": "2018-04-12T15:57:40.498Z",
    "FlightDate": "2018-04-12",
    "Carrier": "Kibana Air",
    "FlightNum": "KA12932",
    "TicketPrice": 350.21,
    "Origin": "Boston Logan Airport",
    "OriginAirportID": "BOS",
    "OriginAirportSeqID": 1,
    "OriginCityName": "Boston",
    "OriginState": "MA",
    "OriginStateName": "Massachusetts",
    "OriginLocation": {
      "coordinates": {
        "lat": 31.95131917,
        "lon": -85.128925
      }
    },
    "Dest": "San Francisco Airport",
    "DestAirportID": "SFO",
    "DestAirportSeqID": 1,
    "DestCityName": "San Francisco",
    "DestState": "CA",
    "DestStateName": "California",
    "DestLocation": {
      "coordinates": {
        "lat": 31.95131917,
        "lon": -85.128925
      }
    },
    "ScheduledDeptTime": "2018-04-12T12:00:00.000Z",
    "ActualDeptTime": "2018-04-12T13:15:00.000Z",
    "DepDelay": 4500,
    "ScheduledArrTime": "2018-04-12T15:00:00.000Z",
    "ActualArrTime": "2018-04-12T15:45:00.000Z",
    "ArrDelay": 2700,
    "Cancelled": false,
    "CancelReason": "N/A",
    "ActualElapsedTime": 12600,
    "AirTime": 11000,
    "NumFlights": 1,
    "DistanceMiles": 2704,
    "DistanceKilometers": 4352
  },
  "sort": [
    1523548660498
  ]
}
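The delay and distance fields in this record can be derived from the others, which a generating script could rely on. A small Python check, assuming the delays are stored in seconds (the thread does not state the unit, but the numbers line up that way):

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(start_iso, end_iso):
    """Whole seconds from one ISO-8601 UTC timestamp to another."""
    start = datetime.strptime(start_iso, ISO_FORMAT)
    end = datetime.strptime(end_iso, ISO_FORMAT)
    return int((end - start).total_seconds())


# Scheduled vs. actual departure and arrival from the example record.
dep_delay = seconds_between("2018-04-12T12:00:00.000Z", "2018-04-12T13:15:00.000Z")
arr_delay = seconds_between("2018-04-12T15:00:00.000Z", "2018-04-12T15:45:00.000Z")

# 1 mile = 1.609344 km
distance_km = round(2704 * 1.609344)

print(dep_delay, arr_delay, distance_km)  # 4500 2700 4352
```

These match the record's DepDelay, ArrDelay, and DistanceKilometers values.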

@rayafratkina
Contributor

Maybe AverageTicketPrice?
Also, are we allowing for flights that do more than one hop?
Not sure I understand what "NumFlights" is...
Do we want to add something about the plane itself?

  • Type (B737, B757, A380, A318 etc)
  • Inventory control number for the plane
  • Year of manufacture
  • Date of last service?

@alexfrancoeur
Author

+1 on AverageTicketPrice
NumFlights and FlightSeqID are meant to support multiple flights in a trip. Honestly, for the mock data, it may not be necessary. Would love to add some plane details as well.

If we have buy in from the group, I'm sure we can pursue this in more detail. Any thoughts / feedback from @asawariS @jamiesmith @stacey-gammon @nreese @jimgoodwin would be great.

@asawariS

I am +1 on this data. Happy to help with creating pretty visualizations :) @alexfrancoeur

@cchaos
Contributor

cchaos commented Apr 16, 2018

My one concern with this type of data would be the mapping visualizations. I think the typical way to visualize flights is something like this:

[Image: screenshot, 2018-04-16]

and I'm not sure if our maps can handle this yet?

@nreese
Contributor

nreese commented Apr 16, 2018

@nyurik Could vega be used to create a flight route map in Kibana?

@jamiesmith

jamiesmith commented Apr 16, 2018

@nyurik
Contributor

nyurik commented Apr 16, 2018

Yes, and I'm pretty sure I posted a small demo on how to do this somewhere, but might be hard to find. In 6.3, just use the type:map config param to enable dynamic base map, and draw lines between points. See example as @jamiesmith mentioned above.

@alexfrancoeur
Author

Either way, we could use region maps / coordinate maps to show most popular destinations, heavily used airports, etc. I agree that this would be an awesome visualization to showcase with flights as well
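The numbers behind a "most popular destinations" or "heavily used airports" map are simple counts per airport. In Kibana this would presumably be a terms aggregation on the relevant field; the plain-Python version below, using hypothetical records with the airport ID fields from the example document, only shows the idea.

```python
from collections import Counter

# Hypothetical flights using the airport ID fields from the example record.
flights = [
    {"OriginAirportID": "BOS", "DestAirportID": "SFO"},
    {"OriginAirportID": "SFO", "DestAirportID": "BOS"},
    {"OriginAirportID": "BOS", "DestAirportID": "SFO"},
]

# Most popular destinations: count arrivals per airport.
popular_destinations = Counter(f["DestAirportID"] for f in flights)

# Heavily used airports: count both departures and arrivals.
airport_usage = Counter()
for f in flights:
    airport_usage.update([f["OriginAirportID"], f["DestAirportID"]])

print(popular_destinations.most_common(1))  # [('SFO', 2)]
```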

@nreese nreese mentioned this issue Apr 20, 2018
@alexfrancoeur alexfrancoeur changed the title Add Sample Data Set and Product Tour to Kibana Add Sample Data Set to Kibana May 3, 2018
@alexfrancoeur alexfrancoeur assigned nreese and unassigned rhoboat May 3, 2018
@alexfrancoeur
Author

Updated description to only focus on sample data. Follow #18787 for progress on the Product Tour

@alexfrancoeur
Author

Yes @nreese!

@timroes timroes added Team:Visualizations Visualization editors, elastic-charts and infrastructure and removed :Sharing labels Sep 13, 2018
@Anduye

Anduye commented Apr 16, 2019

How do we add a sample log to Kibana?

@Anduye

Anduye commented Apr 16, 2019

After creating the index in the Management settings, we need to view the sample log in a dashboard, but the log does not display there. How can we view the log file on a Kibana dashboard? The file type is modsecurity and the index is logstash.json. Can anyone please help?

@nreese
Contributor

nreese commented Apr 16, 2019

The current implementation of Sample data ships 3 data sets (weblogs, flights, eCommerce) with Kibana.
Follow these instructions to install the built-in sample data sets.

There is no sample data set for modsecurity at this time.
