
core-xml-schemas Content Pack

Contents

The following describes the folder structure and content that will be imported into Stroom with this content pack.

analytic-output

This XMLSchema is a data structure for data produced by an analytic of some kind. Stroom does not yet include the functionality for making use of this schema; it will be added at a later date.

annotation

This XMLSchema defines a structure used to represent a single Stroom annotation as XML.

Stroom version 7.0+ supports AnnotationWriter, which is a pipeline element that operates on XML documents of this structure, in order to create the corresponding annotations.
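As a purely illustrative sketch, a document conforming to this schema might look something like the following. Every element name here is an assumption based on the description above, not taken from the schema itself; consult the imported XMLSchema for the real structure and namespace.

```xml
<!-- Hypothetical annotation document; all names are illustrative assumptions. -->
<annotations xmlns="annotation:1">
  <annotation>
    <title>Suspicious logon</title>
    <description>Multiple failed logons followed by a success.</description>
    <status>Open</status>
    <assignedTo>jbloggs</assignedTo>
  </annotation>
</annotations>
```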

data-splitter

This XMLSchema defines the data used to describe a Data Splitter configuration, i.e. the regular expressions and splits used to convert a plain text file format into structured XML.
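For illustration, a minimal Data Splitter configuration that turns lines of `key=value` text into records might look like this. The element names follow the conventions of the `data-splitter` namespace, but treat the details (namespace version, attributes) as assumptions and check them against the imported schema:

```xml
<!-- Hypothetical Data Splitter config: one record per line, splitting on '='. -->
<dataSplitter xmlns="data-splitter:3" version="3.0">
  <!-- Split the input into one group per line -->
  <split delimiter="\n">
    <group value="$1">
      <!-- Capture the key and value either side of '=' -->
      <regex pattern="([^=]+)=(.*)">
        <data name="$1" value="$2"/>
      </regex>
    </group>
  </split>
</dataSplitter>
```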

detection

This XMLSchema defines a structure suitable for representing arbitrary analytic output that might reference pre-existing events within Stroom.
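A hypothetical detection document referencing source events might look like the sketch below. The element names are assumptions derived from the description above, not copied from the schema:

```xml
<!-- Illustrative detection referencing pre-existing events in Stroom. -->
<detections xmlns="detection:1">
  <detection>
    <detectorName>failed-logon-burst</detectorName>
    <headline>Repeated failed logons</headline>
    <!-- References back to source events already held in Stroom -->
    <linkedEvents>
      <linkedEvent streamId="12345" eventId="2"/>
    </linkedEvents>
  </detection>
</detections>
```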

json

This schema comes from w3.org and defines the structure used to represent JSON data as XML.

There are two versions of this schema:

Data output by the JSON type Parser pipeline element will conform to this XMLSchema. Data input to the JsonWriter pipeline element must conform to this XMLSchema.
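For example, the w3.org vocabulary for representing JSON as XML (the `http://www.w3.org/2013/XSL/json` namespace, also used by XSLT 3.0's `json-to-xml()`) maps JSON objects, arrays, and scalars onto elements like this; whether Stroom uses this exact version of the schema is an assumption worth verifying against the imported content:

```xml
<!-- The JSON object {"name": "alice", "count": 2, "tags": ["a", "b"]}
     represented in the w3.org JSON-as-XML vocabulary. -->
<map xmlns="http://www.w3.org/2013/XSL/json">
  <string key="name">alice</string>
  <number key="count">2</number>
  <array key="tags">
    <string>a</string>
    <string>b</string>
  </array>
</map>
```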

kafka-records

This structure is intended as an XML representation of a Kafka producer record. Its intended use is to allow Stroom to publish records onto Kafka topics. Events/records in Stroom can be translated into this format and then passed to a KafkaProducer for the kafka-records to be published onto a topic. The structure allows all aspects of a producer record to be defined, such as the partition, key, headers, etc., using XSLT.

Stroom version 7.0+ supports StandardKafkaProducer, which is a pipeline element that operates on XML documents of this structure, in order to create the corresponding messages on Kafka.
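As a sketch, a single producer record expressed in this structure might look like the following. The element and attribute names are assumptions based on the description above (topic, partition, key, headers, value), not authoritative schema details:

```xml
<!-- Hypothetical kafka-records document describing one producer record. -->
<kafkaRecords xmlns="kafka-records:2">
  <kafkaRecord topic="user-events" partition="0" key="user-123">
    <header>
      <key>source</key>
      <value>stroom</value>
    </header>
    <value>{"event":"logon","user":"user-123"}</value>
  </kafkaRecord>
</kafkaRecords>
```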

records

This schema defines a structure that is used for holding arbitrary records. It is typically used as the structure of the data output by a Data Splitter type Parser pipeline element. It is also used as the normalised form of the application logs (i.e. SLF4J type logs) generated by Stroom.
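A typical records document, such as the output of a Data Splitter parse, follows a simple record/data shape. The `records:2` namespace and `name`/`value` attributes below reflect common Stroom usage, but check the schema version in your own instance:

```xml
<!-- Illustrative records document holding one record of name/value pairs. -->
<records xmlns="records:2">
  <record>
    <data name="ipAddress" value="192.168.0.4"/>
    <data name="user" value="jbloggs"/>
  </record>
</records>
```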

reference-data

This XMLSchema provides a common structure for describing reference data. For example, a reference data feed may be supplied to Stroom to map IP addresses to fully qualified domain names. This data feed would either be ingested as data conforming to this XMLSchema or converted into it.

Data input to the ReferenceDataFilter pipeline element must conform to this XMLSchema.
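Taking the IP-to-FQDN example above, a reference data entry might be sketched as follows. The element names (map, key, value) are assumptions based on the description, and the namespace version should be checked against the imported schema:

```xml
<!-- Hypothetical reference data mapping one IP address to an FQDN. -->
<referenceData xmlns="reference-data:2">
  <reference>
    <map>IP_TO_FQDN</map>
    <key>192.168.0.1</key>
    <value>host1.example.com</value>
  </reference>
</referenceData>
```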

statistics

Statistic events in Stroom are an abstraction of the rich event records in Stroom. The idea is to condense part of an event down to a count or value with some qualifying attributes, e.g. the number of bytes in a file upload event, or reducing a rich logon event down to a count of 1 with qualifying attributes for the user and device. These statistic events can then be aggregated in a number of different time buckets for fast querying.

Statistics data can be recorded in two ways in Stroom, either using the internal SQL based statistics store, or by sending the statistic events via Kafka to Stroom-Stats. Each mechanism uses a different version of the Statistics XMLSchema. The appropriate schema version for each statistics store is as follows:

  • SQL Statistics - v2.0.1

  • Stroom-Stats - v4.0.0

SQL Statistics

This statistics store is built into Stroom. The schema is used to describe a statistics event record. Statistics are used to record counts (or values) of events happening, e.g. the number of a particular kind of event within a time period, or the CPU% of a Stroom node.

Data fed to the StatisticsFilter pipeline element must conform to this XMLSchema.
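Following the logon example above, a single count statistic event (a count of 1 with user and device tags) might look like this sketch. The element names are assumptions; verify them against the v2.0.1 schema:

```xml
<!-- Hypothetical count statistic: one logon, tagged with user and device. -->
<statistics xmlns="statistics:2">
  <statistic>
    <time>2024-01-01T00:00:00.000Z</time>
    <count>1</count>
    <tags>
      <tag name="user" value="jbloggs"/>
      <tag name="device" value="host1"/>
    </tags>
  </statistic>
</statistics>
```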

stroom-stats Statistics

Stroom-Stats is external to Stroom and provides a more scalable and feature-rich store for statistics data. The structure of a Stroom-Stats statistic event is broadly similar to a SQL Statistics event, with the addition of some features to support recording references to the source event(s) that contributed to the statistic event.

Data fed to the statisticEvents-Count and statisticEvents-Value Kafka topics using Stroom's KafkaProducerFilter pipeline element must conform to this XMLSchema.