Utility to extract values from a string using a compiled Blueprint
- You might have run into a situation where you want to extract the substrings of a string that match a regex.
- Sometimes you might need the reverse of Handlebars.java.
- You might be trying to transform JMX metrics such as
  org.apache.kafka.network.prd-001.org.dc.node3
  into
  name=org.apache.kafka.network and host=prd-001.org.dc.node3
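For comparison, the JMX transformation in the last bullet can be done by hand with java.util.regex. The sketch below is that manual baseline and does not use this library. The class name ManualJmxSplit and its transform method are hypothetical. The pattern only fits this one metric layout, so every new layout would need another hand-written regex.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-rolled version of the JMX example using only java.util.regex.
// The pattern is written for this single metric layout; string-extractor is not involved.
class ManualJmxSplit {
    // Literal dots are escaped; "host" captures everything after the fixed prefix.
    private static final Pattern METRIC =
            Pattern.compile("(?<name>org\\.apache\\.kafka\\.network)\\.(?<host>.+)");

    /** Returns {name=..., host=...}, or an empty map when the metric has a different layout. */
    static Map<String, String> transform(String metric) {
        Matcher m = METRIC.matcher(metric);
        if (!m.matches()) {
            return Collections.emptyMap();
        }
        Map<String, String> out = new LinkedHashMap<>();
        out.put("name", m.group("name"));
        out.put("host", m.group("host"));
        return out;
    }

    public static void main(String[] args) {
        // prints {name=org.apache.kafka.network, host=prd-001.org.dc.node3}
        System.out.println(transform("org.apache.kafka.network.prd-001.org.dc.node3"));
    }
}
```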
This library tries to address the cases above. You specify a few extraction rules and run your strings through them to extract values. From here on, we refer to these extraction rules as blueprints. Before you get started with the code, it is important to understand the schema of a blueprint string.
A simple blueprint looks like this:
someString${{variableName1:matchString1}}someMoreString${{variableName2:matchString2}}
where:
- variableNames are optional, e.g. ${{:matchString}}
  - If variableName is present, the matched string is put into the variable, returned as part of the result, and removed from the source string.
  - If it is absent, the matched string is discarded, but it is still removed from the source string.
- matchString can be one of two things:
  - String: expects an exact match, e.g. ${{variable1:value1}}
  - Regex: expects a regex match, e.g. ${{variable1:[a-zA-Z]+}}
- ${{ and }} are the markers for the start and end of a variable definition.
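The following minimal sketch makes the syntax concrete. It uses java.util.regex to split a blueprint into literal text and variable definitions. BlueprintTokenizer and its token strings are made up for illustration and do not reflect how the library parses blueprints. The sketch also accepts the colon-less form ${{variableName}}, which corresponds to the "last variable" described further down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer for the blueprint syntax described above; not the library's parser.
class BlueprintTokenizer {
    // ${{name:matchString}}: the name may be empty and the ":matchString" part may be absent.
    // The reluctant .*? stops at the first "}}".
    private static final Pattern DEFINITION =
            Pattern.compile("\\$\\{\\{([^:}]*)(?::(.*?))?\\}\\}");

    static List<String> tokenize(String blueprint) {
        List<String> tokens = new ArrayList<>();
        Matcher m = DEFINITION.matcher(blueprint);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                tokens.add("LITERAL(" + blueprint.substring(last, m.start()) + ")");
            }
            String name = m.group(1);
            String match = m.group(2); // null when there is no ":matchString" part
            tokens.add("VAR(name=" + (name.isEmpty() ? "<discarded>" : name)
                    + ", match=" + (match == null ? "<rest of input>" : match) + ")");
            last = m.end();
        }
        if (last < blueprint.length()) {
            tokens.add("LITERAL(" + blueprint.substring(last) + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("someString${{variableName1:matchString1}}someMoreString${{variableName2:matchString2}}")
                .forEach(System.out::println);
    }
}
```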
The following examples show what an extraction produces. Don't be confused by the output being JSON: it is just a JSON representation of the ExtractionResult.java object you get as the result.
| Blueprint | Input String | Output |
| --- | --- | --- |
|  |  | {"extractedString": "You are .", "extractions": {"adjective": "beautiful"}, "error": false} |
|  |  | {"extractedString": "io.package", "extractions": {"domain": "github", "user": "tushar"}, "error": false} |
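The sketch below rebuilds the shape of the first output by hand. Purely for illustration, it assumes the blueprint "You are ${{adjective:[a-z]+}}." and the input "You are beautiful.". These are guesses that reproduce that output, not values taken from the examples. Result is a hypothetical stand-in for ExtractionResult. The sketch shows how extractedString is formed: the variable's matched text is cut out, while the literal parts of the blueprint stay.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only. ASSUMPTION: blueprint "You are ${{adjective:[a-z]+}}." hand-translated
// into the named-group regex below. Result is hypothetical, not the library's ExtractionResult.
class ResultShapeDemo {
    static final class Result {
        final String extractedString;
        final Map<String, String> extractions;
        final boolean error;

        Result(String extractedString, Map<String, String> extractions, boolean error) {
            this.extractedString = extractedString;
            this.extractions = extractions;
            this.error = error;
        }

        @Override
        public String toString() {
            return "extractedString=" + extractedString + ", extractions=" + extractions + ", error=" + error;
        }
    }

    private static final Pattern COMPILED = Pattern.compile("You are (?<adjective>[a-z]+)\\.");

    static Result extract(String source) {
        Matcher m = COMPILED.matcher(source);
        if (!m.matches()) {
            // No match: this sketch flags an error and returns the source unchanged.
            return new Result(source, Collections.emptyMap(), true);
        }
        Map<String, String> extractions = new LinkedHashMap<>();
        extractions.put("adjective", m.group("adjective"));
        // Cut the matched variable text out; the literal parts of the blueprint stay.
        String extractedString = source.substring(0, m.start("adjective")) + source.substring(m.end("adjective"));
        return new Result(extractedString, extractions, false);
    }

    public static void main(String[] args) {
        // prints extractedString=You are ., extractions={adjective=beautiful}, error=false
        System.out.println(extract("You are beautiful."));
    }
}
```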
Features:
- Proper error handling
- Allows multiple extractions: you can extract into multiple variables using the same blueprint
- Supports different types of extractions:
  - Exact Match Variable: org.${{domain:apache}}.org.dc.node3
    The match happens only if the string has exactly "apache" in the specified position, e.g. org.apache.org.dc.node3. The match is extracted and stored in the variable domain.
  - Regex Match Variable: org.${{domain:[a-z]+}}.org.dc.node3
    The match happens on the text after "org." that complies with the regex, e.g. org.something.org.dc.node3. The match is extracted and stored in the variable domain.
  - Last Variable: org.${{domain}}
    The match covers all text that follows "org.", e.g. for org.something.org.dc.node3, domain gets something.org.dc.node3. The match is extracted and stored in the variable domain.
  - Discarded Regex Variable: org.${{:[a-z]+}}.org.dc.node3
    The match happens on the text after "org." that complies with the regex, e.g. org.something.org.dc.node3. The match is extracted out, but NOT stored in any variable.
  - Discarded Exact Match: org.${{:apache}}.org.dc.node3
    The match happens only if the string has exactly "apache" in the specified position, e.g. org.apache.org.dc.node3. The match is extracted out, but NOT stored in any variable.
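One way to picture the extraction types above is as java.util.regex fragments. Exact matches become quoted literals, regex matches are used as-is, and a last variable becomes .* (everything that follows). Named variables become named groups, and discarded ones become non-capturing groups. This sketch rests on assumptions and is not the library's implementation. VariableToRegex is a hypothetical name. The caller passes an isRegex flag because the description above does not say how exact strings and regexes are told apart.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical mapping from the variable types above to java.util.regex fragments.
// ASSUMPTION: the caller states whether matchString is a regex.
class VariableToRegex {
    /**
     * @param name    variable name; empty means "discarded" (Java group names must be alphanumeric)
     * @param match   matchString; null means "last variable" (everything that follows)
     * @param isRegex true to use match as a regex, false to match it literally
     */
    static String fragment(String name, String match, boolean isRegex) {
        String body;
        if (match == null) {
            body = ".*";                  // last variable: rest of the input
        } else if (isRegex) {
            body = match;                 // regex match variable: used as-is
        } else {
            body = Pattern.quote(match);  // exact match variable: literal text
        }
        // Named variables become named groups; discarded ones become non-capturing groups.
        return name.isEmpty() ? "(?:" + body + ")" : "(?<" + name + ">" + body + ")";
    }

    /** Builds prefix + variable + suffix; literal parts are quoted so "." is not a wildcard. */
    static Pattern compile(String prefix, String fragment, String suffix) {
        return Pattern.compile(Pattern.quote(prefix) + fragment + Pattern.quote(suffix));
    }

    /** Returns the named group's value, or null when the input does not match. */
    static String extract(Pattern p, String input, String group) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        Pattern exact = compile("org.", fragment("domain", "apache", false), ".org.dc.node3");
        Pattern regex = compile("org.", fragment("domain", "[a-z]+", true), ".org.dc.node3");
        Pattern last = compile("org.", fragment("domain", null, false), "");
        System.out.println(extract(exact, "org.apache.org.dc.node3", "domain"));    // apache
        System.out.println(extract(regex, "org.something.org.dc.node3", "domain")); // something
        System.out.println(extract(last, "org.something.org.dc.node3", "domain"));  // something.org.dc.node3
    }
}
```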
The builder also supports the following options:
- Skipping regex matched variables
  In scenarios where you want to skip variable extractions for a regex, you can supply a set of blacklisted variables. With this extractor:
    Extractor extractor = ExtractorBuilder.newBuilder()
            .blueprint(blueprint)
            .withSkippedVariable("skipped")
            .build();
  and the blueprint
    This is ${{skipped:[A-Za-z]+}}. Guns in my ${{place:}}
  the first regex will be skipped.
- Adding a string from a context
  In scenarios where you want to add runtime variables from a context map to the final extracted string, you can use the following:
    Extractor extractor = ExtractorBuilder.newBuilder()
            .blueprint(blueprint)
            .withContextMappedVariable("context")
            .build();
    // then pass in a context Map<String, String> during extraction
    extractor.extractFrom(source, ImmutableMap.of("location", "bangalore"));
  With the blueprint
    This is ${{context:location}}.
  the value of location is pulled from the map and added to the extracted string.
- Adding a static string
  If you don't want to pull the value from a map but want to pass along a static string, you can do that too:
    Extractor extractor = ExtractorBuilder.newBuilder()
            .blueprint(blueprint)
            .withStaticAttachVariable("attach")
            .build();
    extractor.extractFrom(source);
  With the blueprint
    This is ${{attach:bangalore}}.
  the value bangalore is added to the extracted string.
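The substitution step behind the context-mapped and static-attach variables can be sketched as a simple rewrite of ${{name:arg}} definitions. AttachSketch and resolve are hypothetical names. Only the variable names context and attach come from the builder calls above. Falling back to an empty string for a missing context key is an assumption of this sketch. The sketch also covers only the substitution, not the surrounding extraction.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: resolves context-mapped and static-attach variables in a string.
// ASSUMPTION: a missing context key resolves to "".
class AttachSketch {
    private static final Pattern DEFINITION = Pattern.compile("\\$\\{\\{([^:}]*):(.*?)\\}\\}");

    static String resolve(String text, String contextVar, String staticVar, Map<String, String> context) {
        Matcher m = DEFINITION.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String name = m.group(1);
            String arg = m.group(2);
            String replacement;
            if (name.equals(contextVar)) {
                replacement = context.getOrDefault(arg, ""); // value looked up at extraction time
            } else if (name.equals(staticVar)) {
                replacement = arg;                          // value fixed in the blueprint itself
            } else {
                replacement = m.group();                    // leave other definitions untouched
            }
            // quoteReplacement stops '$' and '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> ctx = Collections.singletonMap("location", "bangalore");
        System.out.println(resolve("This is ${{context:location}}.", "context", "attach", ctx)); // This is bangalore.
        System.out.println(resolve("This is ${{attach:bangalore}}.", "context", "attach", ctx)); // This is bangalore.
    }
}
```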
Performance:
- There is a cost associated with regex matching: the more regex variables are matched and extracted, the slower extraction will be.
- If you use exact match variables, last variables, etc., the cost is far lower. You are better off using these than a string.split(delimiter) approach.
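To illustrate why literal checks are cheaper, the sketch below reads the text after "org." (the Last Variable case) in three ways. It does not describe the library's internals and includes no timings. The literal version does one prefix comparison and one substring. The regex version runs a matcher. The split version cuts at every dot and then has to re-join the tail. LiteralVsRegex and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only, no timings: three ways to read the text after the prefix "org.".
// All return the same value for ordinary input, but do different amounts of work.
class LiteralVsRegex {
    private static final Pattern AFTER_ORG = Pattern.compile("org\\.(?<domain>.*)");

    // Literal check: one prefix comparison plus one substring, no regex engine involved.
    static String viaLiteral(String s) {
        return s.startsWith("org.") ? s.substring("org.".length()) : null;
    }

    // Regex: runs the matcher over the whole input before the group can be read.
    static String viaRegex(String s) {
        Matcher m = AFTER_ORG.matcher(s);
        return m.matches() ? m.group("domain") : null;
    }

    // split(): cuts at every dot, so the tail has to be re-joined afterwards.
    // split() also drops trailing empty strings, e.g. for "org.a." the last dot is lost.
    static String viaSplit(String s) {
        String[] parts = s.split("\\.");
        if (parts.length < 2 || !parts[0].equals("org")) {
            return null;
        }
        return String.join(".", Arrays.copyOfRange(parts, 1, parts.length));
    }

    public static void main(String[] args) {
        String metric = "org.something.org.dc.node3";
        System.out.println(viaLiteral(metric)); // something.org.dc.node3
        System.out.println(viaRegex(metric));   // something.org.dc.node3
        System.out.println(viaSplit(metric));   // something.org.dc.node3
    }
}
```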
Add the following Maven dependency to your project:
<dependency>
<groupId>io.github.tushar-naik</groupId>
<artifactId>string-extractor</artifactId>
<version>${extractor.version}</version> <!-- replace with the latest released version -->
</dependency>
The following shows a simple use case where you want to extract from a single blueprint:
final String blueprint = "My name is ${{name:[A-Za-z]+}}";
final StringExtractor stringExtractor = new StringExtractor(blueprint);
// do the above only once in your code: this essentially compiles the blueprint and the regexes involved
final String source = "My name is Tushar";
final ExtractionResult extractionResult = stringExtractor.extractFrom(source);
// you can run the above on several source strings
The following shows a more complicated use case where you want to extract from several blueprints. Note that the first blueprint that matches determines the extraction result.
final Extractor extractor = ExtractorBuilder.newBuilder()
        .blueprints(ImmutableList.of(
                "io.github.${{name:[a-z]+\\.[a-z]+}}.stringextractor",
                "org.apache.kafka.common.metrics.consumer-node-metrics.consumer-1.${{node:node-[0-9]+}}.outgoing-byte-rate",
                "org.perf.service.reminders.${{component:[A-Za-z]+}}.consumed.m5_rate",
                "kafkawriter.org.apache.kafka.common.metrics.producer-topic-metrics.kafka-sink_${{host:(stg|prd)-[a-z0-9]+.org.[a-z0-9]+}}.offerengine_source.record-send-total",
                "${{service:[^.]+}}.memory.pools.Metaspace.init"))
        .build();
// do the above only once in your code: this essentially compiles the blueprints and the regexes involved
final ExtractionResult extractionResult1 = extractor.extractFrom("io.github.tushar.naik.stringextractor");
final ExtractionResult extractionResult2 = extractor.extractFrom("org.perf.service.reminders.rabbitmq.consumed.m5_rate");
// you can run the above on several source strings
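The first-match rule can be sketched with plain java.util.regex. Patterns are compiled once and tried in registration order, and the first one that matches supplies the result. FirstMatch is a hypothetical class, not the library's Extractor. Its two regexes are hand translations of two of the blueprints above. Variable names are passed explicitly so the sketch does not rely on newer JDK APIs for listing a pattern's named groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "the first blueprint that matches wins"; not the library's Extractor.
class FirstMatch {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<List<String>> groupNames = new ArrayList<>();

    /** Registers a regex; registration order is evaluation order. */
    FirstMatch add(String regex, String... names) {
        patterns.add(Pattern.compile(regex)); // compiled once, reused for every source
        groupNames.add(Arrays.asList(names));
        return this;
    }

    /** Returns the variables of the first matching pattern, or null if none match. */
    Map<String, String> extractFrom(String source) {
        for (int i = 0; i < patterns.size(); i++) {
            Matcher m = patterns.get(i).matcher(source);
            if (m.matches()) {
                Map<String, String> out = new LinkedHashMap<>();
                for (String name : groupNames.get(i)) {
                    out.put(name, m.group(name));
                }
                return out; // later patterns are never tried
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FirstMatch extractor = new FirstMatch()
                .add("io\\.github\\.(?<name>[a-z]+\\.[a-z]+)\\.stringextractor", "name")
                .add("org\\.perf\\.service\\.reminders\\.(?<component>[A-Za-z]+)\\.consumed\\.m5_rate", "component");
        System.out.println(extractor.extractFrom("io.github.tushar.naik.stringextractor"));              // {name=tushar.naik}
        System.out.println(extractor.extractFrom("org.perf.service.reminders.rabbitmq.consumed.m5_rate")); // {component=rabbitmq}
    }
}
```

Because evaluation stops at the first match, it makes sense to register more specific blueprints before broader ones.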
Apache License Version 2.0