Merge branch 'main' into esql-div-mod-testing
not-napoleon committed Apr 11, 2024
2 parents fa8f5d6 + b11bb27 commit 073f21e
Showing 278 changed files with 212 additions and 111 deletions.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -68,4 +68,4 @@ testfixtures_shared/

# Generated
checkstyle_ide.xml
x-pack/plugin/esql/gen/
x-pack/plugin/esql/src/main/generated-src/generated/
6 changes: 6 additions & 0 deletions docs/changelog/107287.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 107287
summary: Add support for the 'Anonymous IP' database to the geoip processor
area: Ingest Node
type: enhancement
issues:
- 90789
33 changes: 18 additions & 15 deletions docs/reference/ingest/processors/geoip.asciidoc
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ IPv4 or IPv6 address.

[[geoip-automatic-updates]]
By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
ASN GeoIP2 databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
ASN IP geolocation databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to the `storage.googleapis.com` domain and either:

* `ingest.geoip.downloader.eager.download` is set to true
@@ -38,7 +38,7 @@ field instead.
| Name | Required | Default | Description
| `field` | yes | - | The field to get the IP address from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the `ingest-geoip` config directory.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to one of the automatically downloaded GeoLite2 databases (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or the name of a supported database file in the `ingest-geoip` config directory.
| `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document.
| `first_only` | no | `true` | If `true`, only the first geoip data found is returned, even if `field` contains an array.
@@ -47,15 +47,18 @@ field instead.

*Depends on what is available in `database_file`:

* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
* If a GeoLite2 City or GeoIP2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
* If a GeoLite2 Country or GeoIP2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
were configured in `properties`.
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
in `properties`.
* If the GeoIP2 Anonymous IP database is used, then the following fields may be added under the `target_field`: `ip`,
`hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`. The fields actually added
depend on what has been found and which properties were configured in `properties`.


Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
@@ -109,7 +112,7 @@ Which returns:

Here is an example that uses the default country database and adds the
geographical information to the `geo` field based on the `ip` field. Note that
this database is included in the module. So this:
this database is downloaded automatically. So this:

[source,console]
--------------------------------------------------
@@ -316,14 +319,14 @@ GET /my_ip_locations/_search
////

[[manage-geoip-database-updates]]
==== Manage your own GeoIP2 database updates
==== Manage your own IP geolocation database updates

If you can't <<geoip-automatic-updates,automatically update>> your GeoIP2
databases from the Elastic endpoint, you have a few other options:
If you can't <<geoip-automatic-updates,automatically update>> your IP geolocation databases
from the Elastic endpoint, you have a few other options:

* <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
* <<use-custom-geoip-endpoint,Use a custom endpoint>>
* <<manually-update-geoip-databases,Manually update your GeoIP2 databases>>
* <<manually-update-geoip-databases,Manually update your IP geolocation databases>>

[[use-proxy-geoip-endpoint]]
**Use a proxy endpoint**
@@ -375,7 +378,7 @@ settings API>> to set
<<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.

[[manually-update-geoip-databases]]
**Manually update your GeoIP2 databases**
**Manually update your IP geolocation databases**

. Use the <<cluster-update-settings,cluster update settings API>> to set
`ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
@@ -414,22 +417,22 @@ Note that these settings are node settings and apply to all `geoip` processors,
[[ingest-geoip-downloader-enabled]]
`ingest.geoip.downloader.enabled`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} automatically downloads and manages updates for GeoIP2 databases
If `true`, {es} automatically downloads and manages updates for IP geolocation databases
from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
updates and deletes all downloaded databases. Defaults to `true`.

[[ingest-geoip-downloader-eager-download]]
`ingest.geoip.downloader.eager.download`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} downloads GeoIP2 databases immediately, regardless of whether a
If `true`, {es} downloads IP geolocation databases immediately, regardless of whether a
pipeline exists with a geoip processor. If `false`, {es} only begins downloading
the databases if a pipeline with a geoip processor exists or is added. Defaults
to `false`.

[[ingest-geoip-downloader-endpoint]]
`ingest.geoip.downloader.endpoint`::
(<<static-cluster-setting,Static>>, string)
Endpoint URL used to download updates for GeoIP2 databases. For example, `https://myDomain.com/overview.json`.
Endpoint URL used to download updates for IP geolocation databases. For example, `https://myDomain.com/overview.json`.
Defaults to `https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
each node's <<es-tmpdir,temporary directory>> at `$ES_TMPDIR/geoip-databases/<node_id>`.
Note that {es} will make a GET request to `${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree`,
@@ -440,6 +443,6 @@ The GeoIP downloader uses the JDK's builtin cacerts. If you're using a custom en
[[ingest-geoip-downloader-poll-interval]]
`ingest.geoip.downloader.poll.interval`::
(<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
How often {es} checks for GeoIP2 database updates at the
How often {es} checks for IP geolocation database updates at the
`ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
to `3d` (three days).
Original file line number Diff line number Diff line change
@@ -39,9 +39,9 @@ enum Database {
Property.LOCATION
),
Set.of(
Property.CONTINENT_NAME,
Property.COUNTRY_NAME,
Property.COUNTRY_ISO_CODE,
Property.COUNTRY_NAME,
Property.CONTINENT_NAME,
Property.REGION_ISO_CODE,
Property.REGION_NAME,
Property.CITY_NAME,
@@ -55,11 +55,31 @@ enum Database {
Asn(
Set.of(Property.IP, Property.ASN, Property.ORGANIZATION_NAME, Property.NETWORK),
Set.of(Property.IP, Property.ASN, Property.ORGANIZATION_NAME, Property.NETWORK)
),
AnonymousIp(
Set.of(
Property.IP,
Property.HOSTING_PROVIDER,
Property.TOR_EXIT_NODE,
Property.ANONYMOUS_VPN,
Property.ANONYMOUS,
Property.PUBLIC_PROXY,
Property.RESIDENTIAL_PROXY
),
Set.of(
Property.HOSTING_PROVIDER,
Property.TOR_EXIT_NODE,
Property.ANONYMOUS_VPN,
Property.ANONYMOUS,
Property.PUBLIC_PROXY,
Property.RESIDENTIAL_PROXY
)
);

private static final String CITY_DB_SUFFIX = "-City";
private static final String COUNTRY_DB_SUFFIX = "-Country";
private static final String ASN_DB_SUFFIX = "-ASN";
private static final String ANONYMOUS_IP_DB_SUFFIX = "-Anonymous-IP";

/**
* Parses the passed-in databaseType (presumably from the passed-in databaseFile) and returns the Database instance that is
@@ -79,6 +99,8 @@ public static Database getDatabase(final String databaseType, final String datab
database = Database.Country;
} else if (databaseType.endsWith(Database.ASN_DB_SUFFIX)) {
database = Database.Asn;
} else if (databaseType.endsWith(Database.ANONYMOUS_IP_DB_SUFFIX)) {
database = Database.AnonymousIp;
}
}

@@ -147,7 +169,13 @@ enum Property {
LOCATION,
ASN,
ORGANIZATION_NAME,
NETWORK;
NETWORK,
HOSTING_PROVIDER,
TOR_EXIT_NODE,
ANONYMOUS_VPN,
ANONYMOUS,
PUBLIC_PROXY,
RESIDENTIAL_PROXY;

/**
* Parses a string representation of a property into an actual Property instance. Not all properties that exist are
Expand Down
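
The suffix matching that `getDatabase` performs in the hunk above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in this commit: `DatabaseKindSketch` and `fromDatabaseType` are hypothetical names, and returning an empty `Optional` for an unrecognized type is an assumption, since the hunk does not show how the real method handles that case.

```java
import java.util.Optional;

// Hypothetical stand-in for the Database enum in this diff; names are illustrative only.
enum DatabaseKindSketch {
    CITY("-City"),
    COUNTRY("-Country"),
    ASN("-ASN"),
    ANONYMOUS_IP("-Anonymous-IP");

    private final String suffix;

    DatabaseKindSketch(String suffix) {
        this.suffix = suffix;
    }

    // Returns the kind whose suffix ends the given database type.
    // Assumption: an unknown or null type yields Optional.empty().
    static Optional<DatabaseKindSketch> fromDatabaseType(String databaseType) {
        if (databaseType == null) {
            return Optional.empty();
        }
        for (DatabaseKindSketch kind : values()) {
            if (databaseType.endsWith(kind.suffix)) {
                return Optional.of(kind);
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, a type string such as `GeoIP2-Anonymous-IP` resolves to `ANONYMOUS_IP` because it ends with the `-Anonymous-IP` suffix, which mirrors the new `else if` branch added in the hunk.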
Original file line number Diff line number Diff line change
@@ -12,6 +12,7 @@
import com.maxmind.db.Reader;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.AbstractResponse;
import com.maxmind.geoip2.model.AnonymousIpResponse;
import com.maxmind.geoip2.model.AsnResponse;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.model.CountryResponse;
@@ -169,6 +170,12 @@ public AsnResponse getAsn(InetAddress ipAddress) {
return getResponse(ipAddress, DatabaseReader::tryAsn);
}

@Nullable
@Override
public AnonymousIpResponse getAnonymousIp(InetAddress ipAddress) {
return getResponse(ipAddress, DatabaseReader::tryAnonymousIp);
}

boolean preLookup() {
return currentUsages.updateAndGet(current -> current < 0 ? current : current + 1) > 0;
}
Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@

package org.elasticsearch.ingest.geoip;

import com.maxmind.geoip2.model.AnonymousIpResponse;
import com.maxmind.geoip2.model.AsnResponse;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.model.CountryResponse;
@@ -53,6 +54,9 @@ public interface GeoIpDatabase {
@Nullable
AsnResponse getAsn(InetAddress ipAddress);

@Nullable
AnonymousIpResponse getAnonymousIp(InetAddress ipAddress);

/**
* Releases the current database object. Called after processing a single document. Databases should be closed or returned to a
* resource pool. No further interactions should be expected.
Original file line number Diff line number Diff line change
@@ -9,6 +9,7 @@
package org.elasticsearch.ingest.geoip;

import com.maxmind.db.Network;
import com.maxmind.geoip2.model.AnonymousIpResponse;
import com.maxmind.geoip2.model.AsnResponse;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.model.CountryResponse;
@@ -172,6 +173,7 @@ private Map<String, Object> getGeoData(GeoIpDatabase geoIpDatabase, String ip) t
case City -> retrieveCityGeoData(geoIpDatabase, ipAddress);
case Country -> retrieveCountryGeoData(geoIpDatabase, ipAddress);
case Asn -> retrieveAsnGeoData(geoIpDatabase, ipAddress);
case AnonymousIp -> retrieveAnonymousIpGeoData(geoIpDatabase, ipAddress);
};
}

@@ -340,6 +342,46 @@ private Map<String, Object> retrieveAsnGeoData(GeoIpDatabase geoIpDatabase, Inet
return geoData;
}

private Map<String, Object> retrieveAnonymousIpGeoData(GeoIpDatabase geoIpDatabase, InetAddress ipAddress) {
AnonymousIpResponse response = geoIpDatabase.getAnonymousIp(ipAddress);
if (response == null) {
return Map.of();
}

boolean isHostingProvider = response.isHostingProvider();
boolean isTorExitNode = response.isTorExitNode();
boolean isAnonymousVpn = response.isAnonymousVpn();
boolean isAnonymous = response.isAnonymous();
boolean isPublicProxy = response.isPublicProxy();
boolean isResidentialProxy = response.isResidentialProxy();

Map<String, Object> geoData = new HashMap<>();
for (Property property : this.properties) {
switch (property) {
case IP -> geoData.put("ip", NetworkAddress.format(ipAddress));
case HOSTING_PROVIDER -> geoData.put("hosting_provider", isHostingProvider);
case TOR_EXIT_NODE -> geoData.put("tor_exit_node", isTorExitNode);
case ANONYMOUS_VPN -> geoData.put("anonymous_vpn", isAnonymousVpn);
case ANONYMOUS -> geoData.put("anonymous", isAnonymous);
case PUBLIC_PROXY -> geoData.put("public_proxy", isPublicProxy);
case RESIDENTIAL_PROXY -> geoData.put("residential_proxy", isResidentialProxy);
}
}
return geoData;
}

/**
* Retrieves and verifies a {@link GeoIpDatabase} instance for each execution of the {@link GeoIpProcessor}. Guards against missing
* custom databases, and ensures that database instances are of the proper type before use.
Expand Down
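
The property-driven map building in `retrieveAnonymousIpGeoData` above can be tried outside Elasticsearch with a small stand-in. The sketch below is an illustration under assumptions, not the committed code: `AnonymousIpSketch`, `Flags`, and `toGeoData` are hypothetical names, `Flags` replaces MaxMind's `AnonymousIpResponse`, and the IP is passed in as an already formatted string instead of going through `NetworkAddress.format`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class AnonymousIpSketch {
    // Hypothetical subset of the Property enum from this diff.
    enum Property { IP, HOSTING_PROVIDER, TOR_EXIT_NODE, ANONYMOUS_VPN, ANONYMOUS, PUBLIC_PROXY, RESIDENTIAL_PROXY }

    // Hypothetical stand-in for the lookup response; it only carries the six boolean flags.
    record Flags(
        boolean hostingProvider,
        boolean torExitNode,
        boolean anonymousVpn,
        boolean anonymous,
        boolean publicProxy,
        boolean residentialProxy
    ) {}

    // Copies only the requested properties into the result map, using the same keys as the diff.
    static Map<String, Object> toGeoData(String ip, Flags flags, Set<Property> properties) {
        Map<String, Object> geoData = new HashMap<>();
        for (Property property : properties) {
            switch (property) {
                case IP -> geoData.put("ip", ip);
                case HOSTING_PROVIDER -> geoData.put("hosting_provider", flags.hostingProvider());
                case TOR_EXIT_NODE -> geoData.put("tor_exit_node", flags.torExitNode());
                case ANONYMOUS_VPN -> geoData.put("anonymous_vpn", flags.anonymousVpn());
                case ANONYMOUS -> geoData.put("anonymous", flags.anonymous());
                case PUBLIC_PROXY -> geoData.put("public_proxy", flags.publicProxy());
                case RESIDENTIAL_PROXY -> geoData.put("residential_proxy", flags.residentialProxy());
            }
        }
        return geoData;
    }
}
```

Because the loop iterates over the configured properties and not over the response, a flag that is `false` still appears in the output when its property is requested, and a property that is not requested never appears.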
Original file line number Diff line number Diff line change
@@ -303,6 +303,39 @@ public void testAsn() throws Exception {
assertThat(geoData.get("network"), equalTo("82.168.0.0/14"));
}

public void testAnonymousIp() throws Exception {
String ip = "81.2.69.1";
GeoIpProcessor processor = new GeoIpProcessor(
randomAlphaOfLength(10),
null,
"source_field",
loader("/GeoIP2-Anonymous-IP-Test.mmdb"),
() -> true,
"target_field",
ALL_PROPERTIES,
false,
false,
"filename"
);

Map<String, Object> document = new HashMap<>();
document.put("source_field", ip);
IngestDocument ingestDocument = RandomDocumentPicks.randomIngestDocument(random(), document);
processor.execute(ingestDocument);

assertThat(ingestDocument.getSourceAndMetadata().get("source_field"), equalTo(ip));
@SuppressWarnings("unchecked")
Map<String, Object> geoData = (Map<String, Object>) ingestDocument.getSourceAndMetadata().get("target_field");
assertThat(geoData.size(), equalTo(7));
assertThat(geoData.get("ip"), equalTo(ip));
assertThat(geoData.get("hosting_provider"), equalTo(true));
assertThat(geoData.get("tor_exit_node"), equalTo(true));
assertThat(geoData.get("anonymous_vpn"), equalTo(true));
assertThat(geoData.get("anonymous"), equalTo(true));
assertThat(geoData.get("public_proxy"), equalTo(true));
assertThat(geoData.get("residential_proxy"), equalTo(true));
}

public void testAddressIsNotInTheDatabase() throws Exception {
GeoIpProcessor processor = new GeoIpProcessor(
randomAlphaOfLength(10),
Original file line number Diff line number Diff line change
@@ -60,6 +60,16 @@
*/
public class MaxMindSupportTests extends ESTestCase {

private static final Set<String> ANONYMOUS_IP_SUPPORTED_FIELDS = Set.of(
"anonymous",
"anonymousVpn",
"hostingProvider",
"publicProxy",
"residentialProxy",
"torExitNode"
);
private static final Set<String> ANONYMOUS_IP_UNSUPPORTED_FIELDS = Set.of("ipAddress", "network");

private static final Set<String> ASN_SUPPORTED_FIELDS = Set.of("autonomousSystemNumber", "autonomousSystemOrganization", "network");
private static final Set<String> ASN_UNSUPPORTED_FIELDS = Set.of("ipAddress");

@@ -192,6 +202,8 @@ public class MaxMindSupportTests extends ESTestCase {
);

private static final Map<Database, Set<String>> TYPE_TO_SUPPORTED_FIELDS_MAP = Map.of(
Database.AnonymousIp,
ANONYMOUS_IP_SUPPORTED_FIELDS,
Database.Asn,
ASN_SUPPORTED_FIELDS,
Database.City,
@@ -200,6 +212,8 @@ public class MaxMindSupportTests extends ESTestCase {
COUNTRY_SUPPORTED_FIELDS
);
private static final Map<Database, Set<String>> TYPE_TO_UNSUPPORTED_FIELDS_MAP = Map.of(
Database.AnonymousIp,
ANONYMOUS_IP_UNSUPPORTED_FIELDS,
Database.Asn,
ASN_UNSUPPORTED_FIELDS,
Database.City,
@@ -208,6 +222,8 @@ public class MaxMindSupportTests extends ESTestCase {
COUNTRY_UNSUPPORTED_FIELDS
);
private static final Map<Database, Class<? extends AbstractResponse>> TYPE_TO_MAX_MIND_CLASS = Map.of(
Database.AnonymousIp,
AnonymousIpResponse.class,
Database.Asn,
AsnResponse.class,
Database.City,
@@ -217,7 +233,6 @@ public class MaxMindSupportTests extends ESTestCase {
);

private static final Set<Class<? extends AbstractResponse>> KNOWN_UNSUPPORTED_RESPONSE_CLASSES = Set.of(
AnonymousIpResponse.class,
ConnectionTypeResponse.class,
DomainResponse.class,
EnterpriseResponse.class,
Binary file not shown.