A simple parser and lookup library for MaxMind DB files
MMDB is licensed under GNU's GPLv2 or, at your option, any later version.
MMDB is Copyright © 2019 Francisco Blas Izquierdo Riera (klondike)
MMDB started as a weekend project to fill the licensing gap left by the official MaxMind DB libraries. These libraries are licensed under the Apache 2.0 license and therefore can't be used with GPLv2 software.
MMDB is designed to be reasonably small, at only around 800 lines of code (yeah, the format is a bit complex). It tries to keep its memory footprint (virtual and physical) low, so by default it does not mmap the DB files or do anything similar; instead it uses the "portable" C file API and jumps to the right place in the file with fseek calls. Sadly, 64-bit support for fseek isn't portable; mmdb tries to solve that by providing a small compile-time shim that emulates fseeko on Windows platforms, although it hasn't been well tested.
MMDB was implemented exclusively by following the specification at https://maxmind.github.io/MaxMind-DB/ and by testing against MaxMind's GeoLite databases. This approach was chosen to ensure that the GPL-licensed code is free of Apache 2.0 code.
You can use the library by adding mmdb.c to your sources and including mmdb.h in your own code.
If you prefer the (faster) mmap-based backend, at the expense of the database occupying part of the memory space until it is closed, define MMDB_USE_MMAP when compiling mmdb.c (the backend is hidden from the rest of the code).
MMDB provides a simple API. It currently has no dependencies and can be easily integrated into your own GPL project.
mmdb_type_t
All returned database objects are of type mmdb_type_t. This object contains an mmdb_type_enum called type, holding the specific type of the object, and an mmdb_type_union called data, holding the specific data structure.
All objects are presented with an alias to an equivalent C99 type (when possible) and in the platform's byte order as determined at compile time.
The most common types are described below, but you can always check the mmdb.h file for extra details. In general, treat any type not described here as a hint that the database may be inconsistent or that there may be a bug in this software.
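The type/data pairing is a tagged union: read type first, then read only the member of data that matches it. The following is a minimal sketch of that pattern. The demo_* types are hypothetical stand-ins that reuse the member names documented here (type, data, u_bool, u_uint32, u_string, length); the real definitions live in mmdb.h and may differ, so with the real library you would switch on MMDB_BOOL, MMDB_UINT32 and so on.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: member names follow this README, but the real
 * definitions are in mmdb.h and may differ in detail. */
typedef enum { DEMO_BOOL, DEMO_UINT32, DEMO_STRING } demo_type_enum;

typedef struct {
    uint32_t length;   /* bytes, excluding the trailing '\0' */
    const char *data;  /* '\0'-terminated */
} demo_string_t;

typedef union {
    bool u_bool;
    uint32_t u_uint32;
    demo_string_t u_string;
} demo_type_union;

typedef struct {
    demo_type_enum type;   /* says which union member is valid */
    demo_type_union data;  /* the value itself */
} demo_type_t;

/* Format an object into buf by switching on its type tag; only the union
 * member matching the tag is read. Returns the snprintf result, or -1. */
int demo_format(const demo_type_t *obj, char *buf, size_t size)
{
    switch (obj->type) {
    case DEMO_BOOL:
        return snprintf(buf, size, "%s", obj->data.u_bool ? "true" : "false");
    case DEMO_UINT32:
        return snprintf(buf, size, "%lu", (unsigned long)obj->data.u_uint32);
    case DEMO_STRING:
        /* bound the output by the stored length */
        return snprintf(buf, size, "\"%.*s\"", (int)obj->data.u_string.length,
                        obj->data.u_string.data);
    default:
        /* unknown tag: likely an inconsistent database or a bug */
        return -1;
    }
}
```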
mmdb_bool_t (MMDB_BOOL, u_bool)
Booleans can be either true or false.
Their type is mmdb_bool_t, their mmdb_type_enum value is MMDB_BOOL and their mmdb_type_union member is called u_bool.
mmdb_double_t (MMDB_DOUBLE, u_double)
Doubles represent an IEEE 754 double-precision floating point number.
Their type is mmdb_double_t, their mmdb_type_enum value is MMDB_DOUBLE and their mmdb_type_union member is called u_double.
mmdb_float_t (MMDB_FLOAT, u_float)
Floats represent an IEEE 754 single-precision floating point number.
Their type is mmdb_float_t, their mmdb_type_enum value is MMDB_FLOAT and their mmdb_type_union member is called u_float.
mmdb_uint16_t (MMDB_UINT16, u_uint16)
Uint16s represent 16-bit unsigned integers ranging from 0 to 2^16-1.
Their type is mmdb_uint16_t, their mmdb_type_enum value is MMDB_UINT16 and their mmdb_type_union member is called u_uint16.
mmdb_uint32_t (MMDB_UINT32, u_uint32)
Uint32s represent 32-bit unsigned integers ranging from 0 to 2^32-1.
Their type is mmdb_uint32_t, their mmdb_type_enum value is MMDB_UINT32 and their mmdb_type_union member is called u_uint32.
mmdb_int32_t (MMDB_INT32, u_int32)
Int32s represent 32-bit two's complement signed integers ranging from -2^31 to 2^31-1.
Their type is mmdb_int32_t, their mmdb_type_enum value is MMDB_INT32 and their mmdb_type_union member is called u_int32.
mmdb_uint64_t (MMDB_UINT64, u_uint64)
Uint64s represent 64-bit unsigned integers ranging from 0 to 2^64-1.
Their type is mmdb_uint64_t, their mmdb_type_enum value is MMDB_UINT64 and their mmdb_type_union member is called u_uint64.
As portable C99 code cannot guarantee that such a long type is available, the data is represented as an array of 8 uint8_t elements that can be reached through the data member of the resulting structure. This may change in the future.
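On platforms that do provide uint64_t, the 8 bytes can be folded into a native integer. This is a minimal sketch, not part of mmdb. Which byte order the data array uses is not stated here (the MaxMind DB file stores integers big-endian, while objects are otherwise described as being in the platform's byte order), so the hypothetical helper takes the order as a parameter; check mmdb.c before choosing it. With the real library you would pass something like obj->data.u_uint64.data.

```c
#include <stdint.h>

/* Hypothetical helper: combine the 8 bytes of an MMDB uint64 into a native
 * uint64_t. msb_first != 0 means data[0] is the most significant byte;
 * otherwise data[7] is. Verify the real layout in mmdb.c first. */
uint64_t demo_uint64_from_bytes(const uint8_t data[8], int msb_first)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        /* visit bytes from most to least significant */
        uint8_t byte = msb_first ? data[i] : data[7 - i];
        value = (value << 8) | byte;
    }
    return value;
}
```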
mmdb_uint128_t (MMDB_UINT128, u_uint128)
Uint128s represent 128-bit unsigned integers ranging from 0 to 2^128-1.
Their type is mmdb_uint128_t, their mmdb_type_enum value is MMDB_UINT128 and their mmdb_type_union member is called u_uint128.
As portable C99 code cannot guarantee that such a long type is available, the data is represented as an array of 16 uint8_t elements that can be reached through the data member of the resulting structure. This may change in the future.
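Since standard C has no portable 128-bit integer, a simple way to display such a value is as 32 hexadecimal digits. This is a minimal sketch with a hypothetical helper; as with the 64-bit case, the byte order of the data array is taken as a parameter because it is an assumption to verify against mmdb.c.

```c
#include <stdint.h>

/* Hypothetical helper: write the 16 bytes of an MMDB uint128 as a 32-digit
 * hex string, most significant digit first. msb_first != 0 means data[0]
 * is the most significant byte. out needs room for 33 chars. */
void demo_uint128_to_hex(const uint8_t data[16], int msb_first, char out[33])
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 0; i < 16; i++) {
        uint8_t byte = msb_first ? data[i] : data[15 - i];
        out[2 * i]     = digits[byte >> 4];   /* high nibble */
        out[2 * i + 1] = digits[byte & 0x0f]; /* low nibble */
    }
    out[32] = '\0';
}
```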
mmdb_string_t (MMDB_STRING, u_string)
Strings represent a sequence of UTF-8 characters, stored as the individual bytes of their encoding.
Their type is mmdb_string_t, their mmdb_type_enum value is MMDB_STRING and their mmdb_type_union member is called u_string.
Their length in bytes (excluding the final '\0') is given by the length member of the structure, and the '\0'-terminated array holding the string itself is the data member.
Keep in mind that although the final '\0' is provided for convenience, the MaxMind DB format specification doesn't guarantee that strings will not contain NUL characters in the middle (encoded as a '\0' byte in UTF-8). Because of this, using the length when handling strings is recommended unless you are certain the data doesn't contain NUL characters.
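A minimal sketch of length-aware handling, using hypothetical helpers that take the length and data members separately. memchr and fwrite work on the full stored length, whereas strlen (and printf's %s) stop at the first '\0'.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the first len bytes contain a '\0', i.e. if strlen(data)
 * would stop early and disagree with the stored length. */
int demo_has_embedded_nul(const char *data, size_t len)
{
    return memchr(data, '\0', len) != NULL;
}

/* Write exactly len bytes to out, embedded '\0' bytes included.
 * Returns 1 on success, 0 on a short write. */
int demo_write_string(FILE *out, const char *data, size_t len)
{
    return fwrite(data, 1, len, out) == len;
}
```

With a string object s, you would call these as demo_write_string(stdout, s->data, s->length), assuming data is a char array.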
mmdb_bytes_t (MMDB_BYTES, u_bytes)
Bytes represent arbitrary binary data.
Their type is mmdb_bytes_t, their mmdb_type_enum value is MMDB_BYTES and their mmdb_type_union member is called u_bytes.
Their length in bytes is given by the length member of the structure, and the data itself is held as an array of uint8_t elements in the data member.
mmdb_array_t (MMDB_ARRAY, u_array)
Arrays represent a zero-indexed ordered collection of objects.
Their type is mmdb_array_t, their mmdb_type_enum value is MMDB_ARRAY and their mmdb_type_union member is called u_array.
The number of objects they contain is given by the length member of the structure, and the objects themselves are held as an array of mmdb_type_t elements in the entries member.
This data structure may change in the future.
mmdb_map_t (MMDB_MAP, u_map)
Maps represent a mapping of strings to objects.
Their type is mmdb_map_t, their mmdb_type_enum value is MMDB_MAP and their mmdb_type_union member is called u_map.
The number of entries they contain is given by the length member of the structure. The keys are held as an array of mmdb_string_t elements in the keys member, and the corresponding values as an array of mmdb_type_t elements in the values member.
This data structure may change in the future.
mmdb_type_t * mmdb_array_get(const mmdb_type_t * array, mmdb_length_t pos)
Array elements can be accessed using mmdb_array_get. The first parameter is the array itself and the second the position of the desired object.
This function returns a pointer to the object in the array if the object is found, or a NULL pointer otherwise or if a problem occurs. Since the returned object is a reference to the one in the array, it shouldn't be freed with mmdb_type_free; it will be freed when the array containing it is freed.
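The contract above (borrowed pointer on success, NULL when out of range) can be illustrated with a minimal sketch. The demo_* types and demo_array_get are hypothetical stand-ins following the length/entries layout described for mmdb_array_t; this is not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the documented layout; see mmdb.h for the
 * real types. */
typedef struct demo_obj { int value; } demo_obj;

typedef struct {
    uint32_t length;    /* number of entries */
    demo_obj *entries;  /* zero-indexed */
} demo_array_t;

/* Return a borrowed pointer to entry pos, or NULL if pos is out of range.
 * The caller must not free it: the memory belongs to the array. */
demo_obj *demo_array_get(const demo_array_t *array, uint32_t pos)
{
    if (array == NULL || pos >= array->length)
        return NULL;
    return &array->entries[pos];
}
```

Iterating is then a matter of looping from 0 to length - 1 and skipping NULL results.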
mmdb_type_t * mmdb_map_get(const mmdb_type_t * map, const char * key, size_t len)
mmdb_type_t * mmdb_map_gets(const mmdb_type_t * map, const char * key)
Map elements can be accessed using either mmdb_map_get or mmdb_map_gets. The first parameter is the map itself and the second one the key to extract from the map. mmdb_map_gets expects the key to be terminated by a '\0' (i.e. to be a C-style string), whilst mmdb_map_get requires the key length as its third parameter.
These functions return a pointer to the object in the map if found, or a NULL pointer otherwise or if a problem occurs. Since the returned object is a reference to the one in the map, it shouldn't be freed with mmdb_type_free; it will be freed when the map containing it is freed.
Internally, mmdb_map_gets uses mmdb_map_get.
The current implementation is rather slow: it performs a linear search, comparing first the length and then the contents of each key until it finds a match. This is done to keep the memory footprint small. Other data structures may be used in the future.
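The linear search described above can be sketched as follows. The demo_* types are hypothetical stand-ins for the keys/values layout of mmdb_map_t (with int values for simplicity); the sketch mirrors the described strategy, not mmdb.c itself. Each lookup costs O(n) comparisons, but the cheap length check rejects most keys before any bytes are compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: parallel keys/values arrays of equal length. */
typedef struct { uint32_t length; const char *data; } demo_key_t;

typedef struct {
    uint32_t length;         /* number of entries */
    const demo_key_t *keys;  /* keys[i] maps to values[i] */
    const int *values;
} demo_map_t;

/* Linear search: compare the length first, then the bytes. */
const int *demo_map_get(const demo_map_t *map, const char *key, size_t len)
{
    for (uint32_t i = 0; i < map->length; i++) {
        if (map->keys[i].length == len &&
            memcmp(map->keys[i].data, key, len) == 0)
            return &map->values[i];
    }
    return NULL;  /* not found */
}

/* '\0'-terminated convenience wrapper, in the spirit of mmdb_map_gets. */
const int *demo_map_gets(const demo_map_t *map, const char *key)
{
    return demo_map_get(map, key, strlen(key));
}
```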
void mmdb_print(const mmdb_type_t * lr)
The mmdb_print function displays objects on standard output. It takes the object to be printed as input and outputs it in a JSON-like format.
mmdb_t * mmdb_open(const char * path)
To open a database use mmdb_open. This function opens the file at path, extracts the metadata needed for the operation of the library and returns a pointer to an mmdb_t object that can be used in subsequent calls. If any problem is encountered, it returns a NULL pointer instead. The resulting object must be freed with a call to mmdb_close once it won't be used anymore.
Please note that the mmdb_t structure is kept opaque on purpose, as it shouldn't be tampered with outside of mmdb.
MMDB_MAX_DEPTH
void mmdb_set_max_depth(mmdb_t * db, uint32_t max_depth)
The MaxMind DB format theoretically allows an array or map to contain a pointer to itself. This would result in an infinite loop that would eventually exhaust the stack or the memory of the system. To prevent this, the library enforces a maximum parsing depth when reading data from the database; any read nested deeper than this fails.
By default this depth is 16, but it can be changed by setting the value of the MMDB_MAX_DEPTH macro at compile time.
Alternatively, you can change the depth at runtime by calling mmdb_set_max_depth for the target opened DB file.
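The effect of a depth limit on a self-referencing structure can be shown with a minimal sketch. demo_node and demo_count are hypothetical; they only illustrate the technique (fail once nesting exceeds the limit instead of recursing forever) and are not how mmdb.c is written. With the real library you only choose the limit, via MMDB_MAX_DEPTH or mmdb_set_max_depth.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container whose children may point back at an ancestor
 * (or at itself), as the MaxMind DB pointer type allows. */
typedef struct demo_node {
    size_t count;                       /* number of children */
    const struct demo_node **children;  /* may form a cycle */
} demo_node;

/* Count the nodes reachable from n, returning -1 once the nesting exceeds
 * max_depth instead of recursing until the stack overflows. */
long demo_count(const demo_node *n, uint32_t depth, uint32_t max_depth)
{
    if (depth > max_depth)
        return -1;              /* too deep: fail the read */
    long total = 1;
    for (size_t i = 0; i < n->count; i++) {
        long sub = demo_count(n->children[i], depth + 1, max_depth);
        if (sub < 0)
            return -1;          /* propagate the failure upwards */
        total += sub;
    }
    return total;
}
```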
mmdb_type_t * mmdb_read_metadata(const mmdb_t * db)
To read the database metadata, call mmdb_read_metadata with a pointer to the target mmdb_t database.
This function returns a pointer to an mmdb_type_t map if successful, or a NULL pointer if a problem occurs. This object must be freed with mmdb_type_free once it won't be used anymore.
mmdb_type_t * mmdb_lookup4(const mmdb_t * db, const uint8_t ip[4])
mmdb_type_t * mmdb_lookup6(const mmdb_t * db, const uint8_t ip[16])
To look up IP addresses, call mmdb_lookup4 or mmdb_lookup6, passing a pointer to the target mmdb_t database as the first argument and a pointer to the binary representation of the IP as the second one. IP addresses must be in network order (that is, big-endian order) and be exactly 4 octets long for mmdb_lookup4 and 16 for mmdb_lookup6.
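On POSIX systems, the standard inet_pton function already produces these network-order byte arrays from textual addresses. The wrappers below are a minimal sketch with hypothetical names (on Windows, inet_pton is declared in ws2tcpip.h instead).

```c
#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Parse dotted IPv4 text into the 4 network-order bytes mmdb_lookup4
 * expects. inet_pton returns 1 on success and 0 for malformed input. */
int demo_parse_ipv4(const char *text, uint8_t ip[4])
{
    return inet_pton(AF_INET, text, ip) == 1;
}

/* Parse IPv6 text into the 16 network-order bytes mmdb_lookup6 expects. */
int demo_parse_ipv6(const char *text, uint8_t ip[16])
{
    return inet_pton(AF_INET6, text, ip) == 1;
}
```

A caller would then do something like: if (demo_parse_ipv4(argv[i], ip4)) result = mmdb_lookup4(db, ip4);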
mmdb_lookup4 is used to look up IPv4 addresses, whilst mmdb_lookup6 is used to look up IPv6 addresses (or any IPv4-mapped IPv6 address). IPv4 lookups on IPv6 databases will succeed, as the function automatically maps the IP as specified in the MaxMind DB specification; on the other hand, IPv6 lookups on IPv4 databases will fail.
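For reference, the MaxMind DB specification stores IPv4 data in an IPv6 tree in the first 2^32 addresses, i.e. at ::a.b.c.d (96 zero bits followed by the IPv4 bits). The hypothetical helper below is a minimal sketch of that layout; it is not taken from mmdb.c, which does this internally in mmdb_lookup4, so you normally won't need it. The RFC 4291 form ::ffff:a.b.c.d is a different address, and whether it resolves through mmdb_lookup6 depends on whether the database creator aliased that range.

```c
#include <stdint.h>
#include <string.h>

/* Place an IPv4 address where the spec stores IPv4 data inside an IPv6
 * tree: 12 zero bytes (96 bits) followed by the 4 IPv4 bytes. */
void demo_ipv4_in_ipv6_tree(const uint8_t ip4[4], uint8_t ip6[16])
{
    memset(ip6, 0, 12);        /* 96 leading zero bits */
    memcpy(ip6 + 12, ip4, 4);  /* IPv4 address in the low 32 bits */
}
```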
These functions return a pointer to an mmdb_type_t object if successful, or a NULL pointer if a problem occurs. This object must be freed with mmdb_type_free once it won't be used anymore.
void mmdb_type_free(mmdb_type_t * data)
The function mmdb_type_free should be used to clean up the results of mmdb_read_metadata, mmdb_lookup4 and mmdb_lookup6. This function recursively frees all the objects contained by this object.
Keep in mind that any pointers to objects contained by the freed object become invalid once you do this, including any objects extracted from maps or arrays. Thus, this function shouldn't be called until you can guarantee that the object and any objects it contains won't be used anymore.
When using this library to extract specific data, make sure you copy the data you are going to return BEFORE calling this function.
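A minimal sketch of such a copy, using a hypothetical helper: it duplicates a (data, length) pair into a fresh '\0'-terminated heap buffer that stays valid after mmdb_type_free.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes into a new '\0'-terminated heap buffer so the value
 * outlives the object it came from. Returns NULL if allocation fails;
 * the caller releases the copy with free(). */
char *demo_copy_string(const char *data, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, data, len);  /* copies embedded '\0' bytes too */
    copy[len] = '\0';
    return copy;
}
```

For a string object v (with v->type equal to MMDB_STRING), you would call demo_copy_string(v->data.u_string.data, v->data.u_string.length), assuming data is a char array, and only then call mmdb_type_free on the lookup result.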
void mmdb_close(mmdb_t * db)
The function mmdb_close closes the database and frees all resources associated with it (other than any objects returned by metadata or IP lookups). Any pointers to the specific mmdb_t object become invalid once this happens.
You should call it when you are done using the database or to refresh the database after an update.
void mmdb_threadsafe(void)
The function mmdb_threadsafe returns 1 when the backend uses thread-safe accesses to the database and 0 when it does not.
You should use it to detect whether mmdb_t objects can be shared across threads.
MMDB will likely fail in unpredictable ways if a database file is updated while the database is open. This is caused by MMDB caching the location of the metadata and data sections, as well as some of the metadata, when calling mmdb_open.
If database updates are needed, the connections should first be closed with mmdb_close and new ones opened with mmdb_open. Keep in mind that this will make any pointers to the old mmdb_t objects invalid.
A simple example program is provided in the example.c file. Once compiled with mmdb, it can be used to extract data from a MaxMind DB file. To run it, pass the file as the first argument, followed by the IP addresses to be looked up as individual arguments.
The program outputs a JSON-like structure with the results of each lookup, or the relevant errors if needed.
Defensive programming techniques have been used whilst developing this tool, including clear failure paths that clean up after themselves and avoiding complex pointer logic. Despite that, this library hasn't been thoroughly tested, so security problems may be present in it. The author is not responsible in any case for any such issues.
Additionally, some other issues are already known and unavoidable given the way this library is implemented; these are documented in the following sections.
The default of 16 is safe, but users may set a depth of up to 2^32-1. This will result in a stack overflow with some DB payloads, especially any containing a recursive structure, i.e. a map or array pointing to itself.
The way in which the DB format is specified allows for exponential growth attacks: an element may be referenced twice by an array, which is itself referenced twice by another array, and so on. Since objects are always copied when parsed, each new array doubles the size of the resulting object. A similar problem can also happen with maps.
Limiting the parsing depth can reduce the risk of this issue in some cases, but since objects can have arbitrary lengths, it will not eliminate it.
This problem could be solved by using an object cache and read-only objects with reference counting instead of reading new copies from the database each time. For now, though, this is outside the scope of the project due to the complexity it would entail.
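A rough back-of-the-envelope estimate shows why the depth limit does not remove the problem. Assume (as a simplification) that every level of nesting references the object below it k times and that the depth limit allows d such levels; fully parsing the top object then produces N copies of the innermost one:

```latex
N = k^{d}, \qquad k = 2,\ d = 16:\ N = 2^{16} = 65\,536, \qquad k = 1000,\ d = 16:\ N = 1000^{16} = 10^{48}
```

With the doubling described above and the default limit, the blow-up is bounded, but because arrays and maps may have arbitrary lengths, a larger k at the same depth already produces an object far too large to hold in memory.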
When compiled on systems that are neither POSIX-compliant nor Windows, the library uses non-thread-safe calls to fseek and fread in order to keep portability high. This means that all calls to any API taking an mmdb_t structure as input must be serialized using a mutex.
You will get a compilation warning when the chosen backend is not thread-safe, and you can use mmdb_threadsafe to check at runtime whether the backend in use is thread-safe.
If you want to use the APIs in parallel with a non-thread-safe backend, consider calling mmdb_open once per thread. This will cost you an additional open file descriptor per thread but addresses the concurrency problem. Alternatively, use a mutex to serialize all accesses to the mmdb_t object across threads.
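The mutex approach can be sketched as follows, assuming POSIX threads are available (on other platforms a different locking primitive would be needed). The int handle and demo_lookup are hypothetical placeholders for an mmdb_t pointer and a call such as mmdb_lookup4; the calls counter exists only so the sketch can show that concurrent calls are serialized.

```c
#include <pthread.h>

/* Hypothetical wrapper: every call that touches the shared handle goes
 * through one mutex, so fseek/fread pairs from different threads never
 * interleave. */
typedef struct {
    pthread_mutex_t lock;
    int handle;           /* stand-in for an mmdb_t pointer */
    unsigned long calls;  /* protected by lock */
} demo_shared_db;

/* Placeholder for a real lookup such as mmdb_lookup4. */
static int demo_lookup(int handle, int key)
{
    return handle + key;
}

int demo_shared_init(demo_shared_db *s, int handle)
{
    s->handle = handle;
    s->calls = 0;
    return pthread_mutex_init(&s->lock, NULL);
}

int demo_shared_lookup(demo_shared_db *s, int key)
{
    pthread_mutex_lock(&s->lock);
    int r = demo_lookup(s->handle, key);  /* serialized section */
    s->calls++;
    pthread_mutex_unlock(&s->lock);
    return r;
}

/* Thread body: performs 1000 serialized lookups on the shared handle. */
void *demo_worker(void *arg)
{
    demo_shared_db *s = arg;
    for (int i = 0; i < 1000; i++)
        (void)demo_shared_lookup(s, i);
    return NULL;
}
```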