Lucene Server is an Erlang application that lets you manage and query documents using an in-memory Lucene backend.
To start the application, just run lucene_server:start().
For questions or general comments regarding the use of this library, please use our public HipChat room.
If you find any bugs or have a problem while using this library, please open an issue in this repo (or a pull request :)).
You can also check out all of our open-source projects at inaka.github.io.
To add documents, use lucene:add(Docs), where Docs :: [lucene:doc()].
Each document is a proplist where the keys can be atoms, strings or binaries, and the values can be numbers, atoms, strings or lucene:geo().
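As an illustration of the document shape described above, here is a minimal sketch of a module returning two hypothetical documents. The module name doc_examples and every field name and value are made up; only the key and value types come from the description of lucene:doc().

```erlang
-module(doc_examples).
-export([docs/0]).

%% Hypothetical sample documents for lucene:add/1.
%% Keys mix atoms, strings and binaries; values mix numbers, atoms and strings.
docs() ->
    [[{name, "Alice"}, {"age", 34}, {<<"role">>, admin}],
     [{name, "Bob"}, {"age", 27}, {<<"role">>, user}, {score, 4.5}]].
```

With the application running, these could then be indexed with lucene:add(doc_examples:docs()).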
To delete documents, use lucene:del(Query), where Query is written following the Lucene Query Syntax.
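Because Query is a plain Lucene query string, values containing special characters need to be escaped before they are embedded in it. The sketch below, a hypothetical query_examples module, builds a single-term <Field>:<Value> query and backslash-escapes the characters the classic Lucene Query Syntax treats as special. It assumes the server parses queries with the standard Lucene escaping rules, and it does not handle values containing whitespace (those would need quoting).

```erlang
-module(query_examples).
-export([escape/1, term_query/2]).

%% Characters that carry special meaning in the classic Lucene Query Syntax.
-define(SPECIAL, "+-&|!(){}[]^\"~*?:\\/").

%% Prefix every special character in Value with a backslash.
-spec escape(string()) -> string().
escape(Value) ->
    lists:flatmap(
      fun(C) ->
              case lists:member(C, ?SPECIAL) of
                  true -> [$\\, C];
                  false -> [C]
              end
      end, Value).

%% Build "<Field>:<Value>" with Value escaped, e.g. to pass to lucene:del/1.
-spec term_query(atom(), string()) -> string().
term_query(Field, Value) when is_atom(Field) ->
    atom_to_list(Field) ++ ":" ++ escape(Value).
```

For example, lucene:del(query_examples:term_query(name, "Bob")) would delete every document whose name field matches Bob.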
To find documents matching a query, use lucene:match(Query, PageSize), lucene:match(Query, PageSize, SortFields) or lucene:match(Query, PageSize, SortFields, Timeout), where:
- Query is written following the Lucene Query Syntax
- PageSize is the number of results per page you expect
- SortFields is a list of atoms that determines the sort order of equally scored results
- Timeout is the number of milliseconds to wait for a result. If no Timeout is specified, it defaults to 5000. To disable the timeout, use the atom infinity.
All of these functions may return:
- the atom timeout, if it took more than Timeout milliseconds to find the matching docs
- the first page of results, together with the metadata described below
A results page looks like {Docs::[lucene:doc()], Data::lucene:metadata()}, where:
- Docs is a list of no more than PageSize documents that match the Query
- Data is a proplist that includes the following fields:
  - next_page: the token used to retrieve the following page (see below), if present
  - total_hits: how many documents match the query across all pages
  - first_hit: the position of the first returned doc in the whole set of docs that match the query (e.g. if PageSize == 5, then first_hit == 1 for the first page, first_hit == 6 for page #2, etc.)
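The metadata above is enough to tell where a page sits in the full result set: with 1-based first_hit, the page number is (first_hit - 1) div PageSize + 1, so PageSize == 5 and first_hit == 6 gives page 2. The hypothetical page_examples module below is a minimal sketch that summarizes a lucene:match/2..4 result; it assumes that next_page is simply absent on the last page, and it reports page as undefined if first_hit is missing.

```erlang
-module(page_examples).
-export([summarize/2]).

%% Summarize the result of lucene:match/2..4 (either a page or the atom timeout).
summarize(timeout, _PageSize) ->
    timeout;
summarize({Docs, Data}, PageSize) ->
    #{returned   => length(Docs),
      total_hits => proplists:get_value(total_hits, Data),
      page       => page_number(proplists:get_value(first_hit, Data), PageSize),
      %% Assumption: next_page is only present when more pages follow.
      has_more   => proplists:is_defined(next_page, Data)}.

%% first_hit is 1-based: first_hit == 6 with PageSize == 5 means page 2.
page_number(FirstHit, PageSize) when is_integer(FirstHit), FirstHit >= 1 ->
    (FirstHit - 1) div PageSize + 1;
page_number(_, _) ->
    undefined.
```

A call such as page_examples:summarize(lucene:match("role:admin", 5), 5) would then return a map like #{returned => ..., total_hits => ..., page => 1, has_more => ...}.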
To get the following page, use lucene:continue(PageToken, PageSize) or lucene:continue(PageToken, PageSize, Timeout), where PageToken comes from the metadata of the previous page, and the rest of the parameters and results have the same types, format and meaning as in the lucene:match/2 or lucene:match/4 functions.
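Combining lucene:match and lucene:continue, a client can walk through every page by following next_page tokens until none is returned. The sketch below is a hypothetical paging_examples module, assuming that a page without next_page is the last one. The continue function is passed in as a parameter (defaulting to fun lucene:continue/2) so the loop can be exercised without a running server; if any page times out, the whole call returns timeout.

```erlang
-module(paging_examples).
-export([all_docs/2, all_docs/3]).

%% Collect the docs of every page, starting from the result of lucene:match/2.
all_docs(FirstPage, PageSize) ->
    all_docs(FirstPage, PageSize, fun lucene:continue/2).

%% ContinueFun(Token, PageSize) must behave like lucene:continue/2.
all_docs(FirstPage, PageSize, ContinueFun) ->
    collect(FirstPage, PageSize, ContinueFun, []).

collect(timeout, _PageSize, _ContinueFun, _Acc) ->
    timeout;
collect({Docs, Data}, PageSize, ContinueFun, Acc) ->
    Acc1 = [Docs | Acc],
    case proplists:get_value(next_page, Data) of
        %% Assumption: no next_page token means this was the last page.
        undefined -> lists:append(lists:reverse(Acc1));
        Token -> collect(ContinueFun(Token, PageSize), PageSize, ContinueFun, Acc1)
    end.
```

For example, paging_examples:all_docs(lucene:match("role:admin", 10), 10) would return all matching docs in page order. Accumulating pages in reverse and appending once at the end keeps the loop tail-recursive.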
Besides what Lucene already offers, Lucene Server provides support for indexing and querying some extra data types:
Atoms are treated as strings: you may add them as values in a document and query them using the standard Lucene Query Syntax for strings.
Lucene Server lets you store integers and floats and then use them properly in range queries (i.e. <Field>:[<Min> TO <Max>]), respecting the field's data type instead of treating them as strings, as Lucene does by default.
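Since numeric range queries are written in the <Field>:[<Min> TO <Max>] form, a small builder avoids formatting mistakes such as scientific notation for floats. This is a minimal sketch in a hypothetical range_examples module; rendering floats with float_to_list/2 and the {decimals, 6}, compact options is an assumption about an acceptable textual format, not something Lucene Server prescribes.

```erlang
-module(range_examples).
-export([range/3]).

%% Build "<Field>:[<Min> TO <Max>]" for integer or float bounds.
-spec range(atom(), number(), number()) -> string().
range(Field, Min, Max) when is_atom(Field), is_number(Min), is_number(Max) ->
    lists:flatten(io_lib:format("~s:[~s TO ~s]", [Field, num(Min), num(Max)])).

%% Render integers as-is and floats in plain decimal notation.
num(N) when is_integer(N) -> integer_to_list(N);
num(F) when is_float(F) -> float_to_list(F, [{decimals, 6}, compact]).
```

For example, lucene:match(range_examples:range(age, 18, 30), 10) would look for documents whose numeric age field lies between 18 and 30.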
Lucene Server provides support for managing and querying geo-spatial coordinates (i.e. latitude and longitude pairs):
- To construct a lucene:geo() object, use lucene_utils:geo(Lat, Lng), where Lat and Lng are floating-point numbers.
- You can then use it as a value in a lucene:doc().
- To find documents near a certain point, include the following term in your query: <Field>.near:<Lat>,<Lng>,<Miles>. That query will filter documents within a <Miles> radius of <Lat>,<Lng> and will also rank results according to their distance (with closer docs ranking higher).
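The .near term is also just text inside the query string, so it can be built the same way as a range query. The hypothetical geo_examples module below is a minimal sketch that formats <Field>.near:<Lat>,<Lng>,<Miles>; the float formatting choice is an assumption, as in any hand-built query.

```erlang
-module(geo_examples).
-export([near_query/4]).

%% Build "<Field>.near:<Lat>,<Lng>,<Miles>".
-spec near_query(atom(), float(), float(), number()) -> string().
near_query(Field, Lat, Lng, Miles) when is_atom(Field) ->
    lists:flatten(io_lib:format("~s.near:~s,~s,~s",
                                [Field, num(Lat), num(Lng), num(Miles)])).

%% Render integers as-is and floats in plain decimal notation.
num(N) when is_integer(N) -> integer_to_list(N);
num(F) when is_float(F) -> float_to_list(F, [{decimals, 6}, compact]).
```

In a running node, a document could be stored with lucene:add([[{name, "Office"}, {location, lucene_utils:geo(40.5, -74.25)}]]) and later found with lucene:match(geo_examples:near_query(location, 40.5, -74.25, 10), 10), which filters to a 10-mile radius and ranks closer docs higher.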
In the same way you can write ".near" queries, you can also write ".erlang" ones. The syntax is <Field>.erlang:<Module>:<Function>[:<Args>].
The function Module:Function is expected to comply with the following spec:
- If no Args are provided:
  -spec Mod:Fun([term()]) -> [false | float()].
- If Args are provided (they should be written as a list):
  -spec Mod:Fun(type_of_arg1(), type_of_arg2(), ..., [term()]) -> [false | float()].
The function will be called with the list of values for field Field, and it is expected to return a list of results of the same length as the one received. For each element in the original list, the function may return (in the same position of the new list):
- false, if it's not a match
- a float() representing the score of such a document
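A function satisfying both specs above might look like the following sketch. The module score_examples and its at_least functions are hypothetical: numeric values greater than or equal to a minimum score their own value as a float, and everything else returns false. The output list always has the same length as the input, as required. It assumes the module is loaded on the node running Lucene Server and that an Args list such as [10] is passed as the leading argument(s), per the second spec.

```erlang
-module(score_examples).
-export([at_least/1, at_least/2]).

%% No-Args form, e.g. <Field>.erlang:score_examples:at_least
-spec at_least([term()]) -> [false | float()].
at_least(Values) ->
    at_least(0, Values).

%% Args form, e.g. <Field>.erlang:score_examples:at_least:[10]
%% Numbers >= Min are scored with their own value; anything else is no match.
-spec at_least(number(), [term()]) -> [false | float()].
at_least(Min, Values) ->
    [case V of
         N when is_number(N), N >= Min -> float(N);
         _ -> false
     end || V <- Values].
```

Using a list comprehension guarantees the one-result-per-value shape the query engine expects, regardless of what values the field holds.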