-
Notifications
You must be signed in to change notification settings - Fork 118
Optimization Examples Using Python
Further improvement of the search performance can be achieved by additional processing as follows.
The default number of initial edges for each node in a default graph (ANNG) is a fixed value 10. To optimize the number, first, the objects should be inserted without building indexes as follows.
# create-optimized-anng.py
import ngtpy
index_path = 'fasttext.anng'
with open('objects.tsv', 'r') as fin:
n, dim = map(int, fin.readline().split())
ngtpy.create(index_path, dimension=dim, distance_type='Cosine')
index = ngtpy.Index(index_path) # open the index
for line in fin:
object = list(map(float, line.rstrip().split('\t')))
index.insert(object) # insert objects
index.save() # save the index
Next, execute the optimization script as follows. This script extracts the optimal number of initial edges for the inserted objects and just set the number into the index profile.
# optimize-anng.py
import ngtpy
index_path = 'fasttext.anng'
optimizer = ngtpy.Optimizer(log_disabled = True)
optimizer.optimize_number_of_edges_for_anng(index_path)
Then, build indexes as follows.
# build-optimized-anng.py
import ngtpy
index_path = 'fasttext.anng'
index = ngtpy.Index(index_path, log_disabled = True)
index.build_index()
index.save()
The next script finally optimizes the search parameters about the explored edges and memory prefetch for the existing indexes. This script does not modify the index data structure.
# optimize-anng-search-parameters.py
import ngtpy
index_path = 'fasttext.anng'
optimizer = ngtpy.Optimizer(log_disabled = True)
optimizer.set_processing_modes(search_parameter_optimization=True, prefetch_parameter_optimization=True, accuracy_table_generation=True)
optimizer.optimize_search_parameters(index_path)
Since an accuracy table which converts expected accuracies into epsilons is generated as well, an expected accuracy value instead of an epsilon value can be specified for search as follows after this optimization.
result = index.search(query_object, expected_accuracy = 0.90)
Above-mentioned ANNG optimization will bring adequate search performance improvement for ANNG. Refining ANNG (RANNG) also improves the search performance, because it improves accuracy of neighboring nodes for each node by searching with each node. However, it is optional because it needs a long processing time. The following command to refine an ANNG must be executed after building the indexes.
# refine-anng.py
import ngtpy
index_path = 'fasttext.anng'
index = ngtpy.Index(path=index_path, log_disabled = True)
index.refine_anng()
index.save()
Since further more improvement of the search performance by using an ONNG can be expected, how to generate ONNG is described.
ONNG generation requires an ANNG with more edges than default initial edges as the excess edges are removed to optimize the graph. ONNG requires an ANNG at least with edges that are optimized as mentioned above. As a result, an ONNG from the ANNG can provide almost the best performance. However, if the best performance is needed, a larger number than the optimized number may be specified. The following example shows the creation of an ANNG with 100 edges.
# create-anng-for-onng.py
import ngtpy
index_path = 'fasttext.anng-100'
with open('objects.tsv', 'r') as fin:
n, dim = map(int, fin.readline().split())
ngtpy.create(index_path, dim, distance_type='Cosine', edge_size_for_creation=100) # create an empty index
index = ngtpy.Index(index_path) # open the index
for line in fin:
object = list(map(float, line.rstrip().split('\t')))
index.insert(object) # insert objects
index.build_index() # build the index
index.save() # save the index
An ONNG can be converted from an ANNG with the ngtpy.Optimizer as follows.
# construct-onng.py
import ngtpy
optimizer = ngtpy.Optimizer()
optimizer.set(num_of_outgoings = 10, num_of_incomings = 100)
optimizer.execute("fasttext.anng", "fasttext.onng")
execute() executes pass adjustment and dynamic degree adjustment optimization. num_of_outgoings and num_of_incomings specify the number of outgoing edges and incoming edges, respectively. In this example, the number of incoming edges is the same as the number of edges for ANNG creation, i.e., 100. However, since the number of real edges of an ANNG is more than the specified number, a larger number of incoming edges can be specified.
The search.py script above must be modified to use the ONNG by replacing 'fasttext.anng' with 'fasttext.onng' on the following line.
index = ngtpy.Index('fasttext.onng') # open the index
Even if the search time increases compared with the result of the first ANNG, the search accuracy of the ONNG should be higher. Therefore, since the epsilon value for an ANNG needs to be increased to acquire higher accuracies, the search time of an ANNG increases more than that of an ONNG.
Command line tool
Python
C++