simultaneous multiple-indexing of PostgreSQL arrays
PostgreSQL arrays allow organization of data into an arbitrary number of dimensions and come with sophisticated primitives to manage this complexity. Given an array A
of n
elements, indexing is achievable with A[i]
, where i
is a 1-indexed address in the array. Furthermore, it is possible to select slices of arrays a
s A[i:j]
, where i
and j
specify the lower and upper bounds of the slice. PostgreSQL provides no efficient way to access multiple non-contiguous elements in its arrays. The standard way to access non-contiguous elements is as multiple fields in the SELECT
clause (i.e. SELECT A[i1], A[i2], A[i3], ...
). Each element selected in this manner causes the implementation to rescan the array. For non-contiguous subsets of indices approaching the length of the array, this results in O(n2) behavior.
Another method for selecting multiple elements is performing a set-returning UNNEST WITH ORDINALITY
operation on the array. This provides the array elements with their associated indices. We can then select an arbitrary subset of the elements by joining and filtering the indices with other data. Finally, we can use the array_agg
function to recompose the result into an array that contains only the elements to select. The entire procedure can be neatly wrapped in a SQL function, and it operates in O(n) time. Despite the superior asymptotic performance, this method is slow, and it is easily outperformed by selecting array elements as multiple fields when the number of fields to access is small.
It is possible to write a C function to overcome this difficulty. It accepts an array into which indexing should be performed and a second array of indices to use. We strove to maintain consistency with other PostgreSQL array primitives and allow as much generality as possible: the function returns SQL NULL
for a NULL
value array or index array, returns NULL
array values corresponding to NULL
or out-of-bounds indices, returns repeated values corresponding to repeated indices, and supports arbitrary index ordering. Since the underlying back-end implementation of PostgreSQL arrays is a C array, once the array is read from storage and unpacked, the only penalties involved in accessing elements are related to cache size and data locality. This function is therefore only slightly slower than native single-indexing via A[i]
.
The following should be performed as root or at least in such a way that the performing user has sufficient privileges to perform the make install
step.
curl -s -S -L https://github.com/rlichtenwalter/pg_array_multi_index/archive/master.zip > pg_array_multi_index.zip unzip pg_array_multi_index.zip (cd pg_array_multi_index-master && make PG_CONFIG=<optional custom pg_config path>) (cd pg_array_multi_index-master && make PG_CONFIG=<optional custom pg_config path> install) (cd ~postgres && sudo -u postgres psql -c 'CREATE EXTENSION pg_array_multi_index;')
testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[1,2,3] ); array_multi_index ------------------- {0,1,2} (1 row)testuser=# SELECT array_multi_index( ARRAY[0,NULL,2,3,4,5,6,7,8,9], ARRAY[1,2,3] ); array_multi_index ------------------- {0,NULL,2} (1 row)
testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[]::INTEGER[] ); array_multi_index ------------------- {} (1 row)
testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[3,5,7,7,7,9] ); array_multi_index ------------------- {2,4,6,6,6,8} (1 row)
testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[2,NULL,4] ); array_multi_index ------------------- {1,NULL,3} (1 row)
testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], NULL ); array_multi_index -------------------
(1 row)
testuser=# SELECT array_multi_index( NULL::INTEGER[], ARRAY[2,4] ); array_multi_index -------------------
(1 row)
testuser=# SELECT array_multi_index( ARRAY['I','work','with','any','array','type.'], ARRAY[1,2,3,4,5,6] ); array_multi_index ------------------------------- {I,work,with,any,array,type.} (1 row)