Skip to content

rlichtenwalter/pg_array_multi_index

Repository files navigation

pg_array_multi_index

simultaneous multiple-indexing of PostgreSQL arrays

Description

PostgreSQL arrays allow organization of data into an arbitrary number of dimensions and come with sophisticated primitives to manage this complexity. Given an array A of n elements, indexing is achievable with A[i], where i is a 1-indexed address in the array. Furthermore, it is possible to select slices of arrays a s A[i:j], where i and j specify the lower and upper bounds of the slice. PostgreSQL provides no efficient way to access multiple non-contiguous elements in its arrays. The standard way to access non-contiguous elements is as multiple fields in the SELECT clause (i.e. SELECT A[i1], A[i2], A[i3], ...). Each element selected in this manner causes the implementation to rescan the array. For non-contiguous subsets of indices approaching the length of the array, this results in O(n2) behavior.

Another method for selecting multiple elements is performing a set-returning UNNEST WITH ORDINALITY operation on the array. This provides the array elements with their associated indices. We can then select an arbitrary subset of the elements by joining and filtering the indices with other data. Finally, we can use the array_agg function to recompose the result into an array that contains only the elements to select. The entire procedure can be neatly wrapped in a SQL function, and it operates in O(n) time. Despite the superior asymptotic performance, this method is slow, and it is easily outperformed by selecting array elements as multiple fields when the number of fields to access is small.

It is possible to write a C function to overcome this difficulty. It accepts an array into which indexing should be performed and a second array of indices to use. We strove to maintain consistency with other PostgreSQL array primitives and allow as much generality as possible: the function returns SQL NULL for a NULL value array or index array, returns NULL array values corresponding to NULL or out-of-bounds indices, returns repeated values corresponding to repeated indices, and supports arbitrary index ordering. Since the underlying back-end implementation of PostgreSQL arrays is a C array, once the array is read from storage and unpacked, the only penalties involved in accessing elements are related to cache size and data locality. This function is therefore only slightly slower than native single-indexing via A[i].

Installation

The following should be performed as root or at least in such a way that the performing user has sufficient privileges to perform the make install step.

curl -s -S -L https://github.com/rlichtenwalter/pg_array_multi_index/archive/master.zip > pg_array_multi_index.zip
unzip pg_array_multi_index.zip
(cd pg_array_multi_index-master && make PG_CONFIG=<optional custom pg_config path>)
(cd pg_array_multi_index-master && make PG_CONFIG=<optional custom pg_config path> install)
(cd ~postgres && sudo -u postgres psql -c 'CREATE EXTENSION pg_array_multi_index;')

Usage

testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[1,2,3] );
 array_multi_index
-------------------
 {0,1,2}
(1 row)

testuser=# SELECT array_multi_index( ARRAY[0,NULL,2,3,4,5,6,7,8,9], ARRAY[1,2,3] ); array_multi_index ------------------- {0,NULL,2} (1 row)

testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[]::INTEGER[] ); array_multi_index ------------------- {} (1 row)

testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[3,5,7,7,7,9] ); array_multi_index ------------------- {2,4,6,6,6,8} (1 row)

testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], ARRAY[2,NULL,4] ); array_multi_index ------------------- {1,NULL,3} (1 row)

testuser=# SELECT array_multi_index( ARRAY[0,1,2,3,4,5,6,7,8,9], NULL ); array_multi_index -------------------

(1 row)

testuser=# SELECT array_multi_index( NULL::INTEGER[], ARRAY[2,4] ); array_multi_index -------------------

(1 row)

testuser=# SELECT array_multi_index( ARRAY['I','work','with','any','array','type.'], ARRAY[1,2,3,4,5,6] ); array_multi_index ------------------------------- {I,work,with,any,array,type.} (1 row)

About

simultaneous multiple-indexing of PostgreSQL arrays

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published