High performance Clickhouse PHP library featuring:
- Tiny memory footprint based on static class (hundreds of times less consumption than smi2 client).
- Curl based.
- High level methods to fetch rows, single row, array of scalar values or single value.
- Parametric queries (native Clickhouse SQL injection protection).
- Long queries progress update through callback function.
- Large result sets processing without memory overflow.
- HTTP native compression.
- Error handling.
- Batch data writing.
Clone or download library:
git clone https://github.com/mrcrypster/clickhousy.git
Import lib and use it:
require 'clickhousy/clickhousy.php';
$data = clickhousy::rows('SELECT * FROM table LIMIT 5');
Clickhousy
works over Clickhouse HTTP protocol, so you just need to set connection url (by default it's localhost:8123
):
clickhousy::set_url('http://host:port/'); # no auth
clickhousy::set_url('http://user:password@host:port/'); # auth
And then select database (default
by default):
clickhousy::set_db('my_db');
Use predefined methods to quickly fetch needed data:
clickhousy::rows('SELECT * FROM table'); # -> returns array or associative arrays
clickhousy::row ('SELECT * FROM table LIMIT 1'); # -> returns single row associative array
clickhousy::cols('SELECT id FROM table'); # -> returns array of scalar values
clickhousy::col ('SELECT count(*) FROM table'); # -> returns single scalar value
If your query returns many rows, use reading callback in order not to run out of memory:
clickhousy::query('SELECT * FROM large_table', [], null, null, function($packet) {
// $packet will contain small portion of returning data
foreach ( $packet as $row ) {
// do something with $row (array of each result row values)
print_r($row) # [0 => 'col1 value', 1 => 'col2 value', ...]
}
});
In order to fight SQL injections, you can use parametric queries:
$count = clickhousy::col(
'SELECT count(*) FROM table WHERE age > {age:UInt32} AND category = {cat:String}',
['age' => 30, 'cat' => 'Engineering']
);
It's not a good idea to insert large amounts of data into Clickhouse using single row inserts. Consider using other technologies, like Kafka to ingest data into Clockhouse efficiently. Avoid using client-library inserts in production environments.
With Clickhousy you can insert either named rows:
clickhousy::insert('my_table', [
['id' => 1, 'date' => date('Y-m-d')],
['id' => 2, 'date' => date('Y-m-d')]
]);
Or unnamed rows (in this case, make sure that values order is the same as table columns order):
clickhousy::insert('my_table', [
[1, date('Y-m-d')],
[2, date('Y-m-d')]
]);
If you still need (which you should avoid please) to insert large amounts of data using Clickhousy, do it in batches of at least couple of thousands rows (but be sure to monitor RAM usage by PHP):
$f = fopen('large.csv', 'r');
$batch = [];
while ( $row = fgetcsv($f) ) {
$batch[] = $row;
if ( count($batch) >= 5000 ) {
clickhousy::insert('my_table', $batch);
$batch = [];
}
}
if ( $batch ) {
clickhousy::insert('my_table', $batch);
}
Generic query
method is available for any read or write query:
clickhousy::query('INSERT INTO table(id) VALUES(1)');
clickhousy::query('TRUNCATE TABLE table');
clickhousy::query('SELECT NOW()');
For SELECT
queries it will return resultset together with system information:
$result_set = clickhousy::query('SELECT * FROM numbers(5)');
print_r($result_set);
Will output:
Array
(
[meta] => Array
(
[0] => Array
(
[name] => number
[type] => UInt64
)
)
[data] => Array
(
[0] => Array
(
[number] => 0
)
...
)
[rows] => 2
[rows_before_limit_at_least] => 2
[statistics] => Array
(
[elapsed] => 0.000237397
[rows_read] => 2
[bytes_read] => 16
)
)
Method also supports parametric queries:
$res = clickhousy::query('SELECT * FROM table WHERE age > {age:Int32}', ['age' => $_GET['age']]);
You can insert data from files using $post_buffer
argument pointing to a file:
$res = clickhousy::query('INSERT INTO table', [], '/path/to/tsv.gz');
File should be of gzipped TSV format.
$progress_callback
allows specifying callback function which will be called when query execution progress updates:
clickhousy::query('SELECT uniq(id) FROM huge_table', [], null, function($progress) {
print_r($progress);
});
If the query is long enough, progress function will be called multiple times:
Array
(
[read_rows] => 4716360 # -> currently read rows
[read_bytes] => 37730880
[written_rows] => 0
[written_bytes] => 0
[total_rows_to_read] => 50000000 # -> total rows to read
)
...
Array
(
[read_rows] => 47687640
[read_bytes] => 381501120
[written_rows] => 0
[written_bytes] => 0
[total_rows_to_read] => 50000000
)
It's easy to calculate and print query execution progress:
clickhousy::query('SELECT count(*) FROM large_table WHERE heavy_condition', [], null, function($progress) {
echo round(100 * $progress['read_rows']/$progress['total_rows_to_read']) . '%' . "\n";
});
Which will result in:
7%
17%
28%
39%
50%
61%
72%
83%
94%
Latest query summary is available through:
clickhousy::rows('SELECT * FROM numbers(100000)');
print_r(clickhousy::last_summary());
Which is an associative array of multiple metrics:
Array
(
[read_rows] => 65505
[read_bytes] => 524040
[written_rows] => 0
[written_bytes] => 0
[total_rows_to_read] => 100000
)
By default, clickhousy
will return response with error
key on error, but will not throw any exceptions:
$response = clickhousy::query('SELECT bad_query'); # sample query with error
print_r($response);
Which contains original Clickhouse error message:
Array
(
[error] => Code: 47. DB::Exception: Missing columns: 'bad_query' while processing query: 'SELECT bad_query', required columns: 'bad_query'. (UNKNOWN_IDENTIFIER) (version 22.6.1.1696 (official build))
)
If you want exceptions functionality, you can extend clickhousy
with your own class and override error()
method:
class my_clickhousy_exception extends Exception {};
class my_clickhousy extends clickhousy {
public static function error($msg, $request_info) {
throw new my_clickhousy_exception($msg);
}
}
Then use your own class to get exceptions working:
my_clickhousy::query('SELECT bad_query'); # bad query example
# PHP Fatal error: Uncaught my_clickhousy_exception: Code: 47. DB::Exception: Missing columns: 'bad_query' ...
If you need to access raw response from Clickhouse, you can find it here:
clickhouse::last_response(); # raw text response from Clickhouse after latest query
Curl (which is used for HTTP communication) response info can be accessed via:
clickhouse::last_info(); # response from curl_getinfo() after latest query
There is also log()
method which can be overrided to allow custom logging:
class my_clickhousy extends clickhousy {
public static function log($raw_response, $curl_info) {
error_log('Received response from Clickhouse: ' . $raw_response);
}
}
Based on performance testing, Clickhousy
lib is times faster
and hundreds of times less memory consuming than smi2 lib:
Smi2:
mem: 565
time: 10.3
Clickhousy:
mem: 0.8 688.8x better
time: 2.3 4.4x better