Commit cf21862

feat: dataframe_serializer supports batching (#293)

bednar authored Jul 29, 2021
1 parent a26fc4c commit cf21862

Showing 6 changed files with 369 additions and 191 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

### Features
1. [#281](https://github.com/influxdata/influxdb-client-python/pull/281): `FluxTable`, `FluxColumn` and `FluxRecord` objects have helpful reprs
1. [#293](https://github.com/influxdata/influxdb-client-python/pull/293): `dataframe_serializer` supports batching

### Bug Fixes
1. [#283](https://github.com/influxdata/influxdb-client-python/pull/283): Set proxy server in config file
1 change: 1 addition & 0 deletions examples/README.md
@@ -4,6 +4,7 @@
- [import_data_set.py](import_data_set.py) - How to import CSV file
- [import_data_set_multiprocessing.py](import_data_set_multiprocessing.py) - How to import a large CSV file using Python multiprocessing
- [ingest_dataframe_default_tags.py](ingest_dataframe_default_tags.py) - How to ingest DataFrame with default tags
- [ingest_large_dataframe.py](ingest_large_dataframe.py) - How to ingest large DataFrame
- [iot_sensor.py](iot_sensor.py) - How to write sensor data every minute using [RxPY](https://rxpy.readthedocs.io/en/latest/)
- [import_data_set_sync_batching.py](import_data_set_sync_batching.py) - How to use [RxPY](https://rxpy.readthedocs.io/en/latest/) to prepare batches for synchronous write into InfluxDB

69 changes: 69 additions & 0 deletions examples/ingest_large_dataframe.py
@@ -0,0 +1,69 @@
"""
How to ingest large DataFrame by splitting into chunks.
"""
import logging
import random
from datetime import datetime

from influxdb_client import InfluxDBClient
from influxdb_client.extras import pd, np

"""
Enable logging for DataFrame serializer
"""
loggerSerializer = logging.getLogger('influxdb_client.client.write.dataframe_serializer')
loggerSerializer.setLevel(level=logging.DEBUG)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(asctime)s | %(message)s'))
loggerSerializer.addHandler(handler)

"""
Configuration
"""
url = 'http://localhost:8086'
token = 'my-token'
org = 'my-org'
bucket = 'my-bucket'

"""
Generate Dataframe
"""
print()
print("=== Generating DataFrame ===")
print()
dataframe_rows_count = 150_000

col_data = {
'time': np.arange(0, dataframe_rows_count, 1, dtype=int),
'tag': np.random.choice(['tag_a', 'tag_b', 'test_c'], size=(dataframe_rows_count,)),
}
# generate ~3,000 additional columns, each filled with a single random constant
for n in range(2, 2999):
    col_data[f'col{n}'] = random.randint(1, 10)

data_frame = pd.DataFrame(data=col_data).set_index('time')
print(data_frame)

"""
Ingest DataFrame
"""
print()
print("=== Ingesting DataFrame via batching API ===")
print()
startTime = datetime.now()

with InfluxDBClient(url=url, token=token, org=org) as client:

    """
    Use batching API
    """
    with client.write_api() as write_api:
        write_api.write(bucket=bucket, record=data_frame,
                        data_frame_tag_columns=['tag'],
                        data_frame_measurement_name="measurement_name")
        print()
        print("Waiting to finish ingesting the DataFrame...")
        print()

print()
print(f'Import finished in: {datetime.now() - startTime}')
print()
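Conceptually, the batching this commit adds means a large DataFrame is serialized and written in consecutive row chunks rather than as one huge payload. A minimal, standalone sketch of that splitting idea (the `chunks` helper and `batch_size=4` value here are illustrative, not the library's internal API):

```python
import pandas as pd
import numpy as np


def chunks(frame: pd.DataFrame, batch_size: int):
    """Yield consecutive row-wise slices of at most batch_size rows."""
    for start in range(0, len(frame), batch_size):
        yield frame.iloc[start:start + batch_size]


# a small stand-in DataFrame; the real example above has 150,000 rows
df = pd.DataFrame({'time': np.arange(10), 'value': np.arange(10) * 2}).set_index('time')

batches = list(chunks(df, batch_size=4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

Each chunk is serialized to line protocol and written independently, which keeps peak memory bounded by the batch size instead of the full DataFrame.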