chore(line_protocol): fix nanosecond timestamp resolution for points #811

sebito91 · 2020-04-08T20:25:57Z

Closes #407.
Closes #650.
Closes #649.
Closes #527.
Closes #489.
Closes #346.
Closes #344.
Closes #340.

This PR merges the work done in #407 into the current master without pulling in additional external dependencies (e.g. pandas). Thanks to @AndreCAndersen and @clslgrnc!
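The point of this change is to keep nanosecond timestamps exact. As a minimal sketch of the general idea (not the code merged in this PR), a `datetime` can be converted to epoch nanoseconds using only integer arithmetic on the `timedelta` fields, so no float64 rounding is involved. `to_epoch_nanos` is a hypothetical helper name, and the sketch assumes naive datetimes are UTC.

```python
from datetime import datetime, timezone

# Unix epoch as an aware datetime (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos(ts: datetime) -> int:
    """Hypothetical helper: nanoseconds since the Unix epoch, integer math only.

    Naive datetimes are assumed to be UTC. Python's datetime stops at
    microsecond resolution, so the last three digits are always zero.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts - EPOCH
    # timedelta is normalized to integer days, seconds (< 86400) and microseconds (< 10**6).
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 10**3


if __name__ == "__main__":
    ts = datetime(2019, 10, 4, 6, 27, 19, 850557, tzinfo=timezone.utc)
    exact = to_epoch_nanos(ts)
    # For comparison: the same value computed through float seconds.
    via_float = int((ts - EPOCH).total_seconds() * 1e9)
    print(exact, via_float, exact - via_float)
```

Because Python's `datetime` cannot hold nanoseconds at all, truly nanosecond-precision input has to arrive as an integer or as a pandas/NumPy timestamp rather than a `datetime`.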

inselbuch · 2020-04-08T21:38:24Z

rockstars!

sebito91 · 2020-04-08T21:39:25Z

Please let me know if it's not working and we'll get it fixed. Release v5.2.4 is set to come out Friday, April 10th, 2020.

clslgrnc · 2020-04-09T11:59:35Z

Should release v5.2.4 come with a warning for the unusual case where someone looks up a point inserted with v5.2.3 by its timestamp?

For example, if I insert a point every millisecond, some are written with a wrong timestamp. That might not be an issue, because when I try to retrieve a point at a given timestamp the same error is made and I get the right point back. After migrating to v5.2.4 the error is no longer made, so InfluxDB would return no points at the requested (correct) millisecond, because the stored point is slightly off.

I agree that this is a corner case and probably not how people should use influxdb.

Edit: Actually, I am not sure the same error is made when retrieving points, and it's probably how this bug was detected.
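A small illustration of the millisecond concern, assuming the older conversion passed each nanosecond value through a float64 (the exact pre-fix code path is not shown in this thread). At 2019-era epochs (about 1.57e18 ns) adjacent float64 values are 256 ns apart, so `ms * 10**6` is exactly representable only when `ms` is a multiple of 4. The variable names below are illustrative.

```python
import numpy as np

# One point per millisecond for one second, starting 2019-10-04T06:27:19Z (epoch milliseconds).
start_ms = 1_570_170_439_000
ms = np.arange(start_ms, start_ms + 1000, dtype=np.int64)

exact_ns = ms * 1_000_000                   # what an exact client writes
float_ns = (exact_ns / 1).astype(np.int64)  # the same values forced through float64

shifted = exact_ns != float_ns
print(f"{shifted.sum()} of {len(ms)} points land on a different nanosecond")
# A query for the exact millisecond would miss every shifted point.
```

Only the points whose millisecond value is a multiple of 4 survive unchanged, which is why such an error could go unnoticed for a while.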


cuxcrider · 2020-10-05T05:58:57Z

Hi all,

Thank you for continuing to work on this.

I am finding potentially two issues:

(minor) I can confirm that the DataFrame client reads nanosecond timestamps, but only if I specify `epoch='ns'`. If I do not specify `epoch`, I get microsecond precision. I believe this contradicts the documentation, which says nanosecond precision is the default.

(more troublesome) The DataFrame client writes nanosecond timestamps, but if two timestamps are within a few hundred nanoseconds of each other, it seems to treat them as the same timestamp and does not write the additional data point. This surprises me, because I do not remember this behavior from the discussions with contributors on #649 (Incorrect nanosecond timestamps being written to influxdb), but if you now run the code I posted in #649 you will get only two data points rather than all four. I have tried Pandas 1.1.2 and 0.23.4 with the same results. I am on NumPy 1.19.1, Python 3.7.9, and influxdb-python 5.3.0. If you change the timestamp '2019-10-04 06:27:19.850557111+00:00' to '2019-10-04 06:27:19.850555111+00:00', the data point is written.

Any thoughts?

Here is example code that demonstrates the result; just fill in your user, password, host IP and database name:

```python
from influxdb import InfluxDBClient, DataFrameClient
import numpy as np
import pandas as pd

pd.show_versions()


# I use this to make sure anything numeric is written to InfluxDB as a float
def df_int_to_float(df):
    try:
        for col in df.select_dtypes('number').columns.values:
            df[col] = df[col].astype('float64')
    except Exception:
        print('cycle not in dataframe')
    return df


# Remember to fill in your host, user, password and database name
# (the values below are placeholders)
def main(host='localhost', port=8086):
    """Instantiate a connection to the InfluxDB."""
    user = 'root'        # placeholder
    password = 'root'    # placeholder
    db_name = 'example'  # placeholder
    client = InfluxDBClient(host, port, user, password)
    client.drop_database(db_name)
    client.create_database(db_name)
    dfclient = DataFrameClient(host, port, user, password, db_name)

    for_df_dict = {"nanFloats": [1.1, float('nan'), 3.3, 4.4],
                   "onlyFloats": [1.1, 2.2, 3.3, 4.4],
                   "strings": ['one_one', 'two_two', 'three_three', 'four_four']}
    df = pd.DataFrame.from_dict(for_df_dict)
    df['time'] = ['2019-10-04 06:27:19.850557111+00:00', '2019-10-04 06:27:19.850557184+00:00',
                  '2019-10-04 06:27:42.251396864+00:00', '2019-10-04 06:27:42.251396974+00:00']
    # The strings already carry nanosecond precision; `unit` is meant for numeric input
    df['time'] = pd.to_datetime(df['time'])
    df = df.set_index('time')
    df = df_int_to_float(df)
    ##### df_types just for informational purposes
    df_types_float = df.select_dtypes(include=['float64'])
    df_types_bool = df.select_dtypes(include=['bool'])
    df_types_obj = df.select_dtypes(include=['object'])
    ########
    dfclient.write_points(df, 'test', time_precision='n')
    df_dict = dfclient.query('SELECT * FROM "test"', epoch='ns')
    # All four rows should come back; the behavior described above returns only two
    print(df_dict.get('test'))


if __name__ == '__main__':
    main()
```
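The "few hundred nanoseconds" threshold is consistent with float64 resolution. A quick offline check, assuming (as the later comments in this thread suggest) that the timestamps pass through a float64 on the way to line protocol: near 1.57e18 ns, adjacent float64 values are 256 ns apart, so timestamps closer than that can round to the same value. This only looks at the numbers and does not talk to InfluxDB.

```python
import numpy as np

# The first two timestamps from the example above, as integer nanoseconds since the epoch (UTC).
stamps = np.array(
    ["2019-10-04T06:27:19.850557111", "2019-10-04T06:27:19.850557184"],
    dtype="datetime64[ns]",
).astype(np.int64)

# Gap between adjacent float64 values at this magnitude.
print("float64 spacing:", np.spacing(np.float64(stamps[0])))

as_float = stamps.astype(np.float64)
print("distinct as int64:  ", len(np.unique(stamps)))
print("distinct as float64:", len(np.unique(as_float)))
```

Moving the first value to `...850555111`, as in the report, puts it about 2 µs away from the second, well beyond the 256 ns spacing, so it stays distinct after rounding.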

lihaoml · 2020-11-09T03:14:50Z

I got the same issue here with v5.3.0 and v5.2.3.

I think the bug is in `_dataframe_client.py`. Replacing

```python
time = ((dataframe.index.to_timestamp().values.astype(np.int64) / precision_factor).astype(np.int64).astype(str))
```

with

```python
time = ((dataframe.index.to_timestamp().values.astype(np.int64) // precision_factor).astype(np.int64).astype(str))
```

fixes the issue for me.

I'm not a Python expert, but it looks like in Python 3 `int / int` returns a float, whereas `//` keeps integers as integers.
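A minimal reproduction of the difference between `/` and `//`, using the four timestamps from the example in this thread. It assumes `precision_factor` is the integer `1` for nanosecond precision; that value is an assumption for illustration, not taken from the library source.

```python
import numpy as np

# The four timestamps from the example, as int64 nanoseconds since the epoch (UTC).
ns = np.array(
    [
        "2019-10-04T06:27:19.850557111",
        "2019-10-04T06:27:19.850557184",
        "2019-10-04T06:27:42.251396864",
        "2019-10-04T06:27:42.251396974",
    ],
    dtype="datetime64[ns]",
).astype(np.int64)

precision_factor = 1  # assumed divisor for nanosecond precision

true_div = (ns / precision_factor).astype(np.int64)    # int64 / int -> float64, then back
floor_div = (ns // precision_factor).astype(np.int64)  # int64 // int stays int64

print("true division: ", len(np.unique(true_div)), "distinct timestamps")
print("floor division:", len(np.unique(floor_div)), "distinct timestamps")
```

With `/` the four points collapse onto two timestamps, matching the two points reported above; with `//` all four survive. The fix only stays in integer arithmetic while the divisor is an integer: with a float divisor such as `1e3`, NumPy's `//` also returns float64.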

chore(line_protocol): fix nanosecond timestamp resolution for points · aaf8442

sebito91 requested review from aviau and xginn8 as code owners April 8, 2020 20:25

sebito91 self-assigned this Apr 8, 2020

sebito91 removed request for aviau and xginn8 April 8, 2020 20:26

sebito91 merged commit 04205cf into master Apr 8, 2020

sebito91 deleted the merge_407 branch April 8, 2020 21:17

ocworld pushed a commit to AhnLab-OSS/influxdb-python that referenced this pull request Apr 13, 2020: chore(line_protocol): fix nanosecond timestamp resolution for points (influxdata#811) · 62436d4

bednar mentioned this pull request Jul 13, 2020: feat: use microseconds resolutions for data points influxdata/influxdb-client-python#132 (Merged)

svhb1000 mentioned this pull request Apr 15, 2021: timestamp rounding error dataframe client #885 (Open)
