I tried using eland to read data from two data streams with `es_index_pattern=["*java.backend*", "*h3c*"]`. The field `data_stream.dataset` holds the name of the data stream each document belongs to; in this example its values are `h3c` and `java.backend`.

When I print `df`, I can see `h3c` documents in the output, but when I call `value_counts()` on this field, only `java.backend` appears. I'm not sure whether this is a bug, because I saw a warning about this field when creating the `eland.DataFrame`.

The code and output are as follows:
```python
>>> import eland as ed
>>> from elasticsearch import Elasticsearch
>>> import pandas as pd
>>> escli = Elasticsearch(
...     hosts="https://******",
...     basic_auth=("elastic", "***"),
...     ca_certs='./http_ca.crt',
... )
>>> df = ed.DataFrame(
...     escli,
...     es_index_pattern=["*java.backend*", "*h3c*"],
...     columns=['@timestamp', 'message', 'data_stream.dataset'],
...     es_index_field='@timestamp'
... )
```
Creating the DataFrame emits the warning mentioned above:

```
......
xxxx\lib\site-packages\eland\field_mappings.py:327: UserWarning: Field data_stream.dataset has conflicting types ('constant_keyword', None) != text
......
```
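The warning suggests `data_stream.dataset` is mapped as `constant_keyword` in some of the matched indices and as `text` in others. One way to see which indices carry which type is Elasticsearch's field capabilities API, e.g. `escli.field_caps(index="*java.backend*,*h3c*", fields="data_stream.dataset").body` with the client above. The sketch below only parses such a response body: `conflicting_types` is a hypothetical helper (not part of eland or elasticsearch-py), and `sample` is made-up data shaped like a `_field_caps` response, with placeholder index names; which index has which type is an assumption for illustration.

```python
from typing import Any, Dict


def conflicting_types(field_caps: Dict[str, Any]) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Return the fields that are mapped to more than one type.

    Input: the body of an Elasticsearch ``_field_caps`` response.
    Output: field -> type name -> {"aggregatable": bool, "indices": [...]}.
    """
    conflicts: Dict[str, Dict[str, Dict[str, Any]]] = {}
    for field, types in field_caps.get("fields", {}).items():
        if len(types) < 2:
            continue  # a single type means no conflict for this field
        conflicts[field] = {
            type_name: {
                "aggregatable": bool(caps.get("aggregatable", False)),
                # "indices" is only present when a field has several types
                "indices": list(caps.get("indices", [])),
            }
            for type_name, caps in types.items()
        }
    return conflicts


# Hypothetical response body; index names are placeholders, not real data.
sample = {
    "indices": ["index-a", "index-b"],
    "fields": {
        "data_stream.dataset": {
            "constant_keyword": {
                "type": "constant_keyword",
                "searchable": True,
                "aggregatable": True,
                "indices": ["index-a"],
            },
            "text": {
                "type": "text",
                "searchable": True,
                "aggregatable": False,
                "indices": ["index-b"],
            },
        }
    },
}

for field, types in conflicting_types(sample).items():
    for type_name, info in types.items():
        status = "aggregatable" if info["aggregatable"] else "NOT aggregatable"
        print(f"{field}: {type_name} ({status}) in {info['indices']}")
```

If the real response reports the `text` variant as not aggregatable, that may be relevant: eland builds `value_counts()` on a terms aggregation, which cannot bucket a `text` field without fielddata, so documents from those indices could be left out. This is an unconfirmed guess.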
Printing `df` shows both `h3c` and `java.backend` values in `data_stream.dataset`:

```python
>>> df
                                                    @timestamp  ...  data_stream.dataset
2012-12-31T23:59:33.000+08:00        2012-12-31 23:59:33+08:00  ...                  h3c
2012-12-31T23:59:33.000+08:00        2012-12-31 23:59:33+08:00  ...                  h3c
2012-12-31T23:59:48.000+08:00        2012-12-31 23:59:48+08:00  ...                  h3c
2012-12-31T23:59:48.000+08:00        2012-12-31 23:59:48+08:00  ...                  h3c
2012-12-31T23:59:48.000+08:00        2012-12-31 23:59:48+08:00  ...                  h3c
...                                                        ...  ...                  ...
2023-12-19T07:00:08.730Z      2023-12-19 07:00:08.730000+00:00  ...         java.backend
2023-12-19T07:00:08.730Z      2023-12-19 07:00:08.730000+00:00  ...         java.backend
2023-12-19T07:00:08.730Z      2023-12-19 07:00:08.730000+00:00  ...         java.backend
2023-12-19T07:00:08.730Z      2023-12-19 07:00:08.730000+00:00  ...         java.backend
2023-12-19T07:38:46.967Z      2023-12-19 07:38:46.967000+00:00  ...         java.backend

[42240705 rows x 3 columns]
```
But `value_counts()` only returns counts for `java.backend`:

```python
>>> df['data_stream.dataset'].value_counts()
java.backend    42043023
Name: data_stream.dataset, dtype: int64
>>> df['data_stream.dataset'].value_counts(10)
java.backend    42043023
Name: data_stream.dataset, dtype: int64
>>> df['data_stream.dataset'].value_counts(2)
java.backend    42043023
Name: data_stream.dataset, dtype: int64
```
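Comparing the row count reported for `df` with the `value_counts()` total shows how many documents fall outside every bucket. The snippet below only redoes that subtraction with the figures copied from the output above; nothing is queried.

```python
# Figures copied from the output above; nothing is queried here.
total_rows = 42_240_705  # from "[42240705 rows x 3 columns]"
counts = {"java.backend": 42_043_023}  # df['data_stream.dataset'].value_counts()

# Documents that value_counts() did not place in any bucket
uncounted = total_rows - sum(counts.values())
print(f"{uncounted} documents are not in any value_counts() bucket")
```

As a possible follow-up check: if `escli.count(index="*h3c*")["count"]` returns the same number, the missing bucket would correspond to the entire `h3c` data stream rather than, say, documents lacking the field.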