elasticsearch storage can not customize the date format #1489

Closed
code98 opened this issue Apr 24, 2019 · 15 comments · Fixed by jaegertracing/spark-dependencies#101, jaegertracing/jaeger-operator#1325 or #2637
Labels
help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/elasticsearch

Comments

@code98

code98 commented Apr 24, 2019

Requirement - what kind of business use case are you trying to solve?

I want to be able to customize the date format used in the Elasticsearch storage index names.

Problem - what in Jaeger blocks you from solving the requirement?

The date format used in the Elasticsearch index names is fixed to 'yyyy-MM-dd', e.g. jaeger-span-2019-04-24.
But my company has optimization tools for Elasticsearch storage, and these tools only recognize the 'yyyyMMdd' format, e.g. jaeger-span-20190424.

I believe many people will run into similar problems.

Proposal - what do you suggest to solve the problem or improve the existing situation?

I want to fork the code and add an ES run parameter 'es.index.suffix.date.format' that supports a custom date format. The default would be the '2006-01-02' format.

@yurishkuro
Member

Seems like a reasonable enhancement/configuration.

@yurishkuro yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label Apr 25, 2019
@code98
Author

code98 commented Apr 26, 2019

> Seems like a reasonable enhancement/configuration.

OK, I have developed the code and am testing it.

@pavolloffay
Member

I am more skeptical about this. We will also need to change the index cleaner (written in Python). The problem is that the date format syntax in Python is different from the one in Go.

A related issue is #628. If we are going to implement this, we should not allow changing the granularity of the indices (daily, monthly, etc.).
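To illustrate why the cleaner is affected, here is a minimal sketch of how a cleanup script might pick expired indices when the date separator is configurable. It is not the actual Jaeger index cleaner; the function name `expired_indices` and its parameters are hypothetical, and it assumes daily indices named prefix plus year, month, day.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, prefix, today, keep_days, separator="-"):
    """Return the index names whose date suffix is older than keep_days.

    Hypothetical helper: assumes daily indices named <prefix><year><sep><month><sep><day>.
    """
    # Python uses strftime/strptime directives, not Go's reference-time layout.
    fmt = separator.join(["%Y", "%m", "%d"])
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # suffix is not in the expected date format, so skip it
        if day < cutoff:
            expired.append(name)
    return expired
```

If the collector writes indices with one separator and the cleaner parses with another, no suffix matches and nothing is ever deleted, which is why both sides have to change together.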

@code98
Author

code98 commented Apr 26, 2019

> I am more skeptical about this. We will also need to change the index cleaner (written in Python). The problem is that the date format syntax in Python is different from the one in Go.
>
> A related issue is #628. If we are going to implement this, we should not allow changing the granularity of the indices (daily, monthly, etc.).

I can write a date format converter that turns yyyy-MM-dd into 2006-01-02.
Users would only need to follow the ISO-8601 style and would not need to know Go's time layout.
The same pattern could then be used consistently on the Python side.
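A minimal sketch of such a conversion, assuming only the three tokens discussed in this thread (yyyy, MM, dd) need to be supported. The function names and token tables are hypothetical and are not part of Jaeger.

```python
import re
from datetime import date

# Hypothetical token tables covering only yyyy, MM and dd.
GO_TOKENS = {"yyyy": "2006", "MM": "01", "dd": "02"}
PY_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

_TOKEN_RE = re.compile(r"yyyy|MM|dd")


def _convert(pattern, table):
    """Replace each yyyy/MM/dd token in pattern with its entry in table."""
    return _TOKEN_RE.sub(lambda m: table[m.group(0)], pattern)


def to_go_layout(pattern):
    """Translate a yyyy-MM-dd style pattern into a Go time layout string."""
    return _convert(pattern, GO_TOKENS)


def to_strftime(pattern):
    """Translate the same pattern into Python strftime directives."""
    return _convert(pattern, PY_TOKENS)


# Usage: the same user-facing pattern drives both sides.
suffix = date(2019, 4, 24).strftime(to_strftime("yyyyMMdd"))
```

Characters other than the three tokens are passed through unchanged, so this sketch is only safe for plain separators such as `-` or `.`.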

@pavolloffay
Member

The easiest would be to make the date separator configurable, e.g. `.` would produce jaeger-span-2006.01.02
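A minimal sketch of the separator idea, assuming the order year, month, day stays fixed and only the separator varies. The function name `index_name` is hypothetical.

```python
from datetime import date


def index_name(prefix, day, separator="-"):
    """Build a daily index name; only the separator is configurable."""
    parts = [f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"]
    return prefix + separator.join(parts)
```

With `date(2019, 4, 24)` this gives `jaeger-span-2019.04.24` for separator `.` and `jaeger-span-20190424` for an empty separator. The granularity stays daily in every case.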

@code98
Author

code98 commented Apr 28, 2019

> The easiest would be to make the date separator configurable, e.g. `.` would produce jaeger-span-2006.01.02

Your idea is great, but my code is already developed. Which approach do you think is more appropriate?

@pavolloffay
Member

It depends on how you did it. I would be more in favor of a separator.

@code98
Author

code98 commented Apr 29, 2019

> We will need to also change the index cleaner (written in python)

1. Python cleanup script
2. ....
Is there anything else that needs to be modified?

@pavolloffay
Member

Also https://github.com/jaegertracing/spark-dependencies will need to be changed.

@code98
Author

code98 commented Apr 29, 2019

Now I have a problem like this:
If the date format configured in the collector is 'yyyy-MM', then the index in ES is 'jaeger-span-2019-04'.
If the Spark job is also configured with the 'yyyy-MM' date format and runs once a day, its results will be wrong.

[image: the Spark job code referred to below]

Because this code queries all the data for April 2019, the job recalculates the whole of April every day.
Solution:
Add a DSL query condition that limits the query to the specified time range, e.g.:
[image: example DSL query]

This is the solution I have in mind. I hope Mr. Pavolloffay can help me review it.
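Since the screenshots are not reproduced here, the following is only a sketch of what such a time-range condition could look like, not the query from the image. It assumes the span documents carry a `startTimeMillis` date field; the function name `day_range_query` is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def day_range_query(day):
    """Build an Elasticsearch query body limited to one UTC day.

    Assumes span documents have a `startTimeMillis` date field.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {
        "query": {
            "range": {
                "startTimeMillis": {
                    "gte": int(start.timestamp() * 1000),  # inclusive start of day
                    "lt": int(end.timestamp() * 1000),     # exclusive start of next day
                    "format": "epoch_millis",
                }
            }
        }
    }
```

With a condition like this, a daily run over a monthly index would read only one day of spans instead of the whole month.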

@pavolloffay
Member

> If the date format configured in the collector is 'yyyy-MM', then the index in es is 'jaeger-span-2019-04'.

This is not what we want here, as it changes the granularity of the indices.

@code98
Author

code98 commented Apr 29, 2019

In conclusion, there are three options:
(1) Separator only: the date layout is fixed to year, month, day, and only the separator can be changed, e.g. 2019-04-29 or 2019.04.29.
(2) Full date format: when the user configures 'yyyy-MM-dd', 'dd-MM-yyyy' or 'MM-dd-yyyy', everything works and the daily granularity is preserved. However, if the format is configured as 'yyyy-MM', the Spark job cannot guarantee correct results and the daily granularity is lost. The user bears this risk, because the user chose the configuration.
(3) Restricted date format: only formats with day granularity are accepted, i.e. yyyyMMdd, ddMMyyyy or MMddyyyy, with any separator.

Which option would you choose?

@pavolloffay
Member

We can only change the separator (e.g. from - to no separator) or the order of day, month and year.

This change would have to be coordinated between the storage implementation, the Python scripts, the Spark dependencies job and the Kubernetes operator.
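The constraint above (any separator, any order of day, month and year, but always daily granularity) could be checked with something like the following sketch. The function name `is_daily_format` is hypothetical, and the yyyy/MM/dd token syntax is taken from the earlier comments in this thread.

```python
import re

_TOKEN_RE = re.compile(r"yyyy|MM|dd")


def is_daily_format(pattern):
    """True if pattern is yyyy, MM and dd, each exactly once, in any order,
    joined by one repeated non-alphanumeric separator (which may be empty)."""
    tokens = _TOKEN_RE.findall(pattern)
    if sorted(tokens) != ["MM", "dd", "yyyy"]:
        return False  # a token is missing or repeated, e.g. 'yyyy-MM'
    # Three matches always split the pattern into four pieces.
    before, sep1, sep2, after = _TOKEN_RE.split(pattern)
    if before or after:
        return False
    return sep1 == sep2 and not any(c.isalnum() for c in sep1)
```

Under these assumptions 'yyyy-MM-dd', 'yyyyMMdd' and 'dd.MM.yyyy' are accepted, while 'yyyy-MM' is rejected because it would change the index granularity.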

@code98 code98 closed this as completed Dec 9, 2019
@code98
Author

code98 commented Dec 9, 2019

ok, thank you for your reply!

@Cogniser

Hi,
Could you please help me use this configuration in my Jaeger deployment? Currently my ES index format is jaeger-span-2020-10-20, and I want to make it jaeger-span-2020.10.20 (with a dot separator) so that our cronjob can clean the indices periodically.
I have deployed Jaeger in Kubernetes using the Jaeger Operator.
These are the image details extracted from the jaeger-collector pod description:
Image: jaegertracing/jaeger-collector:1.20.0
Below is the Jaeger YAML used:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: pie-jaeger
  namespace: observability
spec:
  strategy: production
  query:
    options:
      query:
        base-path: /jaeger
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://external-elastic-search-url
    esIndexCleaner:
      enabled: false
    dependencies:
      enabled: false          
  affinity: {}

I'm guessing we should provide the date format under the storage.options.es section, but I'm not sure which exact parameter I should provide.
Thank you in advance.
