Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logs dataset naming constraints #751

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented Jan 30, 2023

The specification for logging currently allows to have values in event.dataset that do not fit the data_stream.dataset.
There are two places where this is mentionned:

Checklist

  • do we have an existing normalization in place for service.name to ensure those constraints ?
  • find if we can define (or link existing) normalization: replacing with underscores seems a relevant option.
  • in case ECS logging is used, should we make the APM agents automatically normalize the values provided by the application ?

  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Merge after 2 business days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.
  • May the instrumentation collect sensitive information, such as secrets or PII (ex. in headers)?
    • Yes
      • Add a section to the spec how agents should apply sanitization (such as sanitize_field_names)
    • No
      • Why?
    • n/a
  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Approved by at least 2 agents + PM (if relevant)
  • Merge after 7 days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.
  • Create implementation issues through the meta issue template (this will automate issue creation for individual agents)
  • If this spec adds a new dynamic config option, add it to central config.

@apmmachine
Copy link

apmmachine commented Jan 30, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-03-31T04:52:00.028+0000

  • Duration: 3 min 18 sec

@felixbarny
Copy link
Member

find if we can define (or link existing) normalization: replacing with underscores seems a relevant option.

I've implemented sanitization in this PR which also links to the sources where the restrictions are coming from: https://github.com/elastic/elasticsearch/pull/76511/files#diff-932d6c78fbdb18c29397c722174ecc65925a218ff27245f017a6ff9c3fb539c8R42

in case ECS logging is used, should we make the APM agents automatically normalize the values provided by the application ?

I suppose we'll need to do sanitization at multiple places, like in the ECS loggers, and in APM Server. Not sure if we also need to sanitize in agents.

@SylvainJuge
Copy link
Member Author

Thanks for the pointer to the related change.

The new constraint (and automatic normalization) is added to Elasticsearch, thus anything that would write to it will either:

  • provide a "compliant" value, the provided value is written as-is.
  • do not provide a "compliant" value: then the written value is normalized.

From what I understand this limitation is mostly technical due to the fact that we need to fit the index naming patterns.

In the logs, we use this value to break-down the application logs, using a dotted-syntax.
For example, we might use tomcat.log or tomcat.access values for the tomcat general log and access log respectively.
In the case those would be set to tomcat/log or tomcat-access, then the normalized values still preserve the meaning with tomcat_log and tomcat_access, so while it's not 100% identical the end-user can accommodate.

👍 on doing this normalization at APM Server and ECS loggers level. APM Agents would implicitly implement it by relying on their respective ECS loggers.

@SylvainJuge
Copy link
Member Author

Hi @felixbarny , does this PR is still relevant to change in the agents ? Or is it something that would be better if handled directly in apm-server instead ?

@felixbarny
Copy link
Member

Not quite sure what's the best place to handle this, tbh. Maybe even in the ecs logger libs.

We'll probably also need to re-think the way we currently set event.dataset and data_stream.dataset when moving away from service-specific data streams. See also

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants