Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable Comment Filter to Support Multiple Programming Language Comment Formats #309

Merged
merged 31 commits into from
Aug 19, 2024

Conversation

lquerel
Copy link
Contributor

@lquerel lquerel commented Aug 9, 2024

This PR introduces a new, optional comment_formats section in the weaver.yaml file to specify the behavior of the comment filter. The primary objective is to create a highly configurable comment filter while maintaining simplicity in template usage.

# weaver.yaml
...
# Specify the configuration of the comment formats.  <-- new section
comment_formats:           # optional
  <format-name>:
    format: markdown|html

    # The following fields are enabled for both markdown and html formats
    # All these fields are optional.
    header: <string>                  # The comment header line (e.g., `/**`)
    prefix: <string>                  # The comment line prefix (e.g., ` * `)
    footer: <string>                  # The comment line footer (e.g., ` */`)
    trim: <bool>                      # Flag to trim the comment content (default: true). 
    remove_trailing_dots: <bool>      # Flag to remove trailing dots from the comment content (default: false).

    # The following fields are enabled only when format is set to 'markdown'
    escape_backslashes: <bool>            # Whether to escape backslashes in the markdown (default: false).
    shortcut_reference_links: <bool>      # Use this to convert inlined links into shortcut reference links, similar to those in Go documentation (default: false).
    indent_first_level_list_items: <bool> # Whether to indent the first level of list items in the markdown (default: false).
  
    # The following fields are enabled only when format is set to 'html'
    old_style_paragraph: <bool>       # Use old-style HTML paragraphs, i.e. single <p> tag (default: false)
    omit_closing_li: <bool>           # Omit closing </li> tags in lists (default: false)
    inline_code_snippet: <jinja-expr> # Jinja expression to render inline code (default: "<c>{{code}}</c>").
    block_code_snippet: <jinja-expr>  # Jinja expression to render block code (default: "<pre>\n{{code}}\n</pre>").
  <other-format-name>:
    ...
default_comment_format: <format-name>
...

With this configuration in place, the Java code generation could define the following setup (example):

# Example of configuration for Java
comment_formats:
  javadoc:
    format: html
    header: "/**"
    prefix: " * "
    footer: " */"
    old_style_paragraph: true
    omit_closing_li: false
    inline_code_snippet: "{@code {{code}}}"
    block_code_snippet: "<pre>{@code {{code}}}</pre>"
    trim: true
    remove_trailing_dots: true
  java:
    format: html
    prefix: "// "
    old_style_paragraph: true
    omit_closing_li: false
    inline_code_snippet: "{@code {{code}}}"
    block_code_snippet: "<pre>{@code {{code}}}</pre>"
    trim: true
    remove_trailing_dots: truedefault_comment_format: javadoc

If the note equals the following markdown (the whitespaces at the beginning and the multiple dots at the end are included just to demonstrate the effect of the trim and remove_trailing_dots flags):

  The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

When `error.type` is set to a type (e.g., an exception type), its
canonical class name identifying the type within the artifact SHOULD be used.

Instrumentations SHOULD document the list of errors they report.

The cardinality of `error.type` within one instrumentation library SHOULD be low.
Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
should be prepared for `error.type` to have high cardinality at query time when no
additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
it's RECOMMENDED to:

* Use a domain-specific attribute
* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not..  

Then creating a javadoc comment for the note is as simple as this:

...
  {{ attr.note | comment(indent=2) }}
...

The generated output is:

...
  /**
  * The {@code error.type} SHOULD be predictable, and SHOULD have low cardinality.
  * <p>
  * When {@code error.type} is set to a type (e.g., an exception type), its
  * canonical class name identifying the type within the artifact SHOULD be used.
  * <p>
  * Instrumentations SHOULD document the list of errors they report.
  * <p>
  * The cardinality of {@code error.type} within one instrumentation library SHOULD be low.
  * Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
  * should be prepared for {@code error.type} to have high cardinality at query time when no
  * additional filters are applied.
  * <p>
  * If the operation has completed successfully, instrumentations SHOULD NOT set {@code error.type}.
  * <p>
  * If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
  * it's RECOMMENDED to:
  * <p>
  * <ul>
  *   <li>Use a domain-specific attribute
  *   <li>Set {@code error.type} to capture all errors, regardless of whether they are defined within the domain-specific set or not
  * </ul>
  */
...  

The comment filter currently supports the following optional parameters:

  • format: A valid ID from the comment_formats configuration map.
  • header: A custom header for the comment block.
  • prefix: A custom prefix for each comment line.
  • footer: A custom footer for the comment block.
  • indent: Number of spaces to add before each comment line for indentation purposes.

The comment filter processes input data as follows:

  • An undefined input is ignored. For example, {{ attr.note | comment(indent="2") }} will not fail and will not produce any output if attr.note doesn't exist in the context.
  • If the input is an array, each item in the array is treated as a separate comment, with line breaks separating them.
  • For all other cases, the input is converted into a string and processed according to the rules mentioned above.

This configuration approach is designed to be extendable, allowing us to support more languages or nuances in code documentation. Therefore, I would appreciate your feedback before finalizing this PR to ensure all critical features are covered. If you notice any missing configuration options for your language, please leave a comment in this PR thread specifying the render or transform options you’d like to see included (@jsuereth , @lmolkova , @joaopgrassi , @trisch-me , @MadVikingGod , @bryannaegele ).

Task list:

  • Implement comment_formats config options
  • Implement default_comment_format
  • Implement HTML rendering based comment format config
  • Support configuration overrides
  • Provide comment config examples for Java, Go, Python, Rust in the root weaver.yaml file
  • Update documentation
  • Update Rust codegen example
  • Write tests

Closes #291

@lquerel lquerel self-assigned this Aug 9, 2024
@lquerel lquerel added the enhancement New feature or request label Aug 9, 2024
@bryannaegele
Copy link
Contributor

We use markdown so it doesn't look like anything affects us. Would it make it cleaner though to put language (html|markdown) formatter options under their own keys for clarity?

@lquerel
Copy link
Contributor Author

lquerel commented Aug 14, 2024

@bryannaegele Yes, for languages that natively support Markdown, this might be less relevant. However, transform_options can still be useful for cleaning up comments (e.g., trim, remove_trailing_dots). Regarding the output format, I’ll add an optional parameter to the comment filter to explicitly specify the output format (html or markdown, with markdown as the default). Additionally, there’s a similar field in the configuration to specify the format (see render_options.format, which also defaults to markdown). Does this address your concern, or did you have something else in mind?

@lmolkova
Copy link
Contributor

Thanks a lot for putting it up, looks great, I just have a couple of minor comments:

  block_code_snippet: <jinja-expr>  # Jinja expression to render block code (default: "<code>{{code}}</code>")

do we have some specific case where it's necessary? Also, would it be based on ``` ? My usual reactions is to not add things until they are necessary (not a strong position).

transform_options:

would it escape special characters? (there is a need for it - dns query is a good test case)

@lquerel
Copy link
Contributor Author

lquerel commented Aug 15, 2024

  block_code_snippet: <jinja-expr>  # Jinja expression to render block code (default: "<code>{{code}}</code>")

do we have some specific case where it's necessary? Also, would it be based on ``` ? My usual reactions is to not add things until they are necessary (not a strong position).

Yes, it’s based on ```. While we don’t have an immediate use case, identifying it now allows us to use precise terminology and avoid naming issues should the need arise in the future. Additionally, its implementation requires just a few lines of code, so I’m inclined to include it this time. However, I agree that, in general, we should avoid implementing features without a clear requirement.

transform_options:

would it escape special characters? (there is a need for it - dns query is a good test case)

Thanks, I’ll create a test case using your example to ensure that special characters are properly handled.

crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
crates/weaver_forge/src/formats/markdown.rs Fixed Show fixed Hide fixed
@jsuereth
Copy link
Contributor

The only question I have - specific to java doc needs, is if we can merge "brief" and "note" into the comment.

Example jinja template:

{{ join([attribute.brief, attribute.note], "<br/>") | comment }}

Or, what Java actually wants to do:

/**
 * {{ brief | embedded_comment }}
 *
 * {% for note in notes %}<li>{{ note | embedded_comment }}</li>{% endfor %}
 */

@lquerel
Copy link
Contributor Author

lquerel commented Aug 16, 2024

The only question I have - specific to java doc needs, is if we can merge "brief" and "note" into the comment.

Example jinja template:

{{ join([attribute.brief, attribute.note], "<br/>") | comment }}

Yes, and without the join, as this new comment filter also natively supports input sequences. Undefined items in the sequence are also ignored (e.g., if note is not defined), and trailing newlines are removed, which is great when we can place the optional part at the end of the sequence.

{{ [attr.brief, "\n", attr.note] | comment }}

For the Jinja expression above, the following semconv attribute...

      - id: error.type
        stability: stable
        brief: >
          Describes a class of error the operation ended with.
        type:
          members:
            - id: other
              value: "_OTHER"
              stability: stable
              brief: >
                A fallback error value to be used when the instrumentation doesn't define a custom value.
        examples: ['timeout', 'java.net.UnknownHostException', 'server_certificate_invalid', '500']
        note: |
          The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

          When `error.type` is set to a type (e.g., an exception type), its
          canonical class name identifying the type within the artifact SHOULD be used.

          Instrumentations SHOULD document the list of errors they report.

          The cardinality of `error.type` within one instrumentation library SHOULD be low.
          Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
          should be prepared for `error.type` to have high cardinality at query time when no
          additional filters are applied.

          If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

          If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
          it's RECOMMENDED to:

          * Use a domain-specific attribute
          * Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not.

...generates the following Javadoc

  /**
   * Describes a class of error the operation ended with.
   * <p>
   * The {@code error.type} SHOULD be predictable, and SHOULD have low cardinality.
   * <p>
   * When {@code error.type} is set to a type (e.g., an exception type), its
   * canonical class name identifying the type within the artifact SHOULD be used.
   * <p>
   * Instrumentations SHOULD document the list of errors they report.
   * <p>
   * The cardinality of {@code error.type} within one instrumentation library SHOULD be low.
   * Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
   * should be prepared for {@code error.type} to have high cardinality at query time when no
   * additional filters are applied.
   * <p>
   * If the operation has completed successfully, instrumentations SHOULD NOT set {@code error.type}.
   * <p>
   * If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
   * it's RECOMMENDED to:
   * <p>
   * <ul>
   *   <li>Use a domain-specific attribute
   *   <li>Set {@code error.type} to capture all errors, regardless of whether they are defined within the domain-specific set or not
   * </ul>
   */

Or, what Java actually wants to do:

/**
 * {{ brief | embedded_comment }}
 *
 * {% for note in notes %}<li>{{ note | embedded_comment }}</li>{% endfor %}
 */

Not quite sure to understand this part of the comment. Are you talking about note or examples ?

@lquerel lquerel marked this pull request as ready for review August 17, 2024 00:05
@lquerel lquerel requested a review from a team August 17, 2024 00:05
@lquerel lquerel changed the title [WIP] Generic comment filter Configurable Comment Filter to Support Multiple Programming Language Comment Formats Aug 17, 2024
@jerbly
Copy link
Contributor

jerbly commented Aug 17, 2024

If the note includes a markdown block using triple backticks like:

Something pre-formatted

Then for the rust 🦀 this needs to have ignore as a suffix on the opening set of triple back-ticks.
I wrote some jinja to handle this in our internal company semantic conventions. (There are no examples of this in the otel library of semantic conventions currently).

{%- set ns = namespace(in_code_block=False) %}
{%- for line in attribute.note.split('\n') %}
{%- if line.startswith('```') and not ns.in_code_block %}
/// {% filter escape %}{{line}}ignore{% endfilter %}
{%- set ns.in_code_block = true %}
{%- elif line.startswith('```') and ns.in_code_block %}
/// {% filter escape %}{{line}}{% endfilter %}
{%- set ns.in_code_block = false %}
{%- else %}
/// {% filter escape %}{{line}}{% endfilter %}
{%- endif %}
{%- endfor %}

Not sure how this could be achieved with this comment filter.

@jsuereth
Copy link
Contributor

Not quite sure to understand this part of the comment. Are you talking about note or examples ?

I was thinking of the description of a Metric where we may need to pull notes from attributes. I think what you have now solves all the issues in Java's codegen today

@lquerel
Copy link
Contributor Author

lquerel commented Aug 17, 2024

@jerbly

If the note includes a markdown block using triple backticks like:

Something pre-formatted

The current implementation supports language specifications for code blocks. If the semconv definition includes the ignore directive, it should be incorporated into the Rust code generation. Additionally, a possible enhancement could be to automatically add ignore when the language is not defined in the original markdown code block, based on a configuration option. Below is the Rust code I am currently using for Markdown generation so it's easy to add include for the None branch.

            Node::Code(code) => match &code.lang {
                Some(lang) => {
                    ctx.markdown
                        .push_str(&format!("```{}\n{}\n```\n", lang, code.value));
                }
                None => {
                    ctx.markdown
                        .push_str(&format!("```\n{}\n```\n", code.value));
                }
            }

@lquerel
Copy link
Contributor Author

lquerel commented Aug 17, 2024

@jsuereth

Not quite sure to understand this part of the comment. Are you talking about note or examples ?

I was thinking of the description of a Metric where we may need to pull notes from attributes. I think what you have now solves all the issues in Java's codegen today

One possible approach for the metrics is to explore the feasibility of using a macro that returns the part of the comment derived from the attribute notes, or undefined when there is nothing to comment. This way, we could use the following statement to create the comment for the corresponding metric.

{{ [metric.brief, "\n", metric.note, "\n", metric_attrs_comment(metric)] | comment }}

Similarly, we could introduce a Jinja function that returns formatted text or undefined if the parameter doesn’t exist., e.g. {{ [metric.brief, comment_section("Note:", metric.note), "\n", metric_attrs_comment(metric)] | comment }}. I’m not entirely certain about the name of this comment_section function, but this is just to demonstrate the concept that we could develop to create complex comments with optional sections.

@jerbly
Copy link
Contributor

jerbly commented Aug 17, 2024

@lquerel

Additionally, a possible enhancement could be to automatically add ignore when the language is not defined in the original markdown code block, based on a configuration option.

I don't think it's an enhancement, it's essential to create correct rust.

@lquerel
Copy link
Contributor Author

lquerel commented Aug 17, 2024

@jerbly

I don't think it's an enhancement, it's essential to create correct rust.

Since this filter already supports the possibility of having a code block with the ‘ignore’ directive, I would say that allowing it to be added by default when it is not specified falls under the category of an enhancement :-). In any case, I will probably add this option before merging this PR.

@lquerel
Copy link
Contributor Author

lquerel commented Aug 18, 2024

@jerbly I added the Markdown option default_block_code_language, which can be set to ignore for Rust comments.

@lquerel
Copy link
Contributor Author

lquerel commented Aug 18, 2024

@jsuereth, for complex comments composed of multiple sections that should only be generated when certain fields are defined, we could introduce the function concat_if. By default, this function concatenates all its arguments and returns a string or undefined if any of its operands are undefined.

{{ [metric.brief, concat_if("\nNote: ", metric.note)] | comment }}

In this statement, if metric.note is not defined, the comment will only contain metric.brief. If metric.note is defined, the comment will include the brief, followed by "Note: ", and then the content of the note. What do you think?

@lquerel lquerel merged commit 6dd0c8d into open-telemetry:main Aug 19, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
5 participants