Configurable Comment Filter to Support Multiple Programming Language Comment Formats #309

lquerel · 2024-08-09T23:26:47Z

This PR introduces a new, optional comment_formats section in the weaver.yaml file to specify the behavior of the comment filter. The primary objective is to create a highly configurable comment filter while maintaining simplicity in template usage.

# weaver.yaml
...
# Specify the configuration of the comment formats.  <-- new section
comment_formats:           # optional
  <format-name>:
    format: markdown|html

    # The following fields are enabled for both markdown and html formats
    # All these fields are optional.
    header: <string>                  # The comment header line (e.g., `/**`)
    prefix: <string>                  # The comment line prefix (e.g., ` * `)
    footer: <string>                  # The comment line footer (e.g., ` */`)
    trim: <bool>                      # Flag to trim the comment content (default: true). 
    remove_trailing_dots: <bool>      # Flag to remove trailing dots from the comment content (default: false).

    # The following fields are enabled only when format is set to 'markdown'
    escape_backslashes: <bool>            # Whether to escape backslashes in the markdown (default: false).
    shortcut_reference_links: <bool>      # Use this to convert inlined links into shortcut reference links, similar to those in Go documentation (default: false).
    indent_first_level_list_items: <bool> # Whether to indent the first level of list items in the markdown (default: false).
  
    # The following fields are enabled only when format is set to 'html'
    old_style_paragraph: <bool>       # Use old-style HTML paragraphs, i.e. single <p> tag (default: false)
    omit_closing_li: <bool>           # Omit closing </li> tags in lists (default: false)
    inline_code_snippet: <jinja-expr> # Jinja expression to render inline code (default: "<c>{{code}}</c>").
    block_code_snippet: <jinja-expr>  # Jinja expression to render block code (default: "<pre>\n{{code}}\n</pre>").
  <other-format-name>:
    ...
default_comment_format: <format-name>
...

With this configuration in place, the Java code generation could define the following setup (example):

# Example of configuration for Java
comment_formats:
  javadoc:
    format: html
    header: "/**"
    prefix: " * "
    footer: " */"
    old_style_paragraph: true
    omit_closing_li: false
    inline_code_snippet: "{@code {{code}}}"
    block_code_snippet: "<pre>{@code {{code}}}</pre>"
    trim: true
    remove_trailing_dots: true
  java:
    format: html
    prefix: "// "
    old_style_paragraph: true
    omit_closing_li: false
    inline_code_snippet: "{@code {{code}}}"
    block_code_snippet: "<pre>{@code {{code}}}</pre>"
    trim: true
    remove_trailing_dots: truedefault_comment_format: javadoc

If the note equals the following markdown (the whitespaces at the beginning and the multiple dots at the end are included just to demonstrate the effect of the trim and remove_trailing_dots flags):

  The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

When `error.type` is set to a type (e.g., an exception type), its
canonical class name identifying the type within the artifact SHOULD be used.

Instrumentations SHOULD document the list of errors they report.

The cardinality of `error.type` within one instrumentation library SHOULD be low.
Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
should be prepared for `error.type` to have high cardinality at query time when no
additional filters are applied.

If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
it's RECOMMENDED to:

* Use a domain-specific attribute
* Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not..

Then creating a javadoc comment for the note is as simple as this:

...
  {{ attr.note | comment(indent=2) }}
...

The generated output is:

...
  /**
  * The {@code error.type} SHOULD be predictable, and SHOULD have low cardinality.
  * <p>
  * When {@code error.type} is set to a type (e.g., an exception type), its
  * canonical class name identifying the type within the artifact SHOULD be used.
  * <p>
  * Instrumentations SHOULD document the list of errors they report.
  * <p>
  * The cardinality of {@code error.type} within one instrumentation library SHOULD be low.
  * Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
  * should be prepared for {@code error.type} to have high cardinality at query time when no
  * additional filters are applied.
  * <p>
  * If the operation has completed successfully, instrumentations SHOULD NOT set {@code error.type}.
  * <p>
  * If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
  * it's RECOMMENDED to:
  * <p>
  * <ul>
  *   <li>Use a domain-specific attribute
  *   <li>Set {@code error.type} to capture all errors, regardless of whether they are defined within the domain-specific set or not
  * </ul>
  */
...

The comment filter currently supports the following optional parameters:

format: A valid ID from the comment_formats configuration map.
header: A custom header for the comment block.
prefix: A custom prefix for each comment line.
footer: A custom footer for the comment block.
indent: Number of spaces to add before each comment line for indentation purposes.

The comment filter processes input data as follows:

An undefined input is ignored. For example, {{ attr.note | comment(indent="2") }} will not fail and will not produce any output if attr.note doesn't exist in the context.
If the input is an array, each item in the array is treated as a separate comment, with line breaks separating them.
For all other cases, the input is converted into a string and processed according to the rules mentioned above.

This configuration approach is designed to be extendable, allowing us to support more languages or nuances in code documentation. Therefore, I would appreciate your feedback before finalizing this PR to ensure all critical features are covered. If you notice any missing configuration options for your language, please leave a comment in this PR thread specifying the render or transform options you’d like to see included (@jsuereth , @lmolkova , @joaopgrassi , @trisch-me , @MadVikingGod , @bryannaegele ).

Task list:

Implement comment_formats config options
Implement default_comment_format
Implement HTML rendering based comment format config
Support configuration overrides
Provide comment config examples for Java, Go, Python, Rust in the root weaver.yaml file
Update documentation
Update Rust codegen example
Write tests

Closes #291

crates/weaver_forge/src/config.rs

bryannaegele · 2024-08-14T15:14:17Z

We use markdown so it doesn't look like anything affects us. Would it make it cleaner though to put language (html|markdown) formatter options under their own keys for clarity?

lquerel · 2024-08-14T15:38:03Z

@bryannaegele Yes, for languages that natively support Markdown, this might be less relevant. However, transform_options can still be useful for cleaning up comments (e.g., trim, remove_trailing_dots). Regarding the output format, I’ll add an optional parameter to the comment filter to explicitly specify the output format (html or markdown, with markdown as the default). Additionally, there’s a similar field in the configuration to specify the format (see render_options.format, which also defaults to markdown). Does this address your concern, or did you have something else in mind?

lmolkova · 2024-08-15T00:20:21Z

Thanks a lot for putting it up, looks great, I just have a couple of minor comments:

  block_code_snippet: <jinja-expr>  # Jinja expression to render block code (default: "<code>{{code}}</code>")

do we have some specific case where it's necessary? Also, would it be based on ``` ? My usual reactions is to not add things until they are necessary (not a strong position).

transform_options:

would it escape special characters? (there is a need for it - dns query is a good test case)

lquerel · 2024-08-15T05:19:16Z

  block_code_snippet: <jinja-expr>  # Jinja expression to render block code (default: "<code>{{code}}</code>")
do we have some specific case where it's necessary? Also, would it be based on ``` ? My usual reactions is to not add things until they are necessary (not a strong position).

Yes, it’s based on ```. While we don’t have an immediate use case, identifying it now allows us to use precise terminology and avoid naming issues should the need arise in the future. Additionally, its implementation requires just a few lines of code, so I’m inclined to include it this time. However, I agree that, in general, we should avoid implementing features without a clear requirement.

transform_options:

would it escape special characters? (there is a need for it - dns query is a good test case)

Thanks, I’ll create a test case using your example to ensure that special characters are properly handled.

crates/weaver_forge/src/formats/markdown.rs

jsuereth · 2024-08-15T20:56:19Z

The only question I have - specific to java doc needs, is if we can merge "brief" and "note" into the comment.

Example jinja template:

{{ join([attribute.brief, attribute.note], "<br/>") | comment }}

Or, what Java actually wants to do:

/**
 * {{ brief | embedded_comment }}
 *
 * {% for note in notes %}<li>{{ note | embedded_comment }}</li>{% endfor %}
 */

lquerel · 2024-08-16T05:02:58Z

The only question I have - specific to java doc needs, is if we can merge "brief" and "note" into the comment.

Example jinja template:
{{ join([attribute.brief, attribute.note], "<br/>") | comment }}

Yes, and without the join, as this new comment filter also natively supports input sequences. Undefined items in the sequence are also ignored (e.g., if note is not defined), and trailing newlines are removed, which is great when we can place the optional part at the end of the sequence.

{{ [attr.brief, "\n", attr.note] | comment }}

For the Jinja expression above, the following semconv attribute...

      - id: error.type
        stability: stable
        brief: >
          Describes a class of error the operation ended with.
        type:
          members:
            - id: other
              value: "_OTHER"
              stability: stable
              brief: >
                A fallback error value to be used when the instrumentation doesn't define a custom value.
        examples: ['timeout', 'java.net.UnknownHostException', 'server_certificate_invalid', '500']
        note: |
          The `error.type` SHOULD be predictable, and SHOULD have low cardinality.

          When `error.type` is set to a type (e.g., an exception type), its
          canonical class name identifying the type within the artifact SHOULD be used.

          Instrumentations SHOULD document the list of errors they report.

          The cardinality of `error.type` within one instrumentation library SHOULD be low.
          Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
          should be prepared for `error.type` to have high cardinality at query time when no
          additional filters are applied.

          If the operation has completed successfully, instrumentations SHOULD NOT set `error.type`.

          If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
          it's RECOMMENDED to:

          * Use a domain-specific attribute
          * Set `error.type` to capture all errors, regardless of whether they are defined within the domain-specific set or not.

...generates the following Javadoc

  /**
   * Describes a class of error the operation ended with.
   * <p>
   * The {@code error.type} SHOULD be predictable, and SHOULD have low cardinality.
   * <p>
   * When {@code error.type} is set to a type (e.g., an exception type), its
   * canonical class name identifying the type within the artifact SHOULD be used.
   * <p>
   * Instrumentations SHOULD document the list of errors they report.
   * <p>
   * The cardinality of {@code error.type} within one instrumentation library SHOULD be low.
   * Telemetry consumers that aggregate data from multiple instrumentation libraries and applications
   * should be prepared for {@code error.type} to have high cardinality at query time when no
   * additional filters are applied.
   * <p>
   * If the operation has completed successfully, instrumentations SHOULD NOT set {@code error.type}.
   * <p>
   * If a specific domain defines its own set of error identifiers (such as HTTP or gRPC status codes),
   * it's RECOMMENDED to:
   * <p>
   * <ul>
   *   <li>Use a domain-specific attribute
   *   <li>Set {@code error.type} to capture all errors, regardless of whether they are defined within the domain-specific set or not
   * </ul>
   */

Or, what Java actually wants to do:

/**
 * {{ brief | embedded_comment }}
 *
 * {% for note in notes %}<li>{{ note | embedded_comment }}</li>{% endfor %}
 */

Not quite sure to understand this part of the comment. Are you talking about note or examples ?

jerbly · 2024-08-17T12:18:10Z

If the note includes a markdown block using triple backticks like:

Something pre-formatted

Then for the rust 🦀 this needs to have ignore as a suffix on the opening set of triple back-ticks.
I wrote some jinja to handle this in our internal company semantic conventions. (There are no examples of this in the otel library of semantic conventions currently).

{%- set ns = namespace(in_code_block=False) %}
{%- for line in attribute.note.split('\n') %}
{%- if line.startswith('```') and not ns.in_code_block %}
/// {% filter escape %}{{line}}ignore{% endfilter %}
{%- set ns.in_code_block = true %}
{%- elif line.startswith('```') and ns.in_code_block %}
/// {% filter escape %}{{line}}{% endfilter %}
{%- set ns.in_code_block = false %}
{%- else %}
/// {% filter escape %}{{line}}{% endfilter %}
{%- endif %}
{%- endfor %}

Not sure how this could be achieved with this comment filter.

jsuereth · 2024-08-17T13:03:51Z

Not quite sure to understand this part of the comment. Are you talking about note or examples ?

I was thinking of the description of a Metric where we may need to pull notes from attributes. I think what you have now solves all the issues in Java's codegen today

lquerel · 2024-08-17T15:38:39Z

@jerbly

If the note includes a markdown block using triple backticks like:
Something pre-formatted

The current implementation supports language specifications for code blocks. If the semconv definition includes the ignore directive, it should be incorporated into the Rust code generation. Additionally, a possible enhancement could be to automatically add ignore when the language is not defined in the original markdown code block, based on a configuration option. Below is the Rust code I am currently using for Markdown generation so it's easy to add include for the None branch.

            Node::Code(code) => match &code.lang {
                Some(lang) => {
                    ctx.markdown
                        .push_str(&format!("```{}\n{}\n```\n", lang, code.value));
                }
                None => {
                    ctx.markdown
                        .push_str(&format!("```\n{}\n```\n", code.value));
                }
            }

lquerel · 2024-08-17T16:51:35Z

@jsuereth

Not quite sure to understand this part of the comment. Are you talking about note or examples ?

I was thinking of the description of a Metric where we may need to pull notes from attributes. I think what you have now solves all the issues in Java's codegen today

One possible approach for the metrics is to explore the feasibility of using a macro that returns the part of the comment derived from the attribute notes, or undefined when there is nothing to comment. This way, we could use the following statement to create the comment for the corresponding metric.

{{ [metric.brief, "\n", metric.note, "\n", metric_attrs_comment(metric)] | comment }}

Similarly, we could introduce a Jinja function that returns formatted text or undefined if the parameter doesn’t exist., e.g. {{ [metric.brief, comment_section("Note:", metric.note), "\n", metric_attrs_comment(metric)] | comment }}. I’m not entirely certain about the name of this comment_section function, but this is just to demonstrate the concept that we could develop to create complex comments with optional sections.

jerbly · 2024-08-17T20:39:11Z

@lquerel

Additionally, a possible enhancement could be to automatically add ignore when the language is not defined in the original markdown code block, based on a configuration option.

I don't think it's an enhancement, it's essential to create correct rust.

lquerel · 2024-08-17T23:58:04Z

@jerbly

I don't think it's an enhancement, it's essential to create correct rust.

Since this filter already supports the possibility of having a code block with the ‘ignore’ directive, I would say that allowing it to be added by default when it is not specified falls under the category of an enhancement :-). In any case, I will probably add this option before merging this PR.

lquerel · 2024-08-18T05:47:24Z

@jerbly I added the Markdown option default_block_code_language, which can be set to ignore for Rust comments.

lquerel · 2024-08-18T06:32:39Z

@jsuereth, for complex comments composed of multiple sections that should only be generated when certain fields are defined, we could introduce the function concat_if. By default, this function concatenates all its arguments and returns a string or undefined if any of its operands are undefined.

{{ [metric.brief, concat_if("\nNote: ", metric.note)] | comment }}

In this statement, if metric.note is not defined, the comment will only contain metric.brief. If metric.note is defined, the comment will include the brief, followed by "Note: ", and then the content of the note. What do you think?

lquerel added 2 commits August 9, 2024 15:49

feat(config): Add comment_formats section in weaver.yaml

bcbf10c

feat(forge): Add generic comment

be381f7

lquerel self-assigned this Aug 9, 2024

lquerel added the enhancement New feature or request label Aug 9, 2024

lquerel added 5 commits August 11, 2024 11:10

feat(forge): Add generic comment

f8848c1

Merge branch 'main' into comment-filter

4a9a5ae

feat(forge): Clean up comment

5b6fa21

Merge remote-tracking branch 'origin/comment-filter' into comment-filter

1ae88ac

Merge branch 'main' into comment-filter

7c3a9e7

github-advanced-security bot found potential problems Aug 13, 2024

View reviewed changes

crates/weaver_forge/src/config.rs Fixed Show fixed Hide fixed

lquerel added 3 commits August 13, 2024 21:16

feat(forge): Test HtmlRenderer

b187753

feat(forge): Add line prefix and format parameter

d5465d4

Merge branch 'main' into comment-filter

e474fb7

jsuereth mentioned this pull request Aug 14, 2024

comment_with_prefix should normalize the content #266

Closed

feat(forge): Add tests for Java, Rust, Go, Python [WIP]

863bb6c

lquerel added 2 commits August 15, 2024 10:38

feat(forge): Add rendering options for markdown comment format

83bd766

Merge remote-tracking branch 'origin/comment-filter' into comment-filter

1d7e79d

github-advanced-security bot found potential problems Aug 15, 2024

View reviewed changes

feat(forge): Improve both markdown and html comment rendering

e23a432

lquerel added 5 commits August 16, 2024 09:32

feat(forge): Fix all tests and Clippy lints

9f372c7

Merge branch 'main' into comment-filter

1c6cbe4

feat(forge): Simplify structure of comment_format

cc7cbba

feat(forge): Improve markdown and html formating

4cd6957

feat(forge): Update doc weaver-config.md

8e6b474

lquerel added 2 commits August 16, 2024 16:49

feat(forge): Update documation of the Forge crate

ac4532e

feat(forge): Update the Rust examples

485c199

lquerel marked this pull request as ready for review August 17, 2024 00:05

lquerel requested a review from a team August 17, 2024 00:05

lquerel changed the title ~~[WIP] Generic comment filter~~ Configurable Comment Filter to Support Multiple Programming Language Comment Formats Aug 17, 2024

chore(build): Fix Clippy lint issue

5624b63

lquerel mentioned this pull request Aug 17, 2024

Jinja code generation helpers suggestions #203

Closed

lquerel added 2 commits August 16, 2024 23:05

chore(doc): Update forge doc

0101902

chore(doc): Update forge doc

30d252b

Merge branch 'main' into comment-filter

35be6c1

lquerel added 2 commits August 17, 2024 22:44

feature(forge): Add markdown option default_block_code_language

626fc5b

Merge remote-tracking branch 'origin/comment-filter' into comment-filter

39ded98

jsuereth approved these changes Aug 19, 2024

View reviewed changes

lquerel added 4 commits August 19, 2024 15:42

feature(forge): Add concat_if Jinja function

4b161af

chore(build): Prepare release 0.9.0

0aeefe7

chore(build): Prepare release 0.9.0 - Fix typos

7a5bb45

chore(build): Prepare release 0.9.0 - Fix typos

f34d772

lquerel merged commit 6dd0c8d into open-telemetry:main Aug 19, 2024
21 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configurable Comment Filter to Support Multiple Programming Language Comment Formats #309

Configurable Comment Filter to Support Multiple Programming Language Comment Formats #309

lquerel commented Aug 9, 2024 •

edited

Loading

bryannaegele commented Aug 14, 2024

lquerel commented Aug 14, 2024

lmolkova commented Aug 15, 2024

lquerel commented Aug 15, 2024

jsuereth commented Aug 15, 2024

lquerel commented Aug 16, 2024 •

edited

Loading

jerbly commented Aug 17, 2024

jsuereth commented Aug 17, 2024

lquerel commented Aug 17, 2024 •

edited

Loading

lquerel commented Aug 17, 2024 •

edited

Loading

jerbly commented Aug 17, 2024 •

edited

Loading

lquerel commented Aug 17, 2024

lquerel commented Aug 18, 2024

lquerel commented Aug 18, 2024

Configurable Comment Filter to Support Multiple Programming Language Comment Formats #309

Configurable Comment Filter to Support Multiple Programming Language Comment Formats #309

Conversation

lquerel commented Aug 9, 2024 • edited Loading

bryannaegele commented Aug 14, 2024

lquerel commented Aug 14, 2024

lmolkova commented Aug 15, 2024

lquerel commented Aug 15, 2024

jsuereth commented Aug 15, 2024

lquerel commented Aug 16, 2024 • edited Loading

jerbly commented Aug 17, 2024

jsuereth commented Aug 17, 2024

lquerel commented Aug 17, 2024 • edited Loading

lquerel commented Aug 17, 2024 • edited Loading

jerbly commented Aug 17, 2024 • edited Loading

lquerel commented Aug 17, 2024

lquerel commented Aug 18, 2024

lquerel commented Aug 18, 2024

lquerel commented Aug 9, 2024 •

edited

Loading

lquerel commented Aug 16, 2024 •

edited

Loading

lquerel commented Aug 17, 2024 •

edited

Loading

lquerel commented Aug 17, 2024 •

edited

Loading

jerbly commented Aug 17, 2024 •

edited

Loading