Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Support for locale #32978

Closed
michalpristas opened this issue May 10, 2024 · 5 comments
Closed

[pkg/ottl] Support for locale #32978

michalpristas opened this issue May 10, 2024 · 5 comments

Comments

@michalpristas
Copy link
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

When timestamp is in non-english language normal golang parsing will fail.

example:
when the tool i'm observing is producing Italian logs in a form of mercoledì set 4 2024 parsing with a layout Monday Jan 2 2006 should yield 4th September 2024 correctly

Describe the solution you'd like

Depending on #32977

DoD:

  • Add an ottl.Optional[string] argument to Time converter named locale.
  • Check locale is valid and we have a mapping for it at startup not in expression function
  • Date in non-english locales are parsed correctly
  • Support for different formats e.g stdLongWeekDay, stdWeekDay
  • Unit tests verifying known locales

Describe alternatives you've considered

No response

Additional context

No response

@michalpristas michalpristas added enhancement New feature or request needs triage New item requiring triage labels May 10, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Copy link
Member

Is this closed by #32140?

@TylerHelmuth TylerHelmuth removed the needs triage New item requiring triage label May 10, 2024
@michalpristas
Copy link
Contributor Author

michalpristas commented May 13, 2024

no i extracted it outside of that issue. to be more granular
i kept that one about timezones, and this one about locales (languages)

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 15, 2024
TylerHelmuth pushed a commit that referenced this issue Sep 6, 2024
…34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](#32977)
is a following up to
#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024
…pen-telemetry#34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](open-telemetry#32977)
is a following up to
open-telemetry#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](open-telemetry#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
open-telemetry#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 13, 2024
TylerHelmuth pushed a commit that referenced this issue Sep 20, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Added support for locale in the `Time` converter, so it can parse
timestamps written in non-english languages.

The new `locale` parameter's value is optional, and can be specified as:
`Time("Febrero 25 lunes, 2002, 02:03:04 p.m.", "%B %d %A, %Y, %r",
"America/New_York", "es-ES")`

The value must be a well-formed BCP-47 language tag, and a known
[CLDR](https://cldr.unicode.org) v45 locale.

**Link to tracking Issue:**
#32978

**Testing:** Unit tests

**Documentation:** ottl/README was updated to include the new optional
`locale` parameter.
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…pen-telemetry#34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](open-telemetry#32977)
is a following up to
open-telemetry#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](open-telemetry#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
open-telemetry#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…ry#35107)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Added support for locale in the `Time` converter, so it can parse
timestamps written in non-english languages.

The new `locale` parameter's value is optional, and can be specified as:
`Time("Febrero 25 lunes, 2002, 02:03:04 p.m.", "%B %d %A, %Y, %r",
"America/New_York", "es-ES")`

The value must be a well-formed BCP-47 language tag, and a known
[CLDR](https://cldr.unicode.org) v45 locale.

**Link to tracking Issue:**
open-telemetry#32978

**Testing:** Unit tests

**Documentation:** ottl/README was updated to include the new optional
`locale` parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants