Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Duration.from_iso8601/1 #13473

Merged
merged 20 commits into from
May 27, 2024

Conversation

tfiedlerdejanze
Copy link
Contributor

@tfiedlerdejanze tfiedlerdejanze commented Apr 4, 2024

[Proposal] This is a draft for a proposal which may become relevant in the future if the community finds it beneficial.

This PR implements a parser for ISO 8601-2 formatted duration strings to create Duration.t structs. The extended specification is relatively lose and can be found here. One interesting discussion around the standard can be found here.

The current state of the PR:

  • supports fractional seconds
  • supports sign prefixes -P1D, -P1M-3D and P1M-3D
  • introduces the ~P sigil as ~P[PY4]. While the second P may seem redundant it is more intuitive to use with sign prefixes and allows to easily copy/paste duration strings to or from a sigil, just like the other calendar ISO formatted date/time sigils

Copy link
Contributor

@sabiwara sabiwara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely in favor of having the parse function. Perhaps we could name it from_iso8601 in order to be consistent with other calendar methods, WDYT?

Regarding the sigil, it is a bit cryptic indeed.
I've been thinking, if we prefer something more spelled out, an alternative could be a macro rather than a sigil.
Kernel.duration/1, basically would work like new! but would run at compile time: duration(month: 1, minute: 1).

lib/elixir/lib/calendar/duration.ex Outdated Show resolved Hide resolved
@tfiedlerdejanze tfiedlerdejanze changed the title add Duration.parse/1 and sigil_P add Duration.from_iso8601/1 Apr 4, 2024
@tfiedlerdejanze
Copy link
Contributor Author

I've dropped the sigil for now, and i like the suggestion of adding Kernel.duration/1 to get some compile time safety when using duration strings though!

@wojtekmach
Copy link
Member

duration/1 macro is a nice idea. It’s just couple characters less than either %Duration{…} or Duration.new!(…) but doing things at compile time is potentially appealing. (Eg if all kwargs are literals call Duration.new!/1 at compile time.)

Such macro would have pretty strong record vibes and in general be pretty uncommon FWIW.

@wojtekmach
Copy link
Member

And I think it’d be a closer call whether to support both or just the latter:

DateTime.utc_ago(hour: 42)
DateTime.utc_ago(duration(hour: 42))

@sabiwara
Copy link
Contributor

sabiwara commented Apr 4, 2024

duration/1 macro is a nice idea. It’s just couple characters less than either %Duration{…} or Duration.new!(…) but doing things at compile time is potentially appealing. (Eg if all kwargs are literals call Duration.new!/1 at compile time.)

Also in addition, there is also the fact it could be used in pattern-matching.

Such macro would have pretty strong record vibes and in general be pretty uncommon FWIW.

I agree, although sigils have been the typical way of achieving this which are technically macros, but unlike other calendar structs I'm not sure a sigil would feel very natural. And there's .. for ranges also, which is also a macro but has an operator.
Anyway, I'm fine with the sigil too, was just trying to think of an alternative :)

@josevalim
Copy link
Member

but doing things at compile time is potentially appealing

There is something the Erlang compiler does, and maybe we should start doing it too, which is to inline pure functions in stdlib. If we see Duration.new! given with only literals or the second argument in Date.shift, we could automatically convert to a duration at compile time. That would solve the concerns outlined here and it probably applies in many other scenarios.

Regarding this PR, I agree the sigil is not readable and, while we could add from_iso8601, I don't think we should rush in adding it if we are not going to use it ourselves, so I think we should add it for now.

I would love a human language sigil, I think ~P[3 hours] and ~D[3 hours ago] would both be very readable but there is no consensus, should we should postpone on this for now. Thank you @tfiedlerdejanze!

@josevalim josevalim closed this Apr 5, 2024
@josevalim
Copy link
Member

@tfiedlerdejanze, I am reopening so we add this function, but without the sigil. Can you please update it accordingly?

@josevalim josevalim reopened this May 25, 2024
@tfiedlerdejanze
Copy link
Contributor Author

@tfiedlerdejanze, I am reopening so we add this function, but without the sigil. Can you please update it accordingly?

That's great @josevalim i'll be able to update the PR this afternoon or tomorrow by the latest, thanks!

@tanguilp
Copy link
Contributor

Was reading the code, we'll need to update https://github.com/elixir-lang/elixir/blob/main/lib/elixir/lib/calendar/iso.ex#L32 as well:

No functions exist to parse ISO 8601 durations or time intervals.

@tanguilp
Copy link
Contributor

FYI I've run a benchmark comparing this version with the following (incomplete) regex implementation:

  def from_iso8601_with_regex(string) do
    ~r"^P((?<year>\d+)Y)?((?<month>\d+)M)?((?<week>\d+)W)?((?<day>\d+)D)?(T((?<hour>\d+)H)?((?<minute>\d+)M)?((?<second>\d+)S)?)?$"
    |> Regex.named_captures(string)
    |> case do
      %{} = named_captures ->
        fields =
          for {component_str, value_str} <- named_captures, value_str != "" do
            {String.to_atom(component_str), value_str |> Integer.parse() |> elem(0)}
          end

      {:ok, struct(Duration, fields)}

      nil ->
        {:error, :invalid_format}
    end
  end
test_durations = ~w|P3WT5H3M PT5H3M P1Y2M3D PT4H5M6S P1Y2M P3D PT4H5M PT6S P4Y2W3Y P5HT4MT3S P5H3HT4M invalid|

Benchee.run(
  %{
    "Elixir parser" => fn -> test_durations |> Enum.each(&Duration.from_iso8601(&1)) end,
    "Regex" => fn -> test_durations |> Enum.each(&Duration.from_iso8601_with_regex(&1)) end
  },
  time: 100,
  memory_time: 2
)

Results:

Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-9400H CPU @ 2.50GHz
Number of Available Cores: 8
Available memory: 15.26 GB
Elixir 1.16.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min 40 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 3 min 28 s

Benchmarking Elixir parser ...
Benchmarking Regex ...
Calculating statistics...
Formatting results...

Name                    ips        average  deviation         median         99th %
Elixir parser       81.55 K       12.26 μs   ±246.53%       10.45 μs       74.55 μs
Regex               23.84 K       41.94 μs    ±46.29%       41.01 μs       60.03 μs

Comparison: 
Elixir parser       81.55 K
Regex               23.84 K - 3.42x slower +29.68 μs

Memory usage statistics:

Name             Memory usage
Elixir parser        15.35 KB
Regex                20.26 KB - 1.32x memory usage +4.91 KB

@tfiedlerdejanze
Copy link
Contributor Author

I think the implementation is wrong because it allows YDMW in any order, and then HMS in any order?

@josevalim i've reworked the implementation so the parser now takes takes order into account. By parsing the duration components one by one, we can now also keep track of the period designator prefix, so we could more easily negate all the fields after parsing all individual components, in case we decide to support the prefix, in a different PR maybe.

@tanguilp thanks for the benchmarks! they look similar on my machine after the rework, but feel free to double check of course.

Finally @NickNeck thanks for your review and input on approaching the parsing. I definitely took some inspiration from it, as far as in grouping and generating the parse functions for the different duration components. Thanks for taking the time reviewing, it's much appreciated! :)

@tfiedlerdejanze tfiedlerdejanze marked this pull request as ready for review May 26, 2024 20:49
@josevalim josevalim merged commit ae5be31 into elixir-lang:main May 27, 2024
9 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

josevalim pushed a commit that referenced this pull request May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants