Capture range without producing values #75

asl · 2022-05-15T20:45:00Z


This is somewhat connected with #58: what is the proper lexy-ish way to capture the matched range (especially when optionals are involved) without producing values for intermediate productions?

Consider, e.g., a floating-point value as in the JSON example:

    struct float_value : lexy::token_production {
        struct integer : lexy::transparent_production {
            static constexpr auto rule
            = dsl::minus_sign + dsl::integer<std::int64_t>(dsl::digits<>.no_leading_zero());
            static constexpr auto value = lexy::as_integer<std::int64_t>;
        };

        struct fraction : lexy::transparent_production {
            static constexpr auto rule  = dsl::lit_c<'.'> >> dsl::capture(dsl::digits<>);
            static constexpr auto value = lexy::as_string<std::string>;
        };

        struct exponent : lexy::transparent_production {
            static constexpr auto rule = [] {
                auto exp_char = dsl::lit_c<'e'> | dsl::lit_c<'E'>;
                return exp_char >> dsl::sign + dsl::integer<std::int16_t>;
            }();
            static constexpr auto value = lexy::as_integer<std::int16_t>;
        };

        static constexpr auto rule =
                dsl::peek(dsl::lit_c<'-'> / dsl::digit<>) >>
                dsl::p<integer> +
                dsl::opt(dsl::p<fraction>) +
                dsl::opt(dsl::p<exponent>);

        // value omitted
    };

Here, we'd essentially end up with three values: an int64_t and two optionals, a std::string and an int16_t. Now, if I'd like to construct a float, I'd need to assemble it from parts, which is a bit ugly: lots of intermediates are constructed, and the fractional part needs to be converted explicitly and separately (though it could be captured as a float value 0.fraction, making everything a bit simpler).

If I just had the start and end positions of the match, then things like atof or std::from_chars could be used (yes, double parsing; however, no memory allocation for optionals, so it might be even faster!). I tried to fold everything into a single rule in order not to produce intermediate values:

    struct float_value : lexy::token_production {
        static constexpr auto rule = [] {
            // Optional leading minus, then digits without a leading zero.
            auto integer = dsl::if_(dsl::lit_c<'-'>) + dsl::digits<>.no_leading_zero();
            auto fraction  = dsl::lit_c<'.'> >> dsl::digits<>;
            auto exp_char = dsl::lit_c<'e'> | dsl::lit_c<'E'>;
            // Note: unlike dsl::sign above, this requires an explicit '+' or '-'.
            auto exponent = exp_char >> (dsl::lit_c<'+'> | dsl::lit_c<'-'>) + dsl::digits<>;
            return dsl::peek(dsl::lit_c<'-'> / dsl::digit<>) >>
                    integer +
                    dsl::if_(fraction) +
                    dsl::if_(exponent);
        }();

        // value omitted
    };

How would I capture the integer + fraction + exponent part? So far, the closest thing I've ended up with is a pair of dsl::position, like this:

    struct float_value : lexy::token_production {
        static constexpr auto rule = [] {
            auto integer = dsl::if_(dsl::lit_c<'-'>) + dsl::digits<>.no_leading_zero();
            auto fraction  = dsl::lit_c<'.'> >> dsl::digits<>;
            auto exp_char = dsl::lit_c<'e'> | dsl::lit_c<'E'>;
            auto exponent = exp_char >> (dsl::lit_c<'+'> | dsl::lit_c<'-'>) + dsl::digits<>;
            return dsl::peek(dsl::lit_c<'-'> / dsl::digit<>) >>
                    dsl::position +   // start of the match
                    integer +
                    dsl::if_(fraction) +
                    dsl::if_(exponent) +
                    dsl::position;    // end of the match
        }();

        // Not constexpr: ::atof is not a constexpr function.
        static float atof(const char* first, const char* last) {
            // std::from_chars(const char*, const char*, float&) is only
            // available from libc++ starting from LLVM 14 :(
            // ::atof ignores `last`: it needs a null-terminated buffer and
            // stops at the first character that cannot be part of a number.
            (void)(last);
            return static_cast<float>(::atof(first));
        }

        static constexpr auto value = lexy::callback<float>(
            [](const char *first, const char *last) { return atof(first, last); }
        );
    };

But maybe there is a better way here?
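
Since ::atof ignores `last`, a bounded conversion is one way to respect the captured range. The following is only a minimal sketch and not part of lexy: `parse_float_range` is a hypothetical helper name, and it assumes the range holds a number in the format that std::strtof accepts in the current C locale.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not a lexy API): converts the characters in
// [first, last) to a float and reports whether the whole range was consumed.
inline bool parse_float_range(const char* first, const char* last, float& out)
{
    // Copy the range so that strtof sees a null-terminated string
    // that ends exactly at `last`.
    const std::string buffer(first, last);
    if (buffer.empty())
        return false;

    char*       end    = nullptr;
    const float result = std::strtof(buffer.c_str(), &end);

    // Reject the input if strtof stopped before the end of the range.
    if (end != buffer.c_str() + buffer.size())
        return false;

    out = result;
    return true;
}
```

A callback receiving the two positions could forward them to such a helper instead of discarding `last`. The copy is the price for not relying on what follows the match in the input buffer.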

PS: Maybe there is a way to have something like as_string_view in addition to as_string and defer the value conversion?

Answered by foonathan

May 22, 2022

I have added support for dsl::capture(dsl::p<token_production>), as well as iterator range support to lexy::as_string. The latter allows lexy::as_string<std::string_view> from two dsl::position (in C++20, where std::string_view has the iterator range constructor).


foonathan (Maintainer) · 2022-05-16T14:50:19Z

> Now, if I'd like to construct a float, I'd need to assemble it from parts, which is a bit ugly: lots of intermediates are constructed, and the fractional part needs to be converted explicitly and separately (though it could be captured as a float value 0.fraction, making everything a bit simpler).

I plan on adding a callback that does that for you at some point in the future. However, this is non-trivial and will take some time; just wanted to let you know.

> (yes, double parsing; however, no memory allocation for optionals, so it might be even faster!)

I'm not sure what you mean, lexy never does memory allocation? But yeah, the performance difference should be negligible. However, it only works if you're parsing a float whose format matches what the conversion function expects. E.g., a German floating-point number such as 3,14 wouldn't work.

> How would I capture the integer + fraction + exponent part? So far, the closest thing I've ended up with is a pair of dsl::position, like this:
> But maybe there is a better way here?

Not at the moment. The issue with whitespace from #58 still persists. Maybe I can add dsl::capture<TokenProduction>? This would be a more general dsl::capture(token) that still handles whitespace properly.

> PS: Maybe there is a way to have something like as_string_view in addition to as_string and defer the value conversion?

as_string<std::string_view> does work if you have a lexy::lexeme and the underlying iterator type is a pointer. https://lexy.foonathan.net/reference/callback/string/#as_string

However, there is no overload that accepts two iterators as produced by dsl::position.


asl (Author) · 2022-05-16T14:56:33Z

> I'm not sure what you mean, lexy never does memory allocation?

Indeed, but std::optional does. So I'd like to get rid of the extra indirection if possible.

> This is a more general dsl::capture(token) that still handles whitespace properly.

Funnily enough, in my case I don't need to worry about whitespace, as no extra whitespace is allowed :)

> There is no overload that accepts two iterators as produced by dsl::position however.

Right. So far, I've ended up with a custom callback or a pair of delimiters in some cases, and in other cases (e.g. when I need to capture exactly 2 characters) with dsl::capture(dsl::token(something)).

1 reply

foonathan (Maintainer) · May 16, 2022

> Indeed, but std::optional does. So I'd like to get rid of the extra indirection if possible.

std::optional doesn't allocate either?
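
One way to see this: the contained value of a std::optional is stored inside the optional object itself rather than on the heap. A minimal sketch that checks the value's address against the object's own storage (`stored_inline` is a hypothetical helper, not part of lexy or the standard library):

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical check: returns true if the contained value lives within the
// bytes of the optional object itself.
// Precondition: opt.has_value().
template <typename T>
bool stored_inline(const std::optional<T>& opt)
{
    const auto* begin = reinterpret_cast<const unsigned char*>(&opt);
    const auto* end   = begin + sizeof(opt);
    const auto* value = reinterpret_cast<const unsigned char*>(&*opt);

    // std::less<> provides a well-defined total order over pointers.
    return !std::less<>{}(value, begin) && std::less<>{}(value, end);
}
```

For example, `stored_inline(std::optional<std::int16_t>{std::int16_t{42}})` holds, so the optional exponent in the original grammar adds no indirection.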

> Right. So far, I've ended up with a custom callback or a pair of delimiters in some cases, and in other cases (e.g. when I need to capture exactly 2 characters) with dsl::capture(dsl::token(something)).

Would dsl::capture<TokenProduction> work for you?

foonathan (Maintainer) · 2022-05-22T17:49:36Z

I have added support for dsl::capture(dsl::p<token_production>), as well as iterator range support to lexy::as_string. The latter allows lexy::as_string<std::string_view> from two dsl::position (in C++20, where std::string_view has the iterator range constructor).
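
To illustrate the idea of deferring the conversion, here is a minimal standalone sketch that does not use lexy. `make_view` and `to_float` are hypothetical names, and the sketch assumes the two positions are `const char*`, as they would be with a pointer-based input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: a non-owning view over the matched range [first, last).
constexpr std::string_view make_view(const char* first, const char* last)
{
    // C++17-compatible spelling; in C++20, std::string_view(first, last)
    // constructs the same view from the iterator range directly.
    return std::string_view(first, static_cast<std::size_t>(last - first));
}

// Deferred conversion: called only when the numeric value is actually needed.
inline float to_float(std::string_view text)
{
    // std::stof needs an owning, null-terminated string, so copy here.
    // It throws std::invalid_argument if no conversion can be performed.
    return std::stof(std::string(text));
}
```

The view itself is just a pointer and a length, so keeping it around costs nothing until `to_float` is called. It is only valid as long as the underlying input buffer is alive.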

