[FEA] Refactor string conversion check #7557
Comments
I need to work on a function to check if a string is a valid integer. That means, combining bound check with the current |
I would like to move the I would also like to remove the If the
The documentation should indicate if overflow checking is done or not. I don't believe we should be changing all the |
Since the current |
Good point. Checking overflow for float is more difficult than for integers. @revans2, @andygrove?
A PR for this has been submitted---just move the |
I just confirmed that for the Spark use case we don't care about overflow checking on floating-point values. Even when ANSI operation is enabled, overflowing a floating-point value returns Inf or -Inf. Similarly, a number that is too small for a float returns 0.0. The only checking we care about is making sure that the format is correct.
The current |
I planned to rewrite the |
This will affect the Cython/Python code that currently uses it. Also, there are 8 integer types which would need to be type-dispatched (fixed-point only has two). That is a lot of extra generated code if Spark only cares about INT64, say.
This addresses #7557. In summary:
* Move `cudf::strings::is_integer()` code from `strings/chars_types.*` to `strings/convert/convert_integers.hpp/cu`
* Move `cudf::strings::is_float()` code from `strings/chars_types.*` to `strings/convert/convert_floats.hpp/cu`
* Remove `cudf::strings::all_integer()` and `cudf::strings::all_float()`

Authors:
- Nghia Truong (@ttnghia)

Approvers:
- GALI PREM SAGAR (@galipremsagar)
- Jason Lowe (@jlowe)
- Jake Hemstad (@jrhemstad)
- David (@davidwendt)

URL: #7599
Currently, there are functions to check whether a string is a valid representation of a number (integer, fixed-point, float, etc.). However, those functions are scattered around and their purposes are inconsistent.
* In `cudf/strings/string.cuh` and `cudf/strings/char_types.hpp`, there are `is_integer` and `is_float` functions, which check whether a string has the correct pattern so it can be converted into a valid number. However, those functions do not do a bound check.
* In `strings/convert/convert_integer.hpp`, there is a function `is_hex` to check if a string can be converted to a hex number. Again, no bound check.
* In `cudf/strings/convert/convert_fixed_point.hpp`, there is a function `is_fixed_point` which does both a pattern check and a bound check.

I want to refactor/reorganize those functions to enforce consistency. We should either group them together in `char_types.hpp`, or put them in their corresponding `strings/convert/convert_xxx` places. In addition, since we do a bound check for fixed-point numbers, we should also support a bound check for the other types. If not, we should add something to indicate whether a function supports a bound check or not. Otherwise, by simply calling `is_integer` or `is_fixed_point` we cannot know which function does a bound check and which one does not.
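The difference between a pattern check and a bound check can be sketched in plain host-side C++. This is only an illustration under stated assumptions: `has_integer_pattern` and `fits_integer` are hypothetical names, not cudf APIs, and the accepted pattern (optional sign followed by decimal digits) is assumed rather than taken from the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>
#include <type_traits>

// Hypothetical helper: pattern check only. Accepts an optional sign followed
// by one or more decimal digits. It says nothing about whether the value fits
// in any particular integer type.
inline bool has_integer_pattern(std::string_view s)
{
  if (s.empty()) return false;
  std::size_t pos = (s[0] == '+' || s[0] == '-') ? 1 : 0;
  if (pos == s.size()) return false;  // a lone sign is not a number
  for (; pos < s.size(); ++pos) {
    if (s[pos] < '0' || s[pos] > '9') return false;
  }
  return true;
}

// Hypothetical helper: pattern check plus bound check for integer type T.
template <typename T>
bool fits_integer(std::string_view s)
{
  static_assert(std::is_integral_v<T>, "T must be an integer type");
  if (!has_integer_pattern(s)) return false;

  bool const negative = (s[0] == '-');
  std::size_t pos     = (s[0] == '+' || s[0] == '-') ? 1 : 0;

  // Largest magnitude allowed: max() for non-negative input, |min()| for
  // negative input (which is 0 for unsigned types).
  auto const max_pos = static_cast<std::uint64_t>(std::numeric_limits<T>::max());
  std::uint64_t const limit =
    negative ? (std::is_signed_v<T> ? max_pos + 1 : 0) : max_pos;

  std::uint64_t value = 0;
  for (; pos < s.size(); ++pos) {
    auto const digit = static_cast<std::uint64_t>(s[pos] - '0');
    // Reject before value * 10 + digit could exceed the limit (no wraparound).
    if (digit > limit || value > (limit - digit) / 10) return false;
    value = value * 10 + digit;
  }
  return true;
}
```

With these definitions, `has_integer_pattern("128")` is true while `fits_integer<std::int8_t>("128")` is false, which is exactly the ambiguity a caller faces when the function name does not say whether bounds are checked. Each instantiation of `fits_integer<T>` also generates its own code, which is the cost raised in the comments about dispatching over all integer types.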