-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-31315: [C++][Docs] Document that the strptime kernel ignores %Z #40186
GH-31315: [C++][Docs] Document that the strptime kernel ignores %Z #40186
Conversation
I am guessing the checks are failing because of the clang format.
@rok when I used diff --git a/docs/source/cpp/compute.rst b/docs/source/cpp/compute.rst
index 2db5988b8..64660ec23 100644
--- a/docs/source/cpp/compute.rst
+++ b/docs/source/cpp/compute.rst
@@ -1,4 +1,4 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
+..Licensed to the Apache Software Foundation(ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
@@ -62,44 +62,41 @@ Compute functions can be invoked by name using
:func:`arrow::compute::CallFunction`::
std::shared_ptr<arrow::Array> numbers_array = ...;
- std::shared_ptr<arrow::Scalar> increment = ...;
- arrow::Datum incremented_datum;
+std::shared_ptr<arrow::Scalar> increment = ...;
+arrow::Datum incremented_datum;
- ARROW_ASSIGN_OR_RAISE(incremented_datum,
- arrow::compute::CallFunction("add", {numbers_array, increment}));
- std::shared_ptr<Array> incremented_array = std::move(incremented_datum).make_array();
+ARROW_ASSIGN_OR_RAISE(incremented_datum,
+ arrow::compute::CallFunction("add", {numbers_array, increment}));
+std::shared_ptr<Array> incremented_array = std::move(incremented_datum).make_array();
-(note this example uses implicit conversion from ``std::shared_ptr<Array>``
-to ``Datum``)
+(note this example uses implicit conversion from ``std::shared_ptr<Array>`` to ``Datum``)
-Many compute functions are also available directly as concrete APIs, here
-:func:`arrow::compute::Add`::
+ Many compute functions are also available directly as concrete APIs,
+ here : func :`arrow::compute::Add`::
- std::shared_ptr<arrow::Array> numbers_array = ...;
- std::shared_ptr<arrow::Scalar> increment = ...;
- arrow::Datum incremented_datum;
+ std::shared_ptr<arrow::Array> numbers_array = ...;
+std::shared_ptr<arrow::Scalar> increment = ...;
+arrow::Datum incremented_datum;
- ARROW_ASSIGN_OR_RAISE(incremented_datum,
- arrow::compute::Add(numbers_array, increment));
- std::shared_ptr<Array> incremented_array = std::move(incremented_datum).make_array();
+ARROW_ASSIGN_OR_RAISE(incremented_datum, arrow::compute::Add(numbers_array, increment));
+std::shared_ptr<Array> incremented_array = std::move(incremented_datum).make_array();
-Some functions accept or require an options structure that determines the
-exact semantics of the function::
+Some functions accept or
+ require an options structure that determines the exact semantics of the function::
- ScalarAggregateOptions scalar_aggregate_options;
- scalar_aggregate_options.skip_nulls = false;
+ ScalarAggregateOptions scalar_aggregate_options;
+scalar_aggregate_options.skip_nulls = false;
- std::shared_ptr<arrow::Array> array = ...;
- arrow::Datum min_max;
+std::shared_ptr<arrow::Array> array = ...;
+arrow::Datum min_max;
- ARROW_ASSIGN_OR_RAISE(min_max,
- arrow::compute::CallFunction("min_max", {array},
- &scalar_aggregate_options));
+ARROW_ASSIGN_OR_RAISE(min_max, arrow::compute::CallFunction("min_max", {array},
+ &scalar_aggregate_options));
- // Unpack struct scalar result (a two-field {"min", "max"} scalar)
- std::shared_ptr<arrow::Scalar> min_value, max_value;
- min_value = min_max.scalar_as<arrow::StructScalar>().value[0];
- max_value = min_max.scalar_as<arrow::StructScalar>().value[1];
+// Unpack struct scalar result (a two-field {"min", "max"} scalar)
+std::shared_ptr<arrow::Scalar> min_value, max_value;
+min_value = min_max.scalar_as<arrow::StructScalar>().value[0];
+max_value = min_max.scalar_as<arrow::StructScalar>().value[1];
However, :ref:`Grouped Aggregations <grouped-aggregations-group-by>` are
not invocable via ``CallFunction``.
@@ -257,12 +254,18 @@ the input to a single output value.
* \(4) For decimal inputs, the resulting decimal will have the same
precision and scale. The result is rounded away from zero.
-* \(5) Output is a ``{"min": input type, "max": input type}`` Struct.
+* \(5) Output is a ``{
+ "min" : input type, "max" : input type
+}`` Struct.
- Of the interval types, only the month interval is supported, as the day-time
- and month-day-nano types are not sortable.
+ Of the interval types,
+ only the month interval is supported,
+ as the day - time and month - day -
+ nano types are not sortable.
-* \(6) Output is an array of ``{"mode": input type, "count": Int64}`` Struct.
+ * \(6)Output is an array of ``{
+ "mode" : input type, "count" : Int64
+}`` Struct.
It contains the *N* most common elements in the input, in descending
order, where *N* is given in :member:`ModeOptions::n`.
If two values have the same count, the smallest one comes first.
@@ -394,7 +397,8 @@ equivalents above and reflects how they are implemented internally.
* \(4) For decimal inputs, the resulting decimal will have the same
precision and scale. The result is rounded away from zero.
-* \(5) Output is a ``{"min": input type, "max": input type}`` Struct array.
+* \(5) Output is a ``{
+ "min" : input type, "max" : input type}`` Struct array.
Of the interval types, only the month interval is supported, as the day-time
and month-day-nano types are not sortable.
@@ -440,8 +444,8 @@ Arithmetic functions
~~~~~~~~~~~~~~~~~~~~
These functions expect inputs of numeric type and apply a given arithmetic
-operation to each element(s) gathered from the input(s). If any of the
-input element(s) is null, the corresponding output element is null.
+operation to each element(s)
+gathered from the input(s).If any of the input element(s) is null, the corresponding output element is null.
For binary functions, input(s) will be cast to the
:ref:`common numeric type <common-numeric-type>`
(and dictionary decoded, if applicable) before the operation is applied.
@@ -603,13 +607,16 @@ The example values are given for default values of ``ndigits`` and ``multiple``.
+-----------------------+--------------------------------------------------------------+---------------------------+
| ``round_mode`` | Operation performed | Example values |
+=======================+==============================================================+===========================+
-| DOWN | Round to nearest integer less than or equal in magnitude; | 3.2 -> 3, 3.7 -> 3, |
-| | also known as ``floor(x)`` | -3.2 -> -4, -3.7 -> -4 |
-+-----------------------+--------------------------------------------------------------+---------------------------+
-| UP | Round to nearest integer greater than or equal in magnitude; | 3.2 -> 4, 3.7 -> 4, |
-| | also known as ``ceil(x)`` | -3.2 -> -3, -3.7 -> -3 |
-+-----------------------+--------------------------------------------------------------+---------------------------+
-| TOWARDS_ZERO | Get the integral part without fractional digits; | 3.2 -> 3, 3.7 -> 3, |
+| DOWN | Round to nearest integer less than or equal in magnitude;
+| 3.2->3, 3.7->3, | | | also known as ``floor(x)`` | -3.2->- 4,
+ -3.7->- 4 |
+ +-- -- -- -- -- -- -- -- -- -- -- -+-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --+-- -- -- -- -- -- -- -- -- -- -- -- -- -+ |
+ UP | Round to nearest integer greater than
+ or equal in magnitude;
+| 3.2->4, 3.7->4, | | | also known as ``ceil(x)`` | -3.2->- 3,
+ -3.7->- 3 |
+ +-- -- -- -- -- -- -- -- -- -- -- -+-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --+-- -- -- -- -- -- -- -- -- -- -- -- -- -+ |
+ TOWARDS_ZERO | Get the integral part without fractional digits; | 3.2 -> 3, 3.7 -> 3, |
| | also known as ``trunc(x)`` | -3.2 -> -3, -3.7 -> -3 |
+-----------------------+--------------------------------------------------------------+---------------------------+
| TOWARDS_INFINITY | Round negative values with ``DOWN`` rule, | 3.2 -> 4, 3.7 -> 4, |
@@ -1117,7 +1124,8 @@ String Slicing
This function transforms each sequence of the array to a subsequence, according
to start and stop indices, and a non-zero step (defaulting to 1). Slicing
semantics follow Python slicing semantics: the start index is inclusive,
-the stop index exclusive; if the step is negative, the sequence is followed
+the stop index exclusive;
+if the step is negative, the sequence is followed
in reverse order.
+--------------------------+------------+-------------------------+-------------------------+--------------------------+---------+
@@ -1427,26 +1435,30 @@ null input value is converted into a null output value.
is available).
* \(2) The field names of the output type must be the same or a subset of the
- field names of the input type; they also must have the same order. Casting to
- a subset of field names "selects" those fields such that each output field
- matches the data of the input field with the same name.
-
-* \(3) The list offsets are unchanged, the list values are cast from the
- input value type to the output value type (if a conversion is
- available).
-
-* \(4) Offsets are unchanged, the keys and values are cast from respective input
- to output types (if a conversion is available). If output type is a list of
- struct, the key field is output as the first field and the value field the
- second field, regardless of field names chosen.
-
-* \(5) Any input type that can be cast to the resulting extension's storage type.
- This excludes extension types, unless being cast to the same extension type.
-
-Temporal component extraction
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-These functions extract datetime components (year, month, day, etc) from temporal types.
+ field names of the input type;
+they also must have the same order
+ .Casting to a subset of field names "selects" those fields such that each output field
+ matches the data of the input field with the same name.
+
+ * \(3)The list offsets are unchanged,
+ the list values are cast from the input value type to the output value
+ type(if a conversion is available)
+ .
+
+ * \(4)Offsets are unchanged,
+ the keys and values are cast from respective input to output
+ types(if a conversion is available)
+ .If output type is a list of struct,
+ the key field is output as the first field and the value field the second field,
+ regardless of field names chosen.
+
+ * \(5)Any input type that can be cast to the resulting
+ extension's storage type. This excludes extension types,
+ unless being cast to the same extension type.
+
+ Temporal component extraction ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ These functions extract datetime components(year, month, day, etc) from temporal types.
For timestamps inputs with non-empty timezone, localized timestamp components will be returned.
+--------------------+------------+-------------------+---------------+----------------------------+-------+
@@ -1507,7 +1519,8 @@ For timestamps inputs with non-empty timezone, localized timestamp components wi
starts with the first ISO week. ISO week starts on Monday.
See `ISO 8601 week date definition`_ for more details.
-* \(3) Output is a ``{"iso_year": output type, "iso_week": output type, "iso_day_of_week": output type}`` Struct.
+* \(3) Output is a ``{
+ "iso_year" : output type, "iso_week" : output type, "iso_day_of_week" : output type}`` Struct.
* \(4) First US week has the majority (4 or more) of its days in January. US year
starts with the first US week. US week starts on Sunday.
@@ -1517,9 +1530,12 @@ For timestamps inputs with non-empty timezone, localized timestamp components wi
If :member:`WeekOptions::count_from_zero` is true, dates from the current year that fall into the last ISO week
of the previous year are numbered as week 0, else week 52 or 53 if false.
If :member:`WeekOptions::first_week_is_fully_in_year` is true, the first week (week 1) must fully be in January;
- else if false, a week that begins on December 29, 30, or 31 is considered the first week of the new year.
+else if false, a week that begins on December 29, 30,
+ or 31 is considered the first week of the new year.
-* \(6) Output is a ``{"year": int64(), "month": int64(), "day": int64()}`` Struct.
+ * \(6)Output is a ``{
+ "year" : int64(), "month" : int64(), "day" : int64()
+}`` Struct.
.. _ISO 8601 week date definition: https://en.wikipedia.org/wiki/ISO_week_date#First_week
@@ -1681,7 +1697,8 @@ Associative transforms
* \(2) Duplicates are removed from the output while the original order is
maintained.
-* \(3) Output is a ``{"values": input type, "counts": Int64}`` Struct.
+* \(3) Output is a ``{
+ "values" : input type, "counts" : Int64}`` Struct.
Each output element corresponds to a unique value in the input, along
with the number of times this value has appeared.
|
I think you can start with the linter error. |
Co-authored-by: Rok Mihevc <[email protected]>
@github-actions crossbow submit preview-docs |
|
@github-actions crossbow submit preview-docs |
Revision: 084fa49 Submitted crossbow builds: ursacomputing/crossbow @ actions-8eda212e55
|
Co-authored-by: Rok Mihevc <[email protected]>
"fails parsing, an error is returned by default."), | ||
"fails parsing, an error is returned by default.\n" | ||
"\n" | ||
"**Note:** The strptime kernel currently ignores the %Z specifier for any string."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it's: line length exceeds 78 characters
.
"**Note:** The strptime kernel currently ignores the %Z specifier for any string."), | |
"**Note:** The strptime kernel currently ignores the %Z specifier for any \n" | |
"string."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. For this change the you want to use is archery lint --clang-format --fix
as you're changing c++ source. As mentioned Python docs are generated from the docstring from scalar_temporal_unary.cc
which makes docs generation a convoluted process and not the easiest to get into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind could you please suggest me something to fix these errors that are coming on my compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to this? #40186 (comment)
If so - you shouldn't need to run clang-format -i compute.rst
.
If not please specify what errors are you seeing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, #40186 (comment) is not the error I am talking I figured that one out I was talking about when i used archery lint --clang-format --fix
commad below files were getting edited for no reason.Also please take a look at this #38279 (comment)
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index ad01d4557..e441278ae 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -1748,7 +1748,8 @@ cdef class Array(_PandasConvertible):
inner_array = pyarrow_unwrap_array(casted_array)
except ArrowInvalid as e:
raise ValueError(
- f"Could not cast {self.type} to requested type {target_type}: {e}"
+ f"Could not cast {self.type} to requested type {
+ target_type}: {e}"
)
else:
inner_array = self.sp_array
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index ee3872aa3..18b7944d1 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -3081,7 +3081,8 @@ cdef class RecordBatch(_Tabular):
inner_batch = pyarrow_unwrap_batch(casted_batch)
except ArrowInvalid as e:
raise ValueError(
- f"Could not cast {self.schema} to requested schema {target_schema}: {e}"
+ f"Could not cast {self.schema} to requested schema {
+ target_schema}: {e}"
)
else:
inner_batch = self.sp_batch
diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi
index e9bf56c62..1f7773174 100644
--- a/python/pyarrow/types.pxi
+++ b/python/pyarrow/types.pxi
@@ -133,7 +133,8 @@ cdef void* _as_c_pointer(v, allow_null=False) except *:
else:
capsule_name_str = capsule_name.decode()
raise ValueError(
- f"Can't convert PyCapsule with name '{capsule_name_str}' to pointer address"
+ f"Can't convert PyCapsule with name '{
+ capsule_name_str}' to pointer address"
)
else:
raise TypeError(f"Expected a pointer value, got {type(v)!r}")
if you require the logs for the commands to understand better I can provide you with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is odd, I would not expect clang-format
to touch .pxi
files. Can you try reverting the changes and rerunning clang-format
?
If that doesn't help I'd suggest reinstalling archery
, flake8
and cython-lint
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think you can ignore this for the purposes of this PR.
Co-authored-by: Rok Mihevc <[email protected]>
@github-actions crossbow submit preview-docs |
Revision: 296f2e7 Submitted crossbow builds: ursacomputing/crossbow @ actions-29833d01de
|
Docs preview was build, here is the Python doc and the c++ doc. It seems the Python change came through ok, while you'd probably want to add the same language to the c++ doc. |
@rok does this change look good to you? diff --git a/docs/source/cpp/compute.rst b/docs/source/cpp/compute.rst
index e7310d2c0..82a8d2a11 100644
--- a/docs/source/cpp/compute.rst
+++ b/docs/source/cpp/compute.rst
@@ -1327,7 +1327,7 @@ provided by a concrete function :func:`~arrow::compute::Cast`.
+-----------------+------------+--------------------+------------------+--------------------------------+-------+
| strftime | Unary | Temporal | String | :struct:`StrftimeOptions` | \(1) |
+-----------------+------------+--------------------+------------------+--------------------------------+-------+
-| strptime | Unary | String-like | Timestamp | :struct:`StrptimeOptions` | |
+| strptime | Unary | String-like | Timestamp | :struct:`StrptimeOptions` | \(2) |
+-----------------+------------+--------------------+------------------+--------------------------------+-------+
The conversions available with ``cast`` are listed below. In all cases, a
@@ -1343,6 +1343,8 @@ null input value is converted into a null output value.
.. _detailed formatting documentation: https://howardhinnant.github.io/date/date.html#to_stream_formatting
+* \(2) The strptime kernel currently ignores the %Z specifier for any string.
+
**Truth value extraction**
+-----------------------------+------------------------------------+--------------+
|
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Can you please run the preview docs build command? @kou |
@github-actions crossbow submit preview-docs |
Revision: 9168d5b Submitted crossbow builds: ursacomputing/crossbow @ actions-e541a11760
|
@@ -1343,6 +1343,8 @@ null input value is converted into a null output value. | |||
|
|||
.. _detailed formatting documentation: https://howardhinnant.github.io/date/date.html#to_stream_formatting | |||
|
|||
* \(2) The strptime kernel currently ignores the ``%Z`` specifier for any string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the "for any string" is fully clear (I am also not a good wordsmith, so not directly an alternative)
Something more verbose: ".. ignores the %Z specifier and any string at the matching location" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The strptime kernel presently overlooks the "%Z" specifier along with any string located at the corresponding position within the input."
"The strptime function does not take into account the %Z format specifier, regardless of the characters that follow it."
Maybe one of these? @jorisvandenbossche
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this stalled @Divyansh200102! We sometimes loose track of waiting PRs. I think we should proceed with this, let's just agree on the wording.
"The strptime kernel presently overlooks the "%Z" specifier along with any string located at the corresponding position within the input."
Seems ok to me. What do you think @jorisvandenbossche ?
Rationale for this change
Document that the strptime kernel ignores %Z
What changes are included in this PR?
Documents that the strptime kernel ignores %Z for any string
Are these changes tested?
No
Are there any user-facing changes?
Yes