Export of internal ZetaSQL changes.

GitOrigin-RevId: 6c6fab8f816cac8828fc8201478c85043e19c106 Change-Id: I69dc4fd4dc7463425e0a9fec8ed61a1aa9b6c41b
google · Mar 4, 2024 · b7b67f5 · b7b67f5
1 parent 589026c
commit b7b67f5
Show file tree

Hide file tree

Showing 606 changed files with 100,992 additions and 39,286 deletions.
diff --git a/docs/aggregate-dp-functions.md b/docs/aggregate-dp-functions.md
diff --git a/docs/aggregate_functions.md b/docs/aggregate_functions.md
@@ -1058,10 +1058,10 @@ window_specification:
 Returns the count of `TRUE` values for `expression`. Returns `0` if there are
 zero input rows, or if `expression` evaluates to `FALSE` or `NULL` for all rows.
 
-Since `expression` must be a `BOOL`, the form
-`COUNTIF(DISTINCT ...)` is generally not useful: there is only one distinct
-value of `TRUE`. So `COUNTIF(DISTINCT ...)` will return 1 if `expression`
-evaluates to `TRUE` for one or more input rows, or 0 otherwise.
+Since `expression` must be a `BOOL`, the form `COUNTIF(DISTINCT ...)` is
+generally not useful: there is only one distinct value of `TRUE`. So
+`COUNTIF(DISTINCT ...)` will return 1 if `expression` evaluates to `TRUE` for
+one or more input rows, or 0 otherwise.
 Usually when someone wants to combine `COUNTIF` and `DISTINCT`, they
 want to count the number of distinct values of an expression for which a certain
 condition is satisfied. One recipe to achieve this is the following:

diff --git a/docs/array_functions.md b/docs/array_functions.md
@@ -185,6 +185,15 @@ ZetaSQL supports the following array functions.
   </td>
 </tr>
 
+<tr>
+  <td><a href="#array_zip"><code>ARRAY_ZIP</code></a>
+
+</td>
+  <td>
+    Combines elements from two to four arrays into one array.
+  </td>
+</tr>
+
 <tr>
   <td><a href="#flatten"><code>FLATTEN</code></a>
 
@@ -1305,6 +1314,174 @@ SELECT
 
 [lambda-definition]: https://github.com/google/zetasql/blob/master/docs/functions-reference.md#lambdas
 
+### `ARRAY_ZIP`
+
+```sql
+ARRAY_ZIP(
+  array_input [ AS alias ],
+  array_input [ AS alias ][, ... ]
+  [, transformation => lambda_expression ]
+  [, mode => { 'STRICT' | 'TRUNCATE' | 'PAD' } ]
+)
+```
+
+**Description**
+
+Combines the elements from two to four arrays into one array.
+
+**Definitions**
+
++   `array_input`: An input `ARRAY` value to be zipped with the other array
+    inputs. `ARRAY_ZIP` supports two to four input arrays.
++   `alias`: An alias optionally supplied for an `array_input`. In the results,
+    the alias is the name of the associated `STRUCT` field.
++   `transformation`: A optionally-named lambda argument. `lambda_expression`
+    specifies how elements are combined as they are zipped. This overrides
+    the default `STRUCT` creation behavior.
++   `mode`: A mandatory-named argument that determines how arrays of differing
+    lengths are zipped. If this optional argument is not supplied, the function
+    uses `STRICT` mode by default. This argument can be one of the following
+    values:
+
+    +   `STRICT` (default): If the length of any array is different from the
+        others, produce an error.
+
+    +   `TRUNCATE`: Truncate longer arrays to match the length of the shortest
+        array.
+
+    +   `PAD`: Pad shorter arrays with `NULL` values to match the length of the
+        longest array.
+
+**Details**
+
++   If an `array_input` or `mode` is `NULL`, this function returns `NULL`, even when
+    `mode` is `STRICT`.
++   Argument aliases can't be used with the `transformation` argument.
+
+**Return type**
+
++   If `transformation` is used and `lambda_expression` returns type `T`, the
+    return type is `ARRAY<T>`.
++   Otherwise, the return type is `ARRAY<STRUCT>`, with the `STRUCT` having a
+    number of fields equal to the number of input arrays. Each field's name is
+    either the user-provided `alias` for the corresponding `array_input`, or a
+    default alias assigned by the compiler, following the same logic used for
+    [naming columns in a SELECT list][implicit-aliases].
+
+**Examples**
+
+The following query zips two arrays into one:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], ['a', 'b']) AS results
+
+/*----------------------*
+ | results              |
+ +----------------------+
+ | [(1, 'a'), (2, 'b')] |
+ *----------------------*/
+```
+
+You can give an array an alias. For example, in the following
+query, the returned array is of type `ARRAY<STRUCT<A1, alias_inferred>>`,
+where:
+
++   `A1` is the alias provided for array `[1, 2]`.
++   `alias_inferred` is the inferred alias provided for array `['a', 'b']`.
+
+```sql
+WITH T AS (
+  SELECT ['a', 'b'] AS alias_inferred
+)
+SELECT ARRAY_ZIP([1, 2] AS A1, alias_inferred) AS results
+FROM T
+
+/*----------------------------------------------------------+
+ | results                                                  |
+ +----------------------------------------------------------+
+ | [{1 A1, 'a' alias_inferred}, {2 A1, 'b' alias_inferred}] |
+ +----------------------------------------------------------*/
+```
+
+To provide a custom transformation of the input arrays, use the `transformation`
+argument:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], [3, 4], transformation => (e1, e2) -> (e1 + e2))
+
+/*---------+
+ | results |
+ +---------+
+ | [4, 6]  |
+ +---------*/
+```
+
+The argument name `transformation` is not required. For example:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], [3, 4], (e1, e2) -> (e1 + e2))
+
+/*---------+
+ | results |
+ +---------+
+ | [4, 6]  |
+ +---------*/
+```
+
+When `transformation` is provided, the input arrays are not allowed to have
+aliases. For example, the following query is invalid:
+
+```sql {.bad}
+-- Error: ARRAY_ZIP function with lambda argument does not allow providing
+-- argument aliases
+SELECT ARRAY_ZIP([1, 2], [3, 4] AS alias_not_allowed, (e1, e2) -> (e1 + e2))
+```
+
+To produce an error when arrays with different lengths are zipped, don't
+add `mode`, or if you do, set it as `STRICT`. For example:
+
+```sql {.bad}
+-- Error: Unequal array length
+SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd']) AS results
+```
+
+```sql {.bad}
+-- Error: Unequal array length
+SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'STRICT') AS results
+```
+
+Use the `PAD` mode to pad missing values with `NULL` when input arrays have
+different lengths. For example:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], [], mode => 'PAD') AS results
+
+/*------------------------------------------------------------------------+
+ | results                                                                |
+ +------------------------------------------------------------------------+
+ | [{1, 'a', NULL}, {2, 'b', NULL}, {NULL, 'c', NULL}, {NULL, 'd', NULL}] |
+ +------------------------------------------------------------------------*/
+```
+
+Use the `TRUNCATE` mode to truncate all arrays that are longer than the shortest
+array. For example:
+
+```sql
+SELECT ARRAY_ZIP([1, 2], ['a', 'b', 'c', 'd'], mode => 'TRUNCATE') AS results
+
+/*----------------------*
+ | results              |
+ +----------------------+
+ | [(1, 'a'), (2, 'b')] |
+ *----------------------*/
+```
+
+<!-- mdlint off(WHITESPACE_LINE_LENGTH) -->
+
+[implicit-aliases]: https://github.com/google/zetasql/blob/master/docs/query-syntax.md#implicit_aliases
+
+<!-- mdlint on -->
+
 ### `FLATTEN`
 
 ```sql

diff --git a/docs/arrays.md b/docs/arrays.md
@@ -1233,75 +1233,7 @@ consisting of pairs of elements from input arrays, taken from their
 corresponding positions. This operation is sometimes called
 [zipping][convolution].
 
-You can zip arrays with `UNNEST` and `WITH OFFSET`. In this example, each
-value pair is stored as a `STRUCT` in an array.
-
-```sql
-WITH
-  Combinations AS (
-    SELECT
-      ['a', 'b'] AS letters,
-      [1, 2, 3] AS numbers
-  )
-SELECT
-  ARRAY(
-    SELECT AS STRUCT
-      letters[SAFE_OFFSET(index)] AS letter,
-      numbers[SAFE_OFFSET(index)] AS number
-    FROM Combinations
-    CROSS JOIN
-      UNNEST(
-        GENERATE_ARRAY(
-          0,
-          LEAST(ARRAY_LENGTH(letters), ARRAY_LENGTH(numbers)) - 1)) AS index
-    ORDER BY index
-  );
-
-/*------------------------------*
- | pairs                        |
- +------------------------------+
- | [{ letter: "a", number: 1 }, |
- |  { letter: "b", number: 2 }] |
- *------------------------------*/
-```
-
-You can use input arrays of different lengths as long as the first array
-is equal to or less than the length of the second array. The zipped array
-will be the length of the shortest input array.
-
-To get a zipped array that includes all the elements even when the input arrays
-are different lengths, change `LEAST` to `GREATEST`. Elements of either array
-that have no associated element in the other array will be paired with `NULL`.
-
-```sql
-WITH
-  Combinations AS (
-    SELECT
-      ['a', 'b'] AS letters,
-      [1, 2, 3] AS numbers
-  )
-SELECT
-  ARRAY(
-    SELECT AS STRUCT
-      letters[SAFE_OFFSET(index)] AS letter,
-      numbers[SAFE_OFFSET(index)] AS number
-    FROM Combinations
-    CROSS JOIN
-      UNNEST(
-        GENERATE_ARRAY(
-          0,
-          GREATEST(ARRAY_LENGTH(letters), ARRAY_LENGTH(numbers)) - 1)) AS index
-    ORDER BY index
-  );
-
-/*-------------------------------*
- | pairs                         |
- +-------------------------------+
- | [{ letter: "a", number: 1 },  |
- |  { letter: "b", number: 2 },  |
- |  { letter: null, number: 3 }] |
- *-------------------------------*/
-```
+You can zip arrays with the function [`ARRAY_ZIP`]([array-zip]).
 
 ## Building arrays of arrays
 

diff --git a/docs/conditional_expressions.md b/docs/conditional_expressions.md
@@ -340,7 +340,7 @@ SELECT NULLIFZERO(0) AS result
 ### `ZEROIFNULL`
 
 ```sql
-NULLIFZERO(expr)
+ZEROIFNULL(expr)
 ```
 
 **Description**

diff --git a/docs/conversion_rules.md b/docs/conversion_rules.md
@@ -145,7 +145,7 @@ literals and parameters can also be coerced. See
       <td><code>STRING</code></td>
       <td>
 
-<span><code>BOOL</code></span><br /><span><code>INT32</code></span><br /><span><code>INT64</code></span><br /><span><code>UINT32</code></span><br /><span><code>UINT64</code></span><br /><span><code>NUMERIC</code></span><br /><span><code>BIGNUMERIC</code></span><br /><span><code>FLOAT</code></span><br /><span><code>DOUBLE</code></span><br /><span><code>STRING</code></span><br /><span><code>BYTES</code></span><br /><span><code>DATE</code></span><br /><span><code>DATETIME</code></span><br /><span><code>TIME</code></span><br /><span><code>TIMESTAMP</code></span><br /><span><code>ENUM</code></span><br /><span><code>PROTO</code></span><br />
+<span><code>BOOL</code></span><br /><span><code>INT32</code></span><br /><span><code>INT64</code></span><br /><span><code>UINT32</code></span><br /><span><code>UINT64</code></span><br /><span><code>NUMERIC</code></span><br /><span><code>BIGNUMERIC</code></span><br /><span><code>FLOAT</code></span><br /><span><code>DOUBLE</code></span><br /><span><code>STRING</code></span><br /><span><code>BYTES</code></span><br /><span><code>DATE</code></span><br /><span><code>DATETIME</code></span><br /><span><code>TIME</code></span><br /><span><code>TIMESTAMP</code></span><br /><span><code>ENUM</code></span><br /><span><code>PROTO</code></span><br /><span><code>RANGE</code></span><br />
 
       </td>
       <td>&nbsp;</td>
@@ -247,6 +247,15 @@ literals and parameters can also be coerced. See
       <td><code>PROTO</code> (with the same <code>PROTO</code> name)</td>
     </tr>
 
+    <tr>
+      <td><code>RANGE</code></td>
+      <td>
+
+<span><code>RANGE</code></span><br /><span><code>STRING</code></span><br />
+</td>
+      <td>&nbsp;</td>
+    </tr>
+
   </tbody>
 </table>
 
@@ -581,6 +590,11 @@ or more supertypes, including itself, which defines its set of supertypes.
 </td>
     </tr>
 
+    <tr>
+      <td><code>RANGE</code></td>
+      <td><code>RANGE</code> with the same subtype.</td>
+    </tr>
+
   </tbody>
 </table>