Skip to content

Commit

Permalink
[docs] [TA] Added TA-23476: YCQL currenttimestamp() precision (#23642)
Browse files Browse the repository at this point in the history
* added new TA

* Apply suggestions from code review

Co-authored-by: Dwight Hodge <[email protected]>

* added missed query

---------

Co-authored-by: Dwight Hodge <[email protected]>
  • Loading branch information
aishwarya24 and ddhodge authored Aug 28, 2024
1 parent 63f1d65 commit 1ad4795
Show file tree
Hide file tree
Showing 2 changed files with 184 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/content/preview/releases/techadvisories/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,12 @@ It is strongly recommended that you take appropriate measures as outlined in the
{{%table%}}
| Advisory&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Synopsis | Product | Affected Versions | Date |
| :------------------------------- | :------- | :-----: | :---------------: | :--- |
| {{<ta 23476>}}
|YCQL currenttimestamp() precision
| {{<product "ycql">}}
| {{<release "All">}}
| {{<nobreak "01 Januaury 2024">}}
|
| {{<ta 22935>}}
|Potential issues with server-side sequence caching in multi-database clusters
| {{<product "ysql">}}
Expand Down
178 changes: 178 additions & 0 deletions docs/content/preview/releases/techadvisories/ta-23476.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
---
title: TA-23476
headerTitle: YCQL currenttimestamp() precision
headcontent: 01 Jan 2024
type: docs
showRightNav: true
cascade:
unversioned: true
menu:
preview_releases:
identifier: ta-23476
weight: 1
rightNav:
hideH2: true
type: docs
---

| Product | Affected Versions | Related Issues | Fixed In |
| :------------------------- | :------------------ | :---------------- | :------- |
| {{<product "ycql">}} | {{<release "All">}} | {{<issue 11052>}}, {{<issue 23476>}} | TBD |

## Description

In YCQL, timestamp data inserted with the `currenttimestamp()` function may not be returned for queries with an equality condition on the timestamp. If the timestamp is part of the primary key, this also affects index scans as the table lookup for an index scan internally uses an equality condition to look up the corresponding table rows. The data can still be retrieved as expected by primary key or index-only scan queries with no condition or inequality conditions on the timestamp column.

The root cause is that in YCQL, the timestamp type has millisecond precision, to align with vanilla Cassandra, ensuring compatibility with Cassandra drivers and tools. However, YCQL can internally store data with microsecond precision, adjusting the precision to milliseconds when data is sent or received through the YCQL API. The discrepancy between the wire protocol's millisecond precision and the storage system's microsecond precision can lead to unexpected behavior.

## Mitigation

There are two key aspects to mitigating the issues caused by the discrepancy between the YCQL wire protocol's millisecond precision and the storage system's microsecond precision:

1. Ensure new writes use the correct precision.

Use totimestamp(now()) instead of currenttimestamp() to generate millisecond-precision timestamp values that are compatible with the wire protocol.

1. Adjust existing data to millisecond precision.

If you have already inserted data with non-zero microsecond precision, contact {{% support-general %}} for help with running a script to read and re-insert the affected rows with millisecond precision.

Additionally, if millisecond precision is not sufficient for your use case, there are two options:

1. Switch from timestamp to timeuuid data type.

Timeuuid uses 100-nanosecond precision and includes a node-identifier component, ensuring values generated on different nodes can never collide.

1. If millisecond-precision collisions are expected but rare, use IF NOT EXISTS.

Use IF NOT EXISTS on the data insert path and handle the existing row case in the application by retrying the insert. However, if collisions are common, this approach may result in too many retries.

## Details

The YCQL wire protocol uses millisecond precision for timestamps, while the YCQL storage system uses microsecond precision internally. This means that all timestamps that can be inserted are expected to have `000` for the microseconds component.

However, the `currenttimestamp()` function currently generates values with microsecond precision, so the microseconds component is likely to be non-zero 99.9% of the time.

When these non-zero microsecond timestamp values are stored, the microsecond information will be lost when the precision is adjusted to milliseconds for the wire protocol (for example, when responding to a user query). As a result, you will always see timestamp values with only millisecond precision.

This can create issues in certain scenarios as follows:

- Equality conditions: If you query for a timestamp value that was just read, the condition will not match, as the microsecond component will differ (0 for the query, non-zero for the stored value).

- Index scans: If the primary key of the main table includes an affected timestamp column, the data in both the table and the index will be in microsecond precision, so they will match. However, when performing an index scan, the main-table read request uses the same protocol as the YCQL API, which adjusts the precision to milliseconds. This may cause the main-table read request to not match, and the index scan result to miss the affected rows with non-zero microsecond values.

## Examples

### Primary-key scan with equality condition

1. Set up the keyspace and table as follows:

```cql
CREATE KEYSPACE temp;
USE temp;
CREATE TABLE test(id int, ts timestamp, v1 int, v2 int, primary key( id, ts)) WITH transactions = {'enabled': 'true'};
```

1. Insert a row using `currenttimestamp()` as follows:

```cql
INSERT INTO test(id, ts, v1, v2) VALUES (1, currenttimestamp(), 5, 10);
SELECT * FROM test;
```

```output
id | ts | v1 | v2
----+---------------------------------+----+----
1 | 2024-08-21 17:44:47.647000+0000 | 5 | 10
```

As expected, the row is present. However, most likely the internal row has a non-zero microsecond component, just that due to the YCQL API millisecond standard, it will always return `000` for the microsecond part.

1. Select the row using WHERE condition as follows:

```cql
SELECT * FROM test WHERE id = 1 AND ts = '2024-08-21 17:44:47.647000+0000';
```

```output
id | ts | v1 | v2
----+----+----+----
```

```cql
SELECT * FROM test WHERE id = 1 AND ts >= '2024-08-21 17:44:47.647000+0000' and ts < '2024-08-21 17:44:47.648000+0000';
```

```output
id | ts | v1 | v2
----+---------------------------------+----+----
1 | 2024-08-21 17:44:47.647000+0000 | 5 | 10
```

A simple query with equality on the same (id and) timestamp value returned before does not match. However, changing this to a range query search for all timestamps within that millisecond does return the row, confirming that the row is there but the microsecond value is non-zero. Note that for the second query the upper bound is the next millisecond value (648 rather than 647) so it will match everything in [647000, 647999] range from the microsecond perspective.

### Index scan case

1. Set up the keyspace and table as follows:

```cql
CREATE KEYSPACE temp;
USE temp;
CREATE TABLE test(id int, ts timestamp, v1 int, v2 int, primary key( id, ts)) WITH transactions = {'enabled': 'true'};
CREATE INDEX ON test(v1);
```

This example includes the same keyspace and table as the preceding primary-key example, and adds an index.
(To continue the preceding example, just create the index and skip steps 1 and 2.)

1. Insert a row using `currenttimestamp()` as follows:

```cql
INSERT INTO test(id, ts, v1, v2) VALUES (1, currenttimestamp(), 5, 10);
SELECT * FROM test;
```

```output
id | ts | v1 | v2
----+---------------------------------+----+----
1 | 2024-08-21 17:44:47.647000+0000 | 5 | 10
```

A row is present, but likely with a non-zero microsecond component internally.

1. Select the row using WHERE condition as follows:

```cql
ycqlsh:temp> SELECT * FROM test WHERE id = 1;
```

```output
id | ts | v1 | v2
----+---------------------------------+----+----
1 | 2024-08-21 17:44:47.647000+0000 | 5 | 10
```

```cql
ycqlsh:temp> SELECT * FROM test WHERE v1 = 5;
```

```output
id | ts | v1 | v2
----+----+----+----
```

```cql
ycqlsh:temp> SELECT id, ts FROM test WHERE v1 = 5;
```

```output
id | ts
----+---------------------------------
1 | 2024-08-21 17:44:47.647000+0000
```

This shows that the row exists and can be retrieved with both a primary key scan on id (first query) and an index-only scan on v1 (third query). The second query (unlike the third) selects all columns including v2, which is not in the index, which means it has to be an index scan; that is, query the table row to read all the target columns. As a result, it hits the index scan issue and misses returning the affected row(s).

## Related Articles

- [Date and time functions [YCQL]](../../../api/ycql/function_datetime/#currentdate-currenttime-and-currenttimestamp)

0 comments on commit 1ad4795

Please sign in to comment.