Release 0.7.0 #302

Merged
merged 61 commits into from
Nov 15, 2024
Commits
0fda912
Improve pattern matching docs
prrao87 Oct 18, 2024
1cf79bc
Update rdbms.mdx
acquamarin Nov 5, 2024
bd0b2be
Update rdbms.mdx
acquamarin Nov 6, 2024
f829af1
Update httpfs.md
acquamarin Nov 6, 2024
b359b84
Update rdbms.mdx
acquamarin Nov 6, 2024
1ea6789
update with random multi copy split testing info (#253)
yiyun-sj Aug 22, 2024
c81b3e9
Added docs for multiline mode (#254)
MSebanc Aug 26, 2024
59e1dcd
Add Documentation for CSV Ignore Errors (#257)
royi-luo Sep 27, 2024
8a47fba
Updated cli highlighting docs (#275)
MSebanc Sep 29, 2024
16b9280
Update docs for copy from warning feature (#278)
royi-luo Oct 1, 2024
3b49fcf
Update cli.mdx
acquamarin Nov 7, 2024
6a8e13d
Apply suggestions from code review
prrao87 Nov 7, 2024
8e9b8fa
Update rdbms.mdx
acquamarin Nov 5, 2024
2870e51
Update docs
royi-luo Nov 6, 2024
9bf327f
remove data type rdf variant; more cleanup on rdf
ray6080 Nov 6, 2024
b89cccd
Merge pull request #295 from kuzudb/remote-duckdb
ray6080 Nov 11, 2024
9a4ec7f
add the description to auto detection
SterlingT3485 Oct 23, 2024
b7741b1
Merge pull request #294 from kuzudb/auto_detect_doc
ray6080 Nov 11, 2024
8cd0ea5
Add spill to disk setting doc
benjaminwinger Oct 3, 2024
490f21f
Merge pull request #280 from kuzudb/spill-to-disk-docs
ray6080 Nov 11, 2024
d7ee235
Merge pull request #291 from kuzudb/regexp_replace
ray6080 Nov 11, 2024
16e8005
Docs changes
WWW0030 Sep 27, 2024
0b41674
Updating docs
WWW0030 Sep 27, 2024
8f459fc
addressing comments
WWW0030 Sep 30, 2024
580f702
Updating User Table creation to use DEFAULT
WWW0030 Sep 30, 2024
adbf19c
Better default expression
WWW0030 Sep 30, 2024
2c3ddc3
Addressing comments
WWW0030 Oct 1, 2024
fcd89d5
Update src/content/docs/cypher/query-clauses/call.md
prrao87 Oct 1, 2024
46da9a0
More formatting fixes
prrao87 Oct 18, 2024
fad743b
format
ray6080 Nov 11, 2024
e3b28a3
format
ray6080 Nov 11, 2024
80aa038
Merge pull request #273 from kuzudb/howe/updating_docs
ray6080 Nov 11, 2024
75c4e30
Add semantic option to recursive pattern (#301)
ray6080 Nov 11, 2024
c5ec373
Add JSON function docs (#298)
prrao87 Nov 12, 2024
370fe1c
Adding documentation for regexp_split_to_array
WWW0030 Nov 12, 2024
bd3e594
Show how to handle decimals in Python UDF (#303)
prrao87 Nov 12, 2024
6e05f98
Update json.mdx (#304)
acquamarin Nov 12, 2024
37feb7a
Update python.mdx (#306)
acquamarin Nov 13, 2024
bdbb5e1
csv, load from, and csv configurations changes.
semihsalihoglu-uw Nov 13, 2024
9b05836
Fix groovy syntax highlight (#307)
prrao87 Nov 13, 2024
880a840
Enhance documentation by adding nowrap styling to configuration param…
mewim Nov 13, 2024
879bb3a
Fix file name (#309)
prrao87 Nov 13, 2024
fe5f1cb
change the sample_size default value from 1024 to 256
SterlingT3485 Nov 13, 2024
fa02184
change the sample_size default value from 1024 to 256
SterlingT3485 Nov 13, 2024
f403c09
fix grammar
SterlingT3485 Nov 13, 2024
a7d92f7
Update JSON examples (#310)
prrao87 Nov 13, 2024
b817cf2
Apply suggestions from code review
prrao87 Nov 14, 2024
5a22766
Update src/content/docs/cypher/configuration.md
prrao87 Nov 14, 2024
b828e77
Update src/content/docs/cypher/query-clauses/load-from.md
prrao87 Nov 14, 2024
eee22c5
Move docs on COPY FROM warnings to separate page + other changes (#308)
royi-luo Nov 14, 2024
eba92e0
Update src/content/docs/cypher/query-clauses/load-from.md
prrao87 Nov 14, 2024
b440a8b
fix some inaccurate claims regarding to csv sniffing
SterlingT3485 Nov 14, 2024
7e136fe
update docs
ray6080 Nov 15, 2024
bcae553
minor polish
ray6080 Nov 15, 2024
f613257
Update src/content/docs/extensions/json.mdx
ray6080 Nov 15, 2024
f3ece30
Fix formatting
prrao87 Nov 15, 2024
09da23c
polishing
semihsalihoglu-uw Nov 15, 2024
6b9f2e1
minor
semihsalihoglu-uw Nov 15, 2024
61a2881
fix load from example
ray6080 Nov 15, 2024
2cc1a3d
Update installation instructions and URLs to version 0.7.0
mewim Nov 15, 2024
358f81f
add warning of test file name restriction
SterlingT3485 Nov 15, 2024
Binary file added public/img/cli/highlighting.png
File renamed without changes
55 changes: 53 additions & 2 deletions src/content/docs/client-apis/cli.mdx
Original file line number Diff line number Diff line change
@@ -81,6 +81,10 @@ kuzu> :help
:max_width [max_width] set maximum width in characters for display
:mode [mode] set output mode (default: box)
:stats [on|off] toggle query stats on or off
:multiline set multiline mode (default)
:singleline set singleline mode
:highlight [on|off] toggle syntax highlighting on or off
:render_errors [on|off] toggle error highlighting on or off

Note: you can change and see several system configurations, such as num-threads,
timeout, and progress_bar using Cypher CALL statements.
@@ -106,6 +110,46 @@ Set output mode. The default mode is `box`. See the output modes section below f
#### `:stats [on|off]`
Toggle query statistics on or off. The default is `on`. Query statistics include the number of tuples, columns, and execution time.

#### `:multiline`
Set multiline editing mode. This is the default editing mode. In multiline editing mode, you can write queries that span multiple lines,
and you can go back to previous lines and edit them. Comments and newlines are preserved when a query is saved to history in this mode.

```bash
kuzu> :multiline
Multi line mode enabled
kuzu> CREATE NODE TABLE
· Person(name STRING,
· age INT64,
‣ PRIMARY KEY (name) // a comment here too
· );
```

The `‣` symbol indicates the current line.

#### `:singleline`
Set singleline editing mode. In singleline editing mode, you edit one line at a time.
A query can still span multiple lines, but you cannot go back to previous lines and edit them. Single-line comments and newlines
are not saved to history in this mode.

```bash
kuzu> :singleline
Single line mode enabled
kuzu> CREATE NODE TABLE
..> Person(name STRING,
..> age INT64,
..> PRIMARY KEY (name) // a comment here too
..> );
```

#### `:highlight [on|off]`
Toggle syntax highlighting on or off. The default is `on`. When enabled, the shell will highlight Cypher keywords,
constants and literals, syntax errors, and comments. Error highlighting and multiline comment highlighting are not available in singleline mode.
![](/img/cli/highlighting.png)

#### `:render_errors [on|off]`
Toggle error highlighting on or off. The default is `on`. When enabled, the shell will highlight syntax errors in red. In particular,
mismatched brackets and unclosed quotes will be highlighted. Error highlighting is not available in singleline mode.
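
As a rough illustration of what this kind of error highlighting involves, the sketch below scans a query for unmatched brackets and an unclosed quote and returns the character positions a shell could mark in red. It is not Kùzu's implementation: `find_syntax_errors` is a hypothetical helper, and it ignores escaped quotes and comments for brevity.

```python
def find_syntax_errors(query: str) -> list[int]:
    """Return sorted indices of unmatched brackets and of an unclosed quote."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    open_brackets = []      # (bracket, index) pairs still waiting for a match
    errors = []
    open_quote = None       # (quote character, index) while inside a string

    for i, ch in enumerate(query):
        if open_quote is not None:
            # Inside a string literal: only the matching quote ends it
            if ch == open_quote[0]:
                open_quote = None
            continue
        if ch in "'\"":
            open_quote = (ch, i)
        elif ch in "([{":
            open_brackets.append((ch, i))
        elif ch in closing_to_opening:
            if open_brackets and open_brackets[-1][0] == closing_to_opening[ch]:
                open_brackets.pop()
            else:
                errors.append(i)  # closing bracket with no matching opener

    # Anything still open at the end of the input is also an error
    errors.extend(index for _, index in open_brackets)
    if open_quote is not None:
        errors.append(open_quote[1])
    return sorted(errors)
```

For example, `find_syntax_errors("RETURN (1 + 2")` reports position 7, the index of the unmatched `(`.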

## Interrupt shell

To interrupt a running query, use `Ctrl + C` in the CLI. Note: We currently don't support interrupting a running `COPY` statement.
@@ -156,7 +200,7 @@ the number of pipelines that have been executed (each query is broken down into
as well as the percentage of the data processed in a pipeline. This gives an estimate for how much of a pipeline
has executed.

![](/img/progress-bar.gif)
![](/img/cli/progress-bar.gif)

The progress bar is not enabled by default. To enable the progress bar, use the following command:

@@ -198,4 +242,11 @@ kuzu> CREATE NODE TABLE Person (name STRING, age INT64, PRIMARY KEY(name));
```

The `:max_rows` and `:max_width` commands can be used to control the number of rows and the width
of the `box`, `column`, `table`, and `markdown` output modes.
of the `box`, `column`, `table`, and `markdown` output modes.

## Multi-line Cypher statements
The CLI supports queries that span multiple lines. If the semicolon is omitted, pressing Enter continues the query on a new line instead of executing it.
```
kuzu> MATCH (a:person)
‣ RETURN a.id;
```
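
The continuation rule described above can be sketched as follows. This is a simplified illustration, not the CLI's actual logic: `is_complete` and `read_statement` are hypothetical helpers that only check for a trailing semicolon and do not account for semicolons inside string literals or comments.

```python
def is_complete(buffer: str) -> bool:
    """Return True if the accumulated input ends with a semicolon."""
    return buffer.rstrip().endswith(";")


def read_statement(lines):
    """Accumulate input lines until the statement is terminated by a semicolon."""
    collected = []
    for line in lines:
        collected.append(line)
        statement = "\n".join(collected)
        if is_complete(statement):
            return statement
    return None  # input ended before the statement was terminated
```

With the two input lines from the example above, `read_statement` returns only after the second line, because the first line has no terminating semicolon.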
55 changes: 41 additions & 14 deletions src/content/docs/client-apis/python.mdx
@@ -377,20 +377,6 @@ types to a Kùzu `LogicalTypeID`, which will be used to infer types via Python t
|`list`|`LIST`|
|`dict`|`MAP`|

### Nested types

When defining a UDF, you can also specify nested types, though in this case, there are some differences
from the example shown above.

If the parameter is a nested type, you must also provide the children's type information. As such, with nested types,
it's not valid to use `kuzu.Type`. Instead, a string representation of the type should be given.

- A list of `INT64` would be `"INT64[]"`
- A map from a `STRING` to a `DOUBLE` would be `"MAP(STRING, DOUBLE)"`.

Note that it's also valid to define child types through Python's type annotations, e.g. `list[int]`,
or `dict(str, float)`. It is also valid to use string representations to denote non-nested types.

## UDF

Kùzu's Python API also supports the registration of User Defined Functions (UDFs),
@@ -490,3 +476,44 @@ In case you want to remove the UDF, you can call the `remove_function` method on
# Use existing connection object
conn.remove_function(difference)
```

### Nested and complex types

When working with UDFs, you can also specify nested or complex types, though in this case, there are some differences
from the examples shown above. With these additional types, give a string representation of the type
for each such parameter; the string is then cast to the respective Kùzu type.

Some examples of where this is relevant are listed below:

- A list of `INT64` would be `"INT64[]"`
- A map from a `STRING` to a `DOUBLE` would be `"MAP(STRING, DOUBLE)"`
- A `DECIMAL` value with a precision of 7 digits and a scale of 2 (two digits after the decimal point) would be `"DECIMAL(7, 2)"`

Note that it's also valid to define child types through Python's type annotations, e.g. `list[int]`,
or `dict[str, float]` for simple types.
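
To make the annotation form concrete, the sketch below uses only Python's standard `typing` helpers to show the child type information that a nested annotation carries. How Kùzu consumes these annotations internally is not covered here; `scale_all` and `lookup` are hypothetical functions used purely for illustration.

```python
from typing import get_args, get_origin, get_type_hints


def scale_all(values: list[int], factor: int) -> list[int]:
    # Multiply every element of the list by the same factor
    return [v * factor for v in values]


def lookup(prices: dict[str, float], name: str) -> float:
    # Return the price stored under the given name
    return prices[name]


# Built-in generics such as list[int] require Python 3.9 or newer
values_hint = get_type_hints(scale_all)["values"]
prices_hint = get_type_hints(lookup)["prices"]

# The container type and its child types are both recoverable
list_container, list_children = get_origin(values_hint), get_args(values_hint)
map_container, map_children = get_origin(prices_hint), get_args(prices_hint)
```

Here `list_children` holds the element type of the list, and `map_children` holds the key and value types of the map.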

Below, we show an example to calculate the discounted price of an item using a Python UDF.

```python
def calculate_discounted_price(price: float, has_discount: bool) -> float:
    # Assume 10% discount on all items for simplicity
    return float(price) * 0.9 if has_discount else price

# define the expected type of the UDF's parameters
parameters = ['DECIMAL(7, 2)', kuzu.Type.BOOL]

# define expected type of the UDF's returned value
return_type = 'DECIMAL(7, 2)'

# register the UDF
conn.create_function(
    "current_price",
    calculate_discounted_price,
    parameters,
    return_type
)
```

The second parameter is a built-in native type in Kùzu, i.e., `kuzu.Type.BOOL`. For the first parameter,
we need to specify a string, i.e., `"DECIMAL(7, 2)"`, which is then parsed and used by the binder in Kùzu
to map to the internal Decimal representation.
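
Independently of Kùzu, the arithmetic in such a function can also be kept in `decimal.Decimal`, so that the result keeps exactly two decimal places and matches the scale of `DECIMAL(7, 2)`. The sketch below is a standalone variation on the function above, not part of the Kùzu API. `discounted_price` is a hypothetical name, and whether Kùzu passes `DECIMAL` values to a UDF as `decimal.Decimal` objects is an assumption here.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")  # scale of 2, as in DECIMAL(7, 2)


def discounted_price(price: Decimal, has_discount: bool) -> Decimal:
    # Assume a 10% discount on discounted items, as in the example above
    rate = Decimal("0.9") if has_discount else Decimal("1")
    # Round the product back to two decimal places
    return (price * rate).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
```

Staying in `Decimal` avoids mixing binary floating-point values with decimal ones, and the explicit `quantize` call makes the rounding rule visible.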
38 changes: 26 additions & 12 deletions src/content/docs/cypher/configuration.md
@@ -8,17 +8,19 @@ statement, described in this section. Different from [the `CALL` clause](/cypher
configuration **cannot** be used with other query clauses, such as `RETURN`.

### Connection configuration
| Option | Description | Default |
| ----------- |--------------------------------------------------------------------------------|------------------------|
| `THREADS` | number of threads used by execution | system maximum threads |
| `TIMEOUT` | timeout of query execution in ms | N/A |
| `VAR_LENGTH_EXTEND_MAX_DEPTH` | maximum depth of recursive extend | 30 |
| `ENABLE_SEMI_MASK` | enables the semi mask optimization | true |
| `HOME_DIRECTORY`| system home directory | user home directory |
| `FILE_SEARCH_PATH`| file search path | N/A |
| `PROGRESS_BAR` | enable progress bar in CLI | false |
| `PROGRESS_BAR_TIME` | show progress bar after time in ms | 1000 |
| `CHECKPOINT_THRESHOLD` | the WAL size threshold in bytes at which to automatically trigger a checkpoint | 16777216 (16MB) |
| Option | Description | Default |
| ----------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|
| `THREADS` | number of threads used by execution | system maximum threads |
| `TIMEOUT` | timeout of query execution in ms | N/A |
| `VAR_LENGTH_EXTEND_MAX_DEPTH` | maximum depth of recursive extend | 30 |
| `ENABLE_SEMI_MASK` | enables the semi mask optimization | true |
| `HOME_DIRECTORY`| system home directory | user home directory |
| `FILE_SEARCH_PATH`| file search path | N/A |
| `PROGRESS_BAR` | enable progress bar in CLI | false |
| `PROGRESS_BAR_TIME` | show progress bar after time in ms | 1000 |
| `CHECKPOINT_THRESHOLD` | the WAL size threshold in bytes at which to automatically trigger a checkpoint | 16777216 (16MB) |
| `WARNING_LIMIT` | Maximum number of warnings that can be stored in a single connection. Currently, only warnings about [malformed CSV lines](/import/csv#ignoring-erroneous-rows) are stored, and only when the `ignore_errors` parameter is set to `true` in `COPY FROM` or `LOAD FROM` statements. | 8192 |
| `SPILL_TO_DISK_TMP_FILE` | The location of the temporary file to use to store data if there is not enough memory during a copy | `copy.tmp` inside the database directory |

### Database configuration
| Option | Description | Default |
@@ -67,4 +69,16 @@ CALL progress_bar=true;
#### Configure checkpoint threshold
```cypher
CALL checkpoint_threshold=16777216;
```
```

#### Configure warning limit
```cypher
CALL warning_limit=1024;
```

#### Configure spill-to-disk temporary file
```cypher
CALL spill_to_disk_tmp_file="/path/to/tmp/file";
// Disables spilling to disk
CALL spill_to_disk_tmp_file="";
```
10 changes: 8 additions & 2 deletions src/content/docs/cypher/data-definition/create-table.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,13 @@ To create a node table, use the `CREATE NODE TABLE` statement as shown below:
```sql
CREATE NODE TABLE User (name STRING, age INT64 DEFAULT 0, reg_date DATE, PRIMARY KEY (name))
```
The above statement adds a `User` table to the catalog of the system with three properties: `name`, `age`, and `reg_date`,

Alternatively, you can specify the keyword `PRIMARY KEY` immediately after the column name, as follows:
```sql
CREATE NODE TABLE User (name STRING PRIMARY KEY, age INT64 DEFAULT 0, reg_date DATE)
```

Either of the above statements adds a `User` table to the catalog of the system with three properties: `name`, `age`, and `reg_date`,
with the primary key being set to the `name` property in this case.

The name of the node table, `User`, specified above will serve as the "label" which we want to query
@@ -49,7 +55,7 @@ MATCH (a:User) RETURN *

### Primary key

Kùzu requires a primary key column for node table which can be either a `STRING` or `INT64` property of the node. Kùzu will generate an index to do quick lookups on the primary key (e.g., `name` in the above example). Alternatively, you can use the [`SERIAL`](/cypher/data-types/#serial) data type to generate an auto-increment column as primary key.
Kùzu requires a primary key column for each node table, which can be a `STRING`, numeric, `DATE`, or `BLOB` property of the node. Kùzu will generate an index to do quick lookups on the primary key (e.g., `name` in the above example). Alternatively, you can use the [`SERIAL`](/cypher/data-types/#serial) data type to generate an auto-increment column as the primary key.

### Default value

Original file line number Diff line number Diff line change
@@ -15,7 +15,7 @@ are shown below.
### User nodes
Schema:
```cypher
CREATE NODE TABLE User(name STRING, age INT64, PRIMARY KEY (name))
CREATE NODE TABLE User(name STRING, age INT64 DEFAULT 0, PRIMARY KEY (name))
```

user.csv: