docs: Add links checker (apache#9965)
* docs: Add links checker

* Comments

* Fix broken paths

* Fix moar links

* Last few
Fokko authored Mar 23, 2024
1 parent 5d58750 commit 33838d5
Showing 26 changed files with 172 additions and 94 deletions.
40 changes: 40 additions & 0 deletions .github/workflows/docs-check-links.yml
@@ -0,0 +1,40 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Check Markdown docs links

on:
push:
paths:
- docs/**
- site/**
branches:
- 'main'
pull_request:
workflow_dispatch:

jobs:
markdown-link-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: gaurav-nelson/github-action-markdown-link-check@v1
with:
config-file: 'site/link-checker-config.json'
use-verbose-mode: yes
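
The action reads its configuration from `site/link-checker-config.json`, which this commit adds but which is not shown in this excerpt. As a rough, illustrative sketch (not the actual file contents), a `markdown-link-check` configuration of that kind typically looks like:

```json
{
  "ignorePatterns": [
    { "pattern": "^http://localhost" }
  ],
  "timeout": "20s",
  "retryOn429": true,
  "retryCount": 3,
  "aliveStatusCodes": [200, 206, 429]
}
```

`ignorePatterns` skips links matching a regex, while the retry and status-code options keep transient HTTP failures from failing the docs build.
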
7 changes: 2 additions & 5 deletions README.md
@@ -17,7 +17,7 @@
- under the License.
-->

![Iceberg](https://iceberg.apache.org/docs/latest/img/Iceberg-logo.png)
![Iceberg](https://iceberg.apache.org/assets/images/Iceberg-logo.svg)

[![](https://github.com/apache/iceberg/actions/workflows/java-ci.yml/badge.svg)](https://github.com/apache/iceberg/actions/workflows/java-ci.yml)
[![Slack](https://img.shields.io/badge/chat-on%20Slack-brightgreen.svg)](https://apache-iceberg.slack.com/)
@@ -37,11 +37,8 @@ The core Java library is located in this repository and is the reference impleme

[Documentation][iceberg-docs] is available for all libraries and integrations.

Current work is tracked in the [roadmap][roadmap].

[iceberg-docs]: https://iceberg.apache.org/docs/latest/
[iceberg-spec]: https://iceberg.apache.org/spec
[roadmap]: https://iceberg.apache.org/roadmap/
[iceberg-spec]: https://iceberg.apache.org/spec/

## Collaboration

8 changes: 4 additions & 4 deletions docs/docs/configuration.md
@@ -108,9 +108,9 @@ Iceberg tables support table properties to configure table behavior, like the de
Reserved table properties are only used to control behaviors when creating or updating a table.
The values of these properties are not persisted as part of the table metadata.

| Property | Default | Description |
| -------------- | -------- | ------------------------------------------------------------- |
| format-version | 2 | Table's format version (can be 1 or 2) as defined in the [Spec](../../../spec/#format-versioning). Defaults to 2 since version 1.4.0. |
| Property | Default | Description |
| -------------- | -------- |--------------------------------------------------------------------------------------------------------------------------------------|
| format-version | 2 | Table's format version (can be 1 or 2) as defined in the [Spec](../../spec.md#format-versioning). Defaults to 2 since version 1.4.0. |

### Compatibility flags

@@ -131,7 +131,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
| clients | 2 | client pool size |
| cache-enabled | true | Whether to cache catalog entries |
| cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration |
| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](../metrics-reporting.md) section for additional details |
| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](metrics-reporting.md) section for additional details |

`HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
2 changes: 1 addition & 1 deletion docs/docs/daft.md
@@ -20,7 +20,7 @@ title: "Daft"

# Daft

[Daft](www.getdaft.io) is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry.
[Daft](https://www.getdaft.io/) is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry.

It exposes its flavor of the familiar [Python DataFrame API](https://www.getdaft.io/projects/docs/en/latest/api_docs/dataframe.html) which is a common abstraction over querying tables of data in the Python data ecosystem.

2 changes: 1 addition & 1 deletion docs/docs/flink-actions.md
@@ -20,7 +20,7 @@ title: "Flink Actions"

## Rewrite files action

Iceberg provides an API to rewrite small files into large files by submitting Flink batch jobs. The behavior of this Flink action is the same as Spark's [rewriteDataFiles](../maintenance.md#compact-data-files).
Iceberg provides an API to rewrite small files into large files by submitting Flink batch jobs. The behavior of this Flink action is the same as Spark's [rewriteDataFiles](maintenance.md#compact-data-files).

```java
import org.apache.iceberg.flink.actions.Actions;
6 changes: 3 additions & 3 deletions docs/docs/flink-connector.md
@@ -29,13 +29,13 @@ To create the table in Flink SQL by using SQL syntax `CREATE TABLE test (..) WIT
* `connector`: Use the constant `iceberg`.
* `catalog-name`: User-specified catalog name. It's required because the connector doesn't have a default value.
* `catalog-type`: `hive` or `hadoop` for built-in catalogs (defaults to `hive`), or left unset for custom catalog implementations using `catalog-impl`.
* `catalog-impl`: The fully-qualified class name of a custom catalog implementation. Must be set if `catalog-type` is unset. See also [custom catalog](../flink.md#adding-catalogs) for more details.
* `catalog-impl`: The fully-qualified class name of a custom catalog implementation. Must be set if `catalog-type` is unset. See also [custom catalog](flink.md#adding-catalogs) for more details.
* `catalog-database`: The Iceberg database name in the backend catalog; defaults to the current Flink database name.
* `catalog-table`: The Iceberg table name in the backend catalog; defaults to the table name in the Flink `CREATE TABLE` statement.

## Table managed in Hive catalog.

Before executing the following SQL, please make sure you've configured the Flink SQL client correctly according to the [quick start documentation](../flink.md).
Before executing the following SQL, please make sure you've configured the Flink SQL client correctly according to the [quick start documentation](flink.md).

The following SQL will create a Flink table in the current Flink catalog, which maps to the Iceberg table `default_database.flink_table` managed in the Iceberg catalog.

@@ -138,4 +138,4 @@ SELECT * FROM flink_table;
3 rows in set
```

For more details, please refer to the Iceberg [Flink documentation](../flink.md).
For more details, please refer to the Iceberg [Flink documentation](flink.md).
2 changes: 1 addition & 1 deletion docs/docs/flink-ddl.md
@@ -150,7 +150,7 @@ Table create commands support the commonly used [Flink create clauses](https://n

* `PARTITION BY (column1, column2, ...)` to configure partitioning; note that Flink does not yet support hidden partitioning.
* `COMMENT 'table document'` to set a table description.
* `WITH ('key'='value', ...)` to set [table configuration](../configuration.md) which will be stored in Iceberg table properties.
* `WITH ('key'='value', ...)` to set [table configuration](configuration.md) which will be stored in Iceberg table properties.

Currently, it does not support computed columns, watermark definitions, etc.

2 changes: 1 addition & 1 deletion docs/docs/flink-queries.md
@@ -75,7 +75,7 @@ SET table.exec.iceberg.use-flip27-source = true;

### Reading branches and tags with SQL
Branches and tags can be read via SQL by specifying options. For more details
refer to [Flink Configuration](../flink-configuration.md#read-options)
refer to [Flink Configuration](flink-configuration.md#read-options)

```sql
--- Read from branch b1
10 changes: 5 additions & 5 deletions docs/docs/flink-writes.md
@@ -67,7 +67,7 @@ Iceberg supports `UPSERT` based on the primary key when writing data into v2 tab
) with ('format-version'='2', 'write.upsert.enabled'='true');
```

2. Enabling `UPSERT` mode using `upsert-enabled` in the [write options](#write-options) provides more flexibility than a table level config. Note that you still need to use v2 table format and specify the [primary key](../flink-ddl.md/#primary-key) or [identifier fields](../../spec.md#identifier-field-ids) when creating the table.
2. Enabling `UPSERT` mode using `upsert-enabled` in the [write options](#write-options) provides more flexibility than a table level config. Note that you still need to use v2 table format and specify the [primary key](flink-ddl.md/#primary-key) or [identifier fields](../../spec.md#identifier-field-ids) when creating the table.

```sql
INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
@@ -185,7 +185,7 @@ FlinkSink.builderFor(

### Branch Writes
Writing to branches in Iceberg tables is also supported via the `toBranch` API in `FlinkSink`.
For more information on branches please refer to [branches](../branching.md).
For more information on branches please refer to [branches](branching.md).
```java
FlinkSink.forRowData(input)
.tableLoader(tableLoader)
@@ -262,13 +262,13 @@ INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
...
```

Check out all the options here: [write-options](../flink-configuration.md#write-options)
Check out all the options here: [write-options](flink-configuration.md#write-options)

## Notes

Flink streaming write jobs rely on snapshot summary to keep the last committed checkpoint ID, and
store uncommitted data as temporary files. Therefore, [expiring snapshots](../maintenance.md#expire-snapshots)
and [deleting orphan files](../maintenance.md#delete-orphan-files) could possibly corrupt
store uncommitted data as temporary files. Therefore, [expiring snapshots](maintenance.md#expire-snapshots)
and [deleting orphan files](maintenance.md#delete-orphan-files) could possibly corrupt
the state of the Flink job. To avoid that, make sure to keep the last snapshot created by the Flink
job (which can be identified by the `flink.job-id` property in the summary), and only delete
orphan files that are old enough.
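
As an illustration of the `flink.job-id` property mentioned above, snapshot summaries can be inspected with the Iceberg Java API. A minimal sketch, assuming a Hadoop catalog and placeholder warehouse and table names:

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

// Load the table from a (placeholder) Hadoop catalog.
HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "hdfs://nn:8020/warehouse/path");
Table table = catalog.loadTable(TableIdentifier.of("db", "flink_table"));

// List snapshots committed by Flink jobs, so the most recent one can be kept
// when expiring snapshots or deleting orphan files.
for (Snapshot snapshot : table.snapshots()) {
    Map<String, String> summary = snapshot.summary();
    String flinkJobId = summary.get("flink.job-id");
    if (flinkJobId != null) {
        System.out.printf("snapshot %d committed by Flink job %s%n",
            snapshot.snapshotId(), flinkJobId);
    }
}
```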
35 changes: 18 additions & 17 deletions docs/docs/flink.md
@@ -22,22 +22,22 @@ title: "Flink Getting Started"

Apache Iceberg supports both [Apache Flink](https://flink.apache.org/)'s DataStream API and Table API. See the [Multi-Engine Support](../../multi-engine-support.md#apache-flink) page for the integration of Apache Flink.

| Feature support | Flink | Notes |
| ----------------------------------------------------------- |-------|----------------------------------------------------------------------------------------|
| [SQL create catalog](../flink-ddl.md#create-catalog) | ✔️ | |
| [SQL create database](../flink-ddl.md#create-database) | ✔️ | |
| [SQL create table](../flink-ddl.md#create-table) | ✔️ | |
| [SQL create table like](../flink-ddl.md#create-table-like) | ✔️ | |
| [SQL alter table](../flink-ddl.md#alter-table) | ✔️ | Only support altering table properties, column and partition changes are not supported |
| [SQL drop_table](../flink-ddl.md#drop-table) | ✔️ | |
| [SQL select](../flink-queries.md#reading-with-sql) | ✔️ | Support both streaming and batch mode |
| [SQL insert into](../flink-writes.md#insert-into) | ✔️ ️ | Support both streaming and batch mode |
| [SQL insert overwrite](../flink-writes.md#insert-overwrite) | ✔️ ️ | |
| [DataStream read](../flink-queries.md#reading-with-datastream) | ✔️ ️ | |
| [DataStream append](../flink-writes.md#appending-data) | ✔️ ️ | |
| [DataStream overwrite](../flink-writes.md#overwrite-data) | ✔️ ️ | |
| [Metadata tables](../flink-queries.md#inspecting-tables) | ✔️ | |
| [Rewrite files action](../flink-actions.md#rewrite-files-action) | ✔️ ️ | |
| Feature support | Flink | Notes |
| -------------------------------------------------------- |-------|----------------------------------------------------------------------------------------|
| [SQL create catalog](flink-ddl.md#create-catalog) | ✔️ | |
| [SQL create database](flink-ddl.md#create-database) | ✔️ | |
| [SQL create table](flink-ddl.md#create-table) | ✔️ | |
| [SQL create table like](flink-ddl.md#create-table-like) | ✔️ | |
| [SQL alter table](flink-ddl.md#alter-table) | ✔️ | Only support altering table properties, column and partition changes are not supported |
| [SQL drop_table](flink-ddl.md#drop-table) | ✔️ | |
| [SQL select](flink-queries.md#reading-with-sql) | ✔️ | Support both streaming and batch mode |
| [SQL insert into](flink-writes.md#insert-into) | ✔️ ️ | Support both streaming and batch mode |
| [SQL insert overwrite](flink-writes.md#insert-overwrite) | ✔️ ️ | |
| [DataStream read](flink-queries.md#reading-with-datastream) | ✔️ ️ | |
| [DataStream append](flink-writes.md#appending-data) | ✔️ ️ | |
| [DataStream overwrite](flink-writes.md#overwrite-data) | ✔️ ️ | |
| [Metadata tables](flink-queries.md#inspecting-tables) | ✔️ | |
| [Rewrite files action](flink-actions.md#rewrite-files-action) | ✔️ ️ | |

## Preparation when using Flink SQL Client

@@ -69,6 +69,7 @@ export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
./bin/start-cluster.sh
```

<!-- markdown-link-check-disable-next-line -->
Start the Flink SQL client. There is a separate `flink-runtime` module in the Iceberg project to generate a bundled jar, which could be loaded by Flink SQL client directly. To build the `flink-runtime` bundled jar manually, build the `iceberg` project, and it will generate the jar under `<iceberg-root-dir>/flink-runtime/build/libs`. Or download the `flink-runtime` jar from the [Apache repository](https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.16/{{ icebergVersion }}/).

```bash
@@ -271,7 +272,7 @@ env.execute("Test Iceberg DataStream");

### Branch Writes
Writing to branches in Iceberg tables is also supported via the `toBranch` API in `FlinkSink`.
For more information on branches please refer to [branches](../branching.md).
For more information on branches please refer to [branches](branching.md).
```java
FlinkSink.forRowData(input)
.tableLoader(tableLoader)