docs: Add links checker (apache#9965)
* docs: Add links checker

* Comments

* Fix broken paths

* Fix moar links

* Last few
Fokko authored Mar 23, 2024
1 parent 5d58750 commit 33838d5
Showing 26 changed files with 172 additions and 94 deletions.
40 changes: 40 additions & 0 deletions .github/workflows/docs-check-links.yml
@@ -0,0 +1,40 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Check Markdown docs links

on:
push:
paths:
- docs/**
- site/**
branches:
- 'main'
pull_request:
workflow_dispatch:

jobs:
markdown-link-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: gaurav-nelson/github-action-markdown-link-check@v1
with:
config-file: 'site/link-checker-config.json'
use-verbose-mode: yes
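
The action reads its configuration from `site/link-checker-config.json`, which this commit adds but which is not shown in this excerpt. As a rough, illustrative sketch (not the actual file contents), a `markdown-link-check` configuration of that kind typically looks like:

```json
{
  "ignorePatterns": [
    { "pattern": "^http://localhost" }
  ],
  "timeout": "20s",
  "retryOn429": true,
  "retryCount": 3,
  "aliveStatusCodes": [200, 206, 429]
}
```

`ignorePatterns` skips links matching a regex, while the retry and status-code options keep transient HTTP failures from failing the docs build.
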
7 changes: 2 additions & 5 deletions README.md
@@ -17,7 +17,7 @@
- under the License.
-->

![Iceberg](https://iceberg.apache.org/docs/latest/img/Iceberg-logo.png)
![Iceberg](https://iceberg.apache.org/assets/images/Iceberg-logo.svg)

[![](https://github.com/apache/iceberg/actions/workflows/java-ci.yml/badge.svg)](https://github.com/apache/iceberg/actions/workflows/java-ci.yml)
[![Slack](https://img.shields.io/badge/chat-on%20Slack-brightgreen.svg)](https://apache-iceberg.slack.com/)
@@ -37,11 +37,8 @@ The core Java library is located in this repository and is the reference impleme

[Documentation][iceberg-docs] is available for all libraries and integrations.

Current work is tracked in the [roadmap][roadmap].

[iceberg-docs]: https://iceberg.apache.org/docs/latest/
[iceberg-spec]: https://iceberg.apache.org/spec
[roadmap]: https://iceberg.apache.org/roadmap/
[iceberg-spec]: https://iceberg.apache.org/spec/

## Collaboration

8 changes: 4 additions & 4 deletions docs/docs/configuration.md
@@ -108,9 +108,9 @@ Iceberg tables support table properties to configure table behavior, like the de
Reserved table properties are only used to control behaviors when creating or updating a table.
The values of these properties are not persisted as part of the table metadata.

| Property | Default | Description |
| -------------- | -------- | ------------------------------------------------------------- |
| format-version | 2 | Table's format version (can be 1 or 2) as defined in the [Spec](../../../spec/#format-versioning). Defaults to 2 since version 1.4.0. |
| Property | Default | Description |
| -------------- | -------- |--------------------------------------------------------------------------------------------------------------------------------------|
| format-version | 2 | Table's format version (can be 1 or 2) as defined in the [Spec](../../spec.md#format-versioning). Defaults to 2 since version 1.4.0. |

### Compatibility flags

@@ -131,7 +131,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
| clients | 2 | client pool size |
| cache-enabled | true | Whether to cache catalog entries |
| cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration |
| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](../metrics-reporting.md) section for additional details |
| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](metrics-reporting.md) section for additional details |

`HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
2 changes: 1 addition & 1 deletion docs/docs/daft.md
@@ -20,7 +20,7 @@ title: "Daft"

# Daft

[Daft](www.getdaft.io) is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry.
[Daft](https://www.getdaft.io/) is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry.

It exposes its flavor of the familiar [Python DataFrame API](https://www.getdaft.io/projects/docs/en/latest/api_docs/dataframe.html) which is a common abstraction over querying tables of data in the Python data ecosystem.

2 changes: 1 addition & 1 deletion docs/docs/flink-actions.md
@@ -20,7 +20,7 @@ title: "Flink Actions"

## Rewrite files action

Iceberg provides an API to rewrite small files into large files by submitting Flink batch jobs. The behavior of this Flink action is the same as Spark's [rewriteDataFiles](../maintenance.md#compact-data-files).
Iceberg provides an API to rewrite small files into large files by submitting Flink batch jobs. The behavior of this Flink action is the same as Spark's [rewriteDataFiles](maintenance.md#compact-data-files).

```java
import org.apache.iceberg.flink.actions.Actions;
6 changes: 3 additions & 3 deletions docs/docs/flink-connector.md
@@ -29,13 +29,13 @@ To create the table in Flink SQL by using SQL syntax `CREATE TABLE test (..) WIT
* `connector`: Use the constant `iceberg`.
* `catalog-name`: User-specified catalog name. It's required because the connector doesn't have a default value.
* `catalog-type`: `hive` or `hadoop` for built-in catalogs (defaults to `hive`), or left unset for custom catalog implementations using `catalog-impl`.
* `catalog-impl`: The fully-qualified class name of a custom catalog implementation. Must be set if `catalog-type` is unset. See also [custom catalog](../flink.md#adding-catalogs) for more details.
* `catalog-impl`: The fully-qualified class name of a custom catalog implementation. Must be set if `catalog-type` is unset. See also [custom catalog](flink.md#adding-catalogs) for more details.
* `catalog-database`: The Iceberg database name in the backend catalog; defaults to the current Flink database name.
* `catalog-table`: The Iceberg table name in the backend catalog; defaults to the table name in the Flink `CREATE TABLE` statement.

## Table managed in Hive catalog.

Before executing the following SQL, please make sure you've configured the Flink SQL client correctly according to the [quick start documentation](../flink.md).
Before executing the following SQL, please make sure you've configured the Flink SQL client correctly according to the [quick start documentation](flink.md).

The following SQL will create a Flink table in the current Flink catalog, which maps to the Iceberg table `default_database.flink_table` managed in the Iceberg catalog.

@@ -138,4 +138,4 @@ SELECT * FROM flink_table;
3 rows in set
```

For more details, please refer to the Iceberg [Flink documentation](../flink.md).
For more details, please refer to the Iceberg [Flink documentation](flink.md).
2 changes: 1 addition & 1 deletion docs/docs/flink-ddl.md
@@ -150,7 +150,7 @@ Table create commands support the commonly used [Flink create clauses](https://n

* `PARTITION BY (column1, column2, ...)` to configure partitioning; note that Flink does not yet support hidden partitioning.
* `COMMENT 'table document'` to set a table description.
* `WITH ('key'='value', ...)` to set [table configuration](../configuration.md) which will be stored in Iceberg table properties.
* `WITH ('key'='value', ...)` to set [table configuration](configuration.md) which will be stored in Iceberg table properties.

Currently, it does not support computed columns, watermark definitions, etc.

2 changes: 1 addition & 1 deletion docs/docs/flink-queries.md
@@ -75,7 +75,7 @@ SET table.exec.iceberg.use-flip27-source = true;

### Reading branches and tags with SQL
Branches and tags can be read via SQL by specifying options. For more details
refer to [Flink Configuration](../flink-configuration.md#read-options)
refer to [Flink Configuration](flink-configuration.md#read-options)

```sql
--- Read from branch b1
10 changes: 5 additions & 5 deletions docs/docs/flink-writes.md
@@ -67,7 +67,7 @@ Iceberg supports `UPSERT` based on the primary key when writing data into v2 tab
) with ('format-version'='2', 'write.upsert.enabled'='true');
```

2. Enabling `UPSERT` mode using `upsert-enabled` in the [write options](#write-options) provides more flexibility than a table level config. Note that you still need to use v2 table format and specify the [primary key](../flink-ddl.md/#primary-key) or [identifier fields](../../spec.md#identifier-field-ids) when creating the table.
2. Enabling `UPSERT` mode using `upsert-enabled` in the [write options](#write-options) provides more flexibility than a table level config. Note that you still need to use v2 table format and specify the [primary key](flink-ddl.md/#primary-key) or [identifier fields](../../spec.md#identifier-field-ids) when creating the table.

```sql
INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
@@ -185,7 +185,7 @@ FlinkSink.builderFor(

### Branch Writes
Writing to branches in Iceberg tables is also supported via the `toBranch` API in `FlinkSink`.
For more information on branches please refer to [branches](../branching.md).
For more information on branches please refer to [branches](branching.md).
```java
FlinkSink.forRowData(input)
.tableLoader(tableLoader)
@@ -262,13 +262,13 @@ INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
...
```

Check out all the options here: [write-options](../flink-configuration.md#write-options)
Check out all the options here: [write-options](flink-configuration.md#write-options)

## Notes

Flink streaming write jobs rely on snapshot summary to keep the last committed checkpoint ID, and
store uncommitted data as temporary files. Therefore, [expiring snapshots](../maintenance.md#expire-snapshots)
and [deleting orphan files](../maintenance.md#delete-orphan-files) could possibly corrupt
store uncommitted data as temporary files. Therefore, [expiring snapshots](maintenance.md#expire-snapshots)
and [deleting orphan files](maintenance.md#delete-orphan-files) could possibly corrupt
the state of the Flink job. To avoid that, make sure to keep the last snapshot created by the Flink
job (which can be identified by the `flink.job-id` property in the summary), and only delete
orphan files that are old enough.
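
As an illustration of the `flink.job-id` property mentioned above, snapshot summaries can be inspected with the Iceberg Java API. A minimal sketch, assuming a Hadoop catalog and placeholder warehouse and table names:

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

// Load the table from a (placeholder) Hadoop catalog.
HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "hdfs://nn:8020/warehouse/path");
Table table = catalog.loadTable(TableIdentifier.of("db", "flink_table"));

// List snapshots committed by Flink jobs, so the most recent one can be kept
// when expiring snapshots or deleting orphan files.
for (Snapshot snapshot : table.snapshots()) {
    Map<String, String> summary = snapshot.summary();
    String flinkJobId = summary.get("flink.job-id");
    if (flinkJobId != null) {
        System.out.printf("snapshot %d committed by Flink job %s%n",
            snapshot.snapshotId(), flinkJobId);
    }
}
```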
35 changes: 18 additions & 17 deletions docs/docs/flink.md
@@ -22,22 +22,22 @@ title: "Flink Getting Started"

Apache Iceberg supports both [Apache Flink](https://flink.apache.org/)'s DataStream API and Table API. See the [Multi-Engine Support](../../multi-engine-support.md#apache-flink) page for the integration of Apache Flink.

| Feature support | Flink | Notes |
| ----------------------------------------------------------- |-------|----------------------------------------------------------------------------------------|
| [SQL create catalog](../flink-ddl.md#create-catalog) | ✔️ | |
| [SQL create database](../flink-ddl.md#create-database) | ✔️ | |
| [SQL create table](../flink-ddl.md#create-table) | ✔️ | |
| [SQL create table like](../flink-ddl.md#create-table-like) | ✔️ | |
| [SQL alter table](../flink-ddl.md#alter-table) | ✔️ | Only support altering table properties, column and partition changes are not supported |
| [SQL drop_table](../flink-ddl.md#drop-table) | ✔️ | |
| [SQL select](../flink-queries.md#reading-with-sql) | ✔️ | Support both streaming and batch mode |
| [SQL insert into](../flink-writes.md#insert-into) | ✔️ ️ | Support both streaming and batch mode |
| [SQL insert overwrite](../flink-writes.md#insert-overwrite) | ✔️ ️ | |
| [DataStream read](../flink-queries.md#reading-with-datastream) | ✔️ ️ | |
| [DataStream append](../flink-writes.md#appending-data) | ✔️ ️ | |
| [DataStream overwrite](../flink-writes.md#overwrite-data) | ✔️ ️ | |
| [Metadata tables](../flink-queries.md#inspecting-tables) | ✔️ | |
| [Rewrite files action](../flink-actions.md#rewrite-files-action) | ✔️ ️ | |
| Feature support | Flink | Notes |
| -------------------------------------------------------- |-------|----------------------------------------------------------------------------------------|
| [SQL create catalog](flink-ddl.md#create-catalog) | ✔️ | |
| [SQL create database](flink-ddl.md#create-database) | ✔️ | |
| [SQL create table](flink-ddl.md#create-table) | ✔️ | |
| [SQL create table like](flink-ddl.md#create-table-like) | ✔️ | |
| [SQL alter table](flink-ddl.md#alter-table) | ✔️ | Only support altering table properties, column and partition changes are not supported |
| [SQL drop_table](flink-ddl.md#drop-table) | ✔️ | |
| [SQL select](flink-queries.md#reading-with-sql) | ✔️ | Support both streaming and batch mode |
| [SQL insert into](flink-writes.md#insert-into) | ✔️ ️ | Support both streaming and batch mode |
| [SQL insert overwrite](flink-writes.md#insert-overwrite) | ✔️ ️ | |
| [DataStream read](flink-queries.md#reading-with-datastream) | ✔️ ️ | |
| [DataStream append](flink-writes.md#appending-data) | ✔️ ️ | |
| [DataStream overwrite](flink-writes.md#overwrite-data) | ✔️ ️ | |
| [Metadata tables](flink-queries.md#inspecting-tables) | ✔️ | |
| [Rewrite files action](flink-actions.md#rewrite-files-action) | ✔️ ️ | |

## Preparation when using Flink SQL Client

@@ -69,6 +69,7 @@ export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
./bin/start-cluster.sh
```

<!-- markdown-link-check-disable-next-line -->
Start the Flink SQL client. There is a separate `flink-runtime` module in the Iceberg project to generate a bundled jar, which could be loaded by Flink SQL client directly. To build the `flink-runtime` bundled jar manually, build the `iceberg` project, and it will generate the jar under `<iceberg-root-dir>/flink-runtime/build/libs`. Or download the `flink-runtime` jar from the [Apache repository](https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.16/{{ icebergVersion }}/).

```bash
@@ -271,7 +272,7 @@ env.execute("Test Iceberg DataStream");

### Branch Writes
Writing to branches in Iceberg tables is also supported via the `toBranch` API in `FlinkSink`.
For more information on branches please refer to [branches](../branching.md).
For more information on branches please refer to [branches](branching.md).
```java
FlinkSink.forRowData(input)
.tableLoader(tableLoader)