diff --git a/docs/glossary.md b/docs/glossary.md
index be06940c1f8..c6f90a117b2 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -4,37 +4,182 @@ date: 2023-11-28
 license: "This software is licensed under the Apache License version 2."
 ---
 
+## API
+
+- Application Programming Interface, defining the methods and protocols for interacting with a server.
+
+## AWS
+
+- Amazon Web Services, a cloud computing platform provided by Amazon.
+
+## AWS Glue
+
+- A compatible implementation of the Hive Metastore Service (HMS).
+
+## GPG/GnuPG
+
+- Gnu Privacy Guard or GnuPG, an open-source implementation of the OpenPGP standard,
+  used for encrypting and signing files and emails.
+
+## HDFS
+
+- **HDFS** (Hadoop Distributed File System) is an open-source, distributed file system and
+  a key component of the Apache Hadoop ecosystem.
+  It is designed to store and process large-scale datasets, providing high reliability,
+  fault tolerance, and performance for distributed storage solutions.
+
+## HTTP port
+
+- The port number on which a server listens for incoming connections.
+
+## IP address
+
+- Internet Protocol address, a numerical label assigned to each device participating in
+  a computer network.
+
+## JDBC
+
+- Java Database Connectivity, an API for connecting Java applications to relational databases.
+
+## JDBC URI
+
+- The JDBC connection address specified in the catalog configuration,
+  including details such as the database type, host, port, and database name.
+
+## JDK
+
+- The software development kit for the Java programming language, including tools for
+  compiling, debugging, and running Java applications.
+
+## JMX
+
+- Java Management Extensions provides tools for managing and monitoring Java applications.
+
+## JSON
+
+- JavaScript Object Notation, a lightweight data interchange format.
+
+## JSON Web Token
+
+- See [JWT](#jwt)
+
+## JVM
+
+- A virtual machine that enables a computer to run Java applications,
+  providing an abstraction layer between the application and the underlying hardware.
+
+## JVM instrumentation
+
+- The process of adding monitoring and management capabilities to the Java Virtual Machine,
+  allowing for the collection of performance metrics.
+
+## JVM metrics
+
+- Metrics related to the performance and behavior of the Java Virtual Machine (JVM),
+  including memory usage, garbage collection, and buffer pool metrics.
+
+## JWT(JSON Web Token)
+
+- A compact, URL-safe means of representing claims between two parties.
+
+## KEYS file
+
+- A file containing public keys used to sign previous releases, necessary for verifying signatures.
+
+## PGP Signature
+
+- A digital signature generated using the Pretty Good Privacy (PGP) algorithm,
+  confirming the authenticity of a file.
+
+## REST
+
+- A set of architectural principles for designing networked applications.
+
+## REST API
+
+- Representational State Transfer Application Programming Interface.
+  A set of rules and conventions for building and interacting with Web services
+  using standard HTTP methods.
+
+## SHA256 checksum
+
+- A cryptographic hash function used to verify the integrity of files.
+
+## SHA256 checksum file
+
+- A file containing the SHA256 hash value of another file, used for verification purposes.
+
+## SQL
+
+- A programming language used to manage and manipulate relational databases.
+
+## SSH
+
+- Secure Shell, a cryptographic network protocol used for secure communication over a computer network.
+
+## URI
+
+- Uniform Resource Identifier, a string that identifies the name or resource on the internet.
+
+## YAML
+
+- YAML Ain't Markup Language, a human-readable data serialization format often used for
+  structured configuration files.
+
+## Amazon Elastic Block Store (EBS)
+
+- A scalable block storage service provided by Amazon Web Services.
+
+## Apache Gravitino
+
+- An open-source software platform originally created by Datastrato for high-performance,
+  geo-distributed, and federated metadata lakes.
+  Designed to manage metadata directly in different sources, types, and regions,
+  providing unified metadata access for data and AI assets.
+
+## Apache Gravitino configuration file (gravitino.conf)
+
+- The configuration file for the Gravitino server, located in the `conf` directory.
+  It follows the standard property file format and contains settings for the Gravitino server.
+
 ## Apache Hadoop
 
 - An open-source distributed storage and processing framework.
 
 ## Apache Hive
 
-- An open-source data warehousing and SQL-like query language software project for managing and querying large datasets.
+- An open-source data warehousing and SQL-like query language software project
+  for managing and querying large datasets.
 
 ## Apache Iceberg
 
 - An open-source, versioned table format for large-scale data processing.
 
-## Apache License version 2
+## Apache Iceberg Hive catalog
 
-- A permissive, open-source software license written by The Apache Software Foundation.
+- The **Iceberg Hive catalog** is a specialized metadata service designed for
+  the Apache Iceberg table format, allowing external systems to interact with
+  Iceberg metadata via a Hive metastore thrift client.
 
-## API
+## Apache Iceberg JDBC catalog
 
-- Application Programming Interface, defining the methods and protocols for interacting with a server.
+- The **Iceberg JDBC Catalog** is a specialized metadata service designed for
+  the Apache Iceberg table format, allowing external systems to interact with
+  Iceberg metadata using JDBC (Java Database Connectivity).
 
-## Authentication mechanism
+## Apache Iceberg REST catalog
 
-- The method used to verify the identity of users and clients accessing a server.
+- The **Iceberg REST Catalog** is a specialized metadata service designed for
+  the Apache Iceberg table format, allowing external systems to interact with
+  Iceberg metadata via a RESTful API.
 
-## AWS
+## Apache License version 2
 
-- Amazon Web Services, a cloud computing platform provided by Amazon.
+- A permissive, open-source software license written by The Apache Software Foundation.
 
-## AWS Glue
+## Authentication mechanism
 
-- A compatible implementation of the Hive Metastore Service (HMS).
+- The method used to verify the identity of users and clients accessing a server.
 
 ## Binary distribution package
 
@@ -58,7 +203,8 @@ license: "This software is licensed under the Apache License version 2."
 
 ## Contributor covenant
 
-- A widely-used and recognized code of conduct for open-source communities. It provides guidelines for creating a welcoming and inclusive environment for all contributors.
+- A widely-used and recognized code of conduct for open-source communities.
+  It provides guidelines for creating a welcoming and inclusive environment for all contributors.
 
 ## Dependencies
 
@@ -74,7 +220,8 @@ license: "This software is licensed under the Apache License version 2."
 
 ## Docker container
 
-- A lightweight, standalone, executable package that includes everything needed to run a piece of software, including the code, runtime, libraries, and system tools.
+- A lightweight, standalone, executable package that includes everything needed to run
+  a piece of software, including the code, runtime, libraries, and system tools.
 
 ## Docker Hub
 
@@ -82,20 +229,18 @@ license: "This software is licensed under the Apache License version 2."
 
 ## Docker image
 
-- A lightweight, standalone, and executable package that includes everything needed to run a piece of software, including the code, runtime, libraries, and system tools.
+- A lightweight, standalone, and executable package that includes everything needed to
+  run a piece of software, including the code, runtime, libraries, and system tools.
 
-## Docker file
+## Dockerfile
 
-- A configuration file used to create a Docker image, specifying the base image, dependencies, and commands for building the image.
+- A configuration file used to create a Docker image, specifying the base image, dependencies,
+  and commands for building the image.
 
 ## Dropwizard Metrics
 
 - A Java library for measuring the performance of applications and providing support for various metric types.
 
-## Amazon Elastic Block Store (EBS)
-
-- A scalable block storage service provided by Amazon Web Services.
-
 ## Environment variables
 
 - Variables used to pass information to running processes.
@@ -104,6 +249,10 @@ license: "This software is licensed under the Apache License version 2."
 
 - The distribution of data or services across multiple geographic locations.
 
+## Git
+
+- A version control system used for tracking changes and collaborating on source code.
+
 ## GitHub
 
 - A web-based platform for version control and collaboration using Git.
@@ -128,14 +277,6 @@ license: "This software is licensed under the Apache License version 2."
 
 - A series of automated steps defined in a YAML file that runs in response to events on a GitHub repository.
 
-## Git
-
-- A version control system used for tracking changes and collaborating on source code.
-
-## GPG/GnuPG
-
-- Gnu Privacy Guard or GnuPG, an open-source implementation of the OpenPGP standard, used for encrypting and signing files and emails.
-
 ## Gradle
 
 - A build automation tool for building, testing, and deploying projects.
@@ -144,42 +285,14 @@ license: "This software is licensed under the Apache License version 2."
 
 - A Gradle wrapper script, used for executing Gradle commands without installing Gradle separately.
 
-## Apache Gravitino
-
-- An open-source software platform originally created by Datastrato for high-performance, geo-distributed, and federated metadata lakes. Designed to manage metadata directly in different sources, types, and regions, providing unified metadata access for data and AI assets.
-
-## Apache Gravitino configuration file (gravitino.conf)
-
-- The configuration file for the Gravitino server, located in the `conf` directory. It follows the standard property file format and contains settings for the Gravitino server.
-
 ## Hashes
 
 - Cryptographic hash values generated from the contents of a file, often used for integrity verification.
 
-## HDFS
-
-- **HDFS** (Hadoop Distributed File System) is an open-source, distributed file system and a key component of the Apache Hadoop ecosystem. It is designed to store and process large-scale datasets, providing high reliability, fault tolerance, and performance for distributed storage solutions.
-
 ## Headless
 
 - A system without a graphical user interface.
 
-## HTTP port
-
-- The port number on which a server listens for incoming connections.
-
-## Apache Iceberg Hive catalog
-
-- The **Iceberg Hive catalog** is a specialized metadata service designed for the Apache Iceberg table format, allowing external systems to interact with Iceberg metadata via a Hive metastore thrift client.
-
-## Apache Iceberg REST catalog
-
-- The **Iceberg REST Catalog** is a specialized metadata service designed for the Apache Iceberg table format, allowing external systems to interact with Iceberg metadata via a RESTful API.
-
-## Apache Iceberg JDBC catalog
-
-- The **Iceberg JDBC Catalog** is a specialized metadata service designed for the Apache Iceberg table format, allowing external systems to interact with Iceberg metadata using JDBC (Java Database Connectivity).
-
 ## Identity fields
 
 - Fields in tables that define the identity of the table, specifying how rows in the table are uniquely identified.
@@ -188,61 +301,36 @@ license: "This software is licensed under the Apache License version 2."
 
 - Tests designed to ensure the correctness and compatibility of software when integrated into a unified system.
 
-## IP address
-
-- Internet Protocol address, a numerical label assigned to each device participating in a computer network.
-
 ## Java Database Connectivity (JDBC)
 
-- Java Database Connectivity, an API for connecting Java applications to relational databases.
+- See [JDBC](#jdbc)
 
 ## Java Development Kits (JDKs)
 
-- Software development kits for the Java programming language, including tools for compiling, debugging, and running Java applications.
-
-## Java Toolchain
-
-- A feature introduced in Gradle to detect and manage JDK versions.
-
-## JDBC URI
-
-- The JDBC connection address specified in the catalog configuration, including details such as the database type, host, port, and database name.
-
-## JMX
-
-- Java Management Extensions provides tools for managing and monitoring Java applications.
-
-## JSON
+- See [JDK](#jdk)
 
-- JavaScript Object Notation, a lightweight data interchange format.
+## Java Management Extensions
 
-## JWT(JSON Web Token)
-
-- A compact, URL-safe means of representing claims between two parties.
+- See [JMX](#jmx)
 
-## Java Virtual Machine (JVM)
-
-- A virtual machine that enables a computer to run Java applications, providing an abstraction layer between the application and the underlying hardware.
-
-## JVM metrics
+## Java Toolchain
 
-- Metrics related to the performance and behavior of the Java Virtual Machine (JVM), including memory usage, garbage collection, and buffer pool metrics.
+- A feature introduced in Gradle to detect and manage JDK versions.
 
-## JVM instrumentation
+## Java Virtual Machine
 
-- The process of adding monitoring and management capabilities to the Java Virtual Machine, allowing for the collection of performance metrics.
+- See [JVM](#jvm)
 
 ## Key pair
 
 - A pair of cryptographic keys, including a public key used for verification and a private key used for signing.
 
-## KEYS file
-
-- A file containing public keys used to sign previous releases, necessary for verifying signatures.
-
 ## Lakehouse
 
-- **Lakehouse** refers to a modern data management architecture that combines elements of data lakes and data warehouses. It aims to provide a unified platform for storing, managing, and analyzing both raw unstructured data (similar to data lakes) and curated structured data.
+- **Lakehouse** refers to a modern data management architecture that combines elements of
+  data lakes and data warehouses. It aims to provide a unified platform for storing,
+  managing, and analyzing both raw unstructured data (similar to data lakes) and
+  curated structured data.
 
 ## Manifest
 
@@ -254,7 +342,8 @@ license: "This software is licensed under the Apache License version 2."
 
 ## Metalake
 
-- The top-level container for metadata. Typically, a metalake is a tenant-like mapping to an organization or a company. All the catalogs, users, and roles are under one metalake.
+- The top-level container for metadata. Typically, a metalake is a tenant-like mapping to
+  an organization or a company. All the catalogs, users, and roles are under one metalake.
 
 ## Metastore
 
@@ -264,17 +353,14 @@ license: "This software is licensed under the Apache License version 2."
 
 - A distinct and separable part of a project.
 
-## OrbStack
-
-- A tool mentioned as an alternative to Docker for macOS when running Gravitino integration tests.
-
 ## Open authorization / OAuth
 
-- A standard protocol for authorization that allows third-party applications to access user data without exposing user credentials.
+- A standard protocol for authorization that allows third-party applications to access user data
+  without exposing user credentials.
 
-## PGP Signature
+## OrbStack
 
-- A digital signature generated using the Pretty Good Privacy (PGP) algorithm, confirming the authenticity of a file.
+- A tool mentioned as an alternative to Docker for macOS when running Gravitino integration tests.
 
 ## Private key
 
@@ -286,19 +372,18 @@ license: "This software is licensed under the Apache License version 2."
 
 ## Protocol buffers (protobuf)
 
-- A method developed by Google for serializing structured data, similar to XML or JSON. It is often used for efficient and extensible communication between systems.
+- A method developed by Google for serializing structured data, similar to XML or JSON.
+  It is often used for efficient and extensible communication between systems.
 
 ## Public key
 
 - An openly shared key used for verification, encryption, or other operations intended for public knowledge.
 
-## Representational State Transfer (REST)
+## Representational State Transfer
 
-- A set of architectural principles for designing networked applications.
-
-## REST API (Representational State Transfer Application Programming Interface)
+- See [REST](#rest)
 
-- A set of rules and conventions for building and interacting with web services using standard HTTP methods.
+- A set of architectural principles for designing networked applications.
 
 ## RocksDB
 
@@ -308,9 +393,9 @@ license: "This software is licensed under the Apache License version 2."
 
 - A logical container for organizing tables in a database.
 
-## Secure Shell (SSH)
+## Secure Shell
 
-- Secure Shell, a cryptographic network protocol used for secure communication over a computer network.
+- See [SSH](#ssh)
 
 ## Security group
 
@@ -318,15 +403,8 @@ license: "This software is licensed under the Apache License version 2."
 
 ## Serde
 
-- A Serialization/Deserialization library responsible for transforming data between a tabular format and a format suitable for storage or transmission.
-
-## SHA256 checksum
-
-- A cryptographic hash function used to verify the integrity of files.
-
-## SHA256 checksum file
-
-- A file containing the SHA256 hash value of another file, used for verification purposes.
+- A serialization/deserialization library responsible for transforming data between
+  a tabular format and a format suitable for storage or transmission.
 
 ## Snapshot
 
@@ -340,21 +418,23 @@ license: "This software is licensed under the Apache License version 2."
 
 - A tool or process used to enforce code formatting standards and apply automatic formatting to code.
 
-## Structured Query Language (SQL)
+## Structured Query Language
 
-- A programming language used to manage and manipulate relational databases.
+- See [SQL](#sql)
 
 ## Table
 
 - A structured set of data elements stored in columns and rows.
 
-## Token
+## Thrift
 
-- A **token** in the context of computing and security commonly refers to a small, indivisible unit of data. Tokens play a crucial role in various domains, including authentication, authorization, and cryptographic systems.
+- A network protocol used for communication with Hive Metastore Service (HMS).
 
-## Thrift protocol
+## Token
 
-- The network protocol used for communication with Hive Metastore Service (HMS).
+- A **token** in the context of computing and security commonly refers to a small,
+  indivisible unit of data. Tokens play a crucial role in various domains, including
+  authentication, authorization, and cryptographic systems.
 
 ## Trino
 
@@ -364,10 +444,6 @@ license: "This software is licensed under the Apache License version 2."
 
 - A connector module for integrating Gravitino with Trino.
 
-## Trino Apache Gravitino connector documentation
-
-- Documentation providing information on using the Trino connector to access metadata in Gravitino.
-
 ## Ubuntu
 
 - A Linux distribution based on Debian, widely used for cloud computing and servers.
@@ -376,18 +452,12 @@ license: "This software is licensed under the Apache License version 2."
 
 - A type of testing where individual components or functions of a program are tested to ensure they work as expected in isolation.
 
-## URI
-
-- Uniform Resource Identifier, a string that identifies the name or resource on the internet.
-
 ## Verification
 
 - The process of confirming the authenticity and integrity of a release by checking its signature and associated hashes.
 
-## WEB UI
+## Web UI
 
 - A graphical interface accessible through a web browser.
 
-## YAML
 
-- YAML Ain't Markup Language, a human-readable data serialization format often used for configuration files.