Create changelog for datafusion and ballista release #801

Merged: 21 commits, Aug 10, 2021
5 changes: 3 additions & 2 deletions .github/workflows/dev.yml
@@ -64,7 +64,8 @@ jobs:
# if you encounter error, try rerun the command below with --write instead of --check
# and commit the changes
npx [email protected] --check \
-{ballista,datafusion,datafusion-examples,docs,python}/**/*.md \
+'{ballista,datafusion,datafusion-examples,docs,python}/**/*.md' \
+'!{ballista,datafusion,python}/CHANGELOG.md' \
README.md \
DEVELOPERS.md \
-ballista/**/*.{ts,tsx}
+'ballista/**/*.{ts,tsx}'
6 changes: 3 additions & 3 deletions .github_changelog_generator
@@ -21,10 +21,10 @@
# point to the old changelog in apache/arrow
front-matter=For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)\n
# some issues are just documentation
-add-sections={"documentation":{"prefix":"**Documentation updates:**","labels":["documentation"]}}
+add-sections={"documentation":{"prefix":"**Documentation updates:**","labels":["documentation"]},"performance":{"prefix":"**Performance improvements:**","labels":["performance"]}}
# uncomment to not show PRs. TBD if we shown them or not.
#pull-requests=false
# so that the component is shown associated with the issue
-issue-line-labels=ballista,datafusion,python
+issue-line-labels=sql
exclude-labels=development-process,invalid
-breaking_labels=api-change
+breaking-labels=api change
9,513 changes: 27 additions & 9,486 deletions CHANGELOG.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion ballista-examples/Cargo.toml
@@ -18,7 +18,7 @@
[package]
name = "ballista-examples"
description = "Ballista usage examples"
-version = "0.5.0-SNAPSHOT"
+version = "0.5.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
authors = ["Apache Arrow <[email protected]>"]
168 changes: 168 additions & 0 deletions ballista/CHANGELOG.md

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions ballista/README.md
@@ -19,8 +19,8 @@

# Ballista: Distributed Compute with Apache Arrow and DataFusion

-Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
-DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
+Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
+DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
Java) to be supported as first-class citizens without paying a penalty for serialization costs.

The foundational technologies in Ballista are:
@@ -37,23 +37,23 @@ redundancy in the case of a scheduler failing.

# Getting Started

-Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
+Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
more information.

## Distributed Scheduler Overview

-Ballista uses the DataFusion query execution framework to create a physical plan and then transforms it into a
+Ballista uses the DataFusion query execution framework to create a physical plan and then transforms it into a
distributed physical plan by breaking the query down into stages whenever the partitioning scheme changes.

-Specifically, any `RepartitionExec` operator is replaced with an `UnresolvedShuffleExec` and the child operator
+Specifically, any `RepartitionExec` operator is replaced with an `UnresolvedShuffleExec` and the child operator
of the repartition operator is wrapped in a `ShuffleWriterExec` operator and scheduled for execution.

-Each executor polls the scheduler for the next task to run. Tasks are currently always `ShuffleWriterExec` operators
-and each task represents one *input* partition that will be executed. The resulting batches are repartitioned
-according to the shuffle partitioning scheme and each *output* partition is streamed to disk in Arrow IPC format.
+Each executor polls the scheduler for the next task to run. Tasks are currently always `ShuffleWriterExec` operators
+and each task represents one _input_ partition that will be executed. The resulting batches are repartitioned
+according to the shuffle partitioning scheme and each _output_ partition is streamed to disk in Arrow IPC format.

-The scheduler will replace `UnresolvedShuffleExec` operators with `ShuffleReaderExec` operators once all shuffle
-tasks have completed. The `ShuffleReaderExec` operator connects to other executors as required using the Flight
+The scheduler will replace `UnresolvedShuffleExec` operators with `ShuffleReaderExec` operators once all shuffle
+tasks have completed. The `ShuffleReaderExec` operator connects to other executors as required using the Flight
interface, and streams the shuffle IPC files.

# How does this compare to Apache Spark?
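
The "Distributed Scheduler Overview" text in the `ballista/README.md` hunk above describes how a physical plan is cut into stages at each repartition. The following is a minimal sketch of that idea using a toy `Plan` enum. The enum, `split_stages`, and `plan_query` are hypothetical names made up for illustration and are not the DataFusion or Ballista types. Wrapping the final stage in a shuffle writer is an assumption drawn from the README's statement that tasks are currently always `ShuffleWriterExec` operators.

```rust
// Toy stand-ins for physical plan operators.
// Assumption: these are illustrative only, NOT the real DataFusion/Ballista types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    Filter(Box<Plan>),
    Aggregate(Box<Plan>),
    // Marks a change in partitioning scheme, i.e. a stage boundary.
    Repartition(Box<Plan>),
    // Placeholder left where a repartition used to be; names the stage feeding it.
    UnresolvedShuffle { stage_id: usize },
    // Wraps the plan of one stage; this is the unit that gets scheduled.
    ShuffleWriter { stage_id: usize, input: Box<Plan> },
}

/// Rewrites the plan bottom-up. Each `Repartition` is replaced with an
/// `UnresolvedShuffle`, and its (already rewritten) child is wrapped in a
/// `ShuffleWriter` and appended to `stages`.
fn split_stages(plan: Plan, stages: &mut Vec<Plan>) -> Plan {
    match plan {
        Plan::Filter(child) => Plan::Filter(Box::new(split_stages(*child, stages))),
        Plan::Aggregate(child) => Plan::Aggregate(Box::new(split_stages(*child, stages))),
        Plan::ShuffleWriter { stage_id, input } => Plan::ShuffleWriter {
            stage_id,
            input: Box::new(split_stages(*input, stages)),
        },
        Plan::Repartition(child) => {
            // Rewrite the child first so that upstream stages get lower ids.
            let child = split_stages(*child, stages);
            let stage_id = stages.len() + 1;
            stages.push(Plan::ShuffleWriter {
                stage_id,
                input: Box::new(child),
            });
            Plan::UnresolvedShuffle { stage_id }
        }
        // Leaves (Scan, UnresolvedShuffle) have no children to rewrite.
        leaf => leaf,
    }
}

/// Returns all stages in dependency order; the last entry is the final stage.
/// Assumption: the final stage is also wrapped in a `ShuffleWriter`.
fn plan_query(plan: Plan) -> Vec<Plan> {
    let mut stages = Vec::new();
    let root = split_stages(plan, &mut stages);
    let stage_id = stages.len() + 1;
    stages.push(Plan::ShuffleWriter {
        stage_id,
        input: Box::new(root),
    });
    stages
}
```

In this sketch, passing an aggregate over a repartitioned, filtered scan to `plan_query` yields two stages: stage 1 writes the shuffle output of the filter, and stage 2 runs the aggregate over an `UnresolvedShuffle` that refers to stage 1. The README's later step, where the scheduler swaps the placeholder for a shuffle reader once stage 1 has completed, is not modelled here.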
4 changes: 2 additions & 2 deletions ballista/rust/client/Cargo.toml
@@ -19,7 +19,7 @@
name = "ballista"
description = "Ballista Distributed Compute"
license = "Apache-2.0"
-version = "0.5.0-SNAPSHOT"
+version = "0.5.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
authors = ["Apache Arrow <[email protected]>"]
@@ -37,4 +37,4 @@ datafusion = { path = "../../../datafusion" }

[features]
default = []
-standalone = ["ballista-executor", "ballista-scheduler"]
+standalone = ["ballista-executor", "ballista-scheduler"]
2 changes: 1 addition & 1 deletion ballista/rust/core/Cargo.toml
@@ -19,7 +19,7 @@
name = "ballista-core"
description = "Ballista Distributed Compute"
license = "Apache-2.0"
-version = "0.5.0-SNAPSHOT"
+version = "0.5.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
authors = ["Apache Arrow <[email protected]>"]
2 changes: 1 addition & 1 deletion ballista/rust/executor/Cargo.toml
@@ -19,7 +19,7 @@
name = "ballista-executor"
description = "Ballista Distributed Compute - Executor"
license = "Apache-2.0"
-version = "0.5.0-SNAPSHOT"
+version = "0.5.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
authors = ["Apache Arrow <[email protected]>"]
2 changes: 1 addition & 1 deletion ballista/rust/scheduler/Cargo.toml
@@ -19,7 +19,7 @@
name = "ballista-scheduler"
description = "Ballista Distributed Compute - Scheduler"
license = "Apache-2.0"
-version = "0.5.0-SNAPSHOT"
+version = "0.5.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
authors = ["Apache Arrow <[email protected]>"]