expand file glob within prettier #803

Merged (1 commit) on Aug 1, 2021
4 changes: 2 additions & 2 deletions .github/workflows/dev.yml
@@ -64,7 +64,7 @@ jobs:
  # if you encounter error, try rerun the command below with --write instead of --check
  # and commit the changes
  npx prettier@… --check \
- {ballista,datafusion,datafusion-examples,docs,python}/**/*.md \
+ '{ballista,datafusion,datafusion-examples,docs,python}/**/*.md' \
  README.md \
  DEVELOPERS.md \
- ballista/**/*.{ts,tsx}
+ 'ballista/**/*.{ts,tsx}'
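The quoting matters because it changes who expands the pattern. Assuming the `run:` step uses GitHub Actions' default `bash` shell, where `shopt -s globstar` is off, an unquoted `**` behaves exactly like `*`, so `ballista/**/*.md` only matched Markdown files exactly one directory below `ballista/` (a pattern that matches nothing is passed through literally). Quoted, the pattern reaches prettier unexpanded, and prettier's own glob support lets `**` span zero or more directories. That would explain why files such as `ballista/README.md` and `docs/user-guide/src/distributed/kubernetes.md` are reformatted in this PR. The Rust sketch below contrasts the two readings; `glob_match`, `match_path`, and `match_segment` are hypothetical helpers that support only `*` and `**` (no braces, `?`, character classes, or bash's hidden-file rules).

```rust
/// Match one path segment (e.g. `README.md`) against one pattern segment
/// (e.g. `*.md`). `*` matches any run of characters; because paths are split
/// on `/` first, it can never cross a directory boundary.
fn match_segment(pat: &[u8], name: &[u8]) -> bool {
    match (pat.first().copied(), name.first().copied()) {
        (None, None) => true,
        // `*` matches nothing here, or consumes one character and tries again.
        (Some(b'*'), _) => {
            match_segment(&pat[1..], name)
                || (!name.is_empty() && match_segment(pat, &name[1..]))
        }
        (Some(p), Some(n)) if p == n => match_segment(&pat[1..], &name[1..]),
        _ => false,
    }
}

/// Match a pattern against a path, both already split on `/`.
/// `globstar == true`: a `**` segment spans zero or more directories.
/// `globstar == false`: `**` is just another `*` that matches exactly one
/// segment, as in bash when `shopt -s globstar` is off.
fn match_path(pat: &[&str], path: &[&str], globstar: bool) -> bool {
    match (pat.split_first(), path.split_first()) {
        (None, None) => true,
        (Some((p, rest)), _) if globstar && *p == "**" => {
            // Let `**` match zero segments, or swallow one segment and retry.
            match_path(rest, path, globstar)
                || (!path.is_empty() && match_path(pat, &path[1..], globstar))
        }
        (Some((p, prest)), Some((s, srest))) => {
            match_segment(p.as_bytes(), s.as_bytes()) && match_path(prest, srest, globstar)
        }
        _ => false,
    }
}

/// Split both strings on `/` and match them segment by segment.
fn glob_match(pattern: &str, path: &str, globstar: bool) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_path(&pat, &segs, globstar)
}
```

For example, `glob_match("ballista/**/*.md", "ballista/README.md", true)` is `true`, while the same call with `globstar` set to `false` is `false`: the bash-style reading requires exactly one directory between `ballista/` and the file name.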
20 changes: 10 additions & 10 deletions ballista/README.md
@@ -19,8 +19,8 @@

  # Ballista: Distributed Compute with Apache Arrow and DataFusion

- Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
- DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
+ Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
+ DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
  Java) to be supported as first-class citizens without paying a penalty for serialization costs.

  The foundational technologies in Ballista are:
@@ -37,23 +37,23 @@ redundancy in the case of a scheduler failing.

  # Getting Started

- Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
+ Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
  more information.

  ## Distributed Scheduler Overview

- Ballista uses the DataFusion query execution framework to create a physical plan and then transforms it into a
+ Ballista uses the DataFusion query execution framework to create a physical plan and then transforms it into a
  distributed physical plan by breaking the query down into stages whenever the partitioning scheme changes.

- Specifically, any `RepartitionExec` operator is replaced with an `UnresolvedShuffleExec` and the child operator
+ Specifically, any `RepartitionExec` operator is replaced with an `UnresolvedShuffleExec` and the child operator
  of the repartition operator is wrapped in a `ShuffleWriterExec` operator and scheduled for execution.

- Each executor polls the scheduler for the next task to run. Tasks are currently always `ShuffleWriterExec` operators
- and each task represents one *input* partition that will be executed. The resulting batches are repartitioned
- according to the shuffle partitioning scheme and each *output* partition is streamed to disk in Arrow IPC format.
+ Each executor polls the scheduler for the next task to run. Tasks are currently always `ShuffleWriterExec` operators
+ and each task represents one _input_ partition that will be executed. The resulting batches are repartitioned
+ according to the shuffle partitioning scheme and each _output_ partition is streamed to disk in Arrow IPC format.

- The scheduler will replace `UnresolvedShuffleExec` operators with `ShuffleReaderExec` operators once all shuffle
- tasks have completed. The `ShuffleReaderExec` operator connects to other executors as required using the Flight
+ The scheduler will replace `UnresolvedShuffleExec` operators with `ShuffleReaderExec` operators once all shuffle
+ tasks have completed. The `ShuffleReaderExec` operator connects to other executors as required using the Flight
  interface, and streams the shuffle IPC files.

  # How does this compare to Apache Spark?
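The scheduler overview in this README amounts to a tree rewrite followed by per-partition shuffle tasks. The Rust sketch below models it under heavy simplifying assumptions: `Plan`, `split_stages`, `resolve`, and `shuffle_write` are hypothetical names, not Ballista or DataFusion APIs; plans are a plain enum rather than `ExecutionPlan` trait objects; and rows are bare `i64` keys rather than Arrow record batches. It shows a `RepartitionExec` being replaced by an `UnresolvedShuffleExec` placeholder while its child is wrapped in a `ShuffleWriterExec` stage, one shuffle-write task routing a single input partition into output partitions (a simple modulo stands in for the real partitioning scheme), and the placeholder being swapped for a `ShuffleReaderExec` once its stage has completed.

```rust
use std::collections::HashSet;

/// Toy plan nodes named after the README's operators. These are NOT the
/// real DataFusion/Ballista types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A leaf that produces rows.
    Scan(String),
    /// Stands in for any unary operator above a repartition.
    Filter(Box<Plan>),
    /// ~ RepartitionExec
    Repartition { input: Box<Plan>, partitions: usize },
    /// ~ ShuffleWriterExec: the root of a separately scheduled stage.
    ShuffleWriter { stage_id: usize, input: Box<Plan>, partitions: usize },
    /// ~ UnresolvedShuffleExec: placeholder for a stage's output.
    UnresolvedShuffle { stage_id: usize },
    /// ~ ShuffleReaderExec: reads the finished stage's shuffle files.
    ShuffleReader { stage_id: usize },
}

/// Replace every `Repartition` with an `UnresolvedShuffle` and wrap its
/// child in a `ShuffleWriter` stage. Children are rewritten first, so
/// `stages` ends up in dependency order (stage 0 runs before stage 1).
fn split_stages(plan: Plan, stages: &mut Vec<Plan>) -> Plan {
    match plan {
        Plan::Filter(input) => Plan::Filter(Box::new(split_stages(*input, stages))),
        Plan::Repartition { input, partitions } => {
            let child = split_stages(*input, stages);
            let stage_id = stages.len();
            stages.push(Plan::ShuffleWriter { stage_id, input: Box::new(child), partitions });
            Plan::UnresolvedShuffle { stage_id }
        }
        other => other,
    }
}

/// Swap each placeholder whose stage has fully completed for a reader.
/// Placeholders of unfinished stages are left untouched.
fn resolve(plan: Plan, completed: &HashSet<usize>) -> Plan {
    match plan {
        Plan::UnresolvedShuffle { stage_id } if completed.contains(&stage_id) => {
            Plan::ShuffleReader { stage_id }
        }
        Plan::Filter(input) => Plan::Filter(Box::new(resolve(*input, completed))),
        Plan::ShuffleWriter { stage_id, input, partitions } => Plan::ShuffleWriter {
            stage_id,
            input: Box::new(resolve(*input, completed)),
            partitions,
        },
        other => other,
    }
}

/// One shuffle-write task: route the rows of a single input partition into
/// `n` output partitions (assumes `n > 0`). `rem_euclid` keeps negative keys
/// in range and stands in for a real hash partitioning scheme.
fn shuffle_write(rows: &[i64], n: usize) -> Vec<Vec<i64>> {
    let mut out = vec![Vec::new(); n];
    for &key in rows {
        out[key.rem_euclid(n as i64) as usize].push(key);
    }
    out
}
```

Because child stages are pushed before their parents, a nested repartition yields stage 0 before stage 1, mirroring the README's rule that a reader is only planned once the shuffle tasks it depends on have finished.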
2 changes: 1 addition & 1 deletion docs/user-guide/src/distributed/docker-compose.md
@@ -24,7 +24,7 @@ demonstrates how to start a cluster using a single process that acts as both a s
  volume mounted into the container so that Ballista can access the host file system.

  ```yaml
- version: '2.2'
+ version: "2.2"
  services:
  etcd:
  image: quay.io/coreos/etcd:v3.4.9
30 changes: 15 additions & 15 deletions docs/user-guide/src/distributed/kubernetes.md
@@ -129,16 +129,16 @@ spec:
  ballista-cluster: ballista
  spec:
  containers:
- - name: ballista-scheduler
- image: <your-image>
- command: ["/scheduler"]
- args: ["--bind-port=50050"]
- ports:
- - containerPort: 50050
- name: flight
- volumeMounts:
- - mountPath: /mnt
- name: data
+ - name: ballista-scheduler
+ image: <your-image>
+ command: ["/scheduler"]
+ args: ["--bind-port=50050"]
+ ports:
+ - containerPort: 50050
+ name: flight
+ volumeMounts:
+ - mountPath: /mnt
+ name: data
  volumes:
  - name: data
  persistentVolumeClaim:
@@ -245,10 +245,10 @@ spec:
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
- - type: external
- metadata:
- # Change this DNS if the scheduler isn't deployed in the "default" namespace
- scalerAddress: ballista-scheduler.default.svc.cluster.local:50050
+ - type: external
+ metadata:
+ # Change this DNS if the scheduler isn't deployed in the "default" namespace
+ scalerAddress: ballista-scheduler.default.svc.cluster.local:50050
  ```

  And then deploy it into the cluster:
@@ -261,4 +261,4 @@ If the cluster is inactive, Keda will now scale the number of executors down to
  you launch a query. Please note that Keda will perform a scan once every 30 seconds, so it might take a bit to
  scale the executors.

- Please visit Keda's [documentation page](https://keda.sh/docs/2.3/concepts/scaling-deployments/) for more information.
+ Please visit Keda's [documentation page](https://keda.sh/docs/2.3/concepts/scaling-deployments/) for more information.