Skip to content

Commit

Permalink
Merge remote-tracking branch 'upstream/master' into mathematics_op_de…
Browse files Browse the repository at this point in the history
…cimal
  • Loading branch information
liukun4515 committed Jan 17, 2022
2 parents bf63119 + f027e5f commit 3d15e8f
Show file tree
Hide file tree
Showing 223 changed files with 15,804 additions and 14,059 deletions.
11 changes: 11 additions & 0 deletions .github/dependabot.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
version: 2
updates:
- package-ecosystem: cargo
directory: "/"
schedule:
interval: weekly
day: sunday
time: "7:00"
open-pull-requests-limit: 10
target-branch: master
labels: [auto-dependencies]
131 changes: 0 additions & 131 deletions .github/workflows/python_build.yml

This file was deleted.

62 changes: 0 additions & 62 deletions .github/workflows/python_test.yaml

This file was deleted.

2 changes: 0 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,6 @@ members = [
"ballista-examples",
]

exclude = ["python"]

[profile.release]
lto = true
codegen-units = 1
19 changes: 12 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,14 +49,19 @@ the convenience of an SQL interface or a DataFrame API.

## Known Uses

Projects that adapt to or service as plugins to DataFusion:

- [datafusion-python](https://github.com/datafusion-contrib/datafusion-python)
- [datafusion-java](https://github.com/datafusion-contrib/datafusion-java)
- [datafusion-ruby](https://github.com/j-a-m-l/datafusion-ruby)
- [datafusion-objectstore-s3](https://github.com/datafusion-contrib/datafusion-objectstore-s3)
- [datafusion-hdfs-native](https://github.com/datafusion-contrib/datafusion-hdfs-native)

Here are some of the projects known to use DataFusion:

- [Ballista](ballista) Distributed Compute Platform
- [Cloudfuse Buzz](https://github.com/cloudfuse-io/buzz-rust)
- [Cube Store](https://github.com/cube-js/cube.js/tree/master/rust)
- [datafusion-python](https://pypi.org/project/datafusion)
- [datafusion-java](https://github.com/datafusion-contrib/datafusion-java)
- [datafusion-ruby](https://github.com/j-a-m-l/datafusion-ruby)
- [delta-rs](https://github.com/delta-io/delta-rs)
- [InfluxDB IOx](https://github.com/influxdata/influxdb_iox) Time Series Database
- [ROAPI](https://github.com/roapi/roapi)
Expand Down Expand Up @@ -256,7 +261,7 @@ DataFusion is designed to be extensible at all points. To that end, you can prov

## Rust Version Compatbility

This crate is tested with the latest stable version of Rust. We do not currrently test against other, older versions of the Rust compiler.
This crate is tested with the latest stable version of Rust. We do not currently test against other, older versions of the Rust compiler.

# Supported SQL

Expand All @@ -266,9 +271,9 @@ This library currently supports many SQL constructs, including
- `SELECT ... FROM ...` together with any expression
- `ALIAS` to name an expression
- `CAST` to change types, including e.g. `Timestamp(Nanosecond, None)`
- most mathematical unary and binary expressions such as `+`, `/`, `sqrt`, `tan`, `>=`.
- Many mathematical unary and binary expressions such as `+`, `/`, `sqrt`, `tan`, `>=`.
- `WHERE` to filter
- `GROUP BY` together with one of the following aggregations: `MIN`, `MAX`, `COUNT`, `SUM`, `AVG`
- `GROUP BY` together with one of the following aggregations: `MIN`, `MAX`, `COUNT`, `SUM`, `AVG`, `CORR`, `VAR`, `COVAR`, `STDDEV` (sample and population)
- `ORDER BY` together with an expression and optional `ASC` or `DESC` and also optional `NULLS FIRST` or `NULLS LAST`

## Supported Functions
Expand Down Expand Up @@ -368,7 +373,7 @@ Please see [Roadmap](docs/source/specification/roadmap.md) for information of wh
There is no formal document describing DataFusion's architecture yet, but the following presentations offer a good overview of its different components and how they interact together.

- (March 2021): The DataFusion architecture is described in _Query Engine Design and the Rust-Based DataFusion in Apache Arrow_: [recording](https://www.youtube.com/watch?v=K6eCAVEk4kU) (DataFusion content starts [~ 15 minutes in](https://www.youtube.com/watch?v=K6eCAVEk4kU&t=875s)) and [slides](https://www.slideshare.net/influxdata/influxdb-iox-tech-talks-query-engine-design-and-the-rustbased-datafusion-in-apache-arrow-244161934)
- (Feburary 2021): How DataFusion is used within the Ballista Project is described in \*Ballista: Distributed Compute with Rust and Apache Arrow: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)
- (February 2021): How DataFusion is used within the Ballista Project is described in \*Ballista: Distributed Compute with Rust and Apache Arrow: [recording](https://www.youtube.com/watch?v=ZZHQaOap9pQ)

# Developer's guide

Expand Down
6 changes: 3 additions & 3 deletions ballista-examples/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,13 @@ license = "Apache-2.0"
keywords = [ "arrow", "distributed", "query", "sql" ]
edition = "2021"
publish = false
rust-version = "1.57"
rust-version = "1.58"

[dependencies]
datafusion = { path = "../datafusion" }
ballista = { path = "../ballista/rust/client", version = "0.6.0"}
prost = "0.8"
tonic = "0.5"
prost = "0.9"
tonic = "0.6"
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync"] }
futures = "0.3"
num_cpus = "1.13.0"
2 changes: 1 addition & 1 deletion ballista/rust/client/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
authors = ["Apache Arrow <[email protected]>"]
edition = "2021"
rust-version = "1.57"
rust-version = "1.58"

[dependencies]
ballista-core = { path = "../core", version = "0.6.0" }
Expand Down
8 changes: 4 additions & 4 deletions ballista/rust/core/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -35,19 +35,19 @@ async-trait = "0.1.36"
futures = "0.3"
hashbrown = "0.11"
log = "0.4"
prost = "0.8"
prost = "0.9"
serde = {version = "1", features = ["derive"]}
sqlparser = "0.13"
tokio = "1.0"
tonic = "0.5"
tonic = "0.6"
uuid = { version = "0.8", features = ["v4"] }
chrono = { version = "0.4", default-features = false }

arrow-flight = { version = "6.4.0" }
arrow-flight = { version = "7.0.0" }
datafusion = { path = "../../../datafusion", version = "6.0.0" }

[dev-dependencies]
tempfile = "3"

[build-dependencies]
tonic-build = { version = "0.5" }
tonic-build = { version = "0.6" }
14 changes: 14 additions & 0 deletions ballista/rust/core/proto/ballista.proto
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,13 @@ enum AggregateFunction {
COUNT = 4;
APPROX_DISTINCT = 5;
ARRAY_AGG = 6;
VARIANCE=7;
VARIANCE_POP=8;
COVARIANCE=9;
COVARIANCE_POP=10;
STDDEV=11;
STDDEV_POP=12;
CORRELATION=13;
}

message AggregateExprNode {
Expand Down Expand Up @@ -1037,11 +1044,18 @@ message Struct{
repeated Field sub_field_types = 1;
}

enum UnionMode{
sparse = 0;
dense = 1;
}

message Union{
repeated Field union_types = 1;
UnionMode union_mode = 2;
}



message ScalarListValue{
ScalarType datatype = 1;
repeated ScalarValue values = 2;
Expand Down
14 changes: 9 additions & 5 deletions ballista/rust/core/src/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@

//! Client API for sending requests to executors.
use std::sync::Arc;
use std::sync::{Arc, Mutex};
use std::{collections::HashMap, pin::Pin};
use std::{
convert::{TryFrom, TryInto},
Expand Down Expand Up @@ -135,24 +135,28 @@ impl BallistaClient {
}

struct FlightDataStream {
stream: Streaming<FlightData>,
stream: Mutex<Streaming<FlightData>>,
schema: SchemaRef,
}

impl FlightDataStream {
pub fn new(stream: Streaming<FlightData>, schema: SchemaRef) -> Self {
Self { stream, schema }
Self {
stream: Mutex::new(stream),
schema,
}
}
}

impl Stream for FlightDataStream {
type Item = ArrowResult<RecordBatch>;

fn poll_next(
mut self: std::pin::Pin<&mut Self>,
self: std::pin::Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
self.stream.poll_next_unpin(cx).map(|x| match x {
let mut stream = self.stream.lock().expect("mutex is bad");
stream.poll_next_unpin(cx).map(|x| match x {
Some(flight_data_chunk_result) => {
let converted_chunk = flight_data_chunk_result
.map_err(|e| ArrowError::from_external_error(Box::new(e)))
Expand Down
Loading

0 comments on commit 3d15e8f

Please sign in to comment.