Refactor Statistics, introduce precision estimates (Exact, Inexact, Absent) #7793

Merged on Oct 17, 2023 (69 commits)
Changes from 48 commits
Commits
95f419b analysis context refactored (berkaysynnada, Sep 14, 2023)
62be1e2 is_exact fix (berkaysynnada, Sep 14, 2023)
76da729 Minor changes (mustafasrepo, Sep 15, 2023)
1bb940d minor changes (mustafasrepo, Sep 15, 2023)
c380f4a Minor changes (mustafasrepo, Sep 15, 2023)
b801e46 Minor changes (mustafasrepo, Sep 15, 2023)
96b97c4 datatype check added, statistics default removed (berkaysynnada, Sep 15, 2023)
dd5e4c9 MemExec uses the stats of projections, agg optimize excludes unbounde… (berkaysynnada, Sep 18, 2023)
628eff3 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Sep 18, 2023)
c891ea9 fix after merge (berkaysynnada, Sep 18, 2023)
2450ef2 proto fix (berkaysynnada, Sep 19, 2023)
a2d1f2e Simplifications (mustafasrepo, Sep 19, 2023)
fa72d41 statistics() returns result (berkaysynnada, Sep 20, 2023)
7036184 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Sep 21, 2023)
f2e99f9 fix after merge (berkaysynnada, Sep 21, 2023)
571654f Simplifications (mustafasrepo, Sep 22, 2023)
6fbaa7b Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Sep 22, 2023)
db10487 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Sep 25, 2023)
6ef4582 Remove option from column stats (mustafasrepo, Sep 25, 2023)
6fa17a5 exact info added (berkaysynnada, Sep 26, 2023)
b56a81e error in agg optimization (berkaysynnada, Sep 28, 2023)
cb11054 bugs are fixed (berkaysynnada, Sep 28, 2023)
946c341 negative expr support (berkaysynnada, Sep 28, 2023)
5829ca4 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Sep 28, 2023)
47dc4e7 fix after merge (berkaysynnada, Sep 29, 2023)
69ac12e Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Sep 29, 2023)
e8fa615 fix after merge (berkaysynnada, Sep 29, 2023)
91c9a6b Minor changes, simplifications (mustafasrepo, Sep 29, 2023)
cfe6fca minor changes (berkaysynnada, Sep 29, 2023)
8e95696 min max accs removed (berkaysynnada, Oct 2, 2023)
4f52b97 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 2, 2023)
0924618 fix after merge (berkaysynnada, Oct 2, 2023)
99fc41c minor changes (berkaysynnada, Oct 3, 2023)
681cab3 fix initialization of stats in limit (berkaysynnada, Oct 5, 2023)
d489993 minor changes (berkaysynnada, Oct 5, 2023)
2d643ca Simplifications (mustafasrepo, Oct 5, 2023)
cd3de94 more accurate row calculations (berkaysynnada, Oct 5, 2023)
872bc40 Improve comments (ozankabak, Oct 6, 2023)
f785f9a min-max values are init as absent, not inf (berkaysynnada, Oct 6, 2023)
de53daa Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 6, 2023)
7fe9946 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 9, 2023)
429b5af fix after merge (berkaysynnada, Oct 9, 2023)
5e0df33 Review Part 1 (ozankabak, Oct 10, 2023)
a104fab Cardinality calculation is fixed (berkaysynnada, Oct 10, 2023)
9bb034e Review Part 2 (ozankabak, Oct 10, 2023)
41b364f get_int_range replaced by cardinality function (berkaysynnada, Oct 11, 2023)
3328280 Fix imports (ozankabak, Oct 11, 2023)
ca44991 Merge remote-tracking branch 'upstream/main' into refactor/analysis-c… (ozankabak, Oct 11, 2023)
ba31d44 Statistics display is shortened. (berkaysynnada, Oct 11, 2023)
aaaa71d Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 13, 2023)
8479595 fix after merge (berkaysynnada, Oct 13, 2023)
10b9100 Harmonize imports (ozankabak, Oct 13, 2023)
19dbd46 Update datafusion/physical-expr/src/intervals/interval_aritmetic.rs (berkaysynnada, Oct 14, 2023)
e62e545 Addresses the reviews (berkaysynnada, Oct 14, 2023)
036333c Merge branch 'refactor/analysis-context' of https://github.com/synnad… (berkaysynnada, Oct 14, 2023)
3f62d85 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 15, 2023)
ab8c0c7 Update tests (berkaysynnada, Oct 15, 2023)
3471978 Remove panics (ozankabak, Oct 16, 2023)
23c564b 1 bug-fix, 2 code simplifications (berkaysynnada, Oct 16, 2023)
645555a Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 16, 2023)
e104a21 conflict resolved (berkaysynnada, Oct 16, 2023)
0865eb8 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 16, 2023)
81baf4c conflict resolved (berkaysynnada, Oct 16, 2023)
a5ae0e2 Update datafusion/physical-plan/src/filter.rs (berkaysynnada, Oct 17, 2023)
ed4f759 Simplify set_max/min helpers (ozankabak, Oct 17, 2023)
5401051 fix vector copy, remove clones (berkaysynnada, Oct 17, 2023)
a02ea34 Merge branch 'apache_main' into refactor/analysis-context (berkaysynnada, Oct 17, 2023)
426462b resolving conflict (berkaysynnada, Oct 17, 2023)
16d7882 remove clone (berkaysynnada, Oct 17, 2023)
4 changes: 3 additions & 1 deletion datafusion-examples/examples/csv_opener.rs
@@ -17,6 +17,7 @@

use std::{sync::Arc, vec};

+use datafusion::common::Statistics;
use datafusion::{
assert_batches_eq,
datasource::{
@@ -29,6 +30,7 @@ use datafusion::{
physical_plan::metrics::ExecutionPlanMetricsSet,
test_util::aggr_test_schema,
};
+
use futures::StreamExt;
use object_store::local::LocalFileSystem;

@@ -60,7 +62,7 @@ async fn main() -> Result<()> {
object_store_url: ObjectStoreUrl::local_filesystem(),
file_schema: schema.clone(),
file_groups: vec![vec![PartitionedFile::new(path.display().to_string(), 10)]],
-statistics: Default::default(),
+statistics: Statistics::new_unknown(&schema),
projection: Some(vec![12, 0]),
limit: Some(5),
table_partition_cols: vec![],
21 changes: 11 additions & 10 deletions datafusion-examples/examples/custom_datasource.rs
@@ -15,13 +15,17 @@
// specific language governing permissions and limitations
// under the License.

-use async_trait::async_trait;
+use std::any::Any;
+use std::collections::{BTreeMap, HashMap};
+use std::fmt::{self, Debug, Formatter};
+use std::sync::{Arc, Mutex};
+use std::time::Duration;
+
use datafusion::arrow::array::{UInt64Builder, UInt8Builder};
use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::dataframe::DataFrame;
-use datafusion::datasource::provider_as_source;
-use datafusion::datasource::{TableProvider, TableType};
+use datafusion::datasource::{provider_as_source, TableProvider, TableType};
use datafusion::error::Result;
use datafusion::execution::context::{SessionState, TaskContext};
use datafusion::physical_plan::expressions::PhysicalSortExpr;
@@ -32,11 +36,8 @@ use datafusion::physical_plan::{
};
use datafusion::prelude::*;
use datafusion_expr::{Expr, LogicalPlanBuilder};
-use std::any::Any;
-use std::collections::{BTreeMap, HashMap};
-use std::fmt::{self, Debug, Formatter};
-use std::sync::{Arc, Mutex};
-use std::time::Duration;
+
+use async_trait::async_trait;
use tokio::time::timeout;

/// This example demonstrates executing a simple query against a custom datasource
@@ -270,7 +271,7 @@ impl ExecutionPlan for CustomExec {
)?))
}

-fn statistics(&self) -> Statistics {
-Statistics::default()
+fn statistics(&self) -> Result<Statistics> {
+Ok(Statistics::new_unknown(&self.schema()))
}
}
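The change above makes `statistics()` fallible and replaces `Statistics::default()` with a schema-aware constructor. The following is a minimal, self-contained sketch of that pattern, not DataFusion's actual code: `Schema`, `Plan`, the `String` error type, and the field layout of `Statistics` and `ColumnStatistics` are simplified stand-ins assumed for illustration. It also assumes that `new_unknown` yields one column entry per schema field with every value absent.

```rust
// Hypothetical stand-ins; the real DataFusion types are richer.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Sharpness<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Simplified schema: only the column names.
struct Schema {
    fields: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ColumnStatistics {
    null_count: Sharpness<usize>,
}

#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Sharpness<usize>,
    column_statistics: Vec<ColumnStatistics>,
}

impl Statistics {
    /// Assumed behavior: nothing is known, but there is one entry per column.
    fn new_unknown(schema: &Schema) -> Self {
        Statistics {
            num_rows: Sharpness::Absent,
            column_statistics: schema
                .fields
                .iter()
                .map(|_| ColumnStatistics { null_count: Sharpness::Absent })
                .collect(),
        }
    }
}

/// Simplified plan trait whose `statistics` can fail.
trait Plan {
    fn schema(&self) -> Schema;
    fn statistics(&self) -> Result<Statistics, String>;
}

struct CustomExec {
    columns: Vec<String>,
}

impl Plan for CustomExec {
    fn schema(&self) -> Schema {
        Schema { fields: self.columns.clone() }
    }

    // A source with no statistics reports "unknown" instead of a default value.
    fn statistics(&self) -> Result<Statistics, String> {
        Ok(Statistics::new_unknown(&self.schema()))
    }
}
```

Under these assumptions, a `CustomExec` with two columns returns `Ok` with an absent row count and two column entries, so callers can index column statistics by field position without checking for a missing vector.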
4 changes: 3 additions & 1 deletion datafusion-examples/examples/json_opener.rs
@@ -29,6 +29,8 @@ use datafusion::{
error::Result,
physical_plan::metrics::ExecutionPlanMetricsSet,
};
+use datafusion_common::Statistics;
+
use futures::StreamExt;
use object_store::ObjectStore;

@@ -63,7 +65,7 @@ async fn main() -> Result<()> {
object_store_url: ObjectStoreUrl::local_filesystem(),
file_schema: schema.clone(),
file_groups: vec![vec![PartitionedFile::new(path.to_string(), 10)]],
-statistics: Default::default(),
+statistics: Statistics::new_unknown(&schema),
projection: Some(vec![1, 0]),
limit: Some(5),
table_partition_cols: vec![],
2 changes: 1 addition & 1 deletion datafusion/common/src/lib.rs
@@ -60,7 +60,7 @@ pub use functional_dependencies::{
pub use join_type::{JoinConstraint, JoinType};
pub use scalar::{ScalarType, ScalarValue};
pub use schema_reference::{OwnedSchemaReference, SchemaReference};
-pub use stats::{ColumnStatistics, Statistics};
+pub use stats::{ColumnStatistics, Sharpness, Statistics};
pub use table_reference::{OwnedTableReference, ResolvedTableReference, TableReference};
pub use unnest::UnnestOptions;
pub use utils::project_schema;
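The newly exported `Sharpness` type carries the precision estimates named in the PR title (Exact, Inexact, Absent). As a rough illustration of how such a tag could propagate through a computation, here is a self-contained sketch. The enum below is a simplified stand-in, and `get_value`, `to_inexact`, and `add_counts` are hypothetical helpers written for this example rather than the crate's confirmed API.

```rust
/// Simplified stand-in for a precision-tagged statistic.
#[derive(Debug, Clone, PartialEq)]
enum Sharpness<T> {
    /// The value is known to be correct.
    Exact(T),
    /// The value is an estimate.
    Inexact(T),
    /// Nothing is known.
    Absent,
}

impl<T> Sharpness<T> {
    /// Returns the wrapped value, if any, regardless of precision.
    fn get_value(&self) -> Option<&T> {
        match self {
            Sharpness::Exact(v) | Sharpness::Inexact(v) => Some(v),
            Sharpness::Absent => None,
        }
    }

    /// Demotes an exact value to an estimate; other variants are unchanged.
    fn to_inexact(self) -> Self {
        match self {
            Sharpness::Exact(v) => Sharpness::Inexact(v),
            other => other,
        }
    }
}

/// Adds two row counts. The sum is exact only when both inputs are exact,
/// an estimate when both have values, and absent otherwise.
fn add_counts(a: &Sharpness<usize>, b: &Sharpness<usize>) -> Sharpness<usize> {
    match (a, b) {
        (Sharpness::Exact(x), Sharpness::Exact(y)) => Sharpness::Exact(x + y),
        _ => match (a.get_value(), b.get_value()) {
            (Some(x), Some(y)) => Sharpness::Inexact(x + y),
            _ => Sharpness::Absent,
        },
    }
}
```

With these definitions, adding `Exact(10)` and `Exact(5)` gives `Exact(15)`, mixing in an `Inexact` operand degrades the result to `Inexact`, and any `Absent` operand makes the result `Absent`.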
3 changes: 2 additions & 1 deletion datafusion/common/src/scalar.rs
@@ -1004,7 +1004,8 @@ impl ScalarValue {
| ScalarValue::Int16(None)
| ScalarValue::Int32(None)
| ScalarValue::Int64(None)
-| ScalarValue::Float32(None) => Ok(self.clone()),
+| ScalarValue::Float32(None)
+| ScalarValue::Float64(None) => Ok(self.clone()),
ScalarValue::Float64(Some(v)) => Ok(ScalarValue::Float64(Some(-v))),
ScalarValue::Float32(Some(v)) => Ok(ScalarValue::Float32(Some(-v))),
ScalarValue::Int8(Some(v)) => Ok(ScalarValue::Int8(Some(-v))),
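The hunk above adds `Float64(None)` to the match arm that returns null scalars unchanged. A reduced sketch of the same idea follows; `Scalar` and `negate` are hypothetical names with far fewer variants than the real `ScalarValue`, and the checked integer negation is a choice made for this sketch, not something the diff shows.

```rust
/// Hypothetical, reduced scalar type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
    Float32(Option<f32>),
    Float64(Option<f64>),
    Utf8(Option<String>),
}

impl Scalar {
    /// Negates a numeric scalar. Null numerics pass through unchanged,
    /// including `Float64(None)`, which is the case the diff adds.
    fn negate(&self) -> Result<Scalar, String> {
        match self {
            Scalar::Int64(None) | Scalar::Float32(None) | Scalar::Float64(None) => {
                Ok(self.clone())
            }
            Scalar::Float64(Some(v)) => Ok(Scalar::Float64(Some(-v))),
            Scalar::Float32(Some(v)) => Ok(Scalar::Float32(Some(-v))),
            // Checked negation avoids a panic on i64::MIN (sketch-specific choice).
            Scalar::Int64(Some(v)) => v
                .checked_neg()
                .map(|n| Scalar::Int64(Some(n)))
                .ok_or_else(|| "integer overflow while negating".to_string()),
            other => Err(format!("cannot negate {other:?}")),
        }
    }
}
```

Without the `Float64(None)` arm, a null 64-bit float would fall through to the error case in this sketch, even though a null 32-bit float is accepted.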