Querying against delta lake table does not seem to work #5202

Closed
dadepo opened this issue Feb 6, 2023 · 2 comments
Labels
bug Something isn't working


dadepo commented Feb 6, 2023

Describe the bug

I have a delta table I am accessing using datafusion.

A SELECT * query works just fine, but any other query, such as selecting a single column or summing a column, produces no output. No error or warning is raised.

Basically the code is

async fn run_from_delta_table(ctx: &SessionContext) -> Result<(), DeltaTableError> {
    let table = open_table("../data/delta-table")
        .await
        .unwrap();

    ctx.register_table("demo", Arc::new(table)).unwrap();

    let df = ctx
        .sql("SELECT * FROM demo").await?;

    df.show().await?; // prints to the console

    let df = ctx
        .sql("SELECT ViewCount FROM demo").await?;

    df.show().await?; // does not print to the console

    let df = ctx
        .sql("SELECT SUM(ViewCount) FROM demo").await?;

    df.show().await?; // does not print to the console

    Ok(())
}

It is worth mentioning that querying using the dataframe API works as expected:

async fn run_df(ctx: &SessionContext) -> Result<(), DeltaTableError> {
    let table = open_table("../data/delta-table")
        .await
        .unwrap();

    let df = ctx.read_table(Arc::new(table))?;
    df.show().await?; // prints to the console

    let view_col = df.select(vec![col("ViewCount")])?;
    view_col.show().await?; // also prints to the console

    let view_sum = df
        .aggregate(vec![], vec![sum(col("ViewCount"))])?;
    view_sum.show().await?; // also prints to the console

    Ok(())
}

My Cargo.toml looks like this:

[dependencies]
datafusion = "15.0.0"
deltalake = {version="0.6.0", features = ["datafusion-ext"]}
tokio = {version="1.25.0", features = ["macros", "rt", "parking_lot"]}

To Reproduce

  • Create a project with the dependencies listed above in Cargo.toml
  • Have a Delta Lake table at a known path
  • Run the two functions above, run_df and run_from_delta_table, after updating the path to point at the Delta Lake table

Expected behavior

The SQL API should run the queries above and print their results, just as the dataframe API does.

Additional context
Note: I am using datafusion 15.0.0 because it is the version compatible with deltalake.

@dadepo dadepo added the bug Something isn't working label Feb 6, 2023

dadepo commented Feb 6, 2023

It was a case of wrong capitalisation, which the documentation alludes to. After switching to lowercase or escaping (double-quoting) the column name, everything works fine.
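To make the fix concrete, here is a minimal, standard-library-only sketch of the behaviour described above: an unquoted identifier such as ViewCount is folded to lowercase before lookup, while a double-quoted one keeps its spelling. The helpers `normalize_ident`, `quote_ident`, and `select_column` are hypothetical names invented for this illustration. They are not part of the datafusion or deltalake APIs, and the sketch only approximates the normalization rule.

```rust
/// Approximate how an unquoted SQL identifier is normalized: it is folded
/// to lowercase, while a double-quoted identifier keeps its exact spelling.
/// Hypothetical helper for illustration; not a DataFusion function.
fn normalize_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        // Strip the surrounding quotes and unescape doubled quotes.
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else {
        raw.to_lowercase()
    }
}

/// Wrap a column name in double quotes so its capitalisation survives,
/// doubling any embedded double quote. Hypothetical helper.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Build a single-column projection with a case-preserving column name.
/// Hypothetical helper; the table name is inserted as given.
fn select_column(column: &str, table: &str) -> String {
    format!("SELECT {} FROM {}", quote_ident(column), table)
}
```

Under these assumptions, `select_column("ViewCount", "demo")` yields `SELECT "ViewCount" FROM demo`, which is the escaped form of the query that could be passed to `ctx.sql(...)` in run_from_delta_table. The unquoted form normalizes to `viewcount`, which does not match the table's ViewCount column.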

@dadepo dadepo closed this as completed Feb 6, 2023

alamb commented Feb 7, 2023

> It was a case of wrong capitalisation, which the documentation alludes to. After switching to lowercase or escaping (double-quoting) the column name, everything works fine.

It is unfortunate that there was no error that would have pointed you at the problem.
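As a rough illustration of the kind of hint that would have helped here, the sketch below checks a normalized column name against a list of schema field names and suggests the quoted spelling when the two differ only by case. The function `case_mismatch_hint` and its message text are hypothetical. This is not how DataFusion reports errors, only one possible shape for such a diagnostic.

```rust
/// Hypothetical diagnostic, not DataFusion behaviour: if `requested` (the
/// column name after normalization) is absent from `fields` but matches one
/// of them when case is ignored, suggest the quoted spelling.
fn case_mismatch_hint(requested: &str, fields: &[&str]) -> Option<String> {
    // An exact match means there is nothing to report.
    if fields.iter().any(|f| *f == requested) {
        return None;
    }
    // Otherwise look for a field that differs only in ASCII case.
    fields
        .iter()
        .find(|f| f.eq_ignore_ascii_case(requested))
        .map(|f| {
            format!(
                "column '{}' not found; did you mean \"{}\" (quoted)?",
                requested, f
            )
        })
}
```

For the table in this issue, calling it with `"viewcount"` and fields `["Id", "ViewCount"]` (the `Id` field is an assumed example) would return a hint naming `"ViewCount"`, while a name that matches no field at all returns `None`.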
