What happened for NDJSON support on CLI? #4198

Closed
thomas-k-cameron opened this issue Nov 14, 2022 · 5 comments
Labels
bug Something isn't working
Comments

@thomas-k-cameron
Contributor

thomas-k-cameron commented Nov 14, 2022

Describe the bug
When I tried to run this script, it showed an error.
I believe it worked before.

CREATE EXTERNAL TABLE IF NOT EXISTS asdf STORED AS NDJSON LOCATION 'some.ndjson';
SELECT * FROM asdf LIMIT 5;

Execution("Unable to find factory for NDJSON")

Contents of the file:

{"asdf": 1}
{"asdf": 2}
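For context, NDJSON (newline-delimited JSON) stores one JSON value per line, as in the two-line file above. Below is a minimal, std-only Rust sketch of that framing. `ndjson_records` is a hypothetical helper, not a DataFusion API, and it only splits the input into lines; it does not parse or validate the JSON itself.

```rust
/// Splits NDJSON text into one raw record per non-blank line.
/// Framing only: records are NOT parsed or validated as JSON here.
fn ndjson_records(input: &str) -> Vec<&str> {
    input
        .lines() // also strips a trailing "\r" from "\r\n" line endings
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}
```

Calling `ndjson_records` on the file contents above yields two records, `{"asdf": 1}` and `{"asdf": 2}`.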

To Reproduce
Use that file and run that script.

Expected behavior
It should create a table.

Additional context

datafusion-cli 14.0.0

When I tried it with STORED AS JSON, it still didn't work.

CREATE EXTERNAL TABLE IF NOT EXISTS asdf STORED AS JSON LOCATION 'some.ndjson';
SELECT * FROM asdf LIMIT 5;
@thomas-k-cameron thomas-k-cameron added the bug Something isn't working label Nov 14, 2022
@andygrove
Member

There have been changes in how table providers are registered, so this is likely a regression. It would be good to add some regression tests when we fix this.

I am guessing that the fix is simply to add NDJSON to this code in datafusion-cli:

fn create_runtime_env() -> Result<RuntimeEnv> {
    let mut table_factories: HashMap<String, Arc<dyn TableProviderFactory>> =
        HashMap::new();
    table_factories.insert(
        "csv".to_string(),
        Arc::new(ListingTableFactory::new(FileType::CSV)),
    );
    table_factories.insert(
        "parquet".to_string(),
        Arc::new(ListingTableFactory::new(FileType::PARQUET)),
    );
    table_factories.insert(
        "avro".to_string(),
        Arc::new(ListingTableFactory::new(FileType::AVRO)),
    );
    table_factories.insert(
        "json".to_string(),
        Arc::new(ListingTableFactory::new(FileType::JSON)),
    );
    // ... (remainder of create_runtime_env not shown)
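A simplified model of the failure: the CLI keeps a map from format name to factory, so a `STORED AS` name with no key in that map cannot be resolved. The sketch below is std-only Rust. `FileKind`, `registry`, and `find_factory` are hypothetical stand-ins for `ListingTableFactory` / `TableProviderFactory`, and lowercasing the name before the lookup is an assumption of this sketch, not necessarily what datafusion-cli does. Only the error text is taken from the report.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table-provider factory.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileKind {
    Csv,
    Parquet,
    Avro,
    Json,
}

/// Registry with the same four keys as the `create_runtime_env` excerpt.
fn registry() -> HashMap<String, FileKind> {
    let mut m = HashMap::new();
    m.insert("csv".to_string(), FileKind::Csv);
    m.insert("parquet".to_string(), FileKind::Parquet);
    m.insert("avro".to_string(), FileKind::Avro);
    m.insert("json".to_string(), FileKind::Json);
    m
}

/// Resolves a `STORED AS` name to a factory.
/// Assumption for this sketch: names are lowercased before the lookup.
fn find_factory(registry: &HashMap<String, FileKind>, stored_as: &str) -> Result<FileKind, String> {
    registry
        .get(&stored_as.to_lowercase())
        .copied()
        .ok_or_else(|| format!("Unable to find factory for {stored_as}"))
}
```

With only the four keys from the excerpt, `find_factory(&r, "NDJSON")` returns `Err("Unable to find factory for NDJSON")`; after `r.insert("ndjson".to_string(), FileKind::Json)` it resolves to `FileKind::Json`, which mirrors the one-line fix suggested above.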

@thomas-k-cameron
Contributor Author

thomas-k-cameron commented Nov 16, 2022

What do you think about adding some test cases for datafusion-cli to check that its features work as intended?

Is there a directory intended for CLI test cases? (I couldn't find one.)
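One possible shape for such a regression test, sketched in std-only Rust. `REQUIRED_FORMATS` and `missing_formats` are hypothetical; the format list mirrors the names discussed in this thread rather than anything read from datafusion-cli, and a real test would obtain the registered names from the CLI's runtime setup.

```rust
use std::collections::HashSet;

/// Format names a user may write after `STORED AS` (hypothetical list).
const REQUIRED_FORMATS: [&str; 5] = ["csv", "parquet", "avro", "json", "ndjson"];

/// Returns every required format that has no registered factory, in `required` order.
fn missing_formats(registered: &[&str], required: &[&str]) -> Vec<String> {
    let registered: HashSet<&str> = registered.iter().copied().collect();
    required
        .iter()
        .filter(|name| !registered.contains(*name))
        .map(|name| name.to_string())
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn every_stored_as_format_has_a_factory() {
        // Placeholder: in a real test these names would come from the CLI itself.
        let registered = ["csv", "parquet", "avro", "json", "ndjson"];
        let missing = missing_formats(&registered, &REQUIRED_FORMATS);
        assert!(missing.is_empty(), "missing factories: {missing:?}");
    }
}
```

Run against the four keys registered in the `create_runtime_env` excerpt (csv, parquet, avro, json), `missing_formats` reports `["ndjson"]`, which is exactly the regression in this issue.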

@timvw
Contributor

timvw commented Nov 16, 2022

It seems to have gone missing in #1010.

@timvw
Contributor

timvw commented Nov 17, 2022

It seems this is now available as the regular "json" format:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/datasource/file_format/json.rs#L46

The following works on my machine:

DataFusion CLI v14.0.0
❯ create external table x stored as json location 'datafusion/core/tests/jsons/1.json';
0 rows in set. Query took 0.021 seconds.
❯ select * from x;
+-----+----------------+---------------+------+
| a   | b              | c             | d    |
+-----+----------------+---------------+------+
| 1   | [2, 1.3, -6.1] | [false, true] | 4    |
| -10 | [2, 1.3, -6.1] | [true, true]  | 4    |
| 2   | [2, , -6.1]    | [false, ]     | text |
|     |                |               |      |
+-----+----------------+---------------+------+
4 rows in set. Query took 0.030 seconds.

I have updated PR-4427 so that both JSON and NDJSON are registered by default.
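Registering both names can amount to inserting one shared factory under two keys. A minimal std-only Rust sketch: `JsonFactory` and `register_aliases` are hypothetical (in datafusion-cli the map value is an `Arc<dyn TableProviderFactory>`), and this is not the code from PR-4427.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical factory type standing in for a JSON `ListingTableFactory`.
#[derive(Debug)]
struct JsonFactory;

/// Inserts the same shared factory under every alias, so that e.g.
/// "json" and "ndjson" resolve to one instance instead of two copies.
fn register_aliases<T>(registry: &mut HashMap<String, Arc<T>>, aliases: &[&str], factory: Arc<T>) {
    for alias in aliases {
        registry.insert(alias.to_string(), Arc::clone(&factory));
    }
}
```

Sharing one `Arc` keeps both names pointing at a single factory, so a later change to how JSON files are read cannot make the two aliases drift apart.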

@thomas-k-cameron
Contributor Author

I'm closing this issue because the problem has been fixed with PR-4427.


3 participants