
[DataFrame] Parallel Load into dataframe #6983

Closed
alamb opened this issue Jul 16, 2023 · 14 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@alamb
Contributor

alamb commented Jul 16, 2023

Is your feature request related to a problem or challenge?

When loading data into a DataFusion DataFrame via SessionContext::read_parquet, only a single core is used even when there are many cores available.

This leads to slower performance, as reported by @mispp on #6908

Reproducer

Create data using

cd datafusion/benchmarks
./bench.sh data tpch10

Then load the data with the following program:

use std::{io::Error, time::Instant};
use datafusion::prelude::*;
use chrono;

const FILENAME: &str = "/Users/alamb/Software/arrow-datafusion/benchmarks/data/tpch_sf10/lineitem/part-0.parquet";

#[tokio::main]
async fn main() -> Result<(), Error> {
    env_logger::init();
    {
        let _ = _datafusion().await;
    }

    Ok(())
}

pub async fn _datafusion() {
    let _ctx = SessionContext::new();

    let _read_options = ParquetReadOptions {
        file_extension: ".parquet",
        table_partition_cols: vec![],
        parquet_pruning: None,
        skip_metadata: None,
    };
    let _df = _ctx.read_parquet(FILENAME, _read_options).await.unwrap();

    let start = Instant::now();
    println!("datafusion start -> {:?}", chrono::offset::Local::now());

    let _cached = _df.cache().await;
    let elapsed = Instant::now() - start;
    println!("datafusion end -> {:?} {elapsed:?}", chrono::offset::Local::now());
}

Cargo.toml

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[package]
name = "perf_test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
env_logger = "0.10.0"

parquet = "40.0.0"
serde = "1.0.163"
serde_json = "1.0.96"
datafusion = "27.0.0"
tokio = "1.0"
chrono = "0.4.26"

Describe the solution you'd like

I would like datafusion to read the parquet file in parallel, using target_partitions config parameter

https://docs.rs/datafusion/latest/datafusion/config/struct.ExecutionOptions.html#structfield.target_partitions
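As an illustration of the kind of parallelism target_partitions enables, here is a hypothetical sketch (not DataFusion's actual implementation) of splitting a single parquet file's byte range into roughly equal groups, which is the shape the repartitioned file_groups take in the plan dumps later in this thread:

```rust
// Hypothetical helper: split a file of `file_len` bytes into up to
// `target_partitions` contiguous, roughly equal byte ranges so that each
// partition can scan its own slice of the file in parallel.
fn split_into_groups(file_len: u64, target_partitions: u64) -> Vec<(u64, u64)> {
    // Ceiling division so all bytes are covered; the last group may be short.
    let chunk = (file_len + target_partitions - 1) / target_partitions;
    (0..target_partitions)
        .map(|i| (i * chunk, ((i + 1) * chunk).min(file_len)))
        .filter(|&(start, end)| start < end) // drop empty trailing groups
        .collect()
}

fn main() {
    // A ~40 MB parquet file split across 4 partitions.
    for (start, end) in split_into_groups(40_661_778, 4) {
        println!("part-0.parquet:{start}..{end}");
    }
}
```

The ranges this produces line up with the `file_groups={4 groups: [...:0..10165445], ...}` style output shown in the explain plans further down.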

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added the enhancement New feature or request label Jul 16, 2023
@alamb alamb assigned alamb and unassigned alamb Jul 16, 2023
@alamb
Contributor Author

alamb commented Jul 17, 2023

I made a POC on #6984 which demonstrates that using more cores to do the write does indeed help. However, the implementation of doing repartitioning is probably not right -- I think the better approach would be to set the target partitions when writing into the memory table

Perhaps this could be done by creating a LogicalPlan::DmlStatement for write and then letting the existing insert machinery work rather than doing a custom "collect".

https://docs.rs/datafusion/latest/datafusion/logical_expr/logical_plan/struct.DmlStatement.html

Marking this as a good first issue: I think the approach will work well and can follow existing patterns, there is a reproducer, and the feature was asked for by a customer

@alamb alamb added the good first issue Good for newcomers label Jul 17, 2023
@gobraves

@alamb Hello, I'm new to DataFusion, but can I give this issue a try?

@alamb
Contributor Author

alamb commented Jul 17, 2023

Thank you @gobraves -- that would be great. Once you have looked around let me know if you have any questions

Basically I would suggest first verifying that running the equivalent SQL with datafusion-cli

create table t;
INSERT INTO t SELECT * FROM 'data.parquet';

is properly parallelized

Then look at the plan that comes out of

INSERT INTO t SELECT * FROM 'data.parquet';

And try to update DataFrame::cache() to use the same approach

@gobraves

gobraves commented Aug 1, 2023

hi @alamb, I apologize for the delayed response. Based on your tips, I executed the following commands in the CLI and also ran the code you provided to reproduce the issue. I noticed that executing the commands in the CLI was almost 8 times faster than running the code mentioned above, which is consistent with my CPU core count.

Here are the commands I executed in the CLI:

create external table test stored as parquet location 'part-0.parquet';
create table t as select * from test;
explain create table t as select * from test;

In the logical_plan of the explain output, I observed CreateMemoryTable and TableScan. Consequently, I reviewed the code for CreateMemoryTable in the datafusion-cli and the .cache() function, hoping to identify the differences. I noticed that target_partitions is indeed passed in both cases, but I'm unsure why it is not utilized in .cache().

However, from the commit mentioned in #6984, it seems the problem is resolved by using repartitioning, so the difference appears to be that one implementation uses Partitioning while the other does not. When browsing through the code myself, I couldn't find any relevant settings. If this is the case, could you please provide some hints as to which part of the code this operation occurs in?

I have one more question: Do we need to create a new DmlStatement to address this issue or improve the existing one?

Perhaps this could be done by creating a LogicalPlan::DmlStatement for write and then letting the existing insert machinery work rather than doing a custom "collect".

I'm not entirely clear about this statement, and I believe it might be because I haven't fully grasped the problem described above.

@2010YOUY01
Contributor

@gobraves Thank you for trying! I also took a look at this issue (and found it pretty difficult to solve 😨); I hope the following info is helpful.
Here is an overview of the parallel parquet scan:

let _df = _ctx.read_parquet(FILENAME, _read_options).await.unwrap();

let _cached = _df.cache().await;

After _ctx.read_parquet(...) returns, a LogicalPlan with a TableScan is created and stored inside the DataFrame.
Inside _df.cache(), the LogicalPlan is first converted into a physical plan with a ParquetExec node, and then the physical optimizer tries to modify the ParquetExec node's file_groups to make it parallel.

My reproducer:

DataFusion CLI v28.0.0
❯ create external table test stored as parquet location '/Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet';
0 rows in set. Query took 0.064 seconds.
❯ create table t as select * from test;
0 rows in set. Query took 16.364 seconds.
❯ create table t as (select * from test where l_orderkey > 0);
0 rows in set. Query took 3.646 seconds.
❯ explain select * from test;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                         |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]                                                                                                                                      |
| physical_plan | ParquetExec: file_groups={1 group: [[Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment] |
|               |                                                                                                                                                                                                                                                                                                                                                                              |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.009 seconds.
❯ explain select * from test where l_orderkey > 0;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: test.l_orderkey > Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|               |   TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], partial_filters=[test.l_orderkey > Int64(0)]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|               |   FilterExec: l_orderkey@0 > 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|               |     ParquetExec: file_groups={12 groups: [[Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..13271296], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:13271296..26542592], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:26542592..39813888], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:39813888..53085184], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:53085184..66356480], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], predicate=l_orderkey@0 > 0, pruning_predicate=l_orderkey_max@0 > 0 |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The 2nd one is parallelized. explain verbose select ... can be used to see the specific physical optimizer rule that repartitions the ParquetExec:
https://github.com/apache/arrow-datafusion/blob/a9561a0f06c25f370dc39df08d057db85c4e0c7a/datafusion/core/src/physical_optimizer/repartition.rs#L166
I think there might be a bug inside this function: if parquet_exec.get_repartitioned() inside it gets called, then the ParquetExec should be properly parallelized

This reproducer should have the same root cause as the original one. For the original reproducer, adding a filter to _df also gets it parallelized:

    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap()
        .filter(col("l_orderkey").gt(lit(0)))
        .unwrap();
// Then can be parallelized
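To make the observed behavior concrete, here is a toy model (hypothetical types, not DataFusion's real plan API) of the decision being described: the scan's file_groups only get split when the operator consuming the scan reports that it benefits from partitioned input. A filter keeps the "benefits" default, while the root output-requirement node does not:

```rust
// Toy plan tree standing in for the real physical plan nodes.
#[derive(Debug, PartialEq)]
enum Plan {
    ParquetScan { file_groups: usize },
    Filter(Box<Plan>),            // keeps the trait default: benefits from partitioning
    OutputRequirement(Box<Plan>), // reports it does NOT benefit
}

// Walk the plan; a scan is split into `target` groups only when the operator
// directly above it says it benefits from partitioned input.
fn repartition(plan: Plan, target: usize, parent_benefits: bool) -> Plan {
    match plan {
        Plan::ParquetScan { file_groups } => {
            let groups = if parent_benefits { target } else { file_groups };
            Plan::ParquetScan { file_groups: groups }
        }
        Plan::Filter(child) => {
            Plan::Filter(Box::new(repartition(*child, target, true)))
        }
        Plan::OutputRequirement(child) => {
            Plan::OutputRequirement(Box::new(repartition(*child, target, false)))
        }
    }
}

fn main() {
    let bare = Plan::OutputRequirement(Box::new(Plan::ParquetScan { file_groups: 1 }));
    let filtered = Plan::OutputRequirement(Box::new(Plan::Filter(Box::new(
        Plan::ParquetScan { file_groups: 1 },
    ))));
    // Without the filter the scan stays at 1 group; with it, 12 groups.
    println!("{:?}", repartition(bare, 12, true));
    println!("{:?}", repartition(filtered, 12, true));
}
```

This mirrors why inserting a FilterExec between the root and the ParquetExec is enough to trigger the repartitioning in the explain output above.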

@2010YOUY01
Contributor

I made a POC on #6984 which demonstrates the issue is indeed using more cores to do the write. However, the implementation of doing repartitioning is probably not right -- I think the better approach would be to set the target partitions when writing into memory table

Both this POC and adding a predicate seem to work around the physical optimizer bug in the repartition rule by adding another execution node on top of the ParquetExec node 🤔

@gobraves

gobraves commented Aug 2, 2023

@2010YOUY01 Thank you!
Based on your findings, I also retested the datafusion-cli and arrow-datafusion crate after updating them to version 28.0.0. Here are the results:

DataFusion CLI v28.0.0
❯ create external table test stored as parquet location 'part-0.parquet';
0 rows in set. Query took 0.023 seconds.
❯ create table t as select * from test;
0 rows in set. Query took 14.621 seconds.

DataFusion CLI v28.0.0
❯ create external table test stored as parquet location 'part-0.parquet';
0 rows in set. Query took 0.014 seconds.
❯ create table t as (select * from test where l_linenumber > 0);
0 rows in set. Query took 4.280 seconds.

use chrono;
use datafusion::common::DataFusionError;
use datafusion::prelude::*;
use object_store::local::LocalFileSystem;
use std::{sync::Arc, time::Instant};
use url::Url;

const FILENAME: &str =
    "/home/neo/project_learning/arrow-datafusion/benchmarks/data/tpch_sf10/lineitem/part-0.parquet";

#[tokio::main]
async fn main() -> Result<(), DataFusionError> {
    let _ctx = SessionContext::new();
    let config = _ctx.copied_config();
    for item in config.options().entries().iter() {
        let key = &item.key;
        let value = &item.value;
        println!("{key} {value:?}")
    }
    let local = Arc::new(LocalFileSystem::new());
    let local_url = Url::parse("file://local").unwrap();
    _ctx.runtime_env().register_object_store(&local_url, local);

    let _read_options = ParquetReadOptions {
        file_extension: ".parquet",
        table_partition_cols: vec![],
        parquet_pruning: None,
        skip_metadata: None,
    };
    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap();

    let start = Instant::now();
    let _cached = _df.cache().await;
    let elapsed = Instant::now() - start;
    println!(
        "datafusion end -> {:?} {elapsed:?}",
        chrono::offset::Local::now()
    );
    Ok(())
}

without filter: 114.913562535s

If the code is modified with filter

 let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap()
        .filter(col("l_linenumber").gt(lit(0)))
        .unwrap();

with filter: 15.583268924s

|                | datafusion-cli | Rust program   |
|----------------|----------------|----------------|
| without filter | 14.621s        | 114.913562535s |
| with filter    | 4.280s         | 15.583268924s  |

I will need to continue examining the code to understand the specific reason behind this performance difference.

@alamb
Contributor Author

alamb commented Aug 3, 2023

I think there is something in the physical planning that assumes the result of the final plan should be in a single partition (or at least it won't expand it by adding additional partitioning), because when connecting to a client this is what makes the most sense.

I believe this is controlled by ExecutionPlan::benefits_from_input_partitioning: https://github.com/apache/arrow-datafusion/blob/6a2d4a3a254c0495a398608d178496a191450750/datafusion/core/src/physical_plan/mod.rs#L169-L183

ProjectionExec returns false if its expressions are only columns (which is what these queries are doing)

https://github.com/apache/arrow-datafusion/blob/6a2d4a3a254c0495a398608d178496a191450750/datafusion/core/src/physical_plan/projection.rs#L285

So the reason the filter case goes faster is that FilterExec returns true for benefits_from_input_partitioning but the ProjectionExec does not.

I wonder if we could somehow add a flag to LogicalProjection / ProjectionExec that says "always benefits from repartitioning" and set that flag for the writes 🤔

Alternately, I was thinking the ExecutionPlan that does the writing could say "I want the input partitioned" and the optimizer would do the right thing. But given that the DataFrame API doesn't use an ExecutionPlan for writing, it might not work.

Thank you both for pushing on this -- it is going to be awesome to get this working correctly

@alamb
Contributor Author

alamb commented Aug 3, 2023

BTW @devinjdangelo has been looking at using ExecutionPlan for dataframes here: #7141

@alamb
Contributor Author

alamb commented Dec 11, 2023

This might well be done now; I think all that remains is for someone to test / verify that the reproducer runs in parallel

@marvinlanhenke
Contributor

marvinlanhenke commented Dec 27, 2023

@alamb
I ran the same reproducer as stated here: #6983 (comment) and I can report the same results as before (CLI & DataFrame). The issue does not seem to be resolved, unfortunately.

Edit:

... I did some debugging on this issue:

When running the query without a filter, we get a plan with an OutputRequirementExec node. Looking at the implementation of fn benefits_from_input_partitioning, we can see it always returns vec![false]. This causes the EnforceDistribution optimizer to do nothing, and file_groups remains at 1.

When running the query with a filter, we get a plan with a FilterExec, which doesn't have an implementation of fn benefits_from_input_partitioning. So it relies on the default impl of the trait, which returns true, and the EnforceDistribution optimizer is allowed to do its job.

Possible Solution:
Simply remove the implementation on OutputRequirementsExec and also rely on the default impl from the trait?
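The proposed fix can be sketched with hypothetical stand-in types (not DataFusion's real ExecutionPlan trait): benefits_from_input_partitioning has a default implementation returning true, and deleting the override on the output-requirement node would make it fall back to that default, just as FilterExec does:

```rust
// Simplified stand-in for the ExecutionPlan trait discussed above.
trait ExecLike {
    // Trait-level default: most operators benefit from partitioned input.
    fn benefits_from_input_partitioning(&self) -> Vec<bool> {
        vec![true]
    }
}

struct FilterExec;
// No override: FilterExec falls back to the default and returns true,
// so EnforceDistribution is allowed to repartition its input.
impl ExecLike for FilterExec {}

struct OutputRequirementExec;
impl ExecLike for OutputRequirementExec {
    // The override that blocks repartitioning today; removing this method
    // (the suggestion above) would make the node report true instead.
    fn benefits_from_input_partitioning(&self) -> Vec<bool> {
        vec![false]
    }
}

fn main() {
    println!("{:?}", FilterExec.benefits_from_input_partitioning());
    println!("{:?}", OutputRequirementExec.benefits_from_input_partitioning());
}
```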


@alamb
Contributor Author

alamb commented Dec 28, 2023

Thank you for the follow up @marvinlanhenke

Possible Solution:
Simply remove the implementation on OutputRequirementsExec and also rely on the default impl from the trait?

I think this is likely a great thing to try. @devinjdangelo perhaps you have some more input or ideas to try

@pmcgleenon
Contributor

I ran the reproducer #6983 (comment) and didn't see this issue.

1. Generated benchmark data:
cd benchmarks
./bench.sh data tpch10
2. Ran the CLI with the filter (3.2 seconds) and without the filter (3.5 seconds):
DataFusion CLI v36.0.0
❯ create external table test stored as parquet location '/Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet';
0 rows in set. Query took 0.115 seconds.

❯ create table t as select * from test;
0 rows in set. Query took 3.527 seconds.
DataFusion CLI v36.0.0
❯ create external table test stored as parquet location '/Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet';
0 rows in set. Query took 0.006 seconds.

❯ create table t as (select * from test where l_linenumber > 0);
0 rows in set. Query took 3.216 seconds.
3. Ran the Rust program with the filter (3.1 seconds) and without the filter (3 seconds):
    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap();
        // .filter(col("l_orderkey").gt(lit(0)))
        // .unwrap();
4. Checked the plan output for the presence of multiple file_groups in the physical plan, showing the scan is parallel:
❯ explain select * from test;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]                                                                                                                                                                                                                                                                                                                                                                                     |
| physical_plan | ParquetExec: file_groups={4 groups: [[Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:0..10165445], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:10165445..20330890], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:20330890..30496335], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:30496335..40661778]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment] |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.010 seconds.
❯ explain select * from test where l_orderkey > 0;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: test.l_orderkey > Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|               |   TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], partial_filters=[test.l_orderkey > Int64(0)]                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|               |   FilterExec: l_orderkey@0 > 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|               |     ParquetExec: file_groups={4 groups: [[Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:0..10165445], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:10165445..20330890], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:20330890..30496335], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:30496335..40661778]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], predicate=l_orderkey@0 > 0, pruning_predicate=l_orderkey_max@0 > 0, required_guarantees=[] |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.012 seconds.
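The four byte ranges in the `file_groups` output above come from splitting the single 40,661,778-byte file into `target_partitions` roughly equal contiguous chunks. A minimal sketch of that split (assuming a simple ceil-division scheme, which reproduces the exact ranges shown in the plan; this is an illustration, not DataFusion's actual implementation):

```rust
// Split a file of `total` bytes into up to `n` contiguous byte ranges,
// similar in spirit to how ParquetExec forms file_groups.
fn split_ranges(total: u64, n: u64) -> Vec<(u64, u64)> {
    let chunk = (total + n - 1) / n; // ceil(total / n)
    (0..n)
        .map(|i| {
            let start = i * chunk;
            let end = ((i + 1) * chunk).min(total);
            (start, end)
        })
        .filter(|(start, end)| start < end) // drop empty trailing ranges
        .collect()
}

fn main() {
    // File size taken from the plan above: part-0.parquet ends at byte 40_661_778.
    for (start, end) in split_ranges(40_661_778, 4) {
        println!("{start}..{end}");
    }
}
```

Running this prints `0..10165445`, `10165445..20330890`, `20330890..30496335`, `30496335..40661778`, matching the four groups in the `ParquetExec` line; each range is then scanned by a separate partition.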

@alamb this looks ok to me (unless I've missed something). file_groups = 4 means it's loaded in parallel on each of the 4 CPUs available?


alamb commented Feb 20, 2024

@alamb this looks ok to me (unless I've missed something). file_groups = 4 means it's loaded in parallel on each of the 4 CPUs available?

I agree -- thank you for checking @pmcgleenon. Let's close this issue and we can open new issues for future improvements if warranted.
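For anyone who wants more (or fewer) scan partitions than the default of one per CPU core, the partition count can be set on the session configuration. A hedged sketch, assuming the `SessionConfig::with_target_partitions` builder and the `SessionContext::new_with_config` constructor (names may differ slightly between DataFusion versions; not runnable without the `datafusion` and `tokio` crates):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Ask the planner for 8 target partitions instead of the default
    // (number of CPU cores); ParquetExec can then split a large file
    // into up to 8 byte-range file_groups.
    let config = SessionConfig::new().with_target_partitions(8);
    let ctx = SessionContext::new_with_config(config);

    let df = ctx
        .read_parquet("part-0.parquet", ParquetReadOptions::default())
        .await?;
    df.show_limit(5).await?;
    Ok(())
}
```

The same setting is exposed in datafusion-cli as the `datafusion.execution.target_partitions` configuration option.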

@alamb alamb closed this as completed Feb 20, 2024