
feat: Integrate datafusion #93

Merged: 6 commits into main from df, Sep 15, 2022
Conversation

@scsmithr (Member) commented Sep 10, 2022:

Integrates datafusion as our query execution engine.

Existing sqllogictest tests pass (even though we don't really have a lot).

Notable changes:

  • Removes the lemur and sqlengine crates from the main path. These crates will be deleted: lemur is obviated by datafusion, and sqlexec replaces sqlengine. The structure of the session in sqlexec is also a bit more amenable to following implicit transaction semantics as they relate to the Postgres protocol.
  • Removes the option to start GlareDB with an embedded RocksDB; keeping it in would have been difficult. Currently everything is stored in memory and is lost on shutdown.
  • Changes pgsrv to stream arrow record batches back to the client instead of buffering everything in memory (a sketch of this pattern follows below).

I've noted some peculiarities/future enhancements below.
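
First, a minimal sketch of the streaming pattern the pgsrv change follows, assuming DataFusion's SendableRecordBatchStream; ClientConn and send_batch are hypothetical stand-ins for the real pgsrv connection and encoding path:

use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::physical_plan::SendableRecordBatchStream;
use futures::StreamExt;

/// Hypothetical stand-in for a pgsrv client connection.
struct ClientConn;

impl ClientConn {
    async fn send_batch(&mut self, _batch: &RecordBatch) -> Result<()> {
        // Encode the batch's rows as Postgres DataRow messages here.
        Ok(())
    }
}

async fn stream_results(
    conn: &mut ClientConn,
    mut stream: SendableRecordBatchStream,
) -> Result<()> {
    // Forward each batch as soon as the engine produces it, so memory use is
    // bounded by batch size rather than by total result size.
    while let Some(batch) = stream.next().await {
        conn.send_batch(&batch?).await?;
    }
    Ok(())
}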

@scsmithr scsmithr changed the title Datafusion experiment Integrate datafusion Sep 13, 2022
@scsmithr scsmithr changed the title Integrate datafusion feat: Integrate datafusion Sep 14, 2022
@scsmithr scsmithr marked this pull request as ready for review September 14, 2022 17:51
@@ -0,0 +1,47 @@
//! Transaction timestamp utilities.
@scsmithr (Member, Author):

This module was added to explore what client-generated timestamps would look like when interacting with ArrowStore. It isn't currently being used, as the datafusion integration doesn't make use of ArrowStore.

Comment on lines +161 to +164
match scalar {
    ScalarValue::Boolean(Some(v)) => write!(buf, "{}", if v { "t" } else { "f" }),
    scalar => write!(buf, "{}", scalar), // Note this won't write null, that's checked above.
}
@scsmithr (Member, Author):

There may be alternate formats we want to use in the future, but the default Display impls for most of the scalar values work for now.

Eventually we may want to determine a subset of the arrow types we want to support. E.g. we probably don't care about Struct scalars.
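
As a rough illustration of what such a gate might look like (the excluded variants here are illustrative, nothing has been decided):

use datafusion::scalar::ScalarValue;

/// Whether we want to support encoding this scalar over the wire.
/// The excluded variants are only examples of types we may not care about.
fn is_supported_scalar(scalar: &ScalarValue) -> bool {
    !matches!(scalar, ScalarValue::Struct(..) | ScalarValue::List(..))
}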

@@ -1,4 +1,4 @@
-use lemur::repr::value::{Value, ValueRef, ValueType};
+use lemur::repr::value::ValueType;
@scsmithr (Member, Author):

There's still a reference to lemur here. I didn't want to get carried away with tweaking this module to more accurately support arrow types, so I left it in for now.


#[derive(Clone, Default)]
pub struct SchemaCatalog {
    tables: Arc<RwLock<HashMap<String, Arc<dyn TableProvider>>>>,
@scsmithr (Member, Author):

We'll eventually want this to use a concretely typed enum for tables so that we can more easily distinguish between mutable and immutable (external) tables.
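
A rough sketch of what that enum could look like (names are illustrative, not from this PR); it would also remove the need for the MemTable downcast seen further down:

use std::sync::Arc;

use datafusion::datasource::{MemTable, TableProvider};

/// Illustrative catalog entry distinguishing mutable from external tables.
enum TableEntry {
    /// In-memory table we can insert into.
    Mutable(Arc<MemTable>),
    /// External (read-only) table behind the generic provider trait.
    Immutable(Arc<dyn TableProvider>),
}

impl TableEntry {
    /// Every entry can still be handed to datafusion as a TableProvider.
    fn provider(&self) -> Arc<dyn TableProvider> {
        match self {
            TableEntry::Mutable(t) => t.clone(),
            TableEntry::Immutable(t) => t.clone(),
        }
    }
}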

        let stream = self.session.execute_physical(physical)?;
        Ok(ExecutionResult::Query { stream })
    }
    other => Err(internal!("unimplemented logical plan: {:?}", other)),
@scsmithr (Member, Author):

Left for future iterations.

    &self,
    plan: DfLogicalPlan,
) -> Result<Arc<dyn ExecutionPlan>> {
    let plan = self.state.create_physical_plan(&plan).await?;
@scsmithr (Member, Author):

Datafusion does optimization here. We can extend the query planner to add custom node types and optimizations in the future if we need to.

Comment on lines +188 to +191
let table = table
    .as_any()
    .downcast_ref::<MemTable>()
    .ok_or_else(|| internal!("cannot downcast to mem table"))?;
@scsmithr (Member, Author):

See the comment above about using an enum for tables. Right now we're assuming the only mutable table is the memory table impl.

@RustomMS (Contributor) left a review:

🚀

/// Every timestamp is a u64 serialized to its big-endian representation.
#[derive(Debug)]
pub struct SingleNodeTimestampGen {
    ts: AtomicU64,
Contributor:

Will these timestamps be tracking Unix epoch time? While these timestamps will always be increasing, we should keep in mind that most Unix systems use an i64 for time. Not a big deal, but something to keep in mind in general about how we're storing time (and how to store times from the past elsewhere in the db).

@scsmithr (Member, Author):

I wasn't planning for this one to track unix time. I don't plan on this making it to production in any way.

Eventually we'll want something like this: https://github.com/GlareDB/glaredb/blob/b30db1e7ee18a0fdc321871a3d0f25a492ba0b4a/crates/diststore/src/accord/timestamp.rs. Each timestamp is totally ordered, and includes the node id, unix time, and a logical time. This was part of one of my previous iterations on replication.
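
For reference, a minimal sketch of a totally ordered timestamp along those lines (field names are illustrative; the derived Ord compares fields in declaration order):

use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative hybrid timestamp: ordered by wall-clock time, then a logical
/// counter, then the node id as a final tiebreaker, making the order total.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct HybridTimestamp {
    /// Milliseconds since the Unix epoch (i64, matching Unix conventions).
    unix_ms: i64,
    /// Logical counter ordering events within the same millisecond.
    logical: u32,
    /// Originating node id.
    node: u32,
}

impl HybridTimestamp {
    fn now(node: u32, logical: u32) -> Self {
        let unix_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before Unix epoch")
            .as_millis() as i64;
        HybridTimestamp { unix_ms, logical, node }
    }
}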

@scsmithr scsmithr merged commit fc1839f into main Sep 15, 2022
@scsmithr scsmithr deleted the df branch September 15, 2022 21:07
@justinrubek (Contributor) left a review:

Very nice. I see a good amount of overlap with things I had to remove in #63. It'd be good for me to try to reuse those structs/enums elsewhere.


fn schema_names(&self) -> Vec<String> {
    let schemas = self.schemas.read();
    schemas.iter().map(|(name, _)| name).cloned().collect()
Contributor:

Suggested change:
-    schemas.iter().map(|(name, _)| name).cloned().collect()
+    schemas.keys().cloned().collect()

@scsmithr (Member, Author):

Opened a followup: #99


fn table_names(&self) -> Vec<String> {
    let tables = self.tables.read();
    tables.iter().map(|(name, _)| name).cloned().collect()
Contributor:

Suggested change:
-    tables.iter().map(|(name, _)| name).cloned().collect()
+    tables.keys().cloned().collect()

use datafusion::logical_plan::Expr;
use datafusion::physical_plan::{memory::MemoryExec, ExecutionPlan};
use dfutil::cast::cast_record_batch;
use parking_lot::RwLock;
Contributor:

Are there benefits/drawbacks to using this synchronous lock over tokio::sync::RwLock? I see it is being written to inside a synchronous function but read in an async one.

@scsmithr (Member, Author):

It really depends. It's correct to use either a tokio or parking lot lock here.

The benefit/drawback comes down to how long the lock is held, and whether that ends up blocking other threads. If the inserts do take a long time (like on the order of 100s of ns), then it would make sense to use a tokio lock and change the insert to be async. I just always default to parking lot/std, then move to tokio if performance makes it necessary.

I would only default to tokio from the start if there's an intent to pass guards around in async code.
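
A small sketch of the tradeoff (the HashMap-backed catalog here is illustrative):

use std::collections::HashMap;
use std::sync::Arc;

// parking_lot: fine while the critical section is short and never awaits.
fn table_names_sync(tables: &parking_lot::RwLock<HashMap<String, ()>>) -> Vec<String> {
    tables.read().keys().cloned().collect()
}

// tokio: use this if a guard ever needs to be held across an .await point.
async fn table_names_async(tables: Arc<tokio::sync::RwLock<HashMap<String, ()>>>) -> Vec<String> {
    tables.read().await.keys().cloned().collect()
}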

Successfully merging this pull request may close these issues:

  • lemur: Explore replacing our query stuff with datafusion