Engine refactor #353

Merged
merged 1 commit into from
Nov 16, 2023

Conversation

brennanjl
Collaborator

@brennanjl brennanjl commented Oct 16, 2023

This is still very much WIP, but I wanted to push this up in case anyone was interested. Don't be intimidated by the current amount of "new code"; I have copied a lot of stuff to be able to modify it freely or make the imports sensible.

Things like the SQL analyzer, the dataset DB (which is a SQL db that understands schema metadata), and types are all directly copied over, and mostly unchanged.

What has been done so far

2pc (internal/engine2/commit)

So far, the bulk of the work has gone into solving the issue of database deployments and drops not conforming to the 2pc protocol. I've made four different attempts at this, but I think the current one is heading in the right direction.

Very naively, I have identified which operations in Kwil are "2-phase-commit-able". Right now, this is only deploy, drop, and execute DML; however, this will expand to include migrations. This layer is responsible for making these operations conform to Kwil's 2pc protocol.

The main impetus for creating this as a separate layer is that it allows us to build anything else that we want within the engine without having to worry about the 2pc protocol, which is nice.

There are still two things I do not like about this package:

1. Databases cannot be deployed and dropped in the same session.

There isn't really a good reason for this, except that there are a lot of edge cases that make it hard to implement. For the sake of time, I figured this is an OK tradeoff for an initial unreleased version; however, I would like to get it fixed before we actually use this.

2. The code is pretty complex

While I really like the interfaces used by this package, the code itself is fairly complex. In particular, it feels weird that each operation has completely different code depending on its phase.

execution (internal/engine2/execution)

This is much more of a WIP than the commit layer, but it is at least a sensible first attempt at executing logic. It is sort of similar to the internal/engine/execution package, except it is a LOT easier to add new functionality.

There are still a lot of unanswered questions regarding where action parsing should take place, where we should handle argument evaluation, and where + how we should handle "modifiers" (I have some thoughts on this that I am testing out).

A few things that this does differently than its predecessor:

1. Namespaces

We sort of have a concept of "namespaces" evolving within Kwil. Right now, it is only possible to create a namespace with an extension. However, we have talked quite a bit about having other databases be callable if they are on the same node. Furthermore, we have been talking a bit about how we can give the user control of some sort of token functionality (essentially, allowing them to modify the account store).

While we aren't implementing these features here, it is quite easy to see how we could create an account store "namespace" (or any other type, for that matter) that could be included as part of Kuneiform's "standard library", and simply make that accessible here.

There are still some nuances I need to work out for determining which namespaces are and are not accessible, but the general idea is there.

2. DDL

While I am still going back and forth on whether this should be done this way, this package also handles database deployments and drops. It treats them like any other operation. I did this as more of an experiment, with database migrations in mind; if we do want to allow some sort of programmability / logic for migrations, we might want to handle that here. Conceptually, those are the same as DDL statements, and so I threw those in here.

I'm leaning towards taking these out, but haven't decided yet.

Things that still need to be done

Higher Level Engine Package

This does not yet include a higher level package that ties all of these together for a consumer. I think a mistake I made with the old engine was that I started high level and progressed downwards. This created a really nonsensical structure, so I tried to reverse that here.

internal/sql

Our SQLite clients are really a mess, and should be cleaned up. There should really only be 1 client, and there are some changes that should be made to allow for better testing, as well as having a more sensible interface.

I have copied the old package to allow me to hack around it for testing new things, but the internal/sql2 package currently does not mean anything, and has yet to be overhauled.

Conclusion

This is still very much an active construction site, and thus there is a lot of work left to do, but I think I am on the right track. The hardest problem (2pc) has a decent solution, and the second hardest problem (execution) has a plausible framework for how it can be solved.

Overall, I'm confident that this will be a much better structure with which we can delineate responsibilities, allowing us to more easily add features to our engine in the future. I am also confident that it will be quite easy to plug the end result of this into our current system with minimal friction.

@brennanjl
Collaborator Author

This includes a potential way of handling atomic commits. I need to work on testing it, but in case anyone is curious about what we were discussing in Slack, I have something here that I think works. It can be found in internal/engine2/registry.

Ignore a lot of the other stuff, obviously still a lot to do wrt deleting stuff, tying it all together, and testing.

@brennanjl brennanjl force-pushed the refactor-engine branch 3 times, most recently from 230b253 to d2a0dfb Compare November 7, 2023 05:38
@brennanjl
Collaborator Author

brennanjl commented Nov 7, 2023

At this point things are building. Acceptance tests are still not passing; I sort of know why, but will spend some more time tomorrow looking into it. Regardless, it is likely ready for review, as anything that changes will be a minor bug fix.

There are three main areas that got affected. Obviously the refactor was aimed at the engine, but since the engine really spans the whole system, almost everything got touched.

engine

Pretty much everything in engine, besides sqlanalyzer and types, is new. The majority of it is found in engine/execution. I am considering moving this package to just engine, since everything else is really just utilities to help engine/execution do its job.

sql

sql was totally rethought, and broken down into 3 packages. If we ended up choosing to use Postgres, it would be quite easy to switch SQLite with PG, since all of the things that Postgres would handle for us are now handled by sql.

sql/sqlite

sql/sqlite is a simplified version of the previous sql/sqlite package. It does not handle connection pooling, and is also quite a bit simpler.

sql/pools

sql/pools is responsible for managing connection pooling and execution lifetimes. If you're looking for a single replacement for our old sql/client + sql/sqlite, this is likely it.

sql/registry

sql/registry handles the creation of new databases with regard to the system's larger commit process. It also provides an additional layer of abstraction for executing/querying, which is consumed by the engine.

sessions

sessions have been vastly simplified. They are conceptually the same, but there is no more 2pc process. A session has simple Begin and Commit methods, and the committables also have much simpler interfaces.
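To make the Begin/Commit idea concrete, here is a minimal sketch of a committable that uses an idempotency key (such as a block height) so that replaying a commit has no effect. Everything here is an assumption for illustration: counterStore, applyBlock, and the method signatures are hypothetical and are not taken from the sessions package in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// counterStore is a hypothetical committable: a toy store that buffers
// changes during a session and applies them on Commit.
type counterStore struct {
	value         int   // committed state
	pending       int   // changes buffered in the current session
	inSession     bool  // true between Begin and Commit
	lastCommitted int64 // highest idempotency key already applied
}

// Begin opens a session and clears any buffered changes.
func (c *counterStore) Begin(key int64) error {
	if c.inSession {
		return errors.New("session already in progress")
	}
	c.inSession = true
	c.pending = 0
	return nil
}

// Add buffers a change; nothing is visible until Commit.
func (c *counterStore) Add(n int) { c.pending += n }

// Commit applies the buffered changes unless this key was already applied,
// in which case it is a no-op. That is what makes a replay safe.
func (c *counterStore) Commit(key int64) error {
	if !c.inSession {
		return errors.New("no session in progress")
	}
	c.inSession = false
	if key <= c.lastCommitted {
		return nil // already committed, e.g. replay after a crash
	}
	c.value += c.pending
	c.lastCommitted = key
	return nil
}

// applyBlock runs one Begin/Add/Commit cycle keyed by block height.
func applyBlock(c *counterStore, height int64, delta int) error {
	if err := c.Begin(height); err != nil {
		return err
	}
	c.Add(delta)
	return c.Commit(height)
}

func main() {
	c := &counterStore{}
	for _, height := range []int64{1, 1, 2} { // height 1 is replayed
		if err := applyBlock(c, height, 10); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println(c.value) // prints 20: the replayed block is applied once
}
```

In a real store the last committed key would have to be persisted atomically with the data it guards; the in-memory field above only illustrates the check.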

Comment on lines +104 to +106
deleteTestDir()
defer deleteTestDir()
Member

Are these two both intentional?

Collaborator Author

yeah. If the temp dir is not empty, it will empty it before beginning the test

Collaborator Author

However we can probably use an in-memory connection here

Member

@jchappelow jchappelow left a comment

Connection pools are the tricky pieces I think. Some thoughts about that.
Will look through everything else. Can give more concrete suggestions if needed.

// readerMu is the mutex for readers.
// readers will call it and immediately return.
// `Commit` will call it and return when it is finished.
// This is to prevent readers from being opened while a commit is in progress.
Member

@jchappelow jchappelow Nov 9, 2023

OK that this allows commit to begin with existing readers running?

I'm trying to work out the immediate-unlock goals.

Can we document the behavior of the Registry methods that we are seeking?
Apparently: Callers of Query and Get should wait if a Commit is in progress.
But a Commit can start if reads are in progress?

Are we really just trying to make it easier for Commit to acquire the write lock? The RLock docs state:

// RLock locks rw for reading.
//
// It should not be used for recursive read locking; a blocked Lock
// call excludes new readers from acquiring the lock. See the
// documentation on the RWMutex type.

In particular, the "excludes new readers" part. Does that achieve the goal? (RLock and defer RUnlock)
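The "RLock and defer RUnlock" alternative raised in this comment could look roughly like the sketch below. It is not the code under review: the registry type, its fields, and the Query/Commit signatures are hypothetical stand-ins. It only shows the standard sync.RWMutex pattern, where Commit waits for in-flight readers and a blocked Lock call keeps new readers out.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the Registry discussed above.
type registry struct {
	mu   sync.RWMutex
	data map[string]string
}

// Query holds the read lock for the full duration of the read, so a
// Commit cannot modify data while the read is in progress.
func (r *registry) Query(key string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.data[key]
	return v, ok
}

// Commit takes the write lock. Lock waits for existing readers to finish,
// and while it is blocked, new RLock calls wait behind it.
func (r *registry) Commit(changes map[string]string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for k, v := range changes {
		r.data[k] = v
	}
}

func main() {
	r := &registry{data: map[string]string{"a": "1"}}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.Query("a") // readers may observe "1" or "2", never a torn state
		}()
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		r.Commit(map[string]string{"a": "2"})
	}()
	wg.Wait()

	v, _ := r.Query("a")
	fmt.Println(v) // the commit has finished by now, so this prints 2
}
```

The tradeoff compared with unlocking immediately is that a long-running read delays Commit, which is presumably why read timeouts matter in this design.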

Member

@jchappelow jchappelow left a comment

A few more minor things in a pending review I had going. Doing more, but wanted to submit this before you got going again today.

Comment on lines 178 to 182
c.inUse = false
c.conn.SetInterrupt(nil)

results.addCloser(deferFunc)
// if c.isReadonly() {
// return prepared.Finalize()
Member

Doesn't look like the c.inUse write is safe. The Result.Mutex isn't the same as Connection's.

Can remove this commented code?

}
defer res.Finish()

var value []byte
Member

Can remove this and the ok decl, and do value, ok := below.

Comment on lines 125 to 127
defer writer.Return()

return writer.Savepoint()
Member

Is it OK that the returned sql.Savepoint internally has a connection that has already been "returned"? It's not gonna be closed, but it's not gonna be exclusive anymore either.

Member

@jchappelow jchappelow left a comment

Have had my eyes on 90% of it now. Will circle back after pool/registry is updated and I've done some tests.

Comment on lines 63 to 66
nonceBytes := make([]byte, 8)
binary.LittleEndian.PutUint64(nonceBytes, uint64(s.Nonce))

bts = append(bts, nonceBytes...)
Member

Tiny nit, but I think this could be:

Suggested change
- nonceBytes := make([]byte, 8)
- binary.LittleEndian.PutUint64(nonceBytes, uint64(s.Nonce))
- bts = append(bts, nonceBytes...)
+ bts = binary.LittleEndian.AppendUint64(bts, uint64(s.Nonce))
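A small sketch of why the two forms are interchangeable. AppendUint64 (available since Go 1.19) returns the extended slice, so its result has to be assigned back just like append. The function names and the int64 nonce type below are assumptions for illustration; only the encoding calls come from the snippet above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendNonceOld mirrors the original three-line form.
func appendNonceOld(bts []byte, nonce int64) []byte {
	nonceBytes := make([]byte, 8)
	binary.LittleEndian.PutUint64(nonceBytes, uint64(nonce))
	return append(bts, nonceBytes...)
}

// appendNonceNew uses AppendUint64, which appends the 8 little-endian
// bytes and returns the extended slice.
func appendNonceNew(bts []byte, nonce int64) []byte {
	return binary.LittleEndian.AppendUint64(bts, uint64(nonce))
}

// sameEncoding reports whether both forms produce identical bytes.
func sameEncoding(prefix []byte, nonce int64) bool {
	// Copy the prefix so the two calls cannot share a backing array.
	a := appendNonceOld(append([]byte(nil), prefix...), nonce)
	b := appendNonceNew(append([]byte(nil), prefix...), nonce)
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(sameEncoding([]byte("payload"), 42)) // prints true
	fmt.Printf("%x\n", appendNonceNew(nil, 1))       // prints 0100000000000000
}
```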

Comment on lines +89 to +101
defer func() {
if err != nil {
err2 := g.datastore.Delete(ctx, schema.DBID())
Member

Seems like we should only defer this if Create succeeds below.
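A minimal sketch of the ordering suggested here: register the cleanup only after Create has succeeded, so that a failed Create never triggers a Delete. One plausible reason (an assumption, not stated in the review) is that Create may fail because the dataset already exists, in which case cleanup would remove data that this call never created. The datastore type, deploy function, and errSetup value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errSetup is a hypothetical failure from a later deployment step.
var errSetup = errors.New("setup failed")

// datastore is a hypothetical in-memory stand-in for the real datastore.
type datastore struct{ dbs map[string]bool }

func (d *datastore) Create(id string) error {
	if d.dbs[id] {
		return fmt.Errorf("dataset %q already exists", id)
	}
	d.dbs[id] = true
	return nil
}

func (d *datastore) Delete(id string) error {
	delete(d.dbs, id)
	return nil
}

// deploy creates the dataset and then runs setup. The cleanup is deferred
// only once Create has succeeded, so it can only undo this call's own work.
func deploy(d *datastore, id string, setup func() error) (err error) {
	if err = d.Create(id); err != nil {
		return err
	}
	defer func() {
		if err != nil {
			if err2 := d.Delete(id); err2 != nil {
				err = fmt.Errorf("%w (cleanup also failed: %v)", err, err2)
			}
		}
	}()
	return setup()
}

func main() {
	d := &datastore{dbs: map[string]bool{}}
	fmt.Println(deploy(d, "db1", func() error { return nil }))      // <nil>
	fmt.Println(deploy(d, "db1", func() error { return nil }))      // already exists
	fmt.Println(d.dbs["db1"])                                       // true: not deleted
	fmt.Println(deploy(d, "db2", func() error { return errSetup })) // setup failed
	fmt.Println(d.dbs["db2"])                                       // false: cleaned up
}
```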

@@ -40,16 +38,68 @@ type Savepoint interface {

 type Session interface {
 	Delete() error
-	GenerateChangeset() (Changeset, error)
+	GenerateChangeset(ctx context.Context) (Changeset, error)
Member

type Session interface {
	Delete() error
	ChangesetID(ctx context.Context) ([]byte, error)
}

Looking at the uses for session, I don't see a reason for a separate Changeset interface and the two step GenerateChangeset() -> ID() process rather than a single ChangesetID() method. It just needs to be documented that it should only be called once per session (like Delete).

Comment on lines 73 to 75
err = u.engine.DeleteDataset(ctx, dbid, tx.Sender)
if err != nil {
return resp(price), fmt.Errorf("failed to drop dataset: %w", err)
Member

Something we should come back to later is this oddity of returning some meaningful value when error is not nil. The accepted contract with Go functions is that if err is not nil, then the other returns are meaningless and should not be referenced for any purpose. There are probably some exceptions here and there, but this is generally best practice. Likely solution is to add Code uint32; Msg string fields to ExecutionResponse, and only return non-nil error before any spending, if at all. Otherwise we'd return an "ErrTxExec" type for error that can provide the spent gas and code.
Probably should wait until we have gas, and progressive gas consumption that depends on the nature of the engine's actions.
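One way the "ErrTxExec" idea could be sketched: the function returns only an error, and the error value itself carries the code and the gas spent, so callers never read other return values when err is non-nil. All names below (TxExecError, dropDataset, spentGas, the code value) are hypothetical and not part of the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// TxExecError is a hypothetical error type that carries the result code
// and the gas spent before execution failed.
type TxExecError struct {
	Code     uint32
	SpentGas int64
	Err      error
}

func (e *TxExecError) Error() string {
	return fmt.Sprintf("tx execution failed (code %d, spent gas %d): %v", e.Code, e.SpentGas, e.Err)
}

// Unwrap lets errors.Is and errors.As see the underlying cause.
func (e *TxExecError) Unwrap() error { return e.Err }

var errDatasetNotFound = errors.New("dataset not found")

// dropDataset is a stand-in for a handler. It returns nil on success and
// a *TxExecError when execution fails after gas has been spent.
func dropDataset(exists bool, price int64) error {
	if !exists {
		return &TxExecError{Code: 1, SpentGas: price, Err: errDatasetNotFound}
	}
	return nil
}

// spentGas extracts the spent gas from an error chain, if present.
func spentGas(err error) (int64, bool) {
	var txErr *TxExecError
	if errors.As(err, &txErr) {
		return txErr.SpentGas, true
	}
	return 0, false
}

func main() {
	// Wrapping with %w keeps the TxExecError reachable through errors.As.
	err := fmt.Errorf("failed to drop dataset: %w", dropDataset(false, 100))
	gas, ok := spentGas(err)
	fmt.Println(gas, ok)                            // prints 100 true
	fmt.Println(errors.Is(err, errDatasetNotFound)) // prints true
}
```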

// ForEachFile calls fn for each file in path.
// It passes the file's name (without the path) to fn.
func (f *FS) ForEachFile(path string, fn func(string) error) error {
return filepath.Walk(path, func(path string, info os.FileInfo, err error) error {
Member

Just a nit, but if we do this for on-demand list datasets, we should probably use filepath.WalkDir since it is supposed to be more efficient.

Collaborator Author

We now only read from the disk during startup and commits (where it needs to be atomic). Calls from the user now read from in-memory.
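For reference, the filepath.WalkDir suggestion from this thread could be sketched as below. WalkDir passes an fs.DirEntry to the callback instead of calling os.Lstat on every entry, which is where the efficiency gain comes from. This is a free function rather than the *FS method in the diff, and forEachFile, listFiles, and makeExampleDir are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// forEachFile calls fn with the base name of every non-directory entry
// under root, in lexical order.
func forEachFile(root string, fn func(name string) error) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil // descend into directories, but do not report them
		}
		return fn(d.Name())
	})
}

// listFiles collects the names reported by forEachFile.
func listFiles(root string) ([]string, error) {
	var names []string
	err := forEachFile(root, func(name string) error {
		names = append(names, name)
		return nil
	})
	return names, err
}

// makeExampleDir creates a temp directory holding two empty files and
// returns its path together with a cleanup function.
func makeExampleDir() (string, func(), error) {
	dir, err := os.MkdirTemp("", "walkdir-example")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	for _, name := range []string{"a.sqlite", "b.sqlite"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o600); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return dir, cleanup, nil
}

func main() {
	dir, cleanup, err := makeExampleDir()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()

	names, err := listFiles(dir)
	fmt.Println(names, err) // prints [a.sqlite b.sqlite] <nil>
}
```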

@brennanjl
Collaborator Author

@jchappelow I have addressed all of the requested changes. Overall, things stayed more or less the same, but there were a few different changes that made things way easier:

  1. No more sql/pools. A lot of the complexity in pools came from the fact that I was trying to manage execution lifetime across two packages. pools does not handle any of that now (except it can forcefully close queries). It got moved into sql/sqlite because it is so much simpler now.
  2. sql/registry is now responsible for setting execution timeouts for read-only queries.

Contributor

@Yaiba Yaiba left a comment

Just spotted a few minor issues. I only actually followed the logic in a couple of packages; the rest is mostly general stuff.

@@ -563,6 +572,8 @@ func (a *AbciApp) EndBlock(e abciTypes.RequestEndBlock) abciTypes.ResponseEndBlo
panic(fmt.Sprintf("failed to finalize validator updates: %v", err))
}

a.blockHeight = e.Height
Contributor

Seems not necessary?

Collaborator Author

It is. It needs the block height in Commit as an idempotency key

Contributor

You've set this in BeginBlock, thus you'll get this field unchanged right?

@@ -51,3 +62,62 @@ func (f *defaultFilesystem) Remove(path string) error {
func (f *defaultFilesystem) Rename(oldpath string, newpath string) error {
return os.Rename(oldpath, newpath)
}

// the below snippet is from MinIO's https://github.com/minio/minio/blob/38f35463b7fe07fbbe64bb9150d497a755c6206e/cmd/xl-storage.go#L127
Contributor

Hm, I don't think we need this exact snippet, since we have more control over the pathName, and I assume we won't run on Windows.

Comment on lines +67 to +69
// Signer returns the public key of the sender of the transaction.
func (s *ScopeContext) Signer() []byte {
return s.execution.Caller
Member

I don't see Signer used anywhere, and I'm uncertain if it should return s.execution.Signer instead of Caller, which is the serialized types.User.

Collaborator Author

@brennanjl brennanjl Nov 16, 2023

This will get shaken out once we merge in the new account identifier changes.

Signer and Caller will be the same thing. Right now, public keys own schemas, so we need to identify signers as public keys specifically, while @caller has the additional metadata (key type, address type, etc). These will become the same thing though, once we make the account identifier changes

Member

@jchappelow jchappelow Nov 16, 2023

But to clarify the upcoming identifier changes as they pertain to engine, the executionContext.Sender is removed and caller will be the new "identifier" (address or pubkey) rather than serialization of this types.User struct?
Is there still any need for public key access in kuneiform or is everything just reduced to one @caller and no more address() or @caller_address?

Collaborator Author

Essentially, signer is currently used for deploying databases, dropping databases, and the owner modifier, while @caller is the serialized user type.

Collaborator Author

> But to clarify the upcoming identifier changes as they pertain to engine, the executionContext.Sender is removed and caller will be the new "identifier" (address or pubkey) rather than serialization of this types.User struct?
> Is there still any need for public key access in kuneiform or is everything just reduced to one @caller and no more address() or @caller_address?

Correct. There will be no more need for public key in Kuneiform; it will all just be the user identifier.

Member

@jchappelow jchappelow left a comment

LGTM! 🥳
Would you like to write a message for the squashed commit? Normally I'd take it upon myself to glean from PR description and/or comments plus the messages of the commits being squashed, but this one has evolved a lot. Even just a couple sentences would be enough IMO, but I thought you might want to tweak what's in this comment after the most recent changes: #353 (comment)

@brennanjl
Collaborator Author

Yep will do that now

…features and improvements.

Among these are:
- Proper handling of transactionality for database deployments and drops
- Simpler generation of app hashes
- Simpler code for determining action logic, making future improvements easier
- Replacing the 2pc protocol with a much simpler system based on idempotency
- Better concurrency support for databases
- Better execution lifetime handling for long running queries (to prevent writer starvation)
- An overall simplification of the entire Kwil system, reducing on net ~5000 lines of code.
- And other improvements
@brennanjl brennanjl merged commit b6deb51 into main Nov 16, 2023
2 checks passed
@brennanjl brennanjl deleted the refactor-engine branch November 16, 2023 20:22
brennanjl added a commit that referenced this pull request Feb 26, 2024
…features and improvements. (#353)

Among these are:
- Proper handling of transactionality for database deployments and drops
- Simpler generation of app hashes
- Simpler code for determining action logic, making future improvements easier
- Replacing the 2pc protocol with a much simpler system based on idempotency
- Better concurrency support for databases
- Better execution lifetime handling for long running queries (to prevent writer starvation)
- An overall simplification of the entire Kwil system, reducing on net ~5000 lines of code.
- And other improvements