Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement wasm-rdbms interface for standardized database access #1016

Open
jdegoes opened this issue Oct 19, 2024 · 12 comments · May be fixed by #1212
Open

Implement wasm-rdbms interface for standardized database access #1016

jdegoes opened this issue Oct 19, 2024 · 12 comments · May be fixed by #1212
Assignees

Comments

@jdegoes
Copy link
Contributor

jdegoes commented Oct 19, 2024

WASM RDBMS

WASM components do not have access to databases natively. Rather, they must obtain all functionality from their host, through the component model types.

Although there exists a wasi-sql project, which intends to provide a WASI standard for database access, it is still in its infancy and shows no signs of active development.

Rather than await the completion of wasi-sql, WASM startup Fermyon introduced a series of WIT interfaces postgres, sqlite, and mysql in their WASM-native framework Spin, which use a related series of database types. These interfaces are supported by the Spin host environment.

It is not clear if Fermyon’s interfaces will become standard or not, or if work will be resumed on the earlier (but far less complete) wasi-sql. Rather than wait for the lengthy process of standardization to produce something useful, Golem needs to support some level of database access now, so that developers using databases have options.

This specification calls for the development of a new wasm-rdbms interface, which is designed to support features common to mainstream databases, as well as a corresponding set of implementations in Golem, powered by a low-level Rust library.

💡

Currently, this design only facilitates connection with existing databases, rather than supporting the newer paradigm of provisioning databases based on their demonstrated usage inside the code. However, as connecting to existing databases is something that needs to be supported regardless, it makes sense to first start with this feature.

Overview

The basic approach of this language-agnostic interface to databases is to have a generalized core upon which all specific database interfaces can depend.

This allows new databases to be introduced cheaply, while still respecting the fact that developers must necessarily program against specific databases, due to differences in column types, SQL syntax, and other features.

Common Elements

Common elements belong to the same generic package and have generic interface names.

They include the following, as well as related supporting types:

  • Column Types. The types possible for a column. This is a superset of the possible types for any given database.
  • Columns. The combination of a column type, with a name, and column-specific flags, such as whether or not the column is nullable, unique, auto-generated, etc.
  • Value. A value is a value of some column type.
  • Row. A row is a collection of values.
  • Statement. A statement has SQL (text) as well as variables and values for those variables (prepared statement).
  • Result Set. A result set is the result of executing a query. It may be iterated over.

Unique Elements

Unique elements belong to database-specific packages, with database-specific interface names. They represent aspects of a database interface that benefit from tying to one specific database.

  • Connection. A connection may be opened, with a database-specific connection information; connections allow both query (which return result set) and execution (which returns integer).

WIT Sketch

Common Elements

Something like the following should work for most databases (other than SQLite):

type node-index = u32;
type table-name = string;

/// Node in the db-data-type structure
variant db-column-type-primitive{
    integer(option<u8>),
    decimal(u8, u8),
    float,
    boolean,
    datetime,
    interval,
    string(option<u32>),
    text,
    binary(option<u32>),
    blob,
    enum(list<string>),
    json,
    xml,
    uuid,
    spatial,
}

variant db-column-type {
    primitive(db-column-type-primitive),
    array(list<option<u32>>, db-column-type),
}

flags db-column-type-flags {
    // Key constraints
    primary-key,     // Single or part of composite primary key
    foreign-key,     // References another table's key
    unique,          // Unique constraint
    
    // Core properties
    nullable,        // If not present, column is NOT NULL
    generated,       // Value is computed rather than stored directly
    auto-increment,  // MySQL AUTO_INCREMENT/PostgreSQL SERIAL/SQL Server IDENTITY
    
    // Common behaviors
    default-value,   // Has a default value defined
    indexed,         // Participates in an index
}

/// Data types for database columns
record db-column-type-meta {
    db-type:     db-column-type,
    name:        string,
    flags:       db-column-type-flags,
    foreign-key: option<table-name>,
}

/// Value descriptor for a single database value
variant db-value-primitive {
    integer(s64),
    decimal(string),
    float(float64),
    boolean(bool),
    datetime(u64),
    interval(u64),
    string(string),
    text(string),
    binary(list<u8>),
    blob(list<u8>),
    enum(u32),
    json(string),
    xml(string),
    uuid(u128),
    spatial(list<float64>),
    other(string, list<u8>),
    db-null,
}

variant db-value {
    primitive(db-value-primitive),
    array(list<db-value-primitive>), // Flatteneed
}

/// A single row of values
record db-row {
    values: list<db-value>,
}

/// A potentially very large and lazy stream of rows:
resource db-result-set {
    get-column-metadata: func() -> list<db-column-type-meta>;
    get-next: func() -> option<list<db-row>>;
}

Unique Elements

Something like the following should work for databases that are not SQLite:

interface postgres {
  resource db-connection {
    open: static func(address: string) -> result<db-connection, error>;

    query: func(statement: string, params: list<db-value>) -> result<db-result-set, error>;

    execute: func(statement: string, params: list<db-value>) -> result<u64, error>;
  }
}

interface mysql {
  resource db-connection {
    open: static func(address: string) -> result<db-connection, error>;

    query: func(statement: string, params: list<db-value>) -> result<db-result-set, error>;

    execute: func(statement: string, params: list<db-value>) -> result<u64, error>;
  }
}

SQLite will require its own interface, perhaps inspired by the one in Spin.

Host Implementation

The host implementation will live in the worker-executor service and will use best-in-class async Rust libraries to provide implementations of the specified interfaces.

One slight complication is that a connection in the guest must not map to a connection in the host. The reason for this is that on a single worker-executor node, there may exist thousands of workers, and these workers should not maintain their own independent connections to the database.

Therefore, in order to improve performance, it is necessary to perform connection pooling in the host, so that if only 20 concurrent connections are needed across all workers, then only 20 concurrent connections are created and maintained.

Note that if the wasm-rdbms interface improves in power to allow interleaving database updates with side-effects as part of a single database transaction, then pooling becomes much more difficult. However, with the current interface, the only two things a worker can do with a connection are query and execute, which atomically return results; and if those results are small enough to fit in memory, then there is no need to keep an underlying connection dedicated to the worker (instead, it can be reused for another worker).

@jdegoes
Copy link
Contributor Author

jdegoes commented Nov 6, 2024

/bounty $5000

Copy link

algora-pbc bot commented Nov 6, 2024

💎 $5,000 bounty • Golem Cloud

Steps to solve:

  1. Start working: Comment /attempt #1016 with your implementation plan
  2. Submit work: Create a pull request including /claim #1016 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to golemcloud/golem!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @justcoon Nov 6, 2024, 3:01:20 PM WIP

@justcoon
Copy link
Contributor

justcoon commented Nov 6, 2024

/attempt #1016

@palash25
Copy link

palash25 commented Nov 6, 2024

The host implementation will live in the worker-executor service

I am assuming you mean golem-worker-executor-base directory because golem-worker-executor simply starts up the server with all the services defined in the former directory

and will the .wit files reside in this repo https://github.com/golemcloud/golem-wit/blob/main/wit ?

@jdegoes
Copy link
Contributor Author

jdegoes commented Nov 6, 2024

@palash25 Please don't work on this one, it's already taken. I will have new bounties up soon.

@debaa98
Copy link

debaa98 commented Nov 8, 2024

hey @justcoon are you working on it ?

@justcoon
Copy link
Contributor

justcoon commented Nov 8, 2024

hi @debaa98 , yes I am working on it

@volfcan
Copy link

volfcan commented Nov 9, 2024

@palash25 Please don't work on this one, it's already taken. I will have new bounties up soon.

only one person can attempt it at the same time?

@jdegoes
Copy link
Contributor Author

jdegoes commented Nov 11, 2024

@volfcan Yes, please! But be on the look out for many more bounties in the next 2 months.

@debaa98
Copy link

debaa98 commented Nov 28, 2024

hey @justcoon it seems quite inactive so will i take the issue ?

@vigoo
Copy link
Contributor

vigoo commented Nov 28, 2024

hey @justcoon it seems quite inactive so will i take the issue ?

I believe @justcoon is still working on this, and will open PRs once he is done.

@justcoon
Copy link
Contributor

justcoon commented Nov 28, 2024

Hi, @debaa98 , I am working on this issue

This was referenced Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants