web3.js: (draft) Principles for a rewrite #1111

Closed
steveluscher opened this issue May 18, 2022 · 25 comments

Comments

@steveluscher
Collaborator

steveluscher commented May 18, 2022

Problem

I believe that the appetite for a rethink of the web3.js library's architecture has reached a tipping point.

These are some of the problems that a rewrite would aim to solve:

  1. The current monolithic design means that the library delivers poorly on functionality-per-byte. We force users to download code that they are likely never to need, and can't tree-shake away.
  2. Opaque helpers like sendTransaction make it difficult for people to customize behavior. They both assume too much responsibility (ie. do too much, like fetching recent blockhashes and signing transactions) and mutate their inputs (eg. calling sendTransaction overwrites a transaction's ‘recent blockhash’).
  3. The use of JavaScript classes to encapsulate state thwarts our best efforts to create typesafe applications. For example, what's the difference between a signed transaction, an unsigned transaction, and a nonce-based transaction? In the current library, these are just Transaction instances that may or may not be in the configuration you hope, and may throw runtime errors if they're not.

Proposed Solution

This isn't even a bad idea.


I've experienced a lot of ground-up rewrites in my time, and the approach that I've known to consistently end in success is:

  1. Create the ideal API.
  2. Replace the implementation of the existing API using the new one under the hood, incrementally.
  3. Once you've proven that you can reimplement the legacy API using the modern one, freeze development on the legacy API and focus on giving people tooling and tutorials to migrate to the modern one.

Principles

What follows are a set of principles that I believe should inform and guide the rewrite.

Principle 1: Let data be data

Unless a parcel of data needs to mutate itself in response to events, do not wrap it in a JavaScript class.

  1. We should take pains to treat data as immutable. Prefer the use of the Readonly and ReadonlyArray types in TypeScript.
  2. Operations over data should take immutable data structures as input and produce, as output, immutable data structures that share as much structure with the originals as possible.
  3. Data structures should be well typed. The type of a data structure should tell you everything you need to know, statically, about which operations it's compatible with.
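
As a rough illustration of points 1 and 2, here is a minimal sketch of an operation over read-only data that shares structure with its input. The `Instruction` and `Transaction` shapes and the `appendInstruction` helper are hypothetical names chosen for this example, not part of any existing or planned API.

```typescript
// Hypothetical data shapes for illustration only.
type Instruction = Readonly<{
  programId: string;
  data: ReadonlyArray<number>;
}>;

type Transaction = Readonly<{
  feePayer: string;
  instructions: ReadonlyArray<Instruction>;
}>;

// Returns a new transaction and leaves the input untouched.
// Existing instruction objects are reused rather than copied.
function appendInstruction(
  transaction: Transaction,
  instruction: Instruction,
): Transaction {
  return {
    ...transaction,
    instructions: [...transaction.instructions, instruction],
  };
}

const original: Transaction = {
  feePayer: 'payer',
  instructions: [{ programId: 'A', data: [1] }],
};
const updated = appendInstruction(original, { programId: 'B', data: [2] });
```

Because the first instruction is the same object in both `original` and `updated`, a reference-equality check is enough to tell that it did not change.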

Principle 2: Perform work by transforming data using functions

Data should not be mutated in place nor should it be expressed as a JavaScript class that mutates its own private state. Instead, operations (eg. signing a transaction, serializing a transaction, adding an instruction to a transaction, converting a public key between formats) should be performed by invoking functions over immutable data to produce new immutable data.

// Instead of mutating the internal state of JavaScript class instances...
const transaction = new Transaction(/* ... */);
transaction.sign(/* ... */);

// Perform operations by transforming data using functions.
const transaction = createTransaction(/* ... */);
const signedTransaction: ISignedTransaction = signTransaction(transaction, /* ... */);

Principle 3: Importing a module must never produce a side effect

The top level of an ESModule is called its ‘module factory.’ Anything at the top level runs immediately the moment the module is imported from another module.

We must never do any work in a module factory other than defining and exporting types, primitive values, JavaScript classes, and functions. We must never call a function at the top level, nor access a property of an object. We must enforce this with lint rules and with a check at the CI step.

  1. This will ensure that the library is fully tree-shakeable (ie. unused code can be deleted by an optimizing compiler)
  2. This will make the library more ‘lazy,’ which is to say code will compile and run as it's invoked rather than as it's imported.
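
One way to read this principle is sketched below: work that might otherwise happen at import time is moved behind a function and cached on first use. The `getAlphabetMap` helper is a hypothetical name for this example; only the base58 alphabet itself is a known constant.

```typescript
// AVOID: a call like this at the top level would run on import.
// const ALPHABET_MAP = buildAlphabetMap();

// A primitive value at the top level is fine.
const BASE58_ALPHABET =
  '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Declared at the top level, but not populated until first use.
let cachedAlphabetMap: ReadonlyMap<string, number> | undefined;

// The lookup table is built the first time someone asks for it.
export function getAlphabetMap(): ReadonlyMap<string, number> {
  if (cachedAlphabetMap === undefined) {
    const map = new Map<string, number>();
    for (let i = 0; i < BASE58_ALPHABET.length; i++) {
      map.set(BASE58_ALPHABET.charAt(i), i);
    }
    cachedAlphabetMap = map;
  }
  return cachedAlphabetMap;
}
```

Importing this module defines a string, a variable, and a function, and nothing else runs until `getAlphabetMap()` is called.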

Principle 4: Use opaque types to guarantee runtime contracts

After making certain assertions about a value, return it in its most primitive form but cast to an opaque type.

For instance, once you've asserted that an array of numbers is a valid public key, cast it to an opaque PublicKey type. This makes it so that you can pass around a primitive value while at the same time enforcing runtime guarantees about its compatibility with various operations.

Imagine asserting that a string is definitely a base58 encoded pubkey, then casting it to an opaque Base58EncodedPublicKey TypeScript type. Now that value comes with a guarantee that you can deserialize it back into an array that conforms to the PublicKey type, or use it as an input to an RPC call – and all without using JavaScript classes.
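
A minimal sketch of what this could look like follows, using a 'branded' string as the opaque type. The brand property, the `toBase58EncodedPublicKey` helper, and the `buildGetBalanceParams` consumer are assumptions made for illustration. The regular expression checks only the alphabet and length, which is a simplification; a real assertion would decode the string and confirm that it yields 32 bytes.

```typescript
// The brand exists only in the type system. At runtime this is a plain string.
type Base58EncodedPublicKey = string & {
  readonly __brand: 'Base58EncodedPublicKey';
};

// Simplified shape check: base58 alphabet, 32 to 44 characters.
const BASE58_PUBLIC_KEY_PATTERN = /^[1-9A-HJ-NP-Za-km-z]{32,44}$/;

// Assert once, then hand back the same primitive under the opaque type.
function toBase58EncodedPublicKey(value: string): Base58EncodedPublicKey {
  if (!BASE58_PUBLIC_KEY_PATTERN.test(value)) {
    throw new Error('Value is not a base58 encoded public key');
  }
  return value as Base58EncodedPublicKey;
}

// Hypothetical consumer: accepts only values that passed the assertion.
function buildGetBalanceParams(
  address: Base58EncodedPublicKey,
): ReadonlyArray<string> {
  return [address];
}

const address = toBase58EncodedPublicKey('11111111111111111111111111111111');
const params = buildGetBalanceParams(address);
// buildGetBalanceParams('some string'); // Rejected by the type checker.
```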

Principle 5: Extreme modularization

Break the library up into as many ES modules as practical. Test them independently.

End users may choose a monolithic import style compatible with a tree-shaking compiler (eg. import {createPublicKey} from '@solana/web3.js';) or they may craft a custom build by importing ES modules directly (eg. import createPublicKey from '@solana/web3.js/modules/createPublicKey.js';)

Principle 6: Program wrappers belong in their own packages

Wherever possible, kick program wrappers (eg. vote-program) into their own npm modules that are tested and published separately.

  1. This should reduce CI time in cases where only the changed package needs to be retested.
  2. People who don't use a program wrapper don't have to download it.

Principle 7: All errors must be coded

Never throw errors with freeform text. Always throw typed or coded errors, so that people can make airtight assertions about the nature of what went wrong in their catch blocks.
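
To make this concrete, here is one possible shape for coded errors. The error codes, the `CodedError` class, and the `isCodedError` guard are hypothetical and exist only to show how a catch block can branch on a code instead of parsing message text.

```typescript
// Hypothetical error codes for illustration only.
type ErrorCode = 'BLOCKHASH_EXPIRED' | 'INSUFFICIENT_FUNDS';

class CodedError extends Error {
  readonly code: ErrorCode;
  constructor(code: ErrorCode, message: string) {
    super(message);
    this.name = 'CodedError';
    this.code = code;
  }
}

// Narrows an unknown caught value to a coded error with a specific code.
function isCodedError(error: unknown, code: ErrorCode): error is CodedError {
  return error instanceof Error && (error as Partial<CodedError>).code === code;
}

// A catch block can now make decisions without inspecting message strings.
function describeFailure(error: unknown): string {
  if (isCodedError(error, 'BLOCKHASH_EXPIRED')) {
    return 'retry with a fresh blockhash';
  }
  if (isCodedError(error, 'INSUFFICIENT_FUNDS')) {
    return 'ask the user to top up';
  }
  return 'unknown failure';
}
```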

Principle 8: All asynchronous operations and subscriptions must be cancellable

Never offer an asynchronous operation without also offering a way to cancel it.

  1. Subscriptions must always return a dispose handle.
  2. Asynchronous operations must always accept and respond to an AbortController.

// You can later call `dispose()` to cancel this subscription.
const {dispose} = makeSubscription('accountChange', /* ... */);

// You can at any time call `abortController.abort()` to cancel this promise.
const abortController = new AbortController();
await confirmTransaction(/* ... */, {abortController});
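
For a sense of what honoring these contracts involves on the implementation side, here is a small sketch. `waitFor` and `makeIntervalSubscription` are hypothetical stand-ins for a real asynchronous operation and a real subscription; timers are used only so that the example is self-contained.

```typescript
type AbortableOptions = Readonly<{ abortController: AbortController }>;

// An asynchronous operation that rejects promptly when aborted.
function waitFor(
  ms: number,
  { abortController }: AbortableOptions,
): Promise<void> {
  const { signal } = abortController;
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error('Operation aborted'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // Release the pending work.
      reject(new Error('Operation aborted'));
    };
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

// A subscription that hands back a dispose handle.
function makeIntervalSubscription(
  onTick: () => void,
  intervalMs: number,
): Readonly<{ dispose: () => void }> {
  const handle = setInterval(onTick, intervalMs);
  return { dispose: () => clearInterval(handle) };
}

const subscription = makeIntervalSubscription(() => {}, 1000);
subscription.dispose();

const controller = new AbortController();
const pending = waitFor(60000, { abortController: controller });
controller.abort(); // `pending` now rejects instead of waiting a minute.
```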

Principle 9: Minimize over-requires in function inputs

A function should make as efficient use of its input arguments as possible. That is to say that it should not require as input anything that it doesn't make use of.

Instead of this:

function doThing(
  connection: Connection  // Way more information than we need.
) {
  const commitment = connection.commitment;
  // Do something with `commitment`
}

Do this:

function doThing(commitment: Commitment) {
  // Do something with `commitment`
}

h/t @tg44

Principle 10: Debuggability

Produce a debug build that produces warnings and messages that otherwise get stripped out in production mode through dead-code elimination.

if (__DEV__) {
  log(...);  // Gets stripped out when `__DEV__` is `false` in production mode.
}

Under consideration: Allow developers to inject a custom logger, in development or production. Supply a default implementation of this logger that logs to the console. Examples of logs this logger might produce include outgoing RPC calls and incoming RPC subscription notifications.

h/t @Swertin

Principle 11: Avoid JavaScript numbers

JavaScript numbers can safely express integer values only up to 2^53 - 1. If you want to express a value higher than that, you must express it either as a bigint or a string.

  • When communicating with a web API (eg. the JSON RPC) always use strings to avoid truncation
  • When storing and performing arithmetic on values in memory always use bigint
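
The following sketch shows the precision loss that motivates this principle, along with string-in, bigint-in-memory, string-out handling. `parseLamports` and `serializeLamports` are hypothetical helper names, and the string wire format is the one proposed above rather than a description of any existing API.

```typescript
// 2^53 + 1 cannot be represented as a JavaScript number.
const wireValue = '9007199254740993';
const asNumber = Number(wireValue); // Silently rounds to 9007199254740992.
const asBigInt = BigInt(wireValue); // Exact.

// Parse a decimal string from the wire into a bigint for arithmetic.
function parseLamports(value: string): bigint {
  return BigInt(value);
}

// Serialize a bigint back to a decimal string for the wire.
function serializeLamports(value: bigint): string {
  return value.toString();
}

const total = parseLamports(wireValue) + BigInt(1);
const serialized = serializeLamports(total);
```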

@jordaaash
Contributor

I love the general framework here. I'm not sure about this part:

Once you've proven that you can reimplement the legacy API using the modern one

Once web3.js is more functional, it'll be more modular and easy to build libraries on top of, so the ideal API might in fact be much smaller. There's also a lot that could simply be chopped (deprecated stuff, surprising behaviors, bad abstractions).

What would reimplementing the legacy API to call the ideal API help us do?

@steveluscher
Collaborator Author

What would reimplementing the legacy API to call the ideal API help us do?

Back when Relay was completely reimplementing everything, the library forked into ‘classic,’ ‘modern,’ and ‘compat.’ The compat layer essentially let people keep using the old API but use the modern implementation under the hood.

https://github.com/facebook/relay/tree/v1.7.0/packages/react-relay

This is definitely not meant to constrain or bloat the new API; we can still leave the old cruft behind. Progressively rebuilding the old API using the new API lets us put the new one into production from day one, so that we can gain confidence that we're building something that actually works.

@kevinheavey

kevinheavey commented May 20, 2022

I would like to add that it's nice if web3.js has a similar API to the Rust SDK. Right now it's similar-ish to the Rust SDK, and implementing principles 1 and 2 would make it less similar. They're good principles though, and the Rust SDK should probably also adopt them.

(My stake in this is that I work a lot on solana-py, which mostly imitates web3.js, and that I'm about to release a Python wrapper for the Rust SDK.)

@mvines
Member

mvines commented May 20, 2022

Building the solana-sdk for wasm to avoid re-writing all the transaction packing logic/etc in web3.js would be splendid. This has been a pipe dream since forever. We're pretty close actually

@steveluscher
Collaborator Author

Building the solana-sdk for wasm to avoid re-writing all the transaction packing logic/etc in web3.js would be splendid.

Thanks for bringing this up! I've heard this a few times. I really didn't want to get into that particular discussion on this particular webpage, but here we are :)

I have a number of worries about the WASM cross-compilation plan. I don't know if they're founded.

  1. I think that shipping a WASM binary would really thwart our efforts to make web3.js compatible with optimizing compilers. This might get us even further from a state where an optimizing compiler can tree-shake away the parts of web3.js that we can statically determine to be unreachable because a dApp doesn't use them.
  2. At Facebook we shipped WASM binaries to do certain image manipulations in-browser. Based on my work there, I'm concerned about WASM compile times; even with the compileStreaming API.
    • With WASM you have to compile and initialize the entire program before you can use any of it. Web3.js is often going to be involved in the startup path of a dApp; I don't want to make app startup wait on anything it's not going to need.
    • JavaScript lazily parses and JITs code on demand, as you use it.
  3. I'm also deeply concerned about ferrying data across the WASM/JS boundary. A key property of performant frontends (eg. using React) is to model world state using immutable data structures. Doing so gives you a fast path to knowing which parts of the UI to re-render, and which parts can stay memoized. I'm afraid that having to ferry data across the boundary might break down our ability to test for referential equality between two data structures, making it extremely difficult to efficiently memoize UIs.

@mvines
Member

mvines commented May 21, 2022

Seems like some research is needed before making a decision

@mschneider
Contributor

I've also shipped web apps using WASM already, and it's really only worth it if you have a problem that cannot be solved in regular JS due to runtime performance issues. Usually you end up writing a lot of runtime bridging code as well. Totally agree with Steve's assessment of the drawbacks of WASM; there's no free lunch.

I'd add, as another benefit, that no one really likes to switch programming languages while debugging, so being able to read the code that causes an error in JavaScript is just so much more productive for the average web dev.

@steveluscher steveluscher changed the title web3.js: (draft) Manifesto for a rewrite web3.js: (draft) Principles for a rewrite May 21, 2022
@steveluscher
Collaborator Author

steveluscher commented May 21, 2022

…great choice of word in the current climate & state of affairs. 🤦‍♂️ where is the media & pr communications filter

Oh? What's the cultural context that you're referring to? I literally don't know.

My first exposure to the word was in the context of the First Things First Manifesto of 1964. Backed by 400 designers, it advocated for a “reversal of priorities [in consumer advertising] in favour of the more useful and more lasting forms of communication.” It was an optimistic call to arms to use design as a tool for education and prosperity rather than persuasion and coercion.

In any case, it's easy enough to change! Done.

@steveluscher
Collaborator Author

steveluscher commented May 21, 2022

principle 5 sounds like dependency hell.

Thankfully we're on the same team! I meant ES Modules, not NPM modules. I'll update the text.

@tomland123

tomland123 commented May 21, 2022

Did principle 8 happen because of me? We are currently in the process of removing all of web3 from our client because of the lack of an abort controller; it is impossible to scale our product without it.

I am not a big wasm fan--the only killer benefit I could see is to write solana code in the browser and share it with friends/have a replit that you can quickly check the results of.

The tradeoffs are months of odd bugs, compilation errors, and worse error messages than Anchor's first versions.

I don't know what you mean by principle 5, but I do like how lodash is structured--but the file structure you proposed seems a lot clunkier than how lodash does it so I am just going to tacitly state look at lodash for inspiration before you build this +_+

@danmt

danmt commented May 21, 2022

Thanks a lot for this effort. I like the overall approach. In principle 8, it would be really useful for me to have makeSubscription and the abortable callbacks to match the TC39 Observable spec. I don't know how many other consumers will agree on this, in such case it might just be another layering.

@mrmizz

mrmizz commented May 21, 2022

in favor of all the functional/immutable stuff.

@martinezjorge

This sounds great! Definitely would love to participate.

@tg44

tg44 commented May 23, 2022

I come from a Scala background, and I can relate to these principles. I would add:

  • keep the function inputs as concrete as possible (for ex; codeblock1 requires a provider, which requires a wallet, but it never uses it... I know this is an anchor code, but still, we should try to minimize this kind of "overrequires")
  • keep options open for debugging or extreme use
    • I would like to see some effort about RPC HTTP call logging, or even an interchangeable HTTP client, it is tough to debug things when you don't see the raw RPC calls for example in nodejs
    • rn the retry with backoff is coded in, it should be a separate config/something with rich option of customization and probably even with hooks (for ex.; I want 5 retries, and every time call this function when a retry started so I can notify the user to be more patient)
    • can we get data from a specific block? if so we should add this as a param to every function even if most of the users will call it with the default/latest, and most of the RPC endpoints are not supporting it (I don't know if any rpc supports it or not, just an example)

codeblock1;

const payer = web3.Keypair.generate(); //unnec
const wallet = new Wallet(payer) //unnec
export const connection = new Connection(clusterApiUrl("mainnet-beta"));
const provider = new Provider(connection, wallet, {preflightCommitment: "recent"})
const programAddress = ...
const idl = await Program.fetchIdl(programAddress, provider) //this only uses connection from provider
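
The point above about configurable retries with hooks could be sketched roughly as follows. `withRetry`, `RetryOptions`, and the `onRetry` hook are hypothetical names, and exponential backoff is just one possible policy chosen for this example.

```typescript
type RetryOptions = Readonly<{
  maxRetries: number;
  baseDelayMs: number;
  // Called before each retry, for example to tell the user to be patient.
  onRetry?: (attempt: number, error: unknown) => void;
}>;

async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries, baseDelayMs, onRetry }: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up once the configured number of retries is exhausted.
      if (attempt >= maxRetries) throw error;
      onRetry?.(attempt + 1, error);
      // Exponential backoff: baseDelayMs, then 2x, 4x, and so on.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A caller wanting five retries with a notification on each would pass `{ maxRetries: 5, baseDelayMs: 500, onRetry: notifyUser }`, where `notifyUser` is whatever function the application supplies.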

@Swertin

Swertin commented May 24, 2022

Please please please, improve the overall debugging process for this library. This is hands down one of the biggest roadblocks for newer developers. Spending 8 hours to figure out the exact input types and formatting for a single constructor is not time well spent.

@steveluscher
Collaborator Author

I don't know what you mean by principle 5, but I do like how lodash is structured…

100%. Underscore and lodash were the inspiration for principle 5.

@steveluscher
Collaborator Author

In principle 8, it would be really useful for me to have makeSubscription and the abortable callbacks to match the TC39 Observable spec.

I've used Observables in the context of working on https://github.com/facebook/relay where I came to truly hate them. :)

I'll definitely keep your suggestion in mind though, and consider them if they're a good fit.

@steveluscher
Collaborator Author

keep the function inputs as concrete as possible (for ex; codeblock1 requires a provider, which requires a wallet, but it never uses it... I know this is an anchor code, but still, we should try to minimize this kind of "overrequires")

Yes, yes, yes. Paring down input arguments to as close as you can get to the primitives is so so important to keep an API performant and flexible. I've watched people save a bazillion dollars and tons of CO2 by ‘unwrapping’ input arguments so as to make it possible to easily memoize function calls.

I'm going to elevate this to the level of principle 9.

@steveluscher
Collaborator Author

Please please please, improve the overall debugging process for this library.

Absolutely. I'm elevating this to principle 10.

@kevinheavey

kevinheavey commented May 24, 2022

Oh another big problem with web3.js is it doesn't have proper support for sending batch requests and (related) it doesn't make it easy to build RPC request objects without sending them. (These are things I'm adding to solana-py)

@SvenDowideit

I've just started migrating our Workbench code from using Keypairs in the frontend code to delegating to the Wallet-Adapter, and one thing I'd really like is to continue using the high-level APIs (like createMint()) and not need to totally rewrite using lower-abstraction-level transactions. This becomes even more dramatic when you look at the web3.js getOrCreateAssociatedTokenAccount() calls, where a Wallet user starts to realise that they're going to end up duplicating most of web3.js's high-level APIs to convert to what (I hope) will become best practice: using the Solana-Wallet-Adapter.

as an example that gets me where the API consumer me wants to go (I just added a WalletContextState to the payer):

export async function createMint(
  connection: sol.Connection,
  payer: sol.Signer | WalletContextState,
  mintAuthority: sol.PublicKey,
  freezeAuthority: sol.PublicKey | null,
  decimals: number,
  keypair = sol.Keypair.generate(),
  confirmOptions?: sol.ConfirmOptions,
  programId = splToken.TOKEN_PROGRAM_ID
): Promise<sol.PublicKey> {

see workbenchapp/solana-workbench@6f76f1b#diff-431dcefd1e9123bea461100cada881acf05b512a1e5c7e1ddc8f45334fa1c3bc

and yeah - I did this with type checking if's, but it could also be a drop in library - I just prefer the idea of one codebase to fix - especially when I look at getOrCreateAssociatedTokenAccount type things.

@matt-allan

Hey there,

Before finding this issue I was thinking about spinning out an unofficial "web3-lite" package based on a lot of these same principles. Some thoughts on how this could be done below:

Something that I think may be helpful for this effort that I haven't seen mentioned elsewhere is starting with an interface description language for the JSON RPC API. If we can automate generating TypeScript types for all of the endpoints it will lower the maintenance burden quite a bit. I think this would also be generally useful (a nice Postman style RPC client with autocomplete, generating the docs, testing, etc.). Maybe we can use the Rust types as the source of truth and derive schemas from those?

For prior art, there's a few projects that might be interesting here:

Ideally what I would like to see in a new web3.js is something along the lines of:

  1. A low level JSON-RPC client
  2. TypeScript types for methods, params, and responses
  3. Encoding / decoding
  4. constants, i.e. BPF_LOADER_PROGRAM_ID
  5. Nice abstractions around pagination and subscriptions, using abort controllers, async iterables, etc.
  6. High level APIs that build on the other primitives for common use cases

I think that could be implemented in a modular, layered fashion and result in very small, tree shake-able bundles. It should be possible to keep the user friendliness of the current API with a much lower volume of code by leveraging TypeScript (this would still work for JavaScript devs if they are using a language server).

If the JSON RPC client takes the IDL as a type parameter you can type the input arguments and return types automatically without having to add a method for every single API method. Here's an example TS Playground to show what I mean.

A use case this would enable that I'm particularly keen on is combining multiple different method calls into a single batch in a type-safe manner.

RE: serialization I 100% agree on using opaque types. If the lowest layer uses primitives, a higher layer can still handle deserializing into complex types if that's wanted by the user. If we have an IDL we can auto-generate 'schemas' that exist at runtime and describe how to serialize a type automatically.

Has there been any work on this effort lately? I may have some availability if anyone else is interested in this work and wants to collaborate.

@steveluscher
Collaborator Author

If we can automate generating TypeScript types for all of the endpoints it will lower the maintenance burden quite a bit.

If you could see my screen right now, you'd see a bunch of stuff open to autogenerate TypeScript interfaces from the Rust implementation of the RPC. I'm 100% with you, and working on this now.

I think that could be implemented in a modular, layered fashion and result in very small, tree shake-able bundles.

100%.

If the JSON RPC client takes the IDL as a type parameter you can type the input arguments and return types automatically without having to add a method for every single API method.

I'm going to do one better; the JSON RPC API is going to become all types and almost no implementation, yet with the same ergonomics as people are used to today. Take a look at how I implemented the SMS Mobile Wallet Adapter API with JavaScript proxies. Essentially, we'll be able to support infinite RPC method growth with zero growth in the JS implementation.
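
To illustrate the general idea of 'all types and almost no implementation', here is a minimal sketch using a JavaScript Proxy. It is not the implementation referred to above. The `RpcMethods` map is a hand-written, hypothetical stand-in for generated types, and the builder only constructs JSON-RPC request objects without sending them.

```typescript
// Hypothetical method map. In practice this would be generated.
type RpcMethods = {
  getSlot(): number;
  getBalance(address: string): number;
};

type RpcRequest<TMethod extends string> = Readonly<{
  jsonrpc: '2.0';
  id: number;
  method: TMethod;
  params: ReadonlyArray<unknown>;
}>;

// Derives one typed request builder per method in the map.
type RpcRequestBuilder<TMethods> = {
  [K in keyof TMethods & string]: TMethods[K] extends (
    ...args: infer TArgs
  ) => unknown
    ? (...args: TArgs) => RpcRequest<K>
    : never;
};

// One generic `get` trap serves every method, however many the types list.
function createRpcRequestBuilder<TMethods>(): RpcRequestBuilder<TMethods> {
  let nextId = 0;
  return new Proxy(
    {},
    {
      get(_target, property) {
        return (...params: unknown[]) => ({
          jsonrpc: '2.0',
          id: nextId++,
          method: String(property),
          params,
        });
      },
    },
  ) as unknown as RpcRequestBuilder<TMethods>;
}

const rpc = createRpcRequestBuilder<RpcMethods>();
const request = rpc.getBalance('11111111111111111111111111111111');
// rpc.getBalance(42); // Rejected by the type checker.
```

Adding a method to `RpcMethods` makes it callable with full type checking and no new runtime code. Because requests are plain objects, several of them could be collected into an array to form a batch.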

A use case this would enable that I'm particularly keen on is combining multiple different method calls into a single batch in a type-safe manner.

Thanks for the reminder. I'll think about batching as I proceed.

Thank you for all of these thoughts. I've got you. Now that Breakpoint is over, I'm going to make progress on this. Hold on tight.

@steveluscher
Collaborator Author

Closed by 5a20ed0.

Contributor

github-actions bot commented Aug 8, 2024

Because there has been no activity on this issue for 7 days since it was closed, it has been automatically locked. Please open a new issue if it requires a follow up.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2024