
Design doc #1

Open
kklas opened this issue Apr 17, 2023 · 34 comments

Comments


kklas commented Apr 17, 2023

sui-client-gen design

sui-client-gen is a code generator that generates TS SDK code based on Move smart contracts. It brings type safety and significantly reduces the boilerplate for interacting with smart contracts and fetching objects from chain.

The purpose of this document is to describe its design and gather feedback before it's implemented. It explains the reasoning behind the design decisions and serves as a useful reference for building generators for other languages as well.

Design goals for the generated TS code:

  • works with both browser and node
  • simple and minimal code that can be used as building blocks for higher-level abstractions
  • composable and orthogonal to sui.js (no overlapping functionality; composes with TransactionBlock...)
  • no built-in caching -- caching requirements differ between node and web apps and should be handled by the app framework

Design goals for the generator:

  • support both local and on-chain packages
  • act as a shared foundation that can be used for building generators for other languages

POC Overview

In order to evaluate the generated client code I have manually implemented the target code for the Kuna Labs AMM smart contract. This target code is located under ts/src/generated/amm.

I have also included a "fixture" smart contract in this repo under move/fixture. It has two modules, fixture.move and example_coin.move, which I have used to evaluate functionality of the generated client not covered by the AMM smart contract.

Additionally, I have included a small node CLI in ts/src/main.ts to showcase how the generated client is used (creating transactions, fetching objects, fetching dynamic fields, fetching events...). It works with the smart contracts mentioned above (AMM and fixture), which I have deployed on devnet.

There is also a POC for the generator in Rust which fetches modules and their dependencies recursively from chain starting from a package address, and then prints structs and their fields to output. It's located in generator/src/main.rs.

Package

For each package, a folder is generated containing a folder for each of its modules.
Within each module folder, functions.ts, structs.ts, and constants.ts files are generated containing the relevant code. If a module has no structs or constants defined, the respective files won't be generated.

The package root folder also contains index.ts, which holds the hard-coded PACKAGE_ID const, and init.ts, whose purpose is related to decoding objects with generic fields and will be explained in the "Structs with generic fields and StructClassLoader" section.

In the case of the mentioned amm package, the generated file structure will look like this:

<root>
├── amm
│   ├── index.ts
│   ├── init.ts
│   ├── pool
│   │   ├── constants.ts
│   │   ├── functions.ts
│   │   └── structs.ts
│   └── util
│       └── functions.ts
└── ...

The reason I have separated constants, functions, and structs into separate source files is to avoid potential name collisions with the extra code that is generated for each function / struct (e.g., each function binding has an "...Args" interface generated for it, used to provide type safety when specifying arguments to a function call). Another reason is that it makes the generated code more readable.

Arguably, these could all be concatenated into a single <module>.ts file. I don't see a big advantage to that, but it could be discussed.

Constants

Move constants can only be of primitive types and vector<u8>, so generating constants.ts is very straightforward (example).
In future versions of the generator, doc comments above constant definitions in Move could be used to indicate to the generator that the constants represent error codes and to generate error handling code as well, but this is not in scope for the initial version.
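
For illustration, a hypothetical Move module defining const EInvalidPair: u64 = 0; and const BPS_IN_100_PCT: u64 = 10000; might produce a constants.ts along these lines (the names and exact output format here are illustrative, not the generator's final output):

export const E_INVALID_PAIR = 0n
export const BPS_IN_100_PCT = 10000n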

Functions

Function bindings are generated in the functions.ts file of each module (if any). Only public functions are considered.
We generate two items for each Move function: an args interface and a function that generates a move call with the provided arguments.

In the case of the amm package's pool::create function (link), we will have this (full file here):

export interface CreateArgs {
  registry: ObjectArg;
  initA: ObjectArg;
  initB: ObjectArg;
  lpFeeBps: bigint;
  adminFeePct: bigint;
}

/**
 * Creates a new Pool with provided initial balances. Returns the initial LP coins.
 * @arguments (registry: &mut PoolRegistry, init_a: Balance\<A\>, init_b: Balance\<B\>, lp_fee_bps: u64, admin_fee_pct: u64)
 * @returns Balance<LP<A, B>>
 */
export function create(
  tx: TransactionBlock,
  typeArgs: [Type, Type],
  args: CreateArgs
) {
  return tx.moveCall({
    target: `${PACKAGE_ID}::pool::create`,
    typeArguments: [typeArgs[0], typeArgs[1]],
    arguments: [
      obj(tx, args.registry),
      obj(tx, args.initA),
      obj(tx, args.initB),
      tx.pure(args.lpFeeBps.toString()),
      tx.pure(args.adminFeePct.toString()),
    ],
  });
}
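
For illustration, a call site for this binding might look something like this (the import path, coin types, and object IDs are hypothetical):

import { TransactionBlock } from '@mysten/sui.js'
import { create } from './gen/amm/pool/functions'

// Balances produced by earlier calls in the same transaction block
// (declared here for brevity; in practice these would be moveCall results).
declare const initA: ReturnType<TransactionBlock['moveCall']>
declare const initB: ReturnType<TransactionBlock['moveCall']>

const tx = new TransactionBlock()
create(tx, ['0x2::sui::SUI', '0x...::usdc::USDC'], {
  registry: '0x...', // a raw object ID string is accepted for ObjectArg fields
  initA,
  initB,
  lpFeeBps: 30n,
  adminFeePct: 10n,
})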

In the args interface, the types are mapped as follows:

Move              TypeScript                           Comment
u8, u16, u32      number
u64, u128, u256   bigint
bool              boolean
address           string
vector<u8>        Array<number> | Uint8Array | string  hex in case of string
vector<T>         Array<T>
object reference  ObjectArg                            composes with TransactionBlock

ObjectArg is defined as ObjectId | ObjectCallArg | TransactionArgument. This way it supports passing in raw object ID values (string) and also composes with TransactionBlock (tx.object(...), tx.gas and values returned from tx.split(...), tx.moveCall(...), etc.).
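
To illustrate, here's a minimal sketch of how an obj helper can normalize an ObjectArg (the type definitions are simplified stand-ins for the actual sui.js types, so treat this as a sketch rather than the real generated helper):

import { TransactionBlock } from '@mysten/sui.js'

// Simplified stand-ins for the corresponding sui.js types.
type ObjectId = string
type ObjectCallArg = { Object: unknown }
type TransactionArgument = { kind: string }

export type ObjectArg = ObjectId | ObjectCallArg | TransactionArgument

// Raw object ID strings get wrapped via tx.object(...); values that already
// compose with the transaction block (tx.gas, moveCall results, ...) are
// passed through as-is.
export function obj(tx: TransactionBlock, arg: ObjectArg) {
  return typeof arg === 'string' ? tx.object(arg) : (arg as any)
}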

The function itself is very straightforward. It just generates a moveCall on the provided transaction block with the provided arguments.

Its signature will change based on how many type parameters and function parameters the respective Move function takes. E.g., if the Move function doesn't have any type parameters, the typeArgs parameter will be omitted. If it takes only one type parameter, it will have a typeArg parameter of type Type (not a single-element tuple). Similarly, if the Move function has only one function parameter, the generated function will take just that one argument directly (not nested in a field).
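
For example, a hypothetical util::destroy_zero(balance: Balance<T>) function with a single type parameter and a single argument would get a collapsed signature along these lines:

export function destroyZero(
  tx: TransactionBlock,
  typeArg: Type,
  balance: ObjectArg
) {
  return tx.moveCall({
    target: `${PACKAGE_ID}::util::destroy_zero`,
    typeArguments: [typeArg],
    arguments: [obj(tx, balance)],
  })
}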

The doc comment from the move file is also generated on top of the function.

Structs

Code corresponding to a module's structs is generated in structs.ts. The goal is mainly to generate a class for each struct that handles decoding of object data returned by the RPC into an object with proper type definitions for its fields.

Let's look at the generated code for amm module's Pool struct (full code here):

bcs.registerStructType([`${PACKAGE_ID}::pool::Pool`, 'A', 'B'], { ... }) // (1)

export function isPool(type: Type) { // (2)
  return type.startsWith(`${PACKAGE_ID}::pool::Pool<`)
}

export interface PoolFields { // (3)
  id: ObjectId
  balanceA: Balance
  balanceB: Balance
  // ... other field definitions
 }

/** Pool represents an AMM Pool. */
export class Pool { // (4)
  static readonly $typeName = `${PACKAGE_ID}::pool::Pool` // (5)
  static readonly $numTypeParams = 2 // (6)

  readonly $typeArgs: [Type, Type] // (7)

  readonly id: ObjectId // (8)
  readonly balanceA: Balance
  readonly balanceB: Balance
  // ... other field definitions

  constructor(typeArgs: [Type, Type], fields: PoolFields) { ... } // (9)

  static fromFields(typeArgs: [Type, Type], fields: Record<string, any>): Pool { ... } // (10)

  static fromFieldsWithTypes(item: FieldsWithTypes): Pool { ... } // (11)

  static fromBcs( // (12)
    typeArgs: [Type, Type], data: Uint8Array | string, encoding?: Encoding
  ): Pool { ... }

  static fromSuiParsedData(content: SuiParsedData): Pool { ... } // (13)

  static async fetch(provider: JsonRpcProvider, id: ObjectId): Promise<Pool> { ... } // (14)
}

(1) The struct is registered in the framework-level BCS instance so that it can be decoded from a BCS encoding.

(2) A helper method is generated that checks whether an arbitrary type is a Pool.

(3) An interface defining the struct's fields. This is used in the struct class constructor (9) for defining the type of the fields parameter.

(4) The class representing the struct. It stores the struct instance's fields with proper type definitions and also provides multiple methods for loading an instance from the various formats returned by the RPC.

(5), (6), and (7) are meta fields. They are prefixed with $ to avoid name collisions with struct fields and as an indication to the user. $typeName (static) holds the struct's full type name (type parameters omitted), and $numTypeParams (static) stores the number of type parameters the struct is defined with. $typeArgs holds the type arguments the struct instance (object) is instantiated with.

(8) Then the struct fields are defined. Nested structs are defined through their respective generated classes.

(9) Struct class constructor. It takes typeArgs (if any) and fields as args.

(10) fromFields loads the struct instance from an object containing its fields (example). This is used when loading the object from BCS as bcs.de(...) returns an object in this format. In case the object has nested struct fields, they will be recursively loaded by calling fromFields on their class.

(11) fromFieldsWithTypes loads the struct instance from an object that contains its type and fields (example). The RPC often returns objects in this format (defined here), e.g. calling provider.getObject with showContent: true. This method can then be used to load an object from this format. Also does recursive loading for nested structs.

(12) fromBcs decodes the object from BCS encoding. Calls bcs.de and fromFields internally.

(13) fromSuiParsedData is a convenience method that loads the object from SuiParsedData, which is what is returned by provider.getObject calls (and others) when showContent: true. Calls fromFieldsWithTypes internally.

(14) fetch is a convenience function that calls provider.getObject with the provided object ID. It errors if the object at the address doesn't match the class. This method is only generated for structs that have the key ability.
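
To illustrate, the body of the generated fetch could look roughly like this -- a sketch assuming sui.js's getObject API, shown as a free function with simplified error handling:

import { JsonRpcProvider } from '@mysten/sui.js'

export async function fetchPool(provider: JsonRpcProvider, id: string): Promise<Pool> {
  const res = await provider.getObject({ id, options: { showContent: true } })
  if (res.error || res.data?.content?.dataType !== 'moveObject') {
    throw new Error(`object at id ${id} not found or not a move object`)
  }
  if (!isPool(res.data.content.type)) {
    throw new Error(`object at id ${id} is not a Pool object`)
  }
  // delegate decoding to the loader described in (13)
  return Pool.fromSuiParsedData(res.data.content)
}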

One issue I ran into in this part is that when fetching objects or events from the RPC, they're returned in various different formats. There's also special casing in the RPC for certain types (like Balance), which are returned in a different format that one needs to be mindful of when decoding them from nested fields.

For example:

Structs with generic fields and StructClassLoader

Code for structs with generics is similar to that for structs without them, with a few small differences. Let's look at sui::dynamic_field::Field<Name, Value> as an example (full target code here).

Firstly, generics are added to the fields interface and struct class definition:

export interface FieldFields<Name, Value> {
  id: ObjectId
  name: Name
  value: Value
}

export class Field<Name, Value> { ... }

It's important to note that generics are added only for non-phantom type parameters. In the Pool example above, both type parameters were phantom so generics were not added there.

This is because adding generics for phantom types is technically not necessary for the generated code to function, and because there's no native support for phantom parameters in TypeScript (perhaps there's a hacky way to do it, but I don't want to go there). The phantom arguments will still be stored (along with the others) in the $typeArgs field of the struct class instance.

Next, since objects with generics have to be loaded dynamically at runtime (e.g. when calling fetch), we need some way to get the classes corresponding to generic fields based on the object's type arguments in order to decode them. This is done through structClassLoader, which is a framework-level instance of the StructClassLoader class.

E.g., in the dynamic field example we have:

  static fromFields<Name, Value>(
    typeArgs: [Type, Type],
    fields: Record<string, any>
  ): Field<Name, Value> {
    initLoaderIfNeeded() // (1)

    return new Field(typeArgs, {
      id: UID.fromFields(fields.id).id.bytes,
      name: structClassLoader.fromFields(typeArgs[0], fields.name), // (2)
      value: structClassLoader.fromFields(typeArgs[1], fields.value), // (3)
    })
  }

So in (2) and (3) structClassLoader is being used to get the right struct class for the field. E.g., if the dynamic field had 0x2::coin::Coin for its Value type parameter, then the structClassLoader.fromFields(typeArgs[1], fields.value) call would return a Coin instance loaded from the fields.value object.

Before this can be done though, the struct classes we generated need to be registered with the loader. This is done in the initLoaderIfNeeded() call (1). This will register struct classes for all generated modules in the package (example), but only the first time it's called.

The code for the StructClassLoader can be found here. It works similarly to the BCS class -- the decode callbacks for each type are stored in a map keyed by type name, and the decode calls (fromFields and fromFieldsWithTypes) fetch them from there and use them to load the proper instance based on its type.
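
A condensed, self-contained sketch of that idea (simplified relative to the linked implementation; only fromFields and struct types are handled):

type Type = string

interface StructClass {
  fromFields(typeArgs: Type[], fields: Record<string, any>): any
}

// Naive parser for top-level type arguments, e.g.
// '0x2::dynamic_field::Field<u64, 0x2::coin::Coin<0x2::sui::SUI>>'
// -> ['u64', '0x2::coin::Coin<0x2::sui::SUI>']
function parseTypeArgs(type: Type): Type[] {
  const start = type.indexOf('<')
  if (start === -1) return []
  const inner = type.slice(start + 1, type.lastIndexOf('>'))
  const args: Type[] = []
  let depth = 0
  let cur = ''
  for (const c of inner) {
    if (c === '<') depth++
    if (c === '>') depth--
    if (c === ',' && depth === 0) {
      args.push(cur.trim())
      cur = ''
    } else {
      cur += c
    }
  }
  if (cur.trim()) args.push(cur.trim())
  return args
}

export class StructClassLoader {
  private registry: Map<string, StructClass> = new Map()

  register(typeName: string, cls: StructClass) {
    this.registry.set(typeName, cls)
  }

  fromFields(type: Type, fields: Record<string, any>) {
    const typeName = type.split('<')[0] // full type name without type arguments
    const cls = this.registry.get(typeName)
    if (!cls) {
      throw new Error(`unknown type ${typeName}; was it registered with the loader?`)
    }
    return cls.fromFields(parseTypeArgs(type), fields)
  }
}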

Dealing with multiple packages and dependencies

When generating code for a package, its code and the code for its dependencies will be generated on the same level in the <root> directory (with the exception of 0x1, 0x2, and 0x3 packages which will be included in the framework js package). Package folder names will be based on package names found in their Move.toml.

E.g.:

<root>
├── amm
├── fixture
└── ...

The generator will also support generating code from the chain when the source code isn't available. Since on-chain package objects don't contain Move.toml info, package names will be provided manually by the user.

For this I would like to introduce a config file (e.g. gen.toml), located in the code generation root directory, that specifies an address-to-name mapping for on-chain packages, similar to how it's done in Move.toml:

[names]
amm = "0x2398947fe4b14b514cdf3b52f2d67507c345c2c33c5fa472c0d5d0b0bd13ceed"
fixture = "0xeec7628a510a5e85c47bad12027cab6917cc865729e8f60a7bfdcd5efdc7f959"

This config file would also be used to specify which packages to generate the code for. It would support specifying object IDs, git URLs, and local directories:

[packages]
foo = { local = "../projects/foo" }
bar = { id = "0xeec7628a510a5e85c47bad12027cab6917cc865729e8f60a7bfdcd5efdc7f959" }
amm = { git = "https://github.com/kuna-labs/sui-smart-contracts.git", subdir="amm/move" }

This would allow for a very elegant workflow:

  • create gen.toml file and place it in a directory in your repo
  • run sui-client-gen inside it (no extra args or options need to be passed)

The advantages of this approach (generating code for all dependencies in the repo) are:

  • simple to use and gives control to the user
  • no need to import any additional npm packages as dependencies (and trust them)
  • generated code and subsequent generation changes go through code review
  • can solve the diamond problem

Potential issue -- I expect that people will start building higher-level SDKs that wrap the generated code and introduce higher-level abstractions and helper functions based on the semantics of the package. The issue is that these higher-level SDKs would have to bundle the generated code they depend on, so importing them into a project that already has generated code could run into compatibility issues and the diamond problem again. One way this could be solved is for the higher-level SDKs to also be built as code generators that you plug in to generate them on top of the code generated by sui-client-gen.

Generator implementation

The code generator will be implemented in Rust and based on move-model. This is so that it can support code generation from both local packages and bytecode modules fetched from chain.

The general implementation outline is this:

  • look up gen.toml and start loading specified packages
  • in case of on-chain package entry, load the root package modules and all dependencies recursively
    • I've already built a small POC for this here
  • in case of local package entry, get modules by compiling it
  • merge all modules into a single GlobalEnv
  • query GlobalEnv to build an IDL-like intermediate representation of all packages and their contents more suitable for code generation work (see below)
  • generate code using quasi-quoting library (e.g. genco)

Since move-model's GlobalEnv is a bit cumbersome to work with in the context of code generation, I will build an intermediate representation struct more suitable for that work (resembling an IDL). This foundation can then be used for building code generators for other languages.

This representation could look something like this (displayed as JSON):

{
  "packages": {
    "0x111": {
      "modules": [
        {
          "name": "module_name",
          "constants": [
            {
              "type": "<>",
              "name": "<>", // optional since in bytecode modules we don't have named constants
              "value": "<>",
              "doc": "<>" // optional
            }
          ],
          "functions": [
            {
              "name": "<>",
              "visibility": "public",
              "type_params": [
                {
                  "is_phantom": false,
                  "name": "<>" // optional
                }
              ],
              "arguments": [
                {
                  "name": "<>", // optional
                  "kind": "mut_ref",
                  "type": "0x111::"
                }
              ],
              "returns": [
              ],
              "doc": "<>" // optional
            }
          ],
          "structs": [
            {
              "name": "<>",
              "type_params": [
              ],
              "fields": [
                {
                  "name": "<>",
                  "type": "0x111::module_name::FooType" // or primitive, or generic
                }
              ],
              "capabilities": [
              ],
              "doc": "<>" // optional
            }
          ]
        }
      ]
    }
  }
}


awelc commented May 11, 2023

@kklas, a quick question - do you need something specific from the move-model that cannot be obtained from IRs generated by the source-to-bytecode compiler?

@sblackshear

@awelc I would think that this goal

support both local and on-chain packages

necessitates usage of the move-model, since it is the common IR that can be constructed from both bytecode and source.


awelc commented May 12, 2023

Thank you @sblackshear! Admittedly, I did not know that about the move-model...


kklas commented May 12, 2023

I guess the main thing I'd like to get your thoughts on is the "Dealing with multiple packages and dependencies" section. Here I'm proposing that code for all packages is generated in the repo on the same level instead of creating a separate npm package for each Move package, which would be the conventional way of doing things. But I think what I'm proposing is the right way here for the reasons I outlined in the section. I'm also introducing gen.toml here. These are the only things that might be considered slightly unconventional, so share your thoughts if you have any.

As for the generated code itself I'm fairly confident this will be OK since I've used a similar pattern in a production app with very good results. Also this can be easily modified later if needed so I don't have concerns here.

As for the move-model side of things, I'm not yet super proficient with it but based on the POC I've built I think this should be fairly OK. move-model seems very solid. I will probably need to build an intermediate representation to bridge the gap between move-model and data structures needed for actual code generation though. Might do just a wrapper like ModuleEnv and StructEnv are or just clone everything into an IDL-like struct. Will know more when I get to grips with it.


amnn commented May 12, 2023

Thanks for sharing this design doc @kklas !

Dealing with multiple packages and dependencies

A flat structure makes sense to me here, especially if you are allowing people to opt-in to packages to generate for. We already require that packages with overlapping addresses do not contain overlapping modules, so there is no risk of conflicts.

In terms of mapping addresses to names, did you consider re-using Move.toml's mapping here, instead of having a similar structure within gen.toml? The ResolutionGraph data structure in the move codebase calculates a named address mapping for the root package (for all names it might encounter, which includes names from transitive dependencies), so you could potentially re-use that.

One feature this might not work with today is on-chain or bytecode-only dependencies, but we can look into adding support for that.


kklas commented May 12, 2023

@amnn thanks for the feedback!

In terms of mapping addresses to names, did you consider re-using Move.toml's mapping here

Yes, this will be used, but it doesn't fully solve the problem because, as you mention, on-chain packages don't store that info. So for on-chain packages you'd still have to have the user specify names, otherwise your imports look like this: import { swapA } from 'gen-root/0x4e5dd7da39a41d91aa8c3bf0759d1bb59dc280aba4412c087ac37386c8db7a8b/pool/functions'

The other reason why gen.toml is introduced is that it allows you to specify which packages you want to generate code for. E.g., suppose in your app you're integrating with DeepBook and an AMM package, where these two packages are completely separate on chain and don't depend on each other. You have to specify to the generator that you want code generated for these two, and I think it's better to do that within a config file that's committed to the repo rather than via CLI args every time.

So while the address mapping in Move.toml can be leveraged here, it doesn't solve the problem fully because:

  • it's not stored on chain, so this info is lost for on-chain packages
  • you still need some way to specify to the generator what packages to pull (arguably this can also be done through CLI args, but could become cumbersome for when you're specifying a lot of packages)
  • even if you had the above two, you still could run into conflicts when pulling multiple separate packages because they have different mappings for the same address in their respective Move.tomls and there's no clear way to resolve that without additional user input

Because of that my intuition is that some kind of config file will need to be there.


amnn commented May 12, 2023

The other reason why gen.toml is introduced is that it allows you to specify which packages you want to generate code for.

This makes sense @kklas, I don't want to get rid of gen.toml -- it's the right home for the configuration on which packages to generate code for, but I am curious to see if we can re-use the name to address mapping in the Move.toml and the existing machinery that resolves those transitively, to avoid having to duplicate that information and logic, because it gets surprisingly complex.

it's not stored on chain so this info is lost for on chain packages

There is upstream support for building against a binary-only package. It works by setting up the package with a build directory containing bytecode and a manifest, but no sources. It seems like porting this over would complement what you're trying to achieve:

We can create a binary-only package based on an on-chain address, and this can be used as a dependency for the package you are generating bindings for. From there you can supply a name for the on-chain address either in the manifest of the binary-only package, or the manifest of the package that depends on it.

you still need some way to specify to the generator what packages to pull (arguably this can also be done through CLI args, but could become cumbersome for when you're specifying a lot of packages)

This is also a problem that Move.toml solves (through its [dependencies] section), and which can be extended to support on-chain packages as mentioned above.

even if you had the above two, you still could run into conflicts when pulling multiple separate packages because they have different mappings for the same address in their respective Move.tomls and there's no clear way to resolve that without additional user input

There's two conflict scenarios I can think of, let me know if there are more, but for both of these, I think, there are existing solutions for:

  • The same address is assigned to different names across different packages. How this would manifest is that you would potentially have two names to use for that address, and if you wanted to, you could generate two sets of bindings for it, but in practice, you will probably pick at most one name by adding it to your gen.toml, to break the tie.
  • The same name is used to refer to multiple addresses in different transitive dependencies. This is something that we already need to address when we do named address resolution when fetching dependencies (it's what addr_subst is for), so any package that we can resolve dependencies for will come with a consistent named address mapping that uses address substitution to avoid conflicts.

So, modulo binary package support and retaining gen.toml to opt-in to which package(s) to generate bindings for, you should be able to re-use the fields and logic that we already have built-in to the toolchain for named address resolution.


kklas commented May 12, 2023

but I am curious to see if we can re-use the name to address mapping in the Move.toml and the existing machinery that resolves those transitively, to avoid having to duplicate that information and logic, because it gets surprisingly complex.

@amnn, yes, here we're already getting into implementation details. I agree it would be great if we can re-use this logic and thanks for pointing this out. I haven't thought about how complex this gets.

We can create a binary-only package based on an on-chain address, and this can be used as a dependency for the package you are generating bindings for. From there you can supply a name for the on-chain address either in the manifest of the binary-only package, or the manifest of the package that depends on it.

So if I'm understanding this correctly, what you're suggesting is: suppose I want to pull an on-chain package directly (not as a dependency, and I don't have the source code locally), I would have to create a stub build directory with a corresponding Move.toml, specify address mappings there, and also do this for each transitive dependency?

If so, I have to say I don't agree with this approach here. It seems very cumbersome for the users and also unnecessary, because we're not actually trying to build the package, just get its bytecode into the move-model to generate the bindings. So while doing that when you're building a package makes sense, it seems unnecessary to have to do it here.

I agree that we should re-use as much of Move.toml and address mapping logic as possible, but I think that all of this should work just by editing gen.toml and not require any extra steps from users such as creating stub build directories or editing external Move.toml files.

There's two conflict scenarios I can think of, let me know if there are more, but for both of these, I think, there are existing solutions for:

I agree with your points here.

Another tricky area would be handling different package versions, but I think your points about this already being handled by Move.toml apply here too and can be re-used.


amnn commented May 12, 2023

I would have to create a stub build directory with corresponding Move.toml, specify address mappings there, and also do this for each transitive dependency?

I don't think this needs to be so manual or cumbersome, we can and should add a tool that performs all these steps, given the ID of a single on-chain package. Would that help?


kklas commented May 13, 2023

Of course, if there's a tool that I can throw a package ID (or multiple) and required address definitions into and it would solve version and name conflicts, that would certainly help! Whether it creates stub build directories or not is an implementation detail and I'm not familiar enough with the code base to comment on that, but I strongly feel the users shouldn't have to be aware of any external build directories or Move.tomls outside of gen.toml. If this workflow can be achieved then I'm all for it!


kklas commented May 15, 2023

I came up with a more elegant solution for the package name problem. gen.toml will be used to define which packages to include in the generation:

[packages]
foo = { local = "../projects/foo" }
bar = { id = "0xeec7628a510a5e85c47bad12027cab6917cc865729e8f60a7bfdcd5efdc7f959" }
amm = { git = "https://github.com/kuna-labs/sui-smart-contracts.git", subdir="amm/move" }

This will put them in the top-level directory. The transitive dependency packages won't be included in the top-level directory but under _dependencies and will use addresses for directory names:

<root>
├── _dependencies
│   ├── 0x934..f93
│   └── 0x123..abc
├── amm
├── bar
└── foo

Benefits:

  • no requirement for the user to specify package names for transitive dependencies
  • no need to resolve address mapping conflicts
  • only packages specified in gen.toml are top-level so the directory structure is cleaner


amnn commented May 15, 2023

I think that could work, and is a nice simplification! One thing you may have to consider is whether users of named packages might be exposed to addresses via types in their public signature, e.g. amm contains a function that returns a type from 0x123...abc. There are lots of reasonable ways to address this in the design though (always use the numerical address; use the address unless a name has been given in gen.toml; re-export types in the public signature).

On a new topic: id-style package specifiers need to be read from a fullnode. One option is to specify that in the dependency:

bar = { node = "https://fullnode.mainnet.sui.io", id = "0xeec7628a510a5e85c47bad12027cab6917cc865729e8f60a7bfdcd5efdc7f959" }

Other options:

  • Make the node a top-level field in gen.toml.
  • Make it a flag for the tool or pull it from an environment variable so individuals can configure it.
  • Hardcode the fullnode.<network>.sui.io URLs into the tool, but then add a per-package or top-level configuration for which network id packages are to be read from.

My intuition is that configuring the node as a top-level field in gen.toml is the best balance of convenience vs decentralisation/decoupling, but this also depends on how likely you feel it would be that people would generate bindings from across multiple networks (which points to per-package configuration).


kklas commented May 15, 2023

One thing you may have to consider is whether users of named packages might be exposed to addresses via types in their public signature

The only problem with using addresses in paths is that import statements become very unergonomic. Users will very likely be exposed to types defined in dependencies but will probably not need to import them directly. If you need to use a type directly, you can add the package to gen.toml, which will generate it top-level.

I think this will be more than good enough for the MVP and later we may explore some of the other solutions you mention here or perhaps introduce support for address mappings as discussed earlier.

id style package specifiers need to be read from a fullnode

Good idea! I think using https://fullnode.mainnet.sui.io as the default with the option to override it in gen.toml with a custom RPC would work well. I think it's very likely people will want to generate for devnet / testnet.

@Jordan-Mysten

From a TypeScript SDK perspective, this looks great! Thanks so much for putting this together. A couple thoughts after reading through this:

  • I think the package ID should be configurable at runtime. The main use-case I think this would be useful for is packages published for both testnet / devnet / mainnet, you may want the consumer to be able to pass a known package ID that's not the one it was generated with. It'd also make this tool a lot more useful for local dapp + move package dev.
  • I'd make sure you fork the BCS instance so that you don't pollute the global registered types. I think we're going to change the BCS API eventually to make this more the default, just would be good to not have folks depend on the global bcs instance + import side-effects.
  • I think having classes for the structs does make sense, but I think it'd also be nice to have a plain TypeScript type definition that just defines the shape.
  • I think it'd be useful for the struct classes to also have something like toBCS() that converts to bytes. I don't think this would be essential, just something that I think would be neat.
  • The fetch implementation might be a little misleading given the object could be in a dynamic field or wrapped (in which case the API wouldn't work), I wonder if it'd be better to just assume the consumer knows how to get the bytes or move object?


kklas commented May 15, 2023

@Jordan-Mysten thanks for taking a look and the feedback!

I think the package ID should be configurable at runtime

This is actually a point that I'm very well aware of and is something that is much more nuanced than it appears. My opinion is that being able to switch between different IDs on the same generated code would be useful to have, but being able to do that dynamically during runtime is unnecessary. First let me explain the distinction between the two.

In a previous SDK generator I built for Solana (called anchor-client-gen) I made it so that the generated programId.ts file is allowed to be modified manually where the import statements and changes to the PROGRAM_ID constant would be preserved by the generator on subsequent generation runs (see here).
This way you could modify your program ID via an env var or by loading it from a config (e.g. export const PROGRAM_ID = config.myProgramId) and the generator would preserve this (wouldn't overwrite) on subsequent runs.
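
Applied to sui-client-gen, a manually edited index.ts that the generator preserves could look like this (the env var name is hypothetical):

export const PACKAGE_ID =
  process.env.AMM_PACKAGE_ID ??
  '0x2398947fe4b14b514cdf3b52f2d67507c345c2c33c5fa472c0d5d0b0bd13ceed'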

So, this allows configuring the IDs of the generated code (you can modify them via an env var during build), but doesn't allow them to be changed dynamically during runtime (e.g. via a select in the app UI). Some users then asked for the latter, but after some thought my conclusion was that this is unnecessary and very likely an anti-pattern. My reasoning is that in a production app you will never need to switch between environments (mainnet, testnet) or different smart contract IDs dynamically during runtime. You may need to do that only during development, but in that case you can set them through env vars (or through a config selected by an env var; npm start:devnet / npm start:testnet). On the other hand, changing IDs dynamically during runtime can cause you a lot of headaches (think of invalidating the object caches (!)). I'm not convinced this complexity is necessary (especially since you won't need it in production). I've also summarized my thoughts on this here: kklas/anchor-client-gen#51 (comment). That being said, I'm not against it in general, it's just that nobody has managed to convince me otherwise yet.

I agree with you that having some ability to configure package IDs through env vars and such would be useful and I would like to implement that, but on Sui we're running into a different problem. If you think about switching between e.g. mainnet and devnet, we're not only changing the ID of the relevant package, but also changing the IDs of all of its dependencies. So when you say "you may want the consumer to be able to pass a known package ID", considering you also have to pass down the IDs of all its dependencies, it's not clear to me how to do that in a practical manner. Also, the generated code for the dependencies has package IDs in the folder structure (see previous discussion here #1 (comment)), so passing down some other IDs would be very confusing.

Another approach that could be tried is that code for different environments is generated in separate folders (e.g. one for mainnet, one for devnet, etc.) and then there is some convenient way to switch between them. Perhaps having a "current" directory that is a link to the relevant environment, or having a stub directory that re-exports the types / methods from the directory of the relevant environment. But it's also not clear to me if this would be practical.

LMK if you have any ideas here.

I'd make sure you fork the BCS instance so that you don't pollute the global registered types.

Agreed, and a custom instance is indeed being used: https://github.com/kunalabs-io/sui-client-gen/blob/master/ts/src/framework/bcs.ts

I think having classes for the structs does make sense, but I think it'd also be nice to just have a plain typescript type definition that just defines the set shape.

You mean like this?

export type PoolType = {
  $typeName: `${string}::pool::Pool`
  $numTypeParams: 2

  readonly $typeArgs: [Type, Type]

  readonly id: ObjectId
  readonly balanceA: Balance
  readonly balanceB: Balance
  readonly lpSupply: Supply
  readonly lpFeeBps: bigint
  readonly adminFeePct: bigint
  readonly adminFeeBalance: Balance
}

Also there already is a PoolFields interface https://github.com/kunalabs-io/sui-client-gen/blob/master/ts/src/generated/amm/pool/structs.ts#L101-L109

I think it'd be useful for the struct classes to also have something like toBCS()

Totally agree. Skipped this for the POC because it's a lot to add all this manually. I'm making a note and will add this in the generator. There are probably a few other useful methods that can be added like toJSON and fromJSON (in case you want to store the object somewhere).

The fetch implementation might be a little misleading given the object could be in a dynamic field or wrapped (in which case the API wouldn't work), I wonder if it'd be better to just assume the consumer knows how to get the bytes or move object?

I agree with you here that fetch generally shouldn't be used since you probably want to be fetching objects in a more efficient way, but on the other hand I find it a nice convenience function. It's useful when you want to quickly debug something, for example. It does type validation and calls deserialize, so it saves you a few steps. Having this function there is also nice because it gives you an example of how to deserialize an object fetched from the chain using the class (in case you're too lazy to look up the docs). Might even add a fetchMulti method.
I agree with you that it's a bit misleading but I think it's net positive to have it here. I trust that people will figure out that it's better to fetch manually and then just deserialize using the class. Especially since there's no other way to do it when you're doing a multi fetch on heterogeneous objects.

@Jordan-Mysten

@kklas We actually do allow switching between networks dynamically in wallet, so I do think there's legitimacy there. All of our products also have network switches currently (I actually do think npm run start:devnet & npm run start:testnet is materially different than having npm run start and having network switching in the UI, and we will always do the latter), and it's also useful for dev when you're testing against different environments and want to dynamically re-configure at runtime. With the current approach none of our apps would be able to use this, which would be a shame.

For things like invalidating object caches, this is why we've taken the stance that the SDK doesn't have any caches and the user is responsible for caching logic to ensure the network is used as a cache key, which would also solve the problem here.

I'm not sure I understand the dependency problem, why would we need to provide all the dependency IDs?


kklas commented May 15, 2023

@Jordan-Mysten Fair enough, I see a case for having dynamic network switching. I was mainly discussing introducing the complexity of being able to pass IDs down to the SDK vs. letting the generator be "dumb" about it and requiring the user to generate separate code for each environment.

I don't see an elegant way to design the SDK to support this.

I'm not sure I understand the dependency problem, why would we need to provide all the dependency IDs?

Suppose you have a package A that depends on some other packages published on chain (that perhaps depend on other packages themselves). This previous comment here #1 (comment) outlines how the generator will handle multiple packages and dependencies (for full context see previous comments).

So when you change the env and pass down a different ID for package A, this still wouldn't work because its dependencies still use IDs from the previous environment. So you'd need to pass down the IDs for all the dependencies in the SDK also. I argue that keeping track of package IDs for each package and all of the dependencies for each env in the app would be impractical (cumbersome and error prone). Also introducing a mechanism in the SDK to be able to do that would add complexity (hard-coded package IDs are simple and safe).

I'm not against it in general, I just don't see an elegant way to do it and I'd like to keep things as simple as possible for the MVP.


kklas commented May 15, 2023

With the current approach none of our apps would be able to use this, which would be a shame.

Which apps are those? I think that wallets and the explorer don't apply here because you probably won't be generating static SDKs there aside from the framework, which is anyway the same on all chains? Suifrens AFAICT doesn't have a network switch.

@Jordan-Mysten

I still don't really understand why those IDs are needed, unless you're interacting directly with those packages?

The apps would be wallet, explorer, suifrens, etc. Wallet and explorer won't only talk to the framework packages, which is why it'd be nice to be able to use something like this. Suifrens doesn't currently have a network switch but in dev we'd probably have one.


kklas commented May 15, 2023

I still don't really understand why those IDs are needed, unless you're interacting directly with those packages?

Concretely, they're used for three reasons:

  • safeguard against deserializing an object with the wrong class (e.g. here, here, here; this is done recursively when decoding)
  • registering BCS struct types (e.g. here)
  • registering types in StructClassLoader here

If you don't have this you could end up with obscure and hard-to-trace bugs in your app, because it would be possible to deserialize an object with the wrong class without an error being thrown, ending up with a corrupt object.

In theory, it would be possible to use package names instead of addresses here, but we would have to have proper address mappings, which is a different problem to solve -- not exactly straightforward, and it has its drawbacks (as you can see from my discussion with @amnn in previous comments). Using package IDs is most natural here.

So while perhaps you may get away with e.g. fetching an object manually and then deserializing it using fromBcs after switching env, other methods that check for type correctness will break. And I'd argue it's important to have those.

In essence, package IDs are used all throughout the SDK to differentiate between types (and this is also fundamental to the Move programming model in general), so even if you don't "interact with those packages directly", they're still needed for SDK to function. It's not an easy task to work around that.

Wallet and explorer won't only talk to the framework packages

Can you give an example? I'm trying to understand the use-case where you need both dynamic network switching and static SDKs in the same app.

Suifrens doesn't currently have a network switch but in dev we'd probably have one.

I'm curious how your dev workflow is affected by not having a dynamic switch and having to do it with npm run instead.

@Jordan-Mysten

It seems weird to me to apply package checks for generated code from dependencies, but maybe that's just a personal preference thing. If this is important, I imagine the user can opt-out when they set a custom package ID, or they could define all of their dependency IDs. I actually don't think runtime validation is that important here though.

For registering BCS names, that's a valid point but more of a problem with how BCS uses a global namespace (which will go away at some point). If you create an isolated bcs instance you can also just guarantee you register unique names (the names don't actually matter).

For the case of wallet, you can imagine we might at some point have a native integration with a DeFi token swap package that we'd like to have integrated in all environments. The wallet needs to be built generically and be able to define these dynamically for each supported network. Having multiple builds just flat out isn't supported. We'd ideally like to use tooling to generate our clients in these cases. You can imagine similar use-cases for our other products as well.

Dynamic switch in-app was probably one of our single biggest improvements to local DX that we've shipped because it lets us quickly switch networks and validate our product across the multiple networks that exist. And for products like wallets, it's also just a requirement.


kklas commented May 16, 2023

Understood and thanks for these valuable insights @Jordan-Mysten.

You're right that runtime validation isn't strictly necessary for types defined in dependencies, because once the top-level type is validated it's guaranteed that the inner (wrapped) types are OK also. These checks are there simply because code for all packages (including dependencies) is generated the same way, and in the original design we didn't make a distinction between dependencies and top-level packages.

While this can be worked around in multiple ways (one of those being allowing the user to opt-out, as you say), I don't think this is the core of the issue. What we're dealing with here is the same package being published across different environments with different IDs, and the question is how to keep track of that.

Giving it a bit more thought I have a solution in mind. In gen.toml we would allow users to define environments:

[environments]
mainnet = "https://fullnode.mainnet.sui.io"
devnet = "https://fullnode.devnet.sui.io"
local = "https://localhost:9000"
any_other_env = "..."

Then in the [packages] section we would allow them to specify for each package their ID on each environment:

amm = { local = "../projects/amm", envs = { mainnet = "0x...", devnet = "0x...", local = "0x..." } }

The generator can then query the on-chain package object to get the IDs of all dependencies for all of the environments. It can even verify the packages against on-chain objects (a la sui verify-source) to make sure the IDs the user put in are correct.

On the SDK side we would generate dependency packages for each env in a separate directory:

<root>
├── _dependencies
│   ├── mainnet
│   │   └── 0x...
│   └── devnet
│       └── 0x...
└── amm

(Yes, I do understand that the generated dependency code will be duplicated for each env. There are multiple ways around that, but let's keep it simple for the purposes of this discussion.)

Then for each required method we would introduce an optional parameter env? which would allow the user to specify the environment (and thus imply the correct IDs):

type Env = "mainnet" | "devnet" | "local" | "any_other_env"

...

export class Pool {
  ...
  static fromSuiParsedData(content: SuiParsedData, env?: Env)
}

The package IDs for each environment would be stored in constants. Each env would also have its own BCS instance. This would allow the SDK to switch between different environments dynamically.
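
To illustrate, the per-environment ID constants could be generated along these lines (the names here are hypothetical), complementing the Env type above:

// One hard-coded ID per environment, filled in from gen.toml and the
// on-chain package objects at generation time.
export const PACKAGE_IDS: Record<Env, string> = {
  mainnet: '0x...',
  devnet: '0x...',
  local: '0x...',
  any_other_env: '0x...',
}

export function packageId(env: Env = 'mainnet'): string {
  return PACKAGE_IDS[env]
}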

@Jordan-Mysten will this work for your use case?


kklas commented May 16, 2023

@amnn I would love to hear your thoughts on the above also. I think that the problem of keeping track of deployments of the same package across different envs is something that should be handled on the Sui Move package level (i.e. Move.toml). This would actually help Move developers also. Do you think support for this can be added?

Some of these problems could also perhaps be handled by a package manager (including providing a global address <-> package name mapping). Is anyone working on that?

@Jordan-Mysten

That still wouldn't really satisfy our requirements: we'd want to be able to dynamically update the IDs at runtime to handle network data wipes, and to have the flexibility to change which IDs are pointed to dynamically (e.g. when we want to change contracts in already-deployed code).


kklas commented May 16, 2023

Are you referring to the situation where e.g. in the wallet extension there is a devnet wipe and you need to update the IDs, but you can't push an update to the Chrome store and instead have to rely on the old code fetching the new set of IDs from some backend service?

Because if you can just push new code then I don't see a problem.

@Jordan-Mysten

Yeah specifically talking about wallets where there's a very long amount of time before code changes are fully rolled out. Agreed that on websites we can push code, although I'd still prefer not requiring code pushes for configuration changes.


kklas commented May 16, 2023

OK, makes sense. I understand your point about opting out of ID checks now.

I think it would work if we add support for address <-> package name mapping on top of the environment switching support I described here #1 (comment), where for each env there would be a map instance at runtime in which you could manually override the ID for each package. So instead of using constants, the SDK would get IDs from that map when doing the type checks. For registering BCS types we would use the package name instead of the ID, but it would be up to you to make sure that the package IDs are correct and new types can be deserialized correctly.


amnn commented May 17, 2023

I think that the problem of keeping track of deployments of the same package across different envs is something that should be handled on the Sui Move package level (i.e. Move.toml)

Yes, this is something we have been thinking about. Generally we want to move away from developers having to manually manage addresses, because the process is fiddly and error-prone, especially with the addition of published-at!

MystenLabs/sui#8345 covers some initial work we want to do along this vein -- moving published-at and [addresses] from Move.toml to Move.lock and having the tooling update it as you publish. You can see the last open question there speaks to your point. Once we've implemented automated address management, we can look into also distinguishing by network.


kklas commented May 17, 2023

Once we've implemented automated address management, we can look into also distinguishing by network.

Great! In that case I will do the MVP as described here #1 (comment) to not duplicate any of this effort, and then add address mappings and environments after this lands.

@Jordan-Mysten I think you will be OK with this for the time being even with the wallet use case since you're only using the framework packages. If you need it sooner LMK and I'll add some hack in to allow you to switch package IDs.


kklas commented May 19, 2023

There is upstream support for building against a binary-only package. It works by setting up the package with a build directory containing bytecode and a manifest, but no sources. It seems like porting this over would complement what you're trying to achieve.

@amnn can you point me to where this is supported in upstream?

I took a closer look at the codebase and I think you're right that this is the way to go. A stub package directory should be created where all the top-level packages are listed as dependencies, including on-chain packages. This way DependencyGraph and ResolvedGraph can be re-used, which solves the problem of guaranteeing there are no duplicate packages and brings support for dependency overriding. Named addresses are not strictly necessary for the generated code but need to be resolved to build the move-model (and are useful to include in the generated code anyways). Since DependencyGraph and ResolvedGraph require a root package and traverse the filesystem, a stub package directory will have to be created; otherwise a lot of this functionality would have to be re-implemented, which I don't think is worth it in this case. Building a move-model by combining on-chain and source packages while extracting maximum information from the source files where available will be a different problem to solve, but certainly doable.

When you said "porting this over" did you also mean to add support for sui move build to leverage this?


amnn commented May 19, 2023

@amnn can you point me to where this is supported in upstream?

Sure thing, it was added in these PRs:

Named addresses are not strictly necessary for the generated code but need to be resolved to build the move-model (and are useful to include in the generated code anyways).

Note that I believe there is one limitation around move-model where it doesn't currently support packages that use address substitution (the addr_subst field on dependencies).

When you said "porting this over" did you also mean to add support for sui move build to leverage this?

Yes, most of the changes are in the compiler and package resolution logic. We will need to cherry-pick the change and fix things up to work with lock file support, because they both landed at the same time in the respective codebases (there shouldn't be any fundamental reason why this wouldn't work, though).


kklas commented May 19, 2023

Thanks! There seems to be a bit more work to be done here, so for the MVP I'll keep the on-chain and source packages separate by building two separate move models for them. I can guarantee consistency in both because for source packages it will be handled by the ResolvedGraph, and for on-chain packages I can do the necessary package overrides manually by selecting the one with the highest version based on the fetched SuiRawMovePackage. The downside is that the two sets of dependency packages will be generated separately, which will cause duplication, but I think that will be OK for the initial version. I'll get it out quicker so people can use it while I work on solving all of this properly in the meantime.

Note that I believe there is one limitation around move-model where it doesn't currently support packages that uses address substitution (the addr_subst field on dependencies)

Out of curiosity, how is this handled in the prover then? Does it just not work when there's an address substitution in a dependency?


amnn commented May 19, 2023

Yes, I believe it doesn't support those packages (and I think practically speaking, it's actually rare to get package name conflicts as well).

