
Improve function call fee estimation #4826

Closed
Longarithm opened this issue Sep 15, 2021 · 20 comments
Assignees
Labels
A-params-estimator Area: runtime params estimator T-contract-runtime Team: issues relevant to the contract runtime team

Comments

@Longarithm
Member

Longarithm commented Sep 15, 2021

The current function call fee estimation must be improved, because it does not take into account contract deserialization and thus the complexity of the contract. This already leads to significant undercharging for huge contracts like Aurora.

The intention of the fee is to charge users for all operations not accounted for by the wasm gas counter, so the estimation is made by counting the instructions spent on a no-op function call.

Simple approach

We can't charge a constant fee, because the gas spent varies with the contract's complexity. But the algorithm for parsing raw wasm code should be linear, so the fee should be bounded by C * code.len().

The problem here is that when a contract has too many functions, the fee becomes very inaccurate, see #4660. We absolutely can't afford to charge 302 Tgas for each Aurora engine call, because 1) it is a ~10x overestimate and 2) we could fit only 3 such transactions in a block.

New approach

Let's make the number of functions a separate factor to make the estimation more accurate.
Take it as func_assoc here: https://github.com/near/wasmer/blob/0.18.1/lib/runtime-core/src/module.rs#L48
Experiments show that the effect applies to all functions, not only exported ones.

So, from the above, the following constants should exist:

gas_spent <= base_cost + cost_per_code_byte * code.len() + cost_per_function * module_info.func_assoc 
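As a sketch, the bound above could be computed as follows. The function name is hypothetical, and the constants are the rough estimates from this thread (already doubled by the safety factor discussed below), not actual protocol parameters:

```rust
// Illustrative constants: roughly the thread's fitted coefficients times
// the proposed 2x safety multiplier. Not actual protocol parameters.
const BASE_COST: u64 = 20_000_000_000; // ~10 Ggas, doubled
const COST_PER_CODE_BYTE: u64 = 607_000; // ~311 Mgas per KiB, doubled, per byte
const COST_PER_FUNCTION: u64 = 4_000_000_000; // ~2 Ggas, doubled

/// Hypothetical helper: upper bound on gas spent outside the wasm gas
/// counter for a call into a contract of `code_len` bytes with
/// `n_functions` functions.
fn function_call_fee_bound(code_len: u64, n_functions: u64) -> u64 {
    BASE_COST + COST_PER_CODE_BYTE * code_len + COST_PER_FUNCTION * n_functions
}

fn main() {
    // A 618 KB contract with 781 functions (the reduced Aurora example
    // discussed later in the thread); comes out around 3.5 Tgas, in line
    // with the ~4 Tgas Aurora estimate below.
    let fee = function_call_fee_bound(618 * 1024, 781);
    println!("{}", fee);
}
```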

Then we estimate these coefficients by the least squares method, using the function least_squares_method_2: master...2c1e4fe
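The fit can be sketched as a generic 3-parameter ordinary least squares: build the normal equations AᵀA x = Aᵀy and solve them by Gaussian elimination. This is an illustration only, not the actual least_squares_method_2 from the linked branch:

```rust
/// Fit gas ≈ a + b * code_len + c * n_functions by ordinary least squares.
/// Generic sketch, not the estimator's actual implementation.
fn fit(samples: &[(f64, f64, f64)]) -> [f64; 3] {
    // samples: (code_len, n_functions, observed_gas)
    let mut ata = [[0.0f64; 3]; 3];
    let mut aty = [0.0f64; 3];
    for &(len, funcs, gas) in samples {
        let row = [1.0, len, funcs];
        for i in 0..3 {
            for j in 0..3 {
                ata[i][j] += row[i] * row[j];
            }
            aty[i] += row[i] * gas;
        }
    }
    // Gaussian elimination with partial pivoting on [AᵀA | Aᵀy].
    for col in 0..3 {
        let pivot = (col..3)
            .max_by(|&a, &b| ata[a][col].abs().partial_cmp(&ata[b][col].abs()).unwrap())
            .unwrap();
        ata.swap(col, pivot);
        aty.swap(col, pivot);
        for row in col + 1..3 {
            let k = ata[row][col] / ata[col][col];
            for j in col..3 {
                ata[row][j] -= k * ata[col][j];
            }
            aty[row] -= k * aty[col];
        }
    }
    // Back-substitution.
    let mut x = [0.0f64; 3];
    for row in (0..3).rev() {
        let mut s = aty[row];
        for j in row + 1..3 {
            s -= ata[row][j] * x[j];
        }
        x[row] = s / ata[row][row];
    }
    x
}

fn main() {
    // Synthetic data generated from gas = 10 + 2*len + 3*funcs;
    // the fit should recover these coefficients.
    let samples = [
        (1.0, 1.0, 15.0),
        (2.0, 1.0, 17.0),
        (1.0, 2.0, 18.0),
        (3.0, 2.0, 22.0),
        (2.0, 3.0, 23.0),
    ];
    let [base, per_byte, per_func] = fit(&samples);
    println!("{:.3} {:.3} {:.3}", base, per_byte, per_func);
}
```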

Current results are:

base_cost ~= 10 Ggas
cost_per_code_byte ~= 311 Mgas per KB
cost_per_function ~= 2 Ggas

It gives an upper estimate on a simple test set if we multiply the coefficients by 2. The Aurora fee should be ~4 Tgas after this.

Plan

  • Include args.len() in the estimation
  • Decide if we need a safety multiplier
  • Validate the results on real contracts
  • Check this effect on other items included in wasm modules:
    • globals
    • tables / elems
    • memory
    • exports / imports?
    • different set of function bodies?

Other ideas

@Longarithm Longarithm self-assigned this Sep 15, 2021
@Longarithm Longarithm added the A-params-estimator Area: runtime params estimator label Sep 15, 2021
@Longarithm
Member Author

Longarithm commented Sep 16, 2021

Validated cost_per_function as follows:

Take 2 contracts of the same length but a different number of functions. Estimate cost_per_function as (cost_2 - cost_1) / (funcs_2 - funcs_1).
For simplicity, the i-th function body is i32.const {i} drop, possibly repeated several times.
In my construction, the first contract C(N) contains 2 functions:

func0: i32.const 0 drop
func1: {i32.const 1 drop} repeated N times

The second contract contains N / 6 + 3 functions, which makes the lengths almost the same.
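The construction above can be sketched as a generator of WebAssembly text. The helper names here are hypothetical, for illustration only; the thread's actual generator lives in the estimator branch:

```rust
/// Build the WAT text for the first contract C(N) from the construction
/// above: func0 is a single `i32.const 0 drop`, and func1 repeats
/// `i32.const 1 drop` N times. Hypothetical helper for illustration.
fn contract_c(n: usize) -> String {
    let mut wat = String::from("(module\n");
    wat.push_str("  (func $func0 i32.const 0 drop)\n");
    wat.push_str("  (func $func1");
    for _ in 0..n {
        wat.push_str(" i32.const 1 drop");
    }
    wat.push_str(")\n)\n");
    wat
}

/// Estimate cost_per_function from two same-length contracts, as in the
/// comment above: (cost_2 - cost_1) / (funcs_2 - funcs_1).
fn cost_per_function(cost_1: f64, funcs_1: f64, cost_2: f64, funcs_2: f64) -> f64 {
    (cost_2 - cost_1) / (funcs_2 - funcs_1)
}

fn main() {
    println!("{}", contract_c(3));
    println!("{}", cost_per_function(20.0, 2.0, 26.0, 5.0));
}
```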

I considered the following values of N: {10, 20, ..., 100}, {75, 80, ..., 125}, {120, 140, ..., 200}, {1000, 2000, ..., 10000}.
For almost all of them, cost_per_function <= 2.2 Ggas.

There are several outliers with N ~= 100 for which cost_per_function is 3.7 Ggas (1.5 times higher). This effect shows up for 50 repeated run_vm calls and vanishes for 20 calls; it also vanishes for neighbouring estimations.
This may also be related to warmup at the beginning.

Another outlier is C(100) itself. The current model estimates its call at 20 Ggas instead of the 27 Ggas it should be.

Anyway, I'm still confident in safety-multiplying the cost by 2:

  • we already have two methods of computing the coefficient
  • we have outliers only for small contract sizes
  • (?) functions are stored in a vector, and if a reallocation happens, it may give 2x overhead

@bowenwang1996
Collaborator

@Longarithm thanks for the investigation and detailed analysis! This provides really valuable insights into what the problem is and how we could address it. Some thoughts:

  • If we decide to use this model that takes number of methods into consideration, how should we make sure that the increase in fees doesn't break existing contracts?
  • Alternatively, is there something we could do to fundamentally improve the execution model so that we don't have to deal with fee increases?

@Longarithm
Member Author

Longarithm commented Sep 17, 2021

@bowenwang1996
For the current model, if the contract length and number of functions are lower than some constants (yet to be found), there is no risk of a fee increase. Fortunately, the current base fee is 2.3 Tgas, which is quite high compared to the new base fee, which the model estimates at 20 Ggas.

Hopefully there won't be a lot of other contracts and I can analyze them manually.

On the other hand, I found that a call of the no-op function state_migration on Aurora #4814 (comment) requires 10.7 Tgas. It means that the base fee should be at least 10.7 Tgas, while my model gives only 6.2 Tgas.

@nearmax says that it actually shouldn't break Aurora, because costs of their calls are much higher. But this may not be the case for other contracts.

The contract has several non-trivial imports, which could affect the cost. Need to sync with Contract Runtime on this.

@Longarithm
Member Author

When I add a bunch of imports, the same as in the aurora engine (see https://hackmd.io/EvpgULvtSqGjyLMPicRIYg), some coefficients in the model change significantly:

base_cost = -30 Ggas
cost_per_code_byte = 1.5 Ggas per KB
cost_per_function = 2 Ggas

Though it was reasonable to expect only the base cost to change.

@bowenwang1996
Collaborator

Hopefully there won't be a lot of other contracts and I can analyze them manually.

That would be great!

some coefficients in the models significantly change:

Does that indicate the model itself may not be the best?

@Longarithm
Member Author

Does that indicate the model itself may not be the best?

Yes

@MaksymZavershynskyi
Contributor

We should consider rolling out the best approximation now; later, if we find out that the model depends on more coefficients, we can add new fees, e.g. a cost per import. We can also add a limit on the max number of imports, e.g. 100, to make sure it is not abusable.

We are currently in a situation where our base function call fee does not reflect the actual time it takes to load the contract, and we cannot stay in this state for too long while we try to find the best model for loading the contract.

@matklad
Contributor

matklad commented Sep 22, 2021

🤔 I don't think "just add more parameters later" is a safe solution -- the estimation error might be in the wrong direction.

Imagine that we hadn't found the imports issue, and had rolled out the cost based on the number of fns + len. Then someone could submit a contract with zero functions and a tonne of imports. We'd say that the contract is cheap, but it would be costly in practice.

We know about the imports now, of course, but what if there's some other thing (globals, exports, particular control flow) which exhibits a similar effect?

Estimation based on a single parameter (code length) is more robust against abuse. We are not 100% sure that "many functions" captures the worst case, but that seems rather probable. If we add the number of functions as a model parameter, we'd have no idea what the new worst case is.

That being said, our current cost is just wrong, so anything would be better than it.


If we want to go the multidimensional-cost way, I think we should look not only at the number of functions, but at the overall complexity of the contract. That is, we should add together the lengths of all the sections in https://webassembly.github.io/spec/core/binary/modules.html.
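Summing section lengths is mechanical on the binary format: a wasm module is the magic header followed by sections, each an id byte plus a LEB128-encoded payload size. A minimal self-contained sketch (real code would likely use a parser crate; this does no validation beyond the header):

```rust
/// Decode an unsigned LEB128 integer from `bytes`, advancing `pos`.
fn read_leb_u32(bytes: &[u8], pos: &mut usize) -> u32 {
    let (mut result, mut shift) = (0u32, 0);
    loop {
        let b = bytes[*pos];
        *pos += 1;
        result |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

/// Return (section id, payload length) for every section in a wasm binary.
/// Sketch only: trusts the input after checking the 8-byte header.
fn section_lengths(wasm: &[u8]) -> Vec<(u8, u32)> {
    assert_eq!(&wasm[0..8], &b"\0asm\x01\x00\x00\x00"[..], "not a wasm v1 binary");
    let mut pos = 8;
    let mut out = Vec::new();
    while pos < wasm.len() {
        let id = wasm[pos]; // section id (1 = types, 3 = functions, ...)
        pos += 1;
        let size = read_leb_u32(wasm, &mut pos);
        out.push((id, size));
        pos += size as usize; // skip the payload
    }
    out
}

fn main() {
    // Minimal module with a single type section: one function type () -> ().
    let wasm = [
        0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" + version 1
        0x01, 0x04, // section id 1 (types), payload size 4
        0x01, 0x60, 0x00, 0x00, // vec of 1 type: functype, 0 params, 0 results
    ];
    println!("{:?}", section_lengths(&wasm));
}
```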

@Longarithm
Member Author

I like the idea of complexity factor as a compromise:

  • it should not depend on exact wasmer implementation
  • it includes number of functions
  • this may work in short-term, and exact value can be clarified later

Though intuitively the sections are not equally weighted, and we need to keep in mind that we would just be covering the problem we have with a fee that has an ambiguous name.

Regarding the number of functions or imports: instead of limiting them, we could apply different fee schemes based on their values, like

  • n_functions < 2000 => charge optimistic fee
  • n_functions >= 2000 => charge pessimistic fee

@bowenwang1996
Collaborator

we could apply different fee schemes based on their values

I think this is an interesting idea! It also makes it easier for developers to reason about the cost of function call transactions.

@Longarithm
Member Author

Longarithm commented Sep 23, 2021

Reducing the aurora code leads to some insights.

I left only the no-op state_migration and deploy_code exported methods in the contract (attached). This results in 618 KB of code with 781 functions, for which the no-op function call spends 6.7 Tgas.

This itself implies that logically either cost_per_code_byte >= 10 Ggas / KB or cost_per_function >= 10 Ggas, neither of which is the case for the model trained on synthetic contracts.

At this point, as a short-term solution, I suggest leaving only one parameter in the model and putting:

base_cost = 4.1 Tgas
if n_functions <= 2000, cost_per_code_byte = 20 Ggas / KB 
if n_functions > 2000, cost_per_code_byte = 200 Ggas / KB 

This serves the following purposes:

  • for the worst abuse scenario we know (a contract in which all functions are no-ops), the cost is fair: for n_functions <= 2000 the worst case is when equality is attained, and then the fee is exactly 4.1 Tgas; for n_functions > 2000, cost_per_code_byte is exactly 200 Ggas / KB for such worst-case contracts, as computed here;
  • the aurora fee becomes 22 Tgas (size = 890 KB) because it has fewer than 2000 functions, which is close to the actual fee of 10 Tgas;
  • for an average contract, the fee should not change catastrophically (yet to be checked on existing contracts)
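The proposed short-term scheme can be sketched as follows, with the constants from this comment expressed in Ggas. The function name and the example function count are illustrative, not actual protocol code:

```rust
/// Short-term one-parameter fee model from this comment, in Ggas.
/// Illustrative sketch, not actual protocol parameters.
fn function_call_fee_ggas(code_len_kb: u64, n_functions: u64) -> u64 {
    let base_cost = 4_100; // 4.1 Tgas
    // Pessimistic per-KB rate kicks in above the 2000-function threshold.
    let cost_per_kb = if n_functions <= 2_000 { 20 } else { 200 };
    base_cost + cost_per_kb * code_len_kb
}

fn main() {
    // Aurora-like contract: 890 KB; the 1_500 function count is an
    // assumption, standing in for "fewer than 2000 functions".
    // 4_100 + 20 * 890 = 21_900 Ggas, i.e. the ~22 Tgas quoted above.
    println!("{} Ggas", function_call_fee_ggas(890, 1_500));
}
```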

The drawback is that we increase the base fee by 2x. On the other hand, the upgrade to the new wasmer should compensate for it.

BTW the raw complexity idea doesn't work here: n_functions is the significantly larger term in all examples, and one import increases the function count in our wasmer implementation as well.

aurora_engine_with_deploy.wasm.zip

@MaksymZavershynskyi
Contributor

There are issues with the "complexity" metric:

  • It will be part of the protocol, so we need to make sure it is not extremely complex;
  • I've had bad experiences with trying to aggregate a bunch of signals into a packaged metric; there will likely be corner cases where it behaves awfully, and it will be hard to reason about. E.g. wasm can have very disproportional sections.

I like the idea of a simple cost function with n_functions == 2000 as a threshold that cuts off "weird" contracts. Though unfortunately, it adds a non-linear function to our fees.

@matklad
Contributor

matklad commented Sep 23, 2021

n_functions == 2000 as a threshold that cuts off "weird" contracts

I am not yet convinced that it actually cuts off "weird" contracts; it cuts off the one weird case we know about, so it's overfit to it. In other words, the reasoning behind "for the worst abuse scenario we know" is not entirely sound. The worst case is relative to the metric. A metric with a cutoff will have a different worst case, which we don't know yet.

To feel more confident about metrics which count functions, I would like to understand this:

What does the slowest 1 MB contract with at most 1k functions look like?

@matklad
Contributor

matklad commented Sep 23, 2021

It's also the case that, no matter which metric we choose, there will always be a gap between the worst case for the metric and the average/best case on the network.

This puts us in an uncomfortable situation: there is an incentive to make the metrics more complicated to reduce the worst/best case overhead. But more complex metrics are easier to abuse.

That said, for each specific contract, we can always get the approximate real cost. We can take the contract, run parameter estimator, and say "loading this contract takes about X gas".

@olonho suggested an interesting way out of this systematic problem: let the validators agree, using whatever mechanism, on which contracts are actually "cheap". That is, after particular code is deployed, validators can verify that it is actually cheap to instantiate. This allows us to keep estimated costs as a safe approximation to prevent abuse by pathological contracts, while making important normal contracts cheap on a case-by-case basis.

Strictly from a contract runtime perspective, I would be much more comfortable saying that a particular contract with <2000 functions is cheap, than saying that any contract with <2000 functions is cheap.

@bowenwang1996
Collaborator

That is, after a particular code is deployed, validators can verify that it is actually cheap to instantiate.

How does this work in practice? Do you mean that validators would need to agree on the exact cost of each contract? I am not sure how we would do this, and it feels like a lot of overhead. In addition, what if the contract is invoked in the same block in which it is deployed? If we use the estimated cost there and validators could later change the cost, then transactions that invoke the same method on the same contract could end up having very different costs, which is not optimal.

near-bulldozer bot pushed a commit that referenced this issue Nov 20, 2021
Stabilize limiting the number of functions in a contract to 10_000: #4954

While discussing #4826, we found that both contract compilation and function calls take a lot of time if the number of wasm functions in the contract is huge. Even if we fix the contract size, the function count still makes a big difference.

To mitigate this issue, we limit the number of functions in a contract. This shouldn't affect current mainnet contracts because the maximal number of functions is ~5k.

## Test plan

* http://nayduck.near.org/#/run/2223
* existing tests
@stale

stale bot commented Dec 22, 2021

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Dec 22, 2021
@bowenwang1996 bowenwang1996 added the T-contract-runtime Team: issues relevant to the contract runtime team label Dec 28, 2021
@stale stale bot removed the S-stale label Dec 28, 2021
@stale

stale bot commented Mar 29, 2022

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale

stale bot commented Jul 6, 2022

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

@stale stale bot added the S-stale label Jul 6, 2022
@akhi3030 akhi3030 removed the S-stale label Jul 8, 2022
@jakmeier
Contributor

The discussion in this issue seems mostly outdated; the last comment is now 1 year old.

In the meantime, we have improved the estimations in slightly different directions with PR #6362, and we have come up with a "true function call cost" in #7227. This should address the main concerns raised in this issue.

The further-reaching discussion about changing the gas parameter model has also been evaluated in #6992, which is now also done. Currently we don't see a pressing reason to change the set of parameters anymore. But we do want to reshuffle them, see #7741.

@Longarithm do you think there is more to be done on this particular issue? Or can we close it by now?

@Longarithm
Member Author

Superseded by #7741.
