Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txtar for test4 cacheTypes issue #2605

Closed
wants to merge 1 commit into from

Conversation

thehowl
Copy link
Member

@thehowl thehowl commented Jul 18, 2024

Draft, to share txtar for an issue on test4.

There is a new gnoland restart command to try to verify restarting a node. It doesn't seem to work properly in a "successful" scenario (it panics further down the line); but at least for as far as this issue is concerned, it shows the correct stacktrace we are encountering in production.

My theory on the panic: tx1 fails because of an out-of-gas error, creating one of those inconsistencies between cache* fields in the gno.Store and the underlying database. The type gno.land/p/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic.Collection is stored in cacheTypes, but not in the store.

gno.land/p/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic was then redeployed, in tx2. This time, it worked; however, when saving the package, SetType is called and is a no-op, because the type is considered as being already in the store (as it exists in the cache).

func (ds *defaultStore) SetType(tt Type) {
	tid := tt.TypeID()
	// return if tid already known.
	if tt2, exists := ds.cacheTypes[tid]; exists {
		if tt != tt2 {
			// this can happen for a variety of reasons.
			// TODO classify them and optimize.
			return
		}
	}

Thus, already running nodes are fine (they have the zentasktic.Collection from the failing transaction, still in the cache). However, when restarting a node, the type exists in neither cacheTypes nor the underlying database; generating the panic.

Ideas for a solution:

  • fix(gno.land): make gno store cache respect rollbacks #2319 tries to address the underlying problem, and make the GnoVM store "transactional", and rolled back/committed in the same way as the Tm2 stores.
    • However, it almost certainly screws up gas calculations; and as such is unsuitable.
  • One solution that could be compatible with test4 is to use RunFiles in PreprocessAllFilesAndSaveBlockNodes, so we can have the types eventually saved again in the cache/store.
    • However, I'm pretty sure this is a perilous path, riddled with other hacks and caveats, both predictable and unpredictable.
    • Especially since this still leads to an inconsistent state with a part of the chain: running nodes use a FrankenType (remember: the zentasktic.Collection in cacheTypes comes from a failing tx, not the "real", deployed zentasktic.Collection). Restarted nodes would instead use the correctly "deployed" zentasktic.Collection. Since the types are the same, this might be fine, but I'm not so sure.
  • One very simple solution, but which probably leads to the fewest further headaches, is to always clean the database and do a catch-up on the network every time a node is restarted.
    • Yep, it's ugly, but somehow seems preferable to the other two.

@github-actions github-actions bot added the 📦 ⛰️ gno.land Issues or PRs gno.land package related label Jul 18, 2024
Copy link

codecov bot commented Jul 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.02%. Comparing base (447b763) to head (bf9a692).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2605      +/-   ##
==========================================
- Coverage   55.03%   55.02%   -0.02%     
==========================================
  Files         595      595              
  Lines       79665    79644      -21     
==========================================
- Hits        43843    43822      -21     
+ Misses      32509    32508       -1     
- Partials     3313     3314       +1     
Flag Coverage Δ
contribs/gnodev 26.00% <ø> (ø)
contribs/gnofaucet 14.46% <ø> (-0.86%) ⬇️
contribs/gnokeykc 0.00% <ø> (ø)
contribs/gnomd 0.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@deelawn
Copy link
Contributor

deelawn commented Jul 19, 2024

Nice job diagnosing the issue 💪 . In regards to the solutions, could we maybe do one of the following:

  • Don't cache types when a package is created
  • Use a transient cache for types and move them to the regular cache once addpkg completes without issue

I know this won't fix test4, but it seems like a sane solution going forward. I made a short POC implementation here -- https://github.com/gnolang/gno/compare/master...deelawn:gno:fix/transient-type-cache?expand=1

Let me know what you think, or if I'm missing something, why it won't work 😅

@thehowl
Copy link
Member Author

thehowl commented Jul 24, 2024

  • Don't cache types when a package is created

  • Use a transient cache for types and move them to the regular cache once addpkg completes without issue

I think this is not just a problem for cacheTypes, but also for cacheNodes. And, even then, the code in tm2 commits a tx to store when all of its messages succeed; while your approach would instead commit to store on a "per-message" basis. Which is why I think it's better to do it with something like #2319's approach.

@deelawn
Copy link
Contributor

deelawn commented Jul 24, 2024

  • Don't cache types when a package is created
  • Use a transient cache for types and move them to the regular cache once addpkg completes without issue

I think this is not just a problem for cacheTypes, but also for cacheNodes. And, even then, the code in tm2 commits a tx to store when all of its messages succeed; while your approach would instead commit to store on a "per-message" basis. Which is why I think it's better to do it with something like #2319's approach.

Right, thanks. I forgot this is happening for each message, not each tx.

@grepsuzette
Copy link
Contributor

grepsuzette commented Jul 25, 2024

I will react, naively, to the 3 solutions, @thehowl.

  1. fix(gno.land): make gno store cache respect rollbacks #2319 tries to address the underlying problem, and make the GnoVM store "transactional", and rolled back/committed in the same way as the Tm2 stores.
  • However, it almost certainly screws up gas calculations; and as such is unsuitable.
  1. One solution that could be compatible with test4 is to use RunFiles in PreprocessAllFilesAndSaveBlockNodes, so we can have the types eventually saved again in the cache/store.(...) However, I'm pretty sure this is a perilous path
  2. One very simple solution, but which probably leads to the fewest further headaches, is to always clean the database and do a catch-up on the network every time a node is restarted.
  • Yep, it's ugly, but somehow seems preferable to the other two.

Solution 2. dicey, agree.

The ideal solution if we omit gas calculations just for this sentence seems solution 1, right?

In what way does solution 1 affect gas calculations? Is it like: Malory volontarily fails tx on low gas so next users pay a lot of gas to reload stuffs that were discarded from memory kind of bad?

If so, before ditching solution 1... in #2319, Jae talked about a short-term solution for rollback issues (you explored a different solution in #2504, consisting of preloading all packages at start):

I think the easiest way to solve this problem right now is to change the implementation of the usage of the type checker so that it doesn't call store.GetMemPackage, but only reads the mempackage through ReadMemPackage().

The type checker doesn't care about gno state, I think it only cares about the source code, so there's no need to run "GetMemPackage" which also runs the mempackage.

Couldn't this work and help here?

For the sake of the discussion, here the txtar in this PR (truncated after 80th column as we don't need the sourcecode):

loadpkg gno.land/p/demo/avl
gnoland start

gnokey sign -tx-path $WORK/tx1.tx -chainid tendermint_test -account-sequence 0 t
! gnokey broadcast $WORK/tx1.tx
stderr 'out of gas'

gnokey sign -tx-path $WORK/tx2.tx -chainid tendermint_test -account-sequence 1 t
gnokey broadcast $WORK/tx2.tx
stdout 'OK!'

gnokey sign -tx-path $WORK/tx3.tx -chainid tendermint_test -account-sequence 2 t
gnokey broadcast $WORK/tx3.tx
stdout 'OK!'

gnoland restart

-- tx1.tx --
{
  "msg": [
    {
      "@type": "/vm.m_addpkg",
      "creator": "g1jg8mtutu9khhfwc4nxmuhcpftf0pajdhfvsqf5",
      "package": {
        "name": "zentasktic",
        "path": "gno.land/p/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic"
        "files": [
          {
            "name": "README.md",
            "body": "# ZenTasktic Core\n\nA basic, minimalisitc Asess-Decide-Do 
          },
          {
            "name": "collections.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t\
          },
          {
            "name": "collections_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          },
          {
            "name": "contexts.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t\
          },
          {
            "name": "contexts_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          },
          {
            "name": "core.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t\
          },
          {
            "name": "errors.gno",
            "body": "package zentasktic\n\nimport \"errors\"\n\nvar (\n\tErrTask
          },
          {
            "name": "marshals.gno",
            "body": "package zentasktic\n\nimport (\n\t\"bytes\"\n)\n\n\ntype Co
          },
          {
            "name": "projects.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t\
          },
          {
            "name": "projects_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          },
          {
            "name": "realms.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t
          },
          {
            "name": "tasks.gno",
            "body": "package zentasktic\n\nimport (\n\t\"time\"\n\n\t\"gno.land/
          },
          {
            "name": "tasks_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          }
        ]
      },
      "deposit": ""
    }
  ],
  "fee": {
    "gas_wanted": "10000000",
    "gas_fee": "1000000ugnot"
  },
  "signatures": [],
  "memo": ""
}

-- tx2.tx --
{
  "msg": [
    {
      "@type": "/vm.m_addpkg",
      "creator": "g1jg8mtutu9khhfwc4nxmuhcpftf0pajdhfvsqf5",
      "package": {
        "name": "zentasktic",
        "path": "gno.land/p/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic"
        "files": [
          {
            "name": "README.md",
            "body": "# ZenTasktic Core\n\nA basic, minimalisitc Asess-Decide-Do 
          },
          {
            "name": "collections.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t
          },
          {
            "name": "collections_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          },
          {
            "name": "contexts.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n
          },
          {
            "name": "contexts_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          },
          {
            "name": "core.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t
          },
          {
            "name": "errors.gno",
            "body": "package zentasktic\n\nimport \"errors\"\n\nvar (\n\tErrTask
          },
          {
            "name": "marshals.gno",
            "body": "package zentasktic\n\nimport (\n\t\"bytes\"\n)\n\n\ntype Co
          },
          {
            "name": "projects.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t
          },
          {
            "name": "projects_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          },
          {
            "name": "realms.gno",
            "body": "// base implementation\npackage zentasktic\n\nimport (\n\t
          },
          {
            "name": "tasks.gno",
            "body": "package zentasktic\n\nimport (\n\t\"time\"\n\n\t\"gno.land/
          },
          {
            "name": "tasks_test.gno",
            "body": "package zentasktic\n\nimport (\n\t\"testing\"\n\n    \"gno.
          }
        ]
      },
      "deposit": ""
    }
  ],
  "fee": {
    "gas_wanted": "15000000",
    "gas_fee": "1000000ugnot"
  },
  "signatures": [],
  "memo": ""
}

-- tx3.tx --
{
  "msg": [
    {
      "@type": "/vm.m_addpkg",
      "creator": "g1jg8mtutu9khhfwc4nxmuhcpftf0pajdhfvsqf5",
      "package": {
        "name": "zentasktic_core",
        "path": "gno.land/r/g17ernafy6ctpcz6uepfsq2js8x2vz0wladh5yc3/zentasktic_
        "files": [
          {
            "name": "workable.gno",
            "body": "package zentasktic_core\n\ntype Workable interface {\n\t// 
          },
          {
            "name": "wrapper.gno",
            "body": "package zentasktic_core\n\n\nimport (\n\t\"strconv\"\n\n\t
          }
        ]
      },
      "deposit": ""
    }
  ],
  "fee": {
    "gas_wanted": "15000000",
    "gas_fee": "1000000ugnot"
  },
  "signatures": [],
  "memo": ""
}

@thehowl
Copy link
Member Author

thehowl commented Jul 25, 2024

The ideal solution if we omit gas calculations just for this sentence seems solution 1, right?

In what way does solution 1 affect gas calculations? Is it like: Malory volontarily fails tx on low gas so next users pay a lot of gas to reload stuffs that were discarded from memory kind of bad?

Well, the txtar is actually already a good example of how fixing this bug breaks gas calculations. This is the output:

        > ! gnokey broadcast $WORK/tx1.tx
        [stderr]
        "gnokey" error: --= Error =--
        Data: transaction failed [...]
        log out of gas, gasWanted: 10000000, gasUsed: 11478554 location: WritePerByte
        Msg Traces:
        --= /Error =--
[...]
        > gnokey broadcast $WORK/tx2.tx
        [stdout]

        OK!
        GAS WANTED: 15000000
        GAS USED:   7851998
        HEIGHT:     87
        EVENTS:     []

These two transactions are identical, except for the gas wanted. But the first panics because it exceeds the 10M gas limit. The second one succeeds.. but note that the gas used is now 7851998; ie. less than 10M.

In fact, you get the same bug if instead of having tx1 and tx2, you only use tx1 (but re-sign it to have a different account-number); the first time around it fails, the second time it succeeds.

This is because on the second time there's already data about the objects it's trying to save to the store, in cacheTypes. So... it hits the store less, and consequently it hits the gas counter less (in tm2/pkg/store/gas/store.go).

Keeping test4 as is while fixing the bug and maintaining the same gas used for all replayed transactions is hard. Which is why we're now looking at doing 1 (with #2319) and probably doing a chain reset / new chain.

If so, before ditching solution 1... in #2319, Jae talked about a short-term solution for rollback issues (you explored a different solution in #2504, consisting of preloading all packages at start):

I think the easiest way to solve this problem right now is to change the implementation of the usage of the type checker so that it doesn't call store.GetMemPackage, but only reads the mempackage through ReadMemPackage().
The type checker doesn't care about gno state, I think it only cares about the source code, so there's no need to run "GetMemPackage" which also runs the mempackage.

Couldn't this work and help here?

the changes have already been merged in #2504 and are already part of the test4 binary (the bug is very related, but with #2504 we made it so at least it's not possible that a failing transaction makes a stdlib package "unimportable"). This is another problem, relating to the lack of rollback functionality in cacheTypes (and probably cacheNodes) as well.

@thehowl
Copy link
Member Author

thehowl commented Aug 5, 2024

Merging to #2319

@thehowl thehowl closed this Aug 5, 2024
@thehowl thehowl deleted the dev/morgan/cache-types-test4-mess branch September 3, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📦 ⛰️ gno.land Issues or PRs gno.land package related
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants