feat: add nonce column to actor_states #1105

kasteph · 2022-12-22T05:32:58Z

This PR:

persists the actors nonce in the actor_states table and;
adds the nonce column to the actor_states table in the schema.
- the nonce column has a UNIQUE constraint on it but still nullable (postgres docs)
- I didn't wanna make it into a PK bc we'll need to fill existing rows with some unique values and I don't see the need to create yet another table which will be added overhead for Lily users who are using Postgres for storage. Adding another table would mean creating a view to unify these two tables (actor_states) and (actor_states_with_nonce).

Resolves #1104

codecov-commenter · 2022-12-22T05:47:06Z

Codecov Report

Merging #1105 (f29e1f5) into master (852e26b) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files

@@          Coverage Diff           @@
##           master   #1105   +/-   ##
======================================
  Coverage    32.8%   32.8%           
======================================
  Files          46      46           
  Lines        3156    3156           
======================================
  Hits         1037    1037           
  Misses       2023    2023           
  Partials       96      96

placer14 · 2023-01-03T14:20:51Z

Since the column is nullable with no default defined, the default should be NULL?
Would the constraint fail on tables with multiple rows having the same (height,head,nonce) values? I believe that given we already have dupe rows w (height,head) adding NULL nonce values won't make those unique. Correct?

I suspect this migration will not apply cleanly and I would really prefer to test this on a copy of that table in our production DB before we accept/merge this.

placer14 · 2023-01-03T14:26:58Z

I don't see the need to create yet another table which will be added overhead for Lily users who are using Postgres for storage. Adding another table would mean creating a view to unify these two tables (actor_states) and (actor_states_with_nonce).

Honestly, the view approach seems best here. We're making the schema a "lie" by indicating this column is nullable (which is impossible in the domain). This will require those who consume this schema to understand the semantics of NULL here (which are completely an operational detail and will only confuse those who are looking carefully).

And we're saying there's "overhead" in a view, but if we rename the existing table and put it and the new able behind a view called actor_states it should be seemless to the consumer. Am I missing some other drawbacks?

kasteph · 2023-01-06T09:09:37Z

Thanks for taking the time to review this and the feedback!

Since the column is nullable with no default defined, the default should be NULL?

Yes.

Would the constraint fail on tables with multiple rows having the same (height,head,nonce) values?

If there are multiple rows with the same (height,head,nonce) values then it will violate that constraint per the Postgres doc:

In general, a unique constraint is violated if there is more than one row in the table where the values of all of the columns included in the constraint are equal. source

I believe that given we already have dupe rows w (height,head) adding NULL nonce values won't make those unique. Correct?

That's a good catch, thanks. I'll add the code column as well so that it will be (height,head,code,nonce). The first three columns are already part of the PK.

I suspect this migration will not apply cleanly and I would really prefer to test this on a copy of that table in our production DB before we accept/merge this.

That sounds good.

We're making the schema a "lie" by indicating this column is nullable (which is impossible in the domain). This will require those who consume this schema to understand the semantics of NULL here (which are completely an operational detail and will only confuse those who are looking carefully).

Do you mind expanding what you mean by the semantics of NULL? From my understanding, it is a marker for missing or unknown values in RDBMSes, according to the creator of RDBMSes. Thus it makes sense to me that it could be accept a NULL value since the nonce was not captured for this task in the first place and therefore a good candidate for a missing or unknown value.

And we're saying there's "overhead" in a view, but if we rename the existing table and put it and the new able behind a view called actor_states it should be seemless to the consumer. Am I missing some other drawbacks?

It is much simpler to include in the column comment that this column will have null values before this change is applied because it was not in the application code rather than create a new table, rename the old table, create a view that unifies these tables, and then document that.

The method you describe is seamless to the consumer, certainly, but there are also people who operate Lily for whom such tasks is just extra work (and sometimes not easy to find and can only be found in the CHANGELOG or hidden deep within the docs -- if it even is, for e.g., miner_sector_infos and miner_sector_infos_v7).

frrist · 2022-12-22T13:35:03Z

model/actors/common/actors.go

@@ -79,6 +79,8 @@ type ActorState struct {
 	Code string `pg:",pk,notnull"`
 	// Top level of state data as json.
 	State string `pg:",type:jsonb,notnull"`
+	// The next Actor nonce that is expected to appear on chain.


Since we're already making changes here, let's add a state_root field too. It will make the table re-org safe.

Sounds good, let me think harder about the schema then because I don't wanna overcomplicate things but also make sure it's correct.

frrist · 2023-01-11T17:38:56Z

schemas/v1/13_actorstates_nonce.go

+		`
+ALTER TABLE {{ .SchemaName | default "public"}}.actor_states
+  ADD COLUMN nonce BIGINT,
+  ADD CONSTRAINT actor_states_uniq_nullable_nonce UNIQUE (height,head,nonce);


I think we should replace height with state_root as it's reorg safe.

placer14 · 2023-01-12T21:11:18Z

Do you mind expanding what you mean by the semantics of NULL? From my understanding, it is a marker for missing or unknown values in RDBMSes, according to the creator of RDBMSes. Thus it makes sense to me that it could be accept a NULL value since the nonce was not captured for this task in the first place and therefore a good candidate for a missing or unknown value.

The schema is all there is as far as a def of the shape of data. NULL is a placeholder for an empty value for us here because we don't know retroactively what these values are suppose to be. We are adding this piece of semantic knowledge as a comment on a column (maybe) but this knowledge does not appear in other use cases where it is important and easily overlooked (for the same reason it would be lost in the CHANGELOG). (Such as how an ORM would interpret the schema automagically, or how certain query will behave in the face of a NULLable column vs not, or even writing the query in a way that gives us the answer we expected in this specific context (which could be surprisingly different from others).

Ideally, the semantics of our internal operations should influence this schema as little as possible. That said, if we're introducing these semantics to simplify dev/op efforts, that's one thing and we've made such concessions in the past. But we've also tried to make schemas mostly read-only and using new models to represent schema changes is the most honest way of tracking how our exports evolve over time. We attempted this with the miner_sector_info_v7 table. Whether we do one or the other is fine, but doing both is hurting us in both ways (making schema semantics fuzzy and complicating our dev/ops lives).

If we are going to deprecate this schema in favor of something slimmer in the future once IDL work lands, then I'm much less concerned about these things in the short term. I wanted to raise this concern (as I have in the past) and make sure we are aware of the costs we are paying.

placer14 · 2023-01-12T21:16:13Z

It is much simpler to include in the column comment that this column will have null values before this change is applied because it was not in the application code rather than create a new table, rename the old table, create a view that unifies these tables, and then document that.

Also, let's not trivialize the differences here. Creating a comment is easy for this PR, but I will need to spend cycles to verify production doesn't break/degrade due to this migration. We are trading off avoidance of one complexity for taking on another one at runtime. (Which, also as a operator, I'm not a fan of.)

placer14 · 2023-01-12T21:19:59Z

@kasteph I hope this doesn't come off as nitpicky. I appreciate you submitting the PR for this and I won't block the merge if others want to go this way. Please let me know once the migration is ready for testing and I'll run it in staging for a bit to measure/test.

birdychang · 2023-01-23T22:46:59Z

@frrist @kasteph @placer14 Should we just move forward?

placer14 · 2023-01-24T19:54:38Z

There is a risk that this migration is expensive when we roll out to production and ends up interrupting services while the database blocked writing to this table.

To mitigate this risk, I would like to dupe the actors table in production and test this migration first. I was finishing the removal of data from production so I can have room on disk for this but we could spin up a duplicate instance from backup and run the test on that and pay the cost of a single instance for a day (~$300?). I can start that process now and will report back with results from the test.

We could also handle the rollout manually by turning off actor tasks during migration and then gapfilling them after it completes. This would only affect data from the disabled tasks. I'm not sure about the scope of impact of data missing from this task on alerts and monitoring/analysis capabilities, but I'm certain we use this data and is non-zero.

Lastly, we could avoid testing if this were a new table and not an alter operation. This is guaranteed to have no affect on production.

If you want to abort the migration testing on a new instance, please ping me and I'll tear it down.

placer14 · 2023-01-24T20:09:40Z

Looks like restoring a production backup to a duplicate instance is broken in the UI. (!!!) I have an open request to support to investigate that.

placer14 · 2023-01-26T19:01:53Z

We discussed that this migration is not necessary. In order to avoid work needed to test/land this, we will postpone discussion on this improvement until some time later if needed. I have abandoned the migration testing work I mentioned previously.

kasteph added 2 commits December 22, 2022 14:32

feat: add nonce column to actor_states

f32bdec

fix: patch number

464728e

kasteph requested review from frrist, davidgasquez, placer14 and birdychang December 22, 2022 05:38

kasteph marked this pull request as ready for review December 22, 2022 05:45

kasteph added 4 commits January 3, 2023 12:46

chore: add uniq_nullable on nonce

d9fa6d6

forgot a comma

549ee02

timescaledb requires ts to be included in index

f29e1f5

chore: rename constraint

abe4b02

kasteph mentioned this pull request Jan 3, 2023

chore: add unique constraint on vm_messages.params #1107

Closed

frrist reviewed Jan 11, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add nonce column to actor_states #1105

feat: add nonce column to actor_states #1105

kasteph commented Dec 22, 2022 •

edited

Loading

codecov-commenter commented Dec 22, 2022 •

edited

Loading

placer14 commented Jan 3, 2023

placer14 commented Jan 3, 2023

kasteph commented Jan 6, 2023

frrist Dec 22, 2022

kasteph Jan 12, 2023

frrist Jan 11, 2023

placer14 commented Jan 12, 2023 •

edited

Loading

placer14 commented Jan 12, 2023

placer14 commented Jan 12, 2023

birdychang commented Jan 23, 2023

placer14 commented Jan 24, 2023 •

edited

Loading

placer14 commented Jan 24, 2023 •

edited

Loading

placer14 commented Jan 26, 2023

feat: add nonce column to actor_states #1105

Are you sure you want to change the base?

feat: add nonce column to actor_states #1105

Conversation

kasteph commented Dec 22, 2022 • edited Loading

codecov-commenter commented Dec 22, 2022 • edited Loading

Codecov Report

placer14 commented Jan 3, 2023

placer14 commented Jan 3, 2023

kasteph commented Jan 6, 2023

frrist Dec 22, 2022

Choose a reason for hiding this comment

kasteph Jan 12, 2023

Choose a reason for hiding this comment

frrist Jan 11, 2023

Choose a reason for hiding this comment

placer14 commented Jan 12, 2023 • edited Loading

placer14 commented Jan 12, 2023

placer14 commented Jan 12, 2023

birdychang commented Jan 23, 2023

placer14 commented Jan 24, 2023 • edited Loading

placer14 commented Jan 24, 2023 • edited Loading

placer14 commented Jan 26, 2023

kasteph commented Dec 22, 2022 •

edited

Loading

codecov-commenter commented Dec 22, 2022 •

edited

Loading

placer14 commented Jan 12, 2023 •

edited

Loading

placer14 commented Jan 24, 2023 •

edited

Loading

placer14 commented Jan 24, 2023 •

edited

Loading