Add postgres computation readers #1104

YuhongWang-Amazon · 2023-07-14T18:02:23Z

Implement postgres computation readers

wfa-reviewable · 2023-07-14T18:02:29Z

This change is

YuhongWang-Amazon

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 40 at r2 (raw file):

/**
 * Performs read operations on Computations and ComputationStages tables

@renjiezh to follow up on this comment. I moved the requisition and blob queries to its own reader, and reference them from the computation reader. Since we always read Computations and ComputationStages tables together with JOIN statements. Hence, that one is left here.

renjiezh

Reviewed 1 of 6 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @YuhongWang-Amazon)

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt line 76 at r2 (raw file):

   * @return [Map<[Long], [String]?>] for all blobIds of a computation
   */
  suspend fun blobIdToPathMapByDepType(

I wonder if we can rename it as readBlobIdToPathMap. I feel ByDepType here is kinda confusing because the key parameters are actually localId and stage.

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 40 at r2 (raw file):

Previously, YuhongWang-Amazon wrote…

@renjiezh to follow up on this comment. I moved the requisition and blob queries to its own reader, and reference them from the computation reader. Since we always read Computations and ComputationStages tables together with JOIN statements. Hence, that one is left here.

Thanks. Yeah, you are right. Since we always read as ComputationToken so it has to link with the current ComputationStage. In the comment, maybe with ComputationStages tables is more precise.

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 207 at r2 (raw file):

   * @return [ComputationToken] when a Computation with externalRequisitionKey is found, or null.
   */
  suspend fun readComputationToken(

This way to read and build ComputationToken is much easier to understand thus less error-pruning. However, I wonder if it has worse performance than the older way that using a large single SQL query joining three tables.
Note that, readComputationToken is the most frequently used function so we want to make it as fast as we can. Could you share your insight of the performance impact?

YuhongWang-Amazon

Reviewable status: 4 of 6 files reviewed, 2 unresolved discussions (waiting on @renjiezh)

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt line 76 at r2 (raw file):

Previously, renjiezh wrote…

I wonder if we can rename it as readBlobIdToPathMap. I feel ByDepType here is kinda confusing because the key parameters are actually localId and stage.

Done. Thanks. That's a much better function name.

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 40 at r2 (raw file):

Previously, renjiezh wrote…

Thanks. Yeah, you are right. Since we always read as ComputationToken so it has to link with the current ComputationStage. In the comment, maybe with ComputationStages tables is more precise.

done!

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 207 at r2 (raw file):

Previously, renjiezh wrote…

This way to read and build ComputationToken is much easier to understand thus less error-pruning. However, I wonder if it has worse performance than the older way that using a large single SQL query joining three tables.
Note that, readComputationToken is the most frequently used function so we want to make it as fast as we can. Could you share your insight of the performance impact?

I don't have much experience with Postgres performances. Since the duchy data schema is relatively simple and the size of the duchy database should not be too large, I think the performance difference between a large single query and multiple smaller queries is probably the overhead of making database requests. (e.g. network latencies).
From my understanding the throughput of the duchy data services will not be too large, hence this overhead should not have much impact on the overall performance.

Since the smaller queries and functions are easier to read and manage, I prefer to keep it this way for now. We can always update this to JOINed queries, if there are any significant performance regressions.

renjiezh

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @YuhongWang-Amazon)

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 207 at r2 (raw file):

Previously, YuhongWang-Amazon wrote…

I don't have much experience with Postgres performances. Since the duchy data schema is relatively simple and the size of the duchy database should not be too large, I think the performance difference between a large single query and multiple smaller queries is probably the overhead of making database requests. (e.g. network latencies).
From my understanding the throughput of the duchy data services will not be too large, hence this overhead should not have much impact on the overall performance.

Since the smaller queries and functions are easier to read and manage, I prefer to keep it this way for now. We can always update this to JOINed queries, if there are any significant performance regressions.

Sounds good to me.

renjiezh

Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @YuhongWang-Amazon)

SanjayVas

Reviewed 1 of 6 files at r1, 3 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @YuhongWang-Amazon)

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt line 30 at r3 (raw file):

  /**
   * Gets the [ComputationBlobDependency] of a computation.

nit

Suggestion:

Reads

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt line 113 at r3 (raw file):

   * @param readContext The transaction context for reading from the Postgres database.
   * @param localComputationId A local identifier for a computation
   * @return A list of computation blob keys

FYI the @return line isn't required if it's obvious, e.g. from the summary fragment. Of course, it's never incorrect to have it.

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt line 175 at r3 (raw file):

      dependencyType = row.getProtoEnum("DependencyType", ComputationBlobDependency::forNumber)
    }
  }

nit: prefer the more conventional if (namedVariable != null) { ... } over namedVariable?.let { ... } when smart casts can ensure that namedVariable is non-null inside the then branch.

This case is using it to avoid an extra named variable as well, so the choice is yours. I personally think that it's worth the extra variable in this case.

Suggestion:

  private fun buildBlobMetadata(row: ResultRow): ComputationStageBlobMetadata {
    val path = row.get<String?>("PathToBlob")
    return computationStageBlobMetadata {
      blobId = row["BlobId"]
      if (path != null) {
        this.path = path
      }
      dependencyType = row.getProtoEnum("DependencyType", ComputationBlobDependency::forNumber)
    }
  }

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 278 at r3 (raw file):

        bind("$1", computationTypes[0])
        updatedBefore?.let { bind("$2", it) }
      }

Do separate bound statements rather than one with conditional binding when there's a difference in the number of bound variables. Note that it might be clearer to avoid having any bound variables in baseSql. If you look at the Kingdom's Spanner readers, baseSql tends not to include the WHERE clause at all.

Also prefer the more conventional if (namedVariable != null) { ... } over namedVariable?.let { ... } when smart casts can ensure that namedVariable is non-null inside the appropriate branch.

Suggestion:

    val query =
      if (updatedBefore == null) {
        boundStatement(baseSql) {
          bind("$1", computationTypes[0])
        }
      } else {
        boundStatement(baseSql + " AND UpdatedTime <= $2") {
          bind("$1", computationTypes[0])
          bind("$2", updatedBefore)
        }
      }

YuhongWang-Amazon

Reviewable status: 2 of 25 files reviewed, 1 unresolved discussion (waiting on @renjiezh and @SanjayVas)

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt line 113 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

FYI the @return line isn't required if it's obvious, e.g. from the summary fragment. Of course, it's never incorrect to have it.

Thanks. I will keep these returns here but keep that in mind for future works.

src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt line 278 at r3 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Do separate bound statements rather than one with conditional binding when there's a difference in the number of bound variables. Note that it might be clearer to avoid having any bound variables in baseSql. If you look at the Kingdom's Spanner readers, baseSql tends not to include the WHERE clause at all.

Also prefer the more conventional if (namedVariable != null) { ... } over namedVariable?.let { ... } when smart casts can ensure that namedVariable is non-null inside the appropriate branch.

Done.

SanjayVas

Reviewed 3 of 23 files at r4, 19 of 20 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @YuhongWang-Amazon)

Implement postgres computation readers

YuhongWang-Amazon added 2 commits July 14, 2023 16:24

Migrate duchy postgre package to deploy/common

75ad68a

Implement postgres computation readers

239ae19

YuhongWang-Amazon added 3 commits July 14, 2023 18:31

fix

3cff349

refactor

b8e1ba1

update doc string

2c9ed62

YuhongWang-Amazon commented Jul 14, 2023

View reviewed changes

YuhongWang-Amazon requested a review from renjiezh July 14, 2023 20:40

YuhongWang-Amazon changed the title ~~Implement postgres computation readers~~ Add postgres computation readers Jul 17, 2023

renjiezh requested changes Jul 17, 2023

View reviewed changes

feedbacks

ba5e04b

YuhongWang-Amazon commented Jul 17, 2023

View reviewed changes

renjiezh requested changes Jul 18, 2023

View reviewed changes

renjiezh approved these changes Jul 18, 2023

View reviewed changes

YuhongWang-Amazon requested a review from SanjayVas July 18, 2023 14:01

SanjayVas requested changes Jul 18, 2023

View reviewed changes

Base automatically changed from yuhong.postgres-duchy-readers to main July 19, 2023 13:57

Resolve feedbacks

0975d15

YuhongWang-Amazon commented Jul 19, 2023

View reviewed changes

YuhongWang-Amazon and others added 2 commits July 19, 2023 10:30

Merge branch 'main' into yuhong.postgres-duchy-readers-implementations

fb1eb4f

fix typo

4ea3509

SanjayVas approved these changes Jul 19, 2023

View reviewed changes

Merge main into yuhong.postgres-duchy-readers-implementations

d602b26

YuhongWang-Amazon enabled auto-merge (squash) July 19, 2023 18:12

YuhongWang-Amazon merged commit 951d61e into main Jul 19, 2023

YuhongWang-Amazon deleted the yuhong.postgres-duchy-readers-implementations branch July 19, 2023 18:19

ple13 pushed a commit that referenced this pull request Aug 16, 2024

Add postgres computation readers (#1104)

2a93084

Implement postgres computation readers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add postgres computation readers #1104

Add postgres computation readers #1104

YuhongWang-Amazon commented Jul 14, 2023

wfa-reviewable commented Jul 14, 2023

YuhongWang-Amazon left a comment

renjiezh left a comment

YuhongWang-Amazon left a comment

renjiezh left a comment

renjiezh left a comment

SanjayVas left a comment

YuhongWang-Amazon left a comment

SanjayVas left a comment

Add postgres computation readers #1104

Add postgres computation readers #1104

Conversation

YuhongWang-Amazon commented Jul 14, 2023

wfa-reviewable commented Jul 14, 2023

YuhongWang-Amazon left a comment

Choose a reason for hiding this comment

renjiezh left a comment

Choose a reason for hiding this comment

YuhongWang-Amazon left a comment

Choose a reason for hiding this comment

renjiezh left a comment

Choose a reason for hiding this comment

renjiezh left a comment

Choose a reason for hiding this comment

SanjayVas left a comment

Choose a reason for hiding this comment

YuhongWang-Amazon left a comment

Choose a reason for hiding this comment

SanjayVas left a comment

Choose a reason for hiding this comment