-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add postgres computation readers #1104
Add postgres computation readers #1104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 40 at r2 (raw file):
/** * Performs read operations on Computations and ComputationStages tables
@renjiezh to follow up on this comment. I moved the requisition and blob queries to its own reader, and reference them from the computation reader. Since we always read Computations and ComputationStages tables together with JOIN statements. Hence, that one is left here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @YuhongWang-Amazon)
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt
line 76 at r2 (raw file):
* @return [Map<[Long], [String]?>] for all blobIds of a computation */ suspend fun blobIdToPathMapByDepType(
I wonder if we can rename it as readBlobIdToPathMap
. I feel ByDepType
here is kinda confusing because the key parameters are actually localId and stage.
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 40 at r2 (raw file):
Previously, YuhongWang-Amazon wrote…
@renjiezh to follow up on this comment. I moved the requisition and blob queries to its own reader, and reference them from the computation reader. Since we always read Computations and ComputationStages tables together with JOIN statements. Hence, that one is left here.
Thanks. Yeah, you are right. Since we always read as ComputationToken so it has to link with the current ComputationStage. In the comment, maybe with ComputationStages tables
is more precise.
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 207 at r2 (raw file):
* @return [ComputationToken] when a Computation with externalRequisitionKey is found, or null. */ suspend fun readComputationToken(
This way to read and build ComputationToken is much easier to understand thus less error-pruning. However, I wonder if it has worse performance than the older way that using a large single SQL query joining three tables.
Note that, readComputationToken is the most frequently used function so we want to make it as fast as we can. Could you share your insight of the performance impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 6 files reviewed, 2 unresolved discussions (waiting on @renjiezh)
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt
line 76 at r2 (raw file):
Previously, renjiezh wrote…
I wonder if we can rename it as
readBlobIdToPathMap
. I feelByDepType
here is kinda confusing because the key parameters are actually localId and stage.
Done. Thanks. That's a much better function name.
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 40 at r2 (raw file):
Previously, renjiezh wrote…
Thanks. Yeah, you are right. Since we always read as ComputationToken so it has to link with the current ComputationStage. In the comment, maybe
with ComputationStages tables
is more precise.
done!
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 207 at r2 (raw file):
Previously, renjiezh wrote…
This way to read and build ComputationToken is much easier to understand thus less error-pruning. However, I wonder if it has worse performance than the older way that using a large single SQL query joining three tables.
Note that, readComputationToken is the most frequently used function so we want to make it as fast as we can. Could you share your insight of the performance impact?
I don't have much experience with Postgres performances. Since the duchy data schema is relatively simple and the size of the duchy database should not be too large, I think the performance difference between a large single query and multiple smaller queries is probably the overhead of making database requests. (e.g. network latencies).
From my understanding the throughput of the duchy data services will not be too large, hence this overhead should not have much impact on the overall performance.
Since the smaller queries and functions are easier to read and manage, I prefer to keep it this way for now. We can always update this to JOINed queries, if there are any significant performance regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @YuhongWang-Amazon)
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 207 at r2 (raw file):
Previously, YuhongWang-Amazon wrote…
I don't have much experience with Postgres performances. Since the duchy data schema is relatively simple and the size of the duchy database should not be too large, I think the performance difference between a large single query and multiple smaller queries is probably the overhead of making database requests. (e.g. network latencies).
From my understanding the throughput of the duchy data services will not be too large, hence this overhead should not have much impact on the overall performance.Since the smaller queries and functions are easier to read and manage, I prefer to keep it this way for now. We can always update this to JOINed queries, if there are any significant performance regressions.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @YuhongWang-Amazon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1, 3 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @YuhongWang-Amazon)
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt
line 30 at r3 (raw file):
/** * Gets the [ComputationBlobDependency] of a computation.
nit
Suggestion:
Reads
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt
line 113 at r3 (raw file):
* @param readContext The transaction context for reading from the Postgres database. * @param localComputationId A local identifier for a computation * @return A list of computation blob keys
FYI the @return
line isn't required if it's obvious, e.g. from the summary fragment. Of course, it's never incorrect to have it.
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt
line 175 at r3 (raw file):
dependencyType = row.getProtoEnum("DependencyType", ComputationBlobDependency::forNumber) } }
nit: prefer the more conventional if (namedVariable != null) { ... }
over namedVariable?.let { ... }
when smart casts can ensure that namedVariable
is non-null
inside the then
branch.
This case is using it to avoid an extra named variable as well, so the choice is yours. I personally think that it's worth the extra variable in this case.
Suggestion:
private fun buildBlobMetadata(row: ResultRow): ComputationStageBlobMetadata {
val path = row.get<String?>("PathToBlob")
return computationStageBlobMetadata {
blobId = row["BlobId"]
if (path != null) {
this.path = path
}
dependencyType = row.getProtoEnum("DependencyType", ComputationBlobDependency::forNumber)
}
}
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 278 at r3 (raw file):
bind("$1", computationTypes[0]) updatedBefore?.let { bind("$2", it) } }
Do separate bound statements rather than one with conditional binding when there's a difference in the number of bound variables. Note that it might be clearer to avoid having any bound variables in baseSql
. If you look at the Kingdom's Spanner readers, baseSql
tends not to include the WHERE
clause at all.
Also prefer the more conventional if (namedVariable != null) { ... }
over namedVariable?.let { ... }
when smart casts can ensure that namedVariable
is non-null
inside the appropriate branch.
Suggestion:
val query =
if (updatedBefore == null) {
boundStatement(baseSql) {
bind("$1", computationTypes[0])
}
} else {
boundStatement(baseSql + " AND UpdatedTime <= $2") {
bind("$1", computationTypes[0])
bind("$2", updatedBefore)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 25 files reviewed, 1 unresolved discussion (waiting on @renjiezh and @SanjayVas)
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationBlobReferenceReader.kt
line 113 at r3 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
FYI the
@return
line isn't required if it's obvious, e.g. from the summary fragment. Of course, it's never incorrect to have it.
Thanks. I will keep these returns here but keep that in mind for future works.
src/main/kotlin/org/wfanet/measurement/duchy/deploy/common/postgres/readers/ComputationReader.kt
line 278 at r3 (raw file):
Previously, SanjayVas (Sanjay Vasandani) wrote…
Do separate bound statements rather than one with conditional binding when there's a difference in the number of bound variables. Note that it might be clearer to avoid having any bound variables in
baseSql
. If you look at the Kingdom's Spanner readers,baseSql
tends not to include theWHERE
clause at all.Also prefer the more conventional
if (namedVariable != null) { ... }
overnamedVariable?.let { ... }
when smart casts can ensure thatnamedVariable
is non-null
inside the appropriate branch.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 23 files at r4, 19 of 20 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @YuhongWang-Amazon)
Implement postgres computation readers
Implement postgres computation readers