Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This optimization removes many joins from the metrics query generated by the
analysis()
method in the LogRepository; it also removes extra queries and consolidates them into one:As an example, the following queries were generated by the current code to get the percentage value for a categorical metric with a repeated measure of 'most recent', results from both queries have to be used to calculate the percentage:
SELECT "individualEnrollment"."conditionId", count(cast(extracted.value as text)) as result FROM "experiment" "experiment" INNER JOIN "query" "queries" ON "queries"."experimentId"="experiment"."id" INNER JOIN "metric" "metric" ON "metric"."key"="queries"."metricKey" INNER JOIN "metric_log" "metric_logs" ON "metric_logs"."metricKey"="metric"."key" INNER JOIN "log" "logs" ON "logs"."id"="metric_logs"."logId" INNER JOIN (SELECT DISTINCT "individualEnrollment"."userId" as "userId", "individualEnrollment"."experimentId" as "experimentId", "individualEnrollment"."conditionId" as "conditionId" FROM "individual_enrollment" "individualEnrollment") "individualEnrollment" ON "experiment"."id" = "individualEnrollment"."experimentId" AND logs."userId" = "individualEnrollment"."userId" INNER JOIN (SELECT "logs"."data" -> 'masteryWorkspace' -> 'worksheet_grapher_a1_patterns_2step_expr' ->> 'workspaceCompletionStatus' as value, "logs"."id" as id FROM "log" "logs") "extracted" ON extracted.id = "logs"."id" WHERE "metric"."key" = $1 AND "experiment"."id" = $2 AND "queries"."id" = $3 AND logs."updatedAt" = (SELECT max(sqlog."updatedAt") FROM "log" "sqlog" WHERE sqlog."userId" = logs."userId" AND "sqlog"."data" -> 'masteryWorkspace' -> 'worksheet_grapher_a1_patterns_2step_expr' ->> 'workspaceCompletionStatus' IS NOT NULL) AND (cast(extracted.value as text)) = $4 GROUP BY "individualEnrollment"."conditionId"
SELECT "individualEnrollment"."conditionId", count(cast(extracted.value as text)) as result, COUNT(DISTINCT "individualEnrollment"."userId") as "participantsLogged" FROM "experiment" "experiment" INNER JOIN "query" "queries" ON "queries"."experimentId"="experiment"."id" INNER JOIN "metric" "metric" ON "metric"."key"="queries"."metricKey" INNER JOIN "metric_log" "metric_logs" ON "metric_logs"."metricKey"="metric"."key" INNER JOIN "log" "logs" ON "logs"."id"="metric_logs"."logId" INNER JOIN (SELECT DISTINCT "individualEnrollment"."userId" as "userId", "individualEnrollment"."experimentId" as "experimentId", "individualEnrollment"."conditionId" as "conditionId" FROM "individual_enrollment" "individualEnrollment") "individualEnrollment" ON "experiment"."id" = "individualEnrollment"."experimentId" AND logs."userId" = "individualEnrollment"."userId" INNER JOIN (SELECT "logs"."data" -> 'masteryWorkspace' -> 'worksheet_grapher_a1_patterns_2step_expr' ->> 'workspaceCompletionStatus' as value, "logs"."id" as id FROM "log" "logs") "extracted" ON extracted.id = "logs"."id" WHERE "metric"."key" = $1 AND "experiment"."id" = $2 AND "queries"."id" = $3 AND (cast(extracted.value as text)) In ($4, $5) AND logs."updatedAt" = (SELECT max(sqlog."updatedAt") FROM "log" "sqlog" WHERE sqlog."userId" = logs."userId" AND "sqlog"."data" -> 'masteryWorkspace' -> 'worksheet_grapher_a1_patterns_2step_expr' ->> 'workspaceCompletionStatus' IS NOT NULL) GROUP BY "individualEnrollment"."conditionId"
By contrast, the code in this pr generates the following single query, which calculates the percentage:
SELECT "individualEnrollment"."conditionId", cast(count(cast(logs.datum as text)) filter (where logs.datum = 'GRADUATED') as decimal) / cast(count(cast(logs.datum as text)) as decimal) * 100 as result, COUNT(DISTINCT "individualEnrollment"."userId") as "participantsLogged" FROM "individual_enrollment" "individualEnrollment" INNER JOIN (SELECT "userId", "datum" FROM (SELECT "userId", "logs"."data" -> 'masteryWorkspace' -> 'worksheet_grapher_a1_patterns_2step_expr' ->> 'workspaceCompletionStatus' as "datum", row_number() over (partition by "userId" order by "updatedAt" DESC) AS rn FROM "log" "logs" WHERE "logs"."data" -> 'masteryWorkspace' -> 'worksheet_grapher_a1_patterns_2step_expr' ->> 'workspaceCompletionStatus' is not null) "t" WHERE rn = 1) "logs" ON logs."userId"="individualEnrollment"."userId" WHERE "experimentId" = $1 GROUP BY "individualEnrollment"."conditionId"
Note that the subquery that selects the data point of interest (in this case 'workspaceCompletionStatus' ) from the log table for a given user, also selects a 'row_number()', which allows us to select only the first row. That will be the most recent, since we are selecting over (partition by "userId" order by "updatedAt" DESC). In this way, we can get our data and satisfy the repeated measure with just one join on the 'log' table.
For Within Subject queries, the strategy of wrapping the normal query as a subquery and applying the repeated measure on the outer level is retained.