Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use CTE in Enrollment analytics queries [DHIS-16705] #19519

Merged
merged 45 commits into from
Jan 22, 2025

Conversation

luciano-fiandesio
Copy link
Contributor

@luciano-fiandesio luciano-fiandesio commented Dec 18, 2024

Summary

This PR refactors the SQL query generation process for Enrollment analytics to enhance compatibility with Doris by utilizing Common Table Expressions (CTEs) instead of sub-queries.

Problem

Previously, SQL queries for Enrollment analytics leveraged sub-queries to fetch event values from analytics_event_* tables. While functional in Postgres, these queries were incompatible with Doris due to its limitations in handling nested sub-queries correlated with outer query layers.

Example of the original query:

select
    enrollment,
    ...
    ax."ou",
    (select
         "fyjPqlHE7Dn"
     from
	 analytics_event_M3xtLkYBlKI
     where
	 analytics_event_M3xtLkYBlKI.eventstatus != 'SCHEDULE'
	 and analytics_event_M3xtLkYBlKI.enrollment = ax.enrollment
	 and ps = 'CWaAcQYKVpq'
limit 1 ) as "CWaAcQYKVpq[-1].fyjPqlHE7Dn",
from
     analytics_enrollment_m3xtlkyblki as ax

Mitigation

This refactor converts sub-queries into Common Table Expressions (CTEs), which are supported by Doris. CTEs simplify the queries, making them more modular and readable, thereby improving performance and compatibility.

Example of the refactored query using CTE:

with ps_cwaacqykvpq_fyjpqlhe7dn as (
select
	distinct on
	(enrollment) enrollment,
	"fyjPqlHE7Dn" as value
from
	analytics_event_M3xtLkYBlKI
where
	eventstatus != 'SCHEDULE'
	and ps = 'CWaAcQYKVpq'
order by
	enrollment,
	occurreddate desc,
	created desc)
select
	ax.enrollment,
	...
	ps_cwaacqykvpq_fyjpqlhe7dn.value as "CWaAcQYKVpq.fyjPqlHE7Dn"
from
	analytics_enrollment_m3xtlkyblki as ax
left join ps_cwaacqykvpq_fyjpqlhe7dn on
	ax.enrollment = ps_cwaacqykvpq_fyjpqlhe7dn.enrollment
where
	(((enrollmentdate >= '2021-01-01'
	and enrollmentdate < '2022-01-01')))
	and (ax."uidlevel1" = 'ImspTQPwCqd')

Testing Strategy

The changes have been tested using the e2e project to ensure all tests pass successfully on both Postgres and Doris.

Types of Queries Affected

Both aggregated and non-aggregated Enrollment queries are impacted by this refactor.

Experimental Flag

To prevent potential platform issues, the refactored CTE-based queries are not enabled by default. They can be activated by setting the following system configurations to true:

  • experimentalAnalyticsSqlEngineEnabled
  • or by configuring DHIS2 analytics.database to doris.

Performance Comparison:

  • Original query with sub-select
    Original Query
  • Refactored query with CTEs
    Refactored Query

@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 8c33952 to 3e070ce Compare December 18, 2024 16:56
@larshelge larshelge changed the title DHIS-16705 Use CTE in Enrollment analytics queries [DRAFT] Use CTE in Enrollment analytics queries [DHIS-16705] Dec 19, 2024
@larshelge larshelge changed the title [DRAFT] Use CTE in Enrollment analytics queries [DHIS-16705] [DRAFT] fix: Use CTE in Enrollment analytics queries [DHIS-16705] Dec 19, 2024
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 5170115 to d8119c0 Compare December 20, 2024 15:19
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 5c4dba5 to ef43c5b Compare January 3, 2025 16:19
this.cteDefinition = cteDefinition;
this.offsets.add(offset);
// one alias per offset
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
this.programIndicatorUid = programIndicatorUid;
this.programStageUid = null;
// ignore offset
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
this.programIndicatorUid = null;
this.programStageUid = programStageUid;
// ignore offset
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from aa533b4 to 5f05bb5 Compare January 6, 2025 13:50
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 4e085d3 to 62c2237 Compare January 7, 2025 09:09
void contributeCTE(
ProgramIndicator programIndicator,
RelationshipType relationshipType,
AnalyticsType outerSqlEntity,

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'outerSqlEntity' is never used.
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from 79f7563 to 94ab16e Compare January 7, 2025 21:28
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch 2 times, most recently from bc234ee to a4ff98c Compare January 10, 2025 08:07
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 8cb7b23 to 895d84c Compare January 20, 2025 16:07
@maikelarabori maikelarabori added the run-api-analytics-tests Enables analytics e2e tests label Jan 21, 2025
@luciano-fiandesio luciano-fiandesio marked this pull request as ready for review January 21, 2025 11:09
Copy link
Contributor

@maikelarabori maikelarabori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes @luciano-fiandesio. 👍
I left a few comments/suggestions, please take a look, thx!

@luciano-fiandesio luciano-fiandesio changed the title [DRAFT] fix: Use CTE in Enrollment analytics queries [DHIS-16705] fix: Use CTE in Enrollment analytics queries [DHIS-16705] Jan 21, 2025
@Getter private final String cteDefinition;

/** The calculated offset * */
@Getter private final List<Integer> offsets = new ArrayList<>();

Check notice

Code scanning / CodeQL

Exposing internal representation Note

getOffsets exposes the internal representation stored in field offsets. The value may be modified
after this call to getOffsets
.
}

if (offset < 0) {
return (-1 * offset);

Check failure

Code scanning / CodeQL

User-controlled data in arithmetic expression High

This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.
This arithmetic expression depends on a
user-provided value
, potentially causing an underflow.

Copilot Autofix AI 5 days ago

To fix the problem, we need to validate the offset value before performing the arithmetic operation. Specifically, we should ensure that the offset value is within a safe range to prevent overflow or underflow. We can achieve this by adding a guard clause that checks if the offset value is within the bounds of Integer.MIN_VALUE and Integer.MAX_VALUE.

  • Add a validation check for the offset value in the computeRowNumberOffset method.
  • If the offset value is outside the safe range, we can either throw an exception or clamp the value to the nearest safe value.
Suggested changeset 1
dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java
--- a/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java
+++ b/dhis-2/dhis-services/dhis-service-analytics/src/main/java/org/hisp/dhis/analytics/event/data/JdbcEnrollmentAnalyticsManager.java
@@ -1418,4 +1418,10 @@
     if (offset < 0) {
+      if (offset == Integer.MIN_VALUE) {
+        throw new IllegalArgumentException("Offset value is too small and may cause underflow.");
+      }
       return (-1 * offset);
     } else {
+      if (offset == Integer.MAX_VALUE) {
+        throw new IllegalArgumentException("Offset value is too large and may cause overflow.");
+      }
       return (offset - 1);
EOF
@@ -1418,4 +1418,10 @@
if (offset < 0) {
if (offset == Integer.MIN_VALUE) {
throw new IllegalArgumentException("Offset value is too small and may cause underflow.");
}
return (-1 * offset);
} else {
if (offset == Integer.MAX_VALUE) {
throw new IllegalArgumentException("Offset value is too large and may cause overflow.");
}
return (offset - 1);
Copilot is powered by AI and may make mistakes. Always verify output.
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
2 New issues
2 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@larshelge larshelge merged commit 28581b4 into master Jan 22, 2025
15 of 17 checks passed
@larshelge larshelge deleted the DHIS-16705_ENROLLMENT_WITH_CTE branch January 22, 2025 11:21
@gnespolino gnespolino restored the DHIS-16705_ENROLLMENT_WITH_CTE branch January 23, 2025 12:57
@gnespolino gnespolino deleted the DHIS-16705_ENROLLMENT_WITH_CTE branch January 23, 2025 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-api-analytics-tests Enables analytics e2e tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants