Streaming exports (#1826)

* Switch to non-native Postgres client. And add a "streaming" API for making database queries, which streams the results from the database to Node as they are generated by Postgres. This allows Node to process the rows one by one (and garbage collect in between), which is much easier on the VM when we need to do big queries that summarize data (or just format it and incrementally spit it out to an HTTP response).
* Mostly refactoring. This moves the handle_GET_reportExport route into its own file, which necessitated refactoring some other things (zinvite and pca) out of server.ts as well. Chipping away at the monolith. This also converts the votes.csv report to use the streaming query from Postgres, which is mostly a smoke test. It seems to work, so next I'll convert it to stream the results incrementally to the HTTP response as well.
* Split each report into separate function.
* Count up comment votes in single pass over votes table. There was actually a bug in the old SQL that aggregated votes from _all_ conversations instead of just the conversation in question, which is why it took 30 seconds to run. With that bug fixed, even the super slow "do a full subquery for each comment row" was actually quite fast. But this is way cheaper/faster.
* Add participant-votes.csv export.
* Flip vote polarity. In the raw votes table, -1 means agree and 1 means disagree, so we need to count things correctly. And when exporting votes in participant votes, we flip the sign so that 1 means agree and -1 means disagree.
* Properly escape comment text.
* Add votes matrix, show data license on preprod, logging.

---------

Co-authored-by: Michael Bayne <[email protected]>
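
The vote-handling steps in the message above (one pass over streamed rows, the inverted raw polarity, and escaped comment text) can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: the row shape, the column names `tid` and `pid`, every function name, and the treatment of any value other than -1 or 1 as a pass are all hypothetical, and an async generator stands in for the streamed query result.

```typescript
// Hypothetical row shape; the real votes table columns may differ.
interface VoteRow {
  tid: number; // comment id (assumed column name)
  pid: number; // participant id (assumed column name)
  vote: number; // raw value: -1 = agree, 1 = disagree
}

interface CommentCounts {
  agrees: number;
  disagrees: number;
  passes: number;
}

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Convert raw polarity to exported polarity: 1 = agree, -1 = disagree.
function flipVote(raw: number): number {
  return raw === 0 ? 0 : -raw;
}

// Fold one row into the running per-comment totals.
function addVote(counts: Map<number, CommentCounts>, row: VoteRow): void {
  let c = counts.get(row.tid);
  if (!c) {
    c = { agrees: 0, disagrees: 0, passes: 0 };
    counts.set(row.tid, c);
  }
  if (row.vote === -1) c.agrees += 1; // raw -1 means agree
  else if (row.vote === 1) c.disagrees += 1; // raw 1 means disagree
  else c.passes += 1; // assumption: anything else counts as a pass
}

// Consume rows one at a time; only the totals stay in memory.
async function countVotes(
  rows: AsyncIterable<VoteRow>
): Promise<Map<number, CommentCounts>> {
  const counts = new Map<number, CommentCounts>();
  for await (const row of rows) addVote(counts, row);
  return counts;
}

// Format one line of a per-comment summary.
function commentCsvLine(tid: number, text: string, c: CommentCounts): string {
  return [
    String(tid),
    escapeCsvField(text),
    String(c.agrees),
    String(c.disagrees),
    String(c.passes),
  ].join(",");
}

// Stand-in for a streamed query result.
async function* fakeRows(): AsyncGenerator<VoteRow> {
  yield { tid: 0, pid: 0, vote: -1 };
  yield { tid: 0, pid: 1, vote: 1 };
  yield { tid: 1, pid: 0, vote: -1 };
}

countVotes(fakeRows()).then((counts) => {
  for (const [tid, c] of counts) {
    console.log(commentCsvLine(tid, `comment ${tid}`, c));
  }
});
```

Because the totals are updated row by row, memory use grows with the number of comments rather than the number of votes, which is the property the streaming query is meant to provide.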
compdemocracy · Oct 22, 2024 · 61d2940
1 parent c60752b
commit 61d2940
Showing 9 changed files with 976 additions and 647 deletions.
diff --git a/client-report/src/components/overview.js b/client-report/src/components/overview.js
@@ -27,6 +27,9 @@ const Number = ({ number, label }) => (
 
 const pathname = window.location.pathname; // "/report/2arcefpshi"
 const report_id = pathname.split("/")[2];
+const doShowDataLicenseTerms = ["pol.is", "preprod.pol.is", "localhost"].includes(
+  window.location.hostname
+);
 
 const getCurrentTimestamp = () => {
   const now = new Date();
@@ -147,6 +150,16 @@ const Overview = ({
           </a>
           {` (as event log)`}
         </p>
+        <p style={{ fontFamily: "monospace" }}>
+          {`---Votes matrix: `}
+          <a
+            download={getDownloadFilename("participant-votes", conversation)}
+            href={`http://${window.location.hostname}/api/v3/reportExport/${report_id}/participant-votes.csv`}
+          >
+            {getDownloadFilename("participant-votes", conversation)}
+          </a>
+          {` (as comments x participants matrix)`}
+        </p>
         <div style={{ marginTop: "3em" }}>
           <p style={{ fontFamily: "monospace" }}>
             <strong>Public API endpoints (read only, Jupyter notebook friendly)</strong>
@@ -160,36 +173,36 @@ const Overview = ({
           <p style={{ fontFamily: "monospace" }}>
             {`$ curl http://${window.location.hostname}/api/v3/reportExport/${report_id}/votes.csv`}
           </p>
+          <p style={{ fontFamily: "monospace" }}>
+            {`$ curl http://${window.location.hostname}/api/v3/reportExport/${report_id}/participant-votes.csv`}
+          </p>
         </div>
-        {window.location.hostname === "pol.is" ||
-          (window.location.hostname === "localhost" && (
-            <div style={{ marginTop: "3em" }}>
-              <p style={{ fontFamily: "monospace" }}>
-                <strong>Attribution of Polis Data</strong>
-              </p>
-
-              <p style={{ fontFamily: "monospace" }}>
-                All Polis data is licensed under a Creative Commons Attribution 4.0 International
-                license: https://creativecommons.org/licenses/by/4.0/
-              </p>
-              <p style={{ fontFamily: "monospace" }}>
-                --------------- BEGIN STATEMENT ---------------
-              </p>
-              <p
-                style={{ fontFamily: "monospace" }}
-              >{`Data was gathered using the Polis software (see: compdemocracy.org/polis and github.com/compdemocracy/polis) and is sub-licensed
+        {doShowDataLicenseTerms && (
+          <div style={{ marginTop: "3em" }}>
+            <p style={{ fontFamily: "monospace" }}>
+              <strong>Attribution of Polis Data</strong>
+            </p>
+
+            <p style={{ fontFamily: "monospace" }}>
+              All Polis data is licensed under a Creative Commons Attribution 4.0 International
+              license: https://creativecommons.org/licenses/by/4.0/
+            </p>
+            <p style={{ fontFamily: "monospace" }}>
+              --------------- BEGIN STATEMENT ---------------
+            </p>
+            <p
+              style={{ fontFamily: "monospace" }}
+            >{`Data was gathered using the Polis software (see: compdemocracy.org/polis and github.com/compdemocracy/polis) and is sub-licensed
           under CC BY 4.0 with Attribution to The Computational Democracy Project. The data and more
           information about how the data was collected can be found at the following link: ${window.location.href}`}</p>
-              <p style={{ fontFamily: "monospace" }}>
-                --------------- END STATEMENT---------------
-              </p>
-              <p style={{ fontFamily: "monospace" }}>
-                For further information on best practices for Attribution of CC 4.0 licensed content
-                Please see:
-                https://wiki.creativecommons.org/wiki/Best_practices_for_attribution#Title.2C_Author.2C_Source.2C_License
-              </p>
-            </div>
-          ))}
+            <p style={{ fontFamily: "monospace" }}>--------------- END STATEMENT---------------</p>
+            <p style={{ fontFamily: "monospace" }}>
+              For further information on best practices for Attribution of CC 4.0 licensed content
+              Please see:
+              https://wiki.creativecommons.org/wiki/Best_practices_for_attribution#Title.2C_Author.2C_Source.2C_License
+            </p>
+          </div>
+        )}
       </div>
     </div>
   );
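
A note on the condition replaced in the hunk above: as far as can be read from the removed lines, the old expression had the form `a || (b && block)`, which evaluates to the boolean `true` when the hostname is "pol.is", and React renders nothing for a boolean child. The sketch below mimics both forms with a placeholder string in place of the JSX block; `LICENSE_BLOCK`, `oldRender`, and `newRender` are hypothetical names used only for illustration.

```typescript
// Placeholder for the JSX block; any truthy value behaves the same way here.
const LICENSE_BLOCK = "<license block>";

// Old form: `a || (b && block)`. When `a` is true, the result is `true`,
// not the block.
function oldRender(hostname: string): boolean | string {
  return hostname === "pol.is" || (hostname === "localhost" && LICENSE_BLOCK);
}

// New form: compute the flag first, then gate the block on it.
function newRender(hostname: string): boolean | string {
  const doShowDataLicenseTerms = ["pol.is", "preprod.pol.is", "localhost"].includes(
    hostname
  );
  return doShowDataLicenseTerms && LICENSE_BLOCK;
}

for (const host of ["pol.is", "preprod.pol.is", "localhost", "example.org"]) {
  console.log(host, "| old:", oldRender(host), "| new:", newRender(host));
}
```

Under this reading, the old form would have produced the block only on localhost, while the new flag covers pol.is, preprod.pol.is, and localhost alike.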

diff --git a/server/package-lock.json b/server/package-lock.json
diff --git a/server/package.json b/server/package.json
@@ -51,7 +51,7 @@
     "p3p": "~0.0.2",
     "pg": "~8.8.0",
     "pg-connection-string": "~2.5.0",
-    "pg-native": "~3.0.1",
+    "pg-query-stream": "^4.6.0",
     "replacestream": "~4.0.0",
     "request": "~2.88.2",
     "request-promise": "~4.2.6",