BigQuery not always refreshing cache when changes are made to SQL #9

richiecroker · 2019-03-05T10:59:49Z

I changed the SQL slightly, and the notebook continued to use the old cache, rather than rerunning.

OLD code:

sql = """
SELECT
  rx.month,
  rx.pct,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN items --calculate AAF items
      ELSE 0 END) AS aaf_items,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN actual_cost -- calculate AAF cost
      ELSE 0 END) AS aaf_cost,
  SUM(items) AS all_items,
  SUM(actual_cost) AS all_cost,
  100*IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN items
        ELSE 0 END), SUM(items)) AS aaf_percent_items,
  -- calculate AAF items proportion
  100*IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN actual_cost
        ELSE 0 END), SUM(actual_cost)) AS aaf_percent_cost  -- calculate AAF cost proportion
FROM
  hscic.normalised_prescribing_standard AS rx
JOIN
  measures.cmpa_products AS cmpa --join with CMPA products table
ON
  rx.bnf_code=cmpa.bnf_code
WHERE
  rx.bnf_code IN (
  SELECT
    bnf_code
  FROM
    measures.cmpa_products)
GROUP BY
  month,
  pct
ORDER BY
  month
"""

NEW code:

sql = """
SELECT
  rx.month,
  rx.pct,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN items --calculate AAF items
      ELSE 0 END) AS aaf_items,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN actual_cost -- calculate AAF cost
      ELSE 0 END) AS aaf_cost,
  SUM(items) AS all_items,
  SUM(actual_cost) AS all_cost,
  IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN items
        ELSE 0 END), SUM(items)) AS aaf_percent_items,
  -- calculate AAF items proportion
  IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN actual_cost
        ELSE 0 END), SUM(actual_cost)) AS aaf_percent_cost  -- calculate AAF cost proportion
FROM
  hscic.normalised_prescribing_standard AS rx
JOIN
  measures.cmpa_products AS cmpa --join with CMPA products table
ON
  rx.bnf_code=cmpa.bnf_code
WHERE
  rx.bnf_code IN (
  SELECT
    bnf_code
  FROM
    measures.cmpa_products)
GROUP BY
  month,
  pct
ORDER BY
  month
"""

sebbacon · 2019-03-05T14:22:03Z

Unable to reproduce this.

Whether a cached version should be used is recorded in hidden files. The SQL is "fingerprinted", and if a hidden file matching that fingerprint exists, the stored CSV is returned instead of querying the server.

You can see the specially named files in the repo. I have checked, and the first SQL query corresponds to the file .cmpa_df.csv.6ebcd2ab442da7ee86f70b71c1a6ec4c.tmp, and the second one corresponds to the file .cmpa_df.csv.bde52ffc25b31ee10b8620eaac866734.tmp.
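The fingerprint lookup described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual implementation: the names `fingerprint`, `marker_path` and `cached_read` are hypothetical, MD5 is only a guess based on the 32-character hex file names, and `run_query` is a stand-in callback for the real BigQuery call.

```python
import hashlib
from pathlib import Path


def fingerprint(sql: str) -> str:
    """Hash the SQL text. MD5 is an assumption, not confirmed by the thread."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()


def marker_path(csv_path: Path, sql: str) -> Path:
    """Hidden marker next to the CSV, shaped like .cmpa_df.csv.<fingerprint>.tmp."""
    return csv_path.with_name(f".{csv_path.name}.{fingerprint(sql)}.tmp")


def cached_read(sql: str, csv_path: Path, run_query) -> str:
    """Return the stored CSV text if a marker matches this SQL, else run the query."""
    marker = marker_path(csv_path, sql)
    if marker.exists() and csv_path.exists():
        return csv_path.read_text()
    csv_text = run_query(sql)  # hypothetical callback standing in for BigQuery
    csv_path.write_text(csv_text)
    marker.touch()
    return csv_text
```

Under this scheme any change to the SQL text, even removing `100*` from one expression, gives a different fingerprint and therefore a different marker file name.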

Therefore I can tell that when you ran it locally it did generate different SQL.

If you delete those hidden files, are you able to reproduce?
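For anyone wanting to try this, a small helper along these lines could remove the hidden files for one CSV. It is only a sketch: `clear_cache_markers` is a hypothetical name, and the glob pattern assumes the `.<csv name>.<fingerprint>.tmp` naming seen in the two files quoted above.

```python
from pathlib import Path


def clear_cache_markers(directory: Path, csv_name: str) -> list:
    """Delete hidden marker files for one CSV and return the names removed."""
    removed = []
    # Pattern assumed from the file names quoted in this thread.
    for marker in sorted(directory.glob(f".{csv_name}.*.tmp")):
        marker.unlink()
        removed.append(marker.name)
    return removed
```

Calling `clear_cache_markers(Path("."), "cmpa_df.csv")` in the notebook's directory would leave the CSV itself in place and remove only its markers.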


sebbacon · 2019-03-05T14:50:15Z

I've just pushed a fix for a bug whereby if you used sql1, then switched to sql2, then reverted to sql1, it would continue to show the results for sql2. Could this explain your issue?
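One way a sql1, sql2, sql1 sequence can go wrong is if a single CSV is shared by every marker: after sql2 overwrites the CSV, the old sql1 marker still exists and wrongly vouches for it. The sketch below guards against that by dropping stale markers on every write. This is a guess at the mechanism, not a description of the actual fix in 4e16fea; `store_result`, `is_cached` and the MD5 hash are all assumptions.

```python
import hashlib
from pathlib import Path


def _marker(csv_path: Path, sql: str) -> Path:
    # Hash choice is an assumption; the thread only shows 32-character hex names.
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    return csv_path.with_name(f".{csv_path.name}.{digest}.tmp")


def store_result(sql: str, csv_path: Path, csv_text: str) -> None:
    """Write the CSV and leave exactly one marker: the one for this SQL."""
    for stale in csv_path.parent.glob(f".{csv_path.name}.*.tmp"):
        stale.unlink()  # markers for earlier SQL no longer describe the CSV
    csv_path.write_text(csv_text)
    _marker(csv_path, sql).touch()


def is_cached(sql: str, csv_path: Path) -> bool:
    """True only if the CSV on disk was produced by this exact SQL."""
    return csv_path.exists() and _marker(csv_path, sql).exists()
```

With this approach, reverting to sql1 after running sql2 is treated as a cache miss and the query is rerun, at the cost of keeping only the most recent result on disk.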

richiecroker · 2019-03-05T16:47:05Z

yes, it could @sebbacon

richiecroker assigned sebbacon Mar 5, 2019

sebbacon added a commit that referenced this issue Mar 5, 2019

Ensure we don't falsely claim to have cached something

4e16fea

Possibly related to #9
