Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery not always refreshing cache when changes are made to SQL #9

Open
richiecroker opened this issue Mar 5, 2019 · 3 comments
Open
Assignees

Comments

@richiecroker
Copy link
Contributor

I changed the SQL slightly, and the notebook continued to use the old cache, rather than rerunning.

OLD code

sql = """
SELECT
  rx.month,
  rx.pct,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN items --calculate AAF items
      ELSE 0 END) AS aaf_items,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN actual_cost -- calculate AAF cost
      ELSE 0 END) AS aaf_cost,
  SUM(items) AS all_items,
  SUM(actual_cost) AS all_cost,
  100*IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN items
        ELSE 0 END), SUM(items)) AS aaf_percent_items,
  -- calculate AAF items proportion
  100*IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN actual_cost
        ELSE 0 END), SUM(actual_cost)) AS aaf_percent_cost  -- calculate AAF cost proportion
FROM
  hscic.normalised_prescribing_standard AS rx
JOIN
  measures.cmpa_products AS cmpa --join with CPMA products table
ON
  rx.bnf_code=cmpa.bnf_code
WHERE
  rx.bnf_code IN (
  SELECT
    bnf_code
  FROM
    measures.cmpa_products)
GROUP BY
  month,
  pct
ORDER BY
  month

NEW code:

sql = """
SELECT
  rx.month,
  rx.pct,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN items --calculate AAF items
      ELSE 0 END) AS aaf_items,
  SUM(CASE
      WHEN cmpa.type = 'AAF' THEN actual_cost -- calculate AAF cost
      ELSE 0 END) AS aaf_cost,
  SUM(items) AS all_items,
  SUM(actual_cost) AS all_cost,
  IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN items
        ELSE 0 END), SUM(items)) AS aaf_percent_items,
  -- calculate AAF items proportion
  IEEE_DIVIDE(SUM(CASE
        WHEN cmpa.type = 'AAF' THEN actual_cost
        ELSE 0 END), SUM(actual_cost)) AS aaf_percent_cost  -- calculate AAF cost proportion
FROM
  hscic.normalised_prescribing_standard AS rx
JOIN
  measures.cmpa_products AS cmpa --join with CPMA products table
ON
  rx.bnf_code=cmpa.bnf_code
WHERE
  rx.bnf_code IN (
  SELECT
    bnf_code
  FROM
    measures.cmpa_products)
GROUP BY
  month,
  pct
ORDER BY
  month
@sebbacon
Copy link
Contributor

sebbacon commented Mar 5, 2019

Unable to reproduce this.

The fact that a cached version should be used is stored in hidden files. The SQL is "fingerprinted" and if a hidden file matching that fingerprint exists, the stored CSV is returned, instead of querying the server.

You can see the specially-named files in the repo. I have checked, and first SQL query corresponds with the file
.cmpa_df.csv.6ebcd2ab442da7ee86f70b71c1a6ec4c.tmp, and the second one corresponds with the file .cmpa_df.csv.bde52ffc25b31ee10b8620eaac866734.tmp.

Therefore I can tell that when you ran it locally it did generate different SQL.

If you delete those hidden files, are you able to reproduce?

sebbacon added a commit that referenced this issue Mar 5, 2019
@sebbacon
Copy link
Contributor

sebbacon commented Mar 5, 2019

I've just pushed a fix to a bug whereby if you used sql1, then did sql2, then reverted to sql1, it would continue to show the results for sql2. Could this explain your issue?

@richiecroker
Copy link
Contributor Author

yes, it could @sebbacon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants