Added rustc_version to user agent data #10

Merged: 2 commits merged into pypi:main on Mar 23, 2022

alex commented May 21, 2021

Added to pip in pypa/pip#9987

alex commented May 21, 2021

Failing CI appears unrelated to the contents of the PR.

di commented May 22, 2021

It's due to pypa/pip#9644; #11 will fix it for now.

alex commented May 22, 2021

Ah, ok. Will rebase after that's merged.

di commented May 24, 2021

Looks like the tests need to be updated here. We'll also need to update the BigQuery table schema to add the new column before we can merge this.

alex commented May 24, 2021

Is the BigQuery table schema in VCS somewhere I can send a PR, or is that something an admin does on the backend?

alex commented Mar 12, 2022

@di lost track of this and now following up, is there anything I can do to help move the ball forward here? (Happy to rebase if that's the next step!)

di commented Mar 15, 2022

Sorry to leave you hanging here:

> Is the BigQuery table schema in VCS somewhere I can send a PR, or is that something an admin does on the backend?

It is not; this is something that an admin would do on the backend.

> @di lost track of this and now following up, is there anything I can do to help move the ball forward here? (Happy to rebase if that's the next step!)

I think the next steps would be:

  • rebase and add a test that includes a non-null rustc_version here
  • get @ewdurbin and me to review & approve
  • admins update the schema
  • merge this
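
The first step above asks for a test with a non-null rustc_version. A minimal sketch of what such a check could look like follows. It assumes a pip-style user agent made of `pip/<version>` followed by a JSON object; the helper name `parse_rustc_version` and the sample strings are hypothetical and are not taken from this repository's parser or test suite.

```python
import json
from typing import Optional


def parse_rustc_version(user_agent: str) -> Optional[str]:
    """Return the rustc_version reported in a pip-style user agent, if any.

    Hypothetical helper for illustration; not this project's parser.
    """
    # Assumed layout: "pip/<version> <json blob>".
    _, _, blob = user_agent.partition(" ")
    try:
        data = json.loads(blob)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("rustc_version")


# Made-up sample user agents, one with and one without the field.
with_rustc = 'pip/22.0.4 {"installer": {"name": "pip"}, "rustc_version": "1.59.0"}'
without_rustc = 'pip/20.0.2 {"installer": {"name": "pip"}}'

assert parse_rustc_version(with_rustc) == "1.59.0"
assert parse_rustc_version(without_rustc) is None
```

Older clients that never send the field fall through to `None`, which would correspond to the null value in the new column.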

alex commented Mar 15, 2022

Will get this updated later today!

alex commented Mar 15, 2022

Ok, rebased and test case added.

alex commented Mar 15, 2022

Unfortunately, CI seems to be busted: pip install requirements.txt doesn't work.

@di di requested a review from ewdurbin March 15, 2022 18:32

di commented Mar 15, 2022

Before merging we need to add a new nested column to the details RECORD, which can't be done via the console: https://cloud.google.com/bigquery/docs/managing-table-schemas#adding_a_nested_column_to_a_record

alex commented Mar 15, 2022

@di @ewdurbin I assume that's all on your side, and not something I can do.

@ewdurbin

Tried to do this, seems I don't have permissions on the public dataset :)

ewdurbin@cloudshell:~$ bq show --schema --format=prettyjson bigquery-public-data:pypi.simple_requests > simple_requests-schema.json
ewdurbin@cloudshell:~$ bq show --schema --format=prettyjson bigquery-public-data:pypi.file_downloads > file_downloads-schema.json
ewdurbin@cloudshell:~$ cp simple_requests-schema.json simple_requests-schema-orig.json
ewdurbin@cloudshell:~$ cp file_downloads-schema.json file_downloads-schema-orig.json
ewdurbin@cloudshell:~$ vim simple_requests-schema.json
ewdurbin@cloudshell:~$ vim file_downloads-schema.json
ewdurbin@cloudshell:~$ diff -C 4 simple_requests-schema-orig.json simple_requests-schema.json 
*** simple_requests-schema-orig.json    2022-03-17 09:54:08.037747262 +0000
--- simple_requests-schema.json 2022-03-17 09:54:50.337703919 +0000
***************
*** 108,115 ****
--- 108,119 ----
        },
        {
          "name": "setuptools_version",
          "type": "STRING"
+       },
+       {
+         "name": "rustc_version",
+         "type": "STRING"
        }
      ],
      "name": "details",
      "type": "RECORD"
ewdurbin@cloudshell:~$ diff -C 4 file_downloads-schema-orig.json file_downloads-schema.json
*** file_downloads-schema-orig.json     2022-03-17 09:54:16.387738706 +0000
--- file_downloads-schema.json  2022-03-17 09:55:05.189688701 +0000
***************
*** 131,138 ****
--- 131,142 ----
        },
        {
          "name": "setuptools_version",
          "type": "STRING"
+       },
+       {
+         "name": "rustc_version",
+         "type": "STRING"
        }
      ],
      "name": "details",
      "type": "RECORD"
ewdurbin@cloudshell:~$ bq update bigquery-public-data:pypi.simple_requests simple_requests-schema.json
BigQuery error in update operation: Access Denied: Table bigquery-public-data:pypi.simple_requests: Permission bigquery.tables.update denied on table bigquery-public-data:pypi.simple_requests (or it may not exist).
ewdurbin@cloudshell:~$ bq update bigquery-public-data:pypi.file_downloads file_downloads-schema.json
BigQuery error in update operation: Access Denied: Table bigquery-public-data:pypi.file_downloads: Permission bigquery.tables.update denied on table bigquery-public-data:pypi.file_downloads (or it may not exist).
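
The manual schema edit shown in the diffs above could also be scripted. The following is a minimal sketch, assuming the schema file is a JSON list of column objects in which a RECORD column carries its sub-fields under a `fields` key, as in the output of `bq show --schema --format=prettyjson`. The helper name `add_nested_field` and the trimmed sample schema are hypothetical.

```python
import copy
import json


def add_nested_field(schema, record_name, field_name, field_type="STRING"):
    """Return a copy of `schema` with a new field appended to a RECORD column."""
    updated = copy.deepcopy(schema)
    for column in updated:
        if column.get("name") == record_name and column.get("type") == "RECORD":
            fields = column.setdefault("fields", [])
            # Skip the append if the field is already present.
            if not any(f.get("name") == field_name for f in fields):
                fields.append({"name": field_name, "type": field_type})
            return updated
    raise ValueError(f"no RECORD column named {record_name!r}")


# Trimmed, illustrative schema; the real tables have many more columns.
schema = [
    {"name": "timestamp", "type": "TIMESTAMP"},
    {
        "name": "details",
        "type": "RECORD",
        "fields": [{"name": "setuptools_version", "type": "STRING"}],
    },
]

new_schema = add_nested_field(schema, "details", "rustc_version")
print(json.dumps(new_schema, indent=2))
```

The function leaves its input untouched, so the original schema stays available for a diff like the ones above. Writing the result to a file and passing it to `bq update` would still require the table permissions that were missing here.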

@ewdurbin

Schema files for reference:

file_downloads-schema.json.txt
simple_requests-schema.json.txt

di commented Mar 17, 2022

Yeah, this needs to be done by the public datasets team. I've reached out to them and directed them to this issue as well.

alex commented Mar 17, 2022

Thanks @di and @ewdurbin !

di commented Mar 22, 2022

The rustc_version field has been added to the schemas:

dustin_ingram@cloudshell:~$ bq show bigquery-public-data:pypi.simple_requests | grep rustc_version
                    |  |- rustc_version: string
dustin_ingram@cloudshell:~$ bq show bigquery-public-data:pypi.file_downloads | grep rustc_version
                    |  |- rustc_version: string

I think we're good to merge this?
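
The grep above checks the human-readable `bq show` output. The same check could be made against the JSON form of the schema. This is a minimal sketch, assuming a schema shaped like the output of `bq show --schema --format=prettyjson`; the helper name `has_nested_field` and the sample data are hypothetical.

```python
def has_nested_field(schema, record_name, field_name):
    """Report whether a RECORD column in `schema` contains `field_name`."""
    for column in schema:
        if column.get("name") == record_name:
            nested = column.get("fields", [])
            return any(f.get("name") == field_name for f in nested)
    return False


# Trimmed, made-up schema; the real tables have many more columns.
schema = [
    {"name": "timestamp", "type": "TIMESTAMP"},
    {
        "name": "details",
        "type": "RECORD",
        "fields": [
            {"name": "setuptools_version", "type": "STRING"},
            {"name": "rustc_version", "type": "STRING"},
        ],
    },
]

found = has_nested_field(schema, "details", "rustc_version")
missing = has_nested_field(schema, "details", "no_such_field")
print(found, missing)
```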

alex commented Mar 22, 2022

Fantastic! Makes sense to me.

di commented Mar 23, 2022

Let's see what happens!

@di di merged commit 089f849 into pypi:main Mar 23, 2022
@alex alex deleted the patch-1 branch March 23, 2022 15:33

alex commented Mar 23, 2022

@di how long do you think it'd take before data either starts flowing, or errors start flowing 😬

di commented Mar 23, 2022

It's auto-deployed and pretty much immediately live in the dataset:

SELECT
  details.rustc_version,
  COUNT(*) AS total_downloads
FROM
  `bigquery-public-data.pypi.file_downloads`
WHERE
  DATE(timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND CURRENT_DATE()
GROUP BY
  details.rustc_version
ORDER BY
  total_downloads DESC

Row  rustc_version   total_downloads
  1  null                  904207816
  2  1.59.0                   992600
  3  1.58.0                    31051
  4  1.56.1                    15052
  5  1.58.1                    13760
  6  1.47.0                    13700
  7  1.57.0                     8510
  8  1.56.1                     4600
  9  1.52.1                     4078
 10  1.35.0                     3589
 11  1.48.0                     3536
 12  1.41.1                     2678
 13  1.57.0                     2328
 14  1.52.0                     2281
 15  1.54.0                     2042
 16  1.59.0                     1933
 17  1.44.0                     1779
 18  1.32.0                     1635
 19  1.41.0                     1623
 20  1.44.0                     1162
 21  1.55.0                     1127
 22  1.50.0                     1030
 23  1.61.0-nightly              863
 24  1.53.0                      735
 25  1.54.0                      712
 26  1.35.0                      656
 27  1.52.1                      531
 28  1.51.0                      528
 29  1.51.0                      469
 30  1.58.1                      423
 31  1.53.0                      367
 32  1.49.0                      343
 33  1.31.1                      328
 34  1.50.0                      282
 35  1.60.0-nightly              275
 36  1.47.0                      251
 37  1.56.0                      245
 38  1.52.0-nightly              210
 39  1.56.0-nightly              185
 40  1.58.0-nightly              183
 41  1.48.0                      154
 42  1.60.0-beta.6               150
 43  1.21.0                      111
 44  1.30.0                       96
 45  1.57.0-beta.3                85
 46  1.41.1                       82
 47  1.56.0                       81
 48  1.37.0                       76
 49  1.42.0                       67
 50  1.58.0                       63
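
Several version strings appear on more than one row of the result as reproduced here (1.59.0 on rows 2 and 16, for example); the reason is not visible in this thread. If a single total per version string is wanted, the rows can be collapsed after the fact. This is a minimal sketch over a few rows copied from the result above; the helper name `total_by_version` is hypothetical.

```python
from collections import defaultdict


def total_by_version(rows):
    """Sum download counts for rows that share the same version string."""
    totals = defaultdict(int)
    for version, downloads in rows:
        totals[version] += downloads
    return dict(totals)


# A few rows copied from the result above; None stands in for null.
rows = [
    (None, 904207816),
    ("1.59.0", 992600),
    ("1.58.0", 31051),
    ("1.59.0", 1933),
    ("1.58.0", 63),
]

totals = total_by_version(rows)
# Print the collapsed totals, largest first.
for version, downloads in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(version, downloads)
```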

alex commented Mar 23, 2022

Whoooo! Thank you @di and @ewdurbin for your help getting this to happen!
