[Size reports] Facilitate a size database (#15045)
#### Problem

Because GitHub workflows have no shared storage, size reports use a convoluted scheme involving JSON artifacts. It has always been possible to download these artifacts separately and use the `gh_report.py` script to maintain a historical database outside of GitHub, but the method for doing this is awkward and obscure.

#### Change overview

- Split off parts of `gh_report.py` for reuse.
- Add a script to fetch GitHub artifacts and update a database.
- Add a script to encapsulate a few important database queries.
- Add documentation for the existing and new scripts.

#### Testing

Tested offline.
1 parent 5c57973, commit 91ea3d5

Showing 16 changed files with 1,567 additions and 612 deletions.
@@ -0,0 +1,180 @@
# Scripts for GitHub CI

A set of `gh_*.py` scripts work together to produce size comparisons for PRs.

## Reports on Pull Requests

The scripts' results are presented as comments on PRs.

**Note** that a comment may be updated by the scripts as CI run results become
available.

**Note** that the scripts will not create a comment for a commit if there is
already a newer commit in the PR.

A size report comment consists of a title followed by one to four tables. A
title looks like:

> PR #12345678: Size comparison from `base-SHA` to `pr-SHA`

The first table, if present, lists items with a large increase, according to a
configurable threshold.

The next table, if present, lists all items that have increased in size.

The next table, if present, lists all items that have decreased in size.

The final table, always present, lists all items.

## Usage in CI

The original intent was to have a tool that would run after a build in CI, add
its sizes to a central database, and immediately report on size changes from the
parent commit in the database. Unfortunately, GitHub provides no practical place
to store and share such a database between workflow actions. Instead, the
process is split; builds in CI record size information in the form of GitHub
[artifacts](https://docs.github.com/en/actions/advanced-guides/storing-workflow-data-as-artifacts),
and a later step reads these artifacts to generate reports.

### 1. Build workflows

#### gh_sizes_environment.py

The `gh_sizes_environment.py` script should be run once in each workflow that
records sizes, _after_ checkout and _before_ any use of `gh_sizes.py`. It takes a
single argument, a JSON dictionary of the `github` context. Typically run as:

```
steps:
  - name: Checkout
    uses: actions/checkout@v2
    with:
      submodules: true
  - name: Set up environment for size reports
    if: ${{ !env.ACT }}
    env:
      GH_CONTEXT: ${{ toJson(github) }}
    run: scripts/tools/memory/gh_sizes_environment.py "${GH_CONTEXT}"
```

#### gh_sizes.py

The `gh_sizes.py` script runs on a built binary (executable or library) and
produces a JSON file containing size information.

Usage: `gh_sizes.py` _platform_ _config_ _target_ _binary_ [_output_]

Where _platform_ is the platform name, corresponding to a config file in
`scripts/tools/memory/platform/`.

Where _config_ is a configuration identification string. This has no fixed
meaning, but is intended to describe a build variation, e.g. a particular target
board or debug vs release.

Where _target_ is a readable name for the build artifact, identifying it in
reports.

Where _binary_ is the input build artifact.

Where _output_ is the name for the output JSON file, or a directory for it, in
which case the name will be
_platform_`-`_config_name_`-`_target_name_`-sizes.json`.

Example:

```
scripts/tools/memory/gh_sizes.py \
  linux arm64 thermostat-no-ble \
  out/linux-arm64-thermostat-no-ble/thermostat-app \
  /tmp/bloat_reports/
```

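The _output_ rule above can be sketched in Python. This is only an illustration, not the code in `gh_sizes.py`: the helper name `resolve_output_path` is hypothetical, and it assumes that the names are joined unmodified and that "a directory" means a path that already exists as a directory.

```python
from pathlib import Path


def resolve_output_path(platform: str, config: str, target: str,
                        output: str) -> Path:
    """Return the path of the JSON size report.

    Hypothetical helper: if `output` is an existing directory, use the
    default file name inside it; otherwise treat `output` as the file name.
    """
    path = Path(output)
    if path.is_dir():
        # Default name: platform-config-target-sizes.json
        return path / f"{platform}-{config}-{target}-sizes.json"
    return path
```

Under these assumptions, the example command above would write `linux-arm64-thermostat-no-ble-sizes.json` inside `/tmp/bloat_reports/`, provided that directory exists.
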
#### Upload artifacts

The JSON files generated by `gh_sizes.py` must be uploaded with an artifact name
of a very specific form in order to be processed correctly.

Example:

```
Size,Linux-Examples,${{ env.GH_EVENT_PR }},${{ env.GH_EVENT_HASH }},${{ env.GH_EVENT_PARENT }},${{ github.event_name }}
```

Other builds must replace `Linux-Examples` with a label unique to the workflow,
but otherwise use the form exactly.

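To make the artifact name form concrete, here is a minimal Python sketch that builds and splits such a name. The field order follows the example above; the class and function names, and the field names `label`, `pr`, `commit`, `parent`, and `event`, are illustrative assumptions rather than the scripts' actual API.

```python
from typing import NamedTuple, Optional


class SizeArtifactName(NamedTuple):
    """Fields of a size artifact name (names chosen for illustration)."""
    label: str   # workflow-unique label, e.g. "Linux-Examples"
    pr: str      # value of GH_EVENT_PR
    commit: str  # value of GH_EVENT_HASH
    parent: str  # value of GH_EVENT_PARENT
    event: str   # value of github.event_name


def format_artifact_name(info: SizeArtifactName) -> str:
    """Join the fields with commas after the fixed 'Size' prefix."""
    return ",".join(("Size",) + tuple(info))


def parse_artifact_name(name: str) -> Optional[SizeArtifactName]:
    """Split an artifact name; return None if it does not have the form."""
    parts = name.split(",")
    if len(parts) != 6 or parts[0] != "Size":
        return None
    return SizeArtifactName(*parts[1:])
```

A name that does not start with `Size` or does not have exactly six comma-separated fields is rejected here, which reflects why the form has to be used exactly.
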
### 2. Reporting workflow

Run a periodic workflow calling `gh_report.py` to generate PR comments. This
script has full `--help`, but normal use is probably best illustrated by an
example:

```
scripts/tools/memory/gh_report.py \
  --verbose \
  --report-increases 0.2 \
  --report-pr \
  --github-comment \
  --github-limit-artifact-pages 50 \
  --github-limit-artifacts 500 \
  --github-limit-comments 20 \
  --github-repository project-chip/connectedhomeip \
  --github-api-token "${{ secrets.GITHUB_TOKEN }}"
```

Notably, the `--report-increases` flag provides a _percent growth_ threshold for
calling out ‘large’ increases in GitHub comments.

When this script successfully posts a comment on a GitHub PR, it removes the
corresponding PR artifact(s) so that a future run will not process it again and
post the same comment. Only PR artifacts are removed, not push (trunk)
artifacts, since those may be used as a comparison base by many different PRs.

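As a rough illustration of a percent growth threshold such as `--report-increases 0.2`, the following sketch flags a size change as ‘large’. It is not taken from `gh_report.py`: the function names, the strict `>` comparison, and the handling of a zero base size are all assumptions.

```python
def percent_growth(old_size: int, new_size: int) -> float:
    """Return the size change from old to new as a percentage of old."""
    if old_size == 0:
        # Assumption: growth from nothing counts as unbounded.
        return float("inf") if new_size > 0 else 0.0
    return 100.0 * (new_size - old_size) / old_size


def is_large_increase(old_size: int, new_size: int,
                      threshold_percent: float = 0.2) -> bool:
    """True if growth exceeds the threshold (strict comparison assumed)."""
    return percent_growth(old_size, new_size) > threshold_percent
```

With a threshold of 0.2, a section growing from 1000 to 1003 bytes (0.3%) would be called out, while growth to 1001 bytes (0.1%) would not.
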
## Using a database

It can be useful to keep a permanent record of build sizes.

### Updating the database: `gh_db_load.py`

To update an SQLite file of trunk commit sizes, periodically run:

```
gh_db_load.py \
  --repo project-chip/connectedhomeip \
  --token ghp_ThIsIsNoTMyReAlGiThUbToKeNSoDoNoTtRy \
  --db /path/to/database
```

Those interested in only a single platform can add the `--github-label` option,
providing the same name as in the size artifact name after `Size,` (e.g.
`Linux-Examples` in the upload example above).

See `--help` for additional options.

_Note_: Transient 4xx and 5xx errors from GitHub's API are very common. Run
`gh_db_load.py` frequently enough to give it several attempts before the
relevant artifacts expire.

### Querying the database: `gh_db_query.py`

While the database can of course be used directly, the `gh_db_query.py` script
provides a handful of common queries.

Note that this script (like others that show tables) has an `--output-format`
option offering (among others) CSV, several JSON formats, and any text format
provided by [tabulate](https://pypi.org/project/tabulate/).

Two notable options:

- `--query-build-sizes PLATFORM,CONFIG,TARGET` lists sizes for all builds of
  the given kind, with a column for each section.
- `--query-section-changes PLATFORM,CONFIG,TARGET,SECTION` lists changes for
  the given section. The `--report-increases PERCENT` option limits this to
  changes over a given threshold (as is done for PR comments).

(To find out what PLATFORM, CONFIG, TARGET, and SECTION exist:
`--query-platforms`, then `--query-platform-targets=PLATFORM` and
`--query-platform-sections=PLATFORM`.)

See `--help` for additional options.
@@ -0,0 +1,113 @@
#!/usr/bin/env python3
#
# Copyright (c) 2021 Project CHIP Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""Fetch data from GitHub size artifacts."""

import io
import logging
import sys

import memdf.sizedb
import memdf.util.config
import memdf.util.markdown
import memdf.util.sqlite
from memdf.util.github import Gh
from memdf import Config, ConfigDescription

GITHUB_CONFIG: ConfigDescription = {
    Config.group_def('github'): {
        'title': 'github options',
    },
    'github.event': {
        'help': 'Download only event type(s) (default ‘push’)',
        'metavar': 'EVENT',
        'default': [],
        'argparse': {
            'alias': ['--event']
        },
    },
    'github.limit-artifacts': {
        'help': 'Download no more than COUNT artifacts',
        'metavar': 'COUNT',
        'default': 0,
        'argparse': {
            'type': int,
        },
    },
    'github.label': {
        'help': 'Download artifacts for one label only',
        'metavar': 'LABEL',
        'default': '',
    },
}


def main(argv):
    status = 0
    try:
        sqlite_config = memdf.util.sqlite.CONFIG
        sqlite_config['database.file']['argparse']['required'] = True

        config = Config().init({
            **memdf.util.config.CONFIG,
            **memdf.util.github.CONFIG,
            **sqlite_config,
            **GITHUB_CONFIG,
        })
        config.argparse.add_argument('inputs', metavar='FILE', nargs='*')
        config.parse(argv)

        db = memdf.sizedb.SizeDatabase(config['database.file']).open()

        if gh := Gh(config):

            artifact_limit = config['github.limit-artifacts']
            artifacts_added = 0
            events = config['github.event']
            if not events:
                events = ['push']
            for a in gh.get_size_artifacts(label=config['github.label']):
                if events and a.event not in events:
                    logging.debug('Skipping %s artifact %d', a.event, a.id)
                    continue
                cur = db.execute('SELECT id FROM build WHERE artifact = ?',
                                 (a.id,))
                if cur.fetchone():
                    logging.debug('Skipping known artifact %d', a.id)
                    continue
                blob = gh.download_artifact(a.id)
                if blob:
                    logging.info('Adding artifact %d %s %s %s %s',
                                 a.id, a.commit[:12], a.pr, a.event, a.group)
                    db.add_sizes_from_zipfile(io.BytesIO(blob),
                                              {'artifact': a.id})
                    db.commit()
                    artifacts_added += 1
                    if artifact_limit and artifact_limit <= artifacts_added:
                        break

        for filename in config['args.inputs']:
            db.add_sizes_from_file(filename)
            db.commit()

    except Exception as exception:
        raise exception

    return status


if __name__ == '__main__':
    sys.exit(main(sys.argv))
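
The "skip known artifact" check in `main` can be tried in isolation with the standard `sqlite3` module. The sketch below is a stand-in, not the `memdf.sizedb.SizeDatabase` implementation: only the `build` table name and its `id` and `artifact` columns come from the query in the script, while the `commit_hash` column and the helper names `open_demo_db` and `add_if_new` are hypothetical.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a minimal, assumed `build` table."""
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE build ('
               'id INTEGER PRIMARY KEY, artifact INTEGER, commit_hash TEXT)')
    return db


def add_if_new(db: sqlite3.Connection, artifact_id: int,
               commit_hash: str) -> bool:
    """Record an artifact unless one with the same ID is already stored."""
    cur = db.execute('SELECT id FROM build WHERE artifact = ?',
                     (artifact_id,))
    if cur.fetchone():
        return False  # Known artifact: skip it, as the loader does.
    db.execute('INSERT INTO build (artifact, commit_hash) VALUES (?, ?)',
               (artifact_id, commit_hash))
    db.commit()
    return True
```

Because the check is keyed on the artifact ID, running the loader repeatedly over the same artifacts should add each one at most once, which is what makes frequent periodic runs safe.
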