[Size reports] Facilitate a size database (#15045)
#### Problem

Because GitHub workflows have no shared storage, size reports use a
convoluted scheme involving JSON artifacts. It has always been possible
to download these artifacts separately and use the `gh_report.py` script
to maintain a historical database outside of GitHub, but the method for
doing this is awkward and obscure.

#### Change overview

- Split off parts of `gh_report.py` for reuse.
- Add a script to fetch GitHub artifacts and update a database.
- Add a script to encapsulate a few important database queries.
- Add documentation for the existing and new scripts.

#### Testing

Tested offline.
kpschoedel authored Feb 11, 2022
1 parent 5c57973 commit 91ea3d5
Showing 16 changed files with 1,567 additions and 612 deletions.
7 changes: 7 additions & 0 deletions .github/.wordlist.txt
@@ -455,6 +455,9 @@ GetDeviceInfo
GetDns
GetIP
getstarted
GH
gh
ghp
githubusercontent
gitignore
glibc
@@ -848,6 +851,7 @@ PyEval
PyFunction
pylint
PyObject
pypi
PyRun
pytest
QEMU
@@ -958,6 +962,7 @@ SiLabs
SiliconLabs
SimpleFileExFlags
SimpleLink
sizedb
sl
SLAAC
SLTB
@@ -1041,6 +1046,7 @@ testws
texinfo
textboxes
TFT
ThIsIsNoTMyReAlGiThUbToKeNSoDoNoTtRy
threadOperationalDataset
ThreadStackManager
ThreadStackManagerImpl
@@ -1052,6 +1058,7 @@ TLV
tmp
tngvndl
TODO
toJson
tokenized
toolchain
toolchains
4 changes: 2 additions & 2 deletions scripts/tools/memory/.pylintrc
@@ -1,7 +1,7 @@
[BASIC]
-disable=too-few-public-methods,bad-whitespace
+disable=too-few-public-methods,bad-whitespace,broad-except

-no-docstring-rgx=main
+no-docstring-rgx=main|__init__
docstring-min-length=5
min-public-methods=1
max-args=7
180 changes: 180 additions & 0 deletions scripts/tools/memory/README-GitHub-CI.md
@@ -0,0 +1,180 @@
# Scripts for GitHub CI

A set of `gh_*.py` scripts work together to produce size comparisons for PRs.

## Reports on Pull Requests

The scripts' results are presented as comments on PRs.

**Note** that a comment may be updated by the scripts as CI run results become
available.

**Note** that the scripts will not create a comment for a commit if there is
already a newer commit in the PR.

A size report comment consists of a title followed by one to four tables. A
title looks like:

> PR #12345678: Size comparison from `base-SHA` to `pr-SHA`

The first table, if present, lists items with a large increase, according to a
configurable threshold.

The next table, if present, lists all items that have increased in size.

The next table, if present, lists all items that have decreased in size.

The final table, always present, lists all items.
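The choice of rows for each table can be sketched in Python as follows. This is an illustrative sketch, not the scripts' actual code. The `Change` record and `split_tables` function are hypothetical. The threshold is assumed to be a percent growth over the base size, matching the description of `--report-increases` under the reporting workflow.

```python
from typing import Dict, List, NamedTuple


class Change(NamedTuple):
    """One row of a hypothetical size comparison (not the scripts' type)."""
    name: str
    base: int  # size in the base commit, in bytes
    pr: int    # size in the PR commit, in bytes

    @property
    def growth_percent(self) -> float:
        # Percent change relative to the base size; a new item counts as +inf.
        if self.base == 0:
            return float('inf') if self.pr > 0 else 0.0
        return 100.0 * (self.pr - self.base) / self.base


def split_tables(changes: List[Change],
                 threshold_percent: float) -> Dict[str, List[Change]]:
    """Group rows into the four report tables (illustrative only)."""
    return {
        # Only increases whose percent growth exceeds the threshold.
        'large_increases': [c for c in changes if c.pr > c.base
                            and c.growth_percent > threshold_percent],
        'increases': [c for c in changes if c.pr > c.base],
        'decreases': [c for c in changes if c.pr < c.base],
        'all': list(changes),
    }
```

For example, an item growing from 1000 to 1010 bytes grows by 1%. With a threshold of 0.2 it would therefore appear in both the large-increase table and the increase table.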

## Usage in CI

The original intent was to have a tool that would run after a build in CI, add
its sizes to a central database, and immediately report on size changes from the
parent commit in the database. Unfortunately, GitHub provides no practical place
to store and share such a database between workflow actions. Instead, the
process is split; builds in CI record size information in the form of GitHub
[artifacts](https://docs.github.com/en/actions/advanced-guides/storing-workflow-data-as-artifacts),
and a later step reads these artifacts to generate reports.

### 1. Build workflows

#### gh_sizes_environment.py

The `gh_sizes_environment.py` script should be run once in each workflow that
records sizes, _after_ checkout and _before_ any use of `gh_sizes.py`. It takes a
single argument, a JSON dictionary of the `github` context. It is typically run
as:

```
steps:
  - name: Checkout
    uses: actions/checkout@v2
    with:
      submodules: true
  - name: Set up environment for size reports
    if: ${{ !env.ACT }}
    env:
      GH_CONTEXT: ${{ toJson(github) }}
    run: scripts/tools/memory/gh_sizes_environment.py "${GH_CONTEXT}"
```

#### gh_sizes.py

The `gh_sizes.py` script runs on a built binary (executable or library) and
produces a JSON file containing size information.

Usage: `gh_sizes.py` _platform_ _config_ _target_ _binary_ [_output_]

Where:

- _platform_ is the platform name, corresponding to a config file in
  `scripts/tools/memory/platform/`.
- _config_ is a configuration identification string. This has no fixed meaning,
  but is intended to describe a build variation, e.g. a particular target board
  or debug vs. release.
- _target_ is a readable name for the build artifact, identifying it in reports.
- _binary_ is the input build artifact.
- _output_ is the name of the output JSON file, or of a directory to hold it, in
  which case the file is named
  _platform_`-`_config_`-`_target_`-sizes.json`.
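Here is a sketch of how the _output_ argument might be resolved. It assumes that a trailing path separator or an existing directory marks _output_ as a directory. The `sizes_output_path` helper is hypothetical, and the real `gh_sizes.py` may decide differently.

```python
import os


def sizes_output_path(platform: str, config: str, target: str,
                      output: str) -> str:
    """Hypothetical helper: resolve the _output_ argument of gh_sizes.py.

    A trailing separator or an existing directory is treated as a directory,
    and the file name is built from the other arguments. Anything else is
    used unchanged as the file name.
    """
    if output.endswith(('/', os.sep)) or os.path.isdir(output):
        name = f'{platform}-{config}-{target}-sizes.json'
        return os.path.join(output, name)
    return output
```

With the example arguments that follow, this sketch would write `/tmp/bloat_reports/linux-arm64-thermostat-no-ble-sizes.json`.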

Example:

```
scripts/tools/memory/gh_sizes.py \
    linux arm64 thermostat-no-ble \
    out/linux-arm64-thermostat-no-ble/thermostat-app \
    /tmp/bloat_reports/
```

#### Upload artifacts

The JSON files generated by `gh_sizes.py` must be uploaded with an artifact name
of a very specific form in order to be processed correctly.

Example:

```
Size,Linux-Examples,${{ env.GH_EVENT_PR }},${{ env.GH_EVENT_HASH }},${{ env.GH_EVENT_PARENT }},${{ github.event_name }}
```

Other builds must replace `Linux-Examples` with a label unique to the workflow,
but otherwise use the form exactly.
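The comma-separated name appears to carry the metadata that later steps read back: PR number, commit, parent commit, and event. That is why the form must be followed exactly. Below is a hypothetical parser for the form above. The `SizeArtifactName` fields are named after the workflow variables, and the real scripts' parsing may differ.

```python
from typing import NamedTuple


class SizeArtifactName(NamedTuple):
    label: str   # e.g. 'Linux-Examples'
    pr: str      # ${{ env.GH_EVENT_PR }}
    commit: str  # ${{ env.GH_EVENT_HASH }}
    parent: str  # ${{ env.GH_EVENT_PARENT }}
    event: str   # ${{ github.event_name }}, e.g. 'push' or 'pull_request'


def parse_size_artifact_name(name: str) -> SizeArtifactName:
    """Split 'Size,LABEL,PR,HASH,PARENT,EVENT' into its fields (sketch)."""
    parts = name.split(',')
    if len(parts) != 6 or parts[0] != 'Size':
        raise ValueError(f'not a size artifact name: {name!r}')
    return SizeArtifactName(*parts[1:])
```

In this sketch, a label containing a comma would shift every later field and fail the length check. That is one more reason to keep labels free of commas.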

### 2. Reporting workflow

Run a periodic workflow calling `gh_report.py` to generate PR comments. This
script documents all of its options under `--help`, but normal use is best
illustrated by an example:

```
scripts/tools/memory/gh_report.py \
    --verbose \
    --report-increases 0.2 \
    --report-pr \
    --github-comment \
    --github-limit-artifact-pages 50 \
    --github-limit-artifacts 500 \
    --github-limit-comments 20 \
    --github-repository project-chip/connectedhomeip \
    --github-api-token "${{ secrets.GITHUB_TOKEN }}"
```

Notably, the `--report-increases` flag provides a _percent growth_ threshold for
calling out ‘large’ increases in GitHub comments.

When this script successfully posts a comment on a GitHub PR, it removes the
corresponding PR artifact(s) so that a future run will not process them again
and post the same comment. Only PR artifacts are removed, not push (trunk)
artifacts, since those may be used as a comparison base by many different PRs.

## Using a database

It can be useful to keep a permanent record of build sizes.

### Updating the database: `gh_db_load.py`

To update an SQLite file of trunk commit sizes, periodically run:

```
gh_db_load.py \
    --repo project-chip/connectedhomeip \
    --token ghp_ThIsIsNoTMyReAlGiThUbToKeNSoDoNoTtRy \
    --db /path/to/database
```

Those interested in only a single platform can add the `--github-label` option,
providing the same name as in the size artifact name after `Size,` (e.g.
`Linux-Examples` in the upload example above).

See `--help` for additional options.

_Note_: Transient 4xx and 5xx errors from GitHub's API are very common. Run
`gh_db_load.py` frequently enough to give it several attempts before the
relevant artifacts expire.
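Frequent runs are cheap and safe because `gh_db_load.py` looks up each artifact ID in the `build` table before downloading it. It also commits after each artifact it adds. The sketch below shows the same pattern with the standard `sqlite3` module. The two-table schema and the `load_artifact` function are hypothetical; only the `build` table's `id` and `artifact` columns appear in the real script.

```python
import sqlite3
from typing import Dict


def load_artifact(db: sqlite3.Connection, artifact_id: int,
                  sizes: Dict[str, int]) -> bool:
    """Add one artifact's section sizes unless it is already recorded.

    Returns True if rows were added. `sizes` maps a section name to bytes.
    """
    cur = db.execute('SELECT id FROM build WHERE artifact = ?', (artifact_id,))
    if cur.fetchone():
        return False  # loaded on an earlier run; skip it
    build_id = db.execute('INSERT INTO build (artifact) VALUES (?)',
                          (artifact_id,)).lastrowid
    db.executemany(
        'INSERT INTO size (build_id, section, size) VALUES (?, ?, ?)',
        [(build_id, section, size) for section, size in sizes.items()])
    db.commit()  # commit per artifact so a later failure keeps earlier work
    return True


# Hypothetical minimal schema in an in-memory database.
db = sqlite3.connect(':memory:')
db.executescript('''
    CREATE TABLE build (id INTEGER PRIMARY KEY, artifact INTEGER UNIQUE);
    CREATE TABLE size (build_id INTEGER, section TEXT, size INTEGER);
''')
```

Calling `load_artifact` twice with the same ID adds rows only once. A retry after a transient GitHub error therefore does no harm.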

### Querying the database: `gh_db_query.py`

While the database can of course be used directly, the `gh_db_query.py` script
provides a handful of common queries.

Note that this script (like others that show tables) has an `--output-format`
option offering (among others) CSV, several JSON formats, and any text format
provided by [tabulate](https://pypi.org/project/tabulate/).

Two notable options:

- `--query-build-sizes PLATFORM,CONFIG,TARGET` lists sizes for all builds of
  the given kind, with a column for each section.
- `--query-section-changes PLATFORM,CONFIG,TARGET,SECTION` lists changes for
  the given section. The `--report-increases PERCENT` option limits this to
  changes over a given threshold (as is done for PR comments).

To find out which values of PLATFORM, CONFIG, TARGET, and SECTION exist, use
`--query-platforms`, then `--query-platform-targets=PLATFORM` and
`--query-platform-sections=PLATFORM`.
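The following gives a rough idea of what a section-change query computes, using the standard `sqlite3` module. Several things here are assumptions for illustration, not `gh_db_query.py`'s actual implementation:

- the schema: a `build` table with `platform`, `config`, and `target` columns, and a `size` table keyed by `build_id` and `section`;
- the use of the build `id` as commit order;
- the `section_changes` function itself.

```python
import sqlite3
from typing import List, Optional, Tuple


def section_changes(db: sqlite3.Connection, platform: str, config: str,
                    target: str, section: str,
                    threshold: Optional[float] = None
                    ) -> List[Tuple[int, int, int, float]]:
    """Return (build_id, old_size, new_size, percent) between successive builds.

    With `threshold` set, keep only increases above that percent growth,
    mirroring the --report-increases idea; otherwise keep every change.
    """
    rows = db.execute(
        '''SELECT b.id, s.size FROM build b JOIN size s ON s.build_id = b.id
           WHERE b.platform = ? AND b.config = ? AND b.target = ?
             AND s.section = ?
           ORDER BY b.id''',
        (platform, config, target, section)).fetchall()
    changes = []
    for (_, old), (build_id, new) in zip(rows, rows[1:]):
        if new == old:
            continue  # no change between these two builds
        percent = 100.0 * (new - old) / old if old else float('inf')
        if threshold is None or percent > threshold:
            changes.append((build_id, old, new, percent))
    return changes
```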

See `--help` for additional options.
5 changes: 3 additions & 2 deletions scripts/tools/memory/README.md
@@ -41,14 +41,15 @@ The following options are common to _most_ of the scripts, where applicable:
- `--output-format` _FORMAT_, `--to` _FORMAT_, `-t` _FORMAT_ Output format.
One of:
- `text` — Plain text tables, in a single file.
-- `csv` — Comma-separated tables (in several files).
-- `tsv` — Tab-separated tables (in several files).
+- `csv` — Comma-separated tables (in several files, if not stdout).
+- `tsv` — Tab-separated tables (in several files, if not stdout).
- `json_split` — JSON - see Pandas documentation for details.
- `json_records` — JSON - see Pandas documentation for details.
- `json_index` — JSON - see Pandas documentation for details.
- `json_columns` — JSON - see Pandas documentation for details.
- `json_values` — JSON - see Pandas documentation for details.
- `json_table` — JSON - see Pandas documentation for details.
- Any format provided by [tabulate](https://pypi.org/project/tabulate/).
- `--report-limit` _BYTES_, `--limit` _BYTES_ Limit display to items above the
given size. Suffixes (e.g. `K`) are accepted.
- `--report-by` _GROUP_, `--by` _GROUP_ Reporting group. One of:
113 changes: 113 additions & 0 deletions scripts/tools/memory/gh_db_load.py
@@ -0,0 +1,113 @@
#!/usr/bin/env python3
#
# Copyright (c) 2021 Project CHIP Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
"""Fetch data from GitHub size artifacts."""

import io
import logging
import sys

import memdf.sizedb
import memdf.util.config
import memdf.util.markdown
import memdf.util.sqlite
from memdf.util.github import Gh
from memdf import Config, ConfigDescription

GITHUB_CONFIG: ConfigDescription = {
    Config.group_def('github'): {
        'title': 'github options',
    },
    'github.event': {
        'help': 'Download only event type(s) (default ‘push’)',
        'metavar': 'EVENT',
        'default': [],
        'argparse': {
            'alias': ['--event']
        },
    },
    'github.limit-artifacts': {
        'help': 'Download no more than COUNT artifacts',
        'metavar': 'COUNT',
        'default': 0,
        'argparse': {
            'type': int,
        },
    },
    'github.label': {
        'help': 'Download artifacts for one label only',
        'metavar': 'LABEL',
        'default': '',
    },
}


def main(argv):
    status = 0
    try:
        sqlite_config = memdf.util.sqlite.CONFIG
        sqlite_config['database.file']['argparse']['required'] = True

        config = Config().init({
            **memdf.util.config.CONFIG,
            **memdf.util.github.CONFIG,
            **sqlite_config,
            **GITHUB_CONFIG,
        })
        config.argparse.add_argument('inputs', metavar='FILE', nargs='*')
        config.parse(argv)

        db = memdf.sizedb.SizeDatabase(config['database.file']).open()

        if gh := Gh(config):

            artifact_limit = config['github.limit-artifacts']
            artifacts_added = 0
            events = config['github.event']
            if not events:
                events = ['push']
            for a in gh.get_size_artifacts(label=config['github.label']):
                if events and a.event not in events:
                    logging.debug('Skipping %s artifact %d', a.event, a.id)
                    continue
                cur = db.execute('SELECT id FROM build WHERE artifact = ?',
                                 (a.id,))
                if cur.fetchone():
                    logging.debug('Skipping known artifact %d', a.id)
                    continue
                blob = gh.download_artifact(a.id)
                if blob:
                    logging.info('Adding artifact %d %s %s %s %s',
                                 a.id, a.commit[:12], a.pr, a.event, a.group)
                    db.add_sizes_from_zipfile(io.BytesIO(blob),
                                              {'artifact': a.id})
                    db.commit()
                    artifacts_added += 1
                    if artifact_limit and artifact_limit <= artifacts_added:
                        break

        for filename in config['args.inputs']:
            db.add_sizes_from_file(filename)
        db.commit()

    except Exception as exception:
        raise exception

    return status


if __name__ == '__main__':
    sys.exit(main(sys.argv))