From 585deedd4657af9ec58f81c1b8c2fed6fb6d7e16 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 20 Feb 2024 12:32:50 +1100 Subject: [PATCH 1/7] Bump setuptools from 69.0.3 to 69.1.0 (#1671) Bumps [setuptools](https://github.com/pypa/setuptools) from 69.0.3 to 69.1.0. - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v69.0.3...v69.1.0) --- updated-dependencies: - dependency-name: setuptools dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- requirements.txt | 2 +- requirements_test.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/requirements.txt b/requirements.txt index 71f1534db..9a48a7f3d 100644 --- a/requirements.txt +++ b/requirements.txt @@ -11,6 +11,6 @@ lxml==5.1.0 multidict==6.0.5 packaging==23.2 pyparsing==3.1.1 -setuptools==69.0.3 +setuptools==69.1.0 six==1.16.0 yarl==1.9.4 diff --git a/requirements_test.txt b/requirements_test.txt index f967e0274..17c4f5673 100644 --- a/requirements_test.txt +++ b/requirements_test.txt @@ -9,7 +9,7 @@ pre-commit==3.6.0 pytest==7.4.4 pytest-asyncio==0.23.5 pytest-timeout==2.2.0 -setuptools==69.0.3 +setuptools==69.1.0 tox==4.12.1 types-filelock==3.2.7 types-freezegun==1.1.10 From 10f652c1352db959ee82a60769ead4ac72e87d34 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 20 Feb 2024 13:05:50 +1100 Subject: [PATCH 2/7] Bump flake8-bugbear from 24.1.17 to 24.2.6 (#1670) Bumps [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) from 24.1.17 to 24.2.6. 
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases) - [Commits](https://github.com/PyCQA/flake8-bugbear/compare/24.1.17...24.2.6) --- updated-dependencies: - dependency-name: flake8-bugbear dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cooper Lees --- requirements_test.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements_test.txt b/requirements_test.txt index 17c4f5673..ef8f51090 100644 --- a/requirements_test.txt +++ b/requirements_test.txt @@ -2,7 +2,7 @@ async-timeout>=4.0.0a3 black==24.1.1 coverage==7.4.1 flake8==7.0.0 -flake8-bugbear==24.1.17 +flake8-bugbear==24.2.6 freezegun==1.4.0 mypy==1.8.0 pre-commit==3.6.0 From 9785a8a4172c0b03bd7fcde1d78f3eb96336a0b3 Mon Sep 17 00:00:00 2001 From: Matthew Seiler Date: Sun, 3 Mar 2024 14:01:56 -0500 Subject: [PATCH 3/7] Rewrite mirror configuration documentation (#1669) * Rewrite mirror configuration documentation - Add options that were missing from the documentation - Document expected defaults for all options - Group examples into larger snippets that show related options being used together - Show the type and default value for each option in a consistent, non-prose format - Rewrite option short descriptions to be more concise - Make extensive (or excessive?) use of cross references - Add examples of how different options affect mirror directory structure - Embed the content of `src/bandersnatch/default.conf` for user reference * Apply suggestions from code review Apply suggested tweaks to option descriptions. Co-authored-by: Cooper Lees * Make internal references more consistent. 
- Remove explicit reference targets and use heading anchors instead - Remove file name for links to headings in the same document * Update descriptions for 'simple-format' & 'proxy' - Elaborate a little in the description for 'simple-format', & include a link to PEP-691 - Add note regarding environment variables & SOCKS support in the description of 'proxy' * the aiohttp client setup in Master checks the noted environment variables & adds aiohttp-socks to the client session if it sees a socks URL * if there wasn't a socks URL, 'trust_env' is set to True, which sets aiohttp to use urllib.request.getproxies to scan the environment for proxy configuration. * Indicate mirror options required by current implementation This changes the mirror config documentation to match reality, where the current config reading implementation makes more options required than originally intended / strictly necessary. All options now have a `Required` field in addition to `Type` and `Default`. Intended future defaults for required fields are moved into the option description / recommendations. Updates example configs to always include required options. I also found that the output of `[mirror].json` appears to be required for running `bandersnatch verify`. * Fix example snippets et al. - Checked the example mirror configuration snippets and corrected mistakes - they should all work with 'bandersnatch mirror' when combined with filter plugin config. 
- Corrected the default value for 'diff-file' and tweak description - Add entry to CHANGES --------- Co-authored-by: Matthew Seiler Co-authored-by: Cooper Lees --- CHANGES.md | 4 + docs/conf.py | 3 +- docs/mirror_configuration.md | 618 ++++++++++++++++++++++++++--------- 3 files changed, 476 insertions(+), 149 deletions(-) diff --git a/CHANGES.md b/CHANGES.md index e61675a72..6f84930bf 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -4,6 +4,10 @@ - Added HTTPS support in Docker Compose + Enabled bind-mount volume for Nginx config + add documentation in README.md (PR #1653) +## Documentation + +- Updated documentation for `[mirror]` configuration options `PR #1669` + # 6.5.0 ## New Features diff --git a/docs/conf.py b/docs/conf.py index a7cfb94d8..17998752b 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -103,7 +103,8 @@ class DocStub: # Enable certain MyST-Parser extensions # see also: https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html -myst_enable_extensions = ["colon_fence"] +myst_enable_extensions = ["colon_fence", "fieldlist"] +myst_heading_anchors = 3 # -- Options for HTML output ---------------------------------------------- diff --git a/docs/mirror_configuration.md b/docs/mirror_configuration.md index d012446ee..9b0a32a8f 100644 --- a/docs/mirror_configuration.md +++ b/docs/mirror_configuration.md @@ -1,272 +1,460 @@ -# Mirror configuration +# Mirror Configuration -The mirror configuration settings are in a configuration section of the configuration file -named **\[mirror\]**. +The **\[mirror\]** section of the configuration file contains general options for how Bandersnatch should operate. This includes settings like the source repository to mirror, how to store mirrored files, and the kinds of files to include in the mirror. -This section contains settings to specify how the mirroring software should operate. 
+The following options are currently _required_: -## directory +- [](#directory) +- [](#master) +- [](#workers) +- [](#timeout) +- [](#global-timeout) +- [](#stop-on-error) +- [](#hash-index) -The mirror directory setting is a string that specifies the directory to -store the mirror files. +## Examples -The directory used must meet the following requirements: +These examples only show `[mirror]` options; a complete configuration may include [mirror filtering plugins][filter-plugins] and/or options for a [storage backend][storage-backends]. -- The filesystem must be case-sensitive filesystem. -- The filesystem must support large numbers of sub-directories. -- The filesystem must support large numbers of files (inodes) +### Minimal -Example: +A basic configuration with reasonable defaults for the required options: ```ini [mirror] +; base destination path for mirrored files directory = /srv/pypi + +; upstream package repository to mirror +master = https://pypi.org + +; parallel downloads - keep low to avoid overwhelming upstream +workers = 3 + +; per-request time limit +timeout = 15 + +; global time limit - applied to aiohttp coroutines +global-timeout = 18000 + +; continue syncing when an error occurs +stop-on-error = false + +; use PyPI-compatible folder structure for index files +hash-index = false ``` -## json +This will mirror index files and package release files from PyPI and store the mirror in `/srv/pypi`. Add configuration for [mirror filtering plugins][filter-plugins] to optionally filter what packages are mirrored in a variety of ways. -The mirror json setting is a boolean (true/false) setting that indicates that -the json packaging metadata should be mirrored in addition to the packages. 
+### Alternative Download Source -Example: +It is possible to download metadata from one repository, but package release files from another: ```ini [mirror] -json = false +directory = /srv/pypi +; Project and package metadata received from this repository +master = https://pypi.org +; Package distribution artifacts downloaded from here if possible +download-mirror = https://pypi-mirror.example.com/ + +; required options from basic config +workers = 3 +timeout = 15 +global-timeout = 18000 +stop-on-error = false +hash-index = false ``` -## release-files +This will download release files from `https://pypi-mirror.example.com` if possible and fall back to PyPI if a download fails. See [](#download-mirror). Add [](#download-mirror-no-fallback) to download release files exclusively from `download-mirror`. -The mirror release-files setting is a boolean (true/false) setting that indicates that -the package release files should be mirrored. Defaults to `true`. When this option is disabled (via setting to false), you -should also specify the `root_uri` configuration. If the uri is empty, it will be set -to . +### Index Files Only -Example: +It is possible to mirror just index files without downloading any package release files: ```ini [mirror] -release-files = true +directory = /srv/pypi-filtered +master = https://pypi.org +simple-format = ALL +release-files = false +root_uri = https://files.pythonhosted.org/ + +; required options from basic config +workers = 3 +timeout = 15 +global-timeout = 18000 +stop-on-error = false +hash-index = false ``` -## master +This will mirror index files for projects and versions allowed by your [mirror filters][filter-plugins], but will not download any package release files. File URLs in index files will use the configured `root_uri`. See [](#release-files) and [](#root_uri). -The master setting is a string containing a url of the server which will be mirrored. +## Option Reference -The master url string must use https: protocol. 
+% +% mirror output / file structure related +% -The default value is: +### `directory` -If you would like to configure an alternative download mirror of package distribution artifacts -please also take a look at the `download-mirror` option. +The directory where mirrored files are stored. _This option is always required._ -Example: +:Type: folder path +:Required: **yes** -```ini -[mirror] -master = https://pypi.org +The exact interpretation of this value depends on the configured [storage backend](#storage-backend). For the default [filesystem](./storage_options.md#filesystem-support) backend, the directory used should meet the following requirements: + +- The filesystem must be case-sensitive. +- The filesystem must support large numbers of sub-directories. +- The filesystem must support large numbers of files (inodes) + +### `storage-backend` + +The [storage backend][storage-backends] used to save data and metadata when mirroring packages. + +:Type: string +:Required: no +:Default: `filesystem` + +```{seealso} +Available storage backends are documented at [][storage-backends]. ``` -## timeout +### `simple-format` -The timeout value is an integer that indicates the maximum number of seconds for web requests. +The [Simple Repository API][simple-repository-api] index file formats to generate. -The default value for this setting is 10 seconds. +:Type: one of `HTML`, `JSON`, or `ALL` +:Required: no +:Default: `ALL` -Example: +[PEP 691 – JSON-based Simple API for Python Package Indexes](https://peps.python.org/pep-0691/) extended the Simple Repository API to support both HTML and JSON. Bandersnatch generates project index files in both formats by default. Set this option to restrict index files to a single data format. -```ini -[mirror] -timeout = 10 +[](#simple-format-index-files) describes the generated folder structure and file names. + +### `release-files` + +Mirror package release files. Release files are the uploaded sdist and wheel files for mirrored projects. 
+ +:Type: boolean +:Required: no +:Default: true + +Disabling this will mirror repository [index files](#simple-format) and/or [project metadata](#json) without downloading any associated package files. [](#release-files-folder-structure) describes the folder structure for mirrored package release files. + +```{note} +If `release-files = false`, you should also specify the [](#root_uri) option. +``` + +### `json` + +Save copies of JSON project metadata downloaded from PyPI. + +:Type: boolean +:Required: no +:Default: false + +When enabled, this saves copies of all JSON project metadata downloaded from [PyPI's JSON API](https://warehouse.pypa.io/api-reference/json.html). These files are used by the subcommand. + +[](#json-api-metadata-files) describes the folder structure generated by this option. The format of the saved JSON is not standardized and is specific to [Warehouse](https://warehouse.pypa.io/). + +```{note} +This option does _not_ affect the generation of simple repository API index files in JSON format ([](#simple-format)). +``` + +### `root_uri` + +A base URL to generate absolute URLs for package release files. + +:Type: URL +:Required: no +:Default: `https://files.pythonhosted.org/` + +Bandersnatch creates index files containing relative URLs by default. Setting this option generates index files with absolute URLs instead. + +If [](#release-files) is disabled _and_ this option is unset, Bandersnatch uses a default value of `https://files.pythonhosted.org/`. + +```{note} +This is generally not necessary, but was added for the official internal PyPI mirror, which requires serving packages from ``. +``` + +### `diff-file` + +File location to write a list of all new or changed files during a mirror operation. + +:Type: file or folder path +:Required: no +:Default: `/mirrored-files` + +Bandersnatch creates a plain-text file at the specified location containing a list of all files created or updated during the last mirror/sync operation. 
The files are listed as absolute paths separated by blank lines. + +This is useful when mirroring to an offline network where it is required to only transfer new files to the downstream mirror. The diff file can be used to copy new files to an external drive, sync the list of files to an SSH destination such as a diode, or send the files through some other mechanism to an offline system. + +If the specified path is a directory, Bandersnatch will use the file name "`mirrored-files`" within that directory. + +The file will be overwritten on each mirror operation unless [](#diff-append-epoch) is enabled. + +#### Example Usage + +The diff file can be used with rsync for copying only new files: + +```console +rsync -av --files-from=/srv/pypi/mirrored-files / /mnt/usb/ ``` -## global-timeout +It can also be used with 7zip to create split archives for transfers: -The global-timeout value is an integer that indicates the maximum runtime of individual aiohttp coroutines. +```console +7za a -i@"/srv/pypi/mirrored-files" -spf -v100m path_to_new_zip.7z +``` -The default value for this setting is 18000 seconds, or 5 hours. +### `diff-append-epoch` -Example: +Append the current epoch time to the file name for [](#diff-file). + +:Type: boolean +:Required: no +:Default: false + +For example, the configuration: ```ini [mirror] -global-timeout = 18000 +; ... +diff-file = /srv/pypi/new-files +diff-append-epoch = true ``` -## workers +Will generate diff files with names like `/srv/pypi/new-files-1568129735`. This can be used to track diffs over time by creating a new diff file each time Bandersnatch runs. -The workers value is an integer from from 1-10 that indicates the number of concurrent downloads. +### `hash-index` -The default value is 3. +Group generated project index folders by the first letter of their normalized project name. 
-Recommendations for the workers setting: +:Type: boolean +:Required: **yes** -- leave the default of 3 to avoid overloading the pypi master -- official servers located in data centers could run 10 workers -- anything beyond 10 is probably unreasonable and is not allowed. +Enabling this changes the way generated index files are organized. Project folders are grouped into subfolders alphabetically as shown here: [](#hash-index-index-files). This has the effect of splitting up a large `/web/simple` directory into smaller subfolders, each containing a subset of the index files. This can improve file system efficiency when mirroring a very large number of projects, but requires a web server capable of translating Simple Repository API URLs into file paths. -## hash-index +```{warning} +It is recommended to set this to `false` for full pip/pypi compatibility. -The hash-index is a boolean (true/false) to determine if package hashing should be used. +The path structure created by this option is _incompatible_ with the [Simple Repository API][simple-repository-api]. Serving the generated `web/simple/` folder directly will not work with pip. `hash-index` should only be used with a web server that can translate request URIs into alternative filesystem locations. -The Recommended setting: the default of false for full pip/pypi compatibility. +Requests for subfolders of `/web/simple` must be re-written using the first letter of the requested project name: -:::{warning} -Package index directory hashing is incompatible with pip, and so this should only be used in an environment where it is behind an application that can translate URIs to filesystem locations. -::: +- Requested path: `/simple/someproject/index.html` +- Translated path: `/simple/s/someproject/index.html` +``` -### Apache rewrite rules when using hash-index +#### Example Apache `RewriteRule` Configuration -When using this setting with an apache server. 
The apache server will need the following rewrite rules: +Configuration like the following is required to use the `hash-index` option with an Apache web server: -```text +``` RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/ RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3 ``` -### NGINX rewrite rules when using hash-index +#### Example NGINX `rewrite` Configuration -When using this setting with an nginx server. The nginx server will need the following rewrite rules: +Configuration like the following is required to use `hash-index` with an NGINX web server: -```text +``` rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last; rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last; ``` -## stop-on-error +% +% Mirror source / network related options +% -The stop-on-error setting is a boolean (true/false) setting that indicates if bandersnatch -should stop immediately if it encounters an error. +### `master` -If this setting is false it will not stop when an error is encountered but it will not -mark the sync as successful when the sync is complete. +The URL of the Python package repository server to mirror. -```ini -[mirror] -stop-on-error = false -``` +:Type: URL +:Required: **yes** -## log-config +Bandersnatch requests metadata for projects and packages from this repository server, and downloads package release files from the URLs specified in the received metadata. -The log-config setting is a string containing the filename of a python logging configuration -file. +To mirror packages from PyPI, set this to `https://pypi.org`. -Example: +The URL _must_ use the `https:` protocol. -```ini -[mirror] -log-config = /etc/bandersnatch-log.conf +```{seealso} +Bandersnatch can download package release files from an alternative source by configuring a [](#download-mirror). ``` -## root_uri +### `proxy` -The root_uri is a string containing a uri which is the root added to relative links. +Use an HTTP proxy server. 
-:::{note} -This is generally not necessary, but was added for the official internal PyPI mirror, which requires serving packages from -::: +:Type: URL +:Required: no +:Default: none -Example: +The proxy server is used when sending requests to a repository server set by the [](#master) or [](#download-mirror) option. -```ini -[mirror] -root_uri = https://example.com +```{seealso} +HTTP proxies are supported through the `aiohttp` library. See the aiohttp manual for details on what connection types are supported: ``` -## diff-file - -The diff file is a string containing the filename to log the files that were downloaded during the mirror. -This file can then be used to synchronize external disks or send the files through some other mechanism to offline systems. -You can then sync the list of files to an attached drive or ssh destination such as a diode: +```{note} +Alternatively, you can specify a proxy URL by setting one of the environment variables `HTTPS_PROXY`, `HTTP_PROXY`, or `ALL_PROXY`. _This method supports both HTTP and SOCKS proxies._ Support for `socks4`/`socks5` uses the [aiohttp-socks](https://github.com/romis2012/aiohttp-socks) library. -```console -rsync -av --files-from=/srv/pypi/mirrored-files / /mnt/usb/ +SOCKS proxies are not currently supported via the `mirror.proxy` config option. ``` -You can also use this file list as an input to 7zip to create split archives for transfers, allowing you to size the files as you needed: +### `timeout` -```console -7za a -i@"/srv/pypi/mirrored-files" -spf -v100m path_to_new_zip.7z -``` +The network request timeout to use for all connections, in seconds. This is the maximum allowed time for individual web requests. -Example: +:Type: number, in seconds +:Required: **yes** -```ini -[mirror] -diff-file = /srv/pypi/mirrored-files +```{note} +It is recommended to set this to a relatively low value, e.g. 10 - 30 seconds. 
This is so temporary problems will fail quickly and allow retrying, instead of having a process hang indefinitely and leave TCP unable to catch up for a long time. +``` -## diff-append-epoch +### `global-timeout` -The diff append epoch is a boolean (true/false) setting that indicates if the diff-file should be appended with the current epoch time. -This can be used to track diffs over time so the diff file doesn't get cobbered each run. It is only used when diff-file is used. +The maximum runtime of individual aiohttp coroutines, in seconds. -Example: +:Type: number, in seconds +:Required: **yes** -```ini -[mirror] -diff-append-epoch = true +```{note} +It is recommended to set this to a relatively high value, e.g. 3,600 - 18,000 (1 - 5 hours). This supports coroutines mirroring large package files on slow connections. ``` -## compare-method +### `download-mirror` -The compare method is used to set how to compare an existing file with upstream file to determine whether a download is required: +Download package release files from an alternative repository server. -- hash: this is the default which reads local file content and computes hashes (currently sha256sum), it is reliable but sometimes slower; -- stat: use file size and change time to compare, which is named after the stat() syscall, this avoids retrieving the full file content thus reducing some io workloads. +:Type: URL +:Required: no +:Default: none -Example: +By default, Bandersnatch downloads packages from the URL supplied in the master server's JSON response. Setting this option to a repository URL will try to download release files from that repository first, and fall back to the URL supplied by the master server if that is unsuccessful (unable to get content or checksum mismatch). -```ini -[mirror] -compare-method = hash +This is useful to sync most of the files from an existing, nearby mirror - for example, when creating a new mirror identical to an existing one for the purpose of load sharing. 
+ +### `download-mirror-no-fallback` + +Disable the fallback behavior for [](#download-mirror). + +:Type: boolean +:Required: no +:Default: false + +When set to `true`, Bandersnatch only downloads package distribution artifacts from the repository set in [](#download-mirror) and ignores file URLs received from the [](#master) server. + +```{warning} +This could lead to more failures than expected and is not recommended for most scenarios. ``` -## proxy +% +% processing and miscellaneous options +% -The proxy is used only when requesting master server, eg. downloading index or package file from pypi.org. -The proxy value will be passed to aiohttp as proxy parameter, like `aiohttp.get(link, proxy=yourproxy)`, -check the aioproxy manual for more details: +### `cleanup` -Example: +Enable cleanup of legacy simple directories with non-normalized names. -```ini -[mirror] -proxy=http://myproxy.com +:Type: boolean +:Required: no +:Default: false + +Bandersnatch versions prior to 4.0 used directories with non-normalized package names for compatibility with older versions of pip. Enabling this option checks for and removes these directories. + +```{seealso} +[Python Packaging User Guide - Names and Normalization](https://packaging.python.org/en/latest/specifications/name-normalization/) ``` -## download-mirror +### `workers` -By default bandersnatch downloads packages from the URL supplied in the master server server's json response. -This option asks bandersnatch to try to download from the configured PyPI mirror first, and fallback to the -URL supplied by the master server if it was not successful (unable to get content or checksum mismatch). -This is useful to sync most of the files from an existing, nearby mirror, for example when setting up a new -server sitting next to an existing one for the purpose of load sharing. +The number of worker threads used for parallel downloads. 
-Example: +:Type: number, 1 ≤ N ≤ 10 +:Required: **yes** -```ini -[mirror] -download-mirror = https://pypi-mirror.example.com/ +Use **1 - 3** workers to avoid overloading the PyPI master (and maybe your own internet connection). If you see timeouts and have a slow connection, try lowering this setting. + +Official servers located in data centers could feasibly run up to 10 workers. Anything beyond 10 is considered unreasonable. + +### `verifiers` + +The number of concurrent consumers used for verifying metadata. + +:Type: number +:Required: no +:Default: 3 + +```{seealso} +This option is used by the subcommand. ``` -## simple-format +### `stop-on-error` -Format for Simple API to be stored in. With PEP691 we now have HTML and JSON formats. +Stop mirror/sync operations immediately when an error occurs. -Valid options are: +:Type: boolean +:Required: **yes** -- ALL -- HTML -- JSON +When disabled (`stop-on-error = false`), Bandersnatch continues syncing after an error occurs, but will mark the sync as unsuccessful. When enabled, Bandersnatch will stop all syncing as soon as possible if an error occurs. This can be helpful when debugging the cause of an unsuccessful sync. -Default: `ALL` formats +### `compare-method` -```ini -simple-format = ALL +The method used to compare existing files with upstream files. + +:Type: one of `hash`, `stat` +:Required: no +:Default: `hash` + +- `hash`: compare by computing a checksum of the local file's content. This is slower than `stat`, but more reliable. The hash algorithm is specified by [](#digest_name). +- `stat`: compare by using file size and change time. This can reduce IO workload when frequently verifying a large number of files. + +### `digest_name` + +The algorithm used to compute file hashes when [](#compare-method) is set to `hash`. + +:Type: one of `sha256`, `md5` +:Default: `sha256` + +### `keep_index_versions` + +Store previous versions of generated index files. 
+ +:Type: number +:Required: no +:Default: 0 (do not keep previous index versions) + +This can be used as a safeguard against upstream changes generating blank index.html files. + +By default or when set to 0, no prior versions are stored and `index.html` is the latest version. + +When enabled by setting a value > 0, Bandersnatch stores the most recently generated versions of each index file, up to the configured number of versions. Prior versions are stored under `versions/index__.html` and the current `index.html` is a symlink to the latest version. + +### `log-config` + +Provide a custom logging configuration file. + +:Type: file path +:Required: no +:Default: none + +The file must be a Python `logging.config` module configuration file in INI format, as used with [](inv:python:py:function:#logging.config.fileConfig). The specified configuration replaces Bandersnatch's default logging configuration. + +```{seealso} + +% myst-inv.exe 'https://docs.python.org/3' -d std -o label -n logging-config-fileformat + +Refer to [](inv:python:std:label#logging-config-fileformat) for the logging configuration file format. ``` -## sample-log-config +#### Sample Alternative Logging Configuration ```ini [loggers] @@ -298,3 +486,137 @@ delay=False args=('/repo/bandersnatch/banderlogfile.log', 'D', 1, 0) ``` + +## Folder Structures + +### `simple-format` index files + +Folder structure of generated index files for [](#simple-format): + +```text +/ +└── web/ + ├── packages/... + └── simple/ + ├── index.html + ├── index.v1_html + ├── index.v1_json + ├── someproject/ + │ ├── index.html + │ ├── index.v1_html + │ └── index.v1_json + ├── anotherproject/ + │ ├── index.html + │ ├── index.v1_html + │ └── index.v1_json + └── ... +``` + +This path structure is compatible with the [Simple Repository API][simple-repository-api]. + +If `simple-format` is set to `HTML`, Bandersnatch will only create `index.html` and `index.v1_html`. 
If `simple-format` is set to `JSON`, it will only create `index.v1_json`. + +### `release-files` folder structure + +Package release files are distributed into subdirectories based on their checksum: + +```text +/ +└── web/ + ├── packages/ + │ ├── 1a/ + │ │ └── 70/ + │ │ └── e63223f8116931d365993d4a6b7ef653a4d920b41d03de7c59499962821f/ + │ │ └── click-8.1.6-py3-none-any.whl + │ ├── 8b/ + │ │ ├── 3a/ + │ │ │ └── b569b932cf737b525eb4c7a2b615ec07b102dff64f1d8a0fe52a48b911fc/ + │ │ │ └── diff-2023.12.5.tar.gz + │ │ └── e2/ + │ │ └── 4823d9f02d2743a02e2c236f98b96b52f7a16b2bedc0e3148322dffbd06f/ + │ │ └── black-24.1.0-cp39-cp39-win_amd64.whl + │ ├── 31/ + │ │ ├── 5f/ + │ │ │ └── ... + │ │ └── 7a/ + │ │ └── ... + │ └── ... + └── simple/ + ├── click/ + ├── diff/ + ├── black/ + ├── ... + └── index.html +``` + +By default, generated index files contain relative links into the `web/packages/` directory. + +### `json` API metadata files + +Folder structure of saved PyPI project metadata when [](#json) is enabled: + +```text +/ +├── web/ +│ └── json/ +│ ├── someproject +│ ├── anotherproject +│ └── ... +├── pypi/ +│ ├── someproject/ +│ │ └── json +│ ├── anotherproject/ +│ │ └── json +│ └── ... +├── packages/ +│ └── ... +└── simple/ + └── ... +``` + +The files `web/json/someproject` and `web/pypi/someproject/json` both contain the JSON metadata for a PyPI project with the normalized name "someproject". + +### `hash-index` index files + +When [](#hash-index) is enabled, project index folders are grouped by the first letter of their name - for example: + +```text +/ +└── web/ + └── simple/ + ├── b/ + │ ├── boto3/ + │ │ └── index.html + │ └── botocore/ + │ └── index.html + ├── c/ + │ ├── charset-normalizer/ + │ │ └── index.html + │ ├── certifi/ + │ │ └── index.html + │ └── cryptography/ + │ └── index.html + ├── t/ + │ └── typing-extensions/ + │ └── index.html + ├── ... + └── index.html +``` + +The content of the index files themselves is unchanged. 
+ +## Default Configuration File + +Bandersnatch loads default values from a configuration file inside the package. You can use this file as a reference or as the basis for your own configuration. + +```{literalinclude} ../src/bandersnatch/default.conf +--- +name: default.conf +language: ini +caption: Default configuration file from `src/bandersnatch/default.conf` +--- +``` + +[filter-plugins]: ./filtering_configuration.md +[simple-repository-api]: https://packaging.python.org/en/latest/specifications/simple-repository-api/ +[storage-backends]: ./storage_options.md From 04d38d774a3070d0f4447abe2d09894d0509880c Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sun, 3 Mar 2024 11:07:37 -0800 Subject: [PATCH 4/7] Bump black from 24.1.1 to 24.2.0 (#1675) Bumps [black](https://github.com/psf/black) from 24.1.1 to 24.2.0. - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/24.1.1...24.2.0) --- updated-dependencies: - dependency-name: black dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cooper Lees --- requirements_test.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements_test.txt b/requirements_test.txt index ef8f51090..69c176cbe 100644 --- a/requirements_test.txt +++ b/requirements_test.txt @@ -1,5 +1,5 @@ async-timeout>=4.0.0a3 -black==24.1.1 +black==24.2.0 coverage==7.4.1 flake8==7.0.0 flake8-bugbear==24.2.6 From 3994f1e54c1386711eeaee00e9ce69c17fde4356 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sun, 3 Mar 2024 11:12:55 -0800 Subject: [PATCH 5/7] Bump boto3 from 1.34.41 to 1.34.54 + botocore (#1677) * Bump boto3 from 1.34.41 to 1.34.54 Bumps [boto3](https://github.com/boto/boto3) from 1.34.41 to 1.34.54. - [Release notes](https://github.com/boto/boto3/releases) - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst) - [Commits](https://github.com/boto/boto3/compare/1.34.41...1.34.54) --- updated-dependencies: - dependency-name: boto3 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] * Update requirements_s3.txt --------- Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cooper Lees --- requirements_s3.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/requirements_s3.txt b/requirements_s3.txt index 69e2d3317..70a017161 100644 --- a/requirements_s3.txt +++ b/requirements_s3.txt @@ -1,5 +1,5 @@ -boto3==1.34.41 -botocore==1.34.41 +boto3==1.34.54 +botocore==1.34.54 jmespath==1.0.1 python-dateutil==2.8.2 s3path==0.5.0 From be03d8eebdfdda72a6609206f0bd35f8667003a5 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 4 Mar 2024 08:46:49 -0800 Subject: [PATCH 6/7] Bump coverage from 7.4.1 to 7.4.3 (#1679) Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.4.1 to 7.4.3. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](https://github.com/nedbat/coveragepy/compare/7.4.1...7.4.3) --- updated-dependencies: - dependency-name: coverage dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --- requirements_test.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements_test.txt b/requirements_test.txt index 69c176cbe..88cba7b3d 100644 --- a/requirements_test.txt +++ b/requirements_test.txt @@ -1,6 +1,6 @@ async-timeout>=4.0.0a3 black==24.2.0 -coverage==7.4.1 +coverage==7.4.3 flake8==7.0.0 flake8-bugbear==24.2.6 freezegun==1.4.0 From 98578bf01a18e2d6307b260ae76d113c5dcbc3c8 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 4 Mar 2024 08:53:47 -0800 Subject: [PATCH 7/7] Bump sphinx-argparse-cli from 1.11.1 to 1.13.1 (#1678) Bumps [sphinx-argparse-cli](https://github.com/tox-dev/sphinx-argparse-cli) from 1.11.1 to 1.13.1. - [Release notes](https://github.com/tox-dev/sphinx-argparse-cli/releases) - [Changelog](https://github.com/tox-dev/sphinx-argparse-cli/blob/main/CHANGELOG.md) - [Commits](https://github.com/tox-dev/sphinx-argparse-cli/compare/1.11.1...1.13.1) --- updated-dependencies: - dependency-name: sphinx-argparse-cli dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Cooper Lees --- requirements_docs.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements_docs.txt b/requirements_docs.txt index b1690d345..1b97d138e 100644 --- a/requirements_docs.txt +++ b/requirements_docs.txt @@ -7,7 +7,7 @@ six==1.16.0 sphinx==7.2.6 MyST-Parser==2.0.0 xmlrpc2==0.3.1 -sphinx-argparse-cli==1.11.1 +sphinx-argparse-cli==1.13.1 git+https://github.com/pypa/pypa-docs-theme.git#egg=pypa-docs-theme git+https://github.com/python/python-docs-theme.git#egg=python-docs-theme