Reports bot maintains reports and other useful things for WikiProjects.
First, clone the bot:
git clone [email protected]:harej/reports_bot.git
cd reports_bot
Python 3.4+ is required. You should set up and activate a
virtual environment in the venv
directory, though this is not required. The following workaround may be
necessary on Debian systems:
python3 -m venv --without-pip venv
source venv/bin/activate
curl https://bootstrap.pypa.io/get-pip.py | python
Next, install these dependencies:
pip install mwoauth requests PyYAML oursql3 mwparserfromhell BTrees \
mediawiki-utilities numpy scikit-learn
You also need Pywikibot. You
can pip install pywikibot
, but I recommend installing it from source to get
the latest updates:
cd venv
git clone https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot
cd pywikibot
git submodule update --init
python setup.py develop
If you set up a virtualenv, run the following command to ensure the bot's task runner always uses it:
sed -e "1s|.*|#! $PWD/venv/bin/python|" -i "" ./run
Depending on your setup, you may wish to create a separate, unprivileged user for the bot. The recommended method is:
sudo adduser --system --home /path/to/reportsbot reportsbot
If so, make sure to create the bot's config
and logs
directories with the
appropriate ownership:
mkdir config logs && sudo chown reportsbot config logs
Reports bot uses a MySQL database to store its on-wiki config, the WikiProject
page indices, and some other information. Create this database using the schema
described in schema.sql
. You may change the database name—for example, on
Labs, it should probably be something like sXXXXX__wpx
. Make note of this for
the next step.
The bot requires a config/config.yml
file for itself and
config/user-config.py
file for Pywikibot. The easiest way to create these
is with the configuration assistant:
./run -q configure
(As described below, you should use sudo ./run -q configure
if the bot is
running under an unprivileged user.)
Afterwards, you may edit these files manually whenever necessary.
Reports bot's standard tasks are located in the tasks/
directory. A ./run
script is provided to make things simple, assuming you've followed the standard
setup procedure above.
To run a task located at tasks/foobar.py
:
./run foobar
For a full description of the command-line interface:
./run --help
If you're using a separate user for the bot, be aware that the ./run
script
tries to ensure that it is running under the account that owns the config
directory. You can run jobs from reportsbot
's crontab using the plain ./run
syntax, but manual jobs under your own account should be initiated with
sudo ./run
.
If you prefer to use reportsbot
as a regular Python package and execute task
files at arbitrary locations, you can use this syntax, which supports the same
arguments as ./run
:
python3 -m reportsbot.cli full/path/to/task.py
python3 -m reportsbot.cli --help
The bot stores logs in the logs/
directory unless -t
(--traceless
) is
passed to ./run
. A few different kinds of logs are kept:
all.log
stores non-DEBUG level logs for all tasks. It automatically rotates when it grows large.all.err
stores WARNING-level logs and above for all tasks. It automatically rotates when it grows large.<sitename>/<taskname>.log
stores non-DEBUG level logs for the specified task running on the specified site. It automatically rotates nightly.<sitename>/<taskname>.err
stores WARNING-level logs and above. It automatically rotates when it grows large.<sitename>/<taskname>.log.verbose
stores full logs for the last run of the task. It is cleared at the start of each run.
The bot also prints all logs (including DEBUG-level) to standard error unless
-q
(--quiet
) is passed to ./run
, in which case only ERROR-level and
higher are printed. This option can be useful for cron jobs; if you set up cron
to email you the output of ./run -q
, you will be notified immediately when
problems occur.
A number of tasks are provided. Advice on developing your own is given at the end of this section.
load_project_config
: Loads WikiProject-specific configuration from the wiki and stores it in the bot's database.metrics
: Updates monthly metrics on the number of articles in a project.new_discussions
: Provides a list of new discussions within a WikiProject's scope.update_members
: Updates WikiProject membership lists based on WikiProjectCard transclusions.update_project_index
: Updates the index of articles associated with each project.
To create new tasks, you can follow the skeleton in tasks/example.py
.
Important things to keep in mind:
- The name of the task class doesn't matter (the bot searches by filename), but it should be descriptive.
- The
run
method is the only method called by the task runner, other than__init__
, which should only do inexpensive setup if you override it. - There should only be one Task subclass per module.
__all__
is used to identify which class to run in case multiple exist in the module namespace, like if you import other Task subclasses to use their methods.
The task has access to two important attributes:
self._bot
is the Bot instance, which provides the following functionality:self._bot.site
: the Pywikibot site instanceself._bot.wikidb
: a database connection to the wiki replicaself._bot.localdb
: a connection to the bot's local databaseself._bot.wikidata
: an interface to Wikidata Some methods are available for working with WikiProjects in a structured manner. See thereportsbot.bot.Bot
class documentation for details.
self._logger
is the Logger instance that you should use for all log messages.print
and writing tostdout
directly should be avoided.