Assorted scripts used for LSH batch upload. The work originally done on the redux branch (now merged) was a first attempt to cleanup the scripts and include the original steps needed to convert the original csv files to those required for the input
Apart from the exceptions mentioned below this was produced by Lokal_Profil. Note however that the majority of the codebase is old and the coding might make you want to cry (I know that is how I feel when I see it).
The SQL scripts were created by Fredrik Andersson at LSH.
Requires WikiApi and lokal-profil/BatchUploadTools.
WikiApi is based on PyCJWiki Version 1.31 (C) by Smallman12q GPL, see http://www.gnu.org/licenses/.
BatchUploadTools can be installed via pip install git+https://github.com/lokal-profil/BatchUploadTools.git
Note: You might have to add the --process-dependency-links
flag to the above
command if you are running a different version of pywikibot from the required one.
This bot does not (yet) support OAuth, hence you must use [[Special:BotPasswords]] with a bot account given the following grants:
- Basic rights
- High-volume editing
- Upload new files
For clean up it is also recommended that it is given:
- Create, edit, and move pages
- Upload, replace, and move files
All of these should be run from the main code directory. Note that up to step 10 all can be done without the actual image files.
- Copy config.example.json to config.json and fill in your username and password you also need to set up user_config.py (from pywikibot) with the same credentials.
- Run
python py_listscraper.py old_connections data
, to make a copy of the latest Commons mappings - Unpack the new csv files to the
original_csv
directory - Update
csv_config.json
to the new file names - Run
python py_prepCSVData.py
, to populate a newclean_csv
directory - Run
python py_analyseCSVData.py
and fix any errors, repeat until no actionable errors
- creates
csv_analys.log
- Run
python py_crunchCSVData.py
, to populate a newdata
directory
- Note that this takes some time and that there are three prompts at the start
- individual log files are found in
data/logs
-
Run
python py_filenames.py
, to generate the filenames -
Run
python py_makeMappings.py
, to create mapping tables for Commons -
Upload the mapping tables to the right place
-
Do the actual mappings...
-
Run
python py_listscraper.py
, to populate a newconnections
directory and updatefilenames.csv
- If filenames are updated then don't run again until Commons table has been updated.
- Run
python py_prepUpload.py moveHits ../bilder
where../bilder
is the relative path to the main image directory. Moves the relevant files to base directories and adds extension tofilenames.csv
- Run
python py_prepUpload.py makeAndRename ../bilder/m_a batchCat
etc. for each of the new image subdirectory. WherebatchCat
, if provided is a datestamp (e.g.2015-11
) used for collecting the images in a category. Creates info files and renames files
- Check
¤generator.log
for possible problems
- Run
python py_prepUpload.py negatives ../bilder/m_a
etc. for each of the image subdirectory containing negatives. Creates a positive version and renames correctly
- Series with negatives are A, B, D, E, G, O
- Check
¤imageMagick-errors.log
for error reports
- Run
python py_prepUpload.py negativeCleanup ../bilder/m_a
etc. for each of the image subdirectory where the previous step was run
- Check
¤conversion-errors.log
for problematic conversions (fix manually)
- Run
python py_Uploader.py -path:../bilder/m_a
etc. to upload the files
- Successful uploads end up in the
Uploaded
subdirectory - Failed uploads in the
Uploaded_errors
subdirectory - Uploads with warnings in the
Uploaded_warnings
subdirectory - Details on problematic uploads can be found in
¤uploader.log
(fix manually, often by just trying again...)
- Run
python py_postUpload.py purge
to purge LSH-files in Category:Files with broken file links - Look at
postAnalysis/BrokenFileLinks.csv
to identify any remaining files with broken file links. Add any known renames after the pipe (excluding prefix but including the file extension) - Run
python py_postUpload.py rename
to repair file pages linking to renamed files and updating the list of broken links
- These indicate missing files, these can some times be uploaded manually but should otherwise be unlinked.
- Run
python py_postUpload.py findMissing
to checkfilenames.csv
for any files not present on Commons
- This also generates an export file with photo-id to url-links for LSH
Some basic tests have been added to simplify maintenance/improvement of
the code base. Run these with tox
.
The maintenance directory contains scripts which may be useful for one-off actions in relation to the batch upload.
upload_dupes
replaces previously uploaded files without changing their description pages.replace_descriptions
attempts to replace previously uploaded descriptions while preserving any changes made by users since the upload.