All jpg files were ignored during campaign file upload round one creation #234

Open
geertivp opened this issue Oct 6, 2023 · 3 comments

geertivp commented Oct 6, 2023

The default file type for images on Wikimedia Commons is jpg.

When using the Montage file interface to create the first round of a campaign, all files in the campaign were ignored, because only jpeg is registered in the module montage/rdb.py:

DEFAULT_ALLOWED_FILETYPES = ['jpeg', 'png', 'gif', 'svg', 'tiff', 'xcf', 'webp']

Problem:

This is a blocking error when using the file interface.

Solution:

  • 'jpg' must be explicitly added in the module rdb.py.

This problem did not occur when using a Category upload.

  • Example upload file:
img_name
17e-eeuws Statenjacht van Utrecht Museumschip Tordino Plassendale 26-07-2023 11-42-06.jpg
Amel, Kirche Sankt Huberrtus oeg31027 IMG 7647 2023-08-28 14.48.jpg
Amel-Iveldingen, de Sankt Barbara Kapelle oeg31007 IMG 7720 2023-08-29 10.48.jpg
...

More context:

mahmoud (Member) commented Oct 6, 2023

Hey @geertivp! Thanks for this. The DEFAULT_ALLOWED_FILETYPES are not extensions, but actually MIME types (just the minor type, since image is presumed), as used by Commons. See the highlight in this screenshot:

[Image: Screenshot from 2023-10-06 10-09-44]
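
To illustrate the distinction between extensions and MIME minor types, here is a minimal sketch. The list is copied from the issue text; `minor_type_from_mime` and `is_allowed` are hypothetical helpers written for this example and are not taken from montage/rdb.py, so the real check may well look different.

```python
import mimetypes

# Copied from the issue text. Per the comment above, these are MIME minor
# types (the part after "image/"), not file extensions.
DEFAULT_ALLOWED_FILETYPES = ['jpeg', 'png', 'gif', 'svg', 'tiff', 'xcf', 'webp']


def minor_type_from_mime(mime):
    """Return the part after the slash, e.g. 'image/jpeg' -> 'jpeg'."""
    return mime.split('/', 1)[1].lower()


def is_allowed(mime, allowed=DEFAULT_ALLOWED_FILETYPES):
    """Hypothetical check: is the MIME minor type in the allowed list?"""
    return minor_type_from_mime(mime) in allowed


# Python's standard library maps the .jpg extension to the image/jpeg type.
guessed, _encoding = mimetypes.guess_type('example.jpg')

# The extension 'jpg' is not in the list, but the minor type 'jpeg' is.
extension_listed = 'jpg' in DEFAULT_ALLOWED_FILETYPES
jpg_file_allowed = is_allowed(guessed)
```

Under this reading, a file ending in .jpg is still accepted, because the comparison is made against `jpeg`, the minor type of `image/jpeg`, rather than against the file extension.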

I was able to load the images when I loaded the list as a "File List", but got failures when trying to load it as a Google Sheet and as a CSV. The UI error says that the disqualifications were due to round settings, but the server logs show that the entries simply weren't loaded (see the screenshot of the logs below), so there's something else going on.

[Image: Screenshot from 2023-10-06 10-35-44]

Since you've worked around it via Category import, I'll dig into it as time allows. Thanks again for the report!

mahmoud (Member) commented Oct 6, 2023

Ah, it just occurred to me: the img_name header. That suggests another workaround.

If you export your file list as a CSV (basically, put quotes around all image filenames; the easiest and best way is to export from Excel or Google Sheets), make the first row filename (no quotes) instead of img_name, upload the result to https://gist.github.com, and use the "Raw URL" (it should be a gist.githubusercontent.com URL), then the load will work. (example)

This is very finicky, and we'll have to improve this in the next version, but for now there's another workaround for folks with spreadsheets / long file lists. Thanks again!
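
For anyone without a spreadsheet at hand, the CSV described above can be produced with a few lines of Python. This is a minimal sketch under the assumptions stated in the comment (an unquoted `filename` header, every filename quoted); `build_filename_csv` is a hypothetical helper name, and the exact format the loader accepts is not confirmed beyond what is written in this thread.

```python
import csv
import io


def build_filename_csv(names):
    """Return CSV text: an unquoted `filename` header, then one quoted name per row."""
    buf = io.StringIO()
    # Header written by hand so that it stays unquoted.
    buf.write('filename\n')
    # QUOTE_ALL wraps every field in double quotes, which keeps names
    # containing commas in a single column.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator='\n')
    for name in names:
        writer.writerow([name])
    return buf.getvalue()


# Two of the filenames from the original report; note the comma in the first.
names = [
    'Amel, Kirche Sankt Huberrtus oeg31027 IMG 7647 2023-08-28 14.48.jpg',
    '17e-eeuws Statenjacht van Utrecht Museumschip Tordino Plassendale 26-07-2023 11-42-06.jpg',
]
csv_text = build_filename_csv(names)

# Reading the text back shows each name survives as one field.
parsed_rows = list(csv.reader(io.StringIO(csv_text)))
```

The quoting matters for names such as `Amel, Kirche ...`, where an unquoted comma would otherwise split the filename into two columns. The resulting text can then be pasted into a gist and loaded via its Raw URL, as described above.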

geertivp (Author) commented Oct 6, 2023

Actually, I was wrong in the file contents sample above. When I initially encountered the problem, I actually loaded a file URL from a webserver with the following content:

img_name
17e-eeuws_Statenjacht_van_Utrecht_Museumschip_Tordino_Plassendale_26-07-2023_11-42-06.jpg
Amel,_Kirche_Sankt_Huberrtus_oeg31027_IMG_7647_2023-08-28_14.48.jpg
Amel-Iveldingen,_de_Sankt_Barbara_Kapelle_oeg31007_IMG_7720_2023-08-29_10.48.jpg
...

(with "_" in the image file names instead of spaces...)
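
MediaWiki treats underscores and spaces in page titles as equivalent, so the two samples in this thread name the same files. Whether Montage's file loader normalizes one form to the other is not established here; the following is only a sketch of how a list could be converted to the space-separated form before upload, with `normalize_title` being a hypothetical helper.

```python
def normalize_title(name):
    """Replace underscores with spaces and collapse repeated whitespace."""
    return ' '.join(name.replace('_', ' ').split())


# One of the underscore-style names from the sample above.
underscored = 'Amel,_Kirche_Sankt_Huberrtus_oeg31027_IMG_7647_2023-08-28_14.48.jpg'
spaced = normalize_title(underscored)
```

Applied to the sample above, this yields the same names as in the first example upload file.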
