This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

[Bug] [ingestion-server] Tldextract caching error #139

Closed
1 task
zackkrida opened this issue Jul 9, 2021 · 2 comments
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix help wanted Open to participation from the community 🟧 priority: high Stalls work on the project or its dependents

Comments

@zackkrida

Description

Error message from production:

2021-07-09 16:31:52,160 WARNING cache.py:166 - unable to cache publicsuffix.org-tlds.{'urls': ('https://publicsuffix.org/list/public_suffix_list.dat', 'https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat'), 'fallback_to_snapshot': True} in /home/supervisord/.cache/python-tldextract/3.9.6.final__local__ecb11d__tldextract-3.1.0/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json. This could refresh the Public Suffix List over HTTP every app startup. Construct your `TLDExtract` with a writable `cache_dir` or set `cache_dir=False` to silence this warning. [Errno 13] Permission denied: '/home/supervisord'
2021-07-09 16:31:52,623 INFO cleanup.py:162 - https://www.flickr.com/photos/35762655@N04:200
2021-07-09 16:31:52,624 INFO cleanup.py:80 - Tested domain www.flickr.com

As the warning suggests, our Docker user can't write to tldextract's default cache directory (under /home/supervisord/.cache), so the public suffix list is never cached. This makes the cleanup job take significantly longer, because the list has to be downloaded again each round.
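The warning itself names the two options: give `TLDExtract` a writable `cache_dir`, or pass `cache_dir=False` to disable caching. Below is a minimal sketch of that idea, not the fix that was actually applied. The helper names `pick_writable_cache_dir` and `build_extractor` are hypothetical, and the candidate paths in the usage comment are assumptions rather than locations used by the ingestion server.

```python
import tempfile
from pathlib import Path


def pick_writable_cache_dir(candidates):
    """Return the first candidate directory we can create and write to, else None.

    Hypothetical helper: it probes each directory by creating a real
    temporary file, since permission bits alone can be misleading.
    """
    for candidate in candidates:
        path = Path(candidate).expanduser()
        try:
            path.mkdir(parents=True, exist_ok=True)
            with tempfile.TemporaryFile(dir=path):
                pass  # the probe file is deleted automatically on close
        except OSError:
            continue  # not creatable or not writable, try the next candidate
        return str(path)
    return None


def build_extractor(cache_dir):
    """Build a TLDExtract that caches in cache_dir, or does not cache at all.

    Passing cache_dir=False follows the wording of the warning emitted by
    tldextract 3.1.0; without a cache the suffix list may be fetched again
    on each startup.
    """
    import tldextract  # third-party; imported lazily so the helper above stays stdlib-only

    return tldextract.TLDExtract(cache_dir=cache_dir if cache_dir else False)


# Example wiring (paths are assumptions for illustration only):
# cache_dir = pick_writable_cache_dir(["~/.cache/python-tldextract", "/tmp/python-tldextract"])
# extractor = build_extractor(cache_dir)
```

Building the extractor once and reusing it across cleanup rounds should also avoid repeated downloads within a single process, even when no directory is writable.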

Reproduction

This should be reproducible by starting an ingestion task with the following request, after which the above warning should appear during the cleaning process:

curl -XPOST localhost:8001/task -H "Content-Type: application/json" -d '{"model": "image", "action": "INGEST_UPSTREAM"}'

Resolution

  • 🙋 I would be interested in resolving this bug.
@zackkrida zackkrida added 🟧 priority: high Stalls work on the project or its dependents 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository labels Jul 9, 2021
@zackkrida zackkrida changed the title from "[Bug] [ingestion-server]" to "[Bug] [ingestion-server] Tldextract caching error" Jul 9, 2021
@zackkrida zackkrida added help wanted Open to participation from the community python and removed 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels Jul 9, 2021
@dhruvkb

dhruvkb commented Jul 10, 2021

This cURL request is run as part of the load_sample_data.sh script, but I have not seen this message appear in the logs for that script.

@zackkrida

This seems related to the production infrastructure and not the API itself. I'm closing this and will create a new ticket there.
