
Example #33

Merged
merged 7 commits into from
Apr 8, 2024


jgarciab
Contributor

Improved examples and README

jgarciab requested a review from modhurita on February 29, 2024, 11:07
@jgarciab
Contributor Author

@modhurita, please review and add yourself to the citation. Once we have a new version, we can put it on Zenodo and add your name there too.

README.md Outdated
scraper.close()
```
To download data from GoogleArt it is necessary to install
[Firefox](https://www.mozilla.org/en-US/firefox/new/) and `geckodriver`. Geckodriver is installed automatically when you run the code for the first time.
Contributor

Is it not confusing to first say that geckodriver needs to be installed, and then say that it is installed automatically? Maybe it is better to leave out any mention of geckodriver?

Contributor Author

Yes, let's remove the mention of geckodriver.

]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54afc420",
"execution_count": 5,
Contributor

Cell 5 is empty and needs to be removed.

Contributor Author

Yes please

@modhurita
Contributor

Hi @jgarciab :

Thanks for your work on improving this repository!

I have provided some comments above, and changed some things myself online. I hope they look okay to you.

I could run the Google Arts & Culture parts of the example notebooks, but not the WikiArt ones. I obtained API keys by creating a WikiArt account, and placed them in the examples directory, as per the instructions in the README. However, the very first example cell didn't execute successfully. This is the error I got:

FileNotFoundError                         Traceback (most recent call last)
~/ResearchEngineering/artscraper/artscraper/wikiart.py in __init__(self, output_dir, skip_existing, min_wait, timeout)
     23         try:
---> 24             with open(".wiki_session", "r", encoding="utf-8") as f:
     25                 self.session_key = f.read()

FileNotFoundError: [Errno 2] No such file or directory: '.wiki_session'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_13612/1175252453.py in <module>
      3 art_url = "https://www.wikiart.org/en/edvard-munch/anxiety-1894"
      4 
----> 5 with WikiArtScraper() as scraper:
      6     scraper.load_link(art_url)
      7     metadata = scraper.get_metadata()

~/ResearchEngineering/artscraper/artscraper/wikiart.py in __init__(self, output_dir, skip_existing, min_wait, timeout)
     25                 self.session_key = f.read()
     26         except FileNotFoundError:
---> 27             self._new_session()
     28             with open(".wiki_session", "w", encoding="utf-8") as f:
     29                 f.write(self.session_key)

~/ResearchEngineering/artscraper/artscraper/wikiart.py in _new_session(self)
     66                                 },
     67                                 timeout=self.timeout)
---> 68         self.session_key = json.loads(response.text)["SessionKey"]
     69         self.last_request = time.time()
     70 

KeyError: 'SessionKey'
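
For reference, the traceback shows two steps: the session key is read from `.wiki_session`, and if that file is missing a new session is requested and `"SessionKey"` is read from the JSON response. The following is a minimal sketch, not the code in `wikiart.py`, of how that lookup could report a missing key more descriptively. The names `SessionKeyError`, `parse_session_key`, and `load_or_create_session_key` are hypothetical, and the HTTP request is passed in as a callable because its details are not visible in the traceback.

```python
import json
from pathlib import Path


class SessionKeyError(RuntimeError):
    """Hypothetical error: the login response carried no session key."""


def parse_session_key(response_text: str) -> str:
    """Return the "SessionKey" field of a JSON response, or raise a clear error."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise SessionKeyError(
            f"Login response is not valid JSON: {response_text[:200]!r}"
        ) from exc
    if not isinstance(payload, dict) or "SessionKey" not in payload:
        # Include the start of the response so the underlying cause is visible.
        raise SessionKeyError(
            "Login response has no 'SessionKey'. "
            f"Response was: {response_text[:200]!r}"
        )
    return payload["SessionKey"]


def load_or_create_session_key(cache_file: Path, fetch_response_text) -> str:
    """Return the cached key, or fetch a new one and cache it only on success.

    `fetch_response_text` is an assumed zero-argument callable that performs
    the login request and returns the raw response body as a string.
    """
    if cache_file.exists():
        cached = cache_file.read_text(encoding="utf-8").strip()
        if cached:
            return cached
    key = parse_session_key(fetch_response_text())
    cache_file.write_text(key, encoding="utf-8")
    return key
```

With this shape, a response without a session key would surface as a `SessionKeyError` that quotes the response body instead of a bare `KeyError`, and nothing is written to the cache file when the login fails.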

Finally, at which position in the names list in the citation should I add my name?

Thanks,
Modhurita

@jgarciab
Contributor Author

jgarciab commented Apr 8, 2024

Hi Modhurita, I only tried the interactive version. Could you try to figure it out? That was Raoul's part; if you don't understand how it works, maybe you could ask him.

You can add your name in second place in the citation if you are okay with that.

@modhurita
Contributor

Hi @jgarciab :

The WikiArt part seems to work now - not sure why I got that error earlier. I have made the other changes. I approve the pull request; you can now merge this branch into main.

modhurita merged commit 1ca4ab6 into main on Apr 8, 2024
6 checks passed