Add yt-dlp based archiving for TwitterArchiver #138

Merged
4 commits merged on Apr 15, 2024


JettChenT (Contributor)

This PR adds yt-dlp-based Twitter archiving to TwitterArchiver as a fallback to the two existing archiving strategies. It uses the _extract_status method of yt-dlp's TwitterIE extractor to extract tweet metadata, and processes that metadata in a similar way to the existing archiving implementation.
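
A minimal sketch of the yt-dlp fallback, assuming yt-dlp is installed. Only TwitterIE and its _extract_status method come from the description above; _extract_status is a private yt-dlp method, so it may change between releases. The helper names (tweet_id_from_url, fetch_tweet_with_ytdlp) and the URL regex are hypothetical illustrations, not TwitterArchiver's actual code.

```python
import re
from typing import Any, Optional

# Hypothetical pattern for illustration only; yt-dlp's TwitterIE ships its own URL regex.
TWEET_ID_RE = re.compile(r"(?:twitter|x)\.com/[^/?#]+/status/(\d+)")


def tweet_id_from_url(url: str) -> Optional[str]:
    """Return the numeric status ID from a tweet URL, or None if the URL does not match."""
    match = TWEET_ID_RE.search(url)
    return match.group(1) if match else None


def fetch_tweet_with_ytdlp(url: str) -> dict[str, Any]:
    """Fetch raw tweet metadata through yt-dlp's TwitterIE (hypothetical helper).

    _extract_status is a private yt-dlp method, so its behaviour may change
    between yt-dlp releases.
    """
    twid = tweet_id_from_url(url)
    if twid is None:
        raise ValueError(f"not a tweet URL: {url}")

    # Imported lazily so the URL helper above works even without yt-dlp installed.
    from yt_dlp import YoutubeDL
    from yt_dlp.extractor.twitter import TwitterIE

    extractor = TwitterIE(YoutubeDL({"quiet": True}))
    return extractor._extract_status(twid)
```

The returned dict would then go through the same kind of post-processing the existing strategies apply to their results.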

In local testing, the existing snscrape (which seems to be unmaintained) and twitter-hack solutions did not work reliably for tweets, but the yt-dlp-based solution did. I'd be happy to hear whether others can replicate this!

I'm also happy to add a configuration option to TwitterArchiver for specifying the preferred order of tweet archiving methods (snscrape/twitter-hack/yt-dlp).
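
If such an option were added, the archiver could try each method in the configured order and stop at the first one that returns data. The sketch below assumes each method is a callable that takes a tweet URL and returns a metadata dict or None; archive_with_fallback and its order parameter are hypothetical, not an existing TwitterArchiver setting.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A method takes a tweet URL and returns metadata, or None if it found nothing.
ArchiveMethod = Callable[[str], Optional[dict]]


def archive_with_fallback(
    url: str,
    methods: Dict[str, ArchiveMethod],
    order: Sequence[str] = ("snscrape", "twitter-hack", "yt-dlp"),
) -> Tuple[str, dict]:
    """Try each named method in `order` and return (name, metadata) for the first success.

    `order` stands in for a hypothetical configuration option; it is not an
    existing TwitterArchiver setting.
    """
    errors: Dict[str, str] = {}
    for name in order:
        if name not in methods:
            raise KeyError(f"unknown archiving method: {name}")
        try:
            result = methods[name](url)
        except Exception as exc:  # one broken scraper should not abort the chain
            errors[name] = repr(exc)
            continue
        if result:
            return name, result
        errors[name] = "no result"
    raise RuntimeError(f"all archiving methods failed for {url}: {errors}")


# Usage with stub methods standing in for the real scrapers.
def _broken(url: str) -> Optional[dict]:
    raise ConnectionError("blocked")


def _empty(url: str) -> Optional[dict]:
    return None


def _working(url: str) -> Optional[dict]:
    return {"url": url}


stubs = {"snscrape": _broken, "twitter-hack": _empty, "yt-dlp": _working}
name, meta = archive_with_fallback("https://x.com/user/status/1", stubs)
print(name)  # yt-dlp
```

Catching exceptions per method keeps one broken scraper from aborting the whole chain, while the collected errors still explain why every method failed when nothing works.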

msramalho self-assigned this on Apr 9, 2024
    else:
        variant = var.get("src") if not variant else variant
    variant = var if not variant else variant

These key changes in variant_choosing may break the other scrapers, but since those are, well... broken, it's fine for now.
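
The diff lines above fall back to a variant's "src" key and finally to the raw variant itself when no earlier lookup produced a value. A hypothetical standalone version of that idea is sketched below: it prefers the highest-bitrate MP4 in a Twitter-style video_info.variants list (keys content_type, bitrate, url) and otherwise accepts a "src" key or a plain URL string. The name pick_variant_url is an illustration under those assumptions, not the project's variant-choosing code.

```python
from typing import Any, Dict, List, Optional, Union

# A variant is either a dict (Twitter-style or scraper-specific) or a bare URL string.
Variant = Union[Dict[str, Any], str]


def pick_variant_url(variants: List[Variant]) -> Optional[str]:
    """Return the highest-bitrate MP4 URL, else the first other usable URL, else None.

    Dict variants may carry the URL under "url" (Twitter API style) or "src"
    (assumed for other scrapers); plain strings are treated as URLs directly.
    """
    best_url: Optional[str] = None
    best_bitrate = -1
    fallback: Optional[str] = None
    for var in variants:
        if isinstance(var, str):
            fallback = fallback or var
            continue
        url = var.get("url") or var.get("src")
        if not url:
            continue
        if var.get("content_type") == "video/mp4":
            bitrate = var.get("bitrate") or 0
            if bitrate > best_bitrate:
                best_bitrate, best_url = bitrate, url
        else:
            # Non-MP4 entries (e.g. HLS playlists) are kept only as a last resort.
            fallback = fallback or url
    return best_url or fallback
```

Preferring an explicit MP4 bitrate over key order makes the choice independent of how each scraper happens to sort its variants.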


msramalho left a comment


This is a really good and impactful contribution; thank you for taking the time to properly implement this new alternative. As we're now building on three methods and yt-dlp is the most likely to work, I would accept a refactor of the method priorities in the code, but that's an extra and has little impact on running time. Will merge.

msramalho merged commit cf8691b into bellingcat:main on Apr 15, 2024

2 participants