Dealing with medias
GDPR archives also contain all the media uploaded to Twitter, linked to tweets and DMs.
Media access is restricted to GDPR archives!
Note first that media linked to tweets can be fetched through their URL (always public, even if the account is protected). Direct message media are protected with OAuth, so you need a Twitter application to fetch them through the API.
Otherwise, twitter-archive-reader can find the right file linked to a direct message, a tweet, and more, through methods available on the MediaArchive instance, located on the .medias property of TwitterArchive.
Some methods of this object are designed to ease access to tweet and DM media.
import TwitterArchive, { MediaArchive } from 'twitter-archive-reader';
const archive = new TwitterArchive('filename');
await archive.ready();
const medias = archive.medias;
You can quickly check whether your archive contains media with the .has_medias property.
It returns true if some media can be accessed.
If this property is false, every call to a file getter method will be rejected with an Error containing "File not found".
In archives made between June 2019 and December 2019, media files were zipped inside the archive.
In this case, twitter-archive-reader must extract the ZIP from the original archive to read its content.
This causes a huge overhead the first time a given media archive is accessed, and may be fatal for RAM-limited systems with very big archives.
To know whether media are zipped inside the archive, use the .is_medias_zipped property of MediaArchive.
if (medias.is_medias_zipped) {
// Try to avoid media getters through archive
}
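The two flags described above can be combined into a single decision before any getter is called. The following is a minimal sketch, not part of twitter-archive-reader: MediaFlags, AccessPlan and planMediaAccess are hypothetical names, and the only assumption is that the two boolean properties behave as described here.

```typescript
// Hypothetical shape: only the two boolean flags described above are assumed.
interface MediaFlags {
  has_medias: boolean;
  is_medias_zipped: boolean;
}

// Hypothetical result type for this sketch.
type AccessPlan = 'unavailable' | 'costly' | 'direct';

// Decide how to treat media getters for a given archive.
function planMediaAccess(flags: MediaFlags): AccessPlan {
  if (!flags.has_medias) {
    // Getters would be rejected with "File not found": do not call them.
    return 'unavailable';
  }
  if (flags.is_medias_zipped) {
    // The inner ZIP must be extracted first: expect a heavy first access.
    return 'costly';
  }
  return 'direct';
}
```

With a real archive, passing archive.medias to planMediaAccess should work as long as the instance exposes both properties. For tweet media, a 'costly' result is a good reason to prefer the public URL over the archive.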
An enumeration (MediaArchiveType) is available to reference each folder supported by MediaArchive.
Enumeration items are used with the .get() and .list() methods.
enum MediaArchiveType {
SingleDM, GroupDM, Moment, Tweet, Profile
}
Each enum member is linked, respectively, to the direct_message_media, direct_message_group_media, moments_media, tweet_media and profile_media folders in GDPR archives.
You can import it from the twitter-archive-reader package.
import { MediaArchiveType } from 'twitter-archive-reader';
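To make the member-to-folder correspondence explicit, here is a small lookup table. It is only an illustration: MediaKind, FOLDER_OF and folderOf are hypothetical names, and the table does not rely on the runtime values of the real MediaArchiveType enum, only on the member names and folder names listed on this page.

```typescript
// Local stand-in for the member names listed above; the runtime values of
// the real enum are deliberately not assumed.
type MediaKind = 'SingleDM' | 'GroupDM' | 'Moment' | 'Tweet' | 'Profile';

// Folder names as given on this page, in the same order as the enum members.
const FOLDER_OF: Record<MediaKind, string> = {
  SingleDM: 'direct_message_media',
  GroupDM: 'direct_message_group_media',
  Moment: 'moments_media',
  Tweet: 'tweet_media',
  Profile: 'profile_media',
};

// Return the GDPR archive folder matching a member name.
function folderOf(kind: MediaKind): string {
  return FOLDER_OF[kind];
}
```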
Several methods on MediaArchive allow raw and easy access to media from various sources.
Listing and getting files is fully asynchronous, so every method returns a Promise.
When you try to get a media file that is not found (this can happen!), the Promise is rejected.
Note:
Each method that extracts file(s) can return them in two formats: Blob or ArrayBuffer.
The choice is controlled by the last parameter of each file getter method, as_array_buffer.
By default, with as_array_buffer === undefined, the return type is Blob if the platform supports it (generally, in browsers), otherwise ArrayBuffer (in Node.js).
You can force the return type by setting as_array_buffer to true (=> ArrayBuffer) or false (=> Blob).
For type safety, it is recommended to always set this parameter explicitly.
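If some code leaves as_array_buffer undefined, the result has to be narrowed before use. The helper below is a minimal sketch of such a normalization, under the assumption that the Blob returned by a getter exposes the standard arrayBuffer() method. BlobLike and toArrayBuffer are hypothetical names, not part of the package.

```typescript
// Hypothetical minimal view of a Blob: only arrayBuffer() is assumed.
interface BlobLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Normalize a getter result to ArrayBuffer, whatever the platform returned.
async function toArrayBuffer(data: ArrayBuffer | BlobLike): Promise<ArrayBuffer> {
  if (data instanceof ArrayBuffer) {
    // Already in the requested format (the Node.js default).
    return data;
  }
  // Blob-like value (the browser default): read its content.
  return data.arrayBuffer();
}
```

Setting as_array_buffer explicitly remains the simpler option; this helper is mostly useful for code that receives a result without knowing how it was requested.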
You can list the files available in a directory with the .list(archive_type) method.
// Get a list of files in tweet_media directory (medias related to tweets)
const filenames = await archive.medias.list(MediaArchiveType.Tweet);
By using .get(archive_type, filename, as_array_buffer), you can get a file from the media archive of your choice.
const file = await archive.medias.get(MediaArchiveType.Tweet, filenames[0], /* as_array_buffer */);
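Since any .get() call may be rejected, a loop over .list() results should record failures instead of aborting. The sketch below shows one way to do it. MediaLister, FetchReport and fetchAll are hypothetical names, and the interface only mirrors the .list() and .get() signatures described above; with the real MediaArchive class, a cast may be needed depending on its exact typings.

```typescript
// Hypothetical minimal view of MediaArchive: only .list() and .get() as
// described above are assumed. T stands for MediaArchiveType.
interface MediaLister<T> {
  list(type: T): Promise<string[]>;
  get(type: T, filename: string, as_array_buffer: true): Promise<ArrayBuffer>;
}

interface FetchReport {
  files: Map<string, ArrayBuffer>; // filename -> content
  missing: string[];               // filenames whose Promise was rejected
}

// Fetch every file of one folder, recording rejections instead of failing.
async function fetchAll<T>(medias: MediaLister<T>, type: T): Promise<FetchReport> {
  const report: FetchReport = { files: new Map(), missing: [] };
  // Files are read one at a time to keep memory usage predictable.
  for (const name of await medias.list(type)) {
    try {
      report.files.set(name, await medias.get(type, name, true));
    } catch {
      report.missing.push(name);
    }
  }
  return report;
}
```

Reading files sequentially is slower than Promise.all, but it avoids holding many pending reads at once, which matters for large or zipped archives.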
When you have a direct message, the related media filenames are not directly attached to it.
Some helpers are here to guide you: .ofDm() and .fromDmMediaUrl().
const dm = archive.messages.single('dm_id');
// You can get directly all the medias of a DM
const medias_of_my_dm: (Blob | ArrayBuffer)[] = await archive.medias.ofDm(dm, /* as_array_buffer */);
// ...or you can get one of them via a media URL
if (dm.mediaUrls.length) {
const media_1 = await archive.medias.fromDmMediaUrl(
dm.mediaUrls[0],
false, // is_group: set to true if the message comes from a group conversation
/* as_array_buffer */
);
}
The same kind of methods exists for tweets.
Some helpers are here to guide you: .ofTweet() and .fromTweetMediaEntity().
const tweet = archive.tweets.all[0];
// You can get directly all the medias of the tweet
const medias_of_tweet: (Blob | ArrayBuffer)[] = await archive.medias.ofTweet(tweet, /* as_array_buffer */);
// ...or you can get one of them via a media entity
if (tweet.extended_entities || tweet.entities) {
// Always try to use extended entities instead of classic entities
const m_entities = (tweet.extended_entities || tweet.entities).media;
if (m_entities && m_entities.length) {
const media_file = await archive.medias.fromTweetMediaEntity(m_entities[0]);
}
}
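The entity selection above can be factored into a small helper that always returns an array. This is a sketch with hypothetical names (EntitiesLike, TweetLike, mediaEntitiesOf) that only assumes the entities, extended_entities and media fields used in the snippet. Unlike the snippet, it also falls back to classic entities when extended entities exist but carry no media.

```typescript
// Hypothetical minimal tweet shape: only the fields used above are assumed.
// E stands for the media entity type.
interface EntitiesLike<E> {
  media?: E[];
}
interface TweetLike<E> {
  entities?: EntitiesLike<E>;
  extended_entities?: EntitiesLike<E>;
}

// Return the media entities of a tweet, preferring extended entities and
// falling back to classic ones; the result is never undefined.
function mediaEntitiesOf<E>(tweet: TweetLike<E>): E[] {
  const extended = tweet.extended_entities?.media;
  if (extended && extended.length) {
    return extended;
  }
  return tweet.entities?.media ?? [];
}
```

Each returned entity could then be passed to .fromTweetMediaEntity(), with the usual handling of rejected Promises.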
The helpers .getProfilePictureOf() and .getProfileBannerOf() let you easily get profile media.
Both methods take a UserData instance as parameter. Usually, this is archive.user.
const [profile, header] = await Promise.all([
archive.medias.getProfilePictureOf(archive.user),
archive.medias.getProfileBannerOf(archive.user)
]);
In browsers, getters return Blob by default.
This facilitates usage inside <img> and <video> tags.
const msg = archive.messages.single('dm_id');
/* Browser */
// Get the image
const blob = await archive.medias.fromDmMediaUrl(msg.mediaUrls[0], false, false) as Blob;
// Create a URL and set it as img
const url = URL.createObjectURL(blob);
document.querySelector('img').src = url;
On Node.js, getters return ArrayBuffer by default.
You can also force it by passing true as the last parameter of file getter methods.
/* Node.js */
import * as fs from 'fs';
// Get the image
const array_buffer = await archive.medias.fromDmMediaUrl(msg.mediaUrls[0], false, true) as ArrayBuffer;
// Write the file to disk
fs.writeFileSync('test_dir/my_img.jpg', Buffer.from(array_buffer));
Next part is Explore favorites.