Mapping archive files to data structures in the twitter-archive-reader module
To keep this page readable, a TwitterArchive instance is referred to as archive.
When a file contains nested data structures, the properties listed here are the unwrapped ones.
When a property is not present in, read by, or stored by twitter-archive-reader, it is marked with an X.
This part maps GDPR archive files and their properties to the data structures used by the module.
- email: archive.user.email_address
- createdVia: X
- username: archive.user.screen_name
- accountId: archive.user.id
- createdAt: archive.user.created_at
- accountDisplayName: archive.user.name
- accountId: archive.user.id
- userCreationIp: archive.user.account_creation_ip
Due to lack of information in the package creator dataset, this file could not be parsed.
- accountId: archive.user.id
- timeZone: archive.user.timezone
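As a rough illustration of the account mappings listed above, the sketch below converts an unwrapped account record into an object shaped like archive.user. The RawAccount and UserSummary interfaces and the toUserSummary function are hypothetical names introduced for this example and are not part of twitter-archive-reader. Only the property names come from the lists above, and typing every value as a string is an assumption.

```typescript
// Hypothetical shape of an unwrapped account record.
// Property names come from the list above; string types are an assumption.
interface RawAccount {
  email: string;
  createdVia: string; // marked X: not stored by the module
  username: string;
  accountId: string;
  createdAt: string;
  accountDisplayName: string;
}

// Hypothetical subset of the data exposed under archive.user.
interface UserSummary {
  email_address: string;
  screen_name: string;
  id: string;
  created_at: string;
  name: string;
}

// Renames each GDPR property to its archive.user counterpart; createdVia is dropped.
function toUserSummary(raw: RawAccount): UserSummary {
  return {
    email_address: raw.email,
    screen_name: raw.username,
    id: raw.accountId,
    created_at: raw.createdAt,
    name: raw.accountDisplayName,
  };
}

// Made-up sample data, for illustration only.
const sampleUser = toUserSummary({
  email: "user@example.com",
  createdVia: "web",
  username: "example",
  accountId: "12345",
  createdAt: "2010-01-01T00:00:00.000Z",
  accountDisplayName: "Example User",
});
console.log(sampleUser.screen_name); // "example"
```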
Every ad.adsUserData.adEngagements.engagements array in this file is merged into archive.ads.engagements.
Every ad.adsUserData.adImpressions.impressions array in this file is merged into archive.ads.impressions.
Every ad.adsUserData.attributedMobileAppConversions.conversions array in this file is merged into archive.ads.mobile_conversions.
This file is not parsed.
Every ad.adsUserData.attributedOnlineConversions.conversions array in this file is merged into archive.ads.online_conversions.
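The "merged into" wording used for the ad files can be pictured as a simple flattening step. The sketch below is an assumption about the general idea, not the module's actual implementation: AdEngagementEntry and mergeEngagements are hypothetical names, and only the ad.adsUserData.adEngagements.engagements path comes from the text above.

```typescript
// Hypothetical minimal shape of one entry of an ad engagements file,
// following the ad.adsUserData.adEngagements.engagements path.
interface AdEngagementEntry<E> {
  ad: { adsUserData: { adEngagements: { engagements: E[] } } };
}

// Concatenates every nested engagements array into a single flat array.
function mergeEngagements<E>(entries: AdEngagementEntry<E>[]): E[] {
  const merged: E[] = [];
  for (const entry of entries) {
    merged.push(...entry.ad.adsUserData.adEngagements.engagements);
  }
  return merged;
}

// Made-up sample data: strings stand in for real engagement objects.
const allEngagements = mergeEngagements([
  { ad: { adsUserData: { adEngagements: { engagements: ["a", "b"] } } } },
  { ad: { adsUserData: { adEngagements: { engagements: ["c"] } } } },
]);
console.log(allEngagements.length); // 3
```

The same pattern would apply to the impressions and conversions arrays, with the corresponding property path.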
This file is not parsed.
- age: archive.user.age.age
- birthDate: archive.user.age.birthDate
Every accountId in this file is available in the archive.blocks set.
This file is not parsed.
Every connectedApplication in this file is merged into the archive.user.authorized_applications array.
Due to lack of information in the package creator dataset, this file could not be parsed.
This file is not parsed (possibly a TODO).
All direct messages are available, grouped by conversation, in archive.messages.
If you are looking for events other than direct messages, they are grouped by message and are available through the .events.before and .events.after properties of a LinkedDirectMessage object, or through the .events() generator of a Conversation instance.
Direct message "headers" files are not parsed, because their content is the same as the direct message files, but without the text.
Every available emailChange is grouped in the archive.user.email_address_history array.
Every accountId in this file is available in the archive.followers set.
Every accountId in this file is available in the archive.followings set.
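Because followers and followings are exposed as sets, membership checks and set operations are straightforward. The sketch below computes the accounts present in both sets. mutualFollows is a hypothetical helper, not part of twitter-archive-reader, and it assumes the sets hold account IDs as strings.

```typescript
// Returns the account IDs present in both sets.
function mutualFollows(followers: Set<string>, followings: Set<string>): Set<string> {
  const mutual = new Set<string>();
  followers.forEach((id) => {
    if (followings.has(id)) {
      mutual.add(id);
    }
  });
  return mutual;
}

// Stand-in data; with a loaded archive you would pass
// archive.followers and archive.followings instead.
const sampleFollowers = new Set(["1", "2", "3"]);
const sampleFollowings = new Set(["2", "3", "4"]);
const friends = mutualFollows(sampleFollowers, sampleFollowings);
console.log(friends.size); // 2
```

The same approach should work with archive.blocks and archive.mutes, which are described as sets as well.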
Every ipAudit in this file is grouped into the archive.user.last_logins array.
Likes are organized in archive.favorites.
You can get a set of favorited tweet IDs with archive.favorites.registred, and every like object of this file with archive.favorites.all.
The nested URLs of these files are available in archive.lists.created, archive.lists.member_of and archive.lists.subscribed, respectively.
Moments are stored in archive.moments. Every moment property of the file is merged into this array.
Because little data about moments is available, the type definitions may be incomplete or incorrect.
Every accountId in this file is available in the archive.mutes set.
In this file, every niDeviceResponse.pushDevice is merged into the archive.user.devices.push_devices array, and every niDeviceResponse.messagingDevice is merged into the archive.user.devices.messaging_devices array.
Due to a lack of Periscope information in the package creator dataset, none of the Periscope-related files are parsed.
Personalization data is parsed and rearranged in archive.user.personalization.
See the UserPersonalization interface in types/GDPRUserInformations.ts for more details about how the data is rearranged.
- phoneNumber: archive.user.phone_number
- bio: archive.user.bio
- website: archive.user.url
- location: archive.user.location
- avatarMediaUrl: archive.user.profile_img_url
- headerMediaUrl: archive.user.profile_banner_url
Every protectedHistory object is stored in an array available at archive.user.protected_history.
Due to lack of information in the package creator dataset, this file could not be parsed.
Every screenNameChange.screenNameChange object is stored in an array available at archive.user.screen_name_history.
Every tweet is parsed and stored in the archive.tweets container.
Some properties of GDPR archive tweets may be changed to ensure compatibility between the different types of archives. To learn more about tweets, see the related documentation "Tweet access and manipulating tweets" in the wiki.
You can check whether the archive owner is verified with the boolean archive.user.verified.
This part maps the data directories of a GDPR archive to twitter-archive-reader.
Folders in GDPR archives store media files, such as tweet images and videos, direct message media and profile media.
You can access media through the MediaArchive instance, available on the .medias property of the Twitter Archive Reader instance.
Some methods of this object make it easier to access tweet and DM media. They are described in the "Dealing with medias" part of the wiki; refer to it to learn more about them.
In archives made between June 2019 and December 2019, media files were zipped inside the archive.
In this case, twitter-archive-reader must extract the ZIP from the original archive to read its content.
This causes a large overhead the first time a given media archive is accessed, and may exhaust memory on RAM-limited systems with very large archives.
An enumeration (MediaArchiveType) is available to reference each folder supported by MediaArchive.
Its members are used with the .get() and .list() methods.
enum MediaArchiveType {
  SingleDM, GroupDM, Moment, Tweet, Profile
}
You can import it from the twitter-archive-reader package.
import { MediaArchiveType } from 'twitter-archive-reader';
The following list maps each folder to its enum member.
- direct_message_media: MediaArchiveType.SingleDM
- direct_message_group_media: MediaArchiveType.GroupDM
- tweets_media: MediaArchiveType.Tweet
- profile_media: MediaArchiveType.Profile
- moments_media: MediaArchiveType.Moment
If other folders are present in the archive, they aren't accessible.
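The folder to enum correspondence can be expressed as a lookup table. The sketch below redeclares the enum locally, mirroring the declaration shown on this page, so that it runs on its own; in real code you would import MediaArchiveType from the package, and its member values may differ from this local copy. FOLDER_TO_TYPE and folderToMediaType are hypothetical names introduced for the example.

```typescript
// Local copy of the enum so the sketch is self-contained;
// in real code, import MediaArchiveType from 'twitter-archive-reader'.
enum MediaArchiveType {
  SingleDM, GroupDM, Moment, Tweet, Profile
}

// Folder names and enum members taken from the list on this page.
const FOLDER_TO_TYPE: Record<string, MediaArchiveType> = {
  direct_message_media: MediaArchiveType.SingleDM,
  direct_message_group_media: MediaArchiveType.GroupDM,
  tweets_media: MediaArchiveType.Tweet,
  profile_media: MediaArchiveType.Profile,
  moments_media: MediaArchiveType.Moment,
};

// Hypothetical helper: returns undefined for folders that are not supported.
function folderToMediaType(folder: string): MediaArchiveType | undefined {
  return Object.prototype.hasOwnProperty.call(FOLDER_TO_TYPE, folder)
    ? FOLDER_TO_TYPE[folder]
    : undefined;
}

console.log(folderToMediaType("tweets_media") === MediaArchiveType.Tweet); // true
console.log(folderToMediaType("unknown_folder")); // undefined
```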
Classic archives contain limited information.
- tweets: archive.tweets.length
- created_at: archive.generation_date
- lang: X
For every item in the tweet index array:
The file name and variable name are not accessible.
The tweet count for each year and month can be derived from archive.tweets.index.
// Tweet count for 2019/08
const index = archive.tweets.index;
// The index is organized by year, then month, then tweet ID
let tweet_count = 0;
if (2019 in index && 8 in index[2019]) {
  tweet_count = Object.keys(index[2019][8]).length;
}
- screen_name: archive.user.screen_name
- location: archive.user.location
- full_name: archive.user.name
- bio: archive.user.bio
- id: archive.user.id
- created_at: archive.user.created_at
Every .js file contains tweets, organized by month.
To access tweets indexed by year and month, use archive.tweets.index, as previously shown.
You can also use archive.tweets.month(month, year).
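The year, then month, then tweet ID layout of the index lends itself to a small counting helper. The sketch below is a standalone illustration: the TweetIndex type and the countTweets function are hypothetical and only assume the index layout described on this page; they say nothing about how archive.tweets.month(month, year) is implemented.

```typescript
// Assumed index shape: year -> month -> tweet ID -> tweet.
type TweetIndex<T> = { [year: string]: { [month: string]: { [id: string]: T } } };

// Number of tweets for a given year and month, or 0 when either is absent.
function countTweets<T>(index: TweetIndex<T>, year: number, month: number): number {
  const months = index[String(year)];
  if (!months) {
    return 0;
  }
  const tweets = months[String(month)];
  return tweets ? Object.keys(tweets).length : 0;
}

// Made-up sample index; with a loaded archive you would pass archive.tweets.index.
const sampleIndex: TweetIndex<{ text: string }> = {
  "2019": { "8": { "111": { text: "hello" }, "112": { text: "world" } } },
};
console.log(countTweets(sampleIndex, 2019, 8)); // 2
console.log(countTweets(sampleIndex, 2018, 1)); // 0
```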