Temporal Popularity Image Collection (TPIC) is a large-scale dataset for social media popularity prediction. It contains 680K social media posts with images from anonymized Flickr.com users, with photo-sharing records spanning 3 years. TPIC is a multi-faceted social media dataset consisting of photo images, user profiles, and photo metadata. We provide rescaled and normalized popularity scores based on the view count of each online post. To protect the privacy of users and their sharing behavior, we anonymized user and post identifiers and converted post timestamps to time segments with integer indexes.
- user & photo metadata (4.5 MB)
- photo URLs (6 MB)
- time flag (2.5 MB)
- labels (3 MB)
Each row of data has a unique photo id (pid) along with a user id (uid). All the CSV files listed above have a header row that gives the meaning of each column.
The user & photo metadata file contains the picture id, user id, comment count, has people, title length, description length, tag count, average view, group count, and average member count for each photo:
pid uid commentcount haspeople titlelen deslen tagcount avgview groupcount avgmembercount
...
304582 50@N31 0 0 15 0 14 199.32 1188 6601
304592 142@N94 0 0 11 9 0 615.61 67 21637
...
The data is collected from Flickr; all user ids and photo ids are anonymized.
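As a minimal sketch, the metadata rows shown above could be parsed with Python's standard `csv` module. The delimiter is assumed to be a single space, as in the excerpt; if the distributed CSV files are comma-separated, pass `delimiter=","` instead. The function name `read_metadata` and the column type choices (counts as integers, averages as floats) are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# Header and two rows copied from the sample above. A real run would pass an
# open file handle instead; the file name is not stated in this document.
SAMPLE = (
    "pid uid commentcount haspeople titlelen deslen tagcount avgview groupcount avgmembercount\n"
    "304582 50@N31 0 0 15 0 14 199.32 1188 6601\n"
    "304592 142@N94 0 0 11 9 0 615.61 67 21637\n"
)

# Assumed column types: counts and flags as int, averages as float.
INT_COLUMNS = ("pid", "commentcount", "haspeople", "titlelen", "deslen",
               "tagcount", "groupcount")
FLOAT_COLUMNS = ("avgview", "avgmembercount")


def read_metadata(handle, delimiter=" "):
    """Parse metadata rows into dicts keyed by the header names."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        for name in INT_COLUMNS:
            record[name] = int(record[name])
        for name in FLOAT_COLUMNS:
            record[name] = float(record[name])
        rows.append(record)
    return rows


metadata = read_metadata(io.StringIO(SAMPLE))
print(metadata[0]["uid"], metadata[0]["avgview"])
```

The user id stays a string because anonymized ids such as `50@N31` are not numeric.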
The photo URL file lists the photo URL corresponding to each photo id and user id:
pid uid url
...
9624 25@N92 https://www.flickr.com/photos/7626362@N07/1251837061
665085 275@N38 https://www.flickr.com/photos/7690920@N06/863366976
...
To make temporal information usable while protecting user privacy, the time flag file gives the year, month, day, and hour index for each photo and user:
pid uid year month day hour_index
...
311862 11@N30 2007 3 16 4
311863 89@N59 2007 3 16 4
...
The hour index is defined as follows:
- 0: 2am-6am
- 1: 6am-10am
- 2: 10am-2pm
- 3: 2pm-6pm
- 4: 6pm-10pm
- 5: 10pm-2am
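The hour index definition above can be sketched as a small mapping from a 24-hour clock hour to its segment. This is only an illustration: the function name `hour_to_index` is hypothetical, and it assumes each segment includes its start hour and excludes its end hour (for example, 6am falls in segment 1), which the definition does not state explicitly.

```python
# Segment labels copied from the hour index definition.
HOUR_INDEX_LABELS = {
    0: "2am-6am",
    1: "6am-10am",
    2: "10am-2pm",
    3: "2pm-6pm",
    4: "6pm-10pm",
    5: "10pm-2am",
}


def hour_to_index(hour):
    """Map a clock hour (0-23) to its 4-hour segment index.

    Assumption: segments are half-open, [start, end).
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in the range 0-23")
    # Shift so that 2am becomes 0, wrap around midnight, then bucket by 4 hours.
    return ((hour - 2) % 24) // 4


# The sample rows above have hour_index 4, i.e. the 6pm-10pm segment.
print(HOUR_INDEX_LABELS[4])
print(hour_to_index(19))
```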
The label file contains the popularity score (log-views) for each picture id and its associated user id:
pid uid logview
...
9624 25@N92 3.2
665085 275@N38 2.3
...
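Because every file carries the same pid and uid columns, rows from different files can be matched on that pair. The following is a minimal sketch that joins the sample URL rows with the sample label rows from this document; the function name `join_on_photo` is hypothetical, and the rows are written out as literals rather than read from the files.

```python
# Sample rows copied from the URL and label excerpts in this document.
url_rows = [
    {"pid": 9624, "uid": "25@N92",
     "url": "https://www.flickr.com/photos/7626362@N07/1251837061"},
    {"pid": 665085, "uid": "275@N38",
     "url": "https://www.flickr.com/photos/7690920@N06/863366976"},
]
label_rows = [
    {"pid": 9624, "uid": "25@N92", "logview": 3.2},
    {"pid": 665085, "uid": "275@N38", "logview": 2.3},
]


def join_on_photo(left_rows, right_rows):
    """Inner-join two row lists on the (pid, uid) pair."""
    right_by_key = {(row["pid"], row["uid"]): row for row in right_rows}
    joined = []
    for left in left_rows:
        match = right_by_key.get((left["pid"], left["uid"]))
        if match is not None:
            # Merge the columns of both rows into one record.
            joined.append({**left, **match})
    return joined


samples = join_on_photo(url_rows, label_rows)
for sample in samples:
    print(sample["pid"], sample["logview"], sample["url"])
```

Rows without a partner in the other file are dropped, so the result only contains photos present in both inputs.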
@inproceedings{Wu2017DTCN,
title={Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks},
author={Wu, Bo and Cheng, Wen-Huang and Zhang, Yongdong and Huang, Qiushi and Li, Jintao and Mei, Tao},
booktitle={IJCAI},
year={2017}
}
- Bo Wu, Tao Mei, Wen-Huang Cheng, and Yongdong Zhang. Unfolding Temporal Dynamics: Predicting Social Media Popularity Using Multi-scale Temporal Decomposition. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16). AAAI Press, 272-278, 12-17 February, 2016, Phoenix, USA.
- Bo Wu, Wen-Huang Cheng, Yongdong Zhang, and Tao Mei. Time Matters: Multi-scale Temporalization of Social Media Popularity. In Proceedings of the 2017 ACM on Multimedia Conference (ACM MM '17). ACM, New York, NY, USA, 1336-1344.
- Bo Wu, Wen-Huang Cheng, Peiye Liu, Bei Liu, Zhaoyang Zeng, and Jiebo Luo. SMP Challenge: An Overview of Social Media Prediction Challenge 2019. In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), 2019.