This is the code repository for our paper "Preserve Integrity in Realtime Event Summarization", to appear in Transactions on Knowledge Discovery from Data.

Because of Git LFS limitations, the GloVe pre-trained embeddings must be downloaded from Google Drive.

HID_Model

Requirements

Hardware: a machine with two Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz, 256 GB main memory and a GeForce RTX 2080 Ti graphics card
OS: Ubuntu 18.04
Packages:
- python 3.6
- tensorflow 1.13.1-gpu
- keras 2.2.4
- numpy 1.16.2
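
Before training, it can help to confirm that the active environment matches the versions pinned above. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Versions pinned in the HID_Model package list above.
PINNED = {"tensorflow": "1.13.1", "keras": "2.2.4", "numpy": "1.16.2"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(pinned):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = installed_version(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dict when every pinned package is installed at the listed version.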

Train

python HID_train.py

Get HID pre-trained parameters

python getInconsistentWeight.py

After the above steps, data/inconsistent_weight.npy and data/inconsistent_bias.npy are available for use in IAEA-Model.

Note: You can skip these steps and train IAEA-Model directly with the .npy files we provide in the data/ folder.
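
To check that the two exported files are readable before starting IAEA-Model training, they can be loaded with NumPy. This is a small sketch under stated assumptions: the helper name `load_hid_params` is hypothetical, and the array shapes are not documented here, so the code only reports them.

```python
import os

import numpy as np


def load_hid_params(data_dir="data"):
    """Load the exported HID weight and bias arrays from data_dir."""
    weight = np.load(os.path.join(data_dir, "inconsistent_weight.npy"))
    bias = np.load(os.path.join(data_dir, "inconsistent_bias.npy"))
    return weight, bias


if __name__ == "__main__" and os.path.isdir("data"):
    try:
        w, b = load_hid_params()
        print("weight shape:", w.shape, "bias shape:", b.shape)
    except FileNotFoundError:
        print("HID parameter files not found in data/")
```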

IAEA_Model

Requirements

Hardware: a machine with two Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz, 256 GB main memory and a GeForce RTX 2080 Ti graphics card
OS: Ubuntu 18.04
Packages:
- python 3.5
- tensorflow 1.2.1-gpu
- py-readability-metrics

Note: you can start a TensorFlow 1.2.1 GPU Docker container with the following commands:

docker run -itd --gpus all --name tf1.2 -v /:/workspace tensorflow/tensorflow:1.2.1-gpu-py3

docker exec -it tf1.2 /bin/bash

Data preprocess

python data/IAEA/make_datafiles_testdata.py data/IAEA/twitter_final/train

This command builds the training dataset; run it the same way for the validation and test datasets.

Note: you can skip this step and use the processed dataset in the data/IAEA/finished_files_twitter folder.
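
The preprocessing command can be repeated for each split with a short loop. This is a sketch only: the split directory names `valid` and `test` are assumptions (only `train` appears in the command above), and the helper names are hypothetical.

```python
import subprocess
import sys

SCRIPT = "data/IAEA/make_datafiles_testdata.py"
RAW_ROOT = "data/IAEA/twitter_final"


def preprocess_command(split):
    """Build the preprocessing command for one split directory."""
    return [sys.executable, SCRIPT, "{}/{}".format(RAW_ROOT, split)]


def preprocess_all(splits=("train", "valid", "test")):
    """Run preprocessing for each split, stopping at the first failure.

    The names "valid" and "test" are assumed; adjust them to the actual
    directory names under data/IAEA/twitter_final.
    """
    for split in splits:
        subprocess.run(preprocess_command(split), check=True)
```

Run `preprocess_all()` from the repository root so that the relative paths resolve.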

Extractor

Train

python main.py --model=selector --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_gpu_selector_lr001 --exp_name=exp_sample --max_art_len=110 --max_sent_len=50 --max_train_iter=1500 --batch_size=5 --save_model_every=500 --lr=0.01 --model_max_to_keep=25

Abstractor

Train

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_rewriter --exp_name=exp_sample --max_enc_steps=400 --max_dec_steps=100 --batch_size=5 --max_train_iter=5000 --save_model_every=1000 --model_max_to_keep=10 --use_temporal_attention=True --intradecoder=True --rl_training=False

Add reinforcement learning

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_rewriter --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --convert_to_reinforce_model=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_rewriter --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10

Add coverage mechanism

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_rewriter --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10 --coverage=True --convert_to_coverage_model=True

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_rewriter --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10 --coverage=True
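
The five Abstractor commands above share most of their flags; in each pair, the first command carries a convert flag and the second repeats it without that flag. The sketch below rebuilds the same argument lists from one shared set, which makes the differences between stages easier to see. The stage names and the helper `abstractor_commands` are hypothetical; every flag value is copied from the commands above.

```python
import sys

DATA = "data/IAEA/finished_files_twitter"

# Flags common to all five Abstractor commands.
COMMON = [
    "--model=rewriter",
    "--mode=train",
    "--data_path={}/chunked/train_*".format(DATA),
    "--vocab_path={}/vocab".format(DATA),
    "--log_root=log_rewriter",
    "--exp_name=exp_sample",
    "--batch_size=5",
    "--max_enc_steps=400",
    "--max_dec_steps=100",
    "--model_max_to_keep=10",
    "--use_temporal_attention=True",
    "--intradecoder=True",
]

# Flags shared by the reinforcement learning and coverage stages.
RL = ["--max_train_iter=1000", "--save_model_every=100",
      "--eta=2.5E-05", "--rl_training=True"]

# (stage name, extra flags), in the order the commands are listed above.
# The stage names are illustrative labels, not identifiers from the code base.
STAGES = [
    ("pretrain", ["--max_train_iter=5000", "--save_model_every=1000",
                  "--rl_training=False"]),
    ("rl_convert", RL + ["--convert_to_reinforce_model=True"]),
    ("rl_train", RL),
    ("coverage_convert", RL + ["--coverage=True",
                               "--convert_to_coverage_model=True"]),
    ("coverage_train", RL + ["--coverage=True"]),
]


def abstractor_commands(script="main.py"):
    """Return (stage, argv) pairs for the five Abstractor runs."""
    return [(name, [sys.executable, script] + COMMON + extra)
            for name, extra in STAGES]
```

Each `argv` list can be passed to `subprocess.run(argv, check=True)` in order.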

End2End

Train

python main.py --model=end2end --mode=train --data_path=data/IAEA/finished_files_twitter/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_gpu_endlr0001 --exp_name=exp_sample --max_enc_steps=800 --max_dec_steps=120 --max_train_iter=10000 --batch_size=5 --use_temporal_attention=True --intradecoder=True --eta=2.5E-05 --max_art_len=110 --max_sent_len=50 --selector_loss_wt=5.0 --inconsistent_loss=True --inconsistent_topk=3 --save_model_every=1000 --model_max_to_keep=20 --rl_training=True --coverage=True --pretrained_selector_path=log_gpu_selector_lr001/selector/exp_sample/train/model.ckpt-500 --pretrained_rewriter_path=log_rewriter/rewriter/exp_sample/train/model.ckpt_cov-7000 --lr=0.001
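
The pretrained checkpoint paths in the command above follow a `log_root/model/exp_name/train/` layout. The helper below composes such paths; it is a sketch inferred only from the example paths in this README, the function name `checkpoint_path` is hypothetical, and reading the `_cov` suffix as marking coverage checkpoints is an assumption based on the file names.

```python
def checkpoint_path(log_root, model, exp_name, step, coverage=False):
    """Compose a checkpoint path in the layout used by the example commands.

    The "model.ckpt_cov" prefix for coverage checkpoints is inferred from the
    paths shown in this README, not from the training code.
    """
    prefix = "model.ckpt_cov" if coverage else "model.ckpt"
    return "{}/{}/{}/train/{}-{}".format(log_root, model, exp_name, prefix, step)


selector_ckpt = checkpoint_path("log_gpu_selector_lr001", "selector", "exp_sample", 500)
rewriter_ckpt = checkpoint_path("log_rewriter", "rewriter", "exp_sample", 7000, coverage=True)
```

Check that the chosen step matches a checkpoint that was actually saved, given `--save_model_every` and `--model_max_to_keep`.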

Decode (output final summary)

python main.py --model=end2end --mode=evalall --data_path=data/IAEA/finished_files_twitter/chunked/test_* --vocab_path=data/IAEA/finished_files_twitter/vocab --log_root=log_gpu_endlr0001 --exp_name=exp_sample --max_enc_steps=800 --max_dec_steps=120 --use_temporal_attention=True --intradecoder=True --eta=2.5E-05 --max_art_len=110 --max_sent_len=50 --decode_method=beam --coverage=True --single_pass=1 --save_pkl=True --save_vis=False --inconsistent_loss=True --inconsistent_topk=3 --eval_method=loss --load_best_eval_model=False --coverage=True --rl_training=True --eval_ckpt_path=log_gpu_endlr0001/end2end/exp_sample/train/model.ckpt_cov-10000

IAEA_H (uses the IAEA Abstractor)

Extractor

python main_withoutHID.py --model=selector --mode=train --data_path=data/IAEA/finished_files_twitter_withoutHID/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_withoutHID/vocab --log_root=log_gpu_selector_lr001_withoutHID --exp_name=exp_sample --max_art_len=110 --max_sent_len=50 --max_train_iter=1500 --batch_size=5 --save_model_every=500 --lr=0.01 --model_max_to_keep=25

End2End

Train

python main_withoutHID.py --model=end2end --mode=train --data_path=data/IAEA/finished_files_twitter_withoutHID/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_withoutHID/vocab --log_root=log_gpu_endlr0001_withoutHID --exp_name=exp_sample --max_enc_steps=800 --max_dec_steps=120 --max_train_iter=10000 --batch_size=5 --use_temporal_attention=True --intradecoder=True --eta=2.5E-05 --max_art_len=110 --max_sent_len=50 --selector_loss_wt=5.0 --inconsistent_loss=True --inconsistent_topk=3 --save_model_every=1000 --model_max_to_keep=20 --rl_training=True --coverage=True --pretrained_selector_path=log_gpu_selector_lr001_withoutHID/selector/exp_sample/train/model.ckpt-500 --pretrained_rewriter_path=log_rewriter/rewriter/exp_sample/train/model.ckpt_cov-7000 --lr=0.001

Decode (output final summary)

python main_withoutHID.py --model=end2end --mode=evalall --data_path=data/IAEA/finished_files_twitter_withoutHID/chunked/test_* --vocab_path=data/IAEA/finished_files_twitter_withoutHID/vocab --log_root=log_gpu_endlr0001_withoutHID --exp_name=exp_sample --max_enc_steps=800 --max_dec_steps=120 --use_temporal_attention=True --intradecoder=True --eta=2.5E-05 --max_art_len=110 --max_sent_len=50 --decode_method=beam --coverage=True --single_pass=1 --save_pkl=True --save_vis=False --inconsistent_loss=True --inconsistent_topk=3 --eval_method=loss --load_best_eval_model=False --coverage=True --rl_training=True --eval_ckpt_path=log_gpu_endlr0001_withoutHID/end2end/exp_sample/train/model.ckpt_cov-10000

IAEA_R (uses the IAEA Extractor)

Abstractor

Train

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter_random/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_rewriter_random --exp_name=exp_sample --max_enc_steps=400 --max_dec_steps=100 --batch_size=5 --max_train_iter=5000 --save_model_every=1000 --model_max_to_keep=10 --use_temporal_attention=True --intradecoder=True --rl_training=False

Add reinforcement learning

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter_random/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_rewriter_random --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --convert_to_reinforce_model=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter_random/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_rewriter_random --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10

Add coverage mechanism

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter_random/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_rewriter_random --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10 --coverage=True --convert_to_coverage_model=True

python main.py --model=rewriter --mode=train --data_path=data/IAEA/finished_files_twitter_random/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_rewriter_random --exp_name=exp_sample --batch_size=5 --max_train_iter=1000 --intradecoder=True --use_temporal_attention=True --eta=2.5E-05 --rl_training=True --max_enc_steps=400 --max_dec_steps=100 --save_model_every=100 --model_max_to_keep=10 --coverage=True

End2End

Train

python main_gpu_withoutHID.py --model=end2end --mode=train --data_path=data/IAEA/finished_files_twitter_random/chunked/train_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_gpu_endlr0001_random --exp_name=exp_sample --max_enc_steps=800 --max_dec_steps=120 --max_train_iter=10000 --batch_size=5 --use_temporal_attention=True --intradecoder=True --eta=2.5E-05 --max_art_len=110 --max_sent_len=50 --selector_loss_wt=5.0 --inconsistent_loss=True --inconsistent_topk=3 --save_model_every=1000 --model_max_to_keep=20 --rl_training=True --coverage=True --pretrained_selector_path=log_gpu_selector_lr001/selector/exp_sample/train/model.ckpt-500 --pretrained_rewriter_path=log_rewriter_random/rewriter/exp_sample/train/model.ckpt_cov-7000 --lr=0.001

Decode (output final summary)

python main.py --model=end2end --mode=evalall --data_path=data/IAEA/finished_files_twitter_random/chunked/test_* --vocab_path=data/IAEA/finished_files_twitter_random/vocab --log_root=log_gpu_endlr0001_random --exp_name=exp_sample --max_enc_steps=800 --max_dec_steps=120 --use_temporal_attention=True --intradecoder=True --eta=2.5E-05 --max_art_len=110 --max_sent_len=50 --decode_method=beam --coverage=True --single_pass=1 --save_pkl=True --save_vis=False --inconsistent_loss=True --inconsistent_topk=3 --eval_method=loss --load_best_eval_model=False --coverage=True --rl_training=True --eval_ckpt_path=log_gpu_endlr0001_random/end2end/exp_sample/train/model.ckpt_cov-10000

Acknowledgement

The IAEA-Model code is adapted from unified-summarization and RLSeq2Seq.
