https://nlp.stanford.edu/projects/tacred/#access
To convert TACRED into a list of instances per relation type, run the following script on each of the train/dev/test files:
python convert_dataset_to_list_by_relation.py --dataset TACRED_raw_data/train.json --output_file TACRED_raw_data/instances_per_relation/TACRED_train.json
python convert_dataset_to_list_by_relation.py --dataset TACRED_raw_data/dev.json --output_file TACRED_raw_data/instances_per_relation/TACRED_dev.json
python convert_dataset_to_list_by_relation.py --dataset TACRED_raw_data/test.json --output_file TACRED_raw_data/instances_per_relation/TACRED_test.json
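The conversion script itself is not reproduced here, but the grouping step can be sketched as follows. This is a minimal illustration, assuming the raw TACRED JSON is a list of examples that each carry a "relation" field, and that the output maps each relation type to its list of instances; the helper name group_by_relation is hypothetical.

```python
import json
from collections import defaultdict

def group_by_relation(dataset_path, output_path):
    """Group a TACRED-style JSON list of examples by their relation type."""
    with open(dataset_path) as f:
        examples = json.load(f)  # assumed: a JSON list of example dicts
    by_relation = defaultdict(list)
    for example in examples:
        by_relation[example["relation"]].append(example)
    with open(output_path, "w") as f:
        json.dump(by_relation, f)
    return by_relation
```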
Next, convert these data partitions into a few-shot dataset in which the relation classes across partitions are disjoint. This command applies our method for transforming a supervised dataset into a few-shot learning dataset to TACRED:
python data_transformation.py --train_data TACRED_raw_data/instances_per_relation/TACRED_train.json --dev_data TACRED_raw_data/instances_per_relation/TACRED_dev.json --test_data TACRED_raw_data/instances_per_relation/TACRED_test.json --fixed_categories_split categories_split.json --test_size 10 --output_dir ./data_few_shot
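The actual class assignment is fixed by categories_split.json; as a rough sketch of the transformation idea only, the step below pools instances by relation and reassigns each relation to the single partition its class belongs to, which is what makes the classes disjoint across train/dev/test. The function name and the relation-to-partition mapping format here are assumptions, not the script's real interface.

```python
from collections import defaultdict

def build_few_shot_splits(partitions, relation_to_split):
    """Pool per-relation instances from all input partitions, then assign
    each relation's instances to exactly one output partition, so no
    relation class appears in more than one of train/dev/test."""
    pooled = defaultdict(list)
    for per_relation in partitions:  # each: {relation: [instances]}
        for relation, instances in per_relation.items():
            pooled[relation].extend(instances)
    splits = {"train": {}, "dev": {}, "test": {}}
    for relation, instances in pooled.items():
        split = relation_to_split.get(relation)
        if split is not None:
            splits[split][relation] = instances
    return splits
```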
Voila: the new Few-Shot TACRED dataset, split into train, dev, and test partitions.
To sample N-way K-shot episodes from one of the partitions, run:
python episodes_sampling_for_fs_TACRED.py --file_name [train/dev/test] --episodes_size [episodes_size] --N [N_way] --K [K_shot] --number_of_queries [number_of_test_instances] --seed [123] --output_file_name [output_file_name]
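For intuition, a single N-way K-shot episode can be sampled roughly as below: pick N relation classes, then K support instances plus a set of query instances per class, with support and queries kept disjoint. This is a simplified sketch under those assumptions, not the sampling script's actual logic; sample_episode is a hypothetical helper.

```python
import random

def sample_episode(instances_per_relation, n_way, k_shot, n_queries, rng):
    """Sample one N-way K-shot episode: n_way relation classes, each with
    k_shot support instances and n_queries query instances, no overlap."""
    relations = rng.sample(sorted(instances_per_relation), n_way)
    support, queries = {}, {}
    for relation in relations:
        picked = rng.sample(instances_per_relation[relation], k_shot + n_queries)
        support[relation] = picked[:k_shot]
        queries[relation] = picked[k_shot:]
    return support, queries
```

Seeding the generator (e.g. random.Random(123)) makes the sampled episodes reproducible, which is what the --seed flag is for.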
To create the test episodes benchmark, use this shell script, which creates five episode files with seeds ranging from 160290 to 160294:
./create_test_episodes.sh
For each test episode file we generated an ID file composed of the episode IDs. You can use these files to verify that your generated test episodes are identical to our test episodes benchmark. The ID files are stored under the ids_of_episodes directory.
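The verification check amounts to an order-sensitive comparison of your episode IDs against ours. A minimal sketch, assuming the ID files are JSON lists (the actual file format may differ); episodes_match_reference is a hypothetical helper:

```python
import json

def episodes_match_reference(generated_ids, ids_file_path):
    """Return True iff the generated episode ids equal the reference ids.
    Order matters: episode i must match reference episode i."""
    with open(ids_file_path) as f:
        reference_ids = json.load(f)
    return generated_ids == reference_ids
```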
If you choose to downsample the training data, apply the downsampling before generating episodes. The following command generates the same downsampled training dataset we used:
python downsample_train_data.py --dataset data_few_shot/_train_data.json --output_file data_few_shot/new_downsampled_train_data.json
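One common way to downsample a per-relation training pool is to cap each relation at a fixed number of randomly chosen instances, keeping smaller relations whole. The sketch below illustrates that idea under those assumptions; the cap value, seed handling, and function name are illustrative, not the script's actual parameters.

```python
import random

def downsample_per_relation(instances_per_relation, cap, seed=123):
    """Randomly keep at most `cap` instances per relation; relations with
    `cap` or fewer instances are kept unchanged."""
    rng = random.Random(seed)
    out = {}
    for relation, instances in instances_per_relation.items():
        if len(instances) <= cap:
            out[relation] = list(instances)
        else:
            out[relation] = rng.sample(instances, cap)
    return out
```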