This repository contains the source code for a fact verification task based on LLM-generated synthetic evidence. Run `generated_sentence_creation.py` to generate the LLM-generated synthetic data, then convert its pickled output into `.csv` format with `generation_pickle_to_csv_convert.py`.
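The exact structure of the pickled output depends on `generated_sentence_creation.py`; the conversion follows the usual pattern of loading the pickle and writing a table. A minimal sketch, assuming the pickle holds a list of records with fields such as `claim_id` and the generated evidence, and with illustrative file names:

```python
import pickle

import pandas as pd

# Minimal sketch: load the pickled generation output and write it to .csv.
# Assumes the pickle holds a list of dicts with keys such as "claim_id" and
# the generated evidence text; adjust the keys and file names to match the
# actual output of generated_sentence_creation.py.
with open("generated_sentences.pkl", "rb") as f:
    records = pickle.load(f)

pd.DataFrame(records).to_csv("generated_sentences.csv", index=False)
```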
To build the dataset for the `BERT_FSD` model, first create `.csv` files containing the `claim_id`, `claim`, and `label` columns from the annotated FEVER `.jsonl` files.
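The FEVER `.jsonl` files store one JSON object per line. A minimal sketch of this conversion, assuming the standard FEVER fields `id`, `claim`, and `label` and illustrative file names (repeat for the train, test, and validation splits):

```python
import csv
import json

# Minimal sketch: extract claim_id, claim and label from an annotated FEVER
# .jsonl file. Assumes the standard FEVER fields "id", "claim" and "label";
# the input and output file names are illustrative.
with open("train.jsonl", "r", encoding="utf-8") as jsonl_file, \
        open("train.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["claim_id", "claim", "label"])
    writer.writeheader()
    for line in jsonl_file:
        record = json.loads(line)
        writer.writerow({
            "claim_id": record["id"],
            "claim": record["claim"],
            "label": record["label"],
        })
```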
Then run `generated_sentence_creation.py` to generate the LLM-generated synthetic data. Using these two sets of `.csv` files (the FEVER-derived files and the LLM-generated files, each with train, test, and validation splits), run `create_filtered_data.py` to build the dataset required by the `BERT_FSD` model.
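`create_filtered_data.py` defines how the two sets of files are actually combined and filtered; purely as an illustration of how they line up, the hypothetical sketch below joins a FEVER-derived split with the corresponding LLM-generated split on `claim_id` (column and file names are assumptions):

```python
import pandas as pd

# Hypothetical illustration only: join the FEVER-derived split (claim_id,
# claim, label) with the LLM-generated split on claim_id. The real filtering
# is done by create_filtered_data.py; column and file names are assumptions.
fever_split = pd.read_csv("train.csv")
generated_split = pd.read_csv("generated_sentences_train.csv")

merged = fever_split.merge(generated_split, on="claim_id", how="inner")
merged.to_csv("bert_fsd_train.csv", index=False)
```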
To build the dataset for the `BERT_ER` model, again create `.csv` files containing the `claim_id`, `claim`, and `label` columns from the annotated FEVER `.jsonl` files.
Then run `wiki_chunk_wise_data.py` to produce the dataset required by the `BERT_ER` model.
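`wiki_chunk_wise_data.py` implements the actual preprocessing; as a rough sketch of what chunk-wise splitting of Wikipedia text can look like (the chunk size and splitting strategy here are assumptions, not the script's exact behaviour), a page can be broken into fixed-length word chunks:

```python
def chunk_text(text: str, chunk_size: int = 128) -> list[str]:
    """Split Wikipedia page text into fixed-size word chunks.

    A rough sketch only: the chunk size and the exact splitting strategy
    used by wiki_chunk_wise_data.py are assumptions.
    """
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
```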
For training, run `training_with_LLM_generated_synthetic_data.py`.
After training, use `prediction.py` to evaluate the saved checkpoints on the dev set and select the best one according to the evaluation results.
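`prediction.py` performs the actual evaluation; the sketch below only illustrates the idea of picking the checkpoint with the best dev-set score. `evaluate_on_dev`, the `.pt` extension, and the checkpoint directory layout are placeholders, not the repository's API:

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical illustration of checkpoint selection: evaluate every saved
# checkpoint on the dev split and keep the one with the highest score.
# evaluate_on_dev and the checkpoint file layout are placeholders.
def select_best_checkpoint(checkpoint_dir: str,
                           evaluate_on_dev: Callable[[Path], float]) -> Optional[Path]:
    best_path, best_score = None, float("-inf")
    for ckpt in sorted(Path(checkpoint_dir).glob("*.pt")):
        score = evaluate_on_dev(ckpt)  # e.g. label accuracy on the dev set
        if score > best_score:
            best_path, best_score = ckpt, score
    return best_path
```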