| pair_ID | sentence_A | sentence_B | entailment_label | relatedness_score | entailment_AB | entailment_BA | sentence_A_original | sentence_B_original | sentence_A_dataset | sentence_B_dataset | SemEval_set |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A group of kids is playing in a yard and an old man is standing in the background | A group of boys in a yard is playing and a man is standing in the background | NEUTRAL | 4.5 | A_neutral_B | B_neutral_A | A group of children playing in a yard, a man in the background. | A group of children playing in a yard, a man in the background. | FLICKR | FLICKR | TRAIN |
| 2 | A group of children is playing in the house and there is no man standing in the background | A group of kids is playing in a yard and an old man is standing in the background | NEUTRAL | 3.2 | A_contradicts_B | B_neutral_A | A group of children playing in a yard, a man in the background. | A group of children playing in a yard, a man in the background. | FLICKR | FLICKR | TRAIN |
Description:
The dataset contains:
- two sentences, SentenceA and SentenceB
- their relation to each other (entailment, contradiction, neutral)
- their relatedness score (for example, sentA and sentB may have a relatedness score of 3.4)
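As a rough illustration of the three pieces described above, the sketch below pulls the sentence pair, the entailment label, and the relatedness score out of one tab-separated row. It uses the header and the first example row from this issue; the helper name `read_pairs` is hypothetical and not part of any existing reader.

```python
import csv
import io

# Column names and first example row as listed for SICK.txt (tab-separated).
HEADER = [
    "pair_ID", "sentence_A", "sentence_B", "entailment_label",
    "relatedness_score", "entailment_AB", "entailment_BA",
    "sentence_A_original", "sentence_B_original",
    "sentence_A_dataset", "sentence_B_dataset", "SemEval_set",
]
ROW = [
    "1",
    "A group of kids is playing in a yard and an old man is standing in the background",
    "A group of boys in a yard is playing and a man is standing in the background",
    "NEUTRAL", "4.5", "A_neutral_B", "B_neutral_A",
    "A group of children playing in a yard, a man in the background.",
    "A group of children playing in a yard, a man in the background.",
    "FLICKR", "FLICKR", "TRAIN",
]
sample = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"


def read_pairs(handle):
    """Yield (sentence_A, sentence_B, entailment_label, relatedness_score)."""
    # QUOTE_NONE: treat quote characters in sentences as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        yield (
            record["sentence_A"],
            record["sentence_B"],
            record["entailment_label"],
            float(record["relatedness_score"]),
        )


pairs = list(read_pairs(io.StringIO(sample)))
print(pairs[0][2], pairs[0][3])  # NEUTRAL 4.5
```

Reading columns by name rather than by position is a design choice of this sketch; it keeps the code readable at the cost of depending on the exact header spelling.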
Dataset Reader:
```python
import os
import re


class SickReader:
    """Reader object to provide training data from the SICK dataset.

    More details can be found here:
    http://clic.cimec.unitn.it/composes/sick.html

    The dataset contains:
        - two sentences, SentenceA and SentenceB
        - their relation to each other (entailment, contradiction, neutral)
        - their relatedness score (for example, sentA and sentB may have a
          relatedness score of 3.4)

    Example row in the dataset:
    pair_ID sentence_A sentence_B entailment_label relatedness_score entailment_AB entailment_BA sentence_A_original sentence_B_original sentence_A_dataset sentence_B_dataset SemEval_set
    1 A group of kids is playing in a yard and an old man is standing in the background A group of boys in a yard is playing and a man is standing in the background NEUTRAL 4.5 A_neutral_B B_neutral_A A group of children playing in a yard, a man in the background. A group of children playing in a yard, a man in the background. FLICKR FLICKR TRAIN

    You can get the entailment data with labels with `get_entailment_data`.
    You can get the relatedness data with labels with `get_relatedness_data`.

    Parameters
    ----------
    filepath : str
        Path to the folder containing SICK.txt.
    preprocess_fn : function
        Function to preprocess sentences. If None, `self._preprocess_fn` is used.
    """

    def __init__(self, filepath, preprocess_fn=None):
        if preprocess_fn is not None:
            self.preprocess_fn = preprocess_fn
        else:
            self.preprocess_fn = self._preprocess_fn

        # Column positions in the tab-separated SICK.txt file.
        SENTA_INDEX, SENTB_INDEX, ENTAILMENT_INDEX, RELATEDNESS_INDEX = 1, 2, 3, 4
        SPLIT_INDEX = 11

        self.sentenceA, self.sentenceB, self.entailment_label, self.relatedness_score = {}, {}, {}, {}
        for s in [self.sentenceA, self.sentenceB, self.entailment_label, self.relatedness_score]:
            s['TRAIN'], s['TEST'], s['TRIAL'] = [], [], []
        self.entailment_label2index = {'CONTRADICTION': 0, 'ENTAILMENT': 1, 'NEUTRAL': 2}

        with open(os.path.join(filepath, 'SICK.txt'), 'r') as f:
            for i, line in enumerate(f):
                if i == 0:
                    continue  # skip the header
                # Strip only the newline so a final line without one is not truncated.
                split_line = line.rstrip('\n').split('\t')
                split = split_line[SPLIT_INDEX]
                self.sentenceA[split].append(self.preprocess_fn(split_line[SENTA_INDEX]))
                self.sentenceB[split].append(self.preprocess_fn(split_line[SENTB_INDEX]))
                self.entailment_label[split].append(self.entailment_label2index[split_line[ENTAILMENT_INDEX]])
                self.relatedness_score[split].append(float(split_line[RELATEDNESS_INDEX]))

    def _preprocess_fn(self, sent):
        """Utility function to lower, strip and tokenize each sentence (on spaces).

        Replace this function if you want to handle preprocessing differently.

        Parameters
        ----------
        sent : str
            The string sentence.
        """
        return re.sub(r"[^a-zA-Z0-9]", " ", sent.strip().lower()).split()

    def get_entailment_data(self):
        """Returns data in the format: SentA, SentB, Entailment Label,
        where each of them is a dict which contains the train, test and trial
        data in the 'TRAIN', 'TEST' and 'TRIAL' keys respectively."""
        return self.sentenceA, self.sentenceB, self.entailment_label

    def get_relatedness_data(self):
        """Returns data in the format: SentA, SentB, RelatednessScore,
        where each of them is a dict which contains the train, test and trial
        data in the 'TRAIN', 'TEST' and 'TRIAL' keys respectively."""
        return self.sentenceA, self.sentenceB, self.relatedness_score

    def get_entailment_label_dict(self):
        """Returns the mapping from label (str) to int:
        {'CONTRADICTION': 0, 'ENTAILMENT': 1, 'NEUTRAL': 2}"""
        return self.entailment_label2index
```
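To make the default preprocessing concrete, here is a small standalone sketch of what the reader's regex does to one of the example sentences, next to a hypothetical alternative that keeps punctuation. The function names `default_preprocess` and `keep_punctuation_preprocess` are illustrative only; the first mirrors the body of `_preprocess_fn`, the second is an assumption about what a custom `preprocess_fn` might look like.

```python
import re


def default_preprocess(sent):
    """Lowercase, replace non-alphanumeric characters with spaces, then split."""
    return re.sub(r"[^a-zA-Z0-9]", " ", sent.strip().lower()).split()


def keep_punctuation_preprocess(sent):
    """Hypothetical alternative: lowercase and split on whitespace only."""
    return sent.strip().lower().split()


sentence = "A group of children playing in a yard, a man in the background."

default_tokens = default_preprocess(sentence)
custom_tokens = keep_punctuation_preprocess(sentence)

print(default_tokens)  # punctuation removed: '... yard', 'a', ... 'background'
print(custom_tokens)   # punctuation kept on tokens: 'yard,' and 'background.'
```

A function with the same one-argument shape as `keep_punctuation_preprocess` could presumably be passed as `preprocess_fn` when constructing `SickReader`, since the reader calls it once per sentence.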
SICK (Sentences Involving Compositional Knowledge)
Dataset website: http://clic.cimec.unitn.it/composes/sick.html
Dataset download link: http://clic.cimec.unitn.it/composes/materials/SICK.zip
License:
Example: