NLAProject

Files Attached

- Task 1A: NLA_Project_Task1A.ipynb
- Task 1B: FinalTask1B.ipynb
- Report: Team_22.pdf

Problem Statement

Task 1A

For each citance (i.e., a citation sentence that references the reference paper, RP), identify the spans of text (cited text spans) in the RP that most accurately reflect the citance.

Task 1B

For each cited text span, identify which facet of the paper it belongs to, from a predefined set of facets. The main facets are:

- Method
- Aim
- Implication
- Results
- Hypothesis

Task 1A

Models

Supervised Model: For the two inputs (the citing text and the cited text), we built a Siamese network: two identical sub-networks that share the same weights, one per input. The Manhattan distance between their outputs indicates how similar the two inputs are.
Network Architecture:
- Input Layer
- Embedding Layer
- LSTM Layer
- Fully Connected Layer
- Manhattan Computation
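The architecture above can be sketched as an untrained forward pass in NumPy. This is a minimal sketch, not the notebook's code: the vocabulary and layer sizes, the random initialisation, and squashing the Manhattan distance with `exp(-distance)` (as in the MaLSTM formulation) are all assumptions. What it does show is that both inputs go through the *same* embedding, LSTM and fully connected weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMB, HID, OUT = 50, 8, 16, 4

# One set of parameters, shared by both branches of the Siamese network.
params = {
    "E": rng.normal(0, 0.1, (VOCAB, EMB)),          # embedding table
    "W": rng.normal(0, 0.1, (4 * HID, EMB + HID)),  # LSTM gate weights
    "b": np.zeros(4 * HID),                         # LSTM gate biases
    "D": rng.normal(0, 0.1, (OUT, HID)),            # fully connected layer
}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(token_ids, p=params):
    """Embedding -> LSTM (last hidden state) -> fully connected layer."""
    h = np.zeros(HID)
    c = np.zeros(HID)
    for x in p["E"][token_ids]:                  # one embedded token per step
        z = p["W"] @ np.concatenate([x, h]) + p["b"]
        i, f, o, g = np.split(z, 4)              # input, forget, output gates, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return p["D"] @ h

def manhattan_similarity(a_ids, b_ids):
    """exp(-L1 distance) between the two encodings; lies in (0, 1]."""
    d = np.abs(encode(a_ids) - encode(b_ids)).sum()
    return float(np.exp(-d))

citing = [3, 17, 42, 8]   # hypothetical token ids of the citing text
cited = [3, 17, 9]        # hypothetical token ids of a cited sentence
print(manhattan_similarity(citing, cited))
```

Because the weights are shared, the score is symmetric in its two arguments, and identical inputs always score exactly 1.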
Unsupervised Model: We performed semantic search. We used BERT to compute sentence embeddings and then computed the cosine similarity between the citing sentence and every sentence of the reference paper. The 3 sentences with the highest cosine similarity form the text span most relevant to the citing sentence.
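The retrieval step can be sketched as follows, assuming the sentence embeddings have already been computed (for example with a BERT sentence encoder) and stored as NumPy arrays. The function name `top_k_span`, re-ordering the selected sentences by their position in the paper, and the toy 2-d vectors are illustrative assumptions.

```python
import numpy as np

def cosine_similarities(query, matrix):
    """Cosine similarity between one vector and every row of a matrix."""
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ query

def top_k_span(citing_emb, rp_embs, rp_sentences, k=3):
    """Return the indices of the k most similar RP sentences and their joined text."""
    sims = cosine_similarities(citing_emb, rp_embs)
    best = np.argsort(-sims)[:k]          # indices of the k highest scores
    chosen = sorted(best.tolist())         # assumption: keep paper order in the span
    return chosen, " ".join(rp_sentences[i] for i in chosen)

# Toy data: 2-d vectors standing in for real sentence embeddings.
rp_sentences = ["We propose X.", "Data is from Y.", "X beats Z.", "Future work."]
rp_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
citing_emb = np.array([1.0, 0.05])

idx, span = top_k_span(citing_emb, rp_embs, rp_sentences)
print(idx, span)
```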

Sentence Embeddings

We experimented with different sentence embeddings:

GloVe Embeddings: We used pre-trained GloVe word embeddings to compute the sentence embeddings. For each sentence, we averaged the embeddings of all its words to form a 300-dimensional sentence embedding.
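Averaging word vectors into a sentence vector can be sketched as below. The toy 3-d vectors stand in for 300-d GloVe vectors; the whitespace tokenizer, skipping out-of-vocabulary words, and returning a zero vector when no word is known are assumptions, not necessarily what the notebook does. GloVe files are plain text with one word followed by its vector components per line, which is what `load_glove` parses.

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe text format: a word followed by its space-separated vector."""
    vectors = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

def sentence_embedding(sentence, vectors, dim):
    """Mean of the word vectors of the in-vocabulary tokens in the sentence."""
    tokens = sentence.lower().split()            # assumption: simple whitespace tokenizer
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)   # assumption: zero-vector fallback
    return np.mean(known, axis=0)

# Toy 3-d lines standing in for the lines of a 300-d GloVe file.
toy = ["citation 1.0 0.0 2.0", "analysis 3.0 2.0 0.0", "the 0.5 0.5 0.5"]
vectors = load_glove(toy)
print(sentence_embedding("Citation analysis", vectors, dim=3))
```

With real GloVe files the same code applies with `dim=300`, passing an open file handle as `lines`.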
BERT Embeddings: BERT produces word embeddings that are computed dynamically from the context of the sentence. We used the following models:
- bert-base-nli-mean-tokens
- SciBERT

Task 1B

Pipeline

Since the data is skewed towards the Method facet, we use a two-stage classification pipeline. First, a binary classifier decides whether the given instance belongs to the Method facet. If it is classified as not-Method, a multi-class classifier then assigns it one of the other facets.
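The two-stage pipeline can be sketched independently of the models used. Below, the two classifiers are plain Python callables, and the keyword rules in the usage example are hypothetical stand-ins for the trained models, included only to make the control flow visible.

```python
from typing import Callable, List

def two_stage_predict(
    instances: List[dict],
    is_method: Callable[[dict], bool],
    other_facet: Callable[[dict], str],
) -> List[str]:
    """Stage 1: Method vs. not-Method. Stage 2 runs only on not-Method instances."""
    predictions = []
    for x in instances:
        if is_method(x):
            predictions.append("Method")
        else:
            predictions.append(other_facet(x))
    return predictions

# Hypothetical stand-in classifiers, for illustration only.
def toy_is_method(x):
    return "we use" in x["text"]

def toy_other_facet(x):
    return "Results" if "outperforms" in x["text"] else "Aim"

spans = [
    {"text": "we use an LSTM encoder"},
    {"text": "our system outperforms the baseline"},
    {"text": "the goal is to summarise papers"},
]
print(two_stage_predict(spans, toy_is_method, toy_other_facet))
```

A consequence of this design is that the second-stage classifier only needs to be trained on the not-Method examples, so the dominant class no longer swamps the rarer facets.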

Features

Positional features: The position of a sentence in a paper can indicate which facet it belongs to. We use two features based on the location of the sentence in the reference document:
- Sentence position: the position of the sentence in the reference paper.
- Section sentence position: the position of the sentence within its section.
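Both positional features can be computed from a paper represented as a list of sections, each a list of sentences. This representation, the 0-based indexing, and the optional normalisation to relative positions are assumptions; the description above does not say whether raw or relative positions were used.

```python
from typing import List, Tuple

def positional_features(
    sections: List[List[str]], normalise: bool = False
) -> List[Tuple[float, float]]:
    """(position in paper, position in section) for every sentence, in order."""
    total = sum(len(s) for s in sections)
    features = []
    paper_pos = 0
    for section in sections:
        for sec_pos, _sentence in enumerate(section):
            if normalise:  # assumption: relative positions in [0, 1)
                features.append((paper_pos / total, sec_pos / len(section)))
            else:
                features.append((float(paper_pos), float(sec_pos)))
            paper_pos += 1
    return features

paper = [
    ["Intro sentence 1.", "Intro sentence 2."],
    ["Method sentence 1.", "Method sentence 2.", "Method sentence 3."],
]
print(positional_features(paper))
```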
BERT-based sentence embeddings: We use BERT to extract features, namely sentence embedding vectors, from the reference sentences. BERT produces word representations that are dynamically informed by the words around them. We encode each reference sentence (the identified cited text of the RP) into a 768-dimensional vector.
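Assuming the 768-dimensional BERT embeddings are computed separately, one way to build the classifier input is to concatenate them with the two positional features into a single row per cited text span. Whether the notebook combines the features exactly this way is an assumption, and `build_feature_matrix` is a hypothetical helper; a random matrix stands in for the real BERT output.

```python
import numpy as np

BERT_DIM = 768  # size of each sentence embedding, as stated above

def build_feature_matrix(positions, embeddings):
    """Stack rows of [sentence position, section sentence position, 768-d embedding]."""
    positions = np.asarray(positions, dtype=np.float32)    # shape (n, 2)
    embeddings = np.asarray(embeddings, dtype=np.float32)  # shape (n, 768)
    if positions.shape[1] != 2 or embeddings.shape[1] != BERT_DIM:
        raise ValueError("expected (n, 2) positions and (n, 768) embeddings")
    return np.hstack([positions, embeddings])              # shape (n, 770)

rng = np.random.default_rng(0)
positions = [(0, 0), (5, 2), (12, 0)]
fake_embeddings = rng.normal(size=(3, BERT_DIM))  # placeholder for real BERT vectors
X = build_feature_matrix(positions, fake_embeddings)
print(X.shape)  # (3, 770)
```

Raw positions and embedding components live on very different scales, so standardising the columns before training may help scale-sensitive classifiers.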

Resampling and Ensemble Techniques

The models we trained on the raw, skewed data were biased towards the Method facet, so we tried several resampling and ensemble techniques, including:

Oversampling techniques: We used the Synthetic Minority Oversampling Technique (SMOTE) to synthesize new minority-class examples from the existing ones.
Undersampling techniques: We tried Condensed Nearest Neighbour (CNN) and One-Sided Selection (OSS).
Ensemble Models: We tried boosting-based techniques, namely RUSBoostClassifier and EasyEnsembleClassifier.
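SMOTE creates each synthetic example by interpolating between a minority-class sample and one of its k nearest minority-class neighbours. The following is a simplified NumPy sketch of that idea (brute-force distances, uniformly chosen seed samples), not the implementation used in the notebook; the function name `smote` and its parameters are hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(X_min) - 1)                   # cannot have more neighbours than points
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(X_min))             # random minority sample
        nn = neighbours[i, rng.integers(k)]      # one of its k nearest neighbours
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + gap * (X_min[nn] - X_min[i])
    return synthetic

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote(X_minority, n_new=6, k=2)
print(new_points.shape)  # (6, 2)
```

In practice, the imbalanced-learn library provides `SMOTE` (used via `fit_resample(X, y)`) in `imblearn.over_sampling`, `CondensedNearestNeighbour` and `OneSidedSelection` in `imblearn.under_sampling`, and `RUSBoostClassifier` and `EasyEnsembleClassifier` in `imblearn.ensemble`.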

vanisancheti/NLAProject
