ML4HC_Project3

Project 3 - Natural language processing assignment

Preprocessing: The preprocessing file loads the data from a local path, which has to be set before running. I am not sure how best to integrate this with GitHub.
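One possible way to avoid a hard-coded local path is to read the data location from an environment variable and fall back to a folder inside the repository. This is only a sketch: the variable name `ML4HC_DATA_DIR`, the default folder `data`, and the helper `resolve_data_dir` are assumptions for illustration and are not part of the project.

```python
import os
from pathlib import Path


def resolve_data_dir(default="data"):
    """Return the data directory as an absolute path.

    ML4HC_DATA_DIR is a hypothetical environment variable; if it is not set,
    the (also hypothetical) folder `data` in the current directory is used.
    """
    return Path(os.environ.get("ML4HC_DATA_DIR", default)).expanduser().resolve()
```

With this approach each contributor sets the variable once on their machine, and no personal path needs to be committed.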

The preprocessing function processes the data in the following consecutive steps:

Selects the relevant sentences in the text.
Splits the label from the sentence data.
Cleans the data with a series of successive tasks.
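The three steps above could be sketched roughly as follows. The input format is an assumption, since it is not described here: each relevant line is taken to be `LABEL<TAB>sentence`, and blank lines or lines starting with `###` are skipped. The names `preprocess` and `clean_sentence` are hypothetical, and the cleaning shown is only a placeholder.

```python
def clean_sentence(sentence):
    """Placeholder for the cleaning step: lowercase and collapse whitespace."""
    return " ".join(sentence.lower().split())


def preprocess(lines):
    """Return (labels, sentences) extracted from raw text lines.

    Assumed format (not confirmed by the project): "LABEL<TAB>sentence".
    """
    labels, sentences = [], []
    for line in lines:
        line = line.strip()
        # Step 1: select the relevant sentences, skipping everything else.
        if not line or line.startswith("###") or "\t" not in line:
            continue
        # Step 2: split the label from the sentence text.
        label, sentence = line.split("\t", 1)
        # Step 3: clean the sentence.
        labels.append(label)
        sentences.append(clean_sentence(sentence))
    return labels, sentences
```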

The cleaning includes some features that I assumed could be important, such as:

Including specific symbols as text.
Handling dashes in a certain way.
Labeling any number as an 'integer', a 'float', or a 'fraction'. This could be extended, but I am not sure if it is relevant.
Removing any single-letter words.
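A minimal sketch of cleaning rules like the ones listed above, using only the standard library. The symbol mapping, the choice to replace dashes with a space, and the placeholder tokens `integer`, `float`, and `fraction` are assumptions made for illustration; the rules actually used in the project may differ.

```python
import re

# Hypothetical symbol-to-word mapping; the project's actual list is not given.
SYMBOLS = {"%": " percent ", "<": " less than ", ">": " greater than ", "=": " equals "}

FRACTION = re.compile(r"\b\d+/\d+\b")
FLOAT = re.compile(r"\b\d+\.\d+\b")
INTEGER = re.compile(r"\b\d+\b")


def clean(text):
    text = text.lower()
    # Include specific symbols as text.
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, word)
    # Handle dashes: here hyphens, en dashes, and em dashes become a space.
    text = re.sub(r"[-\u2013\u2014]", " ", text)
    # Label numbers; fractions and floats are matched before plain integers.
    text = FRACTION.sub(" fraction ", text)
    text = FLOAT.sub(" float ", text)
    text = INTEGER.sub(" integer ", text)
    # Drop any remaining non-letter characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Remove single-letter words.
    return " ".join(token for token in text.split() if len(token) > 1)
```

The order of the number rules matters: if integers were replaced first, the digits inside `3/4` or `12.5` would be consumed before the fraction and float patterns could match.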

Things that are not taken into account:

How to deal with abbreviations.
How to deal with scientific unit names (mm, kg, cc, ml, etc.).
Lemmatization is included in the code, but I have not tested it because I could not load the relevant NLTK library.
Stemming, because I am not sure if it is relevant for this classification task.
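If unit names turn out to matter, one option would be to expand them to full words before the other cleaning steps. The sketch below is not part of the project: the mapping `UNITS` and the function `expand_units` are hypothetical, and only the four units mentioned above are covered.

```python
import re

# Hypothetical mapping from unit abbreviations to full words.
UNITS = {
    "mm": "millimeter",
    "kg": "kilogram",
    "cc": "cubic centimeter",
    "ml": "milliliter",
}

# Match a unit that is not preceded by a letter (so "5ml" matches, "commit" does not).
UNIT_PATTERN = re.compile(r"(?<![a-z])(" + "|".join(UNITS) + r")\b")


def expand_units(text):
    """Lowercase the text and replace known unit abbreviations with full words."""
    expanded = UNIT_PATTERN.sub(lambda m: " " + UNITS[m.group(1)] + " ", text.lower())
    return " ".join(expanded.split())
```

A step like this would need to run before single-letter words and digits are removed, since it relies on the original spacing around numbers.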

Tokenization is performed as well; I am not sure whether this counts as part of the preprocessing. There are some decisions to make here, such as the value of num_words. I have not performed padding because I am not sure if it is necessary.
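To make the num_words and padding decisions more concrete, here is a simplified standard-library stand-in for a tokenizer. It is not the tokenizer used in the project, and library tokenizers may treat the limit differently. The assumptions here are that the `num_words` most frequent words get their own index, index 1 marks any other word, and index 0 is used for padding. Padding matters mainly when a model expects fixed-length input.

```python
from collections import Counter


def build_vocab(sentences, num_words):
    """Map the num_words most frequent words to indices starting at 2."""
    counts = Counter(word for s in sentences for word in s.split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(num_words))}


def encode(sentence, vocab):
    """Convert a sentence to indices; words outside the vocabulary become 1."""
    return [vocab.get(word, 1) for word in sentence.split()]


def pad(sequences, maxlen):
    """Truncate or right-pad each sequence with 0 so all have length maxlen."""
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]
```

A short usage example, with made-up sentences:

```python
sentences = ["pain pain decreased", "pain decreased after treatment"]
vocab = build_vocab(sentences, num_words=2)
padded = pad([encode(s, vocab) for s in sentences], maxlen=4)
```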

Repository files:

.ipynb_checkpoints
logs/fc_w2v_model_20210517-214754/train
trained_models
.gitignore
Notebook_Frederic.ipynb
README.md
Robert_final.ipynb
Untitled.ipynb
all.ipynb
baseline_with_preprocess.ipynb
baseline_without_preprocess.ipynb
functions.py
notebook_Liine.ipynb
preprocessing.py
word2vec.py

ropertool/ML4HC_Project3
