git clone [email protected]:psd-project/wordfinder.git
If it prompts "permission denied" or asks for a password, follow this step: copy the contents of your local `id_rsa.pub` file into "SSH keys" in your GitLab Settings (top-right corner; "SSH keys" appears in the left sidebar).
By the way, if you have any problems, please feel free to contact me.
Alternatively, you can clone over HTTPS with an access token:
```shell
git clone https://oauth2:[email protected]/psd-project/wordfinder.git
```
but this is not the recommended way.
We currently have a demo version; to try it:
- First, select the English language.
- Second, enter the word "sink", then click the "Find" button.
Enjoy the demo! It is for demonstration only at this stage.
What is the main functionality of our product?
- Support multiple languages.
- Enter a word and its part of speech, and return corresponding sentences as fast as possible.
- Should then "cluster" those sentences into examples with related senses, and present to the user one or more "clusters" of example sentences.
- Must allow the user to examine, and then change, the number of clusters.
- Database: gather and store text corpora in many languages in a way that makes the queries we want (word/part-of-speech lookup) fast and easy.
- Analysis: code to cluster example sentences containing a given word; there are interesting machine-learning approaches here that I'll explain eventually!
- Front end: a simple, usable interface; it must work on any platform and should support messages/menu items in multiple languages.
What is the main functionality of the alpha version (deadline March 19)?
- Support at least two languages.
- Finish the design of the database tables, including one word-tag-sentence table per language, e.g. an English table:

| word_name | pos_tag | sentence |
|---|---|---|
| sink | NOUN | Don't just leave your dirty plates in the sink! |
| sink | VERB | The wheels started to sink into the mud. |
| sink | VERB | How could you sink so low? |
- Also, with the fields above, we should run the tagger over our selected corpus to get a tag for each word, then write the results into per-language tables such as `English_data`, `Chinese_data`, etc.
- Finish the front-end design, including availability on any platform, a text box for entering the word, and messages/menu items selectable in multiple languages.
- A simple algorithm to implement the "cluster" functionality: group the sentences found by a search into examples with related senses.
- Allow users to change the number of clusters.
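The table design above can be sketched in a few lines of Python. The snippet below uses the standard-library `sqlite3` in place of the MySQL database planned for production, and the table/column names follow the draft schema; treat it as an illustration, not the final DDL.

```python
import sqlite3

# SQLite stands in here for the planned MySQL store; the schema mirrors
# the word_name | pos_tag | sentence table sketched above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE English_data (
        word_name TEXT NOT NULL,
        pos_tag   TEXT NOT NULL,
        sentence  TEXT NOT NULL
    )
""")
# A compound index keeps the core word/POS lookup fast.
conn.execute("CREATE INDEX idx_word_pos ON English_data (word_name, pos_tag)")

conn.executemany("INSERT INTO English_data VALUES (?, ?, ?)", [
    ("sink", "NOUN", "Don't just leave your dirty plates in the sink!"),
    ("sink", "VERB", "The wheels started to sink into the mud."),
    ("sink", "VERB", "How could you sink so low?"),
])

# The core query: every sentence for a given word and part of speech.
verbs = conn.execute(
    "SELECT sentence FROM English_data WHERE word_name = ? AND pos_tag = ?",
    ("sink", "VERB"),
).fetchall()
print(len(verbs))   # 2
```

The same schema and index translate directly to MySQL once the real store is set up.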
Tips:
- In the alpha version we don't need to worry too much about the number of words; maybe one million words is OK, but we need to support at least two languages.
- Note a possible little trick: sort the table in alphabetical order.
- The preferred choice of SQL database is MySQL.
- NOTE: we only return sentences that contain the exact searched word, e.g. "sink" rather than "sunk" or "sinking".
- Universal Dependencies POS tag types:
- More important and useful links about how we develop this project have been put in the `tmp` folder.
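The exact-match rule in the tips above (return "sink" but not "sunk" or "sinking") can be implemented with a word-boundary regex. A minimal sketch; note that `\b` assumes space-delimited text, so a language like Chinese would need a tokenizer instead.

```python
import re

def exact_match(word, sentences):
    """Keep only sentences containing `word` as a whole token,
    so 'sink' does not match 'sinking' or 'sunk'."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

sentences = [
    "Don't just leave your dirty plates in the sink!",
    "The boat was sinking fast.",
    "The ship sunk last year.",
    "How could you sink so low?",
]
print(exact_match("sink", sentences))
# keeps the first and last sentence; 'sinking' and 'sunk' are filtered out
```

Running the filter after the database query keeps the SQL simple; alternatively the same rule could be enforced at insert time by storing one row per token.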
What is the main functionality of the beta version (deadline April 9)?
to do
What is the main functionality of the final version (deadline TBD)?
to do
Tips:
- Build on Dr. Scannell's materials, which contain important corpora we need (like UD) and tools for POS tagging (like UDPipe). Once we have some code built, we can write data into our database tables, which is very important.
- Python as the development language, delivered as a web application.
- Our repository: https://git.cs.slu.edu/psd-project/wordfinder/-/project_members
- Flask as the web framework, for convenience.
- Unit tests.
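UDPipe (mentioned in the tips above) emits CoNLL-U, where each token line has ten tab-separated columns (FORM is column 2, UPOS is column 4). A minimal standard-library parser that turns that output into (word, pos_tag, sentence) rows for our tables might look like this; the function name and row layout are our own, and the sample below is hand-written, not real UDPipe output.

```python
def parse_conllu(conllu):
    """Turn CoNLL-U text (e.g. UDPipe output) into
    (word, pos_tag, sentence) rows for the database."""
    rows, tokens, text = [], [], None
    for line in conllu.splitlines():
        if line.startswith("# text = "):
            text = line[len("# text = "):]
        elif not line.strip():            # blank line closes a sentence
            rows += [(form, upos, text) for form, upos in tokens]
            tokens = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():         # skip multiword ranges like "3-4"
                tokens.append((cols[1], cols[3]))   # FORM, UPOS
    rows += [(form, upos, text) for form, upos in tokens]
    return rows

sample = "\n".join([
    "# text = How could you sink so low?",
    "1\tHow\thow\tADV\t_\t_\t2\tadvmod\t_\t_",
    "2\tcould\tcould\tAUX\t_\t_\t4\taux\t_\t_",
    "3\tyou\tyou\tPRON\t_\t_\t4\tnsubj\t_\t_",
    "4\tsink\tsink\tVERB\t_\t_\t0\troot\t_\t_",
    "5\tso\tso\tADV\t_\t_\t6\tadvmod\t_\t_",
    "6\tlow\tlow\tADJ\t_\t_\t4\tadvcl\t_\t_",
    "7\t?\t?\tPUNCT\t_\t_\t4\tpunct\t_\t_",
])
print(parse_conllu(sample)[3])   # ('sink', 'VERB', 'How could you sink so low?')
```

Each returned row maps one-to-one onto the word_name/pos_tag/sentence columns of the per-language tables.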
HERE we make development plans, discuss them, and approve them. Then we should follow these plans as we start. If a problem comes up during development, you should tell us in time, and the group should solve it together before the deadline.
2/16/2021 - 2/21/2021 TASKS
Sprint 1
- Develop the UI in any language
- Obtain a corpus
- Clean the corpus (tokenization, lemmatization, and stemming)
- Tag the data according to POS
Sprint 2
Discussion list:
- 1 Discuss NLTK and UDPipe; the key point is multiple-language support.
- 2 Decide on corpora for 7-8 languages.
- 3 Load a UDPipe pre-trained model, then run it over our corpus from item 2.
- 4 Write the results to our database; core fields: word, POS tag, sentence.
- 5 Cluster the sentences to get example sentences.
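Step 5 (clustering the sentences) could start from something very simple. The sketch below is a hypothetical greedy algorithm that uses word-overlap (Jaccard) similarity as a stand-in for real sense vectors; it also takes `k` as a parameter, matching the requirement that users can change the number of clusters.

```python
import re

def tokens(sentence):
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", sentence.lower()))

def jaccard(a, b):
    return len(a & b) / len(a | b)

def cluster_sentences(sentences, k):
    """Greedy sketch: seed k clusters with the first k sentences, then
    assign every other sentence to the seed it overlaps most with.
    (Placeholder similarity; word2vec vectors would replace Jaccard.)"""
    toks = [tokens(s) for s in sentences]
    clusters = [[i] for i in range(k)]
    for i in range(k, len(sentences)):
        best = max(range(k), key=lambda c: jaccard(toks[i], toks[clusters[c][0]]))
        clusters[best].append(i)
    return [[sentences[i] for i in group] for group in clusters]

sents = [
    "Don't just leave your dirty plates in the sink!",
    "The wheels started to sink into the mud.",
    "How could you sink so low?",
]
# k is user-settable, per the "change the number of clusters" requirement.
print(cluster_sentences(sents, 2))
```

On this toy input the kitchen sense and the two verb senses land in separate clusters, but that is an artifact of the overlapping function words; a real sense clustering needs the embedding-based approach planned for later sprints.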
| User interface | English corpora | POS tag |
|---|---|---|
| Decide NLTK or CorPy | Multilingual functionality | Start writing to CSV to build the database structure |
```sql
select table_name from information_schema.tables where table_schema='mysql';
```
New features:
- Finished development of POS tagging, based on UDPipe pre-trained models, available for multiple languages, including:
  - base_model.py
  - train_model.py
  - the base data structure: result_model.py
- Finished the application for a database on hopper.slu.edu, which hosts our web servers and database storage. Our training jobs can run on this server continuously.
- Finished development of the MySQL storage model; the module is store.py.
Unfinished features:
- corpora for many other languages
- clustering
Sprint #3 planning
- 1 Methods to get corpora for many languages
- 1.1 Wikipedia language abbreviations: https://zh.wikipedia.org/wiki/ISO_639-1
- 1.2 How to get a corpus via Wikipedia: https://jdhao.github.io/2019/01/10/two_chinese_corpus/
- 2 Database table structures
  - current structure: a wordpos table and a sentence table
  - keep updating and cleaning the database over time
- Add the cluster function using word2vec; the gensim library can do it.
- Update the web interface.
- Add logging.
- Testing tasks; clean the database.
- Deploy to hopper.slu.edu.
- Release the alpha version.
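For the word2vec-based cluster function in the plan above, gensim would supply trained word vectors. The sketch below uses tiny hand-made vectors as stand-ins for real embeddings, just to show the averaging-plus-cosine step that sentence clustering would build on; every number here is invented for illustration.

```python
import math

# Toy 3-d vectors standing in for trained word2vec embeddings
# (in the real pipeline, gensim's Word2Vec model would supply these).
vectors = {
    "sink":   [0.9, 0.1, 0.0],
    "plates": [0.8, 0.2, 0.1],
    "mud":    [0.1, 0.9, 0.2],
    "wheels": [0.2, 0.8, 0.1],
}

def sentence_vector(words):
    """Average the vectors of the words we have embeddings for."""
    known = [vectors[w] for w in words if w in vectors]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

kitchen = sentence_vector(["plates", "sink"])
driving = sentence_vector(["wheels", "mud", "sink"])
print(round(cosine(kitchen, driving), 2))   # ≈ 0.69
```

Clustering then reduces to grouping sentence vectors by cosine similarity, e.g. with k-means over the averaged vectors, where the user-chosen cluster count becomes k.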