Create a simple network of related words using the Twitter Streaming API.

Major parts of this project:
- Streamer: `~/twitter_streaming.py`
- TF-IDF Generator: `~/wordnet/tf_idf_generator.py`
- NN Words Generator: `~/wordnet/nn_words.py`
- Network Generator: `~/wordnet/word_net.py`
- Clone this repo and run `$ pip install -r requirements.txt` at the root directory, and you will be ready to go.
- Go to the root dir (`~`) and create a `config.py` file with the details mentioned below:

```python
# Variables that contain the user credentials to access the Twitter Streaming API.
# This tutorial will help you: http://socialmedia-class.org/twittertutorial.html
access_token = "xxx-xx-xxxx"
access_token_secret = "xxxxx"
consumer_key = "xxxxxx"
consumer_secret = "xxxxxxxx"
```
- Run the Streamer with an array of filter words that you want to fetch tweets on, e.g.:

```
$ python twitter_streaming.py hello hi hallo namaste > data_file.txt
```

This saves words from the matching tweets to `data_file.txt`, one word per line, filtered according to the words used as args. (A minimal sketch of such a streamer follows below.)
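The `twitter_streaming.py` script itself is not distributed with the module (see the note at the end of this README), so here is a minimal, hypothetical sketch of what such a streamer could look like, assuming the pre-v4 `tweepy` streaming API and the `config.py` credentials created above:

```python
import sys

import tweepy

from config import (access_token, access_token_secret,
                    consumer_key, consumer_secret)


class WordListener(tweepy.StreamListener):
    """Prints every word of each matching tweet on its own line."""

    def on_status(self, status):
        for word in status.text.split():
            print(word)

    def on_error(self, status_code):
        # Returning False on 420 disconnects instead of hammering the API.
        if status_code == 420:
            return False


if __name__ == '__main__':
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = tweepy.Stream(auth=auth, listener=WordListener())
    stream.filter(track=sys.argv[1:])  # filter words come from the CLI args
```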
- Clone this repo and install the wordnet module using:

```
$ python setup.py install
```
- To create a TF-IDF structure file for every doc, use:

```python
from wordnet import find_tf_idf

df, tf_idf = find_tf_idf(
    file_names=['file/path1', 'file/path2', ...],     # paths of files to be processed (create them using twitter_streaming.py)
    prev_file_path='prev/tf/idf/file/path.tfidfpkl',  # previous TF-IDF file to modify over; format standard is .tfidfpkl. default=None
    dump_path='path/to/dump/file.tfidfpkl'            # dump path if the TF-IDF needs to be dumped; format standard is .tfidfpkl. default=None
)
'''
If no file is passed via the prev_file_path parameter, a new TF-IDF file is
generated; otherwise the TF-IDF values are combined with the previous file and
dumped at dump_path if mentioned. Either way, the function returns the new
tf_idf list of dictionaries and the df dictionary.
'''
```
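For reference, the standard TF-IDF weighting looks like the sketch below; the exact scheme `find_tf_idf()` uses may differ, so treat this only as an illustration of the idea:

```python
import math

def tf_idf_weight(term_count, doc_length, num_docs, docs_with_term):
    """Classic TF-IDF: term frequency scaled by inverse document frequency."""
    tf = term_count / doc_length               # how common the term is in this doc
    idf = math.log(num_docs / docs_with_term)  # how rare the term is across docs
    return tf * idf
```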
- To use the NN Words Generator of this module, simply use `wordnet.find_knn`:

```python
from wordnet import find_knn

words = find_knn(
    tf_idf=tf_idf,        # the tf_idf returned by find_tf_idf() above
    input_word='german',  # the word for which the k nearest neighbours are required
    k=10,                 # number of neighbours required. default=10
    rand_on=True          # whether to randomly skip a few words or show the initial k words. default=True
)
'''
Returns a list of words closely related to the provided input_word, referring
to the tf_idf variable passed to it. Either use find_tf_idf() to obtain this
variable, or pickle.load() a dump file created by that function at your chosen
directory; the file contains 2 lists in the format (idf, tf_idf).
'''
```
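As the note above mentions, a previously dumped `.tfidfpkl` file can be loaded back with `pickle`; a minimal sketch, assuming the two-list `(idf, tf_idf)` layout described above:

```python
import pickle

# The path is a placeholder: use whatever you passed as dump_path to find_tf_idf().
with open('path/to/dump/file.tfidfpkl', 'rb') as f:
    idf, tf_idf = pickle.load(f)  # the dump holds two lists: (idf, tf_idf)
```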
- To create a Word Network, use:

```python
from wordnet import generate_net

word_net = generate_net(
    df=df,                         # the df returned by find_tf_idf() above
    tf_idf=tf_idf,                 # the tf_idf returned by find_tf_idf() above
    dump_path='path/to/dump.wrnt'  # path to dump the generated files; format standard is .wrnt. default=None
)
'''
This function returns a dict of Word entities, with word as key.
'''
```
- To retrieve a Word Network, use:

```python
from wordnet import retrieve_net

word_net = retrieve_net(
    'path/to/network.wrnt'  # path to the network file; format standard is .wrnt
)
'''
This function returns a dict of Word entities, with word as key.
'''
```
- To retrieve the list of words that are at some depth from a root word in the network, use the following (an end-to-end sketch follows this list):

```python
from wordnet import return_net

words = return_net(
    word,      # root word for this traversal
    word_net,  # the word network generated by generate_net()
    depth=1    # depth to which the word collector should traverse
)
'''
This function returns a list of words that are at the provided depth from the
root word in the network provided.
'''
```
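Putting the pieces together, a typical end-to-end run might look like the sketch below; the file names are placeholders, and all functions are the ones documented above:

```python
from wordnet import find_tf_idf, generate_net, return_net

# 1. Build TF-IDF structures from the streamed data file(s).
df, tf_idf = find_tf_idf(file_names=['data_file.txt'])

# 2. Build the word network and dump it for later reuse.
word_net = generate_net(df=df, tf_idf=tf_idf, dump_path='words.wrnt')

# 3. Collect the words within depth 2 of a root word.
related = return_net('german', word_net, depth=2)
print(related)
```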
To run a formal test, simply run:

```
$ python test.py
```

The module returns 0 if everything worked as expected. `test.py` uses the sample data provided here and executes unit tests on `find_tf_idf()`, `find_knn()` & `generate_net()`.
The Streamer functionality is not provided under the distribution of this code; it is just a standalone script, independent of the module.
by @Anurag