Pull request for Sift 47 #6
Conversation
Any idea why the build is failing? @estroz
Not exactly sure; everything works fine on my machine. It's the requirements installation, which is weird.
@@ -1,6 +1,18 @@
language: python
python: 2.7
# Setup anaconda
before_install:
Wow, Travis is not happy with us.
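For context, the hunk above only shows the start of the before_install block; a common miniconda bootstrap for Travis at the time looked roughly like the sketch below. These are not the PR's actual lines, which are elided from the hunk:

```yaml
# Hypothetical reconstruction of a typical miniconda before_install block.
before_install:
  - wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p "$HOME/miniconda"
  - export PATH="$HOME/miniconda/bin:$PATH"
  - conda update --yes conda
  - conda create --yes -n test-env python=2.7
  - source activate test-env
```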
return df, feature_names

# Initialize an LDA model object with 20 topics and 'online' learning method
def init_and_fit_lda_(dataframe, num_topics=20, rand_state=1, learn_method='online'):
Do we need to be explicit about num_topics? How do we choose a good value for this?
The runtime is O(NKV), where N = #docs, V = #words in the vocabulary, and K = #topics, so runtime scales linearly with K while N and V stay relatively static. More topics give finer granularity in the similarity of meaning within each topic-word set, i.e. clearer topic meaning, but a longer runtime. I chose 20 because that's what all the examples have pointed to, but there's a good resource I found on choosing K (http://archive.is/KBGwt) that I need to go through.
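For reference, a minimal sketch of what init_and_fit_lda_ presumably wraps, assuming scikit-learn's LatentDirichletAllocation over a bag-of-words matrix; the vectorizer step and the 'text' column name are assumptions, not taken from this PR:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def init_and_fit_lda_(dataframe, num_topics=20, rand_state=1, learn_method='online'):
    # Assumes the dataframe has a 'text' column (illustrative name).
    vectorizer = CountVectorizer(stop_words='english')
    doc_term_matrix = vectorizer.fit_transform(dataframe['text'])
    # 'online' fits with mini-batch variational Bayes, which is cheaper
    # per pass over the corpus than 'batch'.
    lda = LatentDirichletAllocation(n_components=num_topics,
                                    learning_method=learn_method,
                                    random_state=rand_state)
    lda.fit(doc_term_matrix)
    return lda, vectorizer
```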
print(sentimentResults)
assert sentimentResults[0]== 45.23809523809524
assert sentimentResults[1] == 54.761904761904766
assert sentimentResults['pos_pct']== 45.23809523809524
Spacing: ...pct'] == 45...
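i.e., the suggested fix is:

```python
assert sentimentResults['pos_pct'] == 45.23809523809524
```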
@@ -2,7 +2,7 @@
import json
Can we change the name of this test to something more informative?
@@ -1,5 +1,5 @@
# Project-specific files
Can we remove the siftnlp folder from root if there isn't anything in it? Maybe replace it with a lib folder for functions common to multiple jobs?
EDIT: never mind.
Can we also consolidate data/* and test/test_data/*, unless there's a good reason to have both of them?
Is this LDA stuff testable in a useful way?
Once we have curated data (hand-selected topics for pieces of feedback), I can set up tests for ranges of correctness.
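Such a test might look roughly like the sketch below; everything here (the helper names, the curated set, the 0.8 threshold) is hypothetical, not from this PR:

```python
def test_lda_recovers_curated_topics():
    # Hypothetical curated labels hand-assigned to pieces of feedback.
    curated = {'billing', 'shipping', 'support'}
    df, feature_names = load_curated_feedback()     # assumed helper
    lda, vectorizer = init_and_fit_lda_(df, num_topics=20)
    inferred = label_topics(lda, vectorizer)        # assumed helper
    # Range-of-correctness check: most curated topics should be recovered.
    recovered = curated & set(inferred)
    assert len(recovered) / float(len(curated)) >= 0.8
```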
polarityProp.append(negative_percentage)
return(polarityProp)

positive = sum(filter(lambda x: x > 0, polarity_scores))
Nice 🐍
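For context, the one-liner plugs into the percentage computation roughly like this; variable names beyond polarity_scores are assumptions, not from the diff:

```python
polarity_scores = [0.4, -0.2, 0.1, -0.5, 0.3]  # illustrative scores

positive = sum(filter(lambda x: x > 0, polarity_scores))   # total positive mass
negative = -sum(filter(lambda x: x < 0, polarity_scores))  # magnitude of negatives
total = positive + negative

positive_percentage = 100.0 * positive / total
negative_percentage = 100.0 * negative / total  # the value appended above
```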