Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull request for Sift 47 #6

Merged
merged 9 commits into from
Nov 16, 2016
Merged

Pull request for Sift 47 #6

merged 9 commits into from
Nov 16, 2016

Conversation

estroz
Copy link
Contributor

@estroz estroz commented Nov 14, 2016

No description provided.

@jordanschalm
Copy link
Member

Any idea why the build is failing? @estroz

@estroz
Copy link
Contributor Author

estroz commented Nov 14, 2016

Not exactly sure, everything works fine on my machine. its the requirements installation, which is weird
EDIT: there's an issue with numpy and scipy compilation on travis servers using pip install ..., and the workaround I used is https://gist.github.com/dan-blanchard/7045057

@@ -1,6 +1,18 @@
language: python
python: 2.7
# Setup anaconda
before_install:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow Travis is not happy with us

return df, feature_names

# Initialize an LDA model object with 20 topics and 'online' learning method
def init_and_fit_lda_(dataframe, num_topics=20, rand_state=1, learn_method='online'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to be explicit about num_topics? How do we choose a good value for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime is O(NKV) where N = #docs, V = #words in vocabulary and K = #topics, so the runtime scales linearly with K and N/V kept relatively static. The more topics, the more granularity we'll see for similarity in meaning of the words in each topic-word set, i.e. the clarity of topic meaning, but longer runtime. I chose 20 because that's what all the examples have pointed to, but there's a good resource I found on choosing K (http://archive.is/KBGwt) which I need to go through.

print(sentimentResults)
assert sentimentResults[0]== 45.23809523809524
assert sentimentResults[1] == 54.761904761904766
assert sentimentResults['pos_pct']== 45.23809523809524
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing ...pct'] == 45...

@@ -2,7 +2,7 @@
import json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the name of this test to something more informative?

@@ -1,5 +1,5 @@
# Project-specific files
Copy link
Member

@jordanschalm jordanschalm Nov 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the siftnlp folder from root if there isn't anything in it? Maybe replace with a lib folder for functions common to multiple jobs?

EDIT never mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also consolidate data/* and test/test_data/*, unless there's a good reason to have both of them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this LDA stuff testable in a useful way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have curated data (hand-selected topics for pieces of feedback) I can set up tests for ranges of correctness.

polarityProp.append(negative_percentage)
return(polarityProp)

positive = sum(filter(lambda x: x > 0, polarity_scores))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🐍

@estroz estroz merged commit ba90c3e into master Nov 16, 2016
@estroz estroz deleted the sift-47 branch November 16, 2016 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants