Pull request for Sift 47 #6
Conversation
Any idea why the build is failing? @estroz
Not exactly sure; everything works fine on my machine. It's the requirements installation, which is weird.
@@ -1,6 +1,18 @@
language: python
python: 2.7
# Setup anaconda
before_install:
Wow, Travis is not happy with us.
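For context, the hunk above only shows the start of the before_install block; a common miniconda bootstrap for Travis at the time looked roughly like the sketch below. These are not the PR's actual lines, which are elided from the hunk:

```yaml
# Hypothetical reconstruction of a typical miniconda before_install block.
before_install:
  - wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p "$HOME/miniconda"
  - export PATH="$HOME/miniconda/bin:$PATH"
  - conda update --yes conda
  - conda create --yes -n test-env python=2.7
  - source activate test-env
```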
return df, feature_names

# Initialize an LDA model object with 20 topics and 'online' learning method
def init_and_fit_lda_(dataframe, num_topics=20, rand_state=1, learn_method='online'):
Do we need to be explicit about num_topics? How do we choose a good value for this?
The runtime is O(NKV), where N = #docs, V = #words in the vocabulary, and K = #topics, so runtime scales linearly with K while N and V stay relatively static. More topics give finer granularity in the similarity of meaning within each topic-word set, i.e. clearer topic meaning, but a longer runtime. I chose 20 because that's what all the examples have pointed to, but there's a good resource I found on choosing K (http://archive.is/KBGwt) that I need to go through.
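For reference, a minimal sketch of what init_and_fit_lda_ presumably wraps, assuming scikit-learn's LatentDirichletAllocation over a bag-of-words matrix; the vectorizer step and the 'text' column name are assumptions, not taken from this PR:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def init_and_fit_lda_(dataframe, num_topics=20, rand_state=1, learn_method='online'):
    # Assumes the dataframe has a 'text' column (illustrative name).
    vectorizer = CountVectorizer(stop_words='english')
    doc_term_matrix = vectorizer.fit_transform(dataframe['text'])
    # 'online' fits with mini-batch variational Bayes, which is cheaper
    # per pass over the corpus than 'batch'.
    lda = LatentDirichletAllocation(n_components=num_topics,
                                    learning_method=learn_method,
                                    random_state=rand_state)
    lda.fit(doc_term_matrix)
    return lda, vectorizer
```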
print(sentimentResults)
assert sentimentResults[0]== 45.23809523809524
assert sentimentResults[1] == 54.761904761904766
assert sentimentResults['pos_pct']== 45.23809523809524
Spacing: ...pct'] == 45...
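i.e., the suggested fix is:

```python
assert sentimentResults['pos_pct'] == 45.23809523809524
```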
@@ -2,7 +2,7 @@
import json
Can we change the name of this test to something more informative?
@@ -1,5 +1,5 @@
# Project-specific files
Can we remove the siftnlp folder from root if there isn't anything in it? Maybe replace it with a lib folder for functions common to multiple jobs?
EDIT: never mind.
Can we also consolidate data/* and test/test_data/*, unless there's a good reason to have both of them?
Is this LDA stuff testable in a useful way?
Once we have curated data (hand-selected topics for pieces of feedback), I can set up tests for ranges of correctness.
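Such a test might look roughly like the sketch below; everything here (the helper names, the curated set, the 0.8 threshold) is hypothetical, not from this PR:

```python
def test_lda_recovers_curated_topics():
    # Hypothetical curated labels hand-assigned to pieces of feedback.
    curated = {'billing', 'shipping', 'support'}
    df, feature_names = load_curated_feedback()     # assumed helper
    lda, vectorizer = init_and_fit_lda_(df, num_topics=20)
    inferred = label_topics(lda, vectorizer)        # assumed helper
    # Range-of-correctness check: most curated topics should be recovered.
    recovered = curated & set(inferred)
    assert len(recovered) / float(len(curated)) >= 0.8
```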
polarityProp.append(negative_percentage)
return(polarityProp)

positive = sum(filter(lambda x: x > 0, polarity_scores))
Nice 🐍
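For context, the one-liner plugs into the percentage computation roughly like this; variable names beyond polarity_scores are assumptions, not from the diff:

```python
polarity_scores = [0.4, -0.2, 0.1, -0.5, 0.3]  # illustrative scores

positive = sum(filter(lambda x: x > 0, polarity_scores))   # total positive mass
negative = -sum(filter(lambda x: x < 0, polarity_scores))  # magnitude of negatives
total = positive + negative

positive_percentage = 100.0 * positive / total
negative_percentage = 100.0 * negative / total  # the value appended above
```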