REST Keyword crawler: Configuration of [languageFilter] #1

MonkandMonkey · 2019-02-13T02:41:28Z

I am using the [REST Keyword crawler], and I only want English tweets.
I followed the snippet in README.md, but I still got tweets in multiple languages. So I checked the file "crawler.properties" and changed:
####################################################################################
# REST Cralwer of Twitter - by keyword(s)
# Class: org.backingdata.twitter.crawler.rest.TwitterRESTKeywordSearchCrawler
# - Full path of the txt file to read terms from (one term ID per line)
tweetKeyword.fullPathKeywordList=keywords.txt
# - Full path of the output folder to store crawling results
tweetKeyword.fullOutputDirPath=./data/
# - Storage format: "json" to store one tweet per line as tweet JSON object or "tab" to store
# one tweet per line as TWEET_ID<TAB>TWEET_TEXT
tweetID.outputFormat=json
# - If not empty, it is possible specify a language to retrieve only tweet of a specific language
# (en, es, it, etc.) - if empty all tweet are retrieved, indipendently from their language
# IMPORTANT: The language code may be formatted as ISO 639-1 alpha-2 (en), ISO 639-3 alpha-3 (msa), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization (zh-tw).
tweetID.languageFilter=en

into:
####################################################################################
# REST Cralwer of Twitter - by keyword(s)
# Class: org.backingdata.twitter.crawler.rest.TwitterRESTKeywordSearchCrawler
# - Full path of the txt file to read terms from (one term ID per line)
tweetKeyword.fullPathKeywordList=keywords.txt
# - Full path of the output folder to store crawling results
tweetKeyword.fullOutputDirPath=./data/
# - Storage format: "json" to store one tweet per line as tweet JSON object or "tab" to store
# one tweet per line as TWEET_ID<TAB>TWEET_TEXT
tweetKeyword.outputFormat=json
# - If not empty, it is possible specify a language to retrieve only tweet of a specific language
# (en, es, it, etc.) - if empty all tweet are retrieved, indipendently from their language
# IMPORTANT: The language code may be formatted as ISO 639-1 alpha-2 (en), ISO 639-3 alpha-3 (msa), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization (zh-tw).
tweetKeywod.languageFilter=en

And it worked!
Thanks for your great tool, which is useful and helped a lot!
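
For anyone comparing the two blocks by eye: only the key prefixes on the last two settings differ. Here is a minimal sketch in Python (not part of the crawler; `parse_properties` and `diff_keys` are hypothetical helper names, and the parser only handles plain `key=value` lines and `#` comments) that lists which keys changed between the two versions. The keys are copied exactly as posted above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


def diff_keys(before, after):
    """Return (keys only in 'before', keys only in 'after'), each sorted."""
    b = parse_properties(before)
    a = parse_properties(after)
    return sorted(b.keys() - a.keys()), sorted(a.keys() - b.keys())


# Settings from the original crawler.properties (comments omitted)
BEFORE = """\
tweetKeyword.fullPathKeywordList=keywords.txt
tweetKeyword.fullOutputDirPath=./data/
tweetID.outputFormat=json
tweetID.languageFilter=en
"""

# Settings after my change, exactly as posted
AFTER = """\
tweetKeyword.fullPathKeywordList=keywords.txt
tweetKeyword.fullOutputDirPath=./data/
tweetKeyword.outputFormat=json
tweetKeywod.languageFilter=en
"""

removed, added = diff_keys(BEFORE, AFTER)
print("removed:", removed)
print("added:", added)
```

The values stay the same in both versions; only the `tweetID.` prefix on the output format and language filter keys was replaced.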
