I am using the [REST Keyword crawler], and I only want English tweets.
I followed the snippet in README.md, but I still got tweets in multiple languages. So I checked the file "crawler.properties" and changed:
####################################################################################
# REST Cralwer of Twitter - by keyword(s)
# Class: org.backingdata.twitter.crawler.rest.TwitterRESTKeywordSearchCrawler
# - Full path of the txt file to read terms from (one term ID per line)
tweetKeyword.fullPathKeywordList=keywords.txt
# - Full path of the output folder to store crawling results
tweetKeyword.fullOutputDirPath=./data/
# - Storage format: "json" to store one tweet per line as tweet JSON object or "tab" to store
# one tweet per line as TWEET_IDTWEET_TEXT
tweetID.outputFormat=json
# - If not empty, it is possible specify a language to retrieve only tweet of a specific language
# (en, es, it, etc.) - if empty all tweet are retrieved, indipendently from their language
# IMPORTANT: The language code may be formatted as ISO 639-1 alpha-2 (en), ISO 639-3 alpha-3 (msa), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization (zh-tw).
tweetID.languageFilter=en
into:
####################################################################################
# REST Cralwer of Twitter - by keyword(s)
# Class: org.backingdata.twitter.crawler.rest.TwitterRESTKeywordSearchCrawler
# - Full path of the txt file to read terms from (one term ID per line)
tweetKeyword.fullPathKeywordList=keywords.txt
# - Full path of the output folder to store crawling results
tweetKeyword.fullOutputDirPath=./data/
# - Storage format: "json" to store one tweet per line as tweet JSON object or "tab" to store
# one tweet per line as TWEET_IDTWEET_TEXT
tweetKeyword.outputFormat=json
# - If not empty, it is possible specify a language to retrieve only tweet of a specific language
# (en, es, it, etc.) - if empty all tweet are retrieved, indipendently from their language
# IMPORTANT: The language code may be formatted as ISO 639-1 alpha-2 (en), ISO 639-3 alpha-3 (msa), or ISO 639-1 alpha-2 combined with an ISO 3166-1 alpha-2 localization (zh-tw).
tweetKeywod.languageFilter=en
And it worked!
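My guess at why the original keys were ignored, as a minimal sketch: it assumes the crawler loads its settings with `java.util.Properties` and treats a missing language key as "no filter". The class `PropertyKeyCheck` and its helpers are hypothetical, not part of the tool, and the key names are copied exactly as written above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertyKeyCheck {

    // Parse .properties text held in a string (stands in for crawler.properties).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Hypothetical lookup: a missing key yields "", assumed here to mean "no language filter".
    static String lookup(Properties props, String key) {
        return props.getProperty(key, "").trim();
    }

    public static void main(String[] args) {
        // The file as shipped: the filter sits under the tweetID prefix.
        Properties shipped = parse("tweetID.languageFilter=en\n");

        // Asking for the key under another prefix finds nothing, so no filter is applied.
        System.out.println("[" + lookup(shipped, "tweetKeywod.languageFilter") + "]"); // prints "[]"

        // Only the exact key name returns the value.
        System.out.println("[" + lookup(shipped, "tweetID.languageFilter") + "]"); // prints "[en]"
    }
}
```

A lookup like this does not fail on an unknown key, which would explain why the crawler ran without complaint and simply returned tweets in every language.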
Thanks for your great tool, which is useful and helped a lot!