This repository has been archived by the owner on Jun 30, 2023. It is now read-only.

Error in make_query. Status code: 400 #180

Closed
shmuhammadd opened this issue Jul 3, 2021 · 25 comments
Labels
bug Something isn't working

Comments

@shmuhammadd

shmuhammadd commented Jul 3, 2021

I ran the code below to extract tweets with the hashtag #BlackLivesMatter, but it returns this error: Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400.

I understand error 400 means bad request but the query is a verbatim copy from academictwitteR.

get_all_tweets(
  query = "#BlackLivesMatter",
  start_tweets = "2020-01-01T00:00:00Z",
  end_tweets = "2020-01-05T00:00:00Z",
  file = "blmtweets",
  data_path = "data/",
  n = 100,
  bearer_token = get_bearer()
)

Expected behavior

Return the expected tweets as queried.

Session Info:


R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics 
[3] grDevices utils    
[5] datasets  methods  
[7] base     

other attached packages:
 [1] quanteda.textstats_0.94.1
 [2] quanteda.tidy_0.2        
 [3] quanteda_3.0.0           
 [4] forcats_0.5.1            
 [5] stringr_1.4.0            
 [6] dplyr_1.0.7              
 [7] purrr_0.3.4              
 [8] readr_1.4.0              
 [9] tidyr_1.1.3              
[10] tibble_3.1.2             
[11] ggplot2_3.3.4.9000       
[12] tidyverse_1.3.1          
[13] academictwitteR_0.2.0    
[14] goodshirt_0.2.2          

loaded via a namespace (and not attached):
 [1] rmsfact_0.0.3             
 [2] Rcpp_1.0.6                
 [3] stringdist_0.9.6.3        
 [4] lubridate_1.7.10          
 [5] lattice_0.20-44           
 [6] LexisNexisTools_0.3.4.9000
 [7] assertthat_0.2.1          
 [8] utf8_1.2.1                
 [9] cellranger_1.1.0          
[10] R6_2.5.0                  
[11] plyr_1.8.6                
[12] backports_1.2.1           
[13] reprex_2.0.0              
[14] nsyllable_1.0             
[15] httr_1.4.2                
[16] pillar_1.6.1              
[17] rlang_0.4.11              
[18] readxl_1.3.1              
[19] curl_4.3.2                
[20] rstudioapi_0.13           
[21] data.table_1.14.0         
[22] praise_1.0.0              
[23] Matrix_1.3-4              
[24] munsell_0.5.0             
[25] broom_0.7.8               
[26] modelr_0.1.8              
[27] compiler_4.0.3            
[28] xfun_0.24                 
[29] pkgconfig_2.0.3           
[30] tidyselect_1.1.1          
[31] emo_0.0.0.9000            
[32] fansi_0.5.0               
[33] withr_2.4.2               
[34] crayon_1.4.1              
[35] dbplyr_2.1.1              
[36] grid_4.0.3                
[37] jsonlite_1.7.2            
[38] gtable_0.3.0              
[39] lifecycle_1.0.0           
[40] DBI_1.1.1                 
[41] magrittr_2.0.1            
[42] scales_1.1.1              
[43] RcppParallel_5.1.4        
[44] cli_2.5.0                 
[45] stringi_1.6.2             
[46] pbapply_1.4-3             
[47] reshape2_1.4.4            
[48] fs_1.5.0                  
[49] cowsay_0.8.0              
[50] xml2_1.3.2                
[51] ellipsis_0.3.2            
[52] stopwords_2.2             
[53] fortunes_1.5-4            
[54] generics_0.1.0            
[55] vctrs_0.3.8               
[56] fastmatch_1.1-0           
[57] tools_4.0.3               
[58] glue_1.4.2                
[59] hms_1.1.0                 
[60] parallel_4.0.3            
[61] colorspace_2.0-2          
[62] rvest_1.0.0               
[63] haven_2.4.1               
[64] knitr_1.33                
[65] usethis_2.0.1.9000 

Thanks @cjbarrie for the amazing work.

Please kindly advise.

Best,
Shamsuddeen

@hsakareem

I am getting the same error. In my case, it was working perfectly until yesterday. I have been getting this error since this evening. It might be a problem at Twitter's end.

@DrorWalt

DrorWalt commented Jul 3, 2021

Same issue here. Python works with no problem, though, using the same bearer token.

@shmuhammadd
Author

I am getting the same error. In my case, it was working perfectly until yesterday. I have been getting this error since this evening. It might be a problem at Twitter's end.

It also worked fine for me yesterday.

@justinchuntingho
Collaborator

justinchuntingho commented Jul 3, 2021

Twitter changed its API a few days ago: if a request includes context_annotations in the tweet.fields parameter (which the package does by default), each page is limited to 100 tweets. The package requests 500 per page by default, hence the error. A quick workaround is to add page_n = 100. We are working on a fix and will update soon.
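
As a rough illustration of the limit described above, the sketch below picks the largest page size a request could use. The helper `effective_page_n` and its arguments are hypothetical and are not part of academictwitteR; the 100 and 500 figures are the ones quoted in this thread.

```r
# Hypothetical helper, not part of academictwitteR: returns the page size
# a request could use, given the per-page limits quoted in this thread.
effective_page_n <- function(page_n = 500, context_annotations = FALSE) {
  # 100 per page when context annotations are requested, otherwise 500
  limit <- if (context_annotations) 100 else 500
  min(page_n, limit)
}

effective_page_n(500, context_annotations = TRUE)   # 100
effective_page_n(500, context_annotations = FALSE)  # 500
```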

@shmuhammadd
Author

Twitter changed its API a few days ago: if a request includes context_annotations in the tweet.fields parameter (which the package does by default), each page is limited to 100 tweets (the package default is 500). A quick workaround is to add page_n = 100. We are working on a fix and will update soon.

Thanks for the response @justinchuntingho.

justinchuntingho added the bug label Jul 4, 2021
@jmwright432

I'm having an issue on this front. I ran this code about 4-5 days ago with no issue and was collecting upwards of 250,000 tweets, which was fantastic. Now I get this 400 error, and using page_n = 100 limits me to 100 tweets per page and caps my total at 100 tweets. Is there a workaround for this, or is the package now limited to that few tweets?

@justinchuntingho
Collaborator

Starting from #181, you can specify context_annotations = FALSE (which is also the default); in that case you will be able to fetch 500 tweets per page. We will try to push the patch to CRAN soon, but in the meantime you can install the development version to use this.

@jmwright432

This is the message I get with the following code:

tweets4 <- get_all_tweets(
  query = build_query("sanctuary cities OR sanctuary city", is_retweet = FALSE, lang = "en"),
  start_tweets = "2018-01-01T00:00:00Z",
  end_tweets = "2018-01-03T00:00:00Z",
  bearer_token = bearer_token,
  data_path = "data6/",
  bind_tweets = TRUE,
  context_annotations = FALSE,
  page_n = 500
)
query: sanctuary cities OR sanctuary city -is:retweet lang:en
Total pages queried: 1 (tweets captured this page: 496).
Total tweets captured now reach 100 : finishing collection.

@chainsawriot
Collaborator

chainsawriot commented Jul 4, 2021

@jmwright432 How about

city",is_retweet=FALSE,lang="en"),start_tweets="2018-01-01T00:00:00Z",end_tweets="2018-01-03T00:00:00Z", bearer_token=bearer_token, data_path = "data6/", bind_tweets = TRUE, context_annotations=FALSE, page_n=500, n = Inf)

You needa tune the n.
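
To make the role of n concrete, here is a minimal simulation of a paging loop that stops once the running total reaches a cap. `pages_until_cap` is a hypothetical helper written for this illustration, not the package's internal code; the page sizes are taken from outputs posted in this thread.

```r
# Hypothetical simulation, not academictwitteR internals: counts how many
# pages are read before the running total reaches the cap `n`.
pages_until_cap <- function(page_sizes, n = 100) {
  total <- 0
  pages <- 0
  for (size in page_sizes) {
    pages <- pages + 1
    total <- total + size
    if (total >= n) break  # cap reached: stop collecting
  }
  list(pages = pages, total = total)
}

pages_until_cap(c(496, 500, 499), n = 100)  # stops after the first page
pages_until_cap(c(496, 500, 499), n = Inf)  # reads every available page
```

With n = 100, a single page of 496 tweets already exceeds the cap, which matches the "finishing collection" message above; with n = Inf the loop only ends when the pages run out.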

@natesheehan

natesheehan commented Jul 4, 2021

Is there a workaround for this, or is the package now limited to that few tweets?

From what I understand about the new Twitter update and this package, you should still be able to mine more than 100 tweets; it will just be much slower if you want context_annotations until an update arrives. Is that right, @justinchuntingho?

@chainsawriot
Collaborator

chainsawriot commented Jul 4, 2021

I am not @justinchuntingho (I'm the quiet Beatle). But I can answer your question @natesheehan.

The update is there, now. You can install the Github version.

First things first: you can get more than 100 tweets. You can get 1000 tweets in 5 s, for example. The only change is that you won't get the context annotations, the things (e.g. topics, named entities) that Twitter extracts from tweets for you.

require(academictwitteR)
#> Loading required package: academictwitteR

start_time <- Sys.time()
x <- get_all_tweets(
  query = "#ichbinhanna",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-01T00:00:00Z",
  n = 1000
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query:  #ichbinhanna 
#> Total pages queried: 1 (tweets captured this page: 500).
#> Total pages queried: 2 (tweets captured this page: 500).
#> Total tweets captured now reach 1000 : finishing collection.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 4.990046 secs
nrow(x)
#> [1] 1000

Created on 2021-07-04 by the reprex package (v2.0.0)

If you need those context annotations, you need to request them explicitly in your call to get_all_tweets. It will also be slower.

require(academictwitteR)
#> Loading required package: academictwitteR

start_time <- Sys.time()
x <- get_all_tweets(
  query = "#ichbinhanna",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-01T00:00:00Z",
  n = 1000,
  context_annotations = TRUE
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> page_n is limited to 100 due to the restriction imposed by Twitter API
#> query:  #ichbinhanna 
#> Total pages queried: 1 (tweets captured this page: 100).
#> Total pages queried: 2 (tweets captured this page: 100).
#> Total pages queried: 3 (tweets captured this page: 100).
#> Total pages queried: 4 (tweets captured this page: 100).
#> Total pages queried: 5 (tweets captured this page: 100).
#> Total pages queried: 6 (tweets captured this page: 100).
#> Total pages queried: 7 (tweets captured this page: 100).
#> Total pages queried: 8 (tweets captured this page: 100).
#> Total pages queried: 9 (tweets captured this page: 100).
#> Total pages queried: 10 (tweets captured this page: 100).
#> Total tweets captured now reach 1000 : finishing collection.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 11.94927 secs
nrow(x)
#> [1] 1000

Created on 2021-07-04 by the reprex package (v2.0.0)
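
The page counts in the two runs above follow from simple arithmetic, sketched below. `pages_needed` is a hypothetical helper, and it assumes every page comes back full, which held in these two runs but is not guaranteed in general.

```r
# Hypothetical helper: number of requests needed to collect n tweets when
# every page returns exactly page_n tweets (an assumption, not a guarantee).
pages_needed <- function(n, page_n) {
  ceiling(n / page_n)
}

pages_needed(1000, 500)  # 2 pages without context annotations
pages_needed(1000, 100)  # 10 pages with context annotations
```

Five times as many requests is the main reason the second run takes longer.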

@jmwright432

Thanks @chainsawriot adding the n=Inf worked. I’m getting closer to 250k tweets now which was what I was getting a few days ago. Clearly the syntax has changed in the code. Much appreciated!

@natesheehan

natesheehan commented Jul 4, 2021

@chainsawriot hey quiet Beatle - great answer and thanks for this tip!

You needa tune the n.

Got that n tuned finely now! Many thanks @justinchuntingho for the speedy fix!

@shmuhammadd
Author

Many thanks guys for fixing this. @justinchuntingho @chainsawriot you guys are amazing.

@helennguyen1312

Hi @chainsawriot, I still have a problem with the status code: 400. Below is my code. Can you please tell me what I did wrong? I tried to add page_n=500 but it did not work. page_n = 100 worked, but I noticed that it took longer than a few days ago when the update had not happened yet.
tweets <-
  get_all_tweets("paris accord",
                 "2018-07-01T00:00:00Z",
                 "2021-07-04T00:00:00Z",
                 BEARER_TOKEN,
                 lang = "en")
(this one did not work)
I am a newbie so I am sorry if my question is not good.

@shmuhammadd
Author

shmuhammadd commented Jul 5, 2021

Hi @chainsawriot, I still have a problem with the status code: 400. Below is my code. Can you please tell me what I did wrong? I tried to add page_n=500 but it did not work. page_n = 100 worked, but I noticed that it took longer than a few days ago when the update had not happened yet.
tweets <-
get_all_tweets("paris accord",
"2018-07-01T00:00:00Z",
"2021-07-04T00:00:00Z",
BEARER_TOKEN,
lang = "en") (this one did not work)
I am a newbie so I am sorry if my question is not good.

Hi @helennguyen1312 ,

You need to update the package. The fix has not yet been pushed to CRAN, but you can install the dev version as shown below.

devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE)

This is what works for me.

Best,
Shamsuddeen

@helennguyen1312

@shmuhammad2004 Thank you so much! I got it now.
And many thanks to @justinchuntingho @chainsawriot for fixing the issue.

@AndreaaMarche

AndreaaMarche commented Jul 5, 2021

Hi @chainsawriot sorry to bother you. I'm still having issues in getting tweets.
Firstly I create bearer_token and query objects. The query is the following:
query <- build_query( query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)

Then, I try to get tweets with the following command:
try <- get_all_tweets( query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose= FALSE)

Such a command does not work. The error is the following
Errore in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

I tried to introduce context_annotations = TRUE, I tried to introduce n = . In both cases, the command does not work (it does not recognize context_annotations as valid argument).

The command works only with page_n = 100. Yet, I need to scrape many more tweets. How can I solve this? Any tip?

Thank you all in advance for your great work and support.

@chainsawriot
Collaborator

@AndreaaMarche Have you installed the latest Github version?

devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE) 

Can't reproduce your error.

require(academictwitteR)
#> Loading required package: academictwitteR
query <- build_query( query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, 
                      remove_promoted = TRUE)

try <- get_all_tweets( query = query, file = NULL, data_path = NULL, 
                       bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", 
                       end_tweets = "2021-07-04T23:59:59Z", verbose= FALSE, n = 2000)
nrow(try)
#> [1] 2000

Created on 2021-07-05 by the reprex package (v2.0.0)

@AndreaaMarche

@chainsawriot It works if I do not specify bearer_token in the command. I do not understand why, but this was the issue in my case, and I had to use set_bearer: maybe it can be helpful for other users.

If possible, I would like to know the maximum n = I can specify. Thank you very much for your help!

@chainsawriot
Collaborator

@AndreaaMarche Study count_all_tweets

@kobihackenburg

Hi @chainsawriot sorry to bother you. I'm still having issues in getting tweets.
Firstly I create bearer_token and query objects. The query is the following:
query <- build_query( query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)

Then, I try to get tweets with the following command:
try <- get_all_tweets( query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose= FALSE)

Such a command does not work. The error is the following
Errore in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

I tried to introduce context_annotations = TRUE, I tried to introduce n = . In both cases, the command does not work (it does not recognize context_annotations as valid argument).

The command works only with page_n = 100. Yet, I need to scrape many more tweets. How can I solve this? Any tip?

Thank you all in advance for your great work and support.

Hi @chainsawriot! I'm having the same exact issue @AndreaaMarche had, but her solution is not working for me, as I never specified bearer token in the command to begin with. My query is as follows:

hillary_tweets <- get_all_tweets(users = c("HillaryClinton"), start_tweets = "2015-04-12T00:00:00Z", end_tweets = "2016-06-06T00:00:00Z", bind_tweets = TRUE, page_n = 500, n = Inf)

This gives me the 400 error:

Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

I've installed the latest dev version of the package, but like @AndreaaMarche I can't introduce context_annotations = FALSE or n = without getting errors. I can only get it to work with page_n = 100, which quickly exceeds the rate limit. Any suggestions?

Thanks so much!

@chainsawriot
Collaborator

@kobihackenburg can't reproduce

require(academictwitteR)
#> Loading required package: academictwitteR
hillary_tweets <- get_all_tweets(users = c("HillaryClinton"), start_tweets = "2015-04-12T00:00:00Z", end_tweets = "2016-06-06T00:00:00Z", bind_tweets = TRUE, page_n = 500, n = Inf)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query:   (from:HillaryClinton) 
#> Total pages queried: 1 (tweets captured this page: 496).
#> Total pages queried: 2 (tweets captured this page: 500).
#> Total pages queried: 3 (tweets captured this page: 499).
#> Total pages queried: 4 (tweets captured this page: 496).
#> Total pages queried: 5 (tweets captured this page: 486).
#> Total pages queried: 6 (tweets captured this page: 494).
#> Total pages queried: 7 (tweets captured this page: 494).
#> Total pages queried: 8 (tweets captured this page: 500).
#> Total pages queried: 9 (tweets captured this page: 491).
#> Total pages queried: 10 (tweets captured this page: 498).
#> Total pages queried: 11 (tweets captured this page: 497).
#> Total pages queried: 12 (tweets captured this page: 430).
#> This is the last page for  (from:HillaryClinton) : finishing collection.

Created on 2021-07-06 by the reprex package (v2.0.0)

I am using 0.2.1 a.k.a. the current Github version.

@justinchuntingho
Collaborator

Hi @chainsawriot sorry to bother you. I'm still having issues in getting tweets.
Firstly I create bearer_token and query objects. The query is the following:
query <- build_query( query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)

Then, I try to get tweets with the following command:
try <- get_all_tweets( query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose= FALSE)

Such a command does not work. The error is the following
Errore in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

I tried to introduce context_annotations = TRUE, I tried to introduce n = . In both cases, the command does not work (it does not recognize context_annotations as valid argument).

The command works only with page_n = 100. Yet, I need to scrape many more tweets. How can I solve this? Any tip?

Thank you all in advance for your great work and support.

If you are not supplying the arguments in the order they were defined, you need to name them: e.g. state explicitly bearer_token = bearer_token (recommended). Otherwise, put your arguments in the order they were defined: get_all_tweets(query, start_tweets, end_tweets, bearer_token, ...).
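
A small sketch of how R matches an unnamed argument may help here. `demo_call` and its formals are hypothetical and are NOT the real get_all_tweets() signature; the point is only that an unnamed value is assigned to the first formal not already filled by a named argument.

```r
# Hypothetical stand-in, NOT the real get_all_tweets() signature.
demo_call <- function(query, users = NULL, start_tweets, end_tweets,
                      bearer_token = "token-from-environment") {
  list(users = users, bearer_token = bearer_token)
}

# The unnamed "my-token" is matched to `users`, the first formal that no
# named argument has filled, so the default bearer token is used instead.
positional <- demo_call(query = "blabla", "my-token",
                        start_tweets = "2021-06-11T00:00:00Z",
                        end_tweets = "2021-07-04T23:59:59Z")

# Naming the argument sends the value where it was intended to go.
named <- demo_call(query = "blabla", bearer_token = "my-token",
                   start_tweets = "2021-06-11T00:00:00Z",
                   end_tweets = "2021-07-04T23:59:59Z")

positional$bearer_token  # "token-from-environment"
named$bearer_token       # "my-token"
```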

@cjbarrie
Owner

cjbarrie commented Jul 7, 2021

Patch v0.2.1 now on CRAN: ref. commit 49d0c7e
