
Issues with the data reader fetching yahoo finance #315

Closed
Crowbeezy opened this issue May 9, 2017 · 66 comments · Fixed by #355

@Crowbeezy commented May 9, 2017

Apologies, this is my first issue/comment on GitHub. I will review proper protocol; please correct me if this is not the correct place to post this.


RemoteDataError Traceback (most recent call last)
in ()
4 end = dt.datetime(2017, 5, 8)
5
----> 6 INPX = data.DataReader(INPX ,'yahoo', start, end)
7
8 #Convert Volume from Int to Float

C:\Users\randomname\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas_datareader\data.py in DataReader(name, data_source, start, end, retry_count, pause, session)
92 adjust_price=False, chunksize=25,
93 retry_count=retry_count, pause=pause,
---> 94 session=session).read()
95
96 elif data_source == "yahoo-actions":

C:\Users\randomname\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py in read(self)
75 def read(self):
76 """ read one data from specified URL """
---> 77 df = super(YahooDailyReader, self).read()
78 if self.ret_index:
79 df['Ret_Index'] = _calc_return_index(df['Adj Close'])

C:\Users\randomname\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas_datareader\base.py in read(self)
176 df = self._dl_mult_symbols(self.symbols.index)
177 else:
--> 178 df = self._dl_mult_symbols(self.symbols)
179 return df
180

C:\Users\randomname\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas_datareader\base.py in _dl_mult_symbols(self, symbols)
195 if len(passed) == 0:
196 msg = "No data fetched using {0!r}"
--> 197 raise RemoteDataError(msg.format(self.__class__.__name__))
198 try:
199 if len(stocks) > 0 and len(failed) > 0 and len(passed) > 0:

RemoteDataError: No data fetched using 'YahooDailyReader'

@rgkimball (Contributor)

Can you provide a sample that replicates your issue? This works for me:

from datetime import datetime
import pandas_datareader.data as data

start = datetime(2016, 12, 31)
end = datetime.now()
INPX = data.DataReader('INPX', 'yahoo', start, end)

@Crowbeezy (Author) commented May 15, 2017 via email

@benpillet

From my requirements.txt:

pandas-datareader==0.4.0
pandas==0.20.1

and in python shell:

from datetime import *
import pandas_datareader.data as data
start = datetime(2016, 12, 31)
end = datetime.now()
INPX = data.DataReader('INPX', 'yahoo', start, end)

with error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./venv3.5/lib/python3.5/site-packages/pandas_datareader/data.py", line 117, in DataReader
    session=session).read()
  File "./venv3.5/lib/python3.5/site-packages/pandas_datareader/yahoo/daily.py", line 77, in read
    df = super(YahooDailyReader, self).read()
  File "./venv3.5/lib/python3.5/site-packages/pandas_datareader/base.py", line 157, in read
    params=self._get_params(self.symbols))
  File "./venv3.5/lib/python3.5/site-packages/pandas_datareader/base.py", line 74, in _read_one_data
    out = self._read_url_as_StringIO(url, params=params)
  File "./venv3.5/lib/python3.5/site-packages/pandas_datareader/base.py", line 85, in _read_url_as_StringIO
    response = self._get_response(url, params=params)
  File "./venv3.5/lib/python3.5/site-packages/pandas_datareader/base.py", line 120, in _get_response
    raise RemoteDataError('Unable to read URL: {0}'.format(url))
pandas_datareader._utils.RemoteDataError: Unable to read URL: http://ichart.finance.yahoo.com/table.csv?f=2017&ignore=.csv&b=31&c=2016&g=d&a=11&d=4&s=INPX&e=16

I have a feeling Yahoo updated their endpoint to something else. I get a 502 when I try curl too. The download link at https://finance.yahoo.com/quote/SPY/history?p=SPY now points to https://query1.finance.yahoo.com/v7/finance/download/SPY?period1=1492372898&period2=1494964898&interval=1d&events=history&crumb=MLOX17FWABw

@benpillet

Looks like there's also a cookie that needs to be sent in order to avoid a 401 Unauthorized. https://www.elitetrader.com/et/threads/yahoo-historical-data-did-they-change-the-url-recently.309554/

rgkimball added a commit to rgkimball/pandas-datareader that referenced this issue May 17, 2017
@rgkimball (Contributor)

Not sure if the icharts failure is a temporary problem, but I submitted a WIP PR (above) to replace the request structure. Even if icharts does come back online, it may be a good idea to implement a backup.

@IvanTrendafilov commented May 17, 2017

I don't have the time to fix this in the library, but, essentially, there is another API endpoint that can be used: https://query1.finance.yahoo.com. It requires a matching cookie and crumb, though. I wrote a little PhantomJS script to get them while I was working on this: https://github.com/IvanTrendafilov/YahooFinanceAPITokens

It may be useful to anyone who needs to build a URL to query automatically.

You can also get a valid cookie/crumb combination from the Chrome dev tools in the Network tab.
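For illustration, here is a rough Python sketch of that cookie/crumb handshake using requests (this is not part of pandas-datareader; the CrumbStore regex is an assumption about how the quote page embedded the token at the time):

import re
import requests

symbol = 'SPY'  # placeholder ticker
session = requests.Session()

# Loading the quote page once lets the session pick up Yahoo's cookies.
page = session.get('https://finance.yahoo.com/quote/%s/history' % symbol)

# The crumb appeared in the page source as "CrumbStore":{"crumb":"..."}.
match = re.search(r'"CrumbStore":\{"crumb":"(.*?)"\}', page.text)
crumb = match.group(1) if match else None

# Reuse the cookie-carrying session plus the crumb against the new endpoint.
url = 'https://query1.finance.yahoo.com/v7/finance/download/%s' % symbol
params = {
    'period1': 1492372898,  # Unix timestamps, as in the URLs quoted above
    'period2': 1494964898,
    'interval': '1d',
    'events': 'history',
    'crumb': crumb,
}
resp = session.get(url, params=params)
print(resp.status_code)
print(resp.text[:200])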

@IvanTrendafilov

They've confirmed icharts isn't coming back, so @rgkimball's patch should certainly go in.

https://forums.yahoo.net/t5/Yahoo-Finance-help/Is-Yahoo-Finance-API-broken/td-p/250503/page/3

@bkcollection

@rgkimball @IvanTrendafilov, can the fix be released as a pip upgrade, for ease of use for beginners?

@Franlodo

Yahoo has changed the URL and the way dates are used. Dates are now in Unix time.
For example, to get the historical CSV for AAPL:
https://query1.finance.yahoo.com/v7/finance/download/AAPL?period1=1492510098&period2=1495102098&interval=1d&events=history&crumb=ydacXMYhzrn

period1 and period2 are dates in Unix time (unixtime = (Human time - 25568) * 86400), but you must check your timezone. For example, I am in Europe on UTC+2, so I have to subtract 7200 seconds. My formula is ((Human time - 25568) * 86400) - 7200, where Human time is the date (d/mm/yyyy), 25568 is the number of days from 01/01/1900 to 01/01/1970 (I do this in Excel, and that is its minimum date), 86400 is the number of seconds in a day, and 7200 is the number of seconds in my 2-hour difference from UTC.

interval is day, week or month.
events=history returns historical prices; div|split&filter=split returns splits, and div|split&filter=div returns dividends.
crumb is tied to the cookie. I don't really know exactly how it works, but I have been using the same one since Monday.

I'm using this to update my data in Excel and it works. Now I don't need to wait until morning to get the historical data, because it's available less than an hour after the market close (I'm talking about American markets).

I apologize for not being fluent in English.

I hope this helps.
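For reference, a minimal Python sketch of the date conversion described above; datetime.timestamp() takes care of the epoch arithmetic and the timezone offset, so the Excel-style constant is not needed (the ticker and crumb values are placeholders):

from datetime import datetime, timezone

def to_period(year, month, day):
    # Yahoo's period1/period2 are Unix timestamps (seconds since 1970-01-01 UTC).
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

period1 = to_period(2016, 12, 31)
period2 = to_period(2017, 5, 18)

url = ('https://query1.finance.yahoo.com/v7/finance/download/AAPL'
       '?period1={0}&period2={1}&interval=1d&events=history&crumb=XXXXXXXX'.format(period1, period2))
print(url)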

@bkcollection commented May 18, 2017

@Franlodo Have you tried using the new link to download something like 1000 stocks? Will it get blocked? The old API seemed to have no limit, but I am curious whether the new one still allows that. I hope you can try it to validate.

@Franlodo

The link just fetches the CSV file from the web; it should work the same whether you request 2 or 2000. My Excel file has nearly 200 and runs properly.

You could fetch 1000 CSV files and import them with pandas; it makes no difference whether you save the files or read them "in the air".
I posted here because the error reported for pandas-datareader was about the URL.

Anyway, I will try it and comment back.

@rgkimball (Contributor)

@bkcollection This will be available once the bugs are ironed out and the pull request is merged into the main repository.

@bkcollection

@rgkimball how optimistic are you that the bugs can be fixed?

rgkimball added a commit to rgkimball/pandas-datareader that referenced this issue May 19, 2017
@rgkimball (Contributor) commented May 19, 2017

@bkcollection The latest commits on #331 just about wrap it up. It's a little frustrating that the new API drops out periodically, but you can now pull any historical price range, splits, and dividends. I haven't found a new interface for Yahoo Options - this may have been permanently removed, but I'm happy to implement it if someone finds the endpoint. Same for index constituents.

All of the failing tests on my PR are due to Eurostat or Yahoo's API sporadically failing. Notice that which tests fail is inconsistent across runs - it appears to be random. Hoping some other people now find time to pull down the code and give it a thorough review before the maintainers make a decision on merging it in.

rgkimball added a commit to rgkimball/pandas-datareader that referenced this issue May 19, 2017
@bkcollection commented May 19, 2017 via email

@gusutabopb

@bkcollection:

For a temporary fix (until this PR gets merged), try:

$ git clone https://github.com/rgkimball/pandas-datareader
$ cd pandas-datareader
$ git checkout fix-yahoo
$ pip install -e .

In Python:

import pandas_datareader as pdr
print(pdr.__version__)  # Make sure it is '0.4.1'.

I originally wrote this as an answer to this Stackoverflow question
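As a quick sanity check after installing the branch (the symbol and dates below are arbitrary), something like this should return data again:

from datetime import datetime
import pandas_datareader.data as web

df = web.DataReader('AAPL', 'yahoo', datetime(2017, 1, 1), datetime.now())
print(df.head())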

@helhadry

@gusutabopb
The latest version is "0.4.0", isn't it?

@gusutabopb

@Hmz123
The latest on master/PyPI is 0.4.0. The PR proposed by @rgkimball makes it 0.4.1.
The above instructions are for those who do not want to or cannot wait for an official upgrade. See the commits on the fix-yahoo patch branch here: https://github.com/rgkimball/pandas-datareader/commits/fix-yahoo

@bkcollection

It seems like there are still some errors. Not sure when the fixed 0.4.1 release will be out. Hopefully this week.

@arose13 commented May 23, 2017

@bkcollection what do you mean by some errors?
Do you mean being blocked by yahoo if you request a handful of symbols?

@bkcollection

@arose13 Have you been blocked for requesting too many symbols? How many was it?

@liuyigh commented Jun 17, 2017

@javadba a python 2.7-compatible pull request was merged 5 days ago. FYI.

@aisthesis

rgkimball's version 0.4.1 is still working for me. Is there a reason why it isn't being merged?

@jreback (Contributor) commented Jun 28, 2017

it needs to pass tests and respond to comments

@rgkimball (Contributor) commented Jun 28, 2017 via email

@jainraje

@rgkimball if you can provide me instructions on how to install your version, I will run some tests and provide feedback and results. I'm currently using the fix-yahoo_finance package developed by ranaroussi.

jreback pushed a commit that referenced this issue Jul 2, 2017
…stock still failing (#315)

Restores change necessary for Google to function
Fixes yahoo-actions per API endpoint update
Update regex pattern for crumbs, per heyuhere's review
'v' is no longer a valid interval value
Fixes Yahoo intervals and cases where the Yahoo cookie could not be extracted.
Implements multi-stock queries to Yahoo API
Adds a pause multiplier for subsequent requests from Yahoo, error handling for empty data requests, and updates some test logic for pandas 0.20.x (notably ix deprecation)
Check object type before checking contents
Replacement regex logic for additional Yahoo cookie token structures, per chris-b1
Improved error handling and refactoring test to best practices, per jreback review.

closes #315
jreback added this to the 0.5.0 milestone Jul 2, 2017
jreback added the bug label Jul 2, 2017
jreback added a commit that referenced this issue Jul 2, 2017
* Replaces ichart API for single-stock price exports from Yahoo, multi-stock still failing (#315)

Restores change necessary for Google to function
Fixes yahoo-actions per API endpoint update
Update regex pattern for crumbs, per heyuhere's review
'v' is no longer a valid interval value
Fixes Yahoo intervals and cases where the Yahoo cookie could not be extracted.
Implements multi-stock queries to Yahoo API
Adds a pause multiplier for subsequent requests from Yahoo, error handling for empty data requests, and updates some test logic for pandas 0.20.x (notably ix deprecation)
Check object type before checking contents
Replacement regex logic for additional Yahoo cookie token structures, per chris-b1
Improved error handling and refactoring test to best practices, per jreback review.

closes #315

* better error handling after get_response

* docs for 0.5.0

* remove deprecation warnings: ix usage -> loc/iloc
remove deprecation warnings: sortlevel usage -> sort_index

* more resource cleaning

* update changelog

* skip enigma tests locally if no api key

* fixturize test_yahoo_options

* add in test.sh script

* CI: use trusty dist
@m3nu commented Jul 15, 2017

@Harrymon12 is a paid troll who spams about "MarketXLS" all over the internet. Here are some examples of his "work":

An alias he uses, to help find these posts quickly: Harrison Delfino

pydata deleted a comment from Harrymon12 Jul 15, 2017
@Harrymon12 commented Jul 16, 2017

@m3nu Aw, oh really? I am just giving the facts about MarketXLS.
What is the problem with that?
You can check their forum to see if it really was spam.
How can you say it is spam when you have not even used MarketXLS yet?
Are you even normal, or just a poser?

@javadba commented Jul 16, 2017

@Harrymon12 This is a pandas site. If you have some tips about helping out on PANDAS and specifically this issue please feel free to do so. Otherwise your posts ARE spam.

@Harrymon12

@javadba I understand. Thank you. :)

@alisiddiq

Another small package I wrote to overcome the 401 issues

https://github.com/alisiddiq/py_yahoo_prices

@JECSand commented Aug 14, 2018

@javadba
@justinlent
@rgkimball

My replacement for the old yahoo-finance module, YahooFinancials, can get all of the historical price data pandas users need. YF can return daily, weekly, and monthly historical price and volume JSON data for all stocks, ETFs, indices, cryptocurrencies, currencies, and commodity futures available on Yahoo Finance. Most Stackoverflow questions I have encountered regarding the module seem to revolve around getting it to work with pandas. As long as Yahoo Finance keeps running on its new React setup (the reason they killed the old API late last year is that they moved to a new web app), my module will get the financial data.

Usage Example:

from yahoofinancials import YahooFinancials

yahoo_financials = YahooFinancials('WFC')
print(yahoo_financials.get_historical_price_data("2018-07-10", "2018-08-10", "monthly"))

Returns

{
    "WFC": {
        "currency": "USD",
        "eventsData": {
            "dividends": {
                "2018-08-01": {
                    "amount": 0.43,
                    "date": 1533821400,
                    "formatted_date": "2018-08-09"
                }
            }
        },
        "firstTradeDate": {
            "date": 76233600,
            "formatted_date": "1972-06-01"
        },
        "instrumentType": "EQUITY",
        "prices": [
            {
                "adjclose": 57.19147872924805,
                "close": 57.61000061035156,
                "date": 1533096000,
                "formatted_date": "2018-08-01",
                "high": 59.5,
                "low": 57.08000183105469,
                "open": 57.959999084472656,
                "volume": 138922900
            }
        ],
        "timeZone": {
            "gmtOffset": -14400
        }
    }
}
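For anyone who wants to go straight to a DataFrame, here is a short sketch of loading the "prices" list above into pandas (the column selection is just illustrative):

import pandas as pd
from yahoofinancials import YahooFinancials

raw = YahooFinancials('WFC').get_historical_price_data('2018-07-10', '2018-08-10', 'monthly')
prices = pd.DataFrame(raw['WFC']['prices'])
prices['formatted_date'] = pd.to_datetime(prices['formatted_date'])
prices = prices.set_index('formatted_date')[['open', 'high', 'low', 'close', 'adjclose', 'volume']]
print(prices)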

Anyway, I'd be happy to fork a branch and build the price data from YahooFinancials into pandas-datareader's get_data_yahoo() method if you all want. I'd also be happy to work with one of your contributors to do so. Just let me know and I'd be happy to help!

More details at:
https://github.com/JECSand/yahoofinancials

@magicmathmandarin

Hi, Yahoo Finance is working for me. I am confused as to why you are all saying it is deprecated.

@GrechTsangWL commented Apr 5, 2019 via email

@bashtage (Contributor) commented Apr 5, 2019 via email

@magicmathmandarin commented Apr 5, 2019 via email

@abdoulayegk

Can anyone help me fix this error?

NotImplementedError: data_source=datetime.datetime(2015, 1, 1, 0, 0) is not implemented
Traceback:
File "/home/balde/.local/lib/python3.8/site-packages/streamlit/ScriptRunner.py", line 322, in _run_script
    exec(code, module.__dict__)
File "/home/balde/Desktop/Projects/StockProject/stock_app.py", line 14, in <module>
    globals()[stock] = DataReader(stock, start, end)
File "/home/balde/.local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
File "/home/balde/.local/lib/python3.8/site-packages/pandas_datareader/data.py", line 376, in DataReader
    raise NotImplementedError(msg)
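For reference, the traceback shows DataReader being called without a data_source argument, so the start date is being interpreted as the source. A minimal corrected call, assuming Yahoo was the intended source, would look like this:

import datetime as dt
from pandas_datareader.data import DataReader

start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
# data_source must be passed explicitly, otherwise start is misread as the source
df = DataReader("AAPL", "yahoo", start, end)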

@krb1971 commented Sep 5, 2020

I recently started learning Python for finance. From this thread I understand, at some level, the yahoo_fin package issues, but when I run the following code, a "'ticker' not found" error persists. Could you please check it and guide me further?

Here are the libraries used and the code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from yahoo_fin import stock_info as si
from collections import deque

import numpy as np
import pandas as pd
import random

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import matplotlib.dates as mdates
import numpy as np

def load_data(ticker, n_steps=50, scale=True, shuffle=True, lookup_step=1,
              test_size=0.2, feature_columns=['adjclose', 'volume', 'open', 'high', 'low']):
    # see if ticker is already a loaded stock from yahoo finance
    if isinstance(ticker, str):
        # load it from yahoo_fin library
        df = si.get_data(ticker)
    elif isinstance(ticker, pd.DataFrame):
        # already loaded, use it directly
        df = ticker
    # this will contain all the elements we want to return from this function
    result = {}
    # we will also return the original dataframe itself
    result['df'] = df.copy()
    # make sure that the passed feature_columns exist in the dataframe
    for col in feature_columns:
        assert col in df.columns, f"'{col}' does not exist in the dataframe."
    if scale:
        column_scaler = {}
        # scale the data (prices) from 0 to 1
        for column in feature_columns:
            scaler = preprocessing.MinMaxScaler()
            df[column] = scaler.fit_transform(np.expand_dims(df[column].values, axis=1))
            column_scaler[column] = scaler

        # add the MinMaxScaler instances to the result returned
        result["column_scaler"] = column_scaler
    # add the target column (label) by shifting by `lookup_step`
    df['future'] = df['adjclose'].shift(-lookup_step)
    # last `lookup_step` columns contains NaN in future column
    # get them before dropping NaNs
    last_sequence = np.array(df[feature_columns].tail(lookup_step))
    # drop NaNs
    df.dropna(inplace=True)
    sequence_data = []
    sequences = deque(maxlen=n_steps)
    for entry, target in zip(df[feature_columns].values, df['future'].values):
        sequences.append(entry)
        if len(sequences) == n_steps:
            sequence_data.append([np.array(sequences), target])
    # get the last sequence by appending the last `n_step` sequence with `lookup_step` sequence
    # for instance, if n_steps=50 and lookup_step=10, last_sequence should be of 59 (that is 50+10-1) length
    # this last_sequence will be used to predict in future dates that are not available in the dataset
    last_sequence = list(sequences) + list(last_sequence)
    # shift the last sequence by -1
    last_sequence = np.array(pd.DataFrame(last_sequence).shift(-1).dropna())
    # add to result
    result['last_sequence'] = last_sequence
    # construct the X's and y's
    X, y = [], []
    for seq, target in sequence_data:
        X.append(seq)
        y.append(target)
    # convert to numpy arrays
    X = np.array(X)
    y = np.array(y)
    # reshape X to fit the neural network
    X = X.reshape((X.shape[0], X.shape[2], X.shape[1]))
    # split the dataset
    result["X_train"], result["X_test"], result["y_train"], result["y_test"] = train_test_split(X, y, test_size=test_size, shuffle=shuffle)
    # return the result
    return result

# load the data
data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)

THIS GIVES FOLLOWING ERROR:

NameError Traceback (most recent call last)
in ()
1 # load the data
----> 2 data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)
3
4 # save the dataframe
5 data["df"].to_csv(ticker_data_filename)

NameError: name 'ticker' is not defined

@rgkimball (Contributor)

> # load the data
>
> data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)
>
> THIS GIVES FOLLOWING ERROR:
>
> NameError Traceback (most recent call last)
> in ()
> 1 # load the data
> ----> 2 data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)
> 3
> 4 # save the dataframe
> 5 data["df"].to_csv(ticker_data_filename)
>
> NameError: name 'ticker' is not defined

@krb1971,

The error you've shared doesn't have anything to do with the pandas-datareader package. You need to define ticker before passing it into the function.
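A minimal sketch of what that looks like (the constant values here are placeholders that mirror the names in your snippet):

# define the inputs before calling load_data; the values are illustrative
ticker = "AAPL"
N_STEPS = 50
LOOKUP_STEP = 1
TEST_SIZE = 0.2
FEATURE_COLUMNS = ['adjclose', 'volume', 'open', 'high', 'low']

data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)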

@krb1971 commented Sep 6, 2020

Thank you for responding. When I defined load_data as follows, it did not ask me to define 'ticker':

def load_data(ticker, n_steps=50, scale=True, shuffle=True, lookup_step=1,
              test_size=0.2, feature_columns=['adjclose', 'volume', 'open', 'high', 'low']):
    ....................

But when I use ticker below, it complains that "ticker" is not defined. As I am a novice at this, kindly guide me.

data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)

@Franlodo commented Sep 6, 2020 via email
