ENH: Added method to pandas.data.Options to download all option data for... #5602

Merged: 1 commit merged into pandas-dev:master on Jun 17, 2014

davidastephens
Contributor

... a ticker.

Also added a few helper functions. These functions could be applied to refactor some of the other methods.

@@ -871,3 +871,112 @@ def get_forward_data(self, months, call=True, put=False, near=False,
        if len(ret) != 2:
            raise AssertionError("should be len 2")
        return ret

    def get_all_data(self, call=True, put=False):
Contributor

Why does call default to True and put to False? That is not intuitive to me.

Contributor Author

Me neither - I matched it with the other methods (get_forward_data and get_near_stock_price).

Contributor Author

Also note that the other methods say they return a dict, but actually return either a DataFrame (if just one of call or put is chosen) or a list of two DataFrames with no identifier. I've set up this method to return the dict, but it should probably be consistent either way.

@davidastephens
Contributor Author

I fixed the issues you pointed out that weren't related to consistency with the rest of the module.

Any thoughts on the consistency issues? I'm happy to refactor the other methods if you think that would be helpful.

@jasonstrimpel

Hi, I don't know how to contribute changes, but I noticed that the Chg column only has positive values. This is because Yahoo only provides a class to display an up or down arrow to show the direction of the change. I modified the _unpack method to look inside the elements, find the appropriate class name, and give Chg the appropriate sign.

def _unpack(row, kind):
    els = row.xpath('.//%s' % kind)
    res = []
    for i in els:
        img = i.xpath('.//img')
        if len(img)>0:
            cls=img[0].xpath('.//@class')[0]
            if cls == 'neg_arrow':
                res.append(float(i.text_content())*(-1.0))
            else:
                res.append(i.text_content())
        else:
            res.append(i.text_content())
    return res

@davidastephens
Contributor Author

Nice catch strimp. How about this?

def _unpack(row, kind):
    els = row.xpath('.//%s' % kind)
    return [_parse_row_values(val) for val in els]


def _parse_row_values(val):
    ret = val.text_content()
    img = val.xpath('.//img')
    if img:
        if img[0].xpath('.//@class')[0] == 'neg_arrow':
            ret = float(ret)*(-1.0)
    return ret

@davidastephens
Contributor Author

Or this:

def _unpack(row, kind):
    els = row.xpath('.//%s' % kind)
    return [_parse_row_values(val) for val in els]


def _parse_row_values(val):
    ret = val.text_content()
    if 'neg_arrow' in val.xpath('.//@class'):
        ret = float(ret)*(-1.0)
    return ret

@jasonstrimpel

That works. Same results as my suggestion but much more compact and likely faster (avoids appending to a list). Thanks.
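
For reference, the sign handling discussed above can be sketched without lxml. This is only a minimal illustration and not the code that was merged: it assumes the standard library's xml.etree.ElementTree as a stand-in for lxml, the helper name `parse_change_cell` is hypothetical, and the sample row (including the `pos_arrow` class) is made up. Unlike the snippets above, it always returns a float for the change cell rather than a mix of floats and strings.

```python
import xml.etree.ElementTree as ET


def parse_change_cell(cell):
    """Return the numeric change in a cell, negated if a 'neg_arrow' image is present."""
    # itertext() collects the cell's own text plus the tail text after the <img>.
    value = float("".join(cell.itertext()).strip())
    classes = [img.get("class") for img in cell.iter("img")]
    if "neg_arrow" in classes:
        value = -value
    return value


# Hypothetical markup: the page shows only a magnitude plus an arrow image.
row = ET.fromstring(
    "<tr>"
    '<td><img class="neg_arrow"/>0.45</td>'
    '<td><img class="pos_arrow"/>1.20</td>'
    "</tr>"
)
changes = [parse_change_cell(td) for td in row.findall("td")]
```

Returning one type for the whole column avoids a later conversion step, at the cost of only being usable on cells known to be numeric.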

@jasonstrimpel

I ran into another issue with lxml. In the _get_option_data and _get_expiry_months methods, I believe your original code calls lxml.html.parse directly on the url variable. I was having issues with that (potentially a proxy problem, although my HTML_PROXY path variable is set).

I worked around it with urllib in both methods:

    url = 'http://finance.yahoo.com/q/op?s={sym}'.format(sym=self.symbol)

    try:
        from lxml.html import parse
    except ImportError:
        raise ImportError("Please install lxml if you want to use the "
                          "{0!r} class".format(self.__class__.__name__))
    try:
        # i am failing the try when attempting to call parse(url)
        page = urllib.urlopen(url) # added this line
        doc = parse(page) # changed to pass page not url
    except _network_error_classes:
        raise RemoteDataError("Unable to parse expiry months from URL "
                              "{0!r}".format(url))

@davidastephens
Contributor Author

Hmm. I don't know what the problem is there, I didn't write that code, works fine for me. I have a couple fixes for the _get_expiry_months. I'll add them.

@jasonstrimpel

Probably some proxy issue then.

On Dec 11, 2013, at 15:36, dstephens99 wrote:

Hmm. I don't know what the problem is there, I didn't write that code,
works fine for me. I have a couple fixes for the _get_expiry_months. I'll
add them.



@mcneary70

New to git, new to python, new to pandas. For that reason I'm only commenting here and if it is in the wrong place I apologize.

There is an issue in data.py where if you request option data for CUR_MONTH and CUR_YEAR+1 you will only get CUR_MONTH/CUR_YEAR options. This is due to an unnecessary check in _get_option_data() to see if the passed in month is CUR_MONTH and using the URL without additional date parameters if it is. If you just simply pass the month and year parameters in the URL every time this problem goes away.
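
The failure mode described here can be illustrated with a small sketch. The function names are hypothetical and this is not the code in data.py; it only contrasts a month-only comparison with one that also compares the year.

```python
from datetime import date


def is_current_month_only(month, year, today):
    # Problematic variant: the year argument is ignored, so a request for
    # January of next year looks like a request for the current month.
    return month == today.month


def is_current_expiry(month, year, today):
    # Compare year and month together.
    return (year, month) == (today.year, today.month)


today = date(2014, 1, 15)
wrong = is_current_month_only(1, 2015, today)   # treated as "current"
right = is_current_expiry(1, 2015, today)       # correctly not "current"
```

Always passing the month and year in the request, as suggested above, would sidestep the check entirely.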

@jreback
Contributor

jreback commented Jan 14, 2014

@mcneary70

Why don't you create a new issue for that, and show some code that can reproduce the problem?

thanks

@jreback
Contributor

jreback commented Jan 25, 2014

@dstephens99 if you can rebase and get this passing, and prob needs some more tests, we can consider for 0.14

@davidastephens
Contributor Author

Sure, I will do that.

@mcneary70 I fixed that issue in my last commit.

@mcneary70

@dstephens99 That is awesome, thanks!

@jreback
Contributor

jreback commented Feb 16, 2014

pls rebase this on current master

need to update the docstrings as well

I am ok with changing the API here where it makes sense,
e.g. it is best for all of these routines to always return DataFrames (possibly multi-level so you can hold puts/calls), but they should always do the same thing

@davidastephens
Contributor Author

Rebased - I'll work on this some more this week.

@jreback
Contributor

jreback commented Feb 16, 2014

gr8

as I said, propose a nice pandonic API for this

@jreback
Contributor

jreback commented Mar 9, 2014

@dstephens99 update?

@jreback jreback added the Data IO label Apr 6, 2014
@jreback
Contributor

jreback commented Apr 6, 2014

@dstephens99 this needs some API work, can you comment on that

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 6, 2014
@cancan101
Contributor

I think you can clean up some of the logic using something like:

    df["rootExp"] = df.Symbol.str[0:-9]
    df["root"] = df.rootExp.str[0:-6]
    df["exp"] = pd.to_datetime(df.rootExp.str[-6:])

You can define has_tick (which would be better if named "is_nonstandard", meaning the deliverable of the option basket is not 100 shares) to be: df["root"]==symbol
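
The slicing suggested above can be sketched in plain Python on a single symbol. This assumes the layout the slices imply: a root, a six-digit YYMMDD expiry, a one-letter call/put flag, and an eight-digit strike. The helper name `split_option_symbol` and the sample symbols are illustrative only. Note that the date format is given explicitly, since a bare six-digit string is ambiguous to a general-purpose date parser.

```python
from datetime import datetime


def split_option_symbol(symbol):
    """Split an option symbol into (root, expiry date)."""
    root_exp = symbol[:-9]   # drop the call/put flag and the 8-digit strike
    root = root_exp[:-6]     # what remains before the YYMMDD expiry
    expiry = datetime.strptime(root_exp[-6:], "%y%m%d").date()
    return root, expiry


root, expiry = split_option_symbol("AAPL140621C00600000")
mini_root, _ = split_option_symbol("AAPL7140621C00600000")
```

With the root extracted this way, comparing it against the requested ticker becomes a one-line check.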

@jreback
Contributor

jreback commented Apr 22, 2014

@dstephens99 can you show the API?

@davidastephens
Contributor Author

Sorry for the delay in responding, I've been really busy on a project at work.

What do you think about the API being a data frame if you request just calls or just puts and a dictionary of {'calls': call_df, 'puts' : put_df} if you request both? Pythonic?

@davidastephens
Contributor Author

Never mind, I just saw your prior comment about always returning a dataframe. I will do it that way.

@jreback jreback mentioned this pull request May 10, 2014
.. _remote_data.yahoo_Options:

Yahoo! Finance Options
--------------
Contributor

make this line the same length as the title (otherwise the doc generator will complain)

@jreback jreback modified the milestones: 0.14.1, 0.14.0 May 12, 2014
Bid: Bid price, float
Ask: Ask price, float
Vol: Volume traded, int64
Open Int: Open interest, int64
Contributor

I would make this 'Open_Int' (for consistency with the rest); I also don't really like spaces in column names (pls change in the docstring in all places). FYI, you might want to make one docstring and then just assign it (e.g. func.__doc__ = doc_string).
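
The shared-docstring suggestion can be sketched as follows. The function bodies are placeholders and the docstring text is invented for illustration; only the `func.__doc__ = doc_string` assignment comes from the comment above.

```python
# One docstring, written once and attached to several functions.
_OPTION_DOC = """Return option data for the requested expiry.

Columns include Bid, Ask, Vol and Open_Int.
"""


def get_call_data(expiry=None):
    raise NotImplementedError  # placeholder body for the sketch


def get_put_data(expiry=None):
    raise NotImplementedError  # placeholder body for the sketch


for _func in (get_call_data, get_put_data):
    _func.__doc__ = _OPTION_DOC
```

A column rename such as 'Open_Int' then only has to be made in one place.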

@cancan101
Contributor

This provides an alternative source of options data: http://www.cboe.com/delayedquote/QuoteTableDownload.aspx

@cancan101
Contributor

This is failing for me:

In [3]: tsla = pandas.io.data.Options("TSLA", "yahoo")

In [4]: tsla.get_all_data()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-7b3741a59d2e> in <module>()
----> 1 tsla.get_all_data()

/home/alex/git/pandas/pandas/io/data.py in get_all_data(self, call, put)
   1109                 except AttributeError:
   1110                     meth_name = 'get_{0}_data'.format(name[:-1])
-> 1111                     frame = getattr(self, meth_name)(expiry=month)
   1112 
   1113                 all_data.append(frame)

/home/alex/git/pandas/pandas/io/data.py in get_call_data(self, month, year, expiry)
    774         for the expiry of the options.
    775         """
--> 776         return self._get_option_data(month, year, expiry, 'calls')
    777 
    778     def get_put_data(self, month=None, year=None, expiry=None):

/home/alex/git/pandas/pandas/io/data.py in _get_option_data(self, month, year, expiry, name)
    702             tables = getattr(self, table_name)
    703         except AttributeError:
--> 704             tables = self._get_option_tables(month, year, expiry)
    705 
    706         ntables = len(tables)

/home/alex/git/pandas/pandas/io/data.py in _get_option_tables(self, month, year, expiry)
    685         #Get the time of the quote, note this is actually the time of the underlying price.
    686         quote_time_text = root.xpath('.//*[@class="time_rtq"]')[0].getchildren()[0].text
--> 687         split = quote_time_text.split(",")
    688         timesplit = split[1].strip().split(":")
    689         timestring = split[0] + ", " + timesplit[0].zfill(2) + ":" + timesplit[1]

AttributeError: 'NoneType' object has no attribute 'split'

@davidastephens
Contributor Author

I referenced that bug above - it's the difference in Quote_Time format between the weekend and the week. I'll fix that one tonight.
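
One defensive way to handle a missing or differently shaped quote time is sketched below. It reuses the three string operations visible in the traceback, but the function name `format_quote_time`, the choice to return None, and the sample input string are all assumptions for illustration; this is not the fix that went into the PR.

```python
def format_quote_time(quote_time_text):
    """Normalise a 'date, h:mm...' string, or return None if it cannot be parsed."""
    if quote_time_text is None:
        # The element had no text, which is what raised the AttributeError.
        return None
    split = quote_time_text.split(",")
    if len(split) < 2 or ":" not in split[1]:
        # Unexpected layout: give up rather than index out of range.
        return None
    timesplit = split[1].strip().split(":")
    return split[0] + ", " + timesplit[0].zfill(2) + ":" + timesplit[1]


formatted = format_quote_time("Jun 13, 4:00PM EDT")  # hypothetical input
missing = format_quote_time(None)
```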

@davidastephens
Contributor Author

The CBOE website has an interesting quote at the bottom re downloading quote information electronically..

@cancan101
Contributor

Good call...

@jreback
Contributor

jreback commented Jun 4, 2014

pls rebase this

@@ -55,6 +55,10 @@ performance improvements along with a large number of bug fixes.

Highlights include:

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
- ``pandas.io.data.Options`` now has a get_all_data method and consistently returns a ''MultiIndex'' ''DataFrame'' (:issue:`5602`)
Contributor

to consistently return a multi-indexed DataFrame

@jreback
Contributor

jreback commented Jun 13, 2014

can you rebase on master, and ping when ready

…for a ticker.

Also added a few helper functions.  These functions could be applied to refactor some of the other methods.

ENH: In Options.get_all_data: Now checking for any option tag (instead of just mini)

Changed expiry to datetime from string.
Added tests for tick functions.

BUG: Fixed no sign in change column of option download.

BUG: Fix bugs in Options class

Dealt with situation of calculating expiry when symbol contains a hyphen
Fixed bug in finding current expiry month.

BUG: Fixed Options.get_forward_data expiry date

Method assumed the expiry date is the same for all options in a given month.
Not the case for options with weeklies.  Also breaks with options that
have tags.

BUG: Fixed Option bug that didn't allow LEAP DL in January.

Option class was checking only the month to determine if the requested
month was the current month.  Changed to check year and month.  Now
allows downloads of next year's LEAPS in January.

ENH: Added option tag and underlying price to option data output.

Factored out URL parsing and error checking from individual methods.

ENH: Refactor of Option class in io.data.

 Consistently returns multi-index data frame.
 Improves speed of downloading combination of calls and puts by only accessing yahoo once per expiry month.

CLN: Fix out of date docstrings in io.data.Options

Moved _parse_row_values definition into _unpack.

CLN: Consistent capitalization in output data.

CLN: Remove Tag, leave Root in data frame output.

CLN: Remove unnecessary _tag_from_root method.

BUG: Fix different capitalizations of Rootexp in _process_data.

TST: Update tests for pandas.data.Options

TST: Remove test for helper function that no longer exists.

TST: Fix option test for change in output

TST: Changes io.data.Options tests to self.assertTrue

TST: Change tests to raise nose.SkipTest on remote data errors

TST: Change nose.SkipTest on RemoteDataError instead of IndexError

ENH: Added quote time to outputs of data.Options.

DOC: Added documentation for io.data.Options

DOC: Added documentation of data.Options output.

DOC: Updated docstrings on data.io.Options

DOC: Added experimental tags to io.data.Options docstrings/documentation.

BUG: Bug fixes, added tests, cleanups on documentation

TST: Fix test_data Options tests.

TST: Add test yahoo finance option pages.

DOC: Update example to show slicing.

TST: Remove test for long for python 3 compatibility.

BUG: Fix quote time scraper

TST: Changed the error raised by no tables in data.Options

Tests were failing if the scraper got the webpage but there weren't any tables in it.  Changed from IndexError to RemoteDataError so that nose would skip it on failure.
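
The idea behind this change can be sketched with the standard library alone. Everything here is a stand-in: `RemoteDataError` is redefined locally rather than imported from pandas, `first_table` is a hypothetical helper, and `unittest.SkipTest` is used in place of nose.SkipTest.

```python
import unittest


class RemoteDataError(IOError):
    """Local stand-in for the library's remote-data exception."""


def first_table(tables):
    # Raise a remote-data error instead of letting tables[0] raise IndexError.
    if not tables:
        raise RemoteDataError("No tables found in the downloaded page")
    return tables[0]


class OptionsPageTest(unittest.TestCase):
    def test_empty_page_is_skipped(self):
        try:
            first_table([])  # simulates a page that downloaded but had no tables
        except RemoteDataError as err:
            # A remote problem should skip the test, not fail it.
            raise unittest.SkipTest(str(err))
```

Because the test only catches the remote-data error, a genuine IndexError elsewhere in the parsing code would still surface as a failure.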

DOC: Moved reference to new Options method to v0.14.1.txt

DOC: Updated release at 0.14.1.txt for io.data.Options
@davidastephens
Contributor Author

@jreback Rebased and green.

jreback added a commit that referenced this pull request Jun 17, 2014
ENH: Added method to pandas.data.Options to download all option data for...
@jreback jreback merged commit 70baac7 into pandas-dev:master Jun 17, 2014
@jreback
Contributor

jreback commented Jun 17, 2014

@dstephens99 thanks!

pls look at the docs after they are built, may need to adjust (after travis finishes they will update): http://pandas-docs.github.io/pandas-docs-travis/remote_data.html

@jreback
Contributor

jreback commented Jun 17, 2014

@dstephens99 in particular, I suspect the AAPL options will now fail (as the 600 strike price is now gone after the split).

@jreback
Contributor

jreback commented Jun 17, 2014

docs are built.... @jorisvandenbossche any thoughts?

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
- ``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (:issue:`5602`)

Member

This should be moved to v0.14.1.txt (or removed if it is already there)

Contributor Author

Yes, it's in v0.14.1.txt, I will remove it here.
