ENH: Added method to pandas.data.Options to download all option data for... #5602

Merged
merged 1 commit into from
Jun 17, 2014
4 changes: 4 additions & 0 deletions doc/source/release.rst
@@ -55,6 +55,10 @@ performance improvements along with a large number of bug fixes.

Highlights include:

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
- ``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (:issue:`5602`)

Member

This should be moved to v0.14.1.txt (or removed if it is already there)

Contributor Author

Yes, it's in v0.14.1.txt; I will remove it here.

See the :ref:`v0.14.1 Whatsnew <whatsnew_0141>` overview or the issue tracker on GitHub for an extensive list
of all API changes, enhancements and bugs that have been fixed in 0.14.1.

37 changes: 37 additions & 0 deletions doc/source/remote_data.rst
@@ -52,6 +52,43 @@ Yahoo! Finance
f=web.DataReader("F", 'yahoo', start, end)
f.ix['2010-01-04']

.. _remote_data.yahoo_Options:

Yahoo! Finance Options
----------------------
***Experimental***

The Options class allows the download of options data from Yahoo! Finance.

The ''get_all_data'' method downloads and caches option data for all expiry months
Member

Can you use double backticks (``) instead of ''? Then it renders as code.

and provides a formatted ''DataFrame'' with a hierarchical index, so its easy to get
to the specific option you want.

.. ipython:: python

from pandas.io.data import Options
aapl = Options('aapl', 'yahoo')
data = aapl.get_all_data()
Contributor

This fails here. In general, conversions need to be protected with a try/except. You probably need to wrap all of the float conversions with a ',' replacement or, better yet, don't convert them individually and let them be object dtype.
Then, on columns that should be numeric (to avoid accidentally changing other stuff), do df[column].replace(',',''). This kind of check needs to be in a test as well.

ipdb> l
    523 
    524 def _unpack(row, kind):
    525     def _parse_row_values(val):
    526         ret = val.text_content()
    527         if 'neg_arrow' in val.xpath('.//@class'):
--> 528             ret = float(ret)*(-1.0)
    529         return ret
    530 
    531     els = row.xpath('.//%s' % kind)
    532     return [_parse_row_values(val) for val in els]
    533 

ipdb> p ret
'2,240.10'
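
As a rough illustration of the column-wise cleanup suggested above, here is a minimal sketch. The frame, its values, and the `numeric_columns` list are made up for the example and are not taken from the PR. Note that `Series.replace` matches whole values by default, so the sketch uses `.str.replace` to strip the commas inside each string and then `pd.to_numeric` with `errors="coerce"` to turn anything unparseable into NaN.

```python
import pandas as pd

# Hypothetical scraped frame: every value arrives as text (object dtype).
raw = pd.DataFrame({
    "Last": ["2,240.10", "258.17", "N/A"],
    "Vol": ["1,004", "4", "5"],
    "Symbol": ["AAPL,X", "AAPL160115C00330000", "AAPL7160115C00330000"],
})

# Assumed list of columns that should end up numeric.
numeric_columns = ["Last", "Vol"]

clean = raw.copy()
for column in numeric_columns:
    # Strip thousands separators only in the numeric columns, then convert;
    # anything that still fails to parse becomes NaN.
    stripped = clean[column].str.replace(",", "", regex=False)
    clean[column] = pd.to_numeric(stripped, errors="coerce")
```

Restricting the replacement to the listed columns leaves text columns such as `Symbol` untouched, which is the concern raised in the comment.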

Contributor Author

Yes, I had this issue in my code over the weekend. I did the replace; I'll push the update and add a test tonight.

Contributor Author

What do you suggest I do on ValueError here? Raise, or return the string with an appended '-'?

Contributor

Well, you can try to replace the commas and then convert; on failure I would make it np.nan. If some values are string-like and some are not, then you are forced to leave the column as object. However, before you go down that road, see WHY it's not converting: is bogus data coming in, or are you misinterpreting the field? In either case the value should be made missing.
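
One possible shape for the per-cell conversion described above, as a sketch only: `parse_cell` and its `negative` flag are hypothetical names, not the code that ended up in the PR. The `negative` flag stands in for the `'neg_arrow'` class check in `_parse_row_values`.

```python
import numpy as np

def parse_cell(text, negative=False):
    """Hypothetical helper: convert one scraped cell to float, or NaN on failure."""
    try:
        # Drop thousands separators before converting, e.g. '2,240.10' -> 2240.10.
        value = float(text.replace(",", ""))
    except (AttributeError, ValueError):
        # Non-string input or unparseable text is treated as missing.
        return np.nan
    return -value if negative else value
```

For example, `parse_cell('2,240.10', negative=True)` gives `-2240.1`, while an unparseable cell such as `'N/A'` comes back as NaN instead of raising.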

data.head()

#Show the $600 strike puts at all expiry dates:
data.loc[(600, slice(None), 'put'),:].head()

#Show the volume traded of $600 strike puts at all expiry dates:
data.loc[(600, slice(None), 'put'),'Vol'].head()

If you don't want to download all the data, more specific requests can be made.

.. ipython:: python

import datetime
expiry = datetime.date(2016, 1, 1)
data = aapl.get_call_data(expiry=expiry)
data.head()

Note that if you call ''get_all_data'' first, this second call will happen much faster, as the data is cached.


.. _remote_data.google:
Contributor

This works, but I think you need an example of how to slice this: unless the Symbol is included in the index, you can't slice it.

This works

In [48]: data.set_index(['Symbol'],append=True).loc[(330,slice(None),'call'),:]
Out[48]: 
                                               Last  Chg  Bid  Ask  Vol  Open Int   Root IsNonstandard Underlying  Underlying_Price          Quote_Time
Strike Expiry     Type Symbol                                                                                                                          
330    2016-01-15 call AAPL160115C00330000   258.17    0  NaN  NaN    4        43   AAPL         False       AAPL            585.54 2014-05-09 04:00:00
                       AAPL7160115C00330000  270.00    0  NaN  NaN    5        21  AAPL7          True       AAPL            585.54 2014-05-09 04:00:00

[2 rows x 11 columns]

but simply slicing will not (though using .xs on a specific level will work as well)

Contributor Author

That code doesn't work for me, I get: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (0)'

What about data.loc[(330,slice(None), 'call')]?

Contributor

You need to do a df.sortlevel() on the created frame; it must always be sorted to do any real indexing. Furthermore, I think the index should be ['Strike','Expiry','Type','Symbol'], as it's completely unique and much more useful. Show a slicing example as well.
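
A small self-contained sketch of the sorted four-level index and the slicing discussed in this thread. The rows are invented for illustration and are not real quotes. It assumes a recent pandas, where `sort_index()` is the current spelling of the `sortlevel()` call mentioned above.

```python
import pandas as pd

# Made-up rows for illustration only.
rows = [
    (600.0, "2016-01-15", "put", "AAPL160115P00600000", 10),
    (330.0, "2016-01-15", "call", "AAPL160115C00330000", 4),
    (330.0, "2016-01-15", "call", "AAPL7160115C00330000", 5),
    (330.0, "2015-01-17", "put", "AAPL150117P00330000", 7),
]
frame = pd.DataFrame(rows, columns=["Strike", "Expiry", "Type", "Symbol", "Vol"])
frame["Expiry"] = pd.to_datetime(frame["Expiry"])

# Four-level index as suggested in the review, sorted so that slicing works.
data = frame.set_index(["Strike", "Expiry", "Type", "Symbol"]).sort_index()

# All $330 calls at every expiry and for every symbol.
calls_330 = data.loc[(330.0, slice(None), "call", slice(None)), :]

# Selecting on a single level with .xs also works; keep all levels in the result.
puts = data.xs("put", level="Type", drop_level=False)
```

Without the `sort_index()` call, the `.loc` slice can raise the lexsort error quoted earlier in the thread.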


Google Finance
18 changes: 17 additions & 1 deletion doc/source/v0.14.1.txt
@@ -148,7 +148,23 @@ Performance
Experimental
~~~~~~~~~~~~

There are no experimental changes in 0.14.1
``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (PR `#5602`)
See :ref:`the docs<remote_data.yahoo_Options>` ***Experimental***

.. ipython:: python

from pandas.io.data import Options
aapl = Options('aapl', 'yahoo')
data = aapl.get_all_data()
data.head()

.. ipython:: python

from pandas.io.data import Options
aapl = Options('aapl', 'yahoo')
data = aapl.get_all_data()
data.head()


.. _whatsnew_0141.bug_fixes:
