ENH: Added method to pandas.data.Options to download all option data for... #5602
@@ -871,3 +871,112 @@ def get_forward_data(self, months, call=True, put=False, near=False,
        if len(ret) != 2:
            raise AssertionError("should be len 2")
        return ret

    def get_all_data(self, call=True, put=False):
Why does call default to True and put to False? That is not intuitive to me.
Me neither - I matched it with the other methods (get_forward_data and get_near_stock_price).
Also note that the other methods say they return a dict, but actually return either a dataframe (if just one of call or put is chosen) or a list of 2 dataframes with no identifier. I've set up this method to return the dict, but it should probably be consistent either way.
I fixed the issues you pointed out that weren't related to consistency with the rest of the module. Any thoughts on the consistency issues? I'm happy to refactor the other methods if you think that would be helpful. |
Hi, I don't know how to contribute changes but I noticed that the Chg column only has positive values. This is because Yahoo only provides a class to display an up or down arrow to show the direction of the change. I modified the _unpack method to look inside the elements to find the appropriate class name and give Chg the appropriate sign.
|
Nice catch strimp. How about this?
|
Or this:
|
That works. Same results as my suggestion but much more compact and likely faster (avoids appending to a list). Thanks. |
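The sign fix discussed in this thread can be sketched roughly as follows. This is not the code from the PR: it assumes the scraper has already extracted each cell's text and its CSS class string, and the class names `chg-up` and `chg-down` as well as the helper `signed_change` are hypothetical placeholders for whatever the page actually uses.

```python
def signed_change(text, css_class):
    """Return the change as a float, negated when the arrow class marks a drop.

    `css_class` is the class string found on the cell. "chg-down" is a
    hypothetical name used for illustration, not Yahoo's actual markup.
    """
    magnitude = abs(float(text))
    return -magnitude if "chg-down" in css_class.split() else magnitude


# Example rows as (cell text, class string) pairs.
rows = [("0.45", "chg-up"), ("1.20", "chg-down"), ("0.00", "")]
changes = [signed_change(text, cls) for text, cls in rows]
```

A list comprehension keeps the conversion compact and avoids appending to a list in a loop, which matches the preference expressed in the comment above.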
Another issue I was having with I simply used
|
Hmm. I don't know what the problem is there, I didn't write that code, works fine for me. I have a couple fixes for the _get_expiry_months. I'll add them. |
Probably some proxy issue then. Sent from mobile on Dec 11, 2013, at 15:36. |
New to git, new to python, new to pandas. For that reason I'm only commenting here and if it is in the wrong place I apologize. There is an issue in data.py where if you request option data for CUR_MONTH and CUR_YEAR+1 you will only get CUR_MONTH/CUR_YEAR options. This is due to an unnecessary check in _get_option_data() to see if the passed in month is CUR_MONTH and using the URL without additional date parameters if it is. If you just simply pass the month and year parameters in the URL every time this problem goes away. |
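A minimal sketch of the check described above, assuming the decision is made from a requested month and year. The helper names and the `expiry` parameter key are hypothetical and are not taken from data.py; the point is only that comparing the month alone treats January of next year as the current month, while comparing the (year, month) pair does not.

```python
from datetime import date


def is_current_expiry(month, year, today=None):
    """True only when (year, month) matches today's calendar month."""
    today = today or date.today()
    return (year, month) == (today.year, today.month)


def expiry_params(month, year, today=None):
    """Hypothetical helper: query parameters for one expiry month.

    Returns an empty dict for the current month (the default page) and an
    explicit year-month otherwise. The "expiry" key is illustrative only.
    """
    if is_current_expiry(month, year, today):
        return {}
    return {"expiry": "{:04d}-{:02d}".format(year, month)}


today = date(2014, 1, 15)
this_month = expiry_params(1, 2014, today)  # current month: no date parameters
next_year = expiry_params(1, 2015, today)   # same month, next year: explicit date
```

Always sending the month and year, as the comment suggests, would remove the branch entirely; the sketch keeps it only to show where the month-only comparison went wrong.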
why don't you create a new issue for that, and show some code that can reproduce the problem? thanks |
@dstephens99 if you can rebase and get this passing, and prob needs some more tests, we can consider for 0.14 |
Sure, I will do that. @mcneary70 I fixed that issue in my last commit. |
@dstephens99 That is awesome, thanks! |
pls rebase this on current master. need to update the doc strings as well. I am ok with changing the API here where it makes sense |
Rebased - I'll work on this some more this week. |
gr8. as I say, propose a nice pandonic API for this |
@dstephens99 update? |
@dstephens99 this needs some API work, can you comment on that |
I think you can clean up some of the logic using something like:
You can define |
@dstephens99 can you show the API? |
Sorry for the delay in responding, I've been really busy on a project at work. What do you think about the API being a data frame if you request just calls or just puts and a dictionary of {'calls': call_df, 'puts' : put_df} if you request both? Pythonic? |
Never mind, I just saw your prior comment about always returning a dataframe. I will do it that way. |
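One way to always return a single data frame, sketched under stated assumptions: the sample frames, the `Type` and `Strike` level names, and the `combine_options` helper are illustrative and are not the index layout the PR finally adopted. The sketch relies only on `pandas.concat` with `keys` and `names`, which labels each input frame with an extra index level.

```python
import pandas as pd

# Illustrative data only; real frames would come from the scraper.
strikes = pd.Index([100.0, 105.0], name="Strike")
calls = pd.DataFrame({"Last": [1.50, 0.75]}, index=strikes)
puts = pd.DataFrame({"Last": [0.60, 1.20]}, index=strikes)


def combine_options(calls, puts):
    """Stack calls and puts into one frame with a (Type, Strike) MultiIndex."""
    return pd.concat([calls, puts], keys=["call", "put"], names=["Type", "Strike"])


combined = combine_options(calls, puts)

# Callers who want only one side select on the Type level.
only_puts = combined.xs("put", level="Type")
```

With this shape the return type does not depend on which of calls or puts was requested, so callers never need to check whether they received a frame, a list, or a dict.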
.. _remote_data.yahoo_Options:

Yahoo! Finance Options
--------------
make this line the same length as the title (otherwise the doc generator will complain)
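The rule in this review comment is easy to enforce mechanically. As a small sketch (the `rst_heading` helper is hypothetical, not part of pandas or Sphinx), the underline can be generated from the title so the two lengths always agree:

```python
def rst_heading(title, underline_char="-"):
    """Return an RST section heading whose underline matches the title length."""
    return "{}\n{}".format(title, underline_char * len(title))


heading = rst_heading("Yahoo! Finance Options")
print(heading)
```

For this title the underline comes out 22 characters long, against the 14 dashes in the diff under review.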
Bid: Bid price, float
Ask: Ask price, float
Vol: Volume traded, int64
Open Int: Open interest, int64
I would make this 'Open_Int' (for consistency with the rest); also don't really like spaces in column names (pls change in the doc-string in all places). FYI, you might want to make 1 doc-string, and then just assign it (e.g. ``func.__doc__ = doc_string``)
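The shared doc-string suggestion can be sketched as below. The two functions are stand-ins with placeholder bodies and the doc-string text is illustrative; only the assignment to `__doc__` is the technique being shown.

```python
# One doc-string, written once and shared by every method that returns the
# same columns. The wording here is illustrative, not the text from the PR.
_OPTIONS_DOC = """Return option data as a DataFrame.

Columns (illustrative subset): Bid, Ask, Vol, Open_Int.
"""


def get_call_data():
    return None  # placeholder body for the sketch


def get_put_data():
    return None  # placeholder body for the sketch


# Assign the shared text instead of repeating it under each def.
for func in (get_call_data, get_put_data):
    func.__doc__ = _OPTIONS_DOC
```

Renaming a column such as 'Open Int' to 'Open_Int' then needs to be done in one place rather than in every copy of the text.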
This provides an alternative source of options data: http://www.cboe.com/delayedquote/QuoteTableDownload.aspx |
This is failing for me:
|
I referenced that bug above - it's the difference in Quote_Time format between the weekend and the week. I'll fix that one tonight. |
The CBOE website has an interesting quote at the bottom re downloading quote information electronically.. |
Good call... |
pls rebase this |
@@ -55,6 +55,10 @@ performance improvements along with a large number of bug fixes.

Highlights include:

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
- ``pandas.io.data.Options`` now has a get_all_data method and consistently returns a ''MultiIndex'' ''DataFrame'' (:issue:`5602`) |
to consistently return a multi-indexed DataFrame
can you rebase on master, and ping when ready |
…for a ticker. Also added a few helper functions. These functions could be applied to refactor some of the other methods.
ENH: In Options.get_all_data: Now checking for any option tag (instead of just mini) Changed expiry to datetime from string. Added tests for tick functions.
BUG: Fixed no sign in change column of option download.
BUG: Fix bugs in Options class Dealt with situation of calculating expiry when symbol contains a hyphen Fixed bug in finding current expiry month.
BUG: Fixed Options.get_forward_data expiry date Method assumed expiry date is the same for all option in a given month. Not the case for options with weekly's. Also breaks with options that have tags.
BUG: Fixed Option bug that didn't allow LEAP DL in January. Option class was checking only the month to determine if the requested month was the current month. Changed to check year and month. Now allows downloads of next years LEAPS's in January.
ENH: Added option tag and underlying price to option data output. Factored out URL parsing and error checking from individual methods.
ENH: Refactor of Option class in io.data. Consistently returns multi-index data frame. Improves speed of downloading combination of calls and puts by only accessing yahoo once per expiry month.
CLN: Fix out of date docstrings in io.data.Options Moved _parse_row_values definition into _unpack.
CLN: Consistent capitalization in output data.
CLN: Remove Tag, leave Root in data frame output.
CLN: Remove unnecessary _tag_from_root method.
BUG: Fix different capitalizations of Rootexp in _process_data.
TST: Update tests for pandas.data.Options
TST: Remove test for helper function that no longer exists.
TST: Fix option test for change in output
TST: Changes io.data.Options tests to self.assertTrue
TST: Change tests raise nose.SkipTests on remote data errors
TST: Change nose.SkipTest on RemoteDataError instead of IndexError
ENH: Added quote time to outputs of data.Options.
DOC: Added documentation for io.data.Options
DOC: Added documentation of data.Options output.
DOC: Updated docstrings on data.io.Options
DOC: Added experimental tags to io.data.Options docstrings/documentation.
BUG: Bug fixes, added tests, cleanups on documentation
TST: Fix test_data Options tests.
TST: Add test yahoo finance option pages.
DOC: Update example to show slicing.
TST: Remove test for long for python 3 compatibility.
BUG: Fix quote time scraper
TST: Changed the error raised by no tables in data.Options Tests were failing if the scraper got the webpage but there weren't any tables in it. Changed from IndexError to RemoteDataError so that nose would skip it on failure.
DOC: Moved reference to new Options method to v0.14.1.txt
DOC: Updated release at 0.14.1.txt for io.data.Options
@jreback Rebased and green. |
ENH: Added method to pandas.data.Options to download all option data for...
@dstephens99 thanks! pls look at the docs after they are built, may need to adjust (after travis finishes they will update): http://pandas-docs.github.io/pandas-docs-travis/remote_data.html |
@dstephens99 in particular, I suspect the AAPL options will now fail (as the 600 strike price is now gone after the split). |
docs are built.... @jorisvandenbossche any thoughts? |
Experimental Features
~~~~~~~~~~~~~~~~~~~~~
- ``pandas.io.data.Options`` has a get_all_data method and now consistently returns a multi-indexed ''DataFrame'' (:issue:`5602`)
This should be moved to v0.14.1.txt (or removed if it is already there)
Yes, it's in v0.14.1.txt, I will remove it here.
... a ticker.
Also added a few helper functions. These functions could be applied to refactor some of the other methods.