-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ibovespa index support #990
Conversation
ibovespa(ibov) is the largest index in Brazil's stocks exchange. The br_index folder has support for downloading new companies for the current index composition. And has support, as well, for downloading companies from historic composition of ibov index. Partially resolves issue #956
…'s stocks exchange (B3) Together with commit c2f933 it resolves issue #956
@you-n-g Should it rerun the mac os test? |
In the course of reviewing this PR we found two issues.
Are you willing to fix them? If not, I will fix these issues in the future. |
@SunsetWolf I can work on this! |
…ode usability. Message in the PR request to understand the context of the change In the course of reviewing this PR we found two issues. 1. there are multiple places where the get_instruments() method is used, and we feel that scripts.index.py is the best place for the get_instruments() method to go. 2. data_collector.utils has some very generic stuff put inside it.
The reason to use retry=2 is due to the fact that Yahoo Finance unfortunately does not keep track of the majority of Brazilian stocks. Therefore, the decorator deco_retry with retry argument set to 5 will keep trying to get the stock data 5 times, which makes the code to download Brazilians stocks very slow. In future, this may change, but for now I suggest to leave retry argument to 1 or 2 in order to improve download speed. In order to achieve this code logic an argument called retry_config was added into YahooCollectorBR1d and YahooCollectorBR1min
qlib_dir=qlib_dir, index_name=index_name, freq=freq, request_retry=request_retry, retry_sleep=retry_sleep | ||
) | ||
getattr(obj, method)() | ||
|
||
|
||
if __name__ == "__main__": | ||
fire.Fire(get_instruments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fire.Fire(get_instruments) | |
fire.Fire(partial(get_instruments, market_index="br_index" )) |
- This applies to other index data source as well.
Do you think this will make the interface easier?
Then all the collector.py CLI in each folder can remove a redundant arguments.
@@ -53,9 +53,9 @@ With that reference, the index's composition can be compared quarter by quarter | |||
|
|||
```bash | |||
# parse instruments, using in qlib/instruments. | |||
python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments | |||
python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments --market_index br_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scripts/data_collector/utils.py
Outdated
@@ -564,3 +562,47 @@ def generate_minutes_calendar_from_daily( | |||
|
|||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be better if ifmain
is at the bottom of the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, much better!
It was a lack of attention on my part
@@ -147,7 +148,29 @@ def _show_logging_func(): | |||
def get_data( | |||
self, symbol: str, interval: str, start_datetime: pd.Timestamp, end_datetime: pd.Timestamp | |||
) -> pd.DataFrame: | |||
@deco_retry(retry_sleep=self.delay) | |||
if hasattr(self, 'retry_config'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it look simpler to make retry as a part of the class interface ( for example, abstract attribute )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can define
class YahooCollector(...):
retry = 5 # Configuration attribute. How many times will it try to re-request the data if the network fails.
class YahooCollectorBR(...):
""""
The reason to use retry=2 is due to the fact that
Yahoo Finance unfortunately does not keep track of the majority
of Brazilian stocks.
Therefore, the decorator deco_retry with retry argument
set to 5 will keep trying to get the stock data 5 times,
which makes the code to download Brazilians stocks very slow.
In future, this may change, but for now
I suggest to leave retry argument to 1 or 2 in
order to improve download speed.
In order to achieve this code logic an argument called retry_config
was added into YahooCollectorBR base class
"""
retry = 2
Now the retry is part of the interface, so we don't have to use hasattr to access the retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))` will make the interface easier for the user to execute the script. Then all the collector.py CLI in each folder can remove a redundant arguments.
This way we don't have to use hasattr to access the retry attribute as previously done
It looks great now! |
* feat: download ibovespa index historic composition ibovespa(ibov) is the largest index in Brazil's stocks exchange. The br_index folder has support for downloading new companies for the current index composition. And has support, as well, for downloading companies from historic composition of ibov index. Partially resolves issue microsoft#956 * fix: typo error instead of end_date, it was written end_ate * feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3) Together with commit c2f933 it resolves issue microsoft#956 * fix: code formatted with black. * wip: Creating code logic for brazils stock market data normalization * docs: brazils stock market data normalization code documentation * fix: code formatted the with black * docs: fixed typo * docs: more info about python version used to generate requirements.txt file * docs: added BeautifulSoup requirements * feat: removed debug prints * feat: added ibov_index_composition variable as a class attribute of IBOVIndex * feat: added increment to generate the four month period used by the ibov index * refactor: Added get_instruments() method inside utils.py for better code usability. Message in the PR request to understand the context of the change In the course of reviewing this PR we found two issues. 1. there are multiple places where the get_instruments() method is used, and we feel that scripts.index.py is the best place for the get_instruments() method to go. 2. data_collector.utils has some very generic stuff put inside it. * refactor: improve brazils stocks download speed The reason to use retry=2 is due to the fact that Yahoo Finance unfortunately does not keep track of the majority of Brazilian stocks. Therefore, the decorator deco_retry with retry argument set to 5 will keep trying to get the stock data 5 times, which makes the code to download Brazilians stocks very slow. In future, this may change, but for now I suggest to leave retry argument to 1 or 2 in order to improve download speed. In order to achieve this code logic an argument called retry_config was added into YahooCollectorBR1d and YahooCollectorBR1min * fix: added __main__ at the bottom of the script * refactor: changed interface inside each index Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))` will make the interface easier for the user to execute the script. Then all the collector.py CLI in each folder can remove a redundant arguments. * refactor: implemented class interface retry into YahooCollectorBR * docs: added BR as a possible region into the documentation * refactor: make retry attribute part of the interface This way we don't have to use hasattr to access the retry attribute as previously done
Hi, @igor17400 |
Description
scripts/data_collector/br_index/README.md
scripts/data_collector/br_index/collector.py
scripts/data_collector/br_index/requirements.txt
scripts/data_collector/index.py
scripts/data_collector/utils.py
scripts/data_collector/yahoo/README.md
scripts/data_collector/yahoo/collector.py
Motivation and Context
Brazil's stock exchange is one of the biggest in the world, therefore adding its dataset into qlib can help people do research via qlib infrastructure using different datasets.
How Has This Been Tested?
Pass the test by running:
pytest qlib/tests/test_all_pipeline.py
under upper directory ofqlib
.If you are adding a new feature, test on your own test scripts.
There isn't a specific script for test, however the added code can be tested running the following commands:
python scripts/data_collector/br_index/collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments
python scripts/data_collector/br_index/collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method save_new_companies
python scripts/data_collector/br_index/collector.py download_data --source_dir ~/.qlib/stock_data/source/br_data --start 2003-01-03 --end 2022-03-01 --delay 1 --interval 1d --region BR
python scripts/data_collector/br_index/collector.py download_data --source_dir ~/.qlib/stock_data/source/br_data_1min --delay 1 --interval 1min --region BR
Screenshots of Test Results (if appropriate):
Types of changes
closes #956