This repository contains a set of zipline data bundles, which download and extract historical price data for backtesting on the zipline platform. Zipline is a backtesting framework written in Python that can be used to test, analyze and visualize trading strategies. Zipline was developed and maintained by Quantopian until 2020. Since Quantopian's shutdown, this repository has been developed and tested with zipline_reloaded, one of zipline's actively developed forks.
The following assumes that you have already cloned the repository and switched the working directory to the repository root. Then, in an environment where zipline is installed and working, all data bundles can be added by running
python install.py
Note that the installer complains if Python modules with the same names already exist. To force the installer to overwrite the existing modules, add `-f`. Apart from zipline itself and its dependencies, each bundle has additional dependencies of its own. To install them all, run
pip install -r requirements.txt
You can check if the installation is complete by running:
zipline bundles
You should see that new bundles have been added to the list:
csvdir <no ingestions>
iex <no ingestions>
quandl <no ingestions>
quantopian-quandl <no ingestions>
yahoo_csv <no ingestions>
yahoo_direct <no ingestions>
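If you would rather check the list from a script than by eye, the output can be parsed. The following is a minimal sketch which assumes that every line has the form `<bundle> <timestamp>` or `<bundle> <no ingestions>`, as in the listing above. `parse_bundle_listing` is a hypothetical helper and not part of zipline.

```python
def parse_bundle_listing(text):
    """Map each bundle name to the list of its ingestion entries.

    Assumes every non-empty line looks like '<bundle> <rest>', where <rest>
    is either '<no ingestions>' or an ingestion timestamp.
    """
    bundles = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition(" ")
        entries = bundles.setdefault(name, [])
        rest = rest.strip()
        if rest and rest != "<no ingestions>":
            entries.append(rest)
    return bundles


# Sample text copied from the listing above (in practice: output of `zipline bundles`).
listing = """csvdir <no ingestions>
yahoo_csv <no ingestions>
yahoo_direct <no ingestions>"""

found = parse_bundle_listing(listing)
missing = {"yahoo_csv", "yahoo_direct"} - set(found)
print("missing bundles:", sorted(missing))
```

An empty `missing` set means the bundles you expect are registered.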
To test the installation, a simple strategy can be backtested over price data read by one of the bundles. The `yahoo_csv` bundle is used for this. It reads data from a directory containing CSV files downloaded from Yahoo Finance. First, ingest the price data stored in the data directory:
YAHOO_CSVDIR=./data/ zipline ingest -b yahoo_csv
Then backtest a buy-and-hold strategy over the ingested data:
zipline run -f tests/buy_and_hold.py -b yahoo_csv --start 2019-07-02 --end 2020-07-02
The cumulative return of the strategy will be depicted in a plot after backtesting.
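For intuition about what the plot shows: for a buy-and-hold position that is fully invested in a single asset, the cumulative return on a given day is that day's price divided by the purchase price, minus one. The sketch below computes this for a made-up price series. It ignores commissions and slippage, which a zipline backtest may model, and it is not the content of tests/buy_and_hold.py.

```python
def buy_and_hold_returns(prices):
    """Cumulative return of buying at prices[0] and holding until each later price."""
    if not prices:
        return []
    entry = prices[0]
    return [price / entry - 1.0 for price in prices]


# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 99.0, 110.0]
returns = buy_and_hold_returns(prices)
print([round(r, 4) for r in returns])  # [0.0, 0.02, -0.01, 0.1]
```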
The following bundles are currently defined by the repository.
Bundle | Data Source | Dependency | Module |
---|---|---|---|
yahoo_csv | csv files downloaded from yahoo finance | none | none |
yahoo_direct | yahoo finance | yahoofinancials | yahoo.py |
iex | IEX cloud | iexfinance | iex.py |
binance_daily | daily price from binance exchange | python-binance | binance.py |
binance_min | per minute price from binance exchange | python-binance | binance.py |
The `yahoo_csv` bundle takes data from CSV files downloaded from Yahoo Finance. Each file contains the price data of a single asset and must be named `asset_name.csv`. The bundle reads all the CSV files located in the directory given by the environment variable `YAHOO_CSVDIR`:
YAHOO_CSVDIR=/path/to/csvdir zipline ingest -b yahoo_csv
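Before ingesting, it can help to check which assets the directory would yield under the naming rule above. This is a minimal sketch and not the bundle's own code: it assumes the asset name is simply the file name without the `.csv` extension, and `discover_assets` is a hypothetical helper.

```python
import os
from pathlib import Path


def discover_assets(csvdir):
    """Map asset name -> CSV path, taking the file stem as the asset name."""
    paths = sorted(Path(csvdir).glob("*.csv"))
    return {path.stem: path for path in paths if path.is_file()}


# Fall back to the repository's data directory if the variable is unset.
csvdir = os.environ.get("YAHOO_CSVDIR", "./data/")
for name, path in discover_assets(csvdir).items():
    print(f"{name}: {path}")
```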
The `yahoo_direct` bundle downloads price data directly from Yahoo Finance. It extracts asset names from the environment variable `YAHOO_SYM_LST`, which holds a comma-separated list of asset names. For example:
YAHOO_SYM_LST=SPY,AAPL zipline ingest -b yahoo_direct
This command ingests price data for the assets `SPY` and `AAPL`. The start and end dates of ingestion can be set through the variables `start_date` and `end_date`, respectively. These variables are passed to the function `get_downloader` where the bundle is registered in `$HOME/.zipline/extension.py`. Here is how the registration may look:
register('yahoo_direct', # bundle's name
         direct_ingester('YAHOO',
                         every_min_bar=False,
                         symbol_list_env='YAHOO_SYM_LST', # the environment variable holding the comma separated list of asset names
                         downloader=yahoo.get_downloader(start_date='2010-01-01',
                                                         end_date='2020-01-01'
                         ),
         ),
         calendar_name='NYSE',
)
In addition to the start and end dates, the name of the environment variable holding the list of asset names can be set here. `direct_ingester` can additionally take a callable, `filter_cb`. It receives the data frame just returned from the downloader as its parameter and returns a new data frame. It is useful when the downloaded price data needs additional preprocessing.
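As an illustration of such a callback, the sketch below drops rows with missing values and sorts the remaining rows by index. The name `drop_incomplete_rows` is hypothetical, and the code assumes that the downloader returns a pandas DataFrame indexed by date. Check the layout your downloader actually produces before relying on it.

```python
def drop_incomplete_rows(df):
    """Hypothetical filter_cb: return a cleaned copy of the downloaded frame.

    Assumes `df` is a pandas DataFrame indexed by date. Neither call mutates
    the input; each returns a new frame.
    """
    return df.dropna().sort_index()
```

The function would then be handed to `direct_ingester` in the registration, presumably as `filter_cb=drop_incomplete_rows`.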
The `iex` bundle downloads price data from IEX cloud. Its usage is similar to that of `yahoo_direct`. Fetching price data from IEX cloud, however, requires a valid API token, which is read from the environment variable `IEX_TOKEN`. Moreover, the environment variable storing the asset names is called `IEX_SYM_LST`.
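Both `yahoo_direct` and `iex` read their symbols from a comma-separated environment variable. How the ingester parses that value internally is not shown here, so the following is only a sketch of one tolerant way to do it. `read_symbol_list` is a hypothetical helper.

```python
import os


def read_symbol_list(env_name, environ=None):
    """Split a comma-separated environment variable into a list of symbols.

    Surrounding whitespace and empty entries are dropped. An unset variable
    yields an empty list.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_name, "")
    return [symbol.strip() for symbol in raw.split(",") if symbol.strip()]


# A plain dict stands in for os.environ in this example.
print(read_symbol_list("IEX_SYM_LST", {"IEX_SYM_LST": "SPY, AAPL,"}))  # ['SPY', 'AAPL']
```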
The `binance_daily` and `binance_min` bundles collect data from the Binance cryptocurrency exchange at daily and per-minute frequency, respectively. The list of symbols is taken from the environment variable `BINANCE_SYM_LST`. Moreover, the API key and the secret key are expected to be set in the environment variables `BINANCE_API_KEY` and `BINANCE_SECRET_KEY`. For example, the following command ingests the daily prices of Bitcoin and Ethereum quoted in USDT.
BINANCE_API_KEY=your_api_key BINANCE_SECRET_KEY=your_secret_key BINANCE_SYM_LST=BTCUSDT,ETHUSDT zipline ingest -b binance_daily
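Since every bundle is configured through environment variables, a quick check before running `zipline ingest` can catch a forgotten variable early. The sketch below collects the variable names mentioned in this README. The `REQUIRED_ENV` table and `missing_variables` are hypothetical helpers, not part of the repository.

```python
import os

# Environment variables each bundle reads, as described in this README.
REQUIRED_ENV = {
    "yahoo_csv": ["YAHOO_CSVDIR"],
    "yahoo_direct": ["YAHOO_SYM_LST"],
    "iex": ["IEX_TOKEN", "IEX_SYM_LST"],
    "binance_daily": ["BINANCE_API_KEY", "BINANCE_SECRET_KEY", "BINANCE_SYM_LST"],
    "binance_min": ["BINANCE_API_KEY", "BINANCE_SECRET_KEY", "BINANCE_SYM_LST"],
}


def missing_variables(bundle, environ=None):
    """Return the required variables that are unset or empty for a bundle."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[bundle] if not environ.get(name)]


# A plain dict stands in for os.environ in this example.
print(missing_variables("binance_daily", {"BINANCE_SYM_LST": "BTCUSDT,ETHUSDT"}))
# ['BINANCE_API_KEY', 'BINANCE_SECRET_KEY']
```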
install.py takes the following steps to add the bundles:
- copy extension.py into `~/.zipline/`,
- add ingester.py as well as the proper module for each bundle listed in the table above into the package `zipline.data.bundles`, i.e. copy the modules to where the package is located. The package location differs depending on how zipline is installed. One way to find the location in an environment with zipline installed is to run the following code:
python -c 'import zipline.data.bundles as bdl; print(bdl.__path__)'
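The two steps above can be sketched in a few lines of Python. This is not the code of install.py, only an illustration of the idea under stated assumptions: `package_dirs` and `copy_modules` are hypothetical helpers, and the `force` argument mimics what the `-f` flag is described to do.

```python
import importlib.util
import shutil
from pathlib import Path


def package_dirs(package_name):
    """Return the directories a package is loaded from (its __path__).

    For a dotted name such as 'zipline.data.bundles' the parent packages
    are imported in the process.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.submodule_search_locations is None:
        raise ImportError(f"{package_name!r} is not an importable package")
    return list(spec.submodule_search_locations)


def copy_modules(sources, dest_dir, force=False):
    """Copy files into dest_dir, refusing to overwrite unless force is set."""
    dest_dir = Path(dest_dir)
    targets = [dest_dir / Path(src).name for src in sources]
    existing = [target for target in targets if target.exists()]
    if existing and not force:
        names = ", ".join(str(target) for target in existing)
        raise FileExistsError(f"already present: {names}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    for src, target in zip(sources, targets):
        shutil.copy(src, target)
    return targets
```

With zipline installed, a call such as `copy_modules(["ingester.py", "yahoo.py"], package_dirs("zipline.data.bundles")[0], force=True)` would roughly correspond to the second step run with `-f`.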
If only a subset of the bundles is needed, one way is to keep only their registrations in extension.py, their dependencies in requirements.txt and their related modules in the variable `src_ing` inside install.py, and to remove the rest. Then run the installation script.
It is possible to define new data bundles using the structures provided by this repository. The process is explained in this post in more detail.