Clone project
git clone https://github.com/lijoabraham/mutual_fund_recommendation_etl.git
$ cd mutual_fund_recommendation_etl/airflow/
$ docker build --rm -t mutual_fund_recommendation_etl:latest .
Change the image name in the below place of docker-compose.yaml file, if you have a different image name
airflow-webserver:
image: mutual_fund_recommendation_etl:latest
$ cd mutual_fund_recommendation_etl/
$ docker-compose -f docker-compose.yml up -d
$ docker-compose run airflow-webserver airflow users create --role Admin --username admin \
--email admin --firstname admin --lastname admin --password admin
- Go to Airflow UI > Admin > Edit connections
- Edit spark_default entry:
- Connection Type: Spark
- Host: spark://spark
- Port: 7077
- Airflow: http://localhost:8080 (admin/admin)
- Spark Master: http://localhost:8081
- Apache superset - http://localhost:8088 (admin/admin)
- Jupyter Notebook - http://localhost:8888 ( follow along for token/password)
- Login to mysql docker container and import the
dump-superset-latest
SQL file fromsrc/app/sqls
folder
- Login to
mutual_fund_recommendation_etl:latest
container and run the following jobspython src/app/jobs/mfscrapper.py
spark-submit --master spark://spark:7077 --files /usr/local/spark/app/configs/scrapper.json --py-files /usr/local/spark/app/packages.zip --jars=/usr/local/spark/app/dependencies/mysql-connector-j-8.0.31.jar --name arrow-spark --verbose --queue root.default /usr/local/spark/app/jobs/etl_job.py
Medium - https://medium.com/@lijoabraham1234/finding-the-right-mutual-fund-using-spark-ml-2f3e96f2c535
https://zerodha.com/varsity/chapter/how-to-analyze-a-debt-mutual-fund/ https://www.crisil.com/content/dam/crisil/generic-images1/what-we-do/financial-products/mutual-fund-ranking/CRISIL_Mutual_Fund_Ranking_Methodology.pdf