# jpx-stock-prediction

This project aims to build an end-to-end machine learning pipeline to predict stock prices using data from the JPX Tokyo Stock Exchange. The project involves data collection, preprocessing, model training, evaluation, and deployment using AWS services.
## Table of Contents

- Project Structure
- Dataset
- AWS Setup
- Installation
- Usage
- Model Training
- Evaluation
- Deployment
- Contributing
- License
## Project Structure

```
jpx-stock-prediction/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
├── src/
│   ├── data_preprocessing.py
│   ├── feature_engineering.py
│   ├── model_training.py
│   ├── evaluation.py
│   └── deployment.py
├── requirements.txt
├── README.md
└── .gitignore
```
## Dataset

The dataset used for this project is obtained from the JPX Tokyo Stock Exchange Prediction competition on Kaggle. It contains historical stock prices and other relevant financial data.
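As an illustration, here is a tiny synthetic sample shaped like the competition's `stock_prices.csv` (the column names follow the Kaggle data description; the values are made up), with a per-security daily return computed as a first preprocessing step:

```python
import pandas as pd

# Hypothetical sample mirroring the competition's stock_prices.csv layout
# (column names are assumptions based on the Kaggle data description).
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-12-06", "2021-12-07"]),
    "SecuritiesCode": [1301, 1301],
    "Close": [2982.0, 2983.0],
    "Target": [0.00034, -0.00112],
})

# Daily return per security is a common starting point for features;
# grouping by SecuritiesCode keeps returns within each stock.
sample["Return"] = sample.groupby("SecuritiesCode")["Close"].pct_change()
print(sample)
```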
## AWS Setup

The project uses the following AWS services:

- **S3**: For storing raw and processed data, as well as the trained model.
- **EC2**: For running data preprocessing and model training scripts.
- **SageMaker**: For managing Jupyter notebooks and training models.
- **Lambda**: For deploying the model as a serverless function.
- **API Gateway**: For creating a RESTful API to interact with the deployed model.
1. **Create an S3 Bucket**:
   - Go to the S3 service in the AWS console.
   - Create a new bucket (e.g., `jpx-stock-data`).
   - Upload your dataset to this bucket.

2. **Launch an EC2 Instance**:
   - Choose an appropriate instance type (e.g., `t2.medium`).
   - Set up security groups to allow SSH access.
   - Install necessary packages and clone the repository.

3. **Set Up SageMaker**:
   - Create a new notebook instance.
   - Attach the appropriate IAM role with S3 access.
   - Open the notebook and clone the repository.

4. **Lambda Function and API Gateway**:
   - Write a Lambda function to load the model from S3 and handle inference requests.
   - Create a new API Gateway and integrate it with the Lambda function.
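The S3 part of the setup can also be scripted. Below is a minimal sketch using boto3; the bucket name `jpx-stock-data` follows the example above, while `model_key` and `upload_model` are hypothetical helper names, not part of the repository:

```python
BUCKET = "jpx-stock-data"  # example bucket name from the setup steps above

def model_key(version: str) -> str:
    """Build the S3 object key under which a model version is stored."""
    return f"model/model-{version}.joblib"

def upload_model(local_path: str, version: str) -> str:
    """Upload a serialized model file to S3 and return its s3:// URI."""
    import boto3  # imported lazily so the module loads without AWS installed
    key = model_key(version)
    boto3.client("s3").upload_file(local_path, BUCKET, key)
    return f"s3://{BUCKET}/{key}"
```

Running `upload_model("model.joblib", "v1")` on a machine with AWS credentials configured would place the file at `s3://jpx-stock-data/model/model-v1.joblib`.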
## Installation

To set up the project locally, follow these steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/jpx-stock-prediction.git
   cd jpx-stock-prediction
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. **Data Preprocessing**: Run the data preprocessing script to clean and prepare the data:

   ```bash
   python src/data_preprocessing.py
   ```

2. **Feature Engineering**: Generate features for the model:

   ```bash
   python src/feature_engineering.py
   ```

3. **Model Training**: Train the model using the processed data:

   ```bash
   python src/model_training.py
   ```

4. **Evaluation**: Evaluate the trained model:

   ```bash
   python src/evaluation.py
   ```
## Model Training

The model training script includes the following steps:

- Loading the processed data
- Splitting the data into training and validation sets
- Training a linear regression model (or any other chosen model)
- Saving the trained model to S3
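The steps above can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic stand-in data; the real script would load the output of `data/processed/` and push the saved model to S3 instead:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed feature matrix; the real script
# would load the files produced by data_preprocessing.py instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # e.g. lagged returns, volume features
y = X @ np.array([0.5, -0.2, 0.1, 0.0]) + rng.normal(scale=0.01, size=500)

# Time-ordered data should not be shuffled: shuffle=False keeps the
# validation set strictly after the training set in time.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_tr, y_tr)
print("validation R^2:", model.score(X_val, y_val))
# joblib.dump(model, "model.joblib")  # then upload to s3://jpx-stock-data/model/
```

The `shuffle=False` split is a deliberate choice for price data: a random split would leak future information into training.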
## Evaluation

The evaluation script calculates performance metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) to assess the model's accuracy.
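As a sketch, all three metrics are available in scikit-learn; the arrays below are made-up stand-ins for the script's validation targets and predictions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical validation targets and model predictions.
y_true = np.array([0.01, -0.02, 0.005, 0.0])
y_pred = np.array([0.008, -0.015, 0.004, 0.001])

mae = mean_absolute_error(y_true, y_pred)  # average absolute error
mse = mean_squared_error(y_true, y_pred)   # penalizes large errors more
r2 = r2_score(y_true, y_pred)              # variance explained, 1.0 is perfect
print(f"MAE={mae:.6f}  MSE={mse:.8f}  R2={r2:.4f}")
```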
## Deployment

To deploy the model using AWS services, follow these steps:

1. **Upload the Trained Model to S3**:
   - Save the trained model to your local system.
   - Upload the model to your S3 bucket (e.g., `s3://jpx-stock-data/model/`).

2. **Set Up AWS Lambda**:
   - Create a new Lambda function.
   - Write the function code to load the model from S3 and handle inference requests.
   - Configure the function's execution role to allow S3 access.

3. **Create an API Gateway**:
   - Create a new RESTful API.
   - Integrate the API with the Lambda function.
   - Deploy the API to make it accessible.
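A hedged sketch of the Lambda handler described in step 2. The bucket and object key are assumptions based on the example paths above, and the optional `model` argument exists only so the handler can be exercised locally without AWS:

```python
import json

_model = None  # cached at module scope so warm invocations reuse it

def _load_model():
    """Download and deserialize the model from S3 (names are assumptions)."""
    import io
    import boto3
    import joblib
    buf = io.BytesIO()
    boto3.client("s3").download_fileobj("jpx-stock-data", "model/model.joblib", buf)
    buf.seek(0)
    return joblib.load(buf)

def handler(event, context, model=None):
    """Entry point wired to API Gateway; expects {"features": [...]} in the body."""
    global _model
    if model is None:
        if _model is None:
            _model = _load_model()
        model = _model
    features = json.loads(event["body"])["features"]
    prediction = float(model.predict([features])[0])
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Loading the model outside the handler body (here via the module-level cache) avoids re-downloading it from S3 on every request.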
## Contributing

Contributions are welcome! Please follow these steps to contribute:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Commit your changes (`git commit -m 'Add some feature'`).
4. Push to the branch (`git push origin feature-branch`).
5. Open a Pull Request.
## License

This project is licensed under the MIT License - see the LICENSE file for details.