This script reads from 'config.txt' the month names of the files to be preprocessed. Each preprocessed file is renamed with a 'proc_' prefix, and all processed files are then merged into a single file, 'RS_data.csv'. Lastly, the file 'days_scheduler.txt' is created from the first and last timestamps. To run the preprocessing and create the scheduler:
bash s3_bucket/preprocessing.sh
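The scheduler step can be sketched as follows. This is a hypothetical illustration of what create_scheduler.py might do (the function name, dates, and output format are assumptions): given the first and last timestamps of the merged data, it writes one day per line to 'days_scheduler.txt'.

```python
# Hypothetical sketch of create_scheduler.py: derive one line per day
# between the first and last timestamps of the merged data.
from datetime import date, timedelta

def build_day_schedule(first_day: date, last_day: date) -> list:
    """Return one ISO-formatted date string per day, inclusive."""
    days = []
    current = first_day
    while current <= last_day:
        days.append(current.isoformat())
        current += timedelta(days=1)
    return days

if __name__ == "__main__":
    # Example dates are placeholders; the real script would read them
    # from the first and last timestamps in RS_data.csv.
    schedule = build_day_schedule(date(2019, 10, 1), date(2019, 12, 31))
    with open("days_scheduler.txt", "w") as f:
        f.write("\n".join(schedule))
```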
This script iterates through each day (one row in 'days_scheduler.txt') and extracts the rows corresponding to that day to train the model. After each training run, the script sleeps for 10 minutes and then trains on the data for the next day.
bash s3_bucket/train_scheduler.sh
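The per-day loop above can be sketched in Python. This is an illustrative outline, not the actual implementation: the function names, the `timestamp` column name, and the file paths are assumptions.

```python
# Hypothetical sketch of the loop driven by train_scheduler.sh: extract
# the rows of RS_data.csv whose timestamp falls on a given day, train on
# them, then wait 10 minutes before moving to the next day.
import csv
import time

def rows_for_day(csv_path: str, day: str, ts_column: str = "timestamp") -> list:
    """Return the rows whose timestamp starts with the given ISO day."""
    with open(csv_path, newline="") as f:
        return [row for row in csv.DictReader(f) if row[ts_column].startswith(day)]

def run_schedule(schedule_path: str, csv_path: str, pause_s: int = 600) -> None:
    with open(schedule_path) as f:
        days = [line.strip() for line in f if line.strip()]
    for day in days:
        daily_rows = rows_for_day(csv_path, day)
        # train.py would be invoked here on daily_rows
        time.sleep(pause_s)  # 10-minute pause between daily trainings
```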
- run_preprocessing.sh
- train_scheduler.sh
- Oct.csv, Nov.csv, Dec.csv (small samples, to be replaced with the full files)
- config.txt
- create_scheduler.py
- preprocessing.py
- train.py
The project is built upon Python 3.8 using the PySpark package.
We recommend installing Anaconda, which comes bundled with many useful modules and tools, such as virtual environments.
After Anaconda is installed, you can install Python's dependencies with:
pip install -r requirements.txt
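The actual requirements.txt ships with the repository; as a reference point only, a minimal file for this project would at least list PySpark (any version pin would be an assumption, so none is given here):

```
pyspark
```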
At this point you should have the correct environment to interact with the scripts in this project.
Make sure the dependencies are installed, then simply type:
python train.py
To build the container, from the root folder (the one containing the Dockerfile, requirements.txt, etc.) type:
bash scripts/docker_build.sh
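For orientation, a minimal Dockerfile consistent with the layout described above might look like the following. The real Dockerfile ships with the repository; the base image, paths, and default command here are assumptions.

```
# Hypothetical minimal Dockerfile (the real one ships with the repo).
# Note: PySpark also needs a Java runtime, which python:3.8-slim does
# not include, so the actual image likely installs one as well.
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "train.py"]
```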
To run the container, from the root folder type:
bash scripts/docker_run.sh