An independent study project by Derek Yuan, a student at the Loomis Chaffee School, advised by Dr. Mark LeBlanc.
The objective of this project was to apply machine learning algorithms to short-term solar power output forecasting and to evaluate their accuracy and performance. This repository contains all the code written and data used for the project. The final project paper, which describes the process and results, is also available in this repository; its abstract is reproduced below.
To run the models yourself, download the repository and keep its internal structure intact. Run the data preprocessing Jupyter Notebook first to generate the preprocessed data; the machine learning notebooks can then be run on that data. Jupyter Notebook, Python, and libraries such as pandas, NumPy, scikit-learn, and Plotly are required.
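Before opening the notebooks, it can help to confirm that the libraries are installed. The snippet below is a minimal sketch, not part of the repository: the import names in `REQUIRED` are assumed from the libraries listed above, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the libraries named in this README.
# scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "sklearn", "plotly"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any name reported as missing would need to be installed in the environment that runs Jupyter.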
Abstract: The Loomis Chaffee School’s solar field is the largest solar power plant at any K-12 school in Connecticut. Completed in late 2019, it has provided and will continue to provide a significant portion of the school’s electricity, forming a core part of Loomis’ sustainability initiatives. Like all solar fields, its power generation is highly variable and dependent on external factors such as weather. For such photovoltaic power sources, reliable and successful integration into the larger power grid depends upon knowledge and prediction of future power output. The application of machine learning models to solar power output forecasting has thus become popular in the research literature, replacing past approaches based on statistical or physical models. This project aims to determine the feasibility and performance of applying three machine learning algorithms (support vector machines, random forests, and k-nearest neighbors) to short-term solar power output forecasting, leveraging the rich data generated by Loomis’ solar array in the first comprehensive study of that data. Only in situ data is used, and findings indicate that training the three models on the limited set of data features has promising utility in power prediction, with a significant improvement over a baseline persistence method for hourly prediction and the best performance from support vector machines. Ideas to further improve their performance include augmenting the solar plant data with local weather data or applying time-series-specific data preprocessing.
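To illustrate the kind of comparison the abstract describes, here is a minimal sketch that scores the three model families against a persistence baseline for one-hour-ahead prediction. It is not the code from the project notebooks. The data is synthetic (`make_synthetic_power` is a hypothetical stand-in for the solar field data), and the lag features, hyperparameters, and the choice of mean absolute error as the metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_power(n_days=60, seed=0):
    """Hypothetical hourly output (kW): a clipped daily sine scaled by random cloud cover."""
    rng = np.random.default_rng(seed)
    hours = np.arange(n_days * 24)
    clear_sky = np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0, None)
    cloud = rng.uniform(0.4, 1.0, size=hours.size)
    return 100.0 * clear_sky * cloud


def make_lagged(series, n_lags=3):
    """Features are the previous n_lags hours; the target is the following hour."""
    n_rows = len(series) - n_lags
    X = np.column_stack([series[i:i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y


def evaluate(series, n_lags=3, train_frac=0.8):
    """Return the test-set MAE of each model and of the persistence baseline."""
    X, y = make_lagged(series, n_lags)
    split = int(len(y) * train_frac)  # chronological split, no shuffling
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    models = {
        "svm": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
        "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    }
    # Persistence: predict that the next hour equals the most recent observed hour.
    scores = {"persistence": mean_absolute_error(y_test, X_test[:, -1])}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores


scores = evaluate(make_synthetic_power())
for name, mae in scores.items():
    print(f"{name:>14}: MAE = {mae:.2f} kW")
```

The split is chronological so that each model is tested only on hours later than those it was trained on, which mirrors how a forecast would be used. Results on this synthetic series say nothing about the real solar field; the paper in this repository reports the actual findings.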