jmcp/solar-spark
About this repo

This repo demonstrates using Apache Spark with Python to analyse CSV data from a PV solar inverter. See pvoutput.org, jfy-monitor, and monitoring my inverter.

There are two file formats in the collection. The first is the output from solarmonj, and has the following schema:

| Field name     | Datatype and units |
|----------------|--------------------|
| Timestamp      | seconds-since-epoch |
| Temperature    | float (degrees C)  |
| energyNow      | float (Watts)      |
| energyToday    | float (Watt-hours) |
| powerGenerated | float (Hertz)      |
| voltageDC      | float (Volts)      |
| current        | float (Amps)       |
| energyTotal    | float (Watt-hours) |
| voltageAC      | float (Volts)      |
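
As a sketch, this is how that schema might be declared in PySpark; the field names follow the table above, but mapping every float field to DoubleType (and the epoch timestamp to LongType) is an assumption, not the repo's actual code:

```python
from pyspark.sql.types import StructType, StructField, LongType, DoubleType

# Sketch of the first (solarmonj) schema as a PySpark StructType.
SOLARMONJ_SCHEMA = StructType([
    StructField("Timestamp", LongType(), False),        # seconds since epoch
    StructField("Temperature", DoubleType(), True),     # degrees C
    StructField("energyNow", DoubleType(), True),       # Watts
    StructField("energyToday", DoubleType(), True),     # Watt-hours
    StructField("powerGenerated", DoubleType(), True),  # Hertz, per the table
    StructField("voltageDC", DoubleType(), True),       # Volts
    StructField("current", DoubleType(), True),         # Amps
    StructField("energyTotal", DoubleType(), True),     # Watt-hours
    StructField("voltageAC", DoubleType(), True),       # Volts
])
```

Something like this could then be handed to `spark.read.csv(path, schema=SOLARMONJ_SCHEMA)` to parse the files with typed columns.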

Due to bugs in solarmonj, combined with occasionally marginal hardware, some rows in the first format are invalid:

```
1370752022,1.4013e-45,-0.27184,0,-0.27184,1.4013e-45,1.3703e-40,1.36638e-40,6.43869e-41
```

Those records are dropped prior to creating RDDs.
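
A minimal sketch of one way such rows could be dropped, assuming the corrupt rows are identifiable by their denormalised float values (the ~1e-40 magnitudes above); the threshold, helper name, and path are illustrative, not the repo's actual code:

```python
def looks_valid(line):
    """Hypothetical check: reject rows containing denormalised floats."""
    try:
        fields = [float(f) for f in line.split(",")]
    except ValueError:
        return False
    # Every field after the timestamp should be a sanely-sized float;
    # the corrupt rows carry magnitudes around 1e-40 and smaller.
    return not any(0 < abs(f) < 1e-30 for f in fields[1:])

# sc is an existing SparkContext; the glob is illustrative.
clean = sc.textFile("data/solarmonj/*.csv").filter(looks_valid)
```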

The second file format comes from jfy-monitor and has its own schema.

The first schema is in effect for records starting on 2013-06-04 and ending on 2018-03-26.

The second schema takes effect with the logfiles starting on 2018-03-27. Some records from 2018-03-27 and 2018-03-28 have a different set of fields, because I was updating jfyMonitor in production and breaking things; we drop those records too.
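
A hypothetical way to detect those transitional records is a simple field count; the expected count below is a placeholder, since the second schema's fields are not listed here:

```python
EXPECTED_FIELDS = 9  # placeholder: the field count of the schema in force

def has_expected_shape(line):
    # Records written while jfyMonitor was being updated in production
    # carry a different set of fields, so a count test filters them out.
    return len(line.split(",")) == EXPECTED_FIELDS
```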

We load up all the data files, and then from within a venv which has pyspark installed, we run

```
$ spark-submit /path/to/this/file (args)
```

to generate several reports:

- for each year, which month had the day with the max and min energy outputs
- for each month, what was the average energy generated
- for each month, what was the total energy generated
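
As a sketch of what those reports could look like as DataFrame aggregations, assuming a frame `df` with the columns above, and assuming energyToday is a cumulative daily counter (so its daily maximum is that day's total); none of this is the repo's actual code:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Derive a calendar date from the epoch timestamp.
df = df.withColumn("date", F.to_date(F.from_unixtime("Timestamp")))

# One row per day; energyToday's daily max is that day's total output.
daily = (df.groupBy(F.year("date").alias("year"),
                    F.month("date").alias("month"),
                    "date")
           .agg(F.max("energyToday").alias("day_energy")))

# For each year, the month(s) holding the best and worst days.
w = Window.partitionBy("year")
extremes = (daily
            .withColumn("best", F.max("day_energy").over(w))
            .withColumn("worst", F.min("day_energy").over(w))
            .filter((F.col("day_energy") == F.col("best")) |
                    (F.col("day_energy") == F.col("worst")))
            .select("year", "month", "date", "day_energy"))

# For each month, the average and total energy generated.
monthly = (daily.groupBy("year", "month")
                .agg(F.avg("day_energy").alias("avg_energy"),
                     F.sum("day_energy").alias("total_energy")))
```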

