Open a terminal and navigate to the `data_generator` folder using the following command:

`cd data_generator`

From there, run:

`docker-compose up`
This will populate the `dataset` folder with 10 artificially generated leak scenarios, which takes around 5 minutes. Once the data has been generated, the container exits automatically.
For each pipeline in the `config.ini` file, ensure that `train_model` is set to `true` so that all models are generated and trained when the application starts. This is the default when the repository is cloned directly. After the first run, `train_model` can be set to `false` to save time when testing with the same scenario.
Start the application with the following command:

`docker-compose up`
On the first run, all models are trained before detection starts. `pipeline0` and `pipeline1` train very quickly. The training progress of `pipeline2` and `pipeline3` can be monitored by following the output of the corresponding containers. As soon as training finishes, each pipeline starts the leak detection process.
Data should start flowing into the database; this can be monitored using Grafana, which is available at http://localhost:3000 with the following credentials:

- username: admin
- password: bitnami

InfluxDB can also be accessed directly at http://localhost:8086 with the following credentials:

- username: admin
- password: bitnami123
The daily operation of a water distribution network involves many tasks, such as ensuring the quality and quantity of the delivered water. A significant problem for network operators is the formation of leaks, caused for example by wear and tear of pipes.
There is already a large body of research dedicated to the challenge of detecting these leaks automatically. The tool at hand provides a way to experiment with different leak detection methods on synthesized data.
Name | Model name | Model type | Time series compatible | Leak localization | Output | Thresholding method |
---|---|---|---|---|---|---|
pipeline0 | Fault Sensitivity Matrix | Statistical | No | Yes | Correlation matrix | Static scaler |
pipeline1 | Random Forest Classifier | Machine Learning | No | No | Binary label | Non-applicable |
pipeline2 | LSTM Neural Network | Deep Learning | Yes | Yes | Flow predictions | Confidence interval |
pipeline3 | Facebook Prophet | Statistical | Yes | Yes | Flow predictions | Confidence interval |
The tool uses a number of Docker containers that communicate with each other in different ways. A visual overview of the technologies used is shown below.
The data generator uses WNTR to synthesize a dataset, which is then fed to Kafka to manage the distribution of the data.
Apache Kafka is an event streaming platform. In this project it serves to simulate the real-time streaming of data taken from the synthesized dataset. Kafka sends the data to the pipelines as messages at a set interval.
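As a rough illustration of this streaming step (not the project's actual producer code), the sketch below sends rows of a pandas DataFrame as JSON messages at a fixed interval using the kafka-python client; the broker address, topic name and file path are assumptions.

```python
# Illustrative sketch only: streams rows of a CSV as Kafka messages at a fixed
# interval. The broker address, topic name and file path are hypothetical.
import json
import time

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker
    value_serializer=lambda v: json.dumps(v, default=str).encode("utf-8"),
)

df = pd.read_csv("dataset/scenario-1/pressures.csv")          # hypothetical path
for _, row in df.iterrows():
    producer.send("sensor-data", row.to_dict())               # hypothetical topic
    time.sleep(0.5)                                           # message_frequency

producer.flush()
```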
InfluxDB serves as the data sink in this project. Predictions from the pipelines are saved there, along with other data such as the performance over time and the generated flow and pressure data from the data generator.
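Purely as an illustration of what a pipeline's write to the sink might look like (assuming InfluxDB 2.x and the influxdb-client library; the token, organisation, bucket and field names below are hypothetical, not taken from this project):

```python
# Hypothetical sketch of writing one prediction point with influxdb-client;
# token, org, bucket, measurement and field names are placeholders.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086",
                        token="example-token",                # placeholder
                        org="example-org")                    # placeholder
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("predictions")                                      # placeholder measurement
    .tag("pipeline", "pipeline2")
    .field("leak_detected", 1)
    .field("predicted_flow", 42.7)
)
write_api.write(bucket="leak_detection", record=point)        # placeholder bucket
```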
Grafana is our main tool for visualizing the data. Provisioned dashboards are set up within Grafana, enabling the user to view the generated predictions in real time. There is also a separate dashboard for viewing the performance of the algorithms over time.
`wdn_input_file_name` - Name of the file to use as the input network. Input network files should be kept in the `wdn_input_files` folder.

`message_frequency` - This sets the delay for streaming the data using Kafka. If it is set to 0.5, pressure and flow data will be sent every 0.5 seconds.

`scenario_name` - Name of the scenario to use. This is also the name used for data synthesis, so it should be set in line with the `scenario_path` value in most cases.

`experiment_start_time` - This is the time from which Kafka should start streaming the data.

`scenario_path` - Path of the scenario that is used for all the experiments.
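The exact section layout of config.ini is not shown here; as a hedged sketch, a pipeline could read these values with Python's configparser roughly as follows (the section name is an assumption, the keys are those listed above):

```python
# Sketch of reading the streaming settings from config.ini with configparser;
# the section name "kafka" is an assumption, the keys match those listed above.
import configparser

config = configparser.ConfigParser()
config.read("config.ini")

section = config["kafka"]                                     # assumed section name
wdn_input_file_name = section["wdn_input_file_name"]
message_frequency = section.getfloat("message_frequency")     # e.g. 0.5 seconds
scenario_name = section["scenario_name"]
experiment_start_time = section["experiment_start_time"]
scenario_path = section["scenario_path"]
```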
The data is synthesized using WNTR, which is built upon EPANET, the industry standard for water distribution network simulation. A short synthesis sketch is shown after the parameter list below.
`demand_input_file_path` - This is the path to the file that serves as the demand pattern for the synthesized scenario.

`simulation_start_time` - A date from which the simulation starts.

`train_start` - The date from which the training set starts.

`train_end` - The date that denotes the end of the training set.

`val_start` - The date from which the validation set starts.

`val_end` - The date that denotes the end of the validation set.

`test_start` - The date from which the test set starts.

`test_end` - The date that denotes the end of the test set.

`leak_diameter` - Size of the leak; LeakDB takes this to be in the range [0.02, 0.2).

`skip_nodes` - A list of node names describing which node data to leave out of the final file. This is helpful since reservoir nodes provide no useful data in terms of pressure and thus pollute the dataset.

`synthesize_data` - Whether to freshly synthesize data. If this is set to true, a new dataset will be generated upon starting the tool.
`is_leak_scenario` - Whether the synthesized dataset should be a leak scenario or not.

`leak_node` - The node at which the leak should occur.
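To give an idea of what the generator does with these settings, here is a minimal, hedged sketch of synthesizing a single leak scenario with WNTR; the input file, node name and times are placeholders rather than project defaults.

```python
# Minimal sketch of a WNTR leak scenario; file name, node name and times are
# placeholders, not values taken from this project's configuration.
import math

import wntr

wn = wntr.network.WaterNetworkModel("wdn_input_files/network.inp")  # wdn_input_file_name

# WNTR expects a leak area, so convert from a leak diameter in the
# LeakDB range [0.02, 0.2).
leak_diameter = 0.05
leak_area = math.pi * (leak_diameter / 2) ** 2

node = wn.get_node("J-10")                        # hypothetical leak_node
node.add_leak(wn, area=leak_area,
              start_time=2 * 24 * 3600,           # leak starts on day 2
              end_time=5 * 24 * 3600)             # and ends on day 5

# The pressure-dependent WNTRSimulator supports the leak model.
sim = wntr.sim.WNTRSimulator(wn)
results = sim.run_sim()

pressure = results.node["pressure"]               # DataFrame: time x node
flow = results.link["flowrate"]                   # DataFrame: time x link
```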
In total there are four implemented leak detection methods, each with its own Docker container. A description of the different methods follows.
This pipeline uses a method based on a paper by Puig et al. A leak scenario is simulated for each node to generate a pressure signature for a leak at that node. We then construct a fault sensitivity matrix from the signatures of all the nodes. Finally, to determine whether there is a leak, we compute the correlation between the current pressure signature and the fault sensitivity matrix. If the correlation exceeds a certain threshold, the algorithm classifies the current time as a leak.
`train_model` - Determines whether the algorithm should be trained again.

`correlation_threshold` - This is used as the threshold for the correlation.
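This is not the project's implementation, but the final correlate-and-threshold step could look roughly like the following sketch, assuming the fault sensitivity matrix has already been built from the per-node leak simulations (one column per candidate leak node):

```python
# Hedged sketch of the correlation step; `sensitivity_matrix` is assumed to
# hold one simulated leak signature per candidate node (columns).
import numpy as np

def detect_leak(residual, sensitivity_matrix, correlation_threshold=0.8):
    """Correlate the current pressure residual with each column of the fault
    sensitivity matrix; flag a leak (and its most likely node) if the best
    correlation exceeds the threshold."""
    correlations = np.array([
        np.corrcoef(residual, sensitivity_matrix[:, j])[0, 1]
        for j in range(sensitivity_matrix.shape[1])
    ])
    best = int(np.argmax(correlations))
    return correlations[best] > correlation_threshold, best

# Toy usage with random numbers standing in for simulated signatures.
rng = np.random.default_rng(0)
fsm = rng.normal(size=(32, 5))            # 32 sensors x 5 candidate leak nodes
residual = fsm[:, 2] + rng.normal(scale=0.1, size=32)
print(detect_leak(residual, fsm, correlation_threshold=0.8))
```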
A common approach seen in the literature is the use of a machine learning classifier for the detection of leaks. This pipeline is dedicated to that type of approach. We have a basic `RandomForestClassifier` from the `sklearn` library implemented here. The idea is to train the classifier on time-series data from a leak scenario, treating each time point as a sample for classification. The classifier is fed binary labels which simply reflect whether there is a leak or not.
`train_scenario_path` - This is used as the path for the training set. This should be set to a different scenario than the one we are experimenting with, since we want the model to be tested on novel data.
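As a hedged sketch of this approach (the feature layout, file paths and label column below are assumptions, not the project's actual preprocessing):

```python
# Sketch of the classifier approach with scikit-learn; file paths and the
# column layout (sensor readings plus a binary "leak" column) are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# One row per time step: pressures/flows as features, binary leak label.
train = pd.read_csv("dataset/train_scenario/features.csv")    # hypothetical path
test = pd.read_csv("dataset/test_scenario/features.csv")      # hypothetical path

X_train, y_train = train.drop(columns=["leak"]), train["leak"]
X_test, y_test = test.drop(columns=["leak"]), test["leak"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))
```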
This pipeline uses a method based on a paper by Lee and Yoo. The method involves predicting flow using one of the state-of-the-art methods for time-series prediction: a long short-term memory (LSTM) neural network. The flow is first predicted by the network; then, based on the performance of the network on the validation set, we generate a confidence interval for the prediction. If the measured flow falls outside the confidence interval of the prediction, we say there is a leak.
`train_model` - Determines whether the algorithm should be trained again.

`train_scenario_path` - This is used as the path for the training set. This should be set to a different scenario than the one we are experimenting with, since we want the model to be tested on novel data.
`z_value` - This is our z-value for calculating the confidence interval. A larger z-value means a larger confidence interval.

`sequence_length` - This is our look-back for the time-series prediction. If it is 3, we look at the 3 previous values to predict the current value.

`sampling_rate` - This determines how often we sample to get our series for prediction. If it is 48, we sample every 48 values. The data synthesizer generates half-hourly data, so a `sampling_rate` of 48 corresponds to one sample per day, meaning we predict based on the values from the previous `sequence_length` days.
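For orientation, here is a minimal sketch of the prediction-plus-interval idea using Keras; the layer sizes, training settings and synthetic series are illustrative assumptions, not this pipeline's actual model.

```python
# Hedged sketch of the LSTM prediction + confidence interval idea using Keras;
# network size, training settings and the synthetic series are illustrative.
import numpy as np
import tensorflow as tf

sequence_length = 3        # look-back, as in the sequence_length setting
z_value = 1.96             # as in the z_value setting

def make_windows(series, length):
    """Turn a 1-D series into (samples, look-back, 1) windows and targets."""
    X = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    return X[..., np.newaxis], y

# Placeholder series; in the tool these would be flows from the dataset.
rng = np.random.default_rng(0)
train_flow = np.sin(np.linspace(0, 60, 2000)) + rng.normal(scale=0.05, size=2000)
val_flow = np.sin(np.linspace(60, 75, 500)) + rng.normal(scale=0.05, size=500)

X_train, y_train = make_windows(train_flow, sequence_length)
X_val, y_val = make_windows(val_flow, sequence_length)

model = tf.keras.Sequential([tf.keras.layers.LSTM(32), tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=5, verbose=0)

# Confidence interval width derived from the validation residuals; a measured
# flow outside [prediction - interval, prediction + interval] counts as a leak.
residuals = y_val - model.predict(X_val, verbose=0).ravel()
interval = z_value * residuals.std()
print(f"half-width of the confidence interval: {interval:.3f}")
```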
This pipeline uses Facebook's Prophet model for flow prediction. We also use the model's built-in confidence interval values to generate the thresholds for leak detection.
`train_model` - Determines whether the algorithm should be trained again.
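A minimal sketch of this idea with the Prophet library follows; the series, interval width and forecast horizon are illustrative assumptions, not this pipeline's actual settings.

```python
# Hedged sketch of Prophet-based flow prediction with its built-in interval;
# the series, interval width and horizon are placeholders.
import numpy as np
import pandas as pd
from prophet import Prophet

# Prophet expects a DataFrame with columns "ds" (timestamp) and "y" (value).
timestamps = pd.date_range("2021-01-01", periods=48 * 30, freq="30min")
history = pd.DataFrame({
    "ds": timestamps,
    "y": 10 + np.sin(np.arange(len(timestamps)) * 2 * np.pi / 48),  # fake flow
})

model = Prophet(interval_width=0.95)       # controls the confidence interval
model.fit(history)

future = model.make_future_dataframe(periods=48, freq="30min")
forecast = model.predict(future)

# Measured flow outside [yhat_lower, yhat_upper] would be flagged as a leak.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(3))
```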