Skip to content
This repository has been archived by the owner on Apr 26, 2022. It is now read-only.
/ wmms Public archive

Weighted maximum-mean subtrees

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

ukgovdatascience/wmms

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

wmms

Prune a tree to be the most profitable possible subtree that is still connected to the root node. If every node has two numeric values, a and b, the most profitable subtree will be the one where sum(a) / sum(b) over all remaining nodes is the maximum it could possibly be. Minimising can be done by subtracting the a values from zero.

This package implements the algorithm in Carlson J., Eppstein D. (2006) The Weighted Maximum-Mean Subtree and Other Bicriterion Subtree Problems. In: Arge L., Freivalds R. (eds) Algorithm Theory – SWAT 2006. SWAT 2006. Lecture Notes in Computer Science, vol 4059. Springer, Berlin, Heidelberg.

Pure base R is used, yet the package is reasonably quick and has been used to find optimal trees for every LSOA11 (Lower Layer Super Output Area) in Wales in less than a minute on a standard laptop.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ukgovdatascience/wmms")

Example

Here is an example of building a broadband network out from an existing node. The cost of extending the network to each node is a function of its distance from its parent. We want to minimise the cost of connecting each node, when averaged over all the nodes that are connected.

In this graph, the root node in the centre has no value, because it already exists. The node beside it has the value 10, because its distance from the root is 10. The b value of all nodes is 1, so that the algorithm will find an unweighted mean.

Create this graph by making a data frame of edges, one row per edge, with the columns from and to referring to node IDs, and a and b giving the values of the to node.

(One way to obtain such an edgelist from geographical coordinates of each node is the emstreeR package, which calculates a Euclidean Minimum Spanning Tree.)

library(wmms)

# This graph was invented by Jo Keefe at the UK Department for Digital, Culture,
# Media and Sport.
edges <- data.frame(from = c(1, rep(2:7, each = 5)),
                    to = c(2, 3:32),
                    a = c(10, rep(1:5, 6)),
                    b = 1)
edges
#>    from to  a b
#> 1     1  2 10 1
#> 2     2  3  1 1
#> 3     2  4  2 1
#> 4     2  5  3 1
#> 5     2  6  4 1
#> 6     2  7  5 1
#> 7     3  8  1 1
#> 8     3  9  2 1
#> 9     3 10  3 1
#> 10    3 11  4 1
#> 11    3 12  5 1
#> 12    4 13  1 1
#> 13    4 14  2 1
#> 14    4 15  3 1
#> 15    4 16  4 1
#> 16    4 17  5 1
#> 17    5 18  1 1
#> 18    5 19  2 1
#> 19    5 20  3 1
#> 20    5 21  4 1
#> 21    5 22  5 1
#> 22    6 23  1 1
#> 23    6 24  2 1
#> 24    6 25  3 1
#> 25    6 26  4 1
#> 26    6 27  5 1
#> 27    7 28  1 1
#> 28    7 29  2 1
#> 29    7 30  3 1
#> 30    7 31  4 1
#> 31    7 32  5 1

Now prune the tree to obtain the optimum possible average value over all nodes. In this example, the values a are made negative, so that a/b will be minimised rather than maximised.

root_id <- 1

edges$a <- 0 - edges$a

maximum <- weighted_maximum_mean_subtrees(edges, root_id)

maximum$a <- 0 - maximum$a
maximum$value <- 0 - maximum$value

maximum
#>    from to  a b depth ancestor    value
#> 1     1  2 10 1     1        2 2.461538
#> 2     2  3  1 1     2        2 2.461538
#> 3     2  4  2 1     2        2 2.461538
#> 4     2  5  3 1     2        2 2.461538
#> 5     2  6  4 1     2        2 2.461538
#> 6     2  7  5 1     2        7 2.666667
#> 7     3  8  1 1     3        2 2.461538
#> 8     3  9  2 1     3        2 2.461538
#> 9     3 10  3 1     3       10 3.000000
#> 10    3 11  4 1     3       11 4.000000
#> 11    3 12  5 1     3       12 5.000000
#> 12    4 13  1 1     3        2 2.461538
#> 13    4 14  2 1     3        2 2.461538
#> 14    4 15  3 1     3       15 3.000000
#> 15    4 16  4 1     3       16 4.000000
#> 16    4 17  5 1     3       17 5.000000
#> 17    5 18  1 1     3        2 2.461538
#> 18    5 19  2 1     3        2 2.461538
#> 19    5 20  3 1     3       20 3.000000
#> 20    5 21  4 1     3       21 4.000000
#> 21    5 22  5 1     3       22 5.000000
#> 22    6 23  1 1     3        2 2.461538
#> 23    6 24  2 1     3        2 2.461538
#> 24    6 25  3 1     3       25 3.000000
#> 25    6 26  4 1     3       26 4.000000
#> 26    6 27  5 1     3       27 5.000000
#> 27    7 28  1 1     3        7 2.666667
#> 28    7 29  2 1     3        7 2.666667
#> 29    7 30  3 1     3       30 3.000000
#> 30    7 31  4 1     3       31 4.000000
#> 31    7 32  5 1     3       32 5.000000

Plot the maximum subtree, and all subtrees connected to it. The blue nodes are part of the maximum subtree. Their average cost is 2.46, which is lower than the original value of the top node (10). The cost of appending any other nodes is higher than 2.46. The pink nodes comprise another subtree of nodes that have been averaged together, to achieve the value 2.67, which is lower than the original value of their top node (5).

Algorithm

A worked example is in a Google Sheet.

It is also explained in the docs ?algorithm in the file R/algorithm.R.

About

Weighted maximum-mean subtrees

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages