
Commit

chore: update farthest points freeze files
jolars committed Aug 29, 2023
1 parent 096c880 commit 2053ec9
Showing 1 changed file with 1 addition and 1 deletion.
@@ -1,7 +1,7 @@
{
"hash": "743ce517d97a222ec7a15a4147f32616",
"result": {
"markdown": "---\ntitle: Finding the Farthest Points in a Point Cloud\ndate: \"2016-10-30\"\nauthor: \"Johan Larsson\"\ndescription: |\n An algorithm for finding the most separated points in a cloud of\n points.\ncategories:\n - r\n - qualpalr\n - data visualization\n - geometry\nimage: farthest-points.png\n---\n\n\nMy R package [qualpalr](https://github.com/jolars/qualpalr) selects qualitative\ncolors by projecting a bunch of colors (as points) to the three-dimensional\nDIN99d color space wherein the distance between any pair colors approximate\ntheir differences in appearance. The package then tries to choose the `n`\ncolors so that the minimal pairwise distance among them is maximized, that is,\nwe want the most similar pair of colors to be as dissimilar as possible.\n\nThis turns out to be much less trivial that one would suspect, which posts on\n[Computational Science](http://scicomp.stackexchange.com/questions/20030/selecting-most-scattered-points-from-a-set-of-points),\n[MATLAB Central](https://se.mathworks.com/matlabcentral/answers/42622-how-can-i-choose-a-subset-of-k-points-the-farthest-apart), \n[Stack Overflow](http://stackoverflow.com/questions/27971223/finding-largest-minimum-distance-among-k-objects-in-n-possible-distinct-position), and\nand [Computer Science](http://cs.stackexchange.com/questions/22767/choosing-a-subset-to-maximize-the-minimum-distance-between-points)\ncan attest to.\n\nUp til now, qualpalr solved this problem with a greedy approach. If we, for instance,\nwant to find `n` points we did the following.\n\n```\nM <- Compute a distance matrix of all points in the sample\nX <- Select the two most distant points from M\nfor i = 3:n\n X(i) <- Select point in M that maximize the\n mindistance to all points in X\n```\n\nIn R, this code looked like this (in two dimensions):\n\n\n::: {.cell layout-ncol=\"2\" fig.lab='greedy-approach'}\n\n```{.r .cell-code}\nset.seed(1)\n# find n points\nn <- 3\nmat <- matrix(runif(100), ncol = 2)\n\ndmat <- as.matrix(stats::dist(mat))\nind <- integer(n)\nind[1:2] <- as.vector(arrayInd(which.max(dmat), .dim = dim(dmat)))\n\nfor (i in 3:n) {\n mm <- dmat[ind, -ind, drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n ind[i] <- as.numeric(dimnames(mm)[[2]][k])\n}\n\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\n```\n\n::: {.cell-output-display}\n![Point cloud](index_files/figure-html/unnamed-chunk-1-1.png){width=384}\n:::\n\n```{.r .cell-code}\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\npoints(mat[ind, ], pch = 19)\ntext(mat[ind, ], adj = c(0, -1.5))\n```\n\n::: {.cell-output-display}\n![Points selected by the greedy algorithm](index_files/figure-html/unnamed-chunk-1-2.png){width=384}\n:::\n:::\n\n\nWhile this greedy procedure is fast and works well for large values of `n`\nit is quite inefficient in the example above. It is plain to see that there are\nother subsets of 3 points that would have a larger minimum distance but because\nwe base our selection on the previous 2 points that were selected to be\nmaximally distant, the algorithm has to pick a suboptimal third point. The\nminimum distance in our example is 0.7641338.\n\nThe solution I came up with is based on a solution from\nSchlomer et al. [@schlomer2011] who devised of an algorithm to\npartition a sets of points into subsets whilst maximizing the minimal distance.\nThey used [delaunay triangulations](https://en.wikipedia.org/wiki/Delaunay_triangulation)\nbut I decided to simply use the distance matrix instead. 
The algorithm works as\nfollows.\n\n```\nM <- Compute a distance matrix of all points in the sample\nS <- Sample n points randomly from M\nrepeat\n for i = 1:n\n M <- Add S(i) back into M\n S(i) <- Find point in M\\S with max mindistance to any point in S\n until M did not change\n```\n\nIteratively, we put one point from our candidate subset (S) back into the\noriginal se (M) and check all distances between the points in S to those in\nM to find the point with the highest minimum distance. Rinse and repeat until\nwe are only putting back the same points we started the loop with, which\nalways happens. Let's see how this works on the same data set we used above.\n\n\n\n::: {.cell layout-ncol=\"2\" fig.lab='new-approach'}\n\n```{.r .cell-code}\nr <- sample.int(nrow(dmat), n)\n\nrepeat {\n r_old <- r\n for (i in 1:n) {\n mm <- dmat[r[-i], -r[-i], drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n r[i] <- as.numeric(dimnames(mm)[[2]][k])\n }\n if (identical(r_old, r)) break\n}\n\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\n```\n\n::: {.cell-output-display}\n![Point cloud](index_files/figure-html/unnamed-chunk-2-1.png){width=384}\n:::\n\n```{.r .cell-code}\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\npoints(mat[r, ], pch = 19)\ntext(mat[r, ], adj = c(0, -1.5))\n```\n\n::: {.cell-output-display}\n![Points selected by the new algorithm](index_files/figure-html/unnamed-chunk-2-2.png){width=384}\n:::\n:::\n\n\nHere, we end up with a minimum distance of 0.8619587. In\nqualpalr, this means that we now achieve slightly more distinct colors.\n\n## Performance\n\nThe new algorithm is slightly slower than the old, greedy approach and slightly\nmore verbose\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf_greedy <- function(data, n) {\n dmat <- as.matrix(stats::dist(data))\n ind <- integer(n)\n ind[1:2] <- as.vector(arrayInd(which.max(dmat), .dim = dim(dmat)))\n for (i in 3:n) {\n mm <- dmat[ind, -ind, drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n ind[i] <- as.numeric(dimnames(mm)[[2]][k])\n }\n ind\n}\n\nf_new <- function(dat, n) {\n dmat <- as.matrix(stats::dist(data))\n r <- sample.int(nrow(dmat), n)\n repeat {\n r_old <- r\n for (i in 1:n) {\n mm <- dmat[r[-i], -r[-i], drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n r[i] <- as.numeric(dimnames(mm)[[2]][k])\n }\n if (identical(r_old, r)) return(r)\n }\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nn <- 5\ndata <- matrix(runif(900), ncol = 3)\nmicrobenchmark::microbenchmark(\n f_greedy(data, n),\n f_new(data, n),\n times = 1000L\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nUnit: milliseconds\n expr min lq mean median uq max neval\n f_greedy(data, n) 1.282439 1.329496 1.594981 1.339126 1.363927 45.94741 1000\n f_new(data, n) 1.570062 1.890765 2.392704 2.134552 2.403589 37.93555 1000\n cld\n a \n b\n```\n:::\n:::\n\n\nThe newest development version of qualpalr now uses this updated algorithm which\nhas also been generalized and included as a new function in my R \npackage [euclidr](https://github.com/jolars/euclidr) called `farthest_points`.\n\n## References\n\n",
"markdown": "---\ntitle: Finding the Farthest Points in a Point Cloud\ndate: \"2016-10-30\"\nauthor: \"Johan Larsson\"\ndescription: |\n An algorithm for finding the most separated points in a cloud of\n points.\ncategories:\n - r\n - qualpalr\n - data visualization\n - geometry\nimage: farthest-points.png\n---\n\n\nMy R package [qualpalr](https://github.com/jolars/qualpalr) selects qualitative\ncolors by projecting a bunch of colors (as points) to the three-dimensional\nDIN99d color space wherein the distance between any pair colors approximate\ntheir differences in appearance. The package then tries to choose the `n`\ncolors so that the minimal pairwise distance among them is maximized, that is,\nwe want the most similar pair of colors to be as dissimilar as possible.\n\nThis turns out to be much less trivial that one would suspect, which posts on\n[Computational Science](http://scicomp.stackexchange.com/questions/20030/selecting-most-scattered-points-from-a-set-of-points),\n[MATLAB Central](https://se.mathworks.com/matlabcentral/answers/42622-how-can-i-choose-a-subset-of-k-points-the-farthest-apart), \n[Stack Overflow](http://stackoverflow.com/questions/27971223/finding-largest-minimum-distance-among-k-objects-in-n-possible-distinct-position), and\nand [Computer Science](http://cs.stackexchange.com/questions/22767/choosing-a-subset-to-maximize-the-minimum-distance-between-points)\ncan attest to.\n\nUp til now, qualpalr solved this problem with a greedy approach. If we, for instance,\nwant to find `n` points we did the following.\n\n```\nM <- Compute a distance matrix of all points in the sample\nX <- Select the two most distant points from M\nfor i = 3:n\n X(i) <- Select point in M that maximize the\n mindistance to all points in X\n```\n\nIn R, this code looked like this (in two dimensions):\n\n\n::: {.cell layout-ncol=\"2\" fig.lab='greedy-approach'}\n\n```{.r .cell-code}\nset.seed(1)\n# find n points\nn <- 3\nmat <- matrix(runif(100), ncol = 2)\n\ndmat <- as.matrix(stats::dist(mat))\nind <- integer(n)\nind[1:2] <- as.vector(arrayInd(which.max(dmat), .dim = dim(dmat)))\n\nfor (i in 3:n) {\n mm <- dmat[ind, -ind, drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n ind[i] <- as.numeric(dimnames(mm)[[2]][k])\n}\n\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\n```\n\n::: {.cell-output-display}\n![Point cloud](index_files/figure-html/unnamed-chunk-1-1.png){width=384}\n:::\n\n```{.r .cell-code}\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\npoints(mat[ind, ], pch = 19)\ntext(mat[ind, ], adj = c(0, -1.5))\n```\n\n::: {.cell-output-display}\n![Points selected by the greedy algorithm](index_files/figure-html/unnamed-chunk-1-2.png){width=384}\n:::\n:::\n\n\nWhile this greedy procedure is fast and works well for large values of `n`\nit is quite inefficient in the example above. It is plain to see that there are\nother subsets of 3 points that would have a larger minimum distance but because\nwe base our selection on the previous 2 points that were selected to be\nmaximally distant, the algorithm has to pick a suboptimal third point. The\nminimum distance in our example is 0.7641338.\n\nThe solution I came up with is based on a solution from\nSchlomer et al. [@schlomer2011] who devised of an algorithm to\npartition a sets of points into subsets whilst maximizing the minimal distance.\nThey used [delaunay triangulations](https://en.wikipedia.org/wiki/Delaunay_triangulation)\nbut I decided to simply use the distance matrix instead. 
The algorithm works as\nfollows.\n\n```\nM <- Compute a distance matrix of all points in the sample\nS <- Sample n points randomly from M\nrepeat\n for i = 1:n\n M <- Add S(i) back into M\n S(i) <- Find point in M\\S with max mindistance to any point in S\n until M did not change\n```\n\nIteratively, we put one point from our candidate subset (S) back into the\noriginal se (M) and check all distances between the points in S to those in\nM to find the point with the highest minimum distance. Rinse and repeat until\nwe are only putting back the same points we started the loop with, which\nalways happens. Let's see how this works on the same data set we used above.\n\n\n\n::: {.cell layout-ncol=\"2\" fig.lab='new-approach'}\n\n```{.r .cell-code}\nr <- sample.int(nrow(dmat), n)\n\nrepeat {\n r_old <- r\n for (i in 1:n) {\n mm <- dmat[r[-i], -r[-i], drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n r[i] <- as.numeric(dimnames(mm)[[2]][k])\n }\n if (identical(r_old, r)) break\n}\n\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\n```\n\n::: {.cell-output-display}\n![Point cloud](index_files/figure-html/unnamed-chunk-2-1.png){width=384}\n:::\n\n```{.r .cell-code}\nplot(mat, asp = 1, xlab = \"\", ylab = \"\")\npoints(mat[r, ], pch = 19)\ntext(mat[r, ], adj = c(0, -1.5))\n```\n\n::: {.cell-output-display}\n![Points selected by the new algorithm](index_files/figure-html/unnamed-chunk-2-2.png){width=384}\n:::\n:::\n\n\nHere, we end up with a minimum distance of 0.8619587. In\nqualpalr, this means that we now achieve slightly more distinct colors.\n\n## Performance\n\nThe new algorithm is slightly slower than the old, greedy approach and slightly\nmore verbose\n\n\n::: {.cell}\n\n```{.r .cell-code}\nf_greedy <- function(data, n) {\n dmat <- as.matrix(stats::dist(data))\n ind <- integer(n)\n ind[1:2] <- as.vector(arrayInd(which.max(dmat), .dim = dim(dmat)))\n for (i in 3:n) {\n mm <- dmat[ind, -ind, drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n ind[i] <- as.numeric(dimnames(mm)[[2]][k])\n }\n ind\n}\n\nf_new <- function(dat, n) {\n dmat <- as.matrix(stats::dist(data))\n r <- sample.int(nrow(dmat), n)\n repeat {\n r_old <- r\n for (i in 1:n) {\n mm <- dmat[r[-i], -r[-i], drop = FALSE]\n k <- which.max(mm[(1:ncol(mm) - 1) * nrow(mm) + max.col(t(-mm))])\n r[i] <- as.numeric(dimnames(mm)[[2]][k])\n }\n if (identical(r_old, r)) return(r)\n }\n}\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nn <- 5\ndata <- matrix(runif(900), ncol = 3)\nmicrobenchmark::microbenchmark(\n f_greedy(data, n),\n f_new(data, n),\n times = 1000L\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nUnit: milliseconds\n expr min lq mean median uq max neval\n f_greedy(data, n) 1.299801 1.352480 1.707447 1.405463 1.601421 16.06815 1000\n f_new(data, n) 1.618153 2.012054 2.597314 2.255840 2.666066 43.37420 1000\n cld\n a \n b\n```\n:::\n:::\n\n\nThe newest development version of qualpalr now uses this updated algorithm which\nhas also been generalized and included as a new function in my R \npackage [euclidr](https://github.com/jolars/euclidr) called `farthest_points`.\n\n## References\n\n",
"supporting": [
"index_files"
],
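The post embedded above quotes the minimum pairwise distances achieved by the two selections (0.7641338 for the greedy subset, 0.8619587 after the exchange step). Below is a minimal R sketch of how such a figure can be checked against the frozen code; the helper name `min_pairwise_dist` is illustrative only and not part of qualpalr, euclidr, or the post.

```r
# Minimum pairwise (Euclidean) distance within a selected subset of rows:
# `x` is the point matrix, `idx` the row indices of the selected points.
min_pairwise_dist <- function(x, idx) {
  d <- as.matrix(stats::dist(x[idx, , drop = FALSE]))
  min(d[upper.tri(d)])
}

# Assuming `mat`, `ind` (greedy) and `r` (exchange algorithm) from the post's code:
# min_pairwise_dist(mat, ind)  # reported as 0.7641338
# min_pairwise_dist(mat, r)    # reported as 0.8619587
```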
