From cd0dedb285537bee39efb181cc6dfeccfb62b9ae Mon Sep 17 00:00:00 2001 From: Preetham Gujjula Date: Thu, 4 Apr 2024 18:03:38 -0700 Subject: [PATCH] [wip] report --- README.md | 2 + apply-merge.cabal | 1 + docs/ALGORITHM.md | 172 ++++++++++++++++++++++++++++++++++++ package.yaml | 1 + src/Data/List/ApplyMerge.hs | 11 +++ 5 files changed, 187 insertions(+) create mode 100644 docs/ALGORITHM.md diff --git a/README.md b/README.md index 92dda19..aa8e1b9 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,8 @@ SPDX-License-Identifier: BSD-3-Clause ⚠️ Under construction. No stability guarantees, even commit history may be overwritten. +See [ALGORITHM.md](docs/ALGORITHM.md) for a full description of the algorithm. + ## Licensing This project licensed under BSD-3-Clause (except for `.gitignore`, which is under CC0-1.0), and follows [REUSE](https://reuse.software) licensing diff --git a/apply-merge.cabal b/apply-merge.cabal index 2ef1200..cc0b55f 100644 --- a/apply-merge.cabal +++ b/apply-merge.cabal @@ -19,6 +19,7 @@ tested-with: extra-doc-files: README.md ChangeLog.md + docs/ALGORITHM.md source-repository head type: git diff --git a/docs/ALGORITHM.md b/docs/ALGORITHM.md new file mode 100644 index 0000000..5281e6e --- /dev/null +++ b/docs/ALGORITHM.md @@ -0,0 +1,172 @@ + + +# Motivation + +## The problem +Say that we want to compute the [3-smooth numbers](https://en.wikipedia.org/wiki/Smooth_number), +i.e., the integers that can be written in the form $2^{i} 3^{j}$, where $i$ and +$j$ are non-negative integers. + +It's easy enough to compute the powers of 2 and 3: + +```haskell +powersOf2 :: [Integer] +powersOf2 = iterate (*2) 1 + +powersOf3 :: [Integer] +powersOf3 = iterate (*3) 1 +``` + +But how do we implement the following? + +```haskell +-- take 10 smooth3 == [1, 2, 3, 4, 6, 9, 12, 16, 18, 24] +smooth3 :: [Integer] +smooth3 = .. +``` + +If we limited `powersOf2` and `powersOf3`, say to the first 10 elements, we +could write: + +```haskell +powersOf2Bounded :: [Integer] +powersOf2Bounded = take 10 (iterate (*2) 1) + +powersOf3Bounded :: [Integer] +powersOf3Bounded = take 10 (iterate (*3) 1) + +smooth3Bounded :: [Integer] +smooth3Bounded = sort (liftA2 (*) powersOf2Bounded powersOf3Bounded) +``` + +And `smooth3Bounded` would consist of $\\{2^{i} 3^{j} \mid 0 \le i, j < 10\\}$. +But this doesn't work when `powersOf2` and `powersOf3` are infinite. We need a +way to lazily generate `smooth3` given `powersOf2` and `powersOf3`. + +## A closer look +Let's lay out the elements of `smooth3` in a 2D grid: + +
+    |    1     2     4     8    ..
+ ---------------------------------
+  1 |    1     2     4     8    ..
+    |
+  3 |    3     6    12    24    ..
+    |
+  9 |    9    18    36    72    ..
+    |
+ 27 |   27    54   108   216    ..
+    |
+  : |    :     :     :     :    ॱ.
+
+ +Since `(*)` is monotonically increasing in both arguments, each element is ≤ +the elements below and to the right. Therefore, when an element appears in +`smooth3`, the element above and to the left must already have appeared. + +Let's think about `smooth3` after 3 elements have been produced: + +
+    |    1     2     4     8    ..
+ ---------------------------------
+  1 |    1     2     4     8    ..
+    |
+  3 |    3     6    12    24    ..
+    |
+  9 |    9    18    36    72    ..
+    |
+ 27 |   27    54   108   216    ..
+    |
+  : |    :     :     :     :    ॱ.
+
+After producing `1, 2, 3`, the next element in `smooth3` can only be one of `{4, 6, 9}`. We know this without preforming any comparisons, just by the positions of these elements in the grid, as these are the only elements whose up- and left-neighbors have already been produced. + +After producing the next element, `4`, our grid becomes + +
+    |    1     2     4     8    ..
+ ---------------------------------
+  1 |    1     2     4     8    ..
+    |
+  3 |    3     6    12    24    ..
+    |
+  9 |    9    18    36    72    ..
+    |
+ 27 |   27    54   108   216    ..
+    |
+  : |    :     :     :     :    ॱ.
+
+ +Now, the "frontier" of possible next elements is `{6, 8, 9}`. We added `8` to the frontier as it is no longer "blocked" by `4`. Note that we cannot yet add `12` to the frontier even though it's no longer blocked by `4`, as it's still blocked by `6`. + +This investigation suggests the following algorithm for producing `smooth3`: +``` +1. Initialize a "frontier" with the top left element +2. Delete the smallest element x from the frontier and yield it as the next element of smooth3. +3. If the down- or right-neighbors of x are unblocked, add them to the frontier. +4. Go to step 2. +``` + +# The applyMerge function + +The idea in the algorithm above is quite general, and instead of using it on +`(*)`, `powersOf2`, and `powersOf3`, we can apply it to any binary function `f` +that is monotonically increasing in both arguments, and any ordered lists `xs` +and `ys`. Then we can define a general +```haskell +applyMerge :: (Ord c) => (a -> b -> c) -> [a] -> [b] -> [c] +``` +where `applyMerge f xs ys` is an ordered list of all `f x y`, where `x ∈ xs` and `y ∈ ys`. In particular, `smooth3 = applyMerge (*) powersOf2 powersOf3`. +We can also define a more general +```haskell +applyMergeBy :: (c -> c -> Ordering) -> (a -> b -> c) -> [a] -> [b] -> [c] +``` + +## Implementation and complexity +We can use a priority queue to maintain the frontier of elements. To determine in step 3 whether a down- or right-neighbor is unblocked, we maintain two `IntSet`s, for the `x`- and `y`- indices of all elements in the frontier. Then in step 3, if the `x`- and `y`-index of a neighbor are not in the respective `IntSet`s, the neighbor is unblocked and can be added to the frontier. + +After producing $n$ elements, the size of the frontier is at most $O(\sqrt{n})$ elements. (TODO: prove this) Manipulating the priority queue and the `IntSet`s to produce the next element takes $O(\text{log } \sqrt{n}) = O(\text{log } n)$ time. Therefore, producing $n$ elements of `applyMerge f xs ys` takes $O(n \log n)$ time and $O(\sqrt{n})$ auxiliary space. + +## Other applications + +`applyMerge` is versatile! +```haskell +-- Sieve of Erastosthenes +primes :: [Int] +primes = 2 : ([3..] `minus` composites) -- `minus` from data-ordlist + +composites :: [Int] +composites = applyMerge (*) primes [2..] + +-- Square-free integers +squarefrees :: [Int] +squarefrees = [1..] `minus` applyMerge (*) (map (^2) primes) [1..] + +-- Gaussian integers in the first quadrant, ordered by norm +gaussianInts :: [Complex Int] +gaussianInts = applyMergeBy (comparing (\a b -> a*a + b*b)) (:+) [1..] [1..] +``` + +# Prior work +In [data-ordlist](https://www.stackage.org/lts/package/data-ordlist), +there is +[mergeAll](https://www.stackage.org/haddock/lts/data-ordlist/Data-List-Ordered.html#v:mergeAll) :: Ord a => [[a]] -> [a], +which merges a potentially infinite list of ordered lists, where the heads of +the inner lists are sorted. This is more general than `applyMerge`, in the sense +that we can implement `applyMerge` in terms of `mergeAll`: + +```haskell +applyMerge :: (Ord c) => (a -> b -> c) -> [a] -> [b] -> [c] +applyMerge f xs ys = + mergeAll (map (\y -> map (\x -> f x y) xs) ys) +``` + +However, `mergeAll` uses $O(n)$ auxiliary space in the worst case, while our +implementation of `applyMerge` uses just $O(\sqrt{n})$ auxiliary space. + +--- + +###### Copyright Preetham Gujjula, licensed under [CC-BY-SA-4.0](../LICENSES/CC-BY-SA-4.0.txt). diff --git a/package.yaml b/package.yaml index 400c865..bdfdf00 100644 --- a/package.yaml +++ b/package.yaml @@ -13,6 +13,7 @@ copyright: "Preetham Gujjula" extra-doc-files: - README.md - ChangeLog.md +- docs/ALGORITHM.md description: Please see the README on GitHub at diff --git a/src/Data/List/ApplyMerge.hs b/src/Data/List/ApplyMerge.hs index f7808c9..dd6ff41 100644 --- a/src/Data/List/ApplyMerge.hs +++ b/src/Data/List/ApplyMerge.hs @@ -1,9 +1,20 @@ -- SPDX-FileCopyrightText: Copyright Preetham Gujjula -- SPDX-License-Identifier: BSD-3-Clause +-- | +-- Module: Data.List.ApplyMerge +-- License: BSD-3-Clause +-- Maintainer: Preetham Gujjula +-- Stability: experimental module Data.List.ApplyMerge (applyMerge) where import Data.List.ApplyMerge.IntSet qualified +-- | Given a binary function @f@ that is monotonically increasing in each +-- argument, and ordered lists @xs@ and @ys@, +-- +-- > applyMerge f xs ys +-- +-- is an ordered list of all @f x y@, where @x ∈ xs@ and @y ∈ ys@. applyMerge :: (Ord c) => (a -> b -> c) -> [a] -> [b] -> [c] applyMerge = Data.List.ApplyMerge.IntSet.applyMerge