Update documentation based on feedback from meooow
Update documentation based on feedback from meooow on the Haskell
Discourse thread at [1]:
  * Add reference to a similar algorithm by Steven Skiena.
  * Add note about unexpected memory usage in some cases.

Also fix a typo "preforming" -> "performing".

[1]: https://discourse.haskell.org/t/apply-merge-lift-a-binary-increasing-function-onto-ordered-lists-and-produce-ordered-output/9269/4
pgujjula committed Apr 12, 2024
1 parent 7d8bbe2 commit e54673c
Showing 2 changed files with 49 additions and 3 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -24,6 +24,9 @@
an ordered list of all `f x y`, for each `x` in `xs` and `y` in `ys`.

Producing $n$ elements of `applyMerge f xs ys` takes $O(n \log n)$ time and
$O(\sqrt{n})$ auxiliary space, assuming that `f` and `compare` take $O(1)$ time.
See
[docs/ALGORITHM.md#note-about-memory-usage](docs/ALGORITHM.md#note-about-memory-usage)
for caveats.

## Examples

@@ -72,8 +75,8 @@
from the idea that this function is equivalent to `sort (liftA2 f xs ys)` when

## Further reading

-See [ALGORITHM.md](docs/ALGORITHM.md) for a full exposition of the `applyMerge`
-function and its implementation.
+See [docs/ALGORITHM.md](docs/ALGORITHM.md) for a full exposition of the
+`applyMerge` function and its implementation.

## Licensing

45 changes: 44 additions & 1 deletion docs/ALGORITHM.md
@@ -84,7 +84,7 @@
Let's think about `smooth3` after 3 elements have been produced:
</pre>

After producing `1, 2, 3`, the next element in `smooth3` can only be one of
-`{4, 6, 9}`. We know this without preforming any comparisons, just by the
+`{4, 6, 9}`. We know this without performing any comparisons, just by the
positions of these elements in the grid, as these are the only elements whose
up- and left-neighbors have already been produced.

@@ -166,6 +166,20 @@
$O(\log \sqrt{n}) = O(\log n)$ time. Therefore, producing $n$
elements of `applyMerge f xs ys` takes $O(n \log n)$ time and $O(\sqrt{n})$
auxiliary space, assuming that `f` and `compare` take $O(1)$ time.

### Note about memory usage

Note that `applyMerge` retains the input lists in memory, which could cause
unexpected memory usage when the input lists are lazily generated. For example,
```
sum (take n (applyMerge const [1 :: Int ..] [1 :: Int ..]))
```
requires retaining the first $n$ elements of the second list, and so uses $O(n)$
space. Contrast this with
```
sum (take n (applyMerge (+) [1 :: Int ..] [1 :: Int ..]))
```
which requires retaining the first $O(\sqrt{n})$ elements of each list, and uses
$O(\sqrt{n})$ space.

## More examples

With `applyMerge`, we can implement a variety of complex algorithms succinctly.
@@ -198,6 +212,8 @@
squarefrees = [1..] `minus` applyMerge (*) (map (^2) primes) [1..]

# Prior work

## mergeAll from data-ordlist

In <code>[data-ordlist](https://www.stackage.org/lts/package/data-ordlist)</code>,
there is <code>[mergeAll](https://www.stackage.org/haddock/lts/data-ordlist/Data-List-Ordered.html#v:mergeAll) :: Ord a => [[a]] -> [a]</code>,
which merges a potentially infinite list of ordered lists, where the heads of
@@ -213,4 +229,31 @@
applyMerge f xs ys =
However, `mergeAll` uses $O(n)$ auxiliary space in the worst case, while our
implementation of `applyMerge` uses just $O(\sqrt{n})$ auxiliary space.
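To make the reduction concrete, here is a minimal, self-contained sketch of this equivalence. The primed names `mergeAll'` and `applyMerge'` are our own, and `mergeAll'` is a naive stand-in for data-ordlist's `mergeAll` (it matches its semantics on heads-sorted input, not its implementation or its performance):

```haskell
-- Naive stand-in for data-ordlist's mergeAll. It assumes the heads of the
-- inner lists are already sorted, which lets it emit the first head
-- immediately and stay productive on infinite inputs.
mergeAll' :: Ord a => [[a]] -> [a]
mergeAll' []                 = []
mergeAll' ([] : rest)        = mergeAll' rest
mergeAll' ((x : xs) : rest)  = x : merge xs (mergeAll' rest)
  where
    merge as@(a : as') bs@(b : bs')
      | a <= b    = a : merge as' bs
      | otherwise = b : merge as bs'
    merge as [] = as
    merge [] bs = bs

-- applyMerge expressed via mergeAll: one inner list per x. Each inner list
-- is sorted because f is non-decreasing in its second argument, and the
-- heads f x (head ys) are sorted because f is non-decreasing in its first.
applyMerge' :: Ord c => (a -> b -> c) -> [a] -> [b] -> [c]
applyMerge' f xs ys = mergeAll' (map (\x -> map (f x) ys) xs)
```

For example, `take 10 (applyMerge' (*) [1 ..] [1 ..])` produces `[1,2,2,3,3,4,4,4,5,5]`, the products `x * y` in sorted order.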

## Skiena's algorithm

In [The Algorithm Design Manual](https://doi.org/10.1007%2F978-1-84800-070-4_4),
Steven Skiena describes an algorithm for minimizing the sum of two airline
ticket fares:

> “Got it!” I said. “We will keep track of index pairs in a priority queue,
> with the sum of the fare costs as the key for the pair. Initially we put only
> pair (1, 1) on the queue. If it proves it is not feasible, we put its two
> successors on—namely (1, 2) and (2, 1). In general, we enqueue pairs
> (i + 1, j) and (i, j + 1) after evaluating/rejecting pair (i, j). We will get
> through all the pairs in the right order if we do so.”
>
> The gang caught on quickly. “Sure. But what about duplicates? We will
> construct pair (x, y) two different ways, both when expanding (x − 1, y) and
> (x, y − 1).”
>
> “You are right. We need an extra data structure to guard against duplicates.
> The simplest might be a hash table to tell us whether a given pair exists in
> the priority queue before we insert a duplicate. In fact, we will never have
> more than n active pairs in our data structure, since there can only be one
> pair for each distinct value of the first coordinate.”

This is similar to the `applyMerge` algorithm, except that `applyMerge` has an
optimization: it never adds (x, y) to the priority queue while an (x′, y) with
x′ < x or an (x, y′) with y′ < y is still in the queue.
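The quoted scheme can be sketched as follows. This is our own hypothetical transcription of the prose, not code from the book: the name `skienaMerge` is invented, and `Data.Set` serves as both the priority queue and the duplicate guard (standing in for the hash table in the quote):

```haskell
import qualified Data.Set as Set

-- Skiena's scheme as quoted: a priority queue of index pairs keyed by
-- f i j, plus a "seen" set that guards against inserting a pair twice.
-- `seen` only ever grows, mirroring the hash table in the quote; the
-- extra check in applyMerge is what avoids this overhead.
skienaMerge :: Ord c => (Int -> Int -> c) -> [c]
skienaMerge f = go (Set.singleton (f 0 0, (0, 0))) (Set.singleton (0, 0))
  where
    go queue seen = case Set.minView queue of
      Nothing -> []
      Just ((v, (i, j)), rest) ->
        let successors = [(i + 1, j), (i, j + 1)]
            fresh      = filter (`Set.notMember` seen) successors
            queue'     = foldr (\(a, b) -> Set.insert (f a b, (a, b))) rest fresh
            seen'      = foldr Set.insert seen fresh
         in v : go queue' seen'
```

With 0-based indices, `take 10 (skienaMerge (\i j -> (i + 1) * (j + 1)))` yields `[1,2,2,3,3,4,4,4,5,5]`, matching `applyMerge (*) [1..] [1..]`.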

[^1]: Note that this is really the Sieve of Eratosthenes, as defined in the classic [The Genuine Sieve of Eratosthenes](https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf). Contrast this to other simple prime generation implementations, such as <pre> primes = sieve [2..] where sieve (p : xs) = p : sieve [x | x <- xs, x \`rem\` p > 0]</pre> which is actually trial division and not a faithful implementation of the Sieve of Eratosthenes.
