refactor and enhance AdaptiveThreshold method #30

johnnychen94 · 2019-07-21T22:15:52Z

This PR works alongside #29 and adds some further noteworthy enhancements:

~~[Enhancement]: support n-D images by taking advantage of CartesianIndices~~ (The method is explained at https://julialang.org/blog/2016/02/iteration)
[Bug fix]: incorrect argument order for keyword constructor

-AdaptiveThreshold(; percentage::Int = 15, window_size::Int = 32) = AdaptiveThreshold(percentage, window_size)
+AdaptiveThreshold(; percentage::Int = 15, window_size::Int = 32) = AdaptiveThreshold(window_size, percentage)

[Bug fix]: according to the docstring of recommend_size, it should be round instead of div/floor
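
To make the rounding fix concrete, here is a minimal sketch. It assumes, purely for illustration, that the recommended size is one eighth of the mean image dimension; `mean_dim`, `size_floor` and `size_round` are hypothetical names and not part of the package.

```julia
# Hypothetical helpers; the 1/8 rule is an assumption made for illustration only.
mean_dim(sz) = sum(sz) / length(sz)

size_floor(sz) = floor(Int, mean_dim(sz) / 8)   # always rounds down
size_round(sz) = round(Int, mean_dim(sz) / 8)   # nearest integer

sz = (100, 120)      # mean dimension 110, and 110 / 8 = 13.75
size_floor(sz)       # 13
size_round(sz)       # 14
```

The two only differ when the fractional part is at least one half, which is why the discrepancy is easy to miss in tests that use power-of-two image sizes.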

TODO:

add deprecations for binarize(alg, img) in favor of binarize(img, alg) (deprecated in "[WIP] refactor codebase with functor APIs" #29)
update the tests

RFC:

rename: recommend_size --> recommended_size
remove window_size as the property of AdaptiveThreshold:

According to my understanding, window_size requires the information of the to-be-binarized image, which makes it not the intrinsic property of AdaptiveThreshold.

So the proper usage IMO is:

f = AdaptiveThreshold(percentage = 15)
binarize(img, f, window_size)

We could add some one-liners to make the usage more convenient, i.e.,

(f::AdaptiveThreshold)(out, img) = f(out, img, recommended_size(img))

By doing this, the default window_size is chosen automatically from the input image instead of being hardcoded to 32.
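
The one-liner above can be sketched end to end as follows. This is a simplified stand-in rather than the package's implementation: `ToyAdaptiveThreshold` and `toy_recommended_size` are hypothetical names, the 1/8 sizing rule is an assumption, and the local mean is computed naively instead of with an integral image.

```julia
# Toy stand-in for the real type; only the dispatch pattern is the point here.
struct ToyAdaptiveThreshold
    percentage::Float64
end
ToyAdaptiveThreshold(; percentage = 15) = ToyAdaptiveThreshold(percentage)

# Assumed rule of thumb: one eighth of the mean image dimension.
toy_recommended_size(img) = round(Int, sum(size(img)) / ndims(img) / 8)

# Three-argument form: the caller picks the window size explicitly.
function (f::ToyAdaptiveThreshold)(out, img::AbstractMatrix, window_size::Integer)
    half = window_size ÷ 2
    rows, cols = axes(img)
    for j in cols, i in rows
        # Clamp the window to the image border.
        r = max(first(rows), i - half):min(last(rows), i + half)
        c = max(first(cols), j - half):min(last(cols), j + half)
        local_mean = sum(view(img, r, c)) / (length(r) * length(c))
        out[i, j] = img[i, j] > local_mean * (1 - f.percentage / 100)
    end
    return out
end

# Two-argument form: fall back to a size derived from the image itself.
(f::ToyAdaptiveThreshold)(out, img) = f(out, img, toy_recommended_size(img))

img = rand(64, 64)
f = ToyAdaptiveThreshold(percentage = 15)
out = falses(size(img))
f(out, img)        # window size inferred from img (8 for a 64×64 image)
f(out, img, 16)    # window size given explicitly
```

With this layout the same `f` can be reused across images of different sizes, which is the convenience being argued for.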

Check psnr as an example, where peak_value doesn't belong to PSNR.

zygmuntszpak · 2019-07-22T01:19:17Z

@rjww You may be interested in these proposed changes too.

zygmuntszpak · 2019-07-22T02:03:27Z

> According to my understanding, window_size requires the information of the to-be-binarized image, which makes it not the intrinsic property of AdaptiveThreshold.

The AdaptiveThreshold algorithm adapts the binarization threshold based on the intensity distribution of a sliding region-of-interest in the image. The size of the region-of-interest is determined by the window_size. I reckon that in the context of thresholding, a window_size is something that is intrinsic to an adaptive method since the existence of a window is what often differentiates global thresholding from adaptive local thresholding. A sensible choice of window_size does indeed depend on the size of the image. I would have put it as a property of AdaptiveThreshold, but I would like to understand your point of view better since I may have misunderstood your reasoning.

> rename: recommend_size --> recommended_size

We went with recommend_size since we thought of it as an active process. You specify an image and it recommends the appropriate size. recommended_size is what I would have given to the variable name, i.e. recommended_size = recommend_size(img).

johnnychen94 · 2019-07-22T13:37:07Z

> a window_size is something that is intrinsic to an adaptive method since the existence of a window is what often differentiates global thresholding from adaptive local thresholding.

On second thought, speaking conceptually, I think you're right that window_size does distinguish the behavior of different AdaptiveThreshold objects.

What I'm trying to avoid is creating a new AdaptiveThreshold object for every image when we want to binarize a sequence of images, i.e., to avoid the following usage:

f1 = AdaptiveThreshold(window_size = recommend_size(img1))
img1_01 = binarize(img1, f1)

f2 = AdaptiveThreshold(window_size = recommend_size(img2), percentage=15)
img2_02 = binarize(img2, f2)

...

This won't be a performance issue for any non-trivial algorithms. But the usage isn't so concise to me.

Also, I think manually calling recommend_size is tedious and trivial. The following line makes users no longer need to call it by themselves.

(f::AdaptiveThreshold)(out, img) = f(out, img, recommended_size(img))

and the usage becomes much easier:

f = AdaptiveThreshold()
img1_01 = binarize(img1, f)

img2_02 = binarize(img2, f)

f_30 = AdaptiveThreshold(percentage=30)
img3_03 = binarize(img3, f_30)

I guess this is a trade-off between being conceptually right and being engineering-friendly. What do you think?

johnnychen94 · 2019-07-22T13:50:29Z

> We went with recommend_size since we thought of it as an active process.

Good point. I was mainly writing Python code over the last two weeks, and there are functions like repeated and sorted that create a new object, as opposed to the in-place operations repeat and sort. But it seems that Julia doesn't use this naming convention.

Another change I forgot to mention: to avoid unnecessary memory allocation, I chose not to create and return an Array{Gray{Bool}, 2}. For example, all of the following usages should be permitted and should not create a new array, IMO:

binarize!(out::AbstractArray{Bool}, img::AbstractArray{<:Number}, f)::AbstractArray{Bool}
binarize!(out::AbstractArray{Bool}, img::AbstractArray{<:Colorant}, f)::AbstractArray{Bool}
binarize!(out::AbstractArray{Gray{Bool}}, img::AbstractArray{<:Number}, f)::AbstractArray{Gray{Bool}}
binarize!(out::AbstractArray{Gray{Bool}}, img::AbstractArray{<:Colorant}, f)::AbstractArray{Gray{Bool}}

I'm planning to do it in the coming commits.
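
A minimal sketch of the in-place pattern, restricted to `Bool` output and `Number` input so that it runs with Base alone (the `Gray{Bool}` and `Colorant` variants need the color packages and are left out). `toy_binarize!`, `toy_binarize` and `FixedThreshold` are hypothetical names used only for illustration.

```julia
# Hypothetical fixed-threshold "algorithm", used only to exercise the wrappers.
struct FixedThreshold
    level::Float64
end
(f::FixedThreshold)(out, img) = (out .= img .> f.level; out)

# In-place: write into a caller-provided buffer and return that same buffer.
function toy_binarize!(out::AbstractArray{Bool}, img::AbstractArray{<:Number}, f)
    axes(out) == axes(img) ||
        throw(DimensionMismatch("out and img must have the same axes"))
    f(out, img)
    return out
end

# Allocating convenience wrapper built on top of the in-place version.
toy_binarize(img::AbstractArray{<:Number}, f) =
    toy_binarize!(similar(img, Bool), img, f)

img = [0.1 0.9; 0.6 0.4]
buf = falses(2, 2)
toy_binarize!(buf, img, FixedThreshold(0.5))   # fills and returns buf
```

Because the result is written into `buf`, repeated calls on a stream of same-sized images can reuse one output array.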

Changes:

* fixes a keyword constructor bug introduced by incorrect argument order: it should be `AdaptiveThreshold(percentage, window_size)` instead of `AdaptiveThreshold(window_size, percentage)`
* support n-D images with the usage of CartesianIndices
* generalize the type annotation of percentage from `Int` to `Float64`
* move argument validation of AdaptiveThreshold to its inner constructor
* correct the result of recommend_size from `floor` to `round`, i.e. the closest integer

johnnychen94 · 2019-07-22T20:09:35Z

Update:

This PR is ready to be reviewed.

Actually, n-D arrays aren't supported yet, since boxdiff is limited to 2D arrays.
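
For reference, the CartesianIndices technique from the linked blog post can be sketched as a dimension-agnostic box mean. This is not the package's `boxdiff`; `box_mean` is a hypothetical name, the computation is naive (no integral image), and the `I:J` range syntax assumes a Julia version newer than 1.0.

```julia
# Mean over the box of half-width `half` around every element, for any dimensionality.
function box_mean(img::AbstractArray, half::Integer)
    out = similar(img, Float64)
    R = CartesianIndices(img)
    Ifirst, Ilast = first(R), last(R)
    offset = half * oneunit(Ifirst)
    for I in R
        # Clamp the box to the array bounds in every dimension at once.
        window = max(Ifirst, I - offset):min(Ilast, I + offset)
        out[I] = sum(img[J] for J in window) / length(window)
    end
    return out
end

box_mean(ones(4, 5), 1)       # 2-D input
box_mean(ones(3, 4, 5), 1)    # 3-D input, same code path
```

The loop body never mentions the number of dimensions, which is what would let an n-D adaptive threshold share one implementation once the box-sum helper is generalized.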

@zygmuntszpak Regardless of the window_size issue, can you have a look at the style of docstring and test case? If you're okay with it I can merge it and refactor other algorithms in the same way. Then we can create another PR for window_size if you like it.

About docstring, I treat it as a cheat sheet explaining how to use it and what we can expect on its output, instead of the full explanation on its theoretical and coding details. Unfortunately, I find many of your docstrings are too long to read and understand even though they're written in good quality.

codecov · 2019-07-22T21:32:52Z

Codecov Report

Merging #30 into api will increase coverage by 20.86%.
The diff coverage is 93.33%.

@@             Coverage Diff             @@
##              api      #30       +/-   ##
===========================================
+ Coverage   16.56%   37.42%   +20.86%     
===========================================
  Files          19       20        +1     
  Lines         163      171        +8     
===========================================
+ Hits           27       64       +37     
+ Misses        136      107       -29

Impacted Files	Coverage Δ
src/ImageBinarization.jl	`100% <ø> (ø)`	⬆️
src/adaptive_threshold.jl	`100% <100%> (+100%)`	⬆️
src/deprecations.jl	`66.66% <100%> (+66.66%)`	⬆️
src/compat.jl	`33.33% <33.33%> (ø)`
src/integral_image.jl	`75% <0%> (+6.25%)`	⬆️
src/BinarizationAPI/binarize.jl	`100% <0%> (+100%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ca471b8...8e1a2f4.

zygmuntszpak · 2019-07-23T07:41:59Z

> Also, I think manually calling recommend_size is tedious and trivial. The following line makes users no longer need to call it by themselves.

The recommend_size function was only meant to be a convenience and a rule-of-thumb. There may be many instances where you don't want to use the recommended size, so we should still permit the user to specify a manual size. If I understood correctly your proposed solution will always use the recommend_size function?

Perhaps one possible path is to let window_size be a function

Base.@kwdef struct AdaptiveThreshold{T <: Function} <: AbstractImageBinarizationAlgorithm
    percentage::Float64 = 15.0
    window_size::T = recommend_size
end

and then have a constructor which takes a number and creates an anonymous function that returns the user-specified size.

AdaptiveThreshold(percentage::Number, window_size::Number) = AdaptiveThreshold(percentage, x-> window_size)
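
A self-contained sketch of this function-valued field idea follows. `ToyThreshold` and `toy_recommend_size` are hypothetical stand-ins, the 1/8 sizing rule is an assumption, and the explicit `Float64` conversion is an addition of this sketch: as far as I know, the default positional constructor of a parametric struct does not convert its arguments.

```julia
# Assumed rule of thumb standing in for the package's recommend_size.
toy_recommend_size(img) = round(Int, sum(size(img)) / ndims(img) / 8)

Base.@kwdef struct ToyThreshold{T <: Function}
    percentage::Float64 = 15.0
    window_size::T = toy_recommend_size
end

# A plain number is wrapped in a function that ignores the image.
ToyThreshold(percentage::Real, window_size::Integer) =
    ToyThreshold(Float64(percentage), _ -> window_size)

img = zeros(80, 80)
ToyThreshold().window_size(img)          # 10, derived from the image
ToyThreshold(15, 32).window_size(img)    # 32, fixed by the user
```

Either way the algorithm body only ever calls `f.window_size(img)`, so it does not need to know whether the size was computed or supplied.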

johnnychen94 · 2019-07-23T07:47:19Z

> If I understood correctly your proposed solution will always use the recommend_size function?

No. The signature of binarize for AdaptiveThreshold is:

	binarize(img, f [, window_size])

and usage examples are:

img = testimage("lena")
f = AdaptiveThreshold()
binarize(img, f) # infer window_size according to img

binarize(img, f, 16) # explicitly provide window_size

zygmuntszpak · 2019-07-23T08:12:45Z

> About docstring, I treat it as a cheat sheet explaining how to use it and what we can expect on its output, instead of the full explanation on its theoretical and coding details. Unfortunately, I find many of your docstrings are too long to read and understand even though they're written in good quality.

I have a different philosophy. I don't intend for the docstrings to be a cheat sheet, but rather an explanation of what the algorithm does, what assumptions it makes, what effect different options have etc.
I'm not suggesting that the documentation that we have written attains all of those goals, but I would rather move towards that goal than away from it. I believe that the theoretical details are also useful if one wants to dig into the actual code and follow the implementation. I have kept the structure of the documentation headings in this package the same as in my other package: https://zygmuntszpak.github.io/ImageContrastAdjustment.jl/dev/ because I want to create a consistent look and feel.

The detailed explanation is meant to go in the section "Details" so that anyone that is not interested can just skip it. I understand that this causes a lot of scrolling if you consult the documentation from the REPL. I read the documentation predominantly in a browser, in Jupyter or in Atom, so I am prioritizing that user experience.

The headings "Options" are supposed to give you a "cheatsheet" of what you can fiddle with etc. I suppose that often the options can be written in more "bullet" point style. Certainly the ones in this package can be written in bullet points. There will be cases, however, where an explanation of an algorithm option may not fit as a single sentence in a bullet point. I had therefore opted to write everything as a sentence in order not to have to mix styles. We could use bullet points in this package instead, but I would still like to keep the top-level headings: Output, Details, Options etc in accordance with the other package. The inspiration for detailed documentation comes from Mathematica documentation, e.g.: https://reference.wolfram.com/language/ref/ImagePyramid.html

Changes:

* roll back to the previous docstring style
* `window_size` is no longer a field of `AdaptiveThreshold`, instead it's a keyword argument in `binarize`
* there's no need to export `recommend_size` since it's automatically called if not specified

johnnychen94 · 2019-07-23T12:41:15Z

Update:

roll back to the previous docstring style

Deprecations:

window_size is no longer a field of AdaptiveThreshold, instead
it's a keyword argument in binarize
there's no need to export recommend_size since it's automatically
called if not specified. So I'd like to unexport it in the future.

Any further comment?

zygmuntszpak · 2019-07-24T03:38:44Z

Project.toml

@@ -11,6 +11,7 @@ HistogramThresholding = "2c695a8d-9458-5d45-9878-1b8a99cf7853"
 ImageContrastAdjustment = "f332f351-ec65-5f6a-b3d1-319c6670881a"
 ImageCore = "a09fc81d-aa75-5fe9-8630-4744c3626534"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+MappedArrays = "dbb5928d-eab1-5f90-85c2-b9b0edb7c900"


Where do we use this package?

johnnychen94 · 2019-07-24

of_eltype(Gray, img) performs almost the same as Gray.(img) except that it doesn't allocate additional memory.

julia> @benchmark of_eltype(Gray, img)
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     29.858 ns (0.00% GC)
  median time:      31.279 ns (0.00% GC)
  mean time:        37.766 ns (14.93% GC)
  maximum time:     40.667 μs (99.91% GC)
  --------------
  samples:          10000
  evals/sample:     994

julia> @benchmark Gray.(img)
BenchmarkTools.Trial:
  memory estimate:  508.19 KiB
  allocs estimate:  4
  --------------
  minimum time:     31.353 μs (0.00% GC)
  median time:      38.761 μs (0.00% GC)
  mean time:        50.567 μs (22.40% GC)
  maximum time:     42.538 ms (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     1

zygmuntszpak · 2019-07-24T03:41:40Z

test/adaptive_threshold.jl

+        type_list = generate_test_types([Float32, N0f8], [Gray])
+        for T in type_list
+            img = T.(img_gray)
+            @test_reference "References/AdaptiveThreshold_Gray.png" Gray.(binarize(img, f))


I didn't know there was such a thing as @test_reference. It's a great macro! However, I can't see where you added using ReferenceTests, or whether you have added the package to Project.toml?

johnnychen94 · 2019-07-24

This is a PR to the api branch, and I've already updated some code there, so it's not shown in this diff. ReferenceTests is one of those changes. Splitting them apart makes it easier to review.

zygmuntszpak · 2019-07-24T03:42:56Z

It's looking great, thank you very much Johnny!

johnnychen94 · 2019-07-24T23:51:14Z

As this is the very first PR under #29, I'm merging it now. Future PRs will be created in parallel, using this one as a template.

* move implementations to AdaptiveThreshold functor
* Rewrite AdaptiveThreshold with CartesianIndices to support n-D images
* update and simplify the docstring
* enhance the test codes and fix several bugs
* add CartetianIndex compat to Julia 1.0
* deprecate `window_size` and `recommend_size`

Changes:

* refactor the codebase using the functor API discussed in #26
* enhance the API by introducing a submodule `BinarizationAPI`
* add in-place function `binarize!`
* support Color3 inputs
* add more test codes
* slightly enhance the documentation

Breaking changes (Deprecated in 0.3):

* swap the argument order discussed in #23 (d1f8309)
* unexport `recommend_size` in favor of #41, i.e., `AdaptiveThreshold(img)` instead of `recommend_size(img)` (PR: #30 #45)
* made `window_size` of `AdaptiveThreshold` not an optional argument. (PR: #45)

johnnychen94 mentioned this pull request Jul 21, 2019

[WIP] refactor codebase with functor APIs #29

Merged

16 tasks

johnnychen94 added 6 commits July 23, 2019 03:19

fix syntax error and test utils.jl

3678ab3

move implementations to AdaptiveThreshold functor

4f9b2bd

update and simplify the docstring

1d70211

enhance the test codes and fix several bugs

9a2430c

Merge branch 'api' into adaptive_threshold

eb5cf16

add CartetianIndex compat to Julia 1.0

4c17767

johnnychen94 changed the title ~~[WIP] refactor and enhance AdaptiveThreshold method~~ refactor and enhance AdaptiveThreshold method Jul 22, 2019

johnnychen94 added 2 commits July 23, 2019 19:09

Merge branch 'api' into adaptive_threshold

f11bd38

zygmuntszpak reviewed Jul 24, 2019

add Argument section to docstring

8e1a2f4

johnnychen94 merged commit 5ab30c7 into JuliaImages:api Jul 24, 2019

johnnychen94 deleted the adaptive_threshold branch July 24, 2019 23:52

This was referenced Jul 26, 2019

RFC: a new API that plans algorithm with prior information before binarize #41

Closed

replace recommend_size with function AdaptiveThreshold (closes #41) #45

Merged
