-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FR: Mann-Whitney-U test #182
base: developer
Are you sure you want to change the base?
Conversation
open FSharp.Stats | ||
open FSharp.Stats.Testing | ||
|
||
// TO DO: Bergmann et al. (2000) showed that there are different implementations of this test that lead to different results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if your computations are correct before PR can be merged. Additionally, check your results against established packages in R, SPSS or puplications rather than volatile wikipedia chapters.
// let abundance = // method for equal ranks instead of mean ranks when identical values occur. | ||
// sortedMerge | ||
// |> Array.map ( | ||
// fun v -> Array.filter (fun v2 -> v2 = v) sortedMerge | ||
// >> Array.length | ||
// ) | ||
// let myMap = sortedMerge |> Array.mapi (fun i x -> x, i + 2 - Array.item i abundance) |> Map // wrong: must return mean of ranksums with equal ranks, not always the same rank! | ||
// let rankedMerge = sortedMerge |> Array.map (fun (v,group) -> float myMap.[(v,group)],v,group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments
let uTestTests = | ||
// taken from https://de.wikipedia.org/wiki/Wilcoxon-Mann-Whitney-Test#Beispiel | ||
let testList1 = | ||
([0;400;500;550;600;650;750;800;900;950;1000;1100;1200;1500;1600;1800;1900;2000;2200;3500 ],["M";"W";"M";"W";"M";"W";"M";"M";"W";"W";"M";"M";"W";"M";"W";"M";"M";"M";"M";"M"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add duplicates to check ranking procedure
let u1 = seq1Length * seq2Length + (seq1Length * (seq1Length + 1.) / 2.) - rankSumSeq1 | ||
let u2 = seq1Length * seq2Length + (seq2Length * (seq2Length + 1.) / 2.) - rankSumSeq2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporary results should not be calculated twice
|> Array.filter (fun (rank,v,group') -> group' = group) | ||
|> Array.fold (fun state (rank,v,group') -> state + rank) 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with countBy
|> Array.filter (fun (rank,v,group') -> group' = group) | ||
|> Array.fold (fun state (rank,v,group') -> state + rank) 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with countBy
let sortedMerge = | ||
(seq1 |> Seq.map (fun v -> float v, 0), seq2 |> Seq.map (fun v -> float v, 1)) // 0 = first group; 1 = second group | ||
||> Seq.append | ||
|> Seq.sortByDescending (fun (v,groupIndex) -> v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary that the sequences are sorted? The ranking is order-independent and as I can see there is no need for sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check and correct the comments I suggested.
} | ||
let createUTest statistic : UTestTestStatistics = | ||
let cdf = Distributions.Continuous.Normal.CDF 0. 1. statistic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approximation via the z-Distribution is only valid for large samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the distribution is discrete, I cannot find any other (continuous) approximation than the one via normal distribution.
https://stats.stackexchange.com/a/251734
So, what do you suggest? Delete the whole test because of normal approximation might be too inaccurate at low sample sizes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should aim to implement the exact distribution for n+m<100 and the approximation for larger values
see #213 (comment)
Please reference the issue(s) this PR is related to
Tackles #117
Please list the changes introduced in this PR
Description
Mann-Whitney-U test added to the Testing portfolio. Corresponding unit test as well.