Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LeetCode] 1093. Statistics from a Large Sample #1093

Open
grandyang opened this issue May 30, 2019 · 0 comments
Open

[LeetCode] 1093. Statistics from a Large Sample #1093

grandyang opened this issue May 30, 2019 · 0 comments

Comments

@grandyang
Copy link
Owner

grandyang commented May 30, 2019

You are given a large sample of integers in the range [0, 255]. Since the sample is so large, it is represented by an array count where count[k] is the number of times that k appears in the sample.

Calculate the following statistics:

  • minimum: The minimum element in the sample.
  • maximum: The maximum element in the sample.
  • mean: The average of the sample, calculated as the total sum of all elements divided by the total number of elements.
  • median:
    • If the sample has an odd number of elements, then the median is the middle element once the sample is sorted.
    • If the sample has an even number of elements, then the median is the average of the two middle elements once the sample is sorted.
  • mode: The number that appears the most in the sample. It is guaranteed to be unique.

Return  the statistics of the sample as an array of floating-point numbers [minimum, maximum, mean, median, mode]. Answers within  10-5  of the actual answer will be accepted.

Example 1:

Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Output: [1.00000,3.00000,2.37500,2.50000,3.00000]
Explanation: The sample represented by count is [1,2,2,2,3,3,3,3].
The minimum and maximum are 1 and 3 respectively.
The mean is (1+2+2+2+3+3+3+3) / 8 = 19 / 8 = 2.375.
Since the size of the sample is even, the median is the average of the two middle elements 2 and 3, which is 2.5.
The mode is 3 as it appears the most in the sample.

Example 2:

Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Output: [1.00000,4.00000,2.18182,2.00000,1.00000]
Explanation: The sample represented by count is [1,1,1,1,2,2,2,3,3,4,4].
The minimum and maximum are 1 and 4 respectively.
The mean is (1+1+1+1+2+2+2+3+3+4+4) / 11 = 24 / 11 = 2.18181818... (for display purposes, the output shows the rounded number 2.18182).
Since the size of the sample is odd, the median is the middle element 2.
The mode is 1 as it appears the most in the sample.

Constraints:

  • count.length == 256
  • 0 <= count[i] <= 109
  • 1 <= sum(count) <= 109
  • The mode of the sample that count represents is unique.

这道题说是有很多在0到 255 中的整数,由于重复的数字太多了,所以这里采用的是统计每个数字出现的个数的方式,用数组 count 来表示,其中 count[i] 表示数字i出现的次数。现在让统计原始数组中的最大值,最小值,平均值,中位数,和众数。这里面的最大最小值很好求,最小值就是 count 数组中第一个不为0的位置,最大值就是 count 数组中最后一个不为0的位置。最小值 mn 初始化为 256,在遍历 count 数组的过程中,遇到不为0的数字时,若此时 mn 为 256,则更新为坐标i。最大值 mx 直接每次更新为值不为0的坐标i即可。平均值也好求,只要求出所有的数字之和,跟数字的个数相除就行了,注意由于数字之和可能很大,需要用 double 来表示。众数也不难求,只要找出 count 数组中的最大值,则其坐标就是众数。比较难就是中位数了,由于数组的个数可奇可偶,中位数的求法不同,这里为了统一,采用一个小 trick,比如数组 1,2,3 和 1,2,3,4,可以用坐标为 (n-1)/2 和 n/2 的两个数字求平均值得到,对于长度为奇数的数组,这两个坐标表示的是相同的数字。这里由于是统计数组,所以要找的两个位置是 (cnt+1)/2 和 (cnt+2)/2,其中 cnt 是所有数字的个数。再次遍历 count 数组,使用 cur 来累计当前经过的数字个数,若 cur 小于 first,且 cur 加上 count[i] 大于等于 first,说明当前数字i即为所求,加上其的一半到 median。同理,若 cur 小于 second,cur 加上 count[i] 大于等于 second,说明当前数字i即为所求,加上其的一半到 median 即可,参见代码如下:

class Solution {
public:
    vector<double> sampleStats(vector<int>& count) {
        double mn = 256, mx = 0, mean = 0, median = 0, sum = 0;
        int cnt = 0, mode = 0;
        for (int i = 0; i < count.size(); ++i) {
            if (count[i] == 0) continue;
            if (mn == 256) mn = i;
            mx = i;
            sum += (double)i * count[i];
            cnt += count[i];
            if (count[i] > count[mode]) mode = i;
        }
        mean = sum / cnt;
        int first = (cnt + 1) / 2, second = (cnt + 2) / 2, cur = 0;
        for (int i = 0; i < count.size(); ++i) {
            if (cur < first && cur + count[i] >= first) median += i / 2.0;
            if (cur < second && cur + count[i] >= second) median += i / 2.0;
            cur += count[i];
        }
        return {mn, mx, sum / cnt, median, (double)mode};
    }
};

Github 同步地址:

#1093

参考资料:

https://leetcode.com/problems/statistics-from-a-large-sample/

https://leetcode.com/problems/statistics-from-a-large-sample/discuss/317626/Python-Solution

https://leetcode.com/problems/statistics-from-a-large-sample/discuss/317857/Java-Simple-2-pass-code-w-comments-and-explanation.

LeetCode All in One 题目讲解汇总(持续更新中...)

@grandyang grandyang changed the title [LeetCode] 1093. Missing Problem [LeetCode] 1093. Statistics from a Large Sample May 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant