We are using oneDAL distributed algorithms to optimize Spark ML. Some metrics are missing. Could you check whether the following statistics can be added to distributed low-order moments (basic statistics)?
Clarification details per our discussion with Xiaochang:
Count: [Xiaochang]: Users usually request several metrics rather than a single one, so it is convenient to get the observation count from the result along with the other metrics; otherwise the user needs extra coding effort.
numNonzeros: [Xiaochang]: just count the number of values that are not 0.0.
weightSum: [Xiaochang]: Spark's DataFrame has a separate weight column providing a weight for each row.
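A hypothetical sketch (not oneDAL code) of why these three metrics fit the distributed low-order moments pattern: each is additive, so every worker can compute a partial result over its block of rows and a final reduction merges the partials. The function names and result layout here are assumptions for illustration only.

```python
def partial_stats(rows, weights):
    """Compute a partial (count, numNonzeros, weightSum) for one data block.

    rows    -- list of feature vectors (lists of floats)
    weights -- per-row weights, as in Spark's weight column
    """
    ncols = len(rows[0]) if rows else 0
    nnz = [0] * ncols  # per-column count of values != 0.0
    for row in rows:
        for j, v in enumerate(row):
            if v != 0.0:
                nnz[j] += 1
    return {
        "count": len(rows),
        "numNonzeros": nnz,
        "weightSum": sum(weights),
    }

def merge_stats(a, b):
    """Merge two partial results; all three metrics merge by simple addition."""
    return {
        "count": a["count"] + b["count"],
        "numNonzeros": [x + y for x, y in zip(a["numNonzeros"], b["numNonzeros"])],
        "weightSum": a["weightSum"] + b["weightSum"],
    }
```

Because the merge is associative and commutative, the reduction can be done in any order (e.g. tree-style across nodes), which suggests the addition should not disturb the existing distributed computation scheme.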
Need to investigate the possibility of adding corresponding APIs to compute_input and compute_result.
Also, need to check how much adding all these metrics will affect the performance of the default case (when all metrics are calculated).
For details, see: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.html