Replies: 1 comment
-
Good question. As far as I'm aware, it is completely fine to use un-normalized vectors. Even the choice of kernel is somewhat arbitrary in many cases, and I don't see a particularly good reason to stick with polynomial kernels; you could just as well use a Gaussian kernel. DScribe was designed to only deal with producing high-dimensional vectors whose inner product is a meaningful measure of the similarity of two samples - it is up to you to then decide how to use these vectors. You will most likely need to benchmark different methods and hyperparameters to find out what produces good results for your particular data.
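A minimal sketch of the kind of benchmarking mentioned above, using sklearn's cross-validation with both a dot-product and a Gaussian (RBF) kernel. The arrays `X` and `y` are synthetic placeholders standing in for real SOAP descriptors and target values:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, RBF
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))                   # stand-in for SOAP descriptors
X /= np.linalg.norm(X, axis=1, keepdims=True)   # optional L2 normalization
y = X @ rng.normal(size=20) + 0.05 * rng.normal(size=60)  # synthetic target

# Compare a linear (dot-product) kernel against a Gaussian (RBF) kernel
# via cross-validated mean squared error.
results = {}
for name, kernel in [("DotProduct", DotProduct()), ("RBF", RBF())]:
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-3)
    scores = cross_val_score(gpr, X, y, cv=5, scoring="neg_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name}: mean CV MSE = {results[name]:.4f}")
```

The same pattern extends to any other sklearn-compatible kernel or estimator; only the `kernel` argument changes.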
-
I am using the SOAP output of the DScribe library in combination with the sklearn library for kernel-based methods. As provided in the documentation, the original definition of the SOAP kernel is a normalized polynomial kernel. As far as I understand, the output is not normalized by default in DScribe, which I was not initially aware of. Since the prediction errors I obtain with the normalized and unnormalized output are comparable, I am now wondering if it is even necessary to stick to the original SOAP kernel definition. To be more precise: I am not using some implementation of the SOAP kernel, but directly use the SOAP output from the DScribe library as input for the (exponentiated) DotProduct kernel implemented in sklearn. I would appreciate any advice!
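For reference, the two approaches coincide when the vectors are L2-normalized: the original SOAP kernel (a normalized polynomial kernel with exponent ζ) equals a plain dot product of normalized vectors raised to ζ. A small numpy check, using random vectors as stand-ins for DScribe's SOAP output:

```python
import numpy as np

# Two stand-in descriptor vectors (in practice, SOAP output from DScribe).
rng = np.random.default_rng(42)
x, y = rng.normal(size=50), rng.normal(size=50)
zeta = 2  # exponent of the polynomial SOAP kernel

# Original SOAP kernel definition: normalized polynomial kernel.
k_soap = (x @ y / (np.linalg.norm(x) * np.linalg.norm(y))) ** zeta

# Equivalent: L2-normalize the vectors first, then exponentiate the dot product.
xn, yn = x / np.linalg.norm(x), y / np.linalg.norm(y)
k_dot = (xn @ yn) ** zeta
```

So applying sklearn's exponentiated DotProduct kernel to normalized DScribe output reproduces the original SOAP kernel, while applying it to unnormalized output gives a different (but not necessarily worse) kernel.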