Calculate gene similarity on the HPO #60

stefanucci-luca · 2020-11-02T10:50:26Z

Dear Kevin,

I would like to calculate the similarity between a set of genes (~2,000).
I annotated these genes with HPO terms from the Human Phenotype Ontology annotation file (http://compbio.charite.de/jenkins/job/hpo.annotations/lastSuccessfulBuild/artifact/util/annotation/genes_to_phenotype.txt).
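Collapsing a one-term-per-row annotation file into the one-line-per-gene layout shown below might look like this minimal sketch. It assumes the gene symbol and the HPO ID sit in the second and third tab-separated columns of genes_to_phenotype.txt (0-based indices 1 and 2); that column layout, the function name `reshape_annotations`, and its parameters are assumptions to check against the header of the file you downloaded.

```python
from collections import defaultdict
from typing import Iterable


def reshape_annotations(rows: Iterable[list[str]],
                        gene_col: int = 1,
                        hpo_col: int = 2) -> list[str]:
    """Collapse one (gene, HPO term) pair per row into one line per gene.

    gene_col / hpo_col are 0-based column indices; the defaults are an
    assumption about the layout of genes_to_phenotype.txt.
    """
    terms_by_gene: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Skip short rows and header/comment rows (no HP: ID in the HPO column).
        if len(row) <= max(gene_col, hpo_col) or not row[hpo_col].startswith("HP:"):
            continue
        gene, hpo_id = row[gene_col], row[hpo_col]
        if hpo_id not in terms_by_gene[gene]:  # a gene-term pair can be listed more than once
            terms_by_gene[gene].append(hpo_id)
    # Three tab-separated columns: gene, "." placeholder, pipe-joined HPO IDs.
    return [f"{gene}\t.\t{'|'.join(terms)}"
            for gene, terms in sorted(terms_by_gene.items())]
```

To run it on the real file, pass `csv.reader(fh, delimiter="\t")` over the opened annotation file as `rows`, then write the returned lines to the file given to `phenopy score`.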

I then reshaped the annotations and got a file like this:

A4GALT	.	HP:0010970|HP:0000006
AAAS	.	HP:0040281|HP:0040282|HP:0040283|HP:0011463|HP:0001278|HP:0000972|HP:0012332|HP:0008259|HP:0004322|HP:0001251|HP:0000648|HP:0000007|HP:0002571|HP:0004319|HP:0001263|HP:0008163|HP:0001249|HP:0009916|HP:0003487|HP:0007002|HP:0000252|HP:0001347|HP:0000522|HP:0003676|HP:0000649|HP:0001324|HP:0000953|HP:0001260|HP:0000846|HP:0001250|HP:0007440|HP:0000505|HP:0000982|HP:0001761|HP:0010486|HP:0000830|HP:0007556|HP:0002093|HP:0001430|HP:0001252|HP:0002376|HP:0000612|HP:0000407
AASS	.	HP:0000119|HP:0000752|HP:0001083|HP:0001903|HP:0003593|HP:0001250|HP:0002161|HP:0000736|HP:0001252|HP:0100543|HP:0000007|HP:0001256|HP:0000750|HP:0001249
ABAT	.	HP:0025356|HP:0000278|HP:0000098|HP:0007291|HP:0000007|HP:0002415|HP:0001321|HP:0000494|HP:0001347|HP:0006829|HP:0001263|HP:0001274|HP:0001250|HP:0001254|HP:0025430|HP:0003819
ABCA4	.	HP:0040280|HP:0040281|HP:0040282|HP:0040283|HP:0040284|HP:0000006|HP:0007663|HP:0000662|HP:0001133|HP:0000608|HP:0000512|HP:0000543|HP:0000007|HP:0007737|HP:0007722|HP:0000510|HP:0007984|HP:0007843|HP:0000548|HP:0000580|HP:0000572|HP:0008035|HP:0000639|HP:0000618|HP:0000405|HP:0000603|HP:0000135|HP:0000493|HP:0000463|HP:0001249|HP:0007703|HP:0000613|HP:0000987|HP:0030329|HP:0000649|HP:0000648|HP:0000551|HP:0008046|HP:0000407|HP:0007704|HP:0007814|HP:0008736|HP:0000035|HP:0008002|HP:0007675|HP:0000431|HP:0000610|HP:0000518|HP:0000602|HP:0001513|HP:0008059|HP:0000501|HP:0000563|HP:0000842|HP:0030500|HP:0001347|HP:0000505|HP:0005978|HP:0011504|HP:0011462|HP:0011463|HP:0003621|HP:0007994
ABCB11	.	HP:0040283|HP:0000989|HP:0002014|HP:0003155|HP:0000952|HP:0001081|HP:0003593|HP:0001394|HP:0001744|HP:0001046|HP:0002240|HP:0002630|HP:0002908|HP:0000007|HP:0003819|HP:0004322|HP:0001508|HP:0001406|HP:0001402

which I believe is the correct input format for phenopy. I then ran the command:

phenopy score gene_lists_with_HPO.txt --threads 12 --self

and got output like this:

#query	entity_id	score
A4GALT	A4GALT	1.0
A4GALT	ABCD1	0.0
A4GALT	ACAT1	0.010405043493187662
A4GALT	ACVRL1	0.03336405048957507
A4GALT	ADGRG1	0.0
A4GALT	AGXT	0.009234121604447244
A4GALT	AKT1	0.003509945769583653
A4GALT	ALG1	0.0
A4GALT	AMER1	0.0
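One way to list every gene whose self-comparison falls short of 1.0 is to scan this output for rows where `query` equals `entity_id`. This minimal sketch assumes the output is tab-separated with a single `#`-prefixed header line, as in the excerpt above; the helper names `self_scores` and `below_identity` are hypothetical.

```python
import csv
from typing import Iterable


def self_scores(lines: Iterable[str]) -> dict[str, float]:
    """Return {gene: score} for rows where query and entity_id are the same gene."""
    scores = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the "#query entity_id score" header
        query, entity, score = row[0], row[1], float(row[2])
        if query == entity:
            scores[query] = score
    return scores


def below_identity(scores: dict[str, float], tol: float = 1e-9) -> list[str]:
    """Genes whose self-comparison scored less than 1.0 (within a small tolerance)."""
    return sorted(gene for gene, s in scores.items() if s < 1.0 - tol)
```

Pass the open output file (or any iterable of its lines) to `self_scores`, then `below_identity` on the result.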

However, some genes do not score 1 against themselves, as I was expecting. For instance:

ABCB7 ABCB7 0.5558528984777618

Would you expect something like this? How would you explain it?
Should I use a different --summarization-method?

Best regards,

Luca


arvkevi · 2020-11-03T02:06:11Z

Hi Luca,

Thank you for checking out the repo. It looks like you have successfully run phenopy on your input files. That's great! The behavior you describe is expected. It is a property of the HRSS semantic similarity algorithm, which scales similarity scores to reward comparisons between nodes further down the ontology. As the algorithm is implemented here, even a phenotype compared with itself only scores 1.0 by HRSS when beta_ic is 0.0, which is the case for leaf nodes. Does this explanation help?
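To see why a term can score below 1.0 against itself, here is a small sketch of the term-to-term HRSS formula, sim = 1/(1+γ) · α/(α+β), on a hypothetical five-term ontology with made-up IC values. α is the IC of the most informative common ancestor (MICA), β averages each term's IC gap to its most informative leaf (MIL), and γ is the path length between the two terms through the MICA. This follows the published HRSS definition as commonly described, not phenopy's source code, and `bma` is a plain best-match average that may differ from phenopy's `--summarization-method` options. For a term compared with itself, γ = 0 and α = IC(c), so the score reduces to IC(c)/IC(MIL(c)), which is 1.0 only for leaves.

```python
from statistics import mean

# Hypothetical toy ontology (child -> parents) with made-up IC values.
PARENTS = {"root": [], "A": ["root"], "B": ["A"], "C": ["A"], "D": ["B"]}
IC = {"root": 0.0, "A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
CHILDREN = {t: [c for c, ps in PARENTS.items() if t in ps] for t in PARENTS}


def up_distances(term):
    """Shortest number of is-a edges from `term` to each ancestor (itself = 0)."""
    dist, frontier = {term: 0}, [term]
    while frontier:  # breadth-first, so the first distance found is the shortest
        nxt = []
        for t in frontier:
            for p in PARENTS[t]:
                if p not in dist:
                    dist[p] = dist[t] + 1
                    nxt.append(p)
        frontier = nxt
    return dist


def descendants(term):
    """All terms at or below `term`."""
    seen, stack = {term}, [term]
    while stack:
        for c in CHILDREN[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def mil_ic(term):
    """IC of the most informative leaf under `term` (the term itself if it is a leaf)."""
    return max(IC[t] for t in descendants(term) if not CHILDREN[t])


def hrss(a, b):
    da, db = up_distances(a), up_distances(b)
    mica = max(da.keys() & db.keys(), key=IC.get)  # most informative common ancestor
    alpha = IC[mica]
    beta = ((mil_ic(a) - IC[a]) + (mil_ic(b) - IC[b])) / 2
    gamma = da[mica] + db[mica]  # 0 when a == b
    if alpha + beta == 0:
        return 0.0
    return (1 / (1 + gamma)) * (alpha / (alpha + beta))


def bma(terms_a, terms_b):
    """Best-match average of the term-by-term HRSS matrix, in both directions."""
    best_a = mean(max(hrss(a, b) for b in terms_b) for a in terms_a)
    best_b = mean(max(hrss(a, b) for a in terms_a) for b in terms_b)
    return (best_a + best_b) / 2


for t in ["A", "B", "C", "D"]:
    print(t, hrss(t, t))  # A 0.25, B 0.5, C 1.0, D 1.0
print(bma(["A", "D"], ["A", "D"]))  # 0.625
```

A gene annotated with a mix of general and leaf terms therefore gets a self-score between its terms' self-scores, like the 0.625 for the toy gene {A, D}, while a gene annotated only with leaves scores 1.0.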

viktorzou · 2024-05-17T18:17:34Z

So how would I set a network-cutoff value, then, if identical terms might not score 1.0? Also, is there any way to introduce my own scores if I have frequency values attached to phenotypes?
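One possible workaround for the cutoff question, which is not a phenopy feature and was not proposed in this thread, is to rescale each pairwise gene score by the geometric mean of the two genes' self-scores, so every self-comparison becomes 1.0 before a single cutoff is applied. The normalization, the helper names `normalize_by_self` and `edges_above`, and the example numbers are all hypothetical; the sketch needs the self-comparison rows that a `--self` run produces.

```python
import math


def normalize_by_self(scores: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Rescale score(a, b) by sqrt(score(a, a) * score(b, b)).

    Hypothetical post-processing: self-comparisons become 1.0, so one cutoff
    treats genes with general terms and genes with leaf terms alike.
    Rescaled values are not guaranteed to stay <= 1.0.
    """
    self_score = {a: s for (a, b), s in scores.items() if a == b}
    out = {}
    for (a, b), s in scores.items():
        denom = math.sqrt(self_score[a] * self_score[b])  # KeyError if a self row is missing
        out[(a, b)] = s / denom if denom > 0 else 0.0
    return out


def edges_above(scores: dict[tuple[str, str], float], cutoff: float) -> list[tuple[str, str]]:
    """Gene pairs (excluding self-pairs) whose score meets the cutoff."""
    return sorted((a, b) for (a, b), s in scores.items() if a != b and s >= cutoff)
```

The design choice here is that a cutoff on rescaled scores asks "how close is this pair relative to how well each gene can match itself", rather than comparing raw HRSS values whose ceiling differs from gene to gene.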
