Are PHI-base 4 interaction IDs violating their own uniqueness rules? #14

Open
jseager7 opened this issue Aug 25, 2022 · 1 comment
Labels
question Further information is requested

Comments

@jseager7
Contributor

The PHI-base curator guidelines state the following about PHI-base accession IDs in PHI-base 4:

> One number corresponding to one gene of one organism (even if there is more than one interaction) from one paper.

My understanding is that there is a one-to-one mapping between a PHI-base accession ID (PHI ID) and the following triple of data types:

(PMID, Pathogen NCBI Taxonomy ID, UniProtKB accession number)

So, when grouping by the above data types, I would expect there to be one PHI ID for each group. Instead, there are over 100 groups where there is more than one PHI ID (a one-to-many mapping), as shown in the table below.

@martin2urban Are there other criteria for assigning PHI IDs that I'm missing, or is the data below the result of an error in the logic used to assign the IDs?

| PMID | Pathogen NCBI Taxonomy ID | UniProtKB accession | PHI IDs |
| --- | --- | --- | --- |
| PMID:10383767 | 5693 | D3JLB9 | PHI:2574, PHI:2595 |
| PMID:10807578 | 5476 | Q9P8U9 | PHI:162, PHI:485 |
| PMID:11038529 | 5116 | Q00580 | PHI:49, PHI:770 |
| PMID:11310744 | 40559 | Q9UW03 | PHI:202, PHI:1160 |
| PMID:12514128 | 318829 | Q875L7 | PHI:322, PHI:7226 |
| PMID:12744465 | 40559 | Q9C2Y1 | PHI:278, PHI:1028 |
| PMID:12828637 | 317 | Q79LY0 | PHI:992, PHI:7237 |
| PMID:15306011 | 13684 | Q5J4D6 | PHI:365, PHI:2267 |
| PMID:15811992 | 5518 | I1RF62 | PHI:712, PHI:3991 |
| PMID:16113260 | 28901 | H9L495 | PHI:607, PHI:613 |
| PMID:16113260 | 28901 | Q8Z4N6 | PHI:609, PHI:616 |
| PMID:16113260 | 28901 | Q8Z6M8 | PHI:610, PHI:623 |
| PMID:16272431 | 5270 | Q705V7 | PHI:526, PHI:1071 |
| PMID:16278459 | 5518 | I1RHS7 | PHI:723, PHI:727 |
| PMID:16278459 | 5518 | I1RSU2 | PHI:724, PHI:730 |
| PMID:16353549 | 5507 | Q2KN79 | PHI:522, PHI:2808 |
| PMID:16593517 | 317 | P11437 | PHI:3372, PHI:3442 |
| PMID:16622070 | 13684 | Q1L2E2 | PHI:595, PHI:2248 |
| PMID:17020577 | 13684 | A9Z1V6 | PHI:1083, PHI:2272 |
| PMID:17020577 | 13684 | Q00LS5 | PHI:1082, PHI:2271 |
| PMID:17189344 | 318829 | Q3Y5V5 | PHI:1018, PHI:2042 |
| PMID:17250832 | 29003 | A0ST42 | PHI:737, PHI:2329 |
| PMID:17353894 | 318829 | G4MQ72 | PHI:774, PHI:792 |
| PMID:17353894 | 318829 | G4MRZ0 | PHI:773, PHI:791 |
| PMID:17353894 | 318829 | G4MSX7 | PHI:772, PHI:797 |
| PMID:17353894 | 318829 | G4NF05 | PHI:776, PHI:802 |
| PMID:17353894 | 318829 | G5EH19 | PHI:780, PHI:787, PHI:808 |
| PMID:17353894 | 318829 | O42622 | PHI:775, PHI:794 |
| PMID:17379549 | 1047171 | A5H456 | PHI:867, PHI:1159 |
| PMID:17511023 | 1047171 | F9XG32 | PHI:838, PHI:839, PHI:840, PHI:841, PHI:842, PHI:843, PHI:844, PHI:845, PHI:846 |
| PMID:17555268 | 85558 | Q7BT38 | PHI:1001, PHI:1003 |
| PMID:17560817 | 5111 | A3KLI8 | PHI:1038, PHI:1039 |
| PMID:17624327 | 5693 | H2DQH1 | PHI:2576, PHI:3501 |
| PMID:17722701 | 5507 | A8QJI7 | PHI:1020, PHI:1022, PHI:2362 |
| PMID:18034832 | 318829 | G4NDE1 | PHI:1017, PHI:2067 |
| PMID:18705871 | 5507 | A6N6J8 | PHI:1021, PHI:2981 |
| PMID:19161356 | 5507 | J9MAX2 | PHI:1107, PHI:1108 |
| PMID:19454732 | 318829 | C4B8B9 | PHI:2137, PHI:3498, PHI:3499 |
| PMID:19459949 | 1047171 | F9XG32 | PHI:1149, PHI:1150, PHI:1152, PHI:1153, PHI:1154, PHI:1155, PHI:1156, PHI:1157, PHI:1158 |
| PMID:19520179 | 1047171 | C5J0G7 | PHI:1075, PHI:2126 |
| PMID:19520179 | 1047171 | C5MK57 | PHI:1072, PHI:2124 |
| PMID:19520179 | 1047171 | C6K2F1 | PHI:1074, PHI:2125 |
| PMID:19520179 | 1047171 | C6KEF4 | PHI:1073, PHI:2123 |
| PMID:19520179 | 1047171 | C6KEF5 | PHI:1076, PHI:2127 |
| PMID:19909822 | 5518 | I1RKF3 | PHI:2418, PHI:2419 |
| PMID:20153837 | 5270 | D2EAX7 | PHI:2582, PHI:2603 |
| PMID:20447276 | 5518 | I1RJS9 | PHI:2326, PHI:2491 |
| PMID:20601497 | 272952 | Q4VKJ6 | PHI:4253, PHI:4254, PHI:4255, PHI:4256, PHI:4257 |
| PMID:20618707 | 5518 | I1RM09 | PHI:2325, PHI:2502 |
| PMID:20675574 | 318829 | G4MS03 | PHI:2006, PHI:2008 |
| PMID:22028654 | 5518 | A0A098D1L0 | PHI:1499, PHI:1501 |
| PMID:22028654 | 5518 | A0A098DDX5 | PHI:1648, PHI:1650 |
| PMID:22028654 | 5518 | A0A098E396 | PHI:1505, PHI:1624 |
| PMID:22028654 | 5518 | A0A1C3YJ08 | PHI:1544, PHI:1545 |
| PMID:22028654 | 5518 | A0A1C3YMR5 | PHI:1903, PHI:1905 |
| PMID:22416226 | 5270 | Q4P380 | PHI:2586, PHI:3510 |
| PMID:22827542 | 5507 | H9C592 | PHI:2590, PHI:2611 |
| PMID:22835272 | 5599 | F8R4Y0 | PHI:2587, PHI:2608 |
| PMID:22841690 | 5037 | B2CQJ9 | PHI:2588, PHI:2609 |
| PMID:22902811 | 5599 | I3QHH8 | PHI:2585, PHI:2606 |
| PMID:23211925 | 272952 | G3C9S3 | PHI:4766, PHI:4767 |
| PMID:23734779 | 272952 | G3C9N8 | PHI:2946, PHI:4774 |
| PMID:23734779 | 272952 | G3C9Q9 | PHI:2945, PHI:4775 |
| PMID:23734779 | 272952 | G3C9T3 | PHI:2947, PHI:4773 |
| PMID:23734779 | 272952 | G3C9T8 | PHI:2944, PHI:4772 |
| PMID:23883358 | 100787 | S6G070 | PHI:3706, PHI:3707 |
| PMID:23937726 | 552 | D4HUY4 | PHI:3674, PHI:3680 |
| PMID:23937726 | 552 | D4HUY5 | PHI:3673, PHI:3679 |
| PMID:23937726 | 552 | D4HX89 | PHI:3675, PHI:3681 |
| PMID:23937726 | 552 | D4I0C5 | PHI:3676, PHI:3682 |
| PMID:23937726 | 552 | D4IAW2 | PHI:3672, PHI:3678 |
| PMID:23937726 | 552 | Q9X3T0 | PHI:3671, PHI:3677 |
| PMID:24261846 | 31870 | C9W7X1 | PHI:3933, PHI:9080 |
| PMID:24473076 | 5270 | G0X840 | PHI:3130, PHI:4051 |
| PMID:24722578 | 5518 | I1RA07 | PHI:4209, PHI:4236 |
| PMID:25166864 | 287 | Q9HWS6 | PHI:3211, PHI:3212 |
| PMID:25299517 | 318829 | G4MZS3 | PHI:3316, PHI:5605 |
| PMID:25299517 | 318829 | G4NGB1 | PHI:3311, PHI:5661 |
| PMID:25299517 | 318829 | G4NII8 | PHI:3307, PHI:3313 |
| PMID:26368514 | 305 | Q8XPQ6 | PHI:5129, PHI:5166 |
| PMID:26368514 | 305 | Q8XRK9 | PHI:5121, PHI:5163 |
| PMID:26368514 | 305 | Q8XYB9 | PHI:5141, PHI:5177 |
| PMID:26368514 | 305 | Q8XYE3 | PHI:5143, PHI:5179 |
| PMID:26368514 | 305 | Q8XYF8 | PHI:5133, PHI:5172 |
| PMID:26368514 | 305 | Q8Y164 | PHI:5139, PHI:5175 |
| PMID:26764912 | 106654 | A0A2T7FJE6 | PHI:5507, PHI:5522 |
| PMID:27226300 | 777 | Q83FB9 | PHI:6350, PHI:6355 |
| PMID:27322386 | 34373 | N1J7E2 | PHI:6352, PHI:6357 |
| PMID:27322386 | 34373 | N1JJH4 | PHI:6351, PHI:6356 |
| PMID:27613851 | 317 | Q887C1 | PHI:6727, PHI:6728 |
| PMID:27911947 | 632 | P17778 | PHI:6824, PHI:6830 |
| PMID:28715477 | 287 | A0A0H2ZGI5 | PHI:7297, PHI:7304 |
| PMID:28970272 | 813 | A0A0H3MCG4 | PHI:10157, PHI:10158 |
| PMID:29109173 | 28901 | D0ZWU0 | PHI:9831, PHI:9832, PHI:9833 |
| PMID:29970468 | 1311 | Q8E3H1 | PHI:8189, PHI:8190 |
| PMID:30042200 | 287 | A0A0H2Z8M3 | PHI:8252, PHI:8254 |
| PMID:30370586 | 27334 | A0A0A2JZB1 | PHI:8722, PHI:8724 |
| PMID:30379939 | 1314 | D4QE70 | PHI:8604, PHI:8605 |
| PMID:30642903 | 1280 | W8U4S5 | PHI:10293, PHI:10292 |
| PMID:30828283 | 347 | A0A0K0GGA9 | PHI:9012, PHI:9013 |
| PMID:30828283 | 347 | A0A0K0GHK1 | PHI:9016, PHI:9017 |
| PMID:30833360 | 1781 | B2HE54 | PHI:10154, PHI:10155 |
| PMID:31802604 | 5270 | A0A0D1E1M6 | PHI:11413, PHI:11415 |
| PMID:32678853 | 5476 | Q5ANH2 | PHI:10624, PHI:11159 |
| PMID:33200669 | 318829 | G4N713 | PHI:10874, PHI:10925 |
| PMID:33475797 | 746128 | B0XMW7 | PHI:11453, PHI:11454 |
| PMID:34151378 | 287 | Q9HX66 | PHI:11622, PHI:11623 |
| PMID:9100386 | 317 | O08243 | PHI:971, PHI:972 |
| PMID:9724634 | 5693 | Q94795 | PHI:2580, PHI:2601 |
| PMID:9768518 | 40559 | O94100 | PHI:103, PHI:1027 |
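
For anyone who wants to reproduce the check, the grouping described above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the table: the flat tuple layout `(PMID, taxon ID, UniProtKB accession, PHI ID)` and the function name `find_one_to_many` are assumptions, and the last record is a made-up placeholder included only to show a group that passes the check. The other values are copied from rows of the table.

```python
from collections import defaultdict

# Assumed record layout: (PMID, pathogen taxon ID, UniProtKB accession, PHI ID).
# This is NOT the actual PHI-base 4 export format, just a flat form for the sketch.
records = [
    ("PMID:10383767", 5693, "D3JLB9", "PHI:2574"),
    ("PMID:10383767", 5693, "D3JLB9", "PHI:2595"),
    ("PMID:16113260", 28901, "H9L495", "PHI:607"),
    ("PMID:16113260", 28901, "H9L495", "PHI:613"),
    # Hypothetical placeholder row: a triple with a single PHI ID.
    ("PMID:0", 0, "X00000", "PHI:0"),
]


def phi_number(phi_id):
    """Extract the numeric part of an ID such as 'PHI:2574' for sorting."""
    return int(phi_id.split(":")[1])


def find_one_to_many(rows):
    """Return {(pmid, taxon_id, protein_id): [PHI IDs]} for triples with more than one PHI ID."""
    groups = defaultdict(set)
    for pmid, taxon_id, protein_id, phi_id in rows:
        groups[(pmid, taxon_id, protein_id)].add(phi_id)
    return {
        key: sorted(ids, key=phi_number)
        for key, ids in groups.items()
        if len(ids) > 1
    }


violations = find_one_to_many(records)
for (pmid, taxon_id, protein_id), phi_ids in sorted(violations.items()):
    print(pmid, taxon_id, protein_id, ", ".join(phi_ids))
```

Using a set per triple means repeated rows for the same interaction do not inflate the count; only distinct PHI IDs within one (PMID, taxon, protein) group are reported.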
@jseager7 added the question (Further information is requested) label on Aug 25, 2022
@jseager7 changed the title from "PHI-base 4 interaction IDs violating their own uniqueness rules?" to "Are PHI-base 4 interaction IDs violating their own uniqueness rules?" on Aug 25, 2022
@martin2urban
Member

PHI-base 4 is literature-centric. This means that if one gene is reported in two or more articles, then there could be many PHI IDs per UniProtKB ID.
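
As a rough illustration of the literature-centric point, the sketch below counts PHI IDs per UniProtKB accession and then breaks that count down by article. The rows are a subset of the F9XG32 entries from the table in this issue; the tuple layout and the function name `phi_ids_by_protein_and_article` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Subset of rows from the table in this issue: (PMID, UniProtKB accession, PHI ID).
# Only a few of the F9XG32 PHI IDs are included to keep the sketch short.
rows = [
    ("PMID:17511023", "F9XG32", "PHI:838"),
    ("PMID:17511023", "F9XG32", "PHI:839"),
    ("PMID:19459949", "F9XG32", "PHI:1149"),
    ("PMID:19459949", "F9XG32", "PHI:1150"),
]


def phi_ids_by_protein_and_article(rows):
    """Build {protein: {pmid: set of PHI IDs}} from flat rows."""
    nested = defaultdict(lambda: defaultdict(set))
    for pmid, protein_id, phi_id in rows:
        nested[protein_id][pmid].add(phi_id)
    return nested


nested = phi_ids_by_protein_and_article(rows)
for protein_id, by_article in nested.items():
    # Union of the PHI IDs from every article that reports this protein.
    all_ids = set().union(*by_article.values())
    print(protein_id, "articles:", len(by_article), "PHI IDs:", len(all_ids))
    for pmid, phi_ids in sorted(by_article.items()):
        print("  ", pmid, "PHI IDs:", len(phi_ids))
```

The per-protein total combines IDs from every article that reports the gene, while the per-article breakdown shows how many IDs each PMID contributes on its own.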
