some progress
lskatz committed Oct 23, 2024
1 parent 56f4d71 commit f7728af
Showing 6 changed files with 31 additions and 28 deletions.
2 changes: 2 additions & 0 deletions src/provenance/README.md
@@ -66,4 +66,6 @@ Add a sources field to the chromosomes.tsv from entrez search.

```bash
cat chromosomes.tsv | perl provenance.pl > sources.tsv
+# or also to keep track of unknowns
+cat chromosomes.insdc.tsv | perl provenance.pl | tee sources.tsv | grep UNKNOWN > unknown.tsv
```
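
A minimal sketch, not part of this commit, of a stricter variant of the `grep UNKNOWN` step in the README snippet: it keeps a row only when its last column is exactly `UNKNOWN`, rather than when the string appears anywhere on the line. It assumes the table is tab-separated with the source label in the final field; the function name `unknown_rows` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print rows whose last
# tab-separated field is exactly UNKNOWN.
# Assumption: the source label is the final column, as in sources.tsv.
unknown_rows() {
  # With no file arguments, awk reads from standard input.
  awk -F'\t' '$NF == "UNKNOWN"' "$@"
}

# Possible usage, mirroring the README pipeline:
# cat chromosomes.insdc.tsv | perl provenance.pl | tee sources.tsv | unknown_rows > unknown.tsv
```

Unlike `grep`, `awk` exits with status 0 even when no row matches, which may matter if the pipeline runs under `set -e` or `set -o pipefail`.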
9 changes: 8 additions & 1 deletion src/provenance/SME.acc
@@ -83,4 +83,11 @@ Yersinia_enterocolitica AM286415 630 629 UNKNOWN
Yersinia_intermedia CP009801 631 629 UNKNOWN
Yersinia_kristensenii CP054049 631 629 UNKNOWN
Yersinia_massiliensis CP054048 33060 629 UNKNOWN
-Yersinia_pseudotuberculosis CP009712 502800 633 UNKNOWN
+Yersinia_pseudotuberculosis CP009712 502800 633 UNKNOWN
+
+
+Neisseria_gonorrhoeae AE004969 485 482 UNKNOWN
+Neisseria_meningitidis AE002098 487 482 UNKNOWN
+
+# Gulvik
+Leptospira_biflexa CP000777 355278 145259 UNKNOWN
2 changes: 1 addition & 1 deletion src/provenance/chromosomes.insdc.tsv
@@ -1,6 +1,6 @@
scientificName nuccoreAcc taxid parent
Acholeplasma_laidlawii CP000896 2148 2147
-Acinetobacter_baumannii CU468230 509170 470
+Acinetobacter_baumannii CP045110 509170 470
Acinetobacter_pittii CP002177 48296 909768
Aeromonas_hydrophila CP000462 644 642
Agrobacterium_fabrum AE007869 1176649 1183400
Binary file modified src/provenance/ncbi_ref.acc.gz
Binary file not shown.
40 changes: 20 additions & 20 deletions src/provenance/sources.tsv
@@ -1,6 +1,6 @@
scientificName nuccoreAcc taxid parent source
Acholeplasma_laidlawii CP000896 2148 2147 NCBI-REF
-Acinetobacter_baumannii CU468230 509170 470 UNKNOWN
+Acinetobacter_baumannii CP045110 509170 470 NCBI-REF
Acinetobacter_pittii CP002177 48296 909768 UNKNOWN
Aeromonas_hydrophila CP000462 644 642 UNKNOWN
Agrobacterium_fabrum AE007869 1176649 1183400 NCBI-REF
@@ -79,7 +79,7 @@ Clostridium_botulinum_groupI CP001078 9000005 1491 SME
Clostridium_botulinum_groupII CP000939 9000004 1491 SME
Clostridium_butyricum CP013239 1492 1485 SME
Corynebacterium_diphtheriae CP091095 1717 1716 UNKNOWN
-Clostridium_perfringens CP000312 1502 1485 UNKNOWN
+Clostridium_perfringens CP000312 1502 1485 SME
Corynebacterium_urealyticum AM942444 43771 1716 UNKNOWN
Corynebacterium_glutamicum BA000036 1718 1716 NCBI-REF
Cronobacter_condimenti CP012264 1163710 413496 UNKNOWN
@@ -151,19 +151,19 @@ Lactococcus_lactis AE005176 1358 1357 UNKNOWN
Lactobacillus_salivarius CP000233 1624 1578 UNKNOWN
Lactuca_sativa MK820672 75943 4235 UNKNOWN
Legionella_pneumophila AE017354 446 445 UNKNOWN
-Leptospira_biflexa CP000777 355278 145259 UNKNOWN
+Leptospira_biflexa CP000777 355278 145259 SME
Leptothrix_cholodnii CP001013 34029 88 NCBI-REF
-Leptospira_interrogans AE010300 173 171 UNKNOWN
+Leptospira_interrogans CP020414 173 171 FDA-ARGOS
Leuconostoc_citreum DQ489736 349519 33964 UNKNOWN
Listeria_grayi LR134483 1641 1637 NCTC3000
-Listeria_innocua CP045743 1642 1637 UNKNOWN
-Listeria_ivanovii FR687253 1638 1637 UNKNOWN
+Listeria_innocua CP045743 1642 1637 SME
+Listeria_ivanovii FR687253 1638 1637 SME
Listeria_marthii CM001047 529731 1637 NCBI-GEN:Life Technologies
-Listeria_monocytogenes_I CP054040 9000000 1639 UNKNOWN
-Listeria_monocytogenes_II CP054042 9000001 1639 UNKNOWN
-Listeria_monocytogenes_III CP054039 9000002 1639 UNKNOWN
-Listeria_monocytogenes_IV CP054041 9000003 1639 UNKNOWN
-Listeria_seeligeri FN557490 1640 1637 UNKNOWN
+Listeria_monocytogenes_I CP054040 9000000 1639 SME
+Listeria_monocytogenes_II CP054042 9000001 1639 SME
+Listeria_monocytogenes_III CP054039 9000002 1639 SME
+Listeria_monocytogenes_IV CP054041 9000003 1639 SME
+Listeria_seeligeri FN557490 1640 1637 SME
Listeria_welshimeri LT906444 1643 1637 NCTC3000
Lysinibacillus_sphaericus CP000817 444177 1421 UNKNOWN
Mesoplasma_florum AE017263 2151 46239 NCBI-REF
@@ -179,8 +179,8 @@ Mycobacterium_smegmatis CP000480 1772 1866885 UNKNOWN
Mycobacterium_tuberculosis AL123456 1773 77643 NCBI-REF
Mycoplasma_mycoides BX293980 2102 656088 UNKNOWN
Mycoplasma_pneumoniae U00089 2104 2093 UNKNOWN
-Neisseria_gonorrhoeae AE004969 485 482 UNKNOWN
-Neisseria_meningitidis AE002098 487 482 UNKNOWN
+Neisseria_gonorrhoeae AE004969 485 482 SME
+Neisseria_meningitidis AE002098 487 482 SME
Neomysis_japonica KR006340 1676841 223649 UNKNOWN
Ochrobactrum_anthropi CP000758 529 528 UNKNOWN
Ochrobactrum_anthropi CP000759 529 528 UNKNOWN
@@ -274,16 +274,16 @@ Vicia_faba KC189947 3906 3904 UNKNOWN
Xanthomonas_campestris AE008922 339 338 UNKNOWN
Xylella_fastidiosa CP000941 405440 2371 UNKNOWN
Yersinia_aldovae CP009781 29483 629 NCBI-REF
-Yersinia_bercovieri CP054044 634 629 UNKNOWN
-Yersinia_enterocolitica CP002246 630 629 UNKNOWN
+Yersinia_bercovieri CP054044 634 629 SME
+Yersinia_enterocolitica CP002246 630 629 SME
Yersinia_canariae CP043727 2607663 629 NCBI-REF
Yersinia_frederiksenii CP023962 29484 629 FDA-ARGOS
-Yersinia_enterocolitica AM286415 630 629 UNKNOWN
+Yersinia_enterocolitica AM286415 630 629 SME
Yersinia_hibernica CP032487 2339259 629 NCBI-REF
-Yersinia_intermedia CP009801 631 629 UNKNOWN
-Yersinia_kristensenii CP054049 631 629 UNKNOWN
-Yersinia_massiliensis CP054048 33060 629 UNKNOWN
+Yersinia_intermedia CP009801 631 629 SME
+Yersinia_kristensenii CP054049 631 629 SME
+Yersinia_massiliensis CP054048 33060 629 SME
Yersinia_mollaretii CP054043 33060 629 NCBI-REF
-Yersinia_pseudotuberculosis CP009712 502800 633 UNKNOWN
+Yersinia_pseudotuberculosis CP009712 502800 633 SME
Yersinia_rochesterensis CP032482 1604335 629 NCBI-REF
Yersinia_rohdei CP009787 29485 629 NCBI-REF
6 changes: 0 additions & 6 deletions src/provenance/unknown.tsv
@@ -1,4 +1,3 @@
-Acinetobacter_baumannii CU468230 509170 470 UNKNOWN
Acinetobacter_pittii CP002177 48296 909768 UNKNOWN
Aeromonas_hydrophila CP000462 644 642 UNKNOWN
Amycolatopsis_mediterranei CP002000 33910 1813 UNKNOWN
@@ -22,7 +21,6 @@ Chlamydia_pneumoniae AE001363 83558 810 UNKNOWN
Caulobacter_vibrioides CP001340 155892 75 UNKNOWN
Chlamydomonas_reinhardtii AF008237 3055 3052 UNKNOWN
Corynebacterium_diphtheriae CP091095 1717 1716 UNKNOWN
-Clostridium_perfringens CP000312 1502 1485 UNKNOWN
Corynebacterium_urealyticum AM942444 43771 1716 UNKNOWN
Cronobacter_condimenti CP012264 1163710 413496 UNKNOWN
Coxiella_burnetii AE016828 777 776 UNKNOWN
@@ -51,8 +49,6 @@ Lactococcus_lactis AE005176 1358 1357 UNKNOWN
Lactobacillus_salivarius CP000233 1624 1578 UNKNOWN
Lactuca_sativa MK820672 75943 4235 UNKNOWN
Legionella_pneumophila AE017354 446 445 UNKNOWN
-Leptospira_biflexa CP000777 355278 145259 UNKNOWN
-Leptospira_interrogans AE010300 173 171 UNKNOWN
Leuconostoc_citreum DQ489736 349519 33964 UNKNOWN
Lysinibacillus_sphaericus CP000817 444177 1421 UNKNOWN
Mesorhizobium_ciceri CP002447 39645 68287 UNKNOWN
@@ -64,8 +60,6 @@ Mycobacterium_leprae AL450380 1769 1763 UNKNOWN
Mycobacterium_smegmatis CP000480 1772 1866885 UNKNOWN
Mycoplasma_mycoides BX293980 2102 656088 UNKNOWN
Mycoplasma_pneumoniae U00089 2104 2093 UNKNOWN
-Neisseria_gonorrhoeae AE004969 485 482 UNKNOWN
-Neisseria_meningitidis AE002098 487 482 UNKNOWN
Neomysis_japonica KR006340 1676841 223649 UNKNOWN
Ochrobactrum_anthropi CP000758 529 528 UNKNOWN
Ochrobactrum_anthropi CP000759 529 528 UNKNOWN