"Population must be a sequence. For dicts or sets, use sorted(d). in line 83" [Python 3.11 compatibility?] #154

Open
prototaxites opened this issue Feb 27, 2023 · 4 comments

@prototaxites

I am trying to run CAMISIM to generate a very small test metagenome data set, containing a mixture of eukaryotic and prokaryotic genomes, to test a pipeline. I followed the example in the usage guide and am getting the following error:

2023-02-27 10:28:52 INFO: [MetagenomeSimulationPipeline] Metagenome simulation starting
2023-02-27 10:28:52 INFO: [MetagenomeSimulationPipeline] Validating Genomes
2023-02-27 10:28:52 INFO: [MetadataReader] Reading file: '/nfshome/store04/users/b.jmd20jns/camisim/genome_to_id.tsv'
2023-02-27 10:28:53 INFO: [MetagenomeSimulationPipeline] Design Communities
2023-02-27 10:28:53 INFO: [CommunityDesign] Drawing strains.
2023-02-27 10:28:53 INFO: [MetadataReader 31395689975] Reading file: '/nfshome/store04/users/b.jmd20jns/camisim/metadata.tsv'
2023-02-27 10:28:53 ERROR: [MetagenomeSimulationPipeline] Population must be a sequence.  For dicts or sets, use sorted(d). in line 83
2023-02-27 10:28:53 INFO: [MetagenomeSimulationPipeline] Metagenome simulation aborted

Any idea what's going on and how to fix it?

metadata.tsv:

genome_ID	OTU	NCBI_ID	novelty_category
Pseudomicrostroma_glucosiphilum	1	1684307	known_strain
Aureobasidium_pullulans	2	 5580	known_strain
Anaeromicropila_populeti	3	 37658	known_strain
Bacillus_subtilis	4	 1423	known_strain
Erwinia_billingiae	5	 182337	known_strain
Frondihabitans_PhB188	6	2485200	known_strain
Pseudarthrobacter_scleromae	7	158897	known_strain
Pseudomonas_fluorescens	8	294	known_strain
Variovorax_boronicumulans	9	436515	known_strain

genome_to_id.tsv:

Pseudomicrostroma_glucosiphilum	genomes/GCA_003144135.1_Rhodsp1_genomic.fna
Aureobasidium_pullulans	genomes/GCA_000721785.1_Aureobasidium_pullulans_var._pullulans_EXF-150_assembly_version_1.0_genomic.fna
Anaeromicropila_populeti	genomes/GCA_900112775.1_IMG-taxon_2599185221_annotated_assembly_genomic.fna
Bacillus_subtilis	genomes/GCA_000009045.1_ASM904v1_genomic.fna
Erwinia_billingiae	genomes/GCA_000196615.1_ASM19661v1_genomic.fna
Frondihabitans_PhB188	genomes/GCA_003752365.1_ASM375236v1_genomic.fna
Pseudarthrobacter_scleromae	genomes/GCA_014644515.1_ASM1464451v1_genomic.fna
Pseudomonas_fluorescens	genomes/GCA_900215245.1_IMG-taxon_2617270901_annotated_assembly_genomic.fna
Variovorax_boronicumulans	genomes/GCA_009811375.1_ASM981137v1_genomic.fna
@AlphaSquad AlphaSquad added the bug label Feb 27, 2023
@AlphaSquad AlphaSquad self-assigned this Feb 27, 2023
@AlphaSquad AlphaSquad changed the title from "Population must be a sequence. For dicts or sets, use sorted(d). in line 83" to "Population must be a sequence. For dicts or sets, use sorted(d). in line 83" [Python 3.11 compatibility?] Feb 27, 2023
@AlphaSquad
Collaborator

AlphaSquad commented Feb 27, 2023

Hey, thanks for bringing this to my attention. Are you by any chance using Python >= 3.11? Python 3.11 removed the automatic conversion of sets to lists for the population passed to random.sample, and there is one place where CAMISIM uses the keys of a dict for random sampling.
For compatibility with Python 3.11 there are two changes which need to be performed for CAMISIM to run:

  1. In scripts/configparserwrapper.py, line 5: change from collections import Iterable to from collections.abc import Iterable. (Since CAMISIM does not run without that change, I assume you have already done this?)
  2. In scripts/StrainSelector/strainselector.py, line 253: change for otu_id in random.sample(self._otu_list.keys(), len(self._otu_list)): to for otu_id in random.sample(list(self._otu_list.keys()), len(self._otu_list)): which makes the conversion explicit.

After this, CAMISIM runs on my end. I have not pushed these changes yet because I want to check that everything else stays intact and that backward compatibility is preserved, but they should let you run CAMISIM.
If you are not using Python 3.11, then I am sorry and will have to check things again. In the meantime, I changed the title so that other people hitting this error can find the solution in this issue.
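
For readers who want to see the two changes in isolation, here is a minimal sketch. It is not CAMISIM code: otu_list is a hypothetical stand-in for self._otu_list, and shuffled_keys is a helper invented for illustration. Since collections.abc has existed since Python 3.3 and list(...) over a dict works on every Python 3 version, both changes should stay compatible with the Python 3.7 setups CAMISIM was tested on.

```python
import random
from collections.abc import Iterable  # change 1: collections.abc exists since Python 3.3


def shuffled_keys(mapping, rng=random):
    """Return all keys of `mapping` in random order (hypothetical helper)."""
    # change 2: list(...) turns the dict keys into a real sequence,
    # which random.sample accepts on old and new Python versions alike.
    return rng.sample(list(mapping), len(mapping))


# Hypothetical stand-in for CAMISIM's self._otu_list (OTU id -> genome ids).
otu_list = {"1": ["genome_a"], "2": ["genome_b"], "3": ["genome_c"]}

for otu_id in shuffled_keys(otu_list):
    print(otu_id, otu_list[otu_id])

# The import from collections.abc behaves like the old alias did.
print(isinstance(otu_list.keys(), Iterable))
```

Passing a seeded random.Random instance as rng gives a reproducible order, which can help when comparing runs across Python versions.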

@prototaxites
Author

Hey, thanks for the very quick reply! Yes, I was using Python 3.11 (though I'm currently spinning up a 3.9 conda environment). I did figure out the first change but not the second - I'll see how I get on with the 3.9 environment in the first instance, but if that fails I'll give the above a go.

@prototaxites
Author

Hi, Python 3.9 did the trick! For anyone else stumbling across this, the following conda environment works to run CAMISIM quite happily:

conda create -n camisim python=3.9 perl matplotlib-base numpy biopython biom-format scikit-learn configparser ete3 perl-xml-simple
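
To check whether a given environment is affected before starting a run, a quick probe along these lines may help. This is only a sketch: sampling_from_dict_keys_works is a name made up here, and it simply tries the same call pattern that fails in this issue. On Python 3.9 and 3.10 the call still works but emits a DeprecationWarning, which the probe silences.

```python
import random
import sys
import warnings


def sampling_from_dict_keys_works():
    """Return True if random.sample still accepts dict keys on this interpreter."""
    with warnings.catch_warnings():
        # Python 3.9 and 3.10 only warn about sampling from a set-like object.
        warnings.simplefilter("ignore")
        try:
            random.sample({"a": 1, "b": 2}.keys(), 2)
        except TypeError:
            # Python 3.11+ raises "Population must be a sequence. ..."
            return False
    return True


print(sys.version.split()[0], sampling_from_dict_keys_works())
```

If the probe prints False, the unpatched CAMISIM call shown earlier in this thread is expected to fail in that environment.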

@AlphaSquad
Collaborator

Glad that it works. We tested CAMISIM mainly on Python 3.7. I hope that most of these environment and version problems will be solved once we move to CAMISIM2.0 (coming soon™).
