I've run into a situation where the standard library pickle round-trips an object successfully, but dill raises an UnpicklingError when loading its own output.
In [7]: dill.loads(dill.dumps(iterable[0]))
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
File <ipython-input-7-fc7606c36bd3>:1
----> 1 dill.loads(dill.dumps(iterable[0]))
File ~/.local/virtualenvs/stantz-2022-update/lib/python3.10/site-packages/dill/_dill.py:301, in loads(str, ignore, **kwds)
290 """
291 Unpickle an object from a string.
292
(...)
298 Default values for keyword arguments can be set in :mod:`dill.settings`.
299 """
300 file = StringIO(str)
--> 301 return load(file, ignore, **kwds)
File ~/.local/virtualenvs/stantz-2022-update/lib/python3.10/site-packages/dill/_dill.py:287, in load(file, ignore, **kwds)
281 def load(file, ignore=None, **kwds):
282 """
283 Unpickle an object from a file.
284
285 See :func:`loads` for keyword arguments.
286 """
--> 287 return Unpickler(file, ignore=ignore, **kwds).load()
File ~/.local/virtualenvs/stantz-2022-update/lib/python3.10/site-packages/dill/_dill.py:442, in Unpickler.load(self)
441 def load(self): #NOTE: if settings change, need to update attributes
--> 442 obj = StockUnpickler.load(self)
443 if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
444 if not self._ignore:
445 # point obj class to main
UnpicklingError: NEWOBJ class argument must be a type, not NoneType
In [8]: import pickle
In [9]: pickle.loads(pickle.dumps(iterable[0]))
Out[9]: <cdcdvh.ghost.preprocess.Preprocess |CasperStrategy| out: |preprocess/GHOST_EP10.gh5|>
Here is the data serialized by dill:
b'\x80\x04\x95\xbf\x10\x00\x00\x00\x00\x00\x00\x8c\x17cdcdvh.ghost.preprocess\x94\x8c\nPreprocess\x94\x93\x94)\x81\x94}\x94(\x8c\x08seq_file\x94\x8c\x1acdcdvh.ghost.util.inputset\x94\x8c\x0cPairedEndSet\x94\x93\x94)\x81\x94}\x94(
\x8c\x05_open\x94\x8c\x17cdcdvh.ghost.util.files\x94\x8c\x0fopen_compressed\x94\x93\x94\x8c\x06format\x94\x8c\x05fastq\x94\x8c\x07r1_file\x94\x8ck/scicomp/groups-pure/OID/NCHHSTP/DVH/testdata/TrainingDataset_A/rawfiles/GHOST_EP10_S1_L001
_R1_001.fastq.gz\x94\x8c\x07r2_file\x94\x8ck/scicomp/groups-pure/OID/NCHHSTP/DVH/testdata/TrainingDataset_A/rawfiles/GHOST_EP10_S1_L001_R2_001.fastq.gz\x94ub\x8c\x0boutput_path\x94N)\x81\x94}\x94(\x8c\x04path\x94\x8c\x19preprocess/GHOST_
EP10.gh5\x94\x8c\x06kwargs\x94}\x94\x8c\x0bsample_name\x94\x8c\nGHOST_EP10\x94s\x8c\x04mode\x94\x8c\x01w\x94ub\x8c\x05clean\x94\x8c\x19cdcdvh.ghost.clean.casper\x94\x8c\x0eCasperStrategy\x94\x93\x94)\x81\x94}\x94(\x8c\x14min_major_propor
tion\x94N\x8c\x05steps\x94(\x8c\x1acdcdvh.ghost.util.seqtools\x94\x8c\x0efastq_id_match\x94\x93\x94)\x81\x94h(\x8c\x18drop_ambiguous_sequences\x94\x93\x94)\x81\x94}\x94(\x8c\x07maximum\x94G?\xef\xae\x14z\xe1G\xae\x8c\x06reason\x94\x8c\x1
2more than 0.99% Ns\x94ubh(\x8c\x0bphix_filter\x94\x93\x94)\x81\x94}\x94\x8c\x07ref_dir\x94\x8c_/scicomp/home-pure/xzy3/.cache/ghost_reference_db/compiled-refs/phix174-ref-gj0drr_e-BWA/bwa-db\x94sbh(\x8c\x14short_product_filter\x94\x93\x
94)\x81\x94}\x94(\x8c\rforward_regex\x94\x8c\x0cregex._regex\x94\x8c\x07compile\x94\x93\x94(\x8c+(GGATATGATGATGAACTGGT){s<=2,i<=1,d<=1,e<=3}\x94M 0C-,\x1e\x01\x01\x01\x1b\x00\x00\x01\x00\x01\x00\x02\x00\x03\x01\x01\x01\x03J\x04\x14GGATAT
GATGATGAACTGGT\x14\x14\x01\x94}\x94}\x94}\x94]\x94K\x00)K\x00K\x01t\x94R\x94\x8c\rreverse_regex\x94h@(\x8c-(ATGTGCCAGCTGCCGTTGGTGT){s<=2,i<=1,d<=1,e<=3}\x94M 0C/.\x1e\x01\x01\x01\x1b\x00\x00\x01\x00\x01\x00\x02\x00\x03\x01\x01\x01\x03J\x04\x16ATGTGCCAGCTGCCGTTGGTGT\x14\x14\x01\x94}\x94}\x94}\x94]\x94K\x00)K\x00K\x01t\x94R\x94\x8c\x08min_size\x94G@g\x19\x99\x99\x99\x99\x99ubh(\x8c\x16remove_short_sequences\x94\x93\x94)\x81\x94}\x94(hRG@g\x19\x99\x99\x99\x99\x99h1\x8c+se$uence is shorter than 184.79999999999998\x94ubh(\x8c\x13mid_distance_filter\x94\x93\x94)\x81\x94}\x94(\x8c\x08mid_list\x94]\x94(\x8c\nACGAGTGCGT\x94\x8c\nACGCTCGACA\x94\x8c\nAGACGCACTC\x94\x8c\nAGCACTGTAG\x94\x8c\nATCAGACACG\x94\x8c\nAT$TCGCGAG\x94\x8c\nCGTGTCTCTA\x94\x8c\nCTCGCGTGTC\x94\x8c\nTAGTATCAGC\x94\x8c\nTCTCTATGCG\x94\x8c\nTGATACGTCT\x94\x8c\nTACTGAGCTA\x94\x8c\nCATAGTAGTG\x94\x8c\nCGAGAGATAC\x94\x8c\nATACGACGTA\x94\x8c\nTCACGTACTA\x94\x8c\nCGTCTAGTAC\x94\x8c\$TCTACGTAGC\x94\x8c\nTGTACTACTC\x94\x8c\nACGACTACAG\x94\x8c\nCGTAGACTAG\x94\x8c\nTACGAGTATG\x94\x8c\nTACTCTCGTG\x94\x8c\nTAGAGACGAG\x94\x8c\nTCGTCGCTCG\x94\x8c\nACATACGCGT\x94\x8c\nACGCGAGTAT\x94\x8c\nACTACTATGT\x94\x8c\nACTGTACAGT\x94\x$c\nAGACTATACT\x94\x8c\nAGCGTCGTCT\x94\x8c\nAGTACGCTAT\x94\x8c\nATAGAGTACT\x94\x8c\nCACGCTACGT\x94\x8c\nCAGTAGACGT\x94\x8c\nCGACGTGACT\x94\x8c\nTACACACACT\x94\x8c\nTACACGTGAT\x94\x8c\nTACAGATCGT\x94\x8c\nTACGCTGTCT\x94\x8c\nTAGTGTAGAT\x9$\x8c\nTCGATCACGT\x94\x8c\nTCGCACTAGT\x94\x8c\nTCTAGCGACT\x94\x8c\nTCTATACTAT\x94\x8c\nTGACGTATGT\x94\x8c\nTGTGAGTAGT\x94\x8c\nACAGTATATA\x94\x8c\nACGCGATCGA\x94\x8c\nACTAGCAGTA\x94\x8c\nAGCTCACGTA\x94\x8c\nAGTATACATA\x94\x8c\nAGTCGAGAGA$x94\x8c\nAGTGCTACGA\x94\x8c\nCGATCGTATA\x94\x8c\nCGCAGTACGA\x94\x8c\nCGCGTATACA\x94\x8c\nCGTACAGTCA\x94\x8c\nCGTACTCAGA\x94\x8c\nCTACGCTCTA\x94\x8c\nCTATAGCGTA\x94\x8c\nTACGTCATCA\x94\x8c\nTAGTCGCATA\x94\x8c\nTATATATACA\x94\x8c\nTATGCTA$TA\x94\x8c\nTCACGCGAGA\x94\x8c\nTCGATAGTGA\x94\x8c\nTCGCTGCGTA\x94\x8c\nTCTGACGTCA\x94\x8c\nTGAGTCAGTA\x
94\x8c\nTGTAGTGTGA\x94\x8c\nTGTCACACGA\x94\x8c\nTGTCGTCGCA\x94\x8c\nACACATACGC\x94\x8c\nACAGTCGTGC\x94\x8c\nACATGACGAC\x94\x8c\nACGA$AGCTC\x94\x8c\nACGTCTCATC\x94\x8c\nACTCATCTAC\x94\x8c\nACTCGCGCAC\x94\x8c\nAGAGCGTCAC\x94\x8c\nAGCGACTAGC\x94\x8c\nAGTAGTGATC\x94\x8c\nAGTGACACAC\x94\x8c\nAGTGTATGTC\x94\x8c\nATAGATAGAC\x94\x8c\nATATAGTCGC\x94\x8c\nATCTACTGAC\x94\x8c\nC$CGTAGATC\x94\x8c\nCACGTGTCGC\x94\x8c\nCATACTCTAC\x94\x8c\nCGACACTATC\x94\x8c\nCGAGACGCGC\x94\x8c\nCGTATGCGAC\x94\x8c\nCGTCGATCTC\x94\x8c\nCTACGACTGC\x94\x8c\nCTAGTCACTC\x94\x8c\nCTCTACGCTC\x94\x8c\nCTGTACATAC\x94\x8c\nTAGACTGCAC\x94\x8c$nTAGCGCGCGC\x94\x8c\nTAGCTCTATC\x94\x8c\nTATAGACATC\x94\x8c\nTATGATACGC\x94\x8c\nTCACTCATAC\x94\x8c\nTCATCGAGTC\x94\x8c\nTCGAGCTCTC\x94\x8c\nTCGCAGACAC\x94\x8c\nTCTGTCTCGC\x94\x8c\nTGAGTGACGC\x94\x8c\nTGATGTGTAC\x94\x8c\nTGCTATAGAC\x94\$8c\nTGCTCGCTAC\x94\x8c\nACGTGCAGCG\x94\x8c\nACTCACAGAG\x94\x8c\nAGACTCAGCG\x94\x8c\nAGAGAGTGTG\x94\x8c\nAGCTATCGCG\x94\x8c\nAGTCTGACTG\x94\x8c\nAGTGAGCTCG\x94\x8c\nATAGCTCTCG\x94\x8c\nATCACGTGCG\x94\x8c\nATCGTAGCAG\x94\x8c\nATCGTCTGTG\x$4\x8c\nATGTACGATG\x94\x8c\nATGTGTCTAG\x94\x8c\nCACACGATAG\x94\x8c\nCACTCGCACG\x94\x8c\nCAGACGTCTG\x94\x8c\nCAGTACTGCG\x94\x8c\nCGACAGCGAG\x94\x8c\nCGATCTGTCG\x94\x8c\nCGCGTGCTAG\x94\x8c\nCGCTCGAGTG\x94\x8c\nCGTGATGACG\x94\x8c\nCTATGTACA$\x94\x8c\nCTCGATATAG\x94\x8c\nCTCGCACGCG\x94\x8c\nCTGCGTCACG\x94\x8c\nCTGTGCGTCG\x94\x8c\nTAGCATACTG\x94\x8c\nTATACATGTG\x94\x8c\nTATCACTCAG\x94\x8c\nTATCTGATAG\x94\x8c\nTCGTGACATG\x94\x8c\nTCTGATCGAG\x94\x8c\nTGACATCTCG\x94\x8c\nTGAGCT$GAG\x94\x8c\nTGATAGAGCG\x94\x8c\nTGCGTGTGCG\x94\x8c\nTGCTAGTCAG\x94\x8c\nTGTATCACAG\x94\x8c\nTGTGCGCGTG\x94e\x8c\x06metric\x94\x8c\x1acdcdvh.pyseqdist.cDistance\x94\x8c\redit_distance\x94\x93\x94\x8c\x07mid_len\x94K\n\x8c\x08max_dist\x9$K\x00\x8c\x12disable_mid_filter\x94\x89ubh(\x8c\x0fmost_common_mid\x94\x93\x94)\x81\x94}\x94(\x8c\x17contamination_threshold\x94G?\xd0\x00\x00\x00\x00\x00\x00h\xfd\x89ubh(\x8c\x11reservoir_sampler\x94\x93\x94
)\x81\x94}\x94(\x8c\x04size\$94M NhRM\x88\x13ubh(\x8c\x12canonify_read_pair\x94\x93\x94)\x81\x94}\x94(h=hHhIhQ\x8c\x0camplicon_len\x94M\x08\x01ubh!\x8c\x06Casper\x94\x93\x94)\x81\x94}\x94(h1\x8c#casper too much mismatch in overlap\x94\x8c\x0equal_threshold\x94K\x0f$x8c\x11kmer_neighborhood\x94K\x08\x8c\x08kmer_len\x94K\x11\x8c\x12mismatch_threshold\x94G?\xa9\x99\x99\x99\x99\x99\x9a\x8c\x13minimum_overlap_len\x94K\n\x8c\x10max_assembly_len\x94M\x0e\x01ubh(\x8c\x19filter_nonsense_sequences\x94\x93\x$4)\x81\x94h(\x8c\x13collapse_haplotypes\x94\x93\x94h(\x8c\x08genotype\x94\x93\x94)\x81\x94}\x94j\x1d\x01\x00\x00\x8c#cdcdvh.ghost.genotyping.blasttyping\x94\x8c\nBlastTyper\x94\x93\x94)\x81\x94}\x94(\x8c\nblast_args\x94]\x94(\x8c\x03-db$x94\x8cq/scicomp/home-pure/xzy3/.cache/ghost_reference_db/compiled-refs/ghost-hcv-genotyping-5pasbb4g-GENOTYPING/blast-db\x94e\x8c\x11reference_version\x94\x8c(02b66e58e1e2586830018776c43c172b688e9514\x94\x8c\x13unmatched_threshold\x94J$\xff\xff\xffubsbt\x94h\x1a}\x94ub\x8c\x12alignment_strategy\x94\x8c\x11profile-and-align\x94h\x1a}\x94ub.'
Can you post code that reproduces the error you are seeing?
I tried a few guesses at what iterable is, and the code works as expected.
Python 3.10.13 (main, Aug 25 2023, 02:21:32) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> iterable = [0,1,2,3,4,5]
>>> dill.loads(dill.dumps(iterable[0]))
0
>>> iterable = 'GATTACA'
>>> dill.loads(dill.dumps(iterable[0]))
'G'
I'm going to assume what you are experiencing is a case where pickle is serializing something in iterable by reference, while dill is storing the same object's contents. A minimal example that reproduces the error would let me test it and, potentially, fix it.
Can you also try running with dill.settings['byref'] = True, and alternately, with dill.settings['recurse'] = True?
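A minimal sketch of that comparison, assuming a simple stand-in object, since the real `iterable[0]` is not shown in this thread. `roundtrip_report` is a hypothetical helper, not part of dill. It passes `byref` and `recurse` as per-call keywords to `dill.dumps`, which should be equivalent to setting them in `dill.settings`, and it only tests `pickle` if dill is not installed.

```python
import pickle

try:
    import dill  # third-party; this thread uses dill 0.3.7
except ImportError:
    dill = None


def roundtrip_report(obj):
    """Round-trip obj with each serializer/setting and record the outcome."""
    attempts = {"pickle": lambda: pickle.loads(pickle.dumps(obj))}
    if dill is not None:
        attempts["dill (defaults)"] = lambda: dill.loads(dill.dumps(obj))
        attempts["dill byref=True"] = lambda: dill.loads(dill.dumps(obj, byref=True))
        attempts["dill recurse=True"] = lambda: dill.loads(dill.dumps(obj, recurse=True))

    report = {}
    for name, attempt in attempts.items():
        try:
            attempt()
            report[name] = "ok"
        except Exception as exc:  # keep going so every variant is reported
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


# Stand-in object (assumption); replace with the object that actually fails.
report = roundtrip_report({"sample_name": "GHOST_EP10", "mode": "w"})
for name, outcome in report.items():
    print(f"{name}: {outcome}")
```

Running this on the failing object would show which of the variants, if any, avoids the UnpicklingError.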
It's actually from uqfoundation's multiprocess Pool.imap_unordered adding a work unit to the queue. But I think I found the problem.
I had added some code quite a while ago to work around dill issue #332. It is apparently no longer needed and is now causing this issue. I commented that code out while working on a minimal example, and that resolved things.
dill version 0.3.7
centos stream
python 3.10.4
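For reference, this error message can be reproduced without dill. In the dump above, the value stored under `output_path` begins with `N)\x81`, that is, `None`, an empty tuple, and a `NEWOBJ` opcode, so a class appears to have been written as `None`. The sketch below uses a hand-written stream to show the same failure. It is only an illustration, assuming CPython's default C unpickler, and it is not the workaround code itself, which is not shown in this thread.

```python
import pickle
import pickletools

# Hand-written pickle stream (illustrative): NONE, EMPTY_TUPLE, NEWOBJ, STOP.
# NEWOBJ expects a class beneath the argument tuple on the stack,
# but here it finds None instead.
stream = b"N)\x81."

# List the opcode names without executing the stream.
opcodes = [op.name for op, arg, pos in pickletools.genops(stream)]
print(opcodes)

try:
    pickle.loads(stream)
    error = None
except pickle.UnpicklingError as exc:
    error = str(exc)

print(error)
```

Anything that causes the pickler to emit `None` where a class reference belongs would produce a stream that dumps without complaint and only fails on load, which matches the behavior reported here.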