UnpicklingError using dill but not stdlib pickle #639

Closed
xzy3 opened this issue Jan 5, 2024 · 2 comments

@xzy3

xzy3 commented Jan 5, 2024

dill version 0.3.7
CentOS Stream
Python 3.10.4

I've run into a case where the standard library pickle round-trips an object successfully, but dill raises an UnpicklingError when loading its own output.

In [7]:  dill.loads(dill.dumps(iterable[0]))
---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
File <ipython-input-7-fc7606c36bd3>:1
----> 1 dill.loads(dill.dumps(iterable[0]))

File ~/.local/virtualenvs/stantz-2022-update/lib/python3.10/site-packages/dill/_dill.py:301, in loads(str, ignore, **kwds)
    290 """
    291 Unpickle an object from a string.
    292
   (...)
    298 Default values for keyword arguments can be set in :mod:`dill.settings`.
    299 """
    300 file = StringIO(str)
--> 301 return load(file, ignore, **kwds)

File ~/.local/virtualenvs/stantz-2022-update/lib/python3.10/site-packages/dill/_dill.py:287, in load(file, ignore, **kwds)
    281 def load(file, ignore=None, **kwds):
    282     """
    283     Unpickle an object from a file.
    284
    285     See :func:`loads` for keyword arguments.
    286     """
--> 287     return Unpickler(file, ignore=ignore, **kwds).load()

File ~/.local/virtualenvs/stantz-2022-update/lib/python3.10/site-packages/dill/_dill.py:442, in Unpickler.load(self)
    441 def load(self): #NOTE: if settings change, need to update attributes
--> 442     obj = StockUnpickler.load(self)
    443     if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
    444         if not self._ignore:
    445             # point obj class to main

UnpicklingError: NEWOBJ class argument must be a type, not NoneType

In [8]:  import pickle

In [9]: pickle.loads(pickle.dumps(iterable[0]))
Out[9]: <cdcdvh.ghost.preprocess.Preprocess |CasperStrategy| out: |preprocess/GHOST_EP10.gh5|>

Here is the data serialized by dill:

b'\x80\x04\x95\xbf\x10\x00\x00\x00\x00\x00\x00\x8c\x17cdcdvh.ghost.preprocess\x94\x8c\nPreprocess\x94\x93\x94)\x81\x94}\x94(\x8c\x08seq_file\x94\x8c\x1acdcdvh.ghost.util.inputset\x94\x8c\x0cPairedEndSet\x94\x93\x94)\x81\x94}\x94(
\x8c\x05_open\x94\x8c\x17cdcdvh.ghost.util.files\x94\x8c\x0fopen_compressed\x94\x93\x94\x8c\x06format\x94\x8c\x05fastq\x94\x8c\x07r1_file\x94\x8ck/scicomp/groups-pure/OID/NCHHSTP/DVH/testdata/TrainingDataset_A/rawfiles/GHOST_EP10_S1_L001
_R1_001.fastq.gz\x94\x8c\x07r2_file\x94\x8ck/scicomp/groups-pure/OID/NCHHSTP/DVH/testdata/TrainingDataset_A/rawfiles/GHOST_EP10_S1_L001_R2_001.fastq.gz\x94ub\x8c\x0boutput_path\x94N)\x81\x94}\x94(\x8c\x04path\x94\x8c\x19preprocess/GHOST_
EP10.gh5\x94\x8c\x06kwargs\x94}\x94\x8c\x0bsample_name\x94\x8c\nGHOST_EP10\x94s\x8c\x04mode\x94\x8c\x01w\x94ub\x8c\x05clean\x94\x8c\x19cdcdvh.ghost.clean.casper\x94\x8c\x0eCasperStrategy\x94\x93\x94)\x81\x94}\x94(\x8c\x14min_major_propor
tion\x94N\x8c\x05steps\x94(\x8c\x1acdcdvh.ghost.util.seqtools\x94\x8c\x0efastq_id_match\x94\x93\x94)\x81\x94h(\x8c\x18drop_ambiguous_sequences\x94\x93\x94)\x81\x94}\x94(\x8c\x07maximum\x94G?\xef\xae\x14z\xe1G\xae\x8c\x06reason\x94\x8c\x1
2more than 0.99% Ns\x94ubh(\x8c\x0bphix_filter\x94\x93\x94)\x81\x94}\x94\x8c\x07ref_dir\x94\x8c_/scicomp/home-pure/xzy3/.cache/ghost_reference_db/compiled-refs/phix174-ref-gj0drr_e-BWA/bwa-db\x94sbh(\x8c\x14short_product_filter\x94\x93\x
94)\x81\x94}\x94(\x8c\rforward_regex\x94\x8c\x0cregex._regex\x94\x8c\x07compile\x94\x93\x94(\x8c+(GGATATGATGATGAACTGGT){s<=2,i<=1,d<=1,e<=3}\x94M 0C-,\x1e\x01\x01\x01\x1b\x00\x00\x01\x00\x01\x00\x02\x00\x03\x01\x01\x01\x03J\x04\x14GGATAT
GATGATGAACTGGT\x14\x14\x01\x94}\x94}\x94}\x94]\x94K\x00)K\x00K\x01t\x94R\x94\x8c\rreverse_regex\x94h@(\x8c-(ATGTGCCAGCTGCCGTTGGTGT){s<=2,i<=1,d<=1,e<=3}\x94M 0C/.\x1e\x01\x01\x01\x1b\x00\x00\x01\x00\x01\x00\x02\x00\x03\x01\x01\x01\x03J\x04\x16ATGTGCCAGCTGCCGTTGGTGT\x14\x14\x01\x94}\x94}\x94}\x94]\x94K\x00)K\x00K\x01t\x94R\x94\x8c\x08min_size\x94G@g\x19\x99\x99\x99\x99\x99ubh(\x8c\x16remove_short_sequences\x94\x93\x94)\x81\x94}\x94(hRG@g\x19\x99\x99\x99\x99\x99h1\x8c+se$uence is shorter than 184.79999999999998\x94ubh(\x8c\x13mid_distance_filter\x94\x93\x94)\x81\x94}\x94(\x8c\x08mid_list\x94]\x94(\x8c\nACGAGTGCGT\x94\x8c\nACGCTCGACA\x94\x8c\nAGACGCACTC\x94\x8c\nAGCACTGTAG\x94\x8c\nATCAGACACG\x94\x8c\nAT$TCGCGAG\x94\x8c\nCGTGTCTCTA\x94\x8c\nCTCGCGTGTC\x94\x8c\nTAGTATCAGC\x94\x8c\nTCTCTATGCG\x94\x8c\nTGATACGTCT\x94\x8c\nTACTGAGCTA\x94\x8c\nCATAGTAGTG\x94\x8c\nCGAGAGATAC\x94\x8c\nATACGACGTA\x94\x8c\nTCACGTACTA\x94\x8c\nCGTCTAGTAC\x94\x8c\$TCTACGTAGC\x94\x8c\nTGTACTACTC\x94\x8c\nACGACTACAG\x94\x8c\nCGTAGACTAG\x94\x8c\nTACGAGTATG\x94\x8c\nTACTCTCGTG\x94\x8c\nTAGAGACGAG\x94\x8c\nTCGTCGCTCG\x94\x8c\nACATACGCGT\x94\x8c\nACGCGAGTAT\x94\x8c\nACTACTATGT\x94\x8c\nACTGTACAGT\x94\x$c\nAGACTATACT\x94\x8c\nAGCGTCGTCT\x94\x8c\nAGTACGCTAT\x94\x8c\nATAGAGTACT\x94\x8c\nCACGCTACGT\x94\x8c\nCAGTAGACGT\x94\x8c\nCGACGTGACT\x94\x8c\nTACACACACT\x94\x8c\nTACACGTGAT\x94\x8c\nTACAGATCGT\x94\x8c\nTACGCTGTCT\x94\x8c\nTAGTGTAGAT\x9$\x8c\nTCGATCACGT\x94\x8c\nTCGCACTAGT\x94\x8c\nTCTAGCGACT\x94\x8c\nTCTATACTAT\x94\x8c\nTGACGTATGT\x94\x8c\nTGTGAGTAGT\x94\x8c\nACAGTATATA\x94\x8c\nACGCGATCGA\x94\x8c\nACTAGCAGTA\x94\x8c\nAGCTCACGTA\x94\x8c\nAGTATACATA\x94\x8c\nAGTCGAGAGA$x94\x8c\nAGTGCTACGA\x94\x8c\nCGATCGTATA\x94\x8c\nCGCAGTACGA\x94\x8c\nCGCGTATACA\x94\x8c\nCGTACAGTCA\x94\x8c\nCGTACTCAGA\x94\x8c\nCTACGCTCTA\x94\x8c\nCTATAGCGTA\x94\x8c\nTACGTCATCA\x94\x8c\nTAGTCGCATA\x94\x8c\nTATATATACA\x94\x8c\nTATGCTA$TA\x94\x8c\nTCACGCGAGA\x94\x8c\nTCGATAGTGA\x94\x8c\nTCGCTGCGTA\x94\x8c\nTCTGACGTCA\x94\x8c\nTGAGTCAGTA\x
94\x8c\nTGTAGTGTGA\x94\x8c\nTGTCACACGA\x94\x8c\nTGTCGTCGCA\x94\x8c\nACACATACGC\x94\x8c\nACAGTCGTGC\x94\x8c\nACATGACGAC\x94\x8c\nACGA$AGCTC\x94\x8c\nACGTCTCATC\x94\x8c\nACTCATCTAC\x94\x8c\nACTCGCGCAC\x94\x8c\nAGAGCGTCAC\x94\x8c\nAGCGACTAGC\x94\x8c\nAGTAGTGATC\x94\x8c\nAGTGACACAC\x94\x8c\nAGTGTATGTC\x94\x8c\nATAGATAGAC\x94\x8c\nATATAGTCGC\x94\x8c\nATCTACTGAC\x94\x8c\nC$CGTAGATC\x94\x8c\nCACGTGTCGC\x94\x8c\nCATACTCTAC\x94\x8c\nCGACACTATC\x94\x8c\nCGAGACGCGC\x94\x8c\nCGTATGCGAC\x94\x8c\nCGTCGATCTC\x94\x8c\nCTACGACTGC\x94\x8c\nCTAGTCACTC\x94\x8c\nCTCTACGCTC\x94\x8c\nCTGTACATAC\x94\x8c\nTAGACTGCAC\x94\x8c$nTAGCGCGCGC\x94\x8c\nTAGCTCTATC\x94\x8c\nTATAGACATC\x94\x8c\nTATGATACGC\x94\x8c\nTCACTCATAC\x94\x8c\nTCATCGAGTC\x94\x8c\nTCGAGCTCTC\x94\x8c\nTCGCAGACAC\x94\x8c\nTCTGTCTCGC\x94\x8c\nTGAGTGACGC\x94\x8c\nTGATGTGTAC\x94\x8c\nTGCTATAGAC\x94\$8c\nTGCTCGCTAC\x94\x8c\nACGTGCAGCG\x94\x8c\nACTCACAGAG\x94\x8c\nAGACTCAGCG\x94\x8c\nAGAGAGTGTG\x94\x8c\nAGCTATCGCG\x94\x8c\nAGTCTGACTG\x94\x8c\nAGTGAGCTCG\x94\x8c\nATAGCTCTCG\x94\x8c\nATCACGTGCG\x94\x8c\nATCGTAGCAG\x94\x8c\nATCGTCTGTG\x$4\x8c\nATGTACGATG\x94\x8c\nATGTGTCTAG\x94\x8c\nCACACGATAG\x94\x8c\nCACTCGCACG\x94\x8c\nCAGACGTCTG\x94\x8c\nCAGTACTGCG\x94\x8c\nCGACAGCGAG\x94\x8c\nCGATCTGTCG\x94\x8c\nCGCGTGCTAG\x94\x8c\nCGCTCGAGTG\x94\x8c\nCGTGATGACG\x94\x8c\nCTATGTACA$\x94\x8c\nCTCGATATAG\x94\x8c\nCTCGCACGCG\x94\x8c\nCTGCGTCACG\x94\x8c\nCTGTGCGTCG\x94\x8c\nTAGCATACTG\x94\x8c\nTATACATGTG\x94\x8c\nTATCACTCAG\x94\x8c\nTATCTGATAG\x94\x8c\nTCGTGACATG\x94\x8c\nTCTGATCGAG\x94\x8c\nTGACATCTCG\x94\x8c\nTGAGCT$GAG\x94\x8c\nTGATAGAGCG\x94\x8c\nTGCGTGTGCG\x94\x8c\nTGCTAGTCAG\x94\x8c\nTGTATCACAG\x94\x8c\nTGTGCGCGTG\x94e\x8c\x06metric\x94\x8c\x1acdcdvh.pyseqdist.cDistance\x94\x8c\redit_distance\x94\x93\x94\x8c\x07mid_len\x94K\n\x8c\x08max_dist\x9$K\x00\x8c\x12disable_mid_filter\x94\x89ubh(\x8c\x0fmost_common_mid\x94\x93\x94)\x81\x94}\x94(\x8c\x17contamination_threshold\x94G?\xd0\x00\x00\x00\x00\x00\x00h\xfd\x89ubh(\x8c\x11reservoir_sampler\x94\x93\x94
)\x81\x94}\x94(\x8c\x04size\$94M NhRM\x88\x13ubh(\x8c\x12canonify_read_pair\x94\x93\x94)\x81\x94}\x94(h=hHhIhQ\x8c\x0camplicon_len\x94M\x08\x01ubh!\x8c\x06Casper\x94\x93\x94)\x81\x94}\x94(h1\x8c#casper too much mismatch in overlap\x94\x8c\x0equal_threshold\x94K\x0f$x8c\x11kmer_neighborhood\x94K\x08\x8c\x08kmer_len\x94K\x11\x8c\x12mismatch_threshold\x94G?\xa9\x99\x99\x99\x99\x99\x9a\x8c\x13minimum_overlap_len\x94K\n\x8c\x10max_assembly_len\x94M\x0e\x01ubh(\x8c\x19filter_nonsense_sequences\x94\x93\x$4)\x81\x94h(\x8c\x13collapse_haplotypes\x94\x93\x94h(\x8c\x08genotype\x94\x93\x94)\x81\x94}\x94j\x1d\x01\x00\x00\x8c#cdcdvh.ghost.genotyping.blasttyping\x94\x8c\nBlastTyper\x94\x93\x94)\x81\x94}\x94(\x8c\nblast_args\x94]\x94(\x8c\x03-db$x94\x8cq/scicomp/home-pure/xzy3/.cache/ghost_reference_db/compiled-refs/ghost-hcv-genotyping-5pasbb4g-GENOTYPING/blast-db\x94e\x8c\x11reference_version\x94\x8c(02b66e58e1e2586830018776c43c172b688e9514\x94\x8c\x13unmatched_threshold\x94J$\xff\xff\xffubsbt\x94h\x1a}\x94ub\x8c\x12alignment_strategy\x94\x8c\x11profile-and-align\x94h\x1a}\x94ub.'
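
A note for reading the dump: the bytes right after `output_path` are `N)\x81`, which are the pickle opcodes NONE, EMPTY_TUPLE, NEWOBJ. In other words, the stream asks the unpickler to instantiate `None` as if it were a class, which matches the error message. The sketch below is a hand-built four-opcode stream (it is not produced by dill and is only an illustration) that triggers the same error using the standard library alone, assuming CPython's C unpickler.

```python
import pickle
import pickletools

# Hand-built protocol-4 stream: PROTO 4, NONE, EMPTY_TUPLE, NEWOBJ, STOP.
# It mirrors the `N)\x81` sequence that follows `output_path` in the dump.
bad_stream = b"\x80\x04N)\x81."

# List the opcode names without executing the stream.
opcodes = [op.name for op, arg, pos in pickletools.genops(bad_stream)]
print(opcodes)

error_message = None
try:
    pickle.loads(bad_stream)
except pickle.UnpicklingError as exc:
    # NEWOBJ found None where a class was expected.
    error_message = str(exc)
print(error_message)
```

Running `pickletools.dis` on the full dump should show the same opcode pattern at the `output_path` entry, which points at whatever reducer wrote `None` in place of that attribute's class.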
@mmckerns
Member

mmckerns commented Jan 6, 2024

Can you post code that reproduces the error you are seeing?
I tried a few guesses at what iterable is, and the code works as expected.

Python 3.10.13 (main, Aug 25 2023, 02:21:32) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> iterable = [0,1,2,3,4,5]
>>> dill.loads(dill.dumps(iterable[0]))
0
>>> iterable = 'GATTACA'
>>> dill.loads(dill.dumps(iterable[0]))
'G'

I'm going to assume what you are experiencing is a case where pickle is serializing something in iterable by reference, while dill is storing the same object's contents. A minimal example to reproduce the error you are seeing would enable me to test it out and potentially do something.
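
As a minimal illustration of what "by reference" means here, using only the standard library: when stdlib pickle serializes a class, the stream holds just the module name and the qualified name, and the class is looked up again at load time. `OrderedDict` is only a stand-in for this sketch and is unrelated to the objects in the report.

```python
import pickle
import pickletools
from collections import OrderedDict

# Pickle the class object itself; stdlib pickle stores it by reference.
payload = pickle.dumps(OrderedDict, protocol=4)

ops = list(pickletools.genops(payload))
opcode_names = [op.name for op, arg, pos in ops]
# String arguments in the stream: the module name and the qualified name.
names = [arg for op, arg, pos in ops if op.name == "SHORT_BINUNICODE"]

print(opcode_names)
print(names)

# Loading performs an import-and-lookup, so the very same class comes back.
restored = pickle.loads(payload)
```

A serializer that stores the object's contents instead has to rebuild it from its parts, which is where the two libraries can diverge.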

Can you also try running with dill.settings['byref'] = True, and alternately, with dill.settings['recurse'] = True?

@xzy3
Author

xzy3 commented Jan 8, 2024

It's actually from uqfoundation's multiprocess Pool.imap_unordered adding a work unit to the queue. But I think I found the problem.

I had added some code quite a while ago to work around dill issue #332. It is apparently no longer needed and is now causing this new issue. I commented that code out while working on a minimal example, and that resolved things.
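
For anyone else narrowing a similar failure down to a minimal example, one rough approach is to round-trip each attribute of the object separately and see which one fails. The helper name `find_unpicklable_attrs` and the `Job` class are hypothetical and written only for this sketch. It defaults to stdlib pickle, and the assumption is that `dill.dumps` and `dill.loads` could be passed in instead.

```python
import pickle


def find_unpicklable_attrs(obj, dumps=pickle.dumps, loads=pickle.loads):
    """Return {attribute name: exception} for attributes that fail a round trip."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            loads(dumps(value))
        except Exception as exc:  # error types vary, so catch broadly
            failures[name] = exc
    return failures


class Job:
    """Hypothetical stand-in for a work unit placed on a pool queue."""

    def __init__(self):
        self.path = "preprocess/GHOST_EP10.gh5"
        self.callback = lambda x: x  # stdlib pickle cannot serialize a lambda


failures = find_unpicklable_attrs(Job())
print(sorted(failures))
```

This only checks top-level attributes; for nested objects the helper would need to be applied again to the attribute that fails.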

xzy3 closed this as completed Jan 8, 2024
mmckerns added this to the dill-0.3.8 milestone Jan 14, 2024