Error with gzipped FASTQ generated by bcl2fastq #3

Closed
vaofford opened this issue Oct 26, 2021 · 3 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@vaofford
Contributor

vaofford commented Oct 26, 2021

Command:

pycroquet single-guide -g library.tsv -q oe_g2_g6_1_S1_R1_001.fastq.gz -s Sample_1 -o Sample1.pycroquet.tsv

Compressed FASTQ:

pycroquet_fastq_for_keiran/oe_g2_g6_1_S1_R1_001.fastq.gz

Error:

INFO: Number of duplicate guides: 0
INFO: Total unique guides: 120
INFO: uncompressed data (assume fastq)
Traceback (most recent call last):
  File "/opt/wsi-t78/venv/bin/pycroquet", line 33, in <module>
    sys.exit(load_entry_point('pycroquet==1.2.0', 'console_scripts', 'pycroquet')())
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/cli.py", line 216, in wrapper
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/cli.py", line 129, in wrapper
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/cli.py", line 236, in wrapper
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/cli.py", line 265, in single_guide
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/singleguide.py", line 63, in run
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/main.py", line 143, in process_reads
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/readparser.py", line 137, in parse_reads
  File "/opt/wsi-t78/venv/lib/python3.9/site-packages/pycroquet-1.2.0-py3.9.egg/pycroquet/readparser.py", line 269, in parse_fastq
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

It worked fine once the FASTQ was uncompressed. This was unprocessed output from bcl2fastq version 2.20.0 (/software/CASM/modules/modulefiles/bcl2fastq/2.20.0). It was run as an interactive job on farm5.

@keiranmraine
Contributor

It didn't recognise the file as compressed:

INFO: uncompressed data (assume fastq)

Can you run file pycroquet_fastq_for_keiran/oe_g2_g6_1_S1_R1_001.fastq.gz?

(I couldn't access the file at the full path you provided. I've also removed the path from your comment, as it exposed your internal username and directory structure on the HPC.)

@vaofford
Contributor Author

vaofford commented Nov 1, 2021

That gives me:

pycroquet_fastq_for_keiran/oe_g2_g6_1_S1_R1_001.fastq.gz: gzip compressed data, extra field
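
This matches the traceback: every gzip stream starts with the two magic bytes 0x1f 0x8b, so the "byte 0x8b in position 1" error is what you would expect when a gzipped file is read as plain UTF-8 text. As a rough sketch of a check that does not rely on any description string, the first two bytes can be compared directly. The names `has_gzip_magic` and `open_fastq` are hypothetical and are not part of pycroquet:

```python
import gzip

# First two bytes of every gzip member (RFC 1952). BGZF files are
# gzip-compatible, so they start with the same bytes.
GZIP_MAGIC = b"\x1f\x8b"


def has_gzip_magic(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_fastq(path):
    """Open a FASTQ file as text, decompressing it if it looks gzipped.

    Hypothetical helper for illustration only.
    """
    if has_gzip_magic(path):
        return gzip.open(path, "rt")
    return open(path, "rt")
```

With something like this, `open_fastq("oe_g2_g6_1_S1_R1_001.fastq.gz")` would be expected to return a text handle for both plain gzip and BGZF input, since Python's `gzip` module reads multi-member streams.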

@keiranmraine
Contributor

Okay, so I've emulated what 1.2.0 does on the command line and I can see why it doesn't work:

$ module load pycroquet/1.2.0
$ pycroquet-shell 
Singularity> python3
Python 3.9.7 (default, Sep  9 2021, 23:20:13) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> bool("gzip compressed data" in magic.from_file("oe_g2_g6_1_S1_R1_001.fastq.gz"))
False
>>> magic.from_file("oe_g2_g6_1_S1_R1_001.fastq.gz")
'Blocked GNU Zip Format (BGZF; gzip compatible), block length 15484'

With an older Python/magic library, however, I get:

$ python3
Python 3.7.4 (default, Aug 13 2019, 14:20:12) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> bool("gzip compressed data" in magic.from_file("oe_g2_g6_1_S1_R1_001.fastq.gz"))
True

It just needs an extra check to allow for "gzip compatible" in the description.
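
A minimal sketch of what such a check might look like, assuming the detection works on the description string returned by `magic.from_file`. The function name `is_gzip_description` and the marker tuple are hypothetical, and this is not necessarily how the actual fix was written:

```python
# Substrings that indicate gzip-style compression in a libmagic description.
# "gzip compatible" covers the BGZF wording reported by newer libmagic builds.
GZIP_MARKERS = ("gzip compressed data", "gzip compatible")


def is_gzip_description(description):
    """Return True if a libmagic description indicates gzip-style data."""
    return any(marker in description for marker in GZIP_MARKERS)


# The two descriptions seen in this thread, plus a plain-text one.
print(is_gzip_description("gzip compressed data, extra field"))
print(is_gzip_description(
    "Blocked GNU Zip Format (BGZF; gzip compatible), block length 15484"
))
print(is_gzip_description("ASCII text"))
```

In use, the description would come from `magic.from_file(path)`, as in the sessions above. Keeping the string test separate from the libmagic call makes it easy to check against the wording from different library versions.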

@keiranmraine keiranmraine added bug Something isn't working enhancement New feature or request labels Nov 1, 2021
keiranmraine added a commit that referenced this issue Nov 1, 2021
Fix #3 issue with BGZIP files not being detected as gzip style