
Can't read HydraHarpV2T3 header #35

Closed
Tomkaehst opened this issue Apr 6, 2019 · 5 comments

Tomkaehst commented Apr 6, 2019

Hello phconvert developers,

I'm trying to read a .ptu file from a PicoQuant HydraHarp2 (record type: 16843524) using load_ptu(), and I get this error in _ptu_read_tag():

~/miniconda3/lib/python3.6/site-packages/phconvert/pqreader.py in _ptu_read_tag(s, offset, tag_type_r)
    658     # Some tag types have additional data
    659     if tag['type'] == 'tyAnsiString':
--> 660         tag['data'] = s[offset: offset + tag['value']].rstrip(b'\0').decode()
    661         offset += tag['value']
    662     elif tag['type'] == 'tyFloat8Array':

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 32: invalid start byte

Using the readPTU script from PicoQuant's GitHub page, I had a similar error and resolved it by changing the encoding from utf-8 to utf-16. That did not help this time.

Does anyone know what might cause the issue?

Thanks in advance,
Tom

Tomkaehst changed the title from "Can't read HydraHarpT2V2 header" to "Can't read HydraHarpV2T3 header" on Apr 6, 2019

tritemio commented Apr 7, 2019

@Tomkaehst, thanks for the report. Please provide the example data file so I can look into it.

Tomkaehst commented Apr 7, 2019

Hi @tritemio ,

you can find an example file here: https://upload.uni-jena.de/data/5caa54291eaf20.02785906/Coumarin6_in_EtOH_2_1.ptu

In the meantime, I tried to comment out the ANSIString assignment to tag['data'] and everything now works as expected.
The decoding of the "File_Comment" tag seems to be the problem.

tritemio commented Apr 7, 2019

@Tomkaehst, right, the File_Comment tag contains this byte string:

b'LAS X 2.0.1.14392\r\n\r\nPinhole: 58.69 \xb5m\r\nObjective: HC FLUOTAR L 25.0 WATER\r\nImage Format: 512 x 512\r\nScan Speed: 100 Hz\r\nZoom: 1.4\r\nFrame Average: 100\r\nDirection: Unidirectional\r\n\r\nWLL\r\n LaserLine 488: 75.0\r\n Laser Shutter: Open\r\n\r\nLaser (WLL, WLL) On 70.0\r\nLaser (Argon, visible) Off 0.0\r\nLaser (IR, MP) On\r\nLaser (IR2, FSOPO) On\r\nMFP Filter: Substrate \r\nPolarization Filter: NF 488\r\nNotch Filter: Empty\r\nX1-Port: Mirror \r\nScan Mode: xyt\r\nZPosition: -1.60 \xb5m\r\nTime Cycle Count: 25 ; Cycle Time: 600.0 s ; Complete Time: 14916.0 s\r\nSpectral detection range\r\nSP PMT 1: 500...550nm \r\n\r\nFLIM Detector: Intern\r\nAcquisition Mode: Frame Repetition 100\r\n\x00'

This is not valid UTF-8. In fact, if you try to decode it as UTF-8, you get the error you reported for byte 0xb5 in position 32.

The byte is printed as \xb5 in the string above, and it is clearly meant to be a μ.

We can ask Python what the correct UTF-8 byte encoding for μ is:

>>> 'μ'.encode()
b'\xce\xbc'

Searching on Google, I found this:

Unicode string:
  '\xb5'
UTF8 bytestring:
  b'\xc2\xb5'

And if I try to decode this in Python:

>>> b'\xc2\xb5'.decode()
'µ'

This is the micro sign µ (U+00B5), a different code point from the Greek letter μ (U+03BC). Whether the two look different is font-dependent: it looks slanted in the notebook but not here on GitHub.
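
To check which character each byte sequence really encodes, the standard `unicodedata` module can print the code point names. This is only a small sketch, and the variable names are chosen for illustration:

```python
import unicodedata

# The single byte found in File_Comment, interpreted as latin1
micro = b'\xb5'.decode('latin1')
# The Greek letter obtained by decoding b'\xce\xbc' as UTF-8
mu = b'\xce\xbc'.decode('utf-8')

for ch in (micro, mu):
    # Print code point, official Unicode name, and UTF-8 bytes
    print(f'U+{ord(ch):04X}', unicodedata.name(ch), ch.encode('utf-8'))

# Compatibility normalization maps the micro sign onto the Greek letter
same_after_nfkc = unicodedata.normalize('NFKC', micro) == mu
print(same_after_nfkc)
```

So the two characters are distinct code points with different UTF-8 encodings, even though NFKC normalization treats the micro sign as equivalent to the Greek letter.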

Bottom line: I think PicoQuant saved a broken string here... or maybe they are not using UTF-8 but some older single-byte encoding. They are based in Germany, so let's try latin1:

>>> print(s.rstrip(b'\0').decode('latin1'))
LAS X 2.0.1.14392

Pinhole: 58.69 µm
Objective: HC FLUOTAR L 25.0 WATER
Image Format: 512 x 512
Scan Speed: 100 Hz
Zoom: 1.4
Frame Average: 100
Direction: Unidirectional

WLL
 LaserLine 488: 75.0
 Laser Shutter: Open

Laser (WLL, WLL) On 70.0
Laser (Argon, visible) Off 0.0
Laser (IR, MP) On
Laser (IR2, FSOPO) On
MFP Filter: Substrate 
Polarization Filter: NF 488
Notch Filter: Empty
X1-Port: Mirror 
Scan Mode: xyt
ZPosition: -1.60 µm
Time Cycle Count: 25 ; Cycle Time: 600.0 s ; Complete Time: 14916.0 s
Spectral detection range
SP PMT 1: 500...550nm 

FLIM Detector: Intern
Acquisition Mode: Frame Repetition 100

Bingo, string decoded.

Bottom line: PQ uses latin1 string encoding here. I don't know if they use latin1 everywhere. Unless PQ confirms that they always use, and will continue to use, latin1, I would add a try..except that first tries UTF-8 and falls back to latin1 on error.
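
A minimal sketch of that fallback, for illustration only. The helper name `decode_tag_string` is hypothetical, and this is not necessarily the code that ended up in phconvert:

```python
def decode_tag_string(raw):
    """Decode a tag's byte string, trying UTF-8 first and latin1 second.

    Hypothetical helper: the name and signature are assumptions,
    not part of the phconvert API.
    """
    raw = raw.rstrip(b'\0')  # drop the trailing NUL padding
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # latin1 maps every byte 0x00-0xFF to a code point,
        # so this second attempt cannot raise.
        return raw.decode('latin1')


# A short excerpt of the File_Comment bytes shown above
comment = b'Pinhole: 58.69 \xb5m\r\n\x00'
print(decode_tag_string(comment))
```

Trying UTF-8 first keeps correctly encoded files working, since valid multi-byte UTF-8 text would otherwise be silently garbled by a latin1 decode.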

Tomkaehst commented

Thank you very much for the quick response @tritemio !

tritemio commented Apr 9, 2019

Closed by #36

tritemio closed this as completed Apr 9, 2019