-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User group, tcspc offset and markers #46
base: 0.5.dev
Are you sure you want to change the base?
Conversation
About the interpretation of "detector ID": not all counts from a detector is a real photon, therefore it might be preferable to talk about detector pulse (or count). Regarding the /user/ group: the openendedness of the specification seems like an invitation to utter confusion. In a very extreme case, I can imagine someone dumping the raw data file as a byte array "for the record". This is of course unlikely to happen, but you get the drift. Consider replacing "he/she" by "she/they/he". |
How about rewording this section to: "Simply put a detector ID corresponds to an event that results from a pulse (real photon or background/afterpulse) at one of the detectors (pixels)." This way we don't ensure that it is a real photon, but also makes clear that if it is not, the instrument is "interpreting" it as a photon, as opposed to markers, which are most definitely not photons.
If you look at the phconvert notebooks that Antonio published, they actually use the /user group to store the header metadata. Would additional text discouraging "data dumping" be helpful? Or additional emphasis that the core interpretation of data should not require the /user group? It should more be a semi-structured place to store metadata that currently has no official place in the spec. |
Agreed with the wording, although I have reservations with introducing markers in 0.5. My bad, I did not dig into phconvert and missed Antonio's usage of an undocumented feature (I couldn't find any mention of that group in the docs or in the multispot photon-hdf5 files we released with the Methods paper for that matter). There probably should be guidelines in such a "no rule" section, but my preliminary thinking was, if anything, to impose some kind of json or similar string-based structure. If I understand what you are saying, Antonio just dumped the header of whatever file he was converting into this /user/ group (presumably as a string?). |
The way Antonio stored the metadata was not as a string, but rather in a series of separate arrays. Basically both Picoquant and B&H headers are dictionaries, and so Antonio just made fields names the same as the dictionary keys, and values equivalent to the values of the dictionary (so basically /user/picoquant or /user/becker_hickl would have a bunch of arrays, most of them single element, named after the keys in the header files). The main purpose of storing this metadata is that these headers often contain information about the offsets, powers etc used for the detectors lasers etc. None of this data is used by FRETBursts, but I think it is good to still store it as reference for an intrepid person to investigate just in case. For instance some B&H cards (like the one I'm using) can specify the TAC window in ways that phconvert still doesn't quite parse correctly (I've been studying, and still can't perfectly match what I know is the TAC window with the various fields in the B&H .set file) so sometimes I wind up with files where the nanotimes unit is wrong. Having the original metadata there is useful in case someone wants to check that the conversion processes went correctly. It shouldn't be the case, but it is a good check. |
I understand the points and I am not trying to be difficult, but just to instill the sense of caution that should be present in the mind of someone trying to develop a useful tool. You have an excitation_input_powers parameter in the setup group that would address the laser power point. My take on the other problem you are describing, is that if a photon-hdf5 file does not produce the results that the original creator is trying to achieve, the file is incorrect and should not be released by the original creator. Note that I am not against dumping header files from vendors but then my suggestion would be to call this that, for instance as a source_file_header field in the provenance group, not in a /user group open to all abuses :-). Note that since there is no guarantee vendors will stick to one format (assuming it can easily be converted for a given version), my preference would be to dump the raw header as a string, leaving it up to the motivated user to do whatever forensic analysis they wish with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are my comments; sorry for taking my time.
None of the changes I commented on are dealbreakers for me, only suggestions for (what I hope is) improvement, so counterarguments are welcome.
docs/phdata.rst
Outdated
.. _record_ids: | ||
|
||
Detector pixel IDs | ||
^^^^^^^^^^^^^^^^^^ | ||
Record IDs | ||
^^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HDF5 field names already use "detector" for all of these IDs (the /photon_data/detectors
array contains both photon and non-photon), so I wonder if it will be less confusing if the general name just remains "detector ID"; we could use "photon detector ID" and "non-photon detector ID" when only one kind is meant.
I understand it's a tradeoff between using terms that better describe the thing vs terms that better match the HDF5 fields.
(I'm also slightly uncomfortable with using "record" for this, because that word is often used for a single TTTR record (often 32 bits) -- a photon, marker(s), or counter overflow.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smXplorer @talaurence
What do you think? I have no particular attachment to the term markers or records.
So the choices would be
- Current pull request says, ie entries are called "records"
- Mark's suggestion, go back to the older terminology of detectors, but distinguish between "photon detector ID
and
non-photon detector ID" - We call these "events" so we will "event IDs" and we can refer to "photon events" and "non-photon events"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are discussing adding "markers/non-detector" flags and are wondering how to shove them into a array that was explicitly designed to store information on which detector a timestamp is coming from.
No surprise finding the proper terminology is confusing. :-)
When encountering such a problem, try to think out of the box.
My radical suggestion is to postpone introducing this to until after we have understood each other and weighted the pros and cons of adding this level of complexity (i.e. until 0.6).
But as an alternative to the above suggestion of trying to insert information of a different nature in an existing structure, I'll bring up the possibility to add an optional array of "markers" to the photon_data group(s).
If there is no markers (current files), there is no array.
If there are markers (future files), there is an array. It will mostly be filled with '0' (no marker), and those entry that are non '0' will be associated with a dummy "detector" entry (i.e. any value works as it will be ignored, but one could use the convention that 2^16-1 will be the equivalent of an integer NaN), since there is a priori no reason to have any specific detector associated with such a marker.
If I am not mistaken, HDF5 should be able to compress sparse "marker" arrays very efficiently.
So to answer your question: if there is no marker introduction in this version (0.5), then the question becomes moot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a huge problem with "detector" not necessarily meaning "photon detector" (hence my suggestion above). It can be an (electrical pulse) edge detector or whatever. All we need is to explicitly define the term "detector" to be more general (as @harripd has already done under different terminology). IMHO "detector" is actually a pretty good term to mean "something that produces timestamps (records) for observed events that are otherwise indistinguishable".
My concern, to reiterate, was with letting the terminology diverge between the HDF5 names and the spec, which makes it harder to keep the definitions straight in one's head.
(Using code-font for HDF5 fields and quotes for spec terms) if the thing is called detector
in the HDF5, you need to think in your head that detector
can be either photon or non-photon anyway. I fear that adding to this situation a different name "record" for the same concept as detector
, and especially the same name "detector (ID)" for a different concept (a detector
that is not a "marker"), is going to increase the risk of people misinterpreting the spec.
I don't think using a separate markers
array is going to win us much, I'm afraid. It is overall more complicated, it still requires the detectors
array to contain a value that photon-only readers need to skip, and (at least as proposed) it imposes the irregular requirement that ID 0 cannot be a marker, which will require special-casing in all handling code (or bugs if not done right).
Regarding using "event" instead of "record" -- I think I have the same problem as with "record": both words suggest to me a single (photon or marker) event rather than a channel of input data.
("Event" and "record" would both be good words for talking about a single entry across the timestamps
and detectors
(and nanotimes
) arrays, when not distinguishing between photons and non-photons. The term "record" could also be applied to other places where we have parallel arrays of the same length that act as the columns of a table (such as /setup/detectors
): a record would denote a row in the conceptual table. That's clearly a concept in this spec but we don't have a word for it currently. I'd suggest saving these words for improving the readability of the spec in the future.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding a separate array: being sparse and for instance using 0 as the "no marker" value, it makes it extremely easy and fast to slice the actual photon_data array (and its associated detectors array, if any).
The MAIN advantage I see in it, is that it does not confuse a reader that is not aware of those additional markers (as I mentioned, since the "detector" of a "marker" is irrelevant, it can be anything : no need for a special value as I was musing about).
Yes, you will add "dummy" photons in your processed data, but it probably won't matter if there are very few of them.
This is kind of the essence of a backward compatible format.
You can ignore what you don't know about and be fine.
With a "detectors" array containing values that are not detector IDs, but pass as such (for an unsuspecting code), a software trying to interpret this will hallucinate detectors which will eventually contain very few time stamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a huge problem with "detector" not necessarily meaning "photon detector" (hence my suggestion above). It can be an (electrical pulse) edge detector or whatever. ... IMHO "detector" is actually a pretty good term to mean "something that produces timestamps (records) for observed events that are otherwise indistinguishable".
This is a really good point, the relevant fields are all called detector ...
in the HDF5 file, I'll update to keep the term detector
as the "universal" term, and we can then distinguish between photon and non-photon detectors.
The MAIN advantage I see in it, is that it does not confuse a reader that is not aware of those additional markers (as I mentioned, since the "detector" of a "marker" is irrelevant, it can be anything : no need for a special value as I was musing about).
I don't see a separate array as providing any real advantage, because you need to make some correlation between the indexes of the markers array and the detectors
and timestamps
arrays, so either you introduce 2 arrays, a markers_id
and a markers_timestamp
into the data, where now they will likely be very small, or you make markers
the same size as detectors
and you still have markers in the detectors
array, defeating the whole point of keeping backwards compatibility.
I also checked over the FRETBursts code, it already looks specifically at the detectors_specs/spectral_ch1/2
fields to identify donor/acceptor photons, and detector ids not in those fields will be ignored if unintentionally. I agree we want to keep things backwards compatible, but if you don't have any markers a v0.5 file will be readable by all current readers anyways.
In the end there is never full backwards compatibility, it winds up being a pipe dream. I think there is value in maintaining backwards compatibility, but a new feature will always create some level of a break in backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just as a datapoint, markers are not always vastly less frequent than photons in a FLIM dataset. In fact, pixel markers (if recorded) can be more frequent than photons depending on specimen and conditions. Not that I'm primarily worried about performance.
Regarding backward compatibility: readers that interpret the photon detector IDs based on the existing Photon-HDF5 fields (such as spectral_ch1
, split_ch1
) will (hopefully) not be bothered by extra (non-photon) detector IDs. Readers that only handle a single detector ID are also safe in principle. There may be edge cases (such as readers that don't check their assumptions), but I don't think the compatibility problem is that severe. Admittedly a reader that does not fully interpret the data but somehow still assumes photons (say, code that computes simple statistics) will not be able to distinguish between photon and non-photon channels.
docs/phdata.rst
Outdated
- **markers1** | ||
- **markers2** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding the names.
-
Might it be more uniform to use the
_ch1
,_ch2
suffixes to indicate that that these contain the same types of ids asspectral_ch1
,split_ch1
, etc? (Counterargument would be that "channel" is to be used only for detectors that observe something split from a common photon source. If so, perhaps this should be defined explicitly.) -
Having thought about this a bit more, if the intent is to use this as a mechanism to segregate out all timestamps that are not photons (including things like a laser SYNC or raw (pre-CFD) photon falling/rising edges recorded by Swabian Time Tagger), it might be better to use a more direct name such as
non_photon_ch1
,non_photon_ch2
, etc., because "marker" has a fairly specific connotation (at least to those familiar with PQ and BH data). We probably don't want to keep adding categories that a photon-only reader needs to skip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ch1
_ch1
I have no strong thoughts one way or another, I've though the same as you that having_ch1
_ch2
serves as a way to set apart enumerations of IDs, but it also suggests they are photons- I think I agree with your thoughts here, I would agree
non_photon
is a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been giving the naming some more thought, and the reality is there are at least two distinct categories of non-photon detectors, those that correspond to some form of detected event, ie a sync signal, a Swabian rising/falling edge tag etc. and those that are produced internally by the software, ie markers indicating the advance of a pixel or a line.
I'm looking to 0.6 when we introduce a FLIM standard, how will we define which non_photon/marker
channels correspond to which marker (pixel/line advance). The /setup/scan/
group could contain a mapping, but it would be less intuitive for a person reading the file. Another strategy might be to say that ```detectors_specs/non_photonXmeans a detector with a purpose defined in the
/user/`` group, and in 0.6 we will introduce specific fields ``detectors_specs/scan_chX`` and maybe ``detectors_specs/sync_chX`` to more concretely define functions of specific categories of non-photon events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like your latter strategy is what I was envisioning (same in spirit as #45). That is, we specify detector IDs for known/standardized roles; not the other way around.
This follows how the photon detector IDs are specified. It also allows for nonstandard detector IDs (photon or non-photon) whose role is only documented in /user
(and probably not recommended for final publication of data, but useful in early stages of a project). General-purpose readers should not interpret such detector IDs.
(I would note that frame/line/pixel markers (or a subset of those) are usually recorded by hardware, although software could, say, generate pixel markers from line markers if it wants to.)
docs/phdata.rst
Outdated
All previous fields are arrays containing one or more :ref:`record IDs<record_ids>` | ||
(Detector IDs for all ``spectral_chX``, ``polarization_ch1`` and ``split_chX``, | ||
and :ref:`marker IDs <marker_ids>` for ``markersX``). | ||
For example, a 2-color smFRET measurement without polarization or split channels | ||
(2 detectors) will have only one value in ``spectral_ch1`` (donor) and one value in | ||
``spectral_ch2`` (acceptor). A 2-color smFRET measurement with polarization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this part applies to the markers (non-photons), so perhaps this paragraph should come before the markers (and not mention markers).
(And perhaps a single field, marker_ids
(or non_photon_ids
) is sufficient, because we do not anticipate these forming an axis in an N-dimensional space in the way that the spectral/pol/split ones do.)
If the acquisition software also records events such as sync signals or markers indicating | ||
a change in position of a piezo or galvo scanner, these events should also be assigned a unique | ||
pixel ID. These events are all considered markers. To distinguish between detector (i.e. real photons) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might I suggest "a unique record ID" (or whatever we decide on if my other comment gets any traction) and not "pixel ID" here, because the use of "pixel" to mean a detector element in this raster-scan-related context is potentially confusing?
@smXplorer @marktsuchida
I encourage everyone to go over this carefully, |
@harripd and @smXplorer, Regarding the notes (and points raised in email) about backward compatibility: it might help to define what exactly we mean by backward compatibility (even if in an unofficial way to begin with). There are broadly two possible policies:
There could also be a mix of the two policies, such as readers rejecting files if the major version number is newer than known (I think this might be reasonable). Also, validators (as opposed to mere readers) will necessarily have to reject files of a newer version, or at least warn that they are not performing a full validation. Without at least some stated policy, I suspect it will be hard to agree on what constitutes backward compatibility. For the specific case of adding the non-photon IDs, I do not think that the addition is backward incompatible, even under the second interpretation above. Prior to the addition of non-photon IDs, data from each detector ID was already only interpretable if somewhere it is specified what the detectors are (such as with With the addition of the non-photon designation, old (0.4, or earlier 0.5dev) readers should still be able to read files containing non-photon channels, as long as they ignore the newly added fields. I think @harripd made a similar point specifically for FRETBursts in an email. I do not have any issue with discouraging the use of non-photon channels in general, though. It remains more a feature for internal intermediate datastores than for final ("publication-quality") sharing, at least until we standardize ways to attach semantics (such as related to raster scanning) to these channels. |
@marktsuchida I agree with your main points. Most importantly, the term backwards compatible was not precisely defined, and we need a concrete definition. Upon giving it some thought, I think was we often call "backwards compatibility" is actually "forwards compatibility" so let me define them bellow: Backward compatibility means that you can read and interpret files of previous versions using newer versions of the software. This is fairly easy, and more closely aligns with Mark's first principle Forward compatibility means that you can read and interpret files of newer versions using older versions of the software. This is much more difficult. In fact it is impossible, as any new addition will add something that, unless it is entirely redundant (and thus what is the point), previous versions of the software will not know about and thus be unable to implement. Therefore forward compatibility inevitably means that new features are implemented in a "minimally breaking" way, in other words, as long as a particular file written in a new version doesn't require a new feature, it will be readable by older versions of the software. This is basically Mark's second definition. Ideally we strive for both, insisting on backward compatibility, and implementing forward compatibility as much as possible. Bellow I outline some more specific principles for how we can achieve this inside of photon-HDF5:
I also like Mark's point about major and minor versions. I suggest we follow how many software packages, including Python and Numpy, as well as file formats handle their versions: major versions (the number before the . ) can introduce changes that break backwards compatibility (although trying to minimize this) while minor versions should only add new features, ie they must be fully backwards compatible, and as forwards compatible as possible. Regarding non-photon IDs, as written they fully follow the above principles, and discouraging their inclusion further makes clear that they are internal, and intermediate, preparing for when we standardize more specific types of non-photon IDs like markers and sync signals etc. Regarding Marks point that they do not break either forwards or backwards compatibility, it is a bit of a gray area. |
Summary of changes
This PR introduces a small number of actual additions to the format, while also clarifying a number of points without changing the actual format.
Additions to the format
/photon_data/measurement_specs/markersX
groups (X
being a positive integer). These denote ids in the/photon_data/detectors
arrays that are assigned not to photons but any arbitrary "event" marker. The meaning of markers is not defined anywhere in the spec, but should be specified by the file creator inside the user-defined/user/
group (see point number 3). As no additional support for markers is specified, non-custom software should ignore photons designated as markers. (Note future versions may provide for defining markers for certain types of FLIM etc., but these are in future versions, not 0.5)/user/
group- this is not a change per-se, but still a major clarification. The/user/
group is now on official part of the documentation, and is specifically designated as being open for the file creator (the user) to define as he/she wishes. Although there are some suggested ways of using it:/user/[vendor]
group to store raw metadata loaded from the header of the file, this ensures preservation of data/user/experimental_settings
to store various experimentally determined values like FRET correction factors etc. as well as an explanation of any marker ids in the dataClarifications
measurement_type
adding emphasis that "generic" is now the preferred type and while we may still add a new types , there will need to be a very compelling reason to do so./photon_data measurement_specs
to hopefully make the spec more clear/human readable.num_pixel
andsetup/detectors/id
)/
/
/