-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement ORSO filewriter #27
Conversation
611bf76
to
3099be6
Compare
3099be6
to
d901722
Compare
events = dg['instrument']['multiblade_detector']['data'].copy(deep=False) | ||
events.bins.coords['tof'] = events.bins.coords.pop('event_time_offset') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the copy
is meant to prevent in-place modification, note that is does not affect the bin content, i.e., the line second line modifies the buffer of the original. See scipp/scipp#2773.
src/essreflectometry/amor/orso.py
Outdated
# TODO populate timestamp | ||
# doesn't work with a local file because we need the timestamp of the original, | ||
# SciCat can provide that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference to the experiment time above? Isn't the time in NXentry actually a measurement time, not an experiment time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference to the experiment time above?
There is only one experiment time (sample run) but we have multiple files that each have a creation time.
Isn't the time in NXentry actually a measurement time, not an experiment time?
What is the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know how they define it, but an experiment could consist of multiple measurements (e.g., many samples).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we need to extend how we handle this here. E.g., use the max
of all times in sample files.
We can change this here to use the time recorded in the NXentry instead of the creation time of the file. These are not the same but, as long as there is only one entry, they are essentially equivalent.
src/essreflectometry/amor/orso.py
Outdated
"""Parse ORSO sample data from raw Amor NeXus data.""" | ||
if not raw_data.get('sample'): | ||
return OrsoSample(data_source.Sample.empty()) | ||
raise NotImplementedError('Amor NsXus sample parsing is not implemented') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplementedError('Amor NsXus sample parsing is not implemented') | |
raise NotImplementedError('Amor NeXus sample parsing is not implemented') |
orso.data_source.measurement.scheme = 'angle- and energy-dispersive' | ||
orso.reduction.software = fileio.reduction.Software( | ||
'scipp-ess', __version__, platform.platform() | ||
"""ORSO utilities for Amor.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear why all or most of the below is in the Amor submodule. It looks pretty generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It parses stuff from the loaded NeXus. I don't feel confident enough to know what an ESTIA NeXus file looks like to put this into a common module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is part of the NeXus standard, so generally all those fields should exist.
src/essreflectometry/amor/orso.py
Outdated
from ..types import Filename, RawData, Reference, Run, Sample | ||
|
||
|
||
def parse_orso_experiment(raw_data: RawData[Run]) -> OrsoExperiment[Run]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are concerned about keeping alive RawData
longer than necessary, consider extracting meta data first, or loading neutron data and meta data independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider extracting meta data first,
How would that work?
loading neutron data and meta data independently.
I considered it. But that would either mean loading the whole file twice which is slow. Or writing two more complicated loaders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is quite simple to write such a loader. We can chat tomorrow.
src/essreflectometry/orso.py
Outdated
class OrsoExperiment( | ||
sciline.Scope[Run, data_source.Experiment], data_source.Experiment | ||
): | ||
"""ORSO experiment for a run.""" | ||
|
||
|
||
class OrsoInstrument( | ||
sciline.Scope[Run, data_source.InstrumentSettings], data_source.InstrumentSettings | ||
): | ||
"""ORSO instrument settings for a run.""" | ||
|
||
|
||
class OrsoOwner(sciline.Scope[Run, orso_base.Person], orso_base.Person): | ||
"""ORSO owner of a file.""" | ||
|
||
|
||
class OrsoReduction(sciline.Scope[Run, reduction.Reduction], reduction.Reduction): | ||
"""ORSO measurement for a run.""" | ||
|
||
|
||
class OrsoSample(sciline.Scope[Run, data_source.Sample], data_source.Sample): | ||
"""ORSO sample of a run.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support all this for the Reference
measurement? Or is it recording only for the sample run (i.e., can we use NewType
instead of a "generic"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on whether we want to process a file in isolation and store it as ort. Or if the rest of the pipeline needs to use the metadata. See, e.g., #27 (comment) Though that would change if we implement scipp/scippneutron#473. With this, the ORSO stuff could be more specialised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I still do not see the case for parametrizing all of the above with the run type. I feel it would be simpler and clear to just use the sample everywhere.
- The sample and reference are part of the same experiment.
- The same instrument is used, it is always a reference measurement at that instrument.
- No one cares about the owner of the reference file, I'd say. I think typically it is done by the same user during their experiment.
- There is just one reduction, not a reduction for the sample and one for the reference (or if there is, the reference run would be the sample run?).
src/essreflectometry/orso.py
Outdated
# We simply assume that the owner of the reference measurement | ||
# has no claim on this data. | ||
if (sample_experiment.facility != reference_experiment.facility) or ( | ||
sample_experiment.instrument != reference_experiment.instrument | ||
): | ||
raise ValueError( | ||
'The sample and reference experiments were done at different instruments' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like an odd place to check this. Should this be done as part of the "data" path in the task graph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. But it should still be checked here, no? Otherwise the metadata providers depend on using specific data providers.
def _ascii_unit(unit: sc.Unit) -> str: | ||
unit = str(unit) | ||
if unit == 'Å': | ||
return 'angstrom' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha no love for the Swedish A here I see... But I think it's good to use the international version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORSO requires ASCII strings for units. But nowadays, utf-8 should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I guess the big difficulty left here is how to specify what corrections were applied without inspecting the TaskGraph
. Maybe the user has to manually enter those to be able to build the Orso file?
I think it is difficult to get this 'perfect' now without any reflectometry user input. So in my opinion this can be merged as is. But I would still like to hear what you think about adding file hashes or some other identifier that is more robust than the file name.
src/essreflectometry/io.py
Outdated
qz = sc.midpoints(qz) | ||
r = sc.values(iofq.data) | ||
sr = sc.stddevs(iofq.data) | ||
if sigma_q is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need two cases here? Wouldn't it be easier to just either get sigma_Q
from the data or from an extra argument to this provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be. And I'm going to write a provider that creates an OrsoDataset which will then request all it needs.
return OrsoMeasurement( | ||
data_source.Measurement( | ||
instrument_settings=instrument, | ||
data_files=[orso_base.File(file=os.path.basename(sample_filename))], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a limitation of the Orso standard, so probably little we can do about it. But, identifying a file by the file name seems an unreliable way to achieve reproducibility.
Best way would of course be to store the file itself, second best would be to store a hash and preferably also instructions on how to access the file.
Do you think there is anything we can do about that? Or do we just follow the standard for now? I looked quickly at the orso standard spec but could not find any metadata
field where that kind of extra information could be stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ORSO is not like NeXus where you can just write whatever you want. So we are limited by the standard. The only thing we can do (in the short term) is to store any additional info in the comment field. As I understand, this is a free form text filed. But it's meant to be read by humans, not programs...
If you think this is important enough (and I think it is), consider raising it with ORSO: https://github.com/reflectivity/file_format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a temporary solution we could just append
sha256:fff4b52fd585a4ca3e9709a64da93209f32db841a07ff7a1cde12db2f8bfe0a3
(or maybe md5 but it probably doesn't matter) to the end of the comment.
I'd happily raise this with ORSO and ask if they can add an optional hash
field to the File
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue: reflectivity/file_format#15
src/essreflectometry/orso.py
Outdated
data_files=[orso_base.File(file=os.path.basename(sample_filename))], | ||
additional_files=[ | ||
orso_base.File( | ||
file=os.path.basename(reference_filename), comment='supermirror' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so you can add comment
s. Then we can add a file hash and maybe even instructions for how to access the file.
ORSO also allows specifying the machine that the software runs on and the old code used this: https://github.com/scipp/ess/blob/5372ef6e91f2ee8bf4903c3db081b949e76952fd/src/ess/amor/orso.py#L47 I omitted this here because storing a user computer's name doesn't help much and seems like exposing private information. What do you think @jokasimr ? |
Yes I noticed that as well. It seems unnecessary. |
It is now complete except for listing corrections. We still need to figure out how do do that. The approach with tags that I outlined in the issue #6 will stop working when scipp/sciline#116 is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
I forgot to add tests. Did that now |
src/essreflectometry/orso.py
Outdated
return OrsoExperiment( | ||
data_source.Experiment( | ||
title=raw_data['title'], | ||
instrument=raw_data['name'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name
appears to be in NXinstrument, not NXentry: https://manual.nexusformat.org/classes/base_classes/NXinstrument.html#nxinstrument-name-field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we do need to make this Amor-specific because the files don't follow the standard. (#27 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in our test file, the name shows up in both places
src/essreflectometry/orso.py
Outdated
data_source.Experiment( | ||
title=raw_data['title'], | ||
instrument=raw_data['name'], | ||
facility=raw_data['facility'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not in the NeXus standard. Should we hardcode it to ESS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to use get
. We should only hard code anything in an instrument-specific module. With Amor, ESS would be wrong.
# TODO populate timestamp | ||
# doesn't work with a local file because we need the timestamp of the original, | ||
# SciCat can provide that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use the NXentry/start_time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Because the timestamp is the last modification time. I.e., earliest the end time of the experiment, not the start time. And possibly later if the file was modified for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then use the end time? Modifying Raw files is not allowed, to my knowledge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no end time in our Amor file. I could add something now that gets the time if it exists and falls back to None
otherwise. But I would rather wait for a more general mechanism in scippneutron that could also handle scicat timestamps and select the most appropriate one.
src/essreflectometry/orso.py
Outdated
class OrsoExperiment( | ||
sciline.Scope[Run, data_source.Experiment], data_source.Experiment | ||
): | ||
"""ORSO experiment for a run.""" | ||
|
||
|
||
class OrsoInstrument( | ||
sciline.Scope[Run, data_source.InstrumentSettings], data_source.InstrumentSettings | ||
): | ||
"""ORSO instrument settings for a run.""" | ||
|
||
|
||
class OrsoOwner(sciline.Scope[Run, orso_base.Person], orso_base.Person): | ||
"""ORSO owner of a file.""" | ||
|
||
|
||
class OrsoReduction(sciline.Scope[Run, reduction.Reduction], reduction.Reduction): | ||
"""ORSO measurement for a run.""" | ||
|
||
|
||
class OrsoSample(sciline.Scope[Run, data_source.Sample], data_source.Sample): | ||
"""ORSO sample of a run.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I still do not see the case for parametrizing all of the above with the run type. I feel it would be simpler and clear to just use the sample everywhere.
- The sample and reference are part of the same experiment.
- The same instrument is used, it is always a reference measurement at that instrument.
- No one cares about the owner of the reference file, I'd say. I think typically it is done by the same user during their experiment.
- There is just one reduction, not a reduction for the sample and one for the reference (or if there is, the reference run would be the sample run?).
src/essreflectometry/orso.py
Outdated
sample_experiment: Optional[OrsoExperiment[Sample]], | ||
reference_experiment: Optional[OrsoExperiment[Reference]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I don't think the reference measurement is or should be an "experiment", i.e., I'd remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you use the opposite definition of 'experiment' and 'measurement' as ORSO. An experiment is one run at an instrument, i.e., sample and reference are separate experiments. And a measurement is a combination of any number of experiments.
But yes, we could ignore the reference experiment. build_orso_measurement
already only includes the sample instrument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a reference? Google found this: https://www.reflectometry.org/projects/file_formats/dictionaries/, and it seems to be consistent with "my" (or rather NeXus') definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual specification is here: https://github.com/reflectivity/file_format/blob/master/specification.md
But you are right. 'Experiment' refers to a 'series of measurements'. However, the 'measurement' key also refers to a series.
src/essreflectometry/amor/orso.py
Outdated
orso.data_source.measurement.scheme = 'angle- and energy-dispersive' | ||
orso.reduction.software = fileio.reduction.Software( | ||
'scipp-ess', __version__, platform.platform() | ||
wavelength = events_in_wavelength.coords['wavelength'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that there are wavelength coords, which may not be the case if we do not start with a TOF coords, or do not have wavelength masks. If not found, fall back to events_in_wavelength.bins.coords['wavelength']
.
src/essreflectometry/amor/orso.py
Outdated
return OrsoInstrument( | ||
orso_data_source.InstrumentSettings( | ||
wavelength=orso_base.ValueRange( | ||
min=float(wavelength.min().value), | ||
max=float(wavelength.max().value), | ||
unit=_ascii_unit(wavelength.unit), | ||
), | ||
incident_angle=orso_base.ValueRange( | ||
min=float(incident_angle.min().value), | ||
max=float(incident_angle.max().value), | ||
unit=_ascii_unit(incident_angle.unit), | ||
), | ||
polarization=None, # TODO how can we determine this from the inputs? | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part here is Amor specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Knowing which inputs to use. This depends on the specific pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, build_orso_measurement
is also specific to the pipeline because it sets the reference file as 'supermirror'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying the other ESS instruments will operate differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. Our OFFSPEC workflow normalises by direct beam: https://scipp.github.io/ess/instruments/external/offspec/offspec_reduction.html#Normalisation-of-sample-by-direct-beam Maybe there will be differences between instruments or even operating modes (collimated vs divergent).
) | ||
|
||
|
||
def build_orso_iofq_dataset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part here is Amor specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Knowing which columns to save. I guess that is not so much amor-specific but workflow-specific. Do we have an established pattern for handling this? Should the providers simply not be part of a providers
tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any reason to believe the workflows at ESS will be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how many different types of workflows we will have. Will the only thing we ever save to ORSO be I(Q)?
To prevent accidental modification in user code
d2948ed
to
a0dbe24
Compare
Fixes #6Most of #6 except tracking corrections.This is still missing saving the actual data, tracking 'corrections', and filling in the remaining 'instrument' data. But it implements the basic mechanism and this part is ready for review.
For reference, adding it to the default pipeline like this:
results in
With this graph: