Reimplement ORSO filewriter #27

jl-wynen · 2024-02-02T15:01:05Z

~~Fixes #6~~ Most of #6 except tracking corrections.

This is still missing saving the actual data, tracking 'corrections', and filling in the remaining 'instrument' data. But it implements the basic mechanism and this part is ready for review.

For reference, adding it to the default pipeline like this:

import sciline
from essreflectometry import orso as ess_orso
from essreflectometry.amor import orso as amor_orso
from essreflectometry.amor import providers, default_parameters
from essreflectometry.orso import providers as orso_providers
from essreflectometry.types import *
from orsopy import fileio

providers = (
    *providers,
    *amor_orso.providers,
    *orso_providers,
)

params = {
    **default_parameters,
    Filename[Sample]: "sample.nxs",
    Filename[Reference]: "reference.nxs",
    ess_orso.OrsoCreator: ess_orso.OrsoCreator(
        fileio.base.Person(
            name='Jan-Lukas',
            affiliation='ESS',
        )
    ),
}
pipeline = sciline.Pipeline(providers, params=params)

print(pipeline.compute((ess_orso.OrsoDataSource, ess_orso.OrsoReduction)))

results in

DataSource(
           owner=Person(name='J. Stahn', affiliation=None, contact='[email protected]'),
           experiment=Experiment(title='commissioning', instrument='AMOR', start_date=datetime.datetime(2020, 11, 25, 16, 3, 10), probe='neutron', facility='SINQ'),
           sample=Sample(name=None),
           measurement=Measurement(
                                   instrument_settings=None,
                                   data_files=[File(file='sample.nxs')],
                                   additional_files=[File(file='reference.nxs', comment='supermirror')],
                                   ),
           )
Reduction(
          software=Software(name='ess.reflectometry', version='23.11.1.dev18+gd1cfaa6.d20240124', platform='Linux'),
          timestamp=datetime.datetime(2024, 2, 2, 14, 58, 14, 919380, tzinfo=datetime.timezone.utc),
          creator=Person(name='Jan-Lukas', affiliation='ESS'),
          corrections=[],
          )

With this graph:

SimonHeybrock · 2024-02-05T05:10:43Z

src/essreflectometry/amor/load.py

+    events = dg['instrument']['multiblade_detector']['data'].copy(deep=False)
    events.bins.coords['tof'] = events.bins.coords.pop('event_time_offset')


If the copy is meant to prevent in-place modification, note that it does not affect the bin content, i.e., the second line modifies the buffer of the original. See scipp/scipp#2773.
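
The pitfall can be illustrated with a plain-Python analogy. This is only a sketch and not scipp code: the nested dictionaries and their keys are hypothetical stand-ins for a data array and its shared event buffer. A shallow copy duplicates the outer container only, so a change made to nested content through the copy is visible in the original.

```python
import copy

# Hypothetical stand-in: the outer dict plays the role of the data array,
# the nested dict plays the role of the shared event buffer.
original = {"bins": {"event_time_offset": [1.0, 2.0, 3.0]}}

shallow = copy.copy(original)  # copies the outer dict only
# Renaming the key through the shallow copy mutates the shared nested dict.
shallow["bins"]["tof"] = shallow["bins"].pop("event_time_offset")

deep = copy.deepcopy(original)  # fully independent copy
deep["bins"]["extra"] = []

shared_buffer_modified = "tof" in original["bins"]
deep_copy_isolated = "extra" not in original["bins"]
```

Whether a deep copy is affordable for event data is a separate question; the sketch only shows why the shallow copy alone gives no protection.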

SimonHeybrock · 2024-02-05T05:12:58Z

src/essreflectometry/amor/orso.py

+    # TODO populate timestamp
+    #      doesn't work with a local file because we need the timestamp of the original,
+    #      SciCat can provide that


What is the difference to the experiment time above? Isn't the time in NXentry actually a measurement time, not an experiment time?

What is the difference to the experiment time above?

There is only one experiment time (sample run) but we have multiple files that each have a creation time.

Isn't the time in NXentry actually a measurement time, not an experiment time?

What is the difference?

I do not know how they define it, but an experiment could consist of multiple measurements (e.g., many samples).

Then we need to extend how we handle this here. E.g., use the max of all times in sample files.

We can change this here to use the time recorded in the NXentry instead of the creation time of the file. These are not the same but, as long as there is only one entry, they are essentially equivalent.
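
A minimal sketch of the "max of all times" idea, assuming the start times have already been read from the sample files as ISO 8601 strings. The function name and the example values are hypothetical and not part of this PR.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_start_time(start_times: Iterable[str]) -> Optional[datetime]:
    """Return the latest of several ISO 8601 timestamps, or None if empty."""
    parsed = [datetime.fromisoformat(t) for t in start_times]
    return max(parsed, default=None)


# Hypothetical start times read from two sample files.
times = ["2020-11-25T16:03:10", "2020-11-25T18:30:00"]
latest = latest_start_time(times)
```

Note that comparing timezone-aware with naive timestamps raises a TypeError, so the inputs would need to be normalised first.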

SimonHeybrock · 2024-02-05T05:13:12Z

src/essreflectometry/amor/orso.py

+    """Parse ORSO sample data from raw Amor NeXus data."""
+    if not raw_data.get('sample'):
+        return OrsoSample(data_source.Sample.empty())
+    raise NotImplementedError('Amor NsXus sample parsing is not implemented')


Suggested change

raise NotImplementedError('Amor NsXus sample parsing is not implemented')

raise NotImplementedError('Amor NeXus sample parsing is not implemented')

SimonHeybrock · 2024-02-05T05:15:39Z

src/essreflectometry/amor/orso.py

-    orso.data_source.measurement.scheme = 'angle- and energy-dispersive'
-    orso.reduction.software = fileio.reduction.Software(
-        'scipp-ess', __version__, platform.platform()
+"""ORSO utilities for Amor."""


It is unclear why all or most of the below is in the Amor submodule. It looks pretty generic?

It parses stuff from the loaded NeXus. I don't feel confident enough to know what an ESTIA NeXus file looks like to put this into a common module.

It is part of the NeXus standard, so generally all those fields should exist.

SimonHeybrock · 2024-02-05T05:17:02Z

src/essreflectometry/amor/orso.py

+from ..types import Filename, RawData, Reference, Run, Sample
+
+
+def parse_orso_experiment(raw_data: RawData[Run]) -> OrsoExperiment[Run]:


If you are concerned about keeping alive RawData longer than necessary, consider extracting meta data first, or loading neutron data and meta data independently.

consider extracting meta data first,

How would that work?

loading neutron data and meta data independently.

I considered it. But that would mean either loading the whole file twice, which is slow, or writing two more complicated loaders.

I think it is quite simple to write such a loader. We can chat tomorrow.

SimonHeybrock · 2024-02-05T05:19:42Z

src/essreflectometry/orso.py

+class OrsoExperiment(
+    sciline.Scope[Run, data_source.Experiment], data_source.Experiment
+):
+    """ORSO experiment for a run."""
+
+
+class OrsoInstrument(
+    sciline.Scope[Run, data_source.InstrumentSettings], data_source.InstrumentSettings
+):
+    """ORSO instrument settings for a run."""
+
+
+class OrsoOwner(sciline.Scope[Run, orso_base.Person], orso_base.Person):
+    """ORSO owner of a file."""
+
+
+class OrsoReduction(sciline.Scope[Run, reduction.Reduction], reduction.Reduction):
+    """ORSO measurement for a run."""
+
+
+class OrsoSample(sciline.Scope[Run, data_source.Sample], data_source.Sample):
+    """ORSO sample of a run."""


Do we need to support all this for the Reference measurement? Or is it recorded only for the sample run (i.e., can we use NewType instead of a "generic")?

Depends on whether we want to process a file in isolation and store it as ort. Or if the rest of the pipeline needs to use the metadata. See, e.g., #27 (comment) Though that would change if we implement scipp/scippneutron#473. With this, the ORSO stuff could be more specialised.

Looking at this again, I still do not see the case for parametrizing all of the above with the run type. I feel it would be simpler and clearer to just use the sample everywhere.

The sample and reference are part of the same experiment.

The same instrument is used, it is always a reference measurement at that instrument.

No one cares about the owner of the reference file, I'd say. I think typically it is done by the same user during their experiment.

There is just one reduction, not a reduction for the sample and one for the reference (or if there is, the reference run would be the sample run?).
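
For illustration, the NewType alternative raised in this thread could look roughly as follows. This is a sketch under assumptions: the Experiment dataclass is a hypothetical stand-in for orsopy's data_source.Experiment, and the metadata is passed as a plain dict rather than loaded NeXus data.

```python
from dataclasses import dataclass
from typing import NewType, Optional


@dataclass
class Experiment:
    """Hypothetical stand-in for orsopy's data_source.Experiment."""

    title: str
    instrument: str
    facility: Optional[str] = None


# A single domain type for the sample run; no run-type parameter needed.
OrsoExperiment = NewType("OrsoExperiment", Experiment)


def parse_orso_experiment(raw_data: dict) -> OrsoExperiment:
    """Build the ORSO experiment from (hypothetical) sample-run metadata."""
    return OrsoExperiment(
        Experiment(
            title=raw_data["title"],
            instrument=raw_data["name"],
            facility=raw_data.get("facility"),
        )
    )


experiment = parse_orso_experiment({"title": "commissioning", "name": "AMOR"})
```

With this layout a provider can only ever describe the sample run, which is the trade-off discussed above.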

SimonHeybrock · 2024-02-05T05:21:58Z

src/essreflectometry/orso.py

+    # We simply assume that the owner of the reference measurement
+    # has no claim on this data.
+    if (sample_experiment.facility != reference_experiment.facility) or (
+        sample_experiment.instrument != reference_experiment.instrument
+    ):
+        raise ValueError(
+            'The sample and reference experiments were done at different instruments'
+        )


Feels like an odd place to check this. Should this be done as part of the "data" path in the task graph?

True. But it should still be checked here, no? Otherwise the metadata providers depend on using specific data providers.

jokasimr · 2024-02-06T08:44:36Z

src/essreflectometry/amor/orso.py

+def _ascii_unit(unit: sc.Unit) -> str:
+    unit = str(unit)
+    if unit == 'Å':
+        return 'angstrom'


Haha no love for the Swedish A here I see... But I think it's good to use the international version.

ORSO requires ASCII strings for units. But nowadays, UTF-8 should be fine.
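
A string-based sketch of such a conversion, assuming the unit has already been turned into text. Only the 'Å' entry comes from the code under review; the lookup table, the function name and the error handling are hypothetical.

```python
# Hypothetical lookup table; only the 'Å' entry comes from the code under review.
_ASCII_NAMES = {"Å": "angstrom"}


def ascii_unit(unit: str) -> str:
    """Return an ASCII-only spelling of a unit string."""
    ascii_name = _ASCII_NAMES.get(unit, unit)
    if not ascii_name.isascii():
        # Fail loudly instead of writing a non-ASCII unit into the file.
        raise ValueError(f"No ASCII spelling known for unit {unit!r}")
    return ascii_name
```

Raising on unknown non-ASCII units is one possible design choice; passing them through unchanged would be the more permissive option.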

jokasimr

Looks good. I guess the big difficulty left here is how to specify what corrections were applied without inspecting the TaskGraph. Maybe the user has to manually enter those to be able to build the Orso file?

I think it is difficult to get this 'perfect' now without any reflectometry user input. So in my opinion this can be merged as is. But I would still like to hear what you think about adding file hashes or some other identifier that is more robust than the file name.

jokasimr · 2024-02-06T12:14:18Z

src/essreflectometry/io.py

+        qz = sc.midpoints(qz)
+    r = sc.values(iofq.data)
+    sr = sc.stddevs(iofq.data)
+    if sigma_q is not None:


Why do we need two cases here? Wouldn't it be easier to just either get sigma_Q from the data or from an extra argument to this provider?

It would be. And I'm going to write a provider that creates an OrsoDataset which will then request all it needs.

jokasimr · 2024-02-06T12:24:07Z

src/essreflectometry/orso.py

+    return OrsoMeasurement(
+        data_source.Measurement(
+            instrument_settings=instrument,
+            data_files=[orso_base.File(file=os.path.basename(sample_filename))],


This seems to be a limitation of the Orso standard, so probably little we can do about it. But, identifying a file by the file name seems an unreliable way to achieve reproducibility.
Best way would of course be to store the file itself, second best would be to store a hash and preferably also instructions on how to access the file.

Do you think there is anything we can do about that? Or do we just follow the standard for now? I looked quickly at the orso standard spec but could not find any metadata field where that kind of extra information could be stored.

ORSO is not like NeXus where you can just write whatever you want. So we are limited by the standard. The only thing we can do (in the short term) is to store any additional info in the comment field. As I understand it, this is a free-form text field. But it's meant to be read by humans, not programs...

If you think this is important enough (and I think it is), consider raising it with ORSO: https://github.com/reflectivity/file_format

As a temporary solution we could just append
sha256:fff4b52fd585a4ca3e9709a64da93209f32db841a07ff7a1cde12db2f8bfe0a3 (or maybe md5 but it probably doesn't matter) to the end of the comment.

I'd happily raise this with ORSO and ask if they can add an optional hash field to the File object.

Created an issue: reflectivity/file_format#15
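
The temporary solution proposed above could be sketched like this, using only the standard library. The helper names are hypothetical, and the 'sha256:<hex>' suffix is just the convention suggested in this thread, not something defined by ORSO.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def comment_with_hash(path: str, comment: str = "") -> str:
    """Append 'sha256:<hex>' to a free-form comment string."""
    tag = f"sha256:{file_sha256(path)}"
    return f"{comment} {tag}".strip()


# Demonstration on a temporary file standing in for 'reference.nxs'.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example")
result = comment_with_hash(tmp.name, "supermirror")
os.remove(tmp.name)
```

The resulting string could then be passed as the comment of the file entry, at the cost of mixing machine-readable data into a field meant for humans.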

jokasimr · 2024-02-06T12:27:32Z

src/essreflectometry/orso.py

+            data_files=[orso_base.File(file=os.path.basename(sample_filename))],
+            additional_files=[
+                orso_base.File(
+                    file=os.path.basename(reference_filename), comment='supermirror'


Okay so you can add comments. Then we can add a file hash and maybe even instructions for how to access the file.

jl-wynen · 2024-02-06T12:52:10Z

ORSO also allows specifying the machine that the software runs on and the old code used this: https://github.com/scipp/ess/blob/5372ef6e91f2ee8bf4903c3db081b949e76952fd/src/ess/amor/orso.py#L47 I omitted this here because storing a user computer's name doesn't help much and seems like exposing private information. What do you think @jokasimr ?

jokasimr · 2024-02-06T13:16:54Z

ORSO also allows specifying the machine that the software runs on and the old code used this: https://github.com/scipp/ess/blob/5372ef6e91f2ee8bf4903c3db081b949e76952fd/src/ess/amor/orso.py#L47 I omitted this here because storing a user computer's name doesn't help much and seems like exposing private information. What do you think @jokasimr ?

Yes I noticed that as well. It seems unnecessary.

jl-wynen · 2024-02-06T14:07:52Z

It is now complete except for listing corrections.

We still need to figure out how to do that. The approach with tags that I outlined in the issue #6 will stop working when scipp/sciline#116 is merged.

jokasimr

Looks good to me!

jl-wynen · 2024-02-06T15:17:13Z

I forgot to add tests. Did that now.

SimonHeybrock · 2024-02-07T07:08:54Z

src/essreflectometry/orso.py

+    return OrsoExperiment(
+        data_source.Experiment(
+            title=raw_data['title'],
+            instrument=raw_data['name'],


The name appears to be in NXinstrument, not NXentry: https://manual.nexusformat.org/classes/base_classes/NXinstrument.html#nxinstrument-name-field

So we do need to make this Amor-specific because the files don't follow the standard. (#27 (comment))

Actually, in our test file, the name shows up in both places

SimonHeybrock · 2024-02-07T07:12:43Z

src/essreflectometry/orso.py

+        data_source.Experiment(
+            title=raw_data['title'],
+            instrument=raw_data['name'],
+            facility=raw_data['facility'],


This is not in the NeXus standard. Should we hardcode it to ESS?

I changed it to use get. We should only hard code anything in an instrument-specific module. With Amor, ESS would be wrong.

SimonHeybrock · 2024-02-07T07:15:46Z

src/essreflectometry/orso.py

+    # TODO populate timestamp
+    #      doesn't work with a local file because we need the timestamp of the original,
+    #      SciCat can provide that


Can't we use the NXentry/start_time?

No, because the timestamp is the last modification time. That is, at the earliest, the end time of the experiment, not the start time. And possibly later if the file was modified for some reason.

Then use the end time? Modifying Raw files is not allowed, to my knowledge.

There is no end time in our Amor file. I could add something now that gets the time if it exists and falls back to None otherwise. But I would rather wait for a more general mechanism in scippneutron that could also handle scicat timestamps and select the most appropriate one.
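
The interim fallback mentioned here (use the time if it exists, otherwise None) might look like the following sketch. It assumes the entry metadata is available as a mapping of strings and that the key is called 'end_time'; both are assumptions for illustration, not the layout used in this PR.

```python
from datetime import datetime
from typing import Mapping, Optional


def parse_end_time(entry: Mapping[str, str]) -> Optional[datetime]:
    """Return the entry's end time if recorded, otherwise None."""
    raw = entry.get("end_time")
    if raw is None:
        return None
    return datetime.fromisoformat(raw)
```

A more general mechanism, as suggested above, could try several sources in order and pick the most appropriate one.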

SimonHeybrock · 2024-02-07T07:23:31Z

src/essreflectometry/orso.py

+    sample_experiment: Optional[OrsoExperiment[Sample]],
+    reference_experiment: Optional[OrsoExperiment[Reference]],


See above, I don't think the reference measurement is or should be an "experiment", i.e., I'd remove this.

I think your definitions of 'experiment' and 'measurement' are the opposite of ORSO's. An experiment is one run at an instrument, i.e., sample and reference are separate experiments. And a measurement is a combination of any number of experiments.

But yes, we could ignore the reference experiment. build_orso_measurement already only includes the sample instrument.

Can you provide a reference? Google found this: https://www.reflectometry.org/projects/file_formats/dictionaries/, and it seems to be consistent with "my" (or rather NeXus') definition.

Similar here: https://www.reflectometry.org/projects/file_formats/tasks/ws_2021-06_text/

The actual specification is here: https://github.com/reflectivity/file_format/blob/master/specification.md

But you are right. 'Experiment' refers to a 'series of measurements'. However, the 'measurement' key also refers to a series.

SimonHeybrock · 2024-02-07T07:27:00Z

src/essreflectometry/amor/orso.py

-    orso.data_source.measurement.scheme = 'angle- and energy-dispersive'
-    orso.reduction.software = fileio.reduction.Software(
-        'scipp-ess', __version__, platform.platform()
+    wavelength = events_in_wavelength.coords['wavelength']


This assumes that there are wavelength coords, which may not be the case if we do not start with TOF coords, or do not have wavelength masks. If not found, fall back to events_in_wavelength.bins.coords['wavelength'].
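
The suggested fallback can be sketched as a small helper. The function name is hypothetical, and the objects used in the demonstration are simple stand-ins that only mimic the coords and bins.coords attribute layout; real inputs would be binned scipp data arrays.

```python
from types import SimpleNamespace


def wavelength_coord(events):
    """Prefer the outer 'wavelength' coord, else use the event-level one."""
    if "wavelength" in events.coords:
        return events.coords["wavelength"]
    return events.bins.coords["wavelength"]


# Stand-in objects mimicking the attribute layout used in the comment above;
# real inputs would be binned scipp data arrays.
with_outer = SimpleNamespace(
    coords={"wavelength": "outer"},
    bins=SimpleNamespace(coords={"wavelength": "events"}),
)
events_only = SimpleNamespace(
    coords={},
    bins=SimpleNamespace(coords={"wavelength": "events"}),
)
```

Calling wavelength_coord on either stand-in returns whichever coordinate is available, so downstream code such as the min/max range computation does not need to care where it came from.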

SimonHeybrock · 2024-02-07T07:27:18Z

src/essreflectometry/amor/orso.py

+    return OrsoInstrument(
+        orso_data_source.InstrumentSettings(
+            wavelength=orso_base.ValueRange(
+                min=float(wavelength.min().value),
+                max=float(wavelength.max().value),
+                unit=_ascii_unit(wavelength.unit),
+            ),
+            incident_angle=orso_base.ValueRange(
+                min=float(incident_angle.min().value),
+                max=float(incident_angle.max().value),
+                unit=_ascii_unit(incident_angle.unit),
+            ),
+            polarization=None,  # TODO how can we determine this from the inputs?
+        )
+    )


Which part here is Amor specific?

Knowing which inputs to use. This depends on the specific pipeline.

Actually, build_orso_measurement is also specific to the pipeline because it sets the reference file as 'supermirror'.

Are you saying the other ESS instruments will operate differently?

I don't know. Our OFFSPEC workflow normalises by direct beam: https://scipp.github.io/ess/instruments/external/offspec/offspec_reduction.html#Normalisation-of-sample-by-direct-beam Maybe there will be differences between instruments or even operating modes (collimated vs divergent).

SimonHeybrock · 2024-02-07T07:27:57Z

src/essreflectometry/amor/orso.py

+    )
+
+
+def build_orso_iofq_dataset(


Which part here is Amor specific?

Knowing which columns to save. I guess that is not so much amor-specific but workflow-specific. Do we have an established pattern for handling this? Should the providers simply not be part of a providers tuple?

Do we have any reason to believe the workflows at ESS will be different?

I don't know how many different types of workflows we will have. Will the only thing we ever save to ORSO be I(Q)?


jl-wynen requested review from SimonHeybrock and jokasimr February 2, 2024 15:01

jl-wynen force-pushed the orso-filewriter branch from 611bf76 to 3099be6 Compare February 2, 2024 15:06

jl-wynen mentioned this pull request Feb 2, 2024

Extract subgraphs in visualisation scipp/sciline#117

Open

jl-wynen force-pushed the orso-filewriter branch from 3099be6 to d901722 Compare February 2, 2024 15:27

SimonHeybrock reviewed Feb 5, 2024


jl-wynen mentioned this pull request Feb 5, 2024

Support requesting TaskGraph in providers scipp/sciline#113

Closed

jokasimr reviewed Feb 6, 2024


jl-wynen mentioned this pull request Feb 6, 2024

Metadata utilities scipp/scippneutron#473

Open

jl-wynen marked this pull request as ready for review February 6, 2024 14:06

jokasimr approved these changes Feb 6, 2024


SimonHeybrock reviewed Feb 7, 2024


SimonHeybrock approved these changes Feb 7, 2024


jl-wynen and others added 12 commits February 7, 2024 14:52

Split loading and extracting events

12841ea

Make providers tuples

d060262

To prevent accidental modification in user code

Depend on dateutil

70c37eb

Add ORSO providers

56f861c

Fix typo

7ba1d4d

Merge amor.orso into top level orso module

3093aef

Add orso module to docs

26135e7

Extract ORSO instrument params

9dc3b45

Change save_ort to use new ORSO types

84d3dd7

Allow passing resolution as separate arg

280d658

Start using ort writer

3927c06

Apply automatic formatting

6fc9f3c

jl-wynen added 8 commits February 7, 2024 14:52

Convert to float

aeb63df

Use a provider to construct ORSO dataset

da9bc01

Explain file writing

327676b

Add tests for general orso module

15eef14

Allow reference_filename=None

48427f2

Get metadata from proper locations

86cc267

More relaxed coord range checking

eb93975

Ignore reference run metadata

a0dbe24

jl-wynen force-pushed the orso-filewriter branch from d2948ed to a0dbe24 Compare February 7, 2024 13:53

jl-wynen merged commit d566404 into main Feb 7, 2024
3 checks passed

jl-wynen deleted the orso-filewriter branch February 7, 2024 14:13

SimonHeybrock mentioned this pull request Feb 12, 2024

Split loading of monitors and detector data scipp/esssans#82

Closed
