normalize sim data upon .from_file() #287

tylerflex · 2022-04-04T17:42:00Z

from tidy3d import SimulationData

# load data and normalize it by default with normalize_index=0
sim_data = SimulationData.from_file("something.hdf5")

print(sim_data.normalized)
# True

# saves the normalized version
sim_data.to_file("something2.hdf5")

# trying to load it again fails
sim_data2 = SimulationData.from_file("something2.hdf5")

[10:39:57] ERROR    Data from this file is already normalized.  log.py:33
                    Instead, load `.from_file()` with
                    `normalize_index=None.

# passes
sim_data2 = SimulationData.from_file("something2.hdf5", normalize_index=None)

Let me know if we want to change behavior here.

tylerflex · 2022-04-04T17:45:53Z

We might also want to store SimulationData.normalized as an int, or store SimulationData.normalize_index instead with normalized as a property.

tylerflex · 2022-04-04T18:05:28Z

#283

momchil-flex · 2022-04-04T18:13:46Z

I think that storing the normalize index makes sense yeah. And I think that loading from_file should set normalize_index = 0 by default only if None is given and none is set in the loaded data. I think it's cumbersome to have to manually say from_file(..., normalize_index=None) to load already normalized data?

tylerflex · 2022-04-04T18:30:52Z

About the normalize_index=None point: I also find it cumbersome, but I think it's hard to make it work with the loading of normalized data, because there's no way to "un-normalize" the data and re-normalize it with a different index. For example:

fname = 'unnormalized_data.hdf5'

sim_data_norm0 = SimulationData.from_file(fname)
sim_data_norm0.to_file('normalized_0.hdf5')

# shouldn't be allowed, because the data is already normalized w.r.t. index 0.
sim_data_norm1 = SimulationData.from_file('normalized_0.hdf5', normalize_index=1)

So one option is to just completely ignore normalize_index if the data comes in normalized already, or throw a warning if the normalize_index supplied does not match the normalize_index of the loaded SimulationData.
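A minimal sketch of the two options just described (ignore the argument when the data is already normalized, and warn on a mismatch). The helper name `resolve_normalize_index` and its arguments are hypothetical and are not part of tidy3d; it only decides which index, if any, still has to be applied.

```python
import warnings
from typing import Optional


def resolve_normalize_index(stored: Optional[int], requested: Optional[int]) -> Optional[int]:
    """Return the index that still needs to be applied, or None if there is nothing to do.

    `stored` is the index the loaded data was already normalized with (None if raw).
    `requested` is the index passed by the caller (None to skip normalization).
    Both names are assumptions made for this sketch.
    """
    if stored is None:
        # Raw data: apply whatever the caller asked for (possibly nothing).
        return requested
    if requested is not None and requested != stored:
        # Already normalized with a different index: ignore the request but say so.
        warnings.warn(
            f"Data is already normalized with index {stored}; "
            f"ignoring normalize_index={requested}."
        )
    # Already normalized: never normalize a second time.
    return None


print(resolve_normalize_index(None, 0))
print(resolve_normalize_index(0, 0))
```

Comparing against None explicitly matters here, because a stored index of 0 is falsy in Python.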

Thoughts?

tylerflex · 2022-04-04T18:31:50Z

I agree we should store the normalize_index in the SimulationData. Seems like a no brainer. I'll add that.

tylerflex · 2022-04-04T18:35:24Z

Actually, it might introduce some breaking changes on the backend if SimulationData objects are initialized with normalize_index=normalize_index instead of normalize=normalize. Perhaps we should do this as a separate PR. I was thinking:

1. make SimulationData.normalize_index: pd.NonNegativeInt = None field.
2. make Simulation.normalize a @property that returns self.normalize_index is not None.
3. change all the inits of SimulationData present in front end and back end to reflect 1.
4. change the normalization logic to be aware of the normalize_index of the sim_data loaded from file (if we decide to go that route).
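The first two points of this plan can be sketched roughly as follows. This uses a standard-library dataclass as a stand-in for the pydantic model, so the non-negative check in `__post_init__` only imitates what `pd.NonNegativeInt` would validate; the class name `SimDataSketch` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimDataSketch:
    """Hypothetical stand-in for SimulationData, not the tidy3d class."""

    # None means "not normalized"; an int is the source index used for normalization.
    normalize_index: Optional[int] = None

    def __post_init__(self) -> None:
        # Imitates a non-negative integer constraint on the field.
        if self.normalize_index is not None and self.normalize_index < 0:
            raise ValueError("normalize_index must be a non-negative integer or None")

    @property
    def normalized(self) -> bool:
        # Derived from the index, so the two can never disagree.
        return self.normalize_index is not None


print(SimDataSketch().normalized)
print(SimDataSketch(normalize_index=0).normalized)
```

Deriving the boolean from the index removes the need to store both and keep them in sync.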

momchil-flex · 2022-04-04T19:15:12Z

> So one option is to just completely ignore normalize_index if the data comes in normalized already, or throw a warning if the normalize_index supplied does not match the normalize_index of the loaded SimulationData.

This is what I was thinking yeah, but I didn't realize you introduced normalize_index as a kwarg to from_file only here, I thought it already existed. I think there's no reason for it.

> 1. make SimulationData.normalize_index: pd.NonNegativeInt = None field.
> 2. make Simulation.normalize a @property that returns self.normalize_index is not None.
> 3. change all the inits of SimulationData present in front end and back end to reflect 1.
> 4. change the normalization logic to be aware of the normalize_index of the sim_data loaded from file (if we decide to go that route).

This sounds good, but do we then do the actual normalization in the validator for normalize_index? Also, I think in 2. the property should be normalized, not normalize. And we would remove the normalize method that currently does the normalization.

I think this would mandate just one tiny change on the backend.

momchil-flex · 2022-04-04T19:16:40Z

That said, again we have no way to un-normalize (not right off the bat, but technically we could?). So what if normalize_index = 0 and then someone sets sim_data.normalize_index = 1?

We could normalize with 1/spectrum(0) and then normalize with spectrum(1), that should do it?

tylerflex · 2022-04-04T21:01:03Z

1/spectrum(0) seems potentially problematic / divide by zero-y.

This is looking a bit more complicated than originally anticipated. How about this as an option:

All .hdf5 files are un-normalized only and SimulationData can not be written to file?

momchil-flex · 2022-04-04T21:40:13Z

> 1/spectrum(0) seems potentially problematic / divide by zero-y.

Actually, not really, because the normalization is already doing 1/spectrum. So the reverse of the normalization would be * spectrum. That said, I don't know if there are situations in which the regular normalization is divide by zero-y (a monitor recording way outside the source spectrum?).

> All .hdf5 files are un-normalized only and SimulationData can not be written to file?

This could work but is a bit of an annoying restriction. It seems that we could renormalize? If you still don't like that, is there a way to get the previous value of val in the validator? Such that if normalize_index is already set to something different than None, we don't allow the user to change it?

tylerflex · 2022-04-04T22:14:22Z

what I'm worried about is whether normalizing and then unnormalizing will give the same sim_data or different due to numerical issues in many cases. Not sure about getting previous values in validator, seems not really possible without doing setattr stuff
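The round-trip concern can be probed with a small numerical experiment. This is only a sketch with made-up complex samples and made-up spectra (`spectrum0`, `spectrum1`), not tidy3d data: it normalizes by dividing by a spectrum, undoes that by multiplying, and compares re-normalizing from index 0 to index 1 against normalizing the raw data by index 1 directly.

```python
import random

random.seed(0)
n = 1000

# Made-up raw field samples and two made-up source spectra, all complex-valued.
raw = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
spectrum0 = [complex(random.uniform(0.5, 2), random.uniform(0.5, 2)) for _ in range(n)]
spectrum1 = [complex(random.uniform(0.5, 2), random.uniform(0.5, 2)) for _ in range(n)]

# Normalize with index 0, then undo it by multiplying with the same spectrum.
norm0 = [r / s for r, s in zip(raw, spectrum0)]
restored = [v * s for v, s in zip(norm0, spectrum0)]

# Re-normalize from index 0 to index 1, versus normalizing the raw data directly.
renorm1 = [v * s0 / s1 for v, s0, s1 in zip(norm0, spectrum0, spectrum1)]
direct1 = [r / s for r, s in zip(raw, spectrum1)]

num_not_identical = sum(a != b for a, b in zip(restored, raw))
max_roundtrip_error = max(abs(a - b) for a, b in zip(restored, raw))
max_renorm_error = max(abs(a - b) for a, b in zip(renorm1, direct1))

print("samples not bit-identical after round trip:", num_not_identical)
print("largest round-trip error:", max_roundtrip_error)
print("largest re-normalization error:", max_renorm_error)
```

With spectra that stay away from zero, as assumed here, any differences are at the level of floating-point rounding; the experiment says nothing about frequencies where the spectrum is close to zero.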

momchil-flex · 2022-04-04T22:49:03Z

Yeah the values can differ at numerical precision level. If you think this is a no-go and the validator doesn't work either, I guess we can remove to_file.

Or - how about this: leave the normalize method and make _normalize_index a pydantic private attribute? And normalize_index can be a property returning _normalize_index (remove normalized field). Then from_file takes a normalize_index = 0 by default. Then,

if normalize_index is None or len(sim.sources) < 1 or sim_data.normalize_index == normalize_index:
    return sim_data
elif sim_data.normalize_index is None:
    return sim_data.normalize(normalize_index)
else:
    raise an error saying that sim_data is already normalized by a different normalize_index

tylerflex · 2022-04-04T23:03:29Z

ok I think I like this solution so far, but how does one set the _normalize_index?

momchil-flex · 2022-04-05T01:44:26Z

With the normalize() method, which should error if _normalize_index is not None.
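Put together, the proposal in the last few comments could look roughly like this. `ToySimData` and `maybe_normalize` are hypothetical names for this sketch, plain lists stand in for the monitor data and source spectra, and `ValueError` is an assumed choice of exception; none of this is the tidy3d implementation.

```python
from typing import List, Optional


class ToySimData:
    """Hypothetical stand-in for SimulationData, not the tidy3d class."""

    def __init__(self, values: List[float], source_spectra: List[List[float]]):
        self.values = values
        self.source_spectra = source_spectra  # one spectrum per source
        self._normalize_index: Optional[int] = None  # private normalization state

    @property
    def normalize_index(self) -> Optional[int]:
        return self._normalize_index

    @property
    def normalized(self) -> bool:
        return self._normalize_index is not None

    def normalize(self, normalize_index: int = 0) -> "ToySimData":
        """Return a normalized copy; the only place that sets _normalize_index."""
        if self._normalize_index is not None:
            raise ValueError(
                f"Data is already normalized with index {self._normalize_index}."
            )
        spectrum = self.source_spectra[normalize_index]
        new = ToySimData([v / s for v, s in zip(self.values, spectrum)], self.source_spectra)
        new._normalize_index = normalize_index
        return new


def maybe_normalize(sim_data: ToySimData, normalize_index: Optional[int] = 0) -> ToySimData:
    """Decision logic that a from_file-style loader would run after reading the data."""
    if (
        normalize_index is None
        or len(sim_data.source_spectra) < 1
        or sim_data.normalize_index == normalize_index
    ):
        return sim_data  # nothing requested, nothing to normalize by, or already done
    if sim_data.normalize_index is None:
        return sim_data.normalize(normalize_index)
    raise ValueError("Data is already normalized by a different normalize_index.")


data = ToySimData([2.0, 4.0], [[2.0, 2.0], [4.0, 4.0]])
once = maybe_normalize(data)   # normalizes with index 0
twice = maybe_normalize(once)  # same index again: returned unchanged
print(once.values, once.normalize_index, twice is once)
```

Because the index is only ever set inside `normalize()`, there is no setter whose previous value a validator would need to inspect.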

tylerflex · 2022-04-05T20:52:48Z

Ok I just pushed some changes, summary:

- SimulationData no longer takes normalized upon init; the normalized status is stored as a private attribute _normalize_index.
- Added @property for normalized and normalize_index to reflect the state of the internal _normalize_index.
- normalize_index is saved in the .hdf5 file as an attribute and is loaded back when .from_file() is used.
- Added normalize_index to .from_file() to optionally normalize the file contents:
  - if the file has a normalize_index noted, it must be the same as the normalize_index supplied, and nothing happens.
  - if the file does not have a normalize_index noted, calls normalize() on the sim_data loaded from the file.
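To illustrate why storing the index in the file makes the save-and-reload case work, here is a self-contained sketch that uses a JSON file as a stand-in for the .hdf5 attribute. The functions `to_file` and `from_file` below are hypothetical module-level helpers, not the tidy3d methods, and two behaviors are assumptions: a mismatching index raises `ValueError`, and passing `normalize_index=None` always loads the data as stored.

```python
import json
import os
import tempfile
from typing import List, Optional, Tuple


def to_file(path: str, values: List[float], spectra: List[List[float]],
            normalize_index: Optional[int]) -> None:
    # Store the normalization index next to the data (JSON stand-in for an hdf5 attribute).
    with open(path, "w") as f:
        json.dump({"values": values, "spectra": spectra,
                   "normalize_index": normalize_index}, f)


def from_file(path: str, normalize_index: Optional[int] = 0) -> Tuple[List[float], Optional[int]]:
    with open(path) as f:
        payload = json.load(f)
    values, spectra = payload["values"], payload["spectra"]
    stored = payload["normalize_index"]
    if stored is not None:
        # File is already normalized: the requested index must agree (assumed to raise otherwise).
        if normalize_index is not None and normalize_index != stored:
            raise ValueError(f"File is already normalized with index {stored}.")
        return values, stored
    if normalize_index is None:
        return values, None  # caller asked for raw data
    spectrum = spectra[normalize_index]
    return [v / s for v, s in zip(values, spectrum)], normalize_index


with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.json")
    norm_path = os.path.join(tmp, "normalized.json")
    spectra = [[2.0, 4.0]]
    to_file(raw_path, [2.0, 8.0], spectra, None)
    values, index = from_file(raw_path)              # normalized on load
    to_file(norm_path, values, spectra, index)       # index is written alongside the data
    values_again, index_again = from_file(norm_path)  # default call no longer fails

print(values, index, values_again, index_again)
```

The second load succeeds with the default argument because the stored index matches it, so the data is returned without being normalized twice.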

normalize sim data upon .from_file()

c146a1b

tylerflex requested a review from momchil-flex April 4, 2022 17:44

black

d38f40e

tylerflex mentioned this pull request Apr 4, 2022

normalize sim_data upon loading from file #283

Closed

reorganized how sim data normalizatio works

82aaf3c

momchil-flex approved these changes Apr 6, 2022

tylerflex merged commit bcf15c2 into develop Apr 6, 2022

tylerflex deleted the tyler/normalize_simdata branch April 6, 2022 16:25
