Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalize sim data upon .from_file() #287

Merged
merged 3 commits into from
Apr 6, 2022
Merged

Conversation

tylerflex
Copy link
Collaborator

@tylerflex tylerflex commented Apr 4, 2022

from tidy3d import SimulationData

# load data and normalize it by default with normalilze_index=0
sim_data = SimulationData.from_file("something.hdf5")

print(sim_data.normalized)
# True

# saves the normalized version
sim_data.to_file("something2.hdf5")

# try to load again fails
sim_data2 = SimulationData.from_file("something2.hdf5")

[10:39:57] ERROR    Data from this file is already normalized.  log.py:33
                    Instead, load `.from_file()` with
                    `normalize_index=None.

# passes
sim_data2 = SimulationData.from_file("something2.hdf5", normalize_index=None)

Let me know if we want to change behavior here.

@tylerflex
Copy link
Collaborator Author

We might also want to store SimulationData.normalized as an int, or store SimulationData.normalize_index instead with normalized as a property.

@tylerflex
Copy link
Collaborator Author

#283

@momchil-flex
Copy link
Collaborator

I think that storing the normalize index makes sense yeah. And I think that loading from_file should set normalize_index = 0 by default only if None is given and none is set in the loaded data. I think it's cumbersome to have to manually say from_file(..., normalize_index=None) to load already normalized data?

@tylerflex
Copy link
Collaborator Author

tylerflex commented Apr 4, 2022

about the normalize_index=None point: I also find it cumbersome, but think that it's hard to make it work with the loading of normalized data. For example, there's no way to "un-normalize" the data and re-normalize with a different index. For example

fname = 'unnormalized_data.hdf5'

sim_data_norm0 = SimulationData.from_file(fname)
sim_data_norm0.to_file('normalized_0.hdf5')

# shouldn't be allowed, because the data is already normalized w.r.t. index 0.
sim_data_norm1 = SimulationData.from_file('normalized_0.hdf5', normalize_index=1)

So one option is to just completely ignore normalize_index if the data comes in normalized already, or throw a warning if the normalize_index supplied does not match the normalize_index of the loaded SimulationData.

Thoughts?

@tylerflex
Copy link
Collaborator Author

I agree we should store the normalize_index in the SimulationData. Seems like a no brainer. I'll add that.

@tylerflex
Copy link
Collaborator Author

tylerflex commented Apr 4, 2022

actually, it might introduce some breaking changes on the backend if SimulationData are initialized with normalize_index=normalize_index instead of normalize=normalze. Perhaps we should do this as a separate PR. I was thinking

  1. make SimulationData.normalize_index: pd.NonNegativeInt = None field.
  2. make Simulation.normalize a @Property that returns self.normalize_index is not None.
  3. change all the inits of SimulationData present in front end and back end to reflect 1.
  4. change the normalization logic to be aware of the normalize_index of the sim_data loaded from file (if we decide to go that route).

@momchil-flex
Copy link
Collaborator

momchil-flex commented Apr 4, 2022

So one option is to just completely ignore normalize_index if the data comes in normalized already, or throw a warning if the normalize_index supplied does not match the normalize_index of the loaded SimulationData.

This is what I was thinking yeah, but I didn't realize you introduced normalize_index as a kwarg to from_file only here, I thought it already existed. I think there's no reason for it.

  • make SimulationData.normalize_index: pd.NonNegativeInt = None field.
  • make Simulation.normalize a @Property that returns self.normalize_index is not None.
  • change all the inits of SimulationData present in front end and back end to reflect 1.
  • change the normalization logic to be aware of the normalize_index of the sim_data loaded from file (if we decide to go that route).

This sounds good, but so do we do the actual normalization in the validator for normalize_index? Also I think in 2. the property should be normalized not normalize. And we remove the normalize method that currently does the normalization.

I think this would mandate just one tiny change on the backend.

@momchil-flex
Copy link
Collaborator

momchil-flex commented Apr 4, 2022

That said, again we have no way to un-normalize (not right off the bat, but technically we could?) So what if normalize_index = 0 and then someone does sim_data.normalize_index = 1?

We could normalize with 1/spectrum(0) and then normalize with spectrum(1), that should do it?

@tylerflex
Copy link
Collaborator Author

1/spectrum(0) seems potentially problematic / divide by zero-y.

This is looking a bit more complicated than originally anticipated. How about this as an option:

All .hdf5 files are un-normalized only and SimulationData can not be written to file?

@momchil-flex
Copy link
Collaborator

1/spectrum(0) seems potentially problematic / divide by zero-y.

Actually, not really.. Because the normalization is already doing 1/spectrum actually. So the reverse of the normalization would be * spectrum. That said, I don't know if there are situations in which the regular normalization is divide by zero-y (monitor recording way outside the source spectrum?)

All .hdf5 files are un-normalized only and SimulationData can not be written to file?

This could work but is a bit of an annoying restriction. It seems that we could renormalize? If you still don't like that, is there a way to get the previous value of val in the validator? Such that if normalize_index is already set to something different than None, we don't allow the user to change it?

@tylerflex
Copy link
Collaborator Author

what I'm worried about is whether normalizing and then unnormalizing will give the same sim_data or different due to numerical issues in many cases. Not sure about getting previous values in validator, seems not really possible without doing setattr stuff

@momchil-flex
Copy link
Collaborator

Yeah the values can differ at numerical precision level. If you think this is a no-go and the validator doesn't work either, I guess we can remove to_file.

Or - how about this: leave the normalize method and make _normalize_index a pydantic private attribute? And normalize_index can be a property returning _normalize_index (remove normalized field). Then from_file takes a normalize_index = 0 by default. Then,

  • if normalize_index is None or len(sim.sources) < 1 or sim_data.normalize_index == normalize_index : return sim_data
  • elif sim_data.normalize_index is None: return sim_data.normalize(normalize_index)
  • else raise error that sim_data is already normalized by a different normalize_index.

@tylerflex
Copy link
Collaborator Author

ok I think I like this solution so far, but how does one set the _normalize_index?

@momchil-flex
Copy link
Collaborator

momchil-flex commented Apr 5, 2022

With the normalize() method, which should error if _normalize_index is not None.

@tylerflex
Copy link
Collaborator Author

Ok I just pushed some changes, summary:

  • SimulationData no longer takes normalized upon init, the normalized status is stored as a private attribute _normalize_index.
  • added @property for normalized and normalize_index to reflect the state of the internal _normalize_index.
  • normalize_index is saved in the .hdf5 file as an attribute and also loaded back out when .from_file() is used.
  • added normalize_index to .from_file() to optionally normalize the file contents
    • if the file has a normalize_index noted, it must be the same as normalize_index supplied, nothing happens.
    • if the file does not have a normalize_index noted, calls normalize() on the sim_data loaded from the file.

@tylerflex tylerflex merged commit bcf15c2 into develop Apr 6, 2022
@tylerflex tylerflex deleted the tyler/normalize_simdata branch April 6, 2022 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants