Missing values and MAPL #252

Open
mpagowski opened this issue Aug 18, 2023 · 11 comments

Comments

@mpagowski

NOAA's GBBEPx wildfire files (at 0.1deg x 0.1deg resolution) have missing values. How does MAPL treat such values in interpolation?

@mathomp4
Member

@mpagowski Let me ping @bena-nasa and @atrayano to answer this. I'm sure we handle it "correctly" but I'm not sure of the specifics.

@bena-nasa
Contributor

bena-nasa commented Aug 21, 2023

Are these files read via ExtData? Please let me know which version of MAPL you are using. The short answer is that even though MAPL can respect missing values when doing, say, spatial interpolation, that does not matter if the application code that ultimately uses the data does not respect them.

MAPL has an internal "MAPL undefined" constant, MAPL_UNDEF, that we use in various places in the MAPL code to protect against operations on undefined points. It is set to a value of 1.0e15.

I don't know what version of MAPL you are using, but in newer versions of the generation 2 ExtData we check the missing value defined in the file. If it does not match the "MAPL undefined" value, then when we read the array from the NetCDF file we set the points that have the file missing value to the "MAPL undefined" one. Any point in our code base that respects this will then be aware of it. For example, we respect it when regridding from the grid of the original file to the application grid.
If a target cell has contributions from any points that are the "MAPL undefined" value, we do not include those points in the calculation of the target value. If all of the contributing points are "MAPL undefined", then the target cell is "MAPL undefined".
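
The behavior described above can be illustrated with a minimal Python sketch. This is not MAPL code: the function names, the -9999.0 file missing value, and the renormalization of the remaining weights are assumptions made for illustration only. The one thing taken from this thread is the MAPL_UNDEF value of 1.0e15.

```python
MAPL_UNDEF = 1.0e15  # value quoted in this thread for MAPL's undefined constant


def to_mapl_undef(values, file_missing):
    """Replace the file's own missing value with MAPL_UNDEF (hypothetical helper)."""
    return [MAPL_UNDEF if v == file_missing else v for v in values]


def regrid_cell(source_values, weights):
    """Weighted mean of the source points that feed one target cell.

    Points equal to MAPL_UNDEF are skipped. Renormalizing by the sum of the
    remaining weights is an assumption of this sketch. If every contributing
    point is undefined, the target cell is undefined as well.
    """
    total = 0.0
    weight_sum = 0.0
    for value, weight in zip(source_values, weights):
        if value == MAPL_UNDEF:
            continue  # undefined points do not contribute
        total += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return MAPL_UNDEF
    return total / weight_sum


raw = [2.0, -9999.0, 4.0, -9999.0]  # hypothetical source points, -9999.0 = missing
clean = to_mapl_undef(raw, file_missing=-9999.0)
print(regrid_cell(clean, [0.25, 0.25, 0.25, 0.25]))  # 3.0
print(regrid_cell([MAPL_UNDEF, MAPL_UNDEF], [0.5, 0.5]) == MAPL_UNDEF)  # True
```

Note that a file whose missing value is NaN would need an `isnan` test rather than `==`, since NaN never compares equal to itself.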

Of course, if the code or component (outside of MAPL) that ultimately uses these arrays does not protect against doing operations at points that are "MAPL undefined", we cannot control that. Users can do what they want with arrays in their own code.

In general, our emissions files do not have missing values, because how would you handle that in all the components that may use them? It would be a nightmare. Rather, if there are no emissions, they are simply 0, since GOCART is a huge code base and does not check for any sort of missing value when doing array-level operations. Not to mention that protecting every array operation against a missing value would probably destroy performance.

So the answer is that, depending on the version of ExtData you are using, we may respect the file-defined missing value and set points that are "missing" to our own internal missing value. This is respected at points in the MAPL code base, but when you get to the application code, all bets are off, since we have no control over how the code developer chose to use arrays.

So the ultimate answer is that if your input files have missing values, even if MAPL respects them, GOCART does not.

At that point we have 3 options.

  1. Reprocess those files so that, rather than having missing values, those points are simply set to, say, 0.
  2. If the above is not an option, but you are using a version of ExtData that respects the file missing value, then the arrays would have to be intercepted after the fact in the application code so that any MAPL_UNDEFs can be set to 0 or protected against.
  3. If you are using an older version of MAPL where ExtData did not respect the file-supplied missing value (I would need to check the version you are using), it simply made no accommodation for that, and we would need to come up with a custom solution.

To me, of all 3, option 1 seems by far the easiest. It would be a very trivial Python script to read in and rewrite the file, replacing anything that has a missing value with 0. Heck, maybe even NCO or some other utility could do this.
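
A rough sketch of what such a Python script might look like is below. It assumes the third-party `numpy` and `netCDF4` packages, and it assumes the files mark missing points with the standard `_FillValue` or `missing_value` attribute (which `netCDF4` turns into masked arrays on read) or with NaN. The function names and file names are hypothetical, and nothing here has been checked against actual GBBEPx files.

```python
import shutil

import numpy as np


def zero_fill(data):
    """Return a plain array with masked or non-finite points set to 0."""
    # Masked points (from _FillValue / missing_value) become 0;
    # ordinary arrays pass through np.ma.filled unchanged.
    arr = np.asarray(np.ma.filled(data, 0.0))
    # Also catch NaN/inf used as an ad hoc missing marker.
    return np.where(np.isfinite(arr), arr, 0.0)


def rewrite_with_zeros(src_path, dst_path):
    """Copy src_path to dst_path and zero out missing points in float variables."""
    from netCDF4 import Dataset  # third-party; imported here so zero_fill works without it

    shutil.copyfile(src_path, dst_path)
    with Dataset(dst_path, "r+") as ds:
        for var in ds.variables.values():
            if getattr(var.dtype, "kind", None) != "f":
                continue  # skip integer, string, and packed variables
            if var.name in ds.dimensions:
                continue  # leave coordinate variables alone
            var[:] = zero_fill(var[:])


# Hypothetical usage; the file names are placeholders:
# rewrite_with_zeros("gbbepx_original.nc", "gbbepx_zero_filled.nc")
```

The sketch writes to a copy so the files as delivered stay untouched. Variables stored as packed integers are skipped and would need separate handling.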

@mpagowski
Author

mpagowski commented Aug 21, 2023 via email

@bena-nasa
Contributor

bena-nasa commented Aug 21, 2023

It doesn't matter what is most "correct". What matters is where this data is used: is it going to be ingested and used by an application (I assume GOCART) that does operations on floating point arrays filled from this data? At that point you simply cannot have "missing values" unless EVERY array-level operation that may use this data somehow knows to protect against, or not use, points that are missing. That's not realistic, nor is it how GOCART (assuming that is the use case) is implemented.
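
As a toy illustration of that point (plain Python, not GOCART code; the emission numbers are made up, and only the 1.0e15 value comes from this thread), a single undefined cell is enough to ruin an unprotected reduction:

```python
MAPL_UNDEF = 1.0e15  # value quoted earlier in this thread

# Hypothetical emissions for four grid cells; one cell is undefined.
emissions = [0.5, MAPL_UNDEF, 1.5, 2.0]

# Unprotected sum, as generic array code would compute it.
unprotected_total = sum(emissions)

# Protected sum: every operation on the array would need a check like this.
protected_total = sum(e for e in emissions if e != MAPL_UNDEF)

# Zero-filled input: ordinary code gives the right answer with no checks.
zero_filled = [0.0 if e == MAPL_UNDEF else e for e in emissions]
zero_filled_total = sum(zero_filled)

print(unprotected_total)  # roughly 1e15, physically meaningless
print(protected_total)    # 4.0
print(zero_filled_total)  # 4.0
```

With zero-filled input the check is needed once, at preprocessing time, rather than in every loop that touches the array.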

@mpagowski
Author

mpagowski commented Aug 21, 2023 via email

@bena-nasa
Contributor

bena-nasa commented Aug 21, 2023

Yes, sorry, but by far the simplest (and really the only) solution would be to take the existing files and make new versions that have the "undef" points replaced with 0, and just use those. That should be easy enough to do. GOCART simply gets arrays that represent emissions. What would it even mean to have "undefined" emissions? Either a cell has something or it doesn't, in which case it's 0; i.e., no emissions seems perfectly logical. We are doing floating point math on arrays, so the code needs valid, full arrays. I'm not sure how any code could use files that have missing values unless it had a special accommodation for that at the Fortran or C array level, which would be bad for performance and vectorization.

@mpagowski
Author

mpagowski commented Aug 21, 2023 via email

@bbakernoaa

@mpagowski That is partially true. Yes, NESDIS controls the creation of the native files, but we can have a preprocessor in the workflow to "fix" the files.

@mpagowski
Author

mpagowski commented Aug 21, 2023 via email

@bena-nasa
Contributor

bena-nasa commented Aug 22, 2023

Ok, sounds like you can fix this in your workflow. That is, there is a spot in the workflow where you can take the file(s) as produced by NESDIS and use some sort of utility to make new file(s) from the originals with missing values replaced by 0; those are then the files that are fed to GOCART. What is the "problem with converting all "missing" to 0s"? Are you asking how one can do this?

@mpagowski
Author

mpagowski commented Aug 22, 2023 via email
