Questions about the Extdata.rc #346

Closed
4 tasks done
helpyuan opened this issue Sep 28, 2023 · 9 comments

@helpyuan

helpyuan commented Sep 28, 2023

Name and Institution (Required)

Name: liumy
Institution: AMS

Confirm you have reviewed the following documentation

Description of your issue or question

Hi all, I have the following questions about ExtData.rc:
1. About Clim.
The GCHP manual (https://gchp.readthedocs.io/en/latest/user-guide/config-files/ExtData_rc.html) describes Clim as follows: "Enter Y if the file is a 12 month climatology, otherwise enter N. If you specify it is a climatology ExtData the data can be on either one file or 12 files if they are templated appropriately with one per month." My understanding is that Clim must be either N or Y. However, in the ExtData.rc.fullchem template I found entries such as these:
[Screenshots of ExtData.rc.fullchem entries; images not preserved]
The above figures show Clim values such as D and 2019. What do D and 2019 represent here? Can I use N or Y instead of D and 2019?

2. About the Refresh.
There is an example in the GCHP manual that introduces Refresh, as follows:
For example, a template in the form %y4-%m2-%d2T12:00:00 will cause the variable to be updated at the start of a new day (i.e. when the clock hits 2007-08-02T00:00:00 it will update the variable but the time it will use for reading and interpolation is 2007-08-02T12:00:00).
What I don't understand is this: if Refresh is set to %y4-%m2-%d2T12:00:00, shouldn't the data be updated at 2007-08-02 12:00:00? Or would F%y4-%m2-%d2T12:00:00 update the data at 2007-08-02 12:00:00?
In addition, I have an emission source dataset that changes every 3 hours. Do I need to set my Refresh to %y4-%m2-%d2T%h2:00:00?
Also, I can see that the following is in Refresh format:
[Screenshot of an ExtData.rc entry; image not preserved]
Does F2010-%m2-01T00:00:00 mean that this data will only be updated in 2010? If the model runs until 2011, will this data not be updated?

3. Perhaps this is a fundamental question.
How can I tell whether a file counts as a 12-month climatology for the Clim column?

The above are some of my questions, and I am grateful for any answers.


@lizziel
Contributor

lizziel commented Sep 28, 2023

Hi @helpyuan, thanks for asking about this. ExtData.rc is indeed confusing. MAPL ExtData is going to be overhauled in MAPL 3, which we plan to include in GCHP v15.0 next year, and that update will bring a switch to a new YAML-format ExtData configuration file. We hope it will be easier to understand when that change happens. For now we are still using the old format, which has several rather hidden features. I will do my best to explain below.

Regarding the "D" entry for the climatology column, this means day of week. It is a special case of values that recur in time throughout the year. The file contains 7 values per month, one for each day of the week in the month. It was implemented specifically for GEOS-Chem NEI scale factors. You should only use "D" if you have a day-of-week file.

Regarding the year entry for the climatology column, this is a feature of MAPL that we typically do not use but that snuck in with an update. If you replace "Y" with the year then MAPL will always use that year as climatology for all years. It can be redundant depending on how you use it. There is actually an update going into the next GCHP version to change how it is specified for AEIC19 (the example you posted) to be more clear, and to fix the typo in those lines (%d2 should be 01). It is more consistent to use "Y" for clim and then put the year in the refresh template (F2019-%m2-01). I recommend doing this if you add any new climatology inputs.
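
The Clim values discussed above can be summarized in a small Python sketch that splits an ExtData.rc line into columns and describes its Clim entry. This is only an illustration of the explanation in this thread, not MAPL's parser. The class, the function names, and every column name other than Clim and Refresh are assumptions based on the usual ExtData.rc column order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtDataEntry:
    # Column names are assumptions for illustration; only "Clim" and
    # "Refresh" are named in this thread.
    name: str
    units: str
    clim: str
    conservative: str
    refresh: str
    offset: str
    scale: str
    variable: str
    file_template: str
    reference: Optional[str] = None  # optional trailing entry on some lines


def parse_entry(line: str) -> ExtDataEntry:
    """Split one whitespace-separated ExtData.rc line into its columns."""
    fields = line.split()
    if len(fields) not in (9, 10):
        raise ValueError(f"expected 9 or 10 columns, got {len(fields)}")
    return ExtDataEntry(*fields)


def describe_clim(clim: str) -> str:
    """Describe a Clim value as explained in this thread."""
    if clim == "Y":
        return "12-month climatology"
    if clim == "N":
        return "not a climatology"
    if clim == "D":
        return "day-of-week values"
    if len(clim) == 4 and clim.isdigit():
        return f"use year {clim} for all simulation years"
    raise ValueError(f"unrecognized Clim value: {clim!r}")


edgar = parse_entry(
    "POW 1 Y Y F2010-%m2-01T00:00:00 none none POW "
    "./HcoDir/EDGARv43/v2016-11/EDGAR_v43.Seasonal.1x1.nc"
)
print(edgar.clim, "->", describe_clim(edgar.clim))
```

Running the same helper on a line whose third column is `2019` or `D` returns the corresponding description, which can be a quick sanity check before a run.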

Regarding Refresh, I am not sure that example is correct. It appears to have been copied from the GEOS wiki page for ExtData. If you have 3-hourly data I suggest copying what is done for the 3-hourly meteorology fields. For example, see here for reading 3-hourly data that is stored in daily files and the first time in the file is 00:00:00. It gets more complicated if the times are offset from 00:00:00. An example of reading 3-hourly fields whose times start at 01:30:00 rather than 00:00:00 is here. There is an extra entry at the end of the line that specifies a reference start time (with a 1 hr 30 min offset) and the frequency (3 hr). For that example each file has only one time, as shown in the filename template.
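
The reference-time-plus-frequency idea can be sketched as plain date arithmetic in Python. This is a minimal sketch, not the ExtData.rc syntax or MAPL's logic: the function name and the reference date 2019-01-01T01:30:00 are illustrative assumptions. It only shows which data times a 01:30:00 offset and a 3-hour frequency imply for a given day.

```python
from datetime import datetime, timedelta


def times_in_day(reference: datetime, frequency: timedelta, day: datetime) -> list:
    """List the data times on `day` implied by a reference time and a frequency."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    # Smallest whole number of steps from `reference` that lands at or after
    # the start of the day (ceiling division written with floor division).
    steps = -((reference - start) // frequency)
    current = reference + steps * frequency
    times = []
    while current < end:
        times.append(current)
        current += frequency
    return times


# Hypothetical reference time with a 1 hr 30 min offset, 3-hourly data.
reference = datetime(2019, 1, 1, 1, 30)
for t in times_in_day(reference, timedelta(hours=3), datetime(2019, 7, 1)):
    print(t.isoformat())
```

For this input the loop prints eight times, from 01:30:00 to 22:30:00, which is the sequence a file-per-time dataset with that offset would need to cover.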

Regarding Refresh F2010-%m2-01T00:00:00, that's a good question and worth checking. In my answer above I said this will always use the 2010 data for all years, but your question makes me wonder if that is really what is happening. We should check this.

I'm not sure I understand your question #3. If the Clim column specifies Y then it assumes 12-month climatology. Are you asking about the file itself, or how ExtData reads and uses the data?

@sdeastham
Contributor

Hi @lizziel - one note: I think that using the target year in "clim" is preferable to using "Y", as I think that "Y" implies (requires?) that the data are monthly only. I'm not 100% sure on that but wanted to give you a heads up (for example, if true this would cause a problem for something like the daily AEIC 2019 data).

@lizziel
Contributor

lizziel commented Sep 28, 2023

Thanks @sdeastham. Yes, my understanding is Y means monthly only. The daily AEIC should not use Y. I checked my 14.2.1 fixes for AEIC and I left the daily as this (it was the monthly that needed fixing, and only for performance so as not to try to refresh daily):
AEIC19_DAILY_NO kg/m2/s 2019 Y F%y4-%m2-%d2T00:00:00 none none NO ./HcoDir/AEIC2019/v2022-03/2019/%m2/AEIC_2019%m2%d2.0.5x0.625.36L.nc

Regarding monthly climatology, do you know if it is problematic to hard-code year in the refresh template, e.g. what we do for EDGAR scale factors?
POW 1 Y Y F2010-%m2-01T00:00:00 none none POW ./HcoDir/EDGARv43/v2016-11/EDGAR_v43.Seasonal.1x1.nc

@sdeastham
Contributor

I think hard-coding the year in the refresh template is fine - I'm not aware of any issues with that!
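
To make the hard-coded year concrete, here is a minimal Python sketch of the token substitution for `F2010-%m2-01T00:00:00`. It assumes the tokens `%y4`, `%m2`, `%d2`, and `%h2` are simply replaced by zero-padded values of the model clock, and it strips the leading `F` without modeling what that flag does. The function name is hypothetical, and the sketch illustrates only the substitution, since MAPL's actual behavior was flagged above as worth checking.

```python
from datetime import datetime


def expand_refresh(template: str, clock: datetime) -> str:
    """Substitute date tokens in a Refresh template for a given model time."""
    # Drop the optional leading "F" flag; its semantics are not modeled here.
    if template.startswith("F"):
        template = template[1:]
    tokens = {
        "%y4": f"{clock.year:04d}",
        "%m2": f"{clock.month:02d}",
        "%d2": f"{clock.day:02d}",
        "%h2": f"{clock.hour:02d}",
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template


template = "F2010-%m2-01T00:00:00"
for clock in (datetime(2010, 3, 15), datetime(2011, 3, 15), datetime(2011, 4, 2)):
    print(clock.date(), "->", expand_refresh(template, clock))
```

Under this assumption the expanded time always falls in 2010 but still follows the simulation month, so a run that continues into 2011 would keep cycling through the twelve 2010 months.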

@lizziel lizziel self-assigned this Sep 29, 2023
@lizziel
Contributor

lizziel commented Sep 29, 2023

@helpyuan, I updated the ExtData.rc page on ReadTheDocs to be more clear. In doing so I figured out the meaning of the GEOS wiki example for Refresh that was puzzling you. The updated page is here, including a better explanation of the Refresh entry.
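
The manual passage quoted in the original question can be read as "update whenever the expanded template changes". The Python sketch below illustrates that reading only. It is an assumption about the behavior, not MAPL code, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def expand(template: str, clock: datetime) -> str:
    """Replace the date tokens used in this thread with zero-padded values."""
    return (
        template.replace("%y4", f"{clock.year:04d}")
        .replace("%m2", f"{clock.month:02d}")
        .replace("%d2", f"{clock.day:02d}")
        .replace("%h2", f"{clock.hour:02d}")
    )


def refresh_events(template: str, start: datetime, end: datetime, step: timedelta) -> list:
    """Return (clock, data_time) pairs for each change of the expanded template."""
    events = []
    previous = None
    clock = start
    while clock <= end:
        current = expand(template, clock)
        if current != previous:
            # The expanded string changed, so the variable would be updated
            # now, using `current` as the time to read and interpolate to.
            events.append((clock, current))
            previous = current
        clock += step
    return events


events = refresh_events(
    "%y4-%m2-%d2T12:00:00",
    start=datetime(2007, 8, 1, 0),
    end=datetime(2007, 8, 2, 6),
    step=timedelta(hours=1),
)
for clock, data_time in events:
    print(clock.isoformat(), "reads", data_time)
```

With the fixed `T12:00:00` the expanded string changes only when the day changes, so the update happens at 00:00:00 while the read time is 12:00:00. Swapping in `%h2` for the hour makes the string change every hour instead.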

@helpyuan
Author

helpyuan commented Oct 6, 2023

Thank you very much for your answer @lizziel, which deepened my understanding of ExtData.rc. I will study the examples you mentioned carefully.

"I'm not sure I understand your question #3. If the Clim column specifies Y then it assumes 12-month climatology. Are you asking about the file itself, or how ExtData reads and uses the data?"
I think I have the answer from your conversation with @sdeastham. If my data has 12 monthly values, for example with dimensions 12x180x360, then Clim should be Y, and in other cases it should be N or D. Is that right?

@helpyuan
Author

helpyuan commented Oct 7, 2023

Assume GCHP simulates the full year 2019.

Suppose I have a weekly scale factor file with the following contents:
weekly
values are 0,1,2,3,4,5,6.
Should I write the ExtData.rc entry like this?
WEEKLY 1 D Y F2006-01-%d2T00:00:00 none none weekly_scale_factors weekly_scale_factors.nc

And suppose I have a diurnal scale factor file with the following contents:
diurnal
values are 0,1,2,3,...,21,22,23.
Should I write the ExtData.rc entry like this?
DIURNAL 1 N Y F2006-01-01T%h2:00:00 none none diurnal_scale_factors diurnal_scale_factors.nc

In addition, if the diurnal scale factor entry is written like this:
DIURNAL 1 N Y F%y4-%m2-%d2T%h2:00:00 none none diurnal_scale_factors diurnal_scale_factors.nc
can the 2019 simulation still use this scale factor from 2006?

@lizziel
Contributor

lizziel commented Nov 2, 2023

Hi @helpyuan, we currently only use a day-of-week scale factor for NEI99, and the file containing it has different scale factors for every month for an entire year. This is slightly different from your example. I suggest configuring ExtData.rc and running it, then inspecting the results to see whether it does what you expect. You can enable maximum prints for ExtData by following the instructions on ReadTheDocs to get very detailed log information in allPEs.log.

For diurnal scale factors, see the EDGAR example under the title "Diurnal scale factor" in ExtData.rc. I believe it matches what you wrote above, but double-check that. As for whether the 2019 simulation can use the 2006 data, the answer is yes. However, it is best to configure it and try it. If needed, you could hard-code the year.

@helpyuan
Author

helpyuan commented Nov 6, 2023

Thank you for clearing up my confusion, @lizziel! My question has been basically resolved, and I think this issue can be closed.

@helpyuan helpyuan closed this as completed Nov 6, 2023