Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial research #5

Closed
nikola-rados opened this issue Jul 10, 2020 · 20 comments
Closed

Initial research #5

nikola-rados opened this issue Jul 10, 2020 · 20 comments
Assignees
Labels
process Create a WPS process research Initial exploration
Milestone

Comments

@nikola-rados
Copy link
Contributor

We need to figure out how to break down the RVIC tool into usable wps processes. Explore the code and try to come to up with a few pathways for us to follow. Once we have sorted some of our ideas out we can create new issues and begin work.

Please post any notes/thoughts/discussion in this issue thread.

@nikola-rados nikola-rados added process Create a WPS process research Initial exploration labels Jul 10, 2020
@nikola-rados nikola-rados added this to the Initial Setup milestone Jul 10, 2020
@corviday
Copy link

corviday commented Jul 10, 2020

In order to generate routed streamflow, RVIC has to be run twice.

The first run is the "parameters" run - it looks at the watershed uphill from the point you care about and calculates how long it takes rain that falls on each grid square to reach the one you care about.

The second run is the "convolution" run - it starts with the equations created in the parameters run, and applies them to rainfall data to calculate the amount of water passing through the grid cell you're asking about at each time point.

You could probably make those two runs separate processes. The parameter process usually takes about ten seconds. The convolution process usually takes 10-20 minutes (it can be stopped and restarted, but I've never done that and don't know a huge amount about it). So splitting them up doesn't divide the work evenly.


RVIC takes a large number of arguments in the form of a configuration file. (There's a separate one for each of the two processes). You could also have a process to generate the configuration files, or a separate process to generate each one.

RVIC theoretically can be called from a python script and passed a dictionary of values, instead of gettng all its arguments from config files, but this isn't well documented and we've never done it here at PCIC. Might be the most convenient approach for this project, though.

@eyvorchuk
Copy link
Contributor

What does the convert run do?

@nikola-rados
Copy link
Contributor Author

Thanks for that rundown @corviday!

You could probably make those two runs separate processes.

Is there ever a time where you would only want to run just one of these processes? Or is it always one then the other to produce the ouptut you want? If the latter is the case I think it would make more sense to have it a single process (at least to start).

You could also have a process to generate the configuration files, or a separate process to generate each one.

This seems like a pretty good idea. Is there code in RVIC that already does this? Or would it be from scratch?

RVIC theoretically can be called from a python script and passed a dictionary of values

You are right in saying that this is probably the best approach for the project. I think taking in a collection of kwargs then either building a config or dict to pass into RVIC would be a good approach. Although, you also said it was a large number of args so perhaps that would be messy.

@corviday
Copy link

corviday commented Jul 10, 2020

What does the convert run do?

I think it converts data from some other format into a format that can be used by RVIC. I don't know much about it, I've never needed to use it.

Is there code in RVIC that already does this [generate config files]? Or would it be from scratch?

It would be from scratch. The config files are just a list of argument: value pairs for all the arguments to the RVIC program. Most of the arguments are the same for all PCIC runs.

Is there ever a time where you would only want to run just one of these processes? Or is it always one then the other to produce the output you want? If the latter is the case I think it would make more sense to have it a single process (at least to start).

Every convolution run to output routed streamflow data needs a parameter run.

I think if you want to generate routed streamflow for the same point, but different climate model outputs - to compare CanESM2 and ACCESS1-0 or something - they could use the same parameter run. The elevations and soil types and all those things that go into the parameter run don't change over time or between different climate models. (On the other hand, it is pretty quick to run parameter, I don't know that there's really a good case for saving its results.)

@sum1lim
Copy link
Contributor

sum1lim commented Jul 14, 2020

RVIC takes a large number of arguments in the form of a configuration file. (There's a separate one for each of the two processes). You could also have a process to generate the configuration files, or a separate process to generate each one.

It looks like we do not necessarily have to write a process to write a .cfg file(looks kind of redundant to me). Instead, both convolution_init and gen_uh_init can take dictionary input.

So, we can write a process that allows users to input both:

  1. an already written config file(this case we only have the call the function)
  2. string inputs that need to be generated into a dictionary.

@sum1lim
Copy link
Contributor

sum1lim commented Jul 14, 2020

I believe this issue has not been resolved in the most recent version of RVIC. This might be problematic in our future processes. Correct me if I am wrong.

@corviday
Copy link

You are correct; that issue has not been resolved, as far as I know. A fix I used is here.

@sum1lim
Copy link
Contributor

sum1lim commented Jul 14, 2020

Also, I got this error:

slim@devel2:~/Desktop/hydro_data$ rvic parameters sample_parameter_config.cfg
-->> Could not load xarray!! <<--
INFO:init_logger>> -------------------- INITIALIZED RVIC LOG ------------------
INFO:init_logger>> LOG LEVEL: DEBUG
INFO:init_logger>> Logging To Console: True
INFO:init_logger>> LOG FILE: ./routing/sample/logs/RVIC-20200714-171433.log
INFO:init_logger>> ----------------------------------------------------------

INFO:gen_uh_init>> plots directory is ./routing/sample/plots
INFO:gen_uh_init>> logs directory is ./routing/sample/logs
INFO:gen_uh_init>> params directory is ./routing/sample/params
INFO:gen_uh_init>> inputs directory is ./routing/sample/inputs
INFO:gen_uh_init>> aggregated directory is ./routing/sample/temp/aggregated
INFO:gen_uh_init>> remapped directory is ./routing/sample/temp/remapped
INFO:gen_uh_init>> Opened Pour Points File: ./routing/sample/inputs/sample_pour.txt
ERROR:gen_uh_init>> Error opening pour points file: ./routing/sample/inputs/sample_pour.txt
ERROR:gen_uh_init>> 'DataFrame' object has no attribute 'ix'
Traceback (most recent call last):
  File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 188, in gen_uh_init
    pour_points.ix[i, 'names'] = strip_invalid_char(name)
  File "/home/slim/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'ix'
ERROR:write>> Traceback (most recent call last):
ERROR:write>>   File "/home/slim/.local/bin/rvic", line 61, in <module>
ERROR:write>> main()
ERROR:write>>   File "/home/slim/.local/bin/rvic", line 43, in main
ERROR:write>> parameters(args.config_file, numofproc=args.numofproc)
ERROR:write>>   File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 51, in parameters
ERROR:write>> outlets, config_dict, directories = gen_uh_init(config_file)
ERROR:write>>   File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 188, in gen_uh_init
ERROR:write>> pour_points.ix[i, 'names'] = strip_invalid_char(name)
ERROR:write>>   File "/home/slim/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
ERROR:write>> return object.__getattribute__(self, name)
ERROR:write>> AttributeError
ERROR:write>> :
ERROR:write>> 'DataFrame' object has no attribute 'ix'

I had to change this line from .ix[i, 'names'] to .loc[i, 'names]:

ERROR:write>>   File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 188, in gen_uh_init
ERROR:write>> pour_points.ix[i, 'names'] = strip_invalid_char(name)

@eyvorchuk
Copy link
Contributor

eyvorchuk commented Jul 14, 2020

@sum1lim Where'd you get that config file? It's not in samples/configs.

@nikola-rados
Copy link
Contributor Author

After doing some reading, I think the rvic tool breaks down fairly nicely into 3 different components:

  1. parameters: "Development of impulse response functions using input datasets such as a flow direction grid, outlet locations, etc."
  2. convolution: "This utility generates streamflows by convolving the impulse response function developed in the previous setp with runoff fluxes from a land model (e.g. VIC)."1
  3. convert: "A simple conversion utility to provide users with the ability to convert old routing model setups into RVIC parameters."

I would suggest we create an issue/process for each of these items.

Each of these components are called with a config file in the main script which gives us a good starting point in terms of wps inputs. The outputs are a little more difficult. The modules don't actually directly return anything, so we'll have to dig around in workdir to get the files we are interested in (nothing we haven't done before).

There looks to be lots of samples that will likely help us with development. Furthermore, we will likely be able to use some of those files for testing.

I have yet to test out the scripts, but that is my next step. Once we have all had the chance to successfully run them I think we can get to work.

@sum1lim
Copy link
Contributor

sum1lim commented Jul 14, 2020

Just my thought from running rvic. I think it would be a lot nicer if we make our own input checking process. For example, if a user mixes up between the input files such as DOMAIN/FILE_NAME and ROUTING/FILE_NAME, this is what error messages look like:

slim@devel2:~/Desktop/hydro_data$ rvic parameters sample_parameter_config.cfg
-->> Could not load xarray!! <<--
INFO:init_logger>> -------------------- INITIALIZED RVIC LOG ------------------
INFO:init_logger>> LOG LEVEL: DEBUG
INFO:init_logger>> Logging To Console: True
INFO:init_logger>> LOG FILE: ./routing/sample/logs/RVIC-20200714-172724.log
INFO:init_logger>> ----------------------------------------------------------

INFO:gen_uh_init>> plots directory is ./routing/sample/plots
INFO:gen_uh_init>> logs directory is ./routing/sample/logs
INFO:gen_uh_init>> params directory is ./routing/sample/params
INFO:gen_uh_init>> inputs directory is ./routing/sample/inputs
INFO:gen_uh_init>> aggregated directory is ./routing/sample/temp/aggregated
INFO:gen_uh_init>> remapped directory is ./routing/sample/temp/remapped
INFO:gen_uh_init>> Opened Pour Points File: ./routing/sample/inputs/sample_pour.txt
INFO:gen_uh_init>> Opened UHbox File: ./routing/sample/inputs/uhbox.csv
DEBUG:read_netcdf>> Reading input data variables: ['lon', 'lat', 'frac', 'area', 'mask'], from file: ./routing/sample/inputs/sample_routing_domain.nc
ERROR:gen_uh_init>> Error opening FDR file
Traceback (most recent call last):
  File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 227, in gen_uh_init
    fdr_shape = fdr_data[fdr_var].shape
KeyError: 'Flow_Direction'
ERROR:write>> Traceback (most recent call last):
ERROR:write>>   File "/home/slim/.local/bin/rvic", line 61, in <module>
ERROR:write>> main()
ERROR:write>>   File "/home/slim/.local/bin/rvic", line 43, in main
ERROR:write>> parameters(args.config_file, numofproc=args.numofproc)
ERROR:write>>   File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 51, in parameters
ERROR:write>> outlets, config_dict, directories = gen_uh_init(config_file)
ERROR:write>>   File "/home/slim/.local/lib/python3.7/site-packages/rvic/parameters.py", line 227, in gen_uh_init
ERROR:write>> fdr_shape = fdr_data[fdr_var].shape
ERROR:write>> KeyError
ERROR:write>> :
ERROR:write>> 'Flow_Direction'

KeyError: 'Flow_Direction' does not really give meaningful information to users(Does this mean the file has or does not have the Key?). It would be better if there was a logging message saying "Invalid format: FILENAME does not have Flow_direction variable".

Since RVIC uses many input files, it is so confusing and easy to mix up between the input files. Therefore, a message that directs users to try another attempt with the inputs in the right order could help them reduce time on running this program. For example, if Flow_Direction is in the file that was given for DOMAIN/FILE_NAME, a message like "The file given to DOMAIN could possibly be a ROUTING file. Try another attempt by reordering the inputs." can be logged out.

@nikola-rados
Copy link
Contributor Author

@sum1lim I got that same ix error you did. Do you want to put an issue into the RVIC repo and then suggest your change in a PR to them?

@sum1lim
Copy link
Contributor

sum1lim commented Jul 14, 2020

Do you want to put an issue into the RVIC repo and then suggest your change in a PR to them?

@nikola-rados I can surely do that!

@eyvorchuk
Copy link
Contributor

eyvorchuk commented Jul 14, 2020

When I view the sample pour_points csv files through GitHub, it shows proper lons and lats columns, but I get

ERROR:gen_uh_init>> Pour Points File must include variables (lons, lats) or (x, y)

when trying to run rvic parameters on either of the sample parameter configuration files. Reading the rows from the csv file gives me this output

['version https://git-lfs.github.com/spec/v1']
['oid sha256:ce13e2314839c6c2a560db0381b32df6e09300f35a0f4591dc256e0690f9610e']
['size 518']

rather than what's shown in GitHub. These were the commands I ran:

>>> import csv
>>> with open('samples/pour_points/columbia_sample_pour_points.csv') as csv_file:
...     csv_reader = csv.reader(csv_file, delimiter=',')
...     for row in csv_reader:
...             print(row)
... 

Does anyone know why the output would be different?

@nikola-rados
Copy link
Contributor Author

Hmmm, I'm not sure. @corviday do you have any ideas?

@nikola-rados
Copy link
Contributor Author

@sum1lim and @eyvorchuk I have assigned issues to each of you. Once we clear up some of the running issues I will close this issue and you can begin working on the processes.

@sum1lim
Copy link
Contributor

sum1lim commented Jul 14, 2020

@eyvorchuk yeah, I have the same problem. The csv file is corrupted somehow. You probably want to just copy and paste the content you see in gitHub and make a new csv file.

@eyvorchuk
Copy link
Contributor

eyvorchuk commented Jul 14, 2020

I think it'd be good to put rvic in one of the requirements files (probably requirements_dev.txt) so that we can import the RVIC modules we need in our processes. Are there any other packages we should add?

@sum1lim
Copy link
Contributor

sum1lim commented Jul 20, 2020

RVIC theoretically can be called from a python script and passed a dictionary of values, instead of gettng all its arguments from config files, but this isn't well documented and we've never done it here at PCIC.

Unfortunately, I have found out that the ability to pass dictionary into RVIC has not been applied to the latest version 1.1.0.post1. We will probably have to wait until version 1.1.1 is released. Currently, we only can pass config files at the moment.

Do you want to put an issue into the RVIC repo and then suggest your change in a PR to them?

Also, my PR is still pending yet.

@nikola-rados
Copy link
Contributor Author

That's fine we can add that in later once they get around to it. Maybe you can @ the lead developer over there to get their attention. It looks like the repo hasn't been touched in a while so they may not have seen your request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
process Create a WPS process research Initial exploration
Projects
None yet
Development

No branches or pull requests

4 participants