Display of daily climatic data #3067

Closed
rdstern opened this issue May 12, 2017 · 60 comments

@rdstern
Collaborator

rdstern commented May 12, 2017

This is one of the last sets of features I am keen to see soon, and certainly in the first release. I think the only other is the crops dialogue, which should be easy once the start, end and summary (already specified) have been done.

In the old Instat there is a dialogue to display daily data in 2 different ways. I assume this might again be a single dialogue with our new buttons at the top to permit the results as either a Table or a Graph.
Also the main initial fields will again be the same as usual. Perhaps we need a column for the day of the month here? At least for the table. We need the day of the year for the graph.

I give the requirements below, but not (yet) the layout. I am happy to discuss the layout once we know who is doing what, and also when it might be done. I suspect the table part might depend on the progress with the main tabulation. The graphs would be great - and useful as soon as someone has the time.

1) As a "table".
a) This has the months across (columns) and the day of the month as the rows.
b) it is a trivial example of a table (or an unstack by the month factor).
c) It is "trivial" as a table because it is usually used to display single values, rather than a summary.
d) It may later become a summary when we deal better with within-day data, though it would usually have a "shifted day" rather than starting at midnight.
e) Ideally it could include a symbol for a range of values, or for a single special value (e.g. "." for zero), so rainfalls stand out.
f) This would display the data in the output window, as in the old Instat.
g) We would be able to choose the number of decimals to display.
h) It could be for up to 12 months and could start from any month, but the default would be 12 months starting from January.
i) It could have margins after each year (i.e. for each month as well as annual), including sum (for rainfall), mean (for temperature), count (from a certain threshold), min, max, and number of missing.
j) A non-trivial set of margins is the annual values. That's odd because we don't need the extra margin on the right! (Once we have margins in the table Danny says this is trivial.)
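
The table in 1) can be sketched in base R as an unstack of one year's daily values by month. This is only an illustration under assumptions: the data frame `daily`, its columns `date` and `rain`, and the helper `daily_table()` are all made up here, and none of it is the R-Instat method.

```r
# Hypothetical example data: one year of daily rainfall (values invented)
set.seed(1)
daily <- data.frame(date = seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day"))
daily$rain <- round(ifelse(runif(nrow(daily)) < 0.7, 0, rexp(nrow(daily), 0.1)), 1)
daily$rain[c(40, 41)] <- NA  # two missing days (9 and 10 February)

# Hypothetical helper: days of the month as rows, months as columns
daily_table <- function(data, element, decimals = 1, zero_code = ".", miss_code = "m") {
  mon <- as.integer(format(data$date, "%m"))
  day <- as.integer(format(data$date, "%d"))
  values <- data[[element]]
  # Format the values, then swap in the codes for zero and missing
  txt <- formatC(values, format = "f", digits = decimals)
  txt[!is.na(values) & values == 0] <- zero_code
  txt[is.na(values)] <- miss_code
  # Cells left empty are days that do not exist (e.g. 30 February)
  tab <- matrix("", nrow = 31, ncol = 12, dimnames = list(day = 1:31, month = month.abb))
  tab[cbind(day, mon)] <- txt
  noquote(tab)
}

print(daily_table(daily, "rain"))
```

The `decimals`, `zero_code` and `miss_code` arguments correspond to points g) and e); margins and a start month other than January are left out of this sketch.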

2) As a graph or set of graphs
a) The graphs were hardly used in the old Instat. They are "needle plots" (a sort of bar chart) for rainfall and possibly line charts for temperatures and other elements.
b) here they could be great, because of facets, i.e. many years on a single graph.
c) Default is for a graph per year.
d) Another alternative could be a graph per station, over all the years (usually filtered). This is very common in other software, but usually not so easy to interpret, because it confuses the seasonality with the trend.
e) ideally labels could be months or day of the year.
f) Ideally it would also be able to indicate missing values clearly.
g) David has the commands for much of this from his work for Guyana.

@rdstern
Collaborator Author

rdstern commented May 14, 2017

David prepared a set of graphs for daily rainfall of exactly the type we would like to see. This comment shows what he did. Then the next one gives ideas for what might be in R-Instat.
There are 3 elements, namely

  1. a bar geom for the rainfall.
  2. a rug plot to indicate missing values.
  3. a text geom for values that are outside the range of the graph.

Here are examples of the plots:

Daily Rainfall Plot.pdf

Here is the R code from David's plots which present this information for 4 stations:

library(ggplot2)

# Day of year for the first of each month (leap-year positions), plus a closing break
month_breaks <- c(1, 32, 61, 92, 122, 153, 183, 214, 245, 275, 306, 336, 367)

pdf("Daily Rainfall Plot.pdf", width = 28, height = 18)
for (station_name in c("BLAIRMONT", "TIMEHRI", "GEORGETOWN", "ANNAI")) {
  for (first_year in seq(1880, 2010, by = 10)) {
    # One page per station and decade
    subdata <- subset(allstations, station == station_name & year >= first_year & year < first_year + 10)
    if (nrow(subdata) == 0) next
    p <- ggplot(subdata, aes(x = doy, y = RR)) +
      geom_bar(stat = "identity", fill = "blue") +
      # Red rug marks along the bottom show days with missing rainfall
      geom_rug(data = subset(subdata, is.na(RR)), mapping = aes(x = doy), sides = "b", color = "red") +
      theme_minimal() +
      # Cut the display at 100 mm without dropping the larger values
      coord_cartesian(ylim = c(0, 100)) +
      scale_x_continuous(breaks = month_breaks, labels = c(month.abb, ""), limits = c(0, 367)) +
      facet_wrap(~year, ncol = 2) +
      ggtitle(paste0(station_name, " Daily Rainfall")) +
      theme(panel.grid.minor = element_blank(), plot.title = element_text(hjust = 0.5, size = 20), axis.title = element_text(size = 16)) +
      xlab("Day of the year") +
      ylab("Rain amount in mm")
    # Print the amount as text for days above the 100 mm limit
    above_limit <- subset(subdata, RR > 100)
    if (nrow(above_limit) != 0) {
      p <- p + geom_text(data = above_limit, mapping = aes(y = 100, label = RR), size = 3)
    }
    print(p)
  }
}
dev.off()

This assumes a single data frame, allstations, with columns station, year, doy and RR, in which the stations are named "BLAIRMONT", "TIMEHRI", "GEORGETOWN" and "ANNAI".
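
The x-axis breaks in the code match the day of the year on which each month starts in a leap year, with a closing break at 367. That suggests doy is on a 366-day scale, though this is not stated. A minimal sketch of how the breaks could be derived rather than typed in; the reference year 2000 is just an arbitrary leap year chosen for illustration.

```r
# First day of each month in a leap year (2000 is an arbitrary choice)
month_starts <- as.Date(sprintf("2000-%02d-01", 1:12))

# Day of year for each month start, plus a closing break after 31 December
month_breaks <- c(as.integer(format(month_starts, "%j")), 367L)
month_labels <- c(month.abb, "")

print(month_breaks)
```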

@dannyparsons
Contributor

Good to see David finally contributing some code to R-Instat!

Can we come up with a design for the dialog? Then it should be easy for someone to do with the code going into an instat object method.

@dannyparsons
Contributor

From Roger:
This now could be a relatively simple dialogue - at least for the graphs. And very exciting, partly because they really exploit ggplot well. Also this is a common next step after the inventory plot in the data analysis. It is sort of the equivalent of the one variable graphs for general data.

  1. This dialogue falls naturally in the prepare section. We then also have the possibility of repeating it (almost) under the Describe section, where we propose a separate menu for each type of element.
  2. It may (later) include tables, but we can start with the graphs. And whether the tables will be a separate dialogue or a button at the top (as for frequencies dialogues) can be decided later.
  3. There is a Station field (as in the other climatic dialogues). If completed, then the results are for just that station. If blank, the results are for all stations (possibly filtered).
  4. There is a Date field (as always!)
  5. There is a Year field (could be the shifted year, depending on the Define Climatic settings). If that is completed, then the graphs are just for that year. If blank, then for all years (which could again be filtered).
  6. There is an Element field. The default is rainfall, but it can be completed with any climatic element.
  7. There is an x-axis field. This is either DOY or Date.
  8. There are 2 Y-axis Limits fields, possible labels are Y-axis Lower and then Higher.
  9. The elements will have default geoms associated with them, but this can be changed, possibly in a "Daily Options" sub-dialogue. For rainfall it will be a bar; for temperatures, and most things other than rainfall, it will be a line.
  10. There will be a checkbox with label "Indicate Missing with Rugplot". Default is ticked for rainfall and un-ticked for other elements - because then the line will have a gap to indicate missing.
  11. Similarly a checkbox with "Indicate Values outside Y-range".
  12. David also wants an additional Multiple receiver, with a checkbox on whether it is visible or not. This would be used if multiple elements are plotted together. This could be max and min temperatures together. Or it could be rainfall data and estimated value from satellite. It could even be max and min temperature plus rainfall (despite Hadley Wickham not liking this sort of graph.)

@dannyparsons
Contributor

This looks good. How about the element receiver always being multiple? Then there is an option to either do all elements together on the graph or include "element" as a facet variable.

Although with multiple elements, how do we specify the geoms for each variable? In a list box on a sub dialog? Unless we restrict to one geom for all elements, but that wouldn't be very useful for rain and temperature together then.
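
One way to get "element" as a facet variable is to stack the chosen element columns into long format first. A small base R sketch, assuming made-up columns `tmax`, `tmin` and `rain`; the geom lookup at the end is only one possible answer to the question of a geom per element, not a decision.

```r
# Hypothetical wide data: one row per day, one column per element
wide <- data.frame(
  date = as.Date("2001-01-01") + 0:2,
  tmax = c(30.1, 31.4, NA),
  tmin = c(18.2, 17.9, 19.0),
  rain = c(0, 12.5, 0)
)
elements <- c("tmax", "tmin", "rain")

# Stack the element columns: one row per day and element
long <- data.frame(
  date = rep(wide$date, times = length(elements)),
  element = factor(rep(elements, each = nrow(wide)), levels = elements),
  value = unlist(wide[elements], use.names = FALSE)
)

# One possible geom lookup: bars for rainfall, lines for everything else
default_geom <- ifelse(elements == "rain", "bar", "line")
names(default_geom) <- elements

print(long)
print(default_geom)
```

With the data in this shape, `element` can be used either as a facet variable or as a colour within one panel.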

@rdstern
Collaborator Author

rdstern commented Jul 24, 2017

I suggest as follows for the dialogue:

  1. Name change for the menu item. Delete climatic, so it is simply Climatic > Prepare > Display Daily Data.
  2. At the top there are our new buttons with Table and Graph (in that order) as the 2 options. (We don't have the Both option, because that would complicate the dialogue too much, at least for now.)
  3. The dialogue starts as do the other climatic dialogues, i.e. with our usual selector on the left. And a set of receivers on the right.
  4. There are then fields for single receivers, first is Station, second is Date, third is Year and 4th is Element.
  5. Now just for Table - so possibly in a group box, labelled Table.
  6. Decimals (if easy? We usually have significant figures). An up-down from 0 to 3 with default 1.
  7. A check box, default checked with label Use Code. Then the same small set of controls used for the Spells dialogue as the "Condition". (Though that needs some simple corrections first.)
  8. Then a drop-down with a set of codes. They include ".", "++", "--". Perhaps they allow the user to type a code, but then we need to limit the length displayed.
  9. Then a checkbox with the label Margin(s). Default unchecked, and it may be disabled initially. When checked it has further check-boxes with labels: Total, Mean, Max, Min, Missing.
  10. Two radio buttons with initial label "Order", then 2 options By Year or By Station. Default is By Station, i.e. all the years for the first station, then the second station, etc. Alternative is to have all the stations for the first year, etc.
  11. Possibly have an optional checkbox for HTML. (Alternative is just to unstack the data by month and then display.)
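
The margins in item 9 amount to one summary per month, placed under the daily rows. A rough sketch in base R, with invented data and a hypothetical `month_margins()` helper; the names Total, Mean, Max, Min and Missing simply follow the labels proposed above.

```r
# Hypothetical daily data for one station and year (values invented)
set.seed(2)
daily <- data.frame(date = seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day"))
daily$rain <- round(rexp(nrow(daily), 0.2), 1)
daily$rain[1:3] <- NA  # three missing days in January

# Hypothetical helper: one row per requested margin, one column per month
month_margins <- function(data, element,
                          margins = c("Total", "Mean", "Max", "Min", "Missing")) {
  month <- factor(month.abb[as.integer(format(data$date, "%m"))], levels = month.abb)
  funs <- list(
    Total   = function(x) sum(x, na.rm = TRUE),
    Mean    = function(x) mean(x, na.rm = TRUE),
    Max     = function(x) max(x, na.rm = TRUE),
    Min     = function(x) min(x, na.rm = TRUE),
    Missing = function(x) sum(is.na(x))
  )
  # sapply gives months in rows; transpose so months run across as in the table
  t(sapply(funs[margins], function(f) tapply(data[[element]], month, f)))
}

print(round(month_margins(daily, "rain"), 1))
```

The same helper could be applied per year and station in whichever order item 10 selects.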

@Lunalo
Contributor

Lunalo commented Aug 1, 2017

@africanmathsinitiative/developers

I will take on this

@dannyparsons
Contributor

Great. We should then decide on the implementation of the R code.

The graph will probably be its own method since it's a fairly specialised plot.

The table is a special case of producing a table from "summarised" data, because we could do the same thing from the data frame produced by the Column Summaries dialog. I'll work on having this general method and then we can discuss how to use it for this dialog.

@Lunalo after doing the dialog could you try creating the R method for doing the graph? It's a similar idea to the inventory plot in that we create an instat object method to do a specific ggplot graph.

@Lunalo
Contributor

Lunalo commented Aug 1, 2017

@dannyparsons

Great. We were trying to come up with the method yesterday with @stevenndungu. Let him push and then I will continue working on it.

@Lunalo
Contributor

Lunalo commented Aug 8, 2017

@dannyparsons

Do you know what RR is in the above code?

@rdstern
Collaborator Author

rdstern commented Aug 8, 2017

It is (in general) just the data column being plotted. In the code it was rainfall with the limit of 100mm. In the more general code it is whatever is plotted, and the 100mm would not be fixed, but would also be something that could be changed in the dialogue
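
A generalised version might take the column name and the upper limit as arguments. The following is a minimal sketch rather than the R-Instat method: `daily_layers()`, `daily_plot()` and their arguments are hypothetical, and the data frame is assumed to have `doy` and `year` columns as in David's code. The plotting part is only run if ggplot2 is installed.

```r
# Hypothetical helper: pick out the rows behind the rug and the text labels
daily_layers <- function(data, element, y_upper) {
  values <- data[[element]]
  list(
    missing = data[is.na(values), , drop = FALSE],                     # for the rug
    above   = data[!is.na(values) & values > y_upper, , drop = FALSE]  # for the labels
  )
}

# Hypothetical plotting function: element and y_upper replace RR and 100
daily_plot <- function(data, element, y_upper) {
  layers <- daily_layers(data, element, y_upper)
  p <- ggplot2::ggplot(data, ggplot2::aes(x = .data[["doy"]], y = .data[[element]])) +
    ggplot2::geom_col(fill = "blue") +
    ggplot2::geom_rug(data = layers$missing, sides = "b", colour = "red") +
    ggplot2::coord_cartesian(ylim = c(0, y_upper)) +
    ggplot2::facet_wrap(~year, ncol = 2)
  # Only add the text layer when something exceeds the limit
  if (nrow(layers$above) > 0) {
    p <- p + ggplot2::geom_text(
      data = layers$above,
      mapping = ggplot2::aes(y = y_upper, label = .data[[element]]),
      size = 3
    )
  }
  p
}

# Invented example: day 3 is missing, days 2 and 5 exceed 100 mm
obs <- data.frame(year = 2001, doy = 1:6, rain = c(0, 120, NA, 15, 101, 3))
layers <- daily_layers(obs, "rain", y_upper = 100)
print(layers$missing$doy)
print(layers$above$rain)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- daily_plot(obs, "rain", y_upper = 100)
}
```

Checking for values above the limit inside the function removes the need for the two near-identical ggplot calls in the original loop.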

@Lunalo
Contributor

Lunalo commented Aug 8, 2017

Great. So it is the Rain column?

@rdstern
Collaborator Author

rdstern commented Aug 8, 2017

In David’s example, yes. But in general (and in the dialogue) it could alternatively be tmax etc. It is to display the daily data of whatever element is needed.

@Lunalo
Contributor

Lunalo commented Aug 8, 2017

Thanks, I will look for a better name for the parameter

@dannyparsons
Contributor

element is what we've been using in other methods.

@dannyparsons
Contributor

@Lunalo where have you got to on this? Can you share the R code you've written for the graph if you need help with that? Is the table part implemented from the code I sent before?

@rdstern rdstern reopened this Mar 31, 2018
@shadrackkibet
Collaborator

@dannyparsons I was wondering whether adding summary_count_missing, which is already one of our summary functions, to the list of summaries in the display_daily_table function would fix point 2 above. Please have a look at it.

@shadrackkibet
Collaborator

@dannyparsons ?

@dannyparsons
Contributor

Yes that's added and done. I also changed the default summary to "sum" only. The function doesn't allow no summaries so that seemed the sensible default.

@rdstern
Collaborator Author

rdstern commented Apr 10, 2018

There is a bug here, so this dialogue doesn't run.

Here is the message:

**Error running R command(s)

Error in eval(expr, pf) : attempt to apply non-function

The error occurred in attempting to run the following R command(s):

.temp_val <- capture.output(InstatDataObject$display_daily_table(data_name="Dodoma", climatic_element="Rain", date_col="Date", year_col="Year", Misscode="m", monstats=c(sum="sum")))

OK**

This was with the Dodoma data for rainfall. It did have a filter in operation to limit the number of years to those from 2004.

@dannyparsons
Contributor

dannyparsons commented Apr 12, 2018

Unfortunately I had forgotten to include the file containing the new display daily method into the compiled version and so this file is missing in the latest version, sorry about that mistake. It will be included in the next version.
If a workaround is useful, adding the attached file (unzipped) to this folder will fix the issue:
C:\Program Files (x86)\AMI\R-Instat 0.4.13\static\InstatObject\R\Backend_Components
DisplayDaily.zip

@rdstern rdstern added the bug label Apr 18, 2018
@rdstern
Collaborator Author

rdstern commented Apr 18, 2018

The dialogue now works. Thanks.

There is one oddity of [1] printed at the start of the 5 lines between each year.

But there is one bug (or more) when the option is added to include the missing values in the list of summaries.
1) A short-term fix is to disable this option for now.
2) It gives the wrong answer. For the example I used it gave 122.0 each and every month.
3) There should be no decimal anyway.
4) The layout is affected, so December is now separated from the first 11 months. (This could be simply because the label n_missing is too long. Replace it by n_miss.)
5) Once used (say use sum and miss) when the miss option is dropped it gives an error, as follows:

Error running R command(s)
Error in parse(text = paste("suppressWarnings(", monstats[st], "(dat[,loc],na.rm=TRUE))")) :
<text>:1:30: unexpected ','
The error occurred in attempting to run the following R command(s):
.temp_val <- capture.output(InstatDataObject$display_daily_table(data_name="Guinee2", climatic_element="Rain", date_col="Date", year_col="year", station_col="Station", Misscode="m", monstats=c(sum="sum", summary_count_missing="")))
OK

I guess it doesn't like the fact we keep the bit of the code in bold?

@rdstern rdstern reopened this Apr 18, 2018
@dannyparsons
Contributor

I've changed the label to n_miss. I can also replicate the error in 5, but I don't get the wrong answers in 2. Which data did you use for that?
3. This is how all the summaries are formatted in the function so we can't really change that.

@rdstern
Collaborator Author

rdstern commented Apr 18, 2018

Output from plot daily Koundara.docx

Here is the output I had. Look first at the second table in Word, which is how it looks in the output window. It is data from Guinee, for Koundara. I will send Danny a copy. It will soon be in the Instat library.

As an aside, when I copy (as RTF) and do the default paste into Word I get the first table that has kept the Courier font, but lost the new lines. The second table was the option to copy just the text. Then I changed back to Courier in Word.

@dannyparsons
Contributor

Fixed all apart from 3. This would take a bit more messing about with the code which I thought wasn't a priority at this time, but could still be done.

@rdstern rdstern modified the milestones: 0.4.14, 0.4.16 Apr 27, 2018
@rdstern
Collaborator Author

rdstern commented Apr 27, 2018

I am still not able to get this dialogue to work! I am starting to regret using the code from Helen - which you must be also.

  1. A minor point is that OK becomes enabled when clicking on the option to display the count of missing values.
  2. More major is the message below. I was analysing the data from Guinea, Koundara, where I have filtered for just Koundara and also for Year < 1983. The message I get is as follows:

Error running R command(s)
Error in parse(text = paste("suppressWarnings(", monstats[st], "(dat[,loc],na.rm=TRUE))")) :
<text>:1:30: unexpected ','
1: suppressWarnings( (dat[,loc],
^
The error occurred in attempting to run the following R command(s):
.temp_val <- capture.output(InstatDataObject$display_daily_table(data_name="Guinee2", climatic_element="Rain", date_col="Date1", year_col="year", station_col="Station", Misscode="m", monstats=c(sum="sum", summary_count_missing="")))
OK

The R-command was:
# Code generated by the dialog, Display Daily Data
InstatDataObject$display_daily_table(data_name="Guinee2", climatic_element="Rain", date_col="Date1", year_col="year", station_col="Station", Misscode="m", monstats=c(sum="sum", summary_count_missing=""))

The results displayed are then from the previous command that was run.

@rdstern rdstern reopened this Apr 27, 2018
@maxwellfundi maxwellfundi modified the milestones: 0.4.16, 0.4.17 May 14, 2018
@shadrackkibet
Collaborator

> More major is the message below. I was analysing the data from Guinea, Koundara, where I have filtered for just Koundara and also for Year < 1983. The message I get is as follows:

@rdstern I am unable to replicate this error. How did you get it?

@rdstern
Collaborator Author

rdstern commented May 23, 2018

I have now tried again with the same data file, and it crashed. Then I opened it again and tried, and it all worked fine. Confusing. Let's close for now if the other small issue is fixed, then wait and see whether the problem recurs.

@dannyparsons
Contributor

This was because summary_count_missing="" was added to the function incorrectly, which has now been corrected in the dialog.
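
The error message shows how the method builds each summary call: the function name from `monstats` is pasted in front of `(dat[,loc],na.rm=TRUE)` and the text is parsed. With an empty name there is nothing to call, which gives the parse error reported above. A small sketch reproducing this; the `nzchar()` guard at the end is only a suggestion for a defensive check and is an assumption, since the actual fix was made in the dialog.

```r
# Build the call text in the same way as shown in the error message
build_call <- function(fun_name) {
  paste("suppressWarnings(", fun_name, "(dat[,loc],na.rm=TRUE))")
}

# The argument generated by the dialog in the failing command
monstats <- c(sum = "sum", summary_count_missing = "")

# An empty function name leaves "(  (dat[,loc], ..." which is not valid R
parses <- vapply(monstats, function(f) {
  !inherits(try(parse(text = build_call(f)), silent = TRUE), "try-error")
}, logical(1))
print(parses)

# Suggested guard (an assumption, not the actual fix): drop empty names
monstats <- monstats[nzchar(monstats)]
print(names(monstats))
```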
