Implement HOGER homogenization #240
base: develop
Conversation
```python
# Postprocess all the data to get the main jumps for each wind turbine
d2 = d.copy()
d2["Class"] = discretize(d["Knot"], threshold=100)
```
The value of the threshold argument for the discretize function should be the same as the one used for the decision tree regressor. It was a mistake on my part. Thus,
d2["Class"] = discretize(d["Knot"], threshold=threshold)
would be more appropriate.
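For context, here is a minimal sketch of what an index-distance-based discretize could look like. This is a hypothetical stand-in, not the PR's actual implementation; in particular, the merge-short-runs behavior is an assumption based on the discussion below:

```python
import numpy as np

def discretize_sketch(values, threshold=100):
    """Hypothetical stand-in for the PR's `discretize`: start a new
    class only when the value changes AND the current class already
    spans at least `threshold` rows (indices); shorter runs are
    merged into the current class."""
    values = np.asarray(values)
    classes = np.empty(len(values), dtype=int)
    current, start = 0, 0
    classes[0] = current
    for i in range(1, len(values)):
        if values[i] != values[i - 1] and (i - start) >= threshold:
            current += 1  # far enough from the last knot: new class
            start = i
        classes[i] = current
    return classes
```

With `threshold=100`, two knot levels 150 rows apart become two distinct classes, while a change only 50 rows after the previous one does not open a new class.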
Thanks for catching this, fixed!
@rglope, I have a question: in writing the tests, I realized I wasn't clear on the units of threshold. In the discretize function, it defines the minimum distance between nodes to differentiate classes. Here the units are number of indices, right? It is also used as min_samples_split, which would mean the number of points for a leaf, right? I think I understand then, the two are consistent: if a leaf must have N points, then it stands to reason that two different knots should be N points apart. Do I have that right? Would you mind checking whether you agree with the tests I've added?
That's right. The threshold is the number of indices or registries in the SCADA. In the case of the discretize function, as you say, it sets the distance between the nodes. For the Decision Tree Regressor it is considered the same way to create a node (or knot, which is what I call them) and a leaf. The only thing is that to create a leaf, only half the number of indices is needed.
More information can be found here: https://scikit-learn.org/dev/modules/generated/sklearn.tree.DecisionTreeRegressor.html. The original R function only had a minimum length argument, so during the translation to Python it seemed a good proportion.
As I said, the minimum number of indices to create a leaf is `threshold // 2`, so maybe the argument for the discretize function should also be `threshold // 2`. That way it would follow the reasoning you wrote (and the one I was actually going for).
I'll try them as soon as I can! Finally, just an observation about the ccp_alpha argument, if it helps. It is a pruning parameter to reduce the complexity of the regression. In the example it was set to a specific value since it made the regression cleaner, but I don't think there is necessarily a default value that fits every case. More information can be found here: https://scikit-learn.org/1.5/auto_examples/tree/plot_cost_complexity_pruning.html
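Putting these comments together, a small sketch of how the threshold and ccp_alpha settings interact on a signal with a single offset jump. The synthetic data and the specific parameter values are assumptions for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, threshold = 600, 100
idx = np.arange(n).reshape(-1, 1)
# a 10-degree offset jump at index 300, plus mild noise
y = np.where(np.arange(n) < 300, 0.0, 10.0) + rng.normal(0.0, 0.5, n)

tree = DecisionTreeRegressor(
    min_samples_split=threshold,      # a split needs `threshold` registries
    min_samples_leaf=threshold // 2,  # a leaf needs half that, as discussed
    ccp_alpha=0.5,                    # pruning keeps only the clear jump
)
tree.fit(idx, y)
pred = tree.predict(idx)
# the pruned tree collapses to two plateaus, split near the true jump
jump_at = int(np.flatnonzero(np.diff(pred))[0]) + 1
```

Without the ccp_alpha pruning, the tree may add extra knots that only chase noise; with it, the fit reduces to the single real change point.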
Thanks for these responses, @rglope! I implemented the following change:
OK, tests pass and I confirmed the docs all build locally. Good to merge for me once a few reviews are in, thanks all!
I can't add a comment on 03_northing_calibration_hoger.ipynb, but there is an "apparant" which I think should be "apparent". Besides that detail, the tests work and check all the main features of the algorithm, and the example shows exactly what it does. Everything looks good on my end!
examples_artificial_data/01_raw_data_processing/01_northing_calibration.ipynb (outdated review thread, resolved)
All of the examples here use a wind direction that is steadily ramping (with a very small amount of noise added). How difficult/feasible would it be to move to a more "reasonable" wind direction signal (e.g., a Gaussian wind direction with a standard deviation of, say, 10 degrees and a non-zero mean to provide bias)? I think that could also help make the jump clearer.
It could also be interesting to compare the northing calibration procedure before and after the jump is detected by HOGER; i.e., demonstrate that the northing calibration doesn't work well if there is a jump in the bias.
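A sketch of the suggested signal; the mean direction, standard deviation, and jump magnitudes here are placeholder values, not anything prescribed by the PR:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
# Gaussian wind direction: non-zero mean, 10-degree standard deviation
wd_true = rng.normal(270.0, 10.0, n)
# northing bias that jumps mid-record, the kind of change HOGER targets
bias = np.where(np.arange(n) < n // 2, 3.0, 15.0)
wd_measured = (wd_true + bias) % 360.0
```

Because the true direction is stationary rather than ramping, the 12-degree step in the measured channel stands out clearly against the 10-degree natural spread.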
I can for sure change the yaw signal from ramping to fixed; that's no problem. Running the calibration procedure twice could be instructive, but at the cost of a much longer notebook, since it generates a lot of text and plots, so I'd vote against that step.
I think we should also add documentation describing the HOGER algorithm a bit (even just a paragraph would help a lot for new users).
Fixed!
You mean beyond what is in the docstring? This will appear in autodocs...
Implement HOGER homogenization
This PR implements the HOGER algorithm for homogenization of yaw/wind direction signals. It detects and removes changes in the calibration of these channels.
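To illustrate the idea (detect a change in the yaw/wind-direction offset and remove it), here is a self-contained sketch. This is not the actual API of northing_offset_change_hoger.py; the change-point scan and all variable names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 800
wd_ref = rng.normal(200.0, 10.0, n)              # trusted reference direction
offset = np.where(np.arange(n) < 500, 2.0, 9.0)  # calibration change at 500
wd_turbine = wd_ref + offset + rng.normal(0.0, 0.3, n)

# locate the change point by minimizing within-segment variance
resid = wd_turbine - wd_ref
cut = min(range(50, n - 50),
          key=lambda i: resid[:i].var() * i + resid[i:].var() * (n - i))
step = resid[cut:].mean() - resid[:cut].mean()

# homogenize: subtract the detected step from the later segment
wd_fixed = wd_turbine.copy()
wd_fixed[cut:] -= step
```

After the correction, the turbine channel carries a single constant offset again, which the usual northing calibration can then remove.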
HOGER was developed by Paul Poncet (@engie-paul-poncet) and Thomas Duc (@engie-thomas-duc) of Engie, and Rubén González-Lope (@rglope) and Alvaro Gonzalez Salcedo (@alvarogonzalezsalcedo) of CENER within the TWAIN project.
This PR contains a new module, northing_offset_change_hoger.py. A new example, 03_northing_calibration_hoger.ipynb, is also included. This new example will be included in the online documentation, as it shows the combined usage of different yaw/wd calibration modules.

Some notes on connections:
This functionality implements something akin to what is proposed in #148 (@ejsimley).
It could also be seen to address issue #106.
It also mirrors some features included in wind-up (@aclerc).
@Bartdoekemeijer, you'll see in the new example how the new module can fit within the current calibration workflow; I'll be curious for your opinions there.
@engie-paul-poncet, @engie-thomas-duc, @rglope, @alvarogonzalezsalcedo, @misi9170, @ejsimley, @Bartdoekemeijer, @aclerc: please use this PR to provide any feedback or comments.
Open Questions:
Todo
Related issue, if one exists
#148
#106