
Implement HOGER homogenization #240

Open · wants to merge 29 commits into develop

Conversation

@paulf81 (Collaborator) commented Nov 19, 2024

Implement HOGER homogenization

This PR implements the HOGER algorithm for homogenization of yaw/wind direction signals. It detects and removes changes in the calibration of these channels.

HOGER was developed by Paul Poncet (@engie-paul-poncet) and Thomas Duc (@engie-thomas-duc) of Engie, and Rubén González-Lope (@rglope) and Alvaro Gonzalez Salcedo (@alvarogonzalezsalcedo) of CENER within the TWAIN project.

This PR contains a new module northing_offset_change_hoger.py. Also a new example is included 03_northing_calibration_hoger.ipynb. This new example will be included in the online documentation as it shows the combined usage of different yaw/wd calibration modules.

Some notes on connections

This functionality implements something akin to what is proposed in #148 (@ejsimley )

It could also be seen as addressing Issue #106

It also mirrors some features included in wind-up (@aclerc)

@Bartdoekemeijer, you'll see in the new example how the new module can fit within the current calibration workflow; I'd be curious to hear your opinions there.

@engie-paul-poncet, @engie-thomas-duc, @rglope, @alvarogonzalezsalcedo, @misi9170, @ejsimley, @Bartdoekemeijer, @aclerc: please use this PR to provide any feedback or comments.

Open Questions:

Todo

  • Agree on defaults
  • Documentation and credit sufficient?
  • Make sure all new functions tested
  • Confirm example works as expected
  • Confirm documentation builds locally

Related issue, if one exists
#148
#106

@paulf81 paulf81 added the enhancement An improvement of an existing feature label Nov 19, 2024
@paulf81 paulf81 self-assigned this Nov 19, 2024

# Postprocess all the data to get the main jumps for each wind turbine
d2 = d.copy()
d2["Class"] = discretize(d["Knot"], threshold=100)

The value of the threshold argument for the discretize function should be the same as the one used for the decision tree regressor. It was a mistake on my part. Thus,

d2["Class"] = discretize(d["Knot"], threshold=threshold)

would be more appropriate.
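To illustrate why the two thresholds should agree, here is a minimal, hypothetical sketch of a discretize-style grouping (this is not the FLASC implementation; the function body is an assumption for illustration): knot positions closer together than `threshold` indices fall into the same class, so the grouping only matches the tree's granularity if both use the same value.

```python
def discretize(knots, threshold=100):
    """Group sorted knot indices into classes separated by at least `threshold`.

    Hypothetical sketch, not the FLASC implementation.
    """
    classes, current, prev = [], 0, None
    for k in sorted(knots):
        if prev is not None and k - prev >= threshold:
            current += 1  # far enough from the previous knot: start a new class
        classes.append(current)
        prev = k
    return classes

print(discretize([10, 15, 400, 420, 900], threshold=100))  # → [0, 0, 1, 1, 2]
```

With threshold=100, the knots at indices 10 and 15 merge into one class while 400 starts a new one; if the regressor were fit with a different threshold, the classes would no longer line up with the tree's leaves.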

Collaborator Author

Thanks for catching, fixed!

@paulf81 (Collaborator Author) commented Nov 21, 2024

@rglope, I have a question while writing the tests: I realized I wasn't clear on the units of threshold. In the discretize function, it defines the minimum distance between nodes needed to differentiate classes. Here the unit is number of indices, right? It is also used as min_samples_split, which would mean the number of points for a leaf, right? I think I understand then: the two are consistent. If a leaf must have N points, then it stands to reason that two different knots should be N points apart. Do I have that right? Would you mind checking whether you agree with the tests I've added?

@rglope commented Nov 21, 2024

> @rglope, I have a question while writing the tests: I realized I wasn't clear on the units of threshold. In the discretize function, it defines the minimum distance between nodes needed to differentiate classes. Here the unit is number of indices, right? It is also used as min_samples_split, which would mean the number of points for a leaf, right?

That's right. The threshold is the number of indices (or records) in the SCADA data. In the case of the discretize function, as you say, it sets the minimum distance between the nodes. For the Decision Tree Regressor, it is considered the same way for creating a node (or knot, which is what I call them) and a leaf. The only difference is that creating a leaf requires only half that number of indices.

regr = DecisionTreeRegressor(max_depth=max_depth, min_samples_split=threshold, min_samples_leaf=threshold // 2, ccp_alpha=ccp_alpha)

More information can be found here: https://scikit-learn.org/dev/modules/generated/sklearn.tree.DecisionTreeRegressor.html. The original R function only had a minimum length argument, so during the translation to Python this seemed a good proportion.

> I think I understand then: the two are consistent. If a leaf must have N points, then it stands to reason that two different knots should be N points apart. Do I have that right?

As I said, the minimum number of indices to create a leaf is threshold // 2, so maybe the argument for the discretize function should also be threshold // 2. That way it would follow the reasoning you describe (and the one I was actually going for).

> Would you mind checking whether you agree with the tests I've added?

I'll try them as soon as I can!

Finally, just an observation about the ccp_alpha argument, in case it helps. It is a pruning parameter that reduces the complexity of the regression. In the example it was set to a specific value since that made the regression cleaner, but I don't think there is necessarily a default value that works for every case. More information can be found here: https://scikit-learn.org/1.5/auto_examples/tree/plot_cost_complexity_pruning.html
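The parameter relationship described above can be sketched on synthetic data: a regression tree fit against the sample index places its splits near jumps in the signal, which is the knot-detection idea discussed here. All values below (step size, noise level, ccp_alpha) are illustrative, not FLASC defaults.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, jump_at = 1000, 600
# Step signal: constant before the jump, shifted by 8 deg after, plus noise
signal = np.where(np.arange(n) < jump_at, 0.0, 8.0) + rng.normal(0.0, 0.5, n)

threshold = 100  # minimum run length, in number of indices (illustrative)
regr = DecisionTreeRegressor(
    max_depth=2,
    min_samples_split=threshold,
    min_samples_leaf=threshold // 2,  # a leaf needs only half the indices
    ccp_alpha=0.01,                   # prunes weak splits that only chase noise
)
regr.fit(np.arange(n).reshape(-1, 1), signal)

# Split thresholds of internal nodes approximate the change-point locations
splits = regr.tree_.threshold[regr.tree_.threshold > 0]
print(splits)  # a value near 600
```

With the pruning in place, the surviving split thresholds mark candidate knots; without ccp_alpha, depth-2 trees tend to add extra splits that merely fit the noise.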

@paulf81 (Collaborator Author) commented Nov 22, 2024

Thanks for these responses @rglope ! I implemented the following change:

> As I said, the minimum number of indices to create a leaf is threshold // 2, so maybe the argument for the discretize function should also be threshold // 2. That way it would follow the reasoning you describe (and the one I was actually going for).

@paulf81 (Collaborator Author) commented Nov 22, 2024

OK, tests pass and I confirmed the docs all build locally. Good to merge from my side once a few reviews are in. Thanks, all!

@paulf81 paulf81 requested a review from rglope November 22, 2024 17:26
@rglope left a comment

I can't add a comment on 03_northing_calibration_hoger.ipynb, but there is an "apparant" which I think should be "apparent". Beyond that detail, the tests work and check all the main features of the algorithm, and the example shows exactly what it does. Everything looks good on my end!

flasc/data_processing/northing_offset_change_hoger.py Outdated Show resolved Hide resolved
@misi9170 (Collaborator) commented Nov 25, 2024

All of the examples here use a wind direction that ramps steadily (with a very small amount of noise added). How difficult/feasible would it be to move to a more "reasonable" wind direction signal (e.g., a Gaussian wind direction with a standard deviation of, say, 10 degrees and a non-zero mean to provide bias)? I think that could also help make the jump clearer.

It could also be interesting to compare the northing calibration procedure before and after the jump is detected by HOGER; i.e., demonstrate that the northing calibration doesn't work well if there is a jump in the bias
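The Gaussian signal suggested above could be generated along these lines (a sketch with illustrative values, not code from the example notebook):

```python
import numpy as np

rng = np.random.default_rng(42)
n, jump_at = 5000, 3000
# Gaussian wind direction: mean 270 deg, sd 10 deg (non-zero mean provides bias)
wd = rng.normal(loc=270.0, scale=10.0, size=n)
wd[jump_at:] += 15.0     # simulated northing offset change partway through
wd = np.mod(wd, 360.0)   # keep directions on [0, 360)
# mean ~270 deg before the jump, ~285 deg after
```

Against a roughly constant mean, the 15-degree offset change stands out as a clear shift in the running average, which is arguably easier to see than a jump superimposed on a ramp.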

Collaborator Author

I can for sure change the yaw from ramping to fixed; that's no problem. Running the calibration procedure twice could be instructive, but at the cost of a much longer notebook, since it generates a lot of text and plots, so I'd vote against that step.

@misi9170 (Collaborator) commented Nov 25, 2024

I think we should also add documentation describing the HOGER algorithm a bit (even just a paragraph would help a lot for new users).

@paulf81 (Collaborator Author) commented Nov 26, 2024

> apparant

Fixed!

@paulf81 (Collaborator Author) commented Nov 26, 2024

> I think we should also add documentation describing the HOGER algorithm a bit (even just a paragraph would help a lot for new users).

You mean beyond what is in the docstring? That will appear in the autodocs...
