Implements data transformations in TSA module #2121

Merged: 33 commits, Jul 10, 2023

j-bryan commented May 23, 2023


Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

#2120

What are the significant changes in functionality due to this change request?

A new "Transformer" class is added to the TSA module. It can apply any transformer class that follows the interface defined by the sklearn.base.TransformerMixin base class to any target variable or group of target variables. The way the residual and composite signals are assembled is changed slightly to allow non-additive signal composition, so that nonlinear transformation functions can be accommodated.
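
To make the interface concrete, here is a minimal sketch in plain Python of what a nonlinear transformer and a non-additive composition could look like. It only mimics the fit/transform/inverse_transform method names used by sklearn-style transformers. The class names LogTransformerSketch and MeanShiftSketch and the helpers decompose and compose are hypothetical and are not taken from the RAVEN source.

```python
import math


class LogTransformerSketch:
  """Hypothetical nonlinear transformer exposing fit/transform/inverse_transform."""

  def fit(self, X, y=None):
    # A plain log transform has nothing to learn from the data.
    return self

  def transform(self, X):
    return [math.log(x) for x in X]

  def inverse_transform(self, X):
    return [math.exp(x) for x in X]


class MeanShiftSketch:
  """Hypothetical linear step: removes the mean found during fit."""

  def fit(self, X, y=None):
    self.mean = sum(X) / len(X)
    return self

  def transform(self, X):
    return [x - self.mean for x in X]

  def inverse_transform(self, X):
    return [x + self.mean for x in X]


def decompose(signal, algorithms):
  """Fit and apply each algorithm in order; each one sees the previous output."""
  residual = list(signal)
  for algo in algorithms:
    residual = algo.fit(residual).transform(residual)
  return residual


def compose(residual, algorithms):
  """Rebuild the signal by undoing the algorithms in reverse order."""
  signal = list(residual)
  for algo in reversed(algorithms):
    signal = algo.inverse_transform(signal)
  return signal


original = [1.0, 10.0, 100.0]
chain = [LogTransformerSketch(), MeanShiftSketch()]
residual = decompose(original, chain)
restored = compose(residual, chain)

# Adding the pieces back (residual + mean) does not recover the signal,
# because the log step has to be undone with exp rather than by addition.
naiveSum = [r + chain[1].mean for r in residual]
```

Undoing each step in reverse order, rather than summing contributions, is what lets a nonlinear step such as the log sit in the same chain as additive ones.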


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added that sets the <internalParallel> node to True in its XML block.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added, is the analytic documentation updated/added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) has been changed, the associated documentation must be reviewed to ensure the text matches the example.

j-bryan commented May 23, 2023

@dylanjm and @GabrielSoto-INL, I'd love to get your input on how this is implemented.

j-bryan added 2 commits May 23, 2023 14:03
… adds the ability to use custom transformers from a separate python file
wangcj05 changed the title from "Implements data transformations in TSA module" to "WIP: Implements data transformations in TSA module" Jun 7, 2023
wangcj05 changed the title from "WIP: Implements data transformations in TSA module" to "[WIP]: Implements data transformations in TSA module" Jun 7, 2023
j-bryan requested a review from PaulTalbot-INL June 23, 2023 15:07
PaulTalbot-INL left a comment

A few minor initial changes to consider; I haven't carefully checked all the implementation, but I did notice a few things.

I'm still not convinced fit increases readability over characterize, but I'm not completely against it. I do have questions about what it means to be a generator, however.

ravenframework/SupervisedLearning/SyntheticHistory.py (outdated, resolved)
ravenframework/TSA/AlgorithmClassification.png (outdated, resolved)
@@ -192,24 +246,6 @@ def getParamsAsVars(self, params):
rlz[f'{baseWave}__{stat}'] = value
return rlz


def generate(self, params, pivot, settings):
Collaborator

why did this need renaming?

Collaborator Author

With Fourier being implemented as a TimeSeriesTransformer instead of a TimeSeriesGenerator, I wanted to avoid using the same generate() method name used in TimeSeriesGenerator. If we revert this to being a generator (and relax the restriction on generators being stochastic), I'll revert the name change.

Collaborator

Okay, help me understand that, and maybe talking this over in a meeting makes more sense. Fourier projects the signal from signal space to a Hilbert space, but the result is not a transformation of the data, but a characterization of the data, as it seems to me. Help me see how you're seeing Fourier as a transformer.

Collaborator Author

I see it as a transformer in the sense that you give it a signal and it returns a modified signal. It certainly characterizes as well, but I see the characterization of the signal as a separate thing from the operation of modifying the signal based on the results of that characterization. I'd certainly be open to discussing this and some of the broader design decisions in a meeting instead of through a GitHub PR review.
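
As a rough illustration of the distinction being discussed, the sketch below separates the two roles: fit() characterizes the signal (mean plus cosine and sine coefficients at one period), while transform() returns the signal with that fitted component removed. The class FourierDetrendSketch and its method signatures are hypothetical and are not the RAVEN Fourier implementation. It also assumes uniform sampling over a whole number of periods, so that a simple projection gives the coefficients.

```python
import math


class FourierDetrendSketch:
  """Hypothetical single-period Fourier step with separate fit and transform."""

  def __init__(self, period):
    self.period = period
    self.params = None

  def _basis(self, pivot):
    omega = 2.0 * math.pi / self.period
    return [(math.cos(omega * t), math.sin(omega * t)) for t in pivot]

  def fit(self, pivot, signal):
    # Characterization: project the signal onto the cosine and sine at this period.
    n = len(signal)
    basis = self._basis(pivot)
    self.params = {
      'mean': sum(signal) / n,
      'cos': 2.0 / n * sum(x * c for x, (c, _) in zip(signal, basis)),
      'sin': 2.0 / n * sum(x * s for x, (_, s) in zip(signal, basis)),
    }
    return self

  def _model(self, pivot):
    p = self.params
    return [p['mean'] + p['cos'] * c + p['sin'] * s for c, s in self._basis(pivot)]

  def transform(self, pivot, signal):
    # Transformation: return the signal with the fitted Fourier component removed.
    return [x - m for x, m in zip(signal, self._model(pivot))]

  def inverse_transform(self, pivot, residual):
    # Put the fitted component back onto a residual (or synthetic) signal.
    return [r + m for r, m in zip(residual, self._model(pivot))]


pivot = list(range(64))
leftover = [0.1 * math.cos(2.0 * math.pi * t / 8.0) for t in pivot]
signal = [3.0 + 2.0 * math.sin(2.0 * math.pi * t / 16.0) + r
          for t, r in zip(pivot, leftover)]

fourier = FourierDetrendSketch(period=16.0).fit(pivot, signal)
residual = fourier.transform(pivot, signal)
restored = fourier.inverse_transform(pivot, residual)
```

Here fourier.params holds the characterization, while residual is the modified signal handed to whatever algorithm comes next.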

ravenframework/TSA/README.md (outdated, resolved)
@@ -209,7 +191,8 @@ def generate(self, params, pivot, settings):
@ In, settings, dict, additional settings specific to algorithm
@ Out, synthetic, np.array(float), synthetic estimated model signal
"""

# FIXME This method isn't currently tested or used anywhere. Trying to call this method results
# in an error due to a mismatch of array shapes when calculating sigMatSynthetic. Remove this method?

ravenframework/TSA/Transformers/Normalizers.py (outdated, resolved)
######################################
# CONSTRUCTION #
######################################
def createZeroFilter(targets):
Collaborator

add docstrings

if printComment:
  print(f'checking {str(dtype)} {comment} | {value} != {expected}')

def checkFloat(comment, value, expected, tol=1e-10, update=True):
Collaborator

FWIW, I've really wanted to move to using pytest and standardize all these checks across RAVEN unit tests, but have never had the bandwidth to do it. It's not required (by any means) for this work, but if there's spare bandwidth and interest, it would improve the unit tests dramatically.
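
For reference, a check like the checkFloat helper in the hunk above could be expressed in pytest style roughly as follows. This is only a sketch: scaleSignal is a made-up function under test, and the tolerance simply reuses the tol=1e-10 default visible in the diff. pytest collects functions named test_* from test files and reports failing bare assert statements. It also offers pytest.approx for tolerance comparisons, which is left out here so the snippet runs with the standard library alone.

```python
import math


def scaleSignal(signal, factor):
  """Hypothetical function under test: multiply every sample by a factor."""
  return [factor * x for x in signal]


def test_scaleSignal():
  """pytest-style test: plain asserts, no hand-rolled pass/fail counters."""
  result = scaleSignal([0.1, 0.2], 3.0)
  expected = [0.3, 0.6]
  assert len(result) == len(expected)
  for value, target in zip(result, expected):
    # Same idea as checkFloat: compare within an absolute tolerance.
    assert math.isclose(value, target, rel_tol=0.0, abs_tol=1e-10)
```

Saved in a file named like test_*.py, this would be picked up by running pytest in that directory.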

j-bryan requested a review from PaulTalbot-INL July 6, 2023 18:13
PaulTalbot-INL left a comment

Only one discussion point, on the Venn diagram of algorithms. Thanks for making those changes.

ravenframework/TSA/AlgorithmClassification.png (outdated, resolved)
j-bryan requested a review from PaulTalbot-INL July 6, 2023 20:18
PaulTalbot-INL previously approved these changes Jul 6, 2023
PaulTalbot-INL commented

Can you convert this from WIP so the automated tests can run? I assume that's what's holding them back.

PaulTalbot-INL linked an issue Jul 6, 2023 that may be closed by this pull request
10 tasks
j-bryan changed the title from "[WIP]: Implements data transformations in TSA module" to "Implements data transformations in TSA module" Jul 6, 2023
@moosebuild

All jobs on b7d80db : invalidated by @wangcj05

Testing with raven library updates

@moosebuild

Job Mingw Test on b7d80db : invalidated by @wangcj05

PaulTalbot-INL left a comment

One comment to check.

@@ -6,6 +6,7 @@
type = OrderedCSV
output = 'Basic/chz.csv'
windows_gold = 'Basic/windowsChz.csv'
mac_gold = 'Basic/windowsChz.csv'
Collaborator

so windows and mac agree but linux differs now?

Collaborator Author

Yeah, seems that the difference is just a phase of one of the Fourier modes being pi on some OSes and -pi on others. Do you want me to change that file name now that that gold file is being used for the Mac test, too?
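
If the only difference is a phase of pi versus -pi, as described above, the two results describe the same sinusoid, since the phases differ by a full turn. The sketch below checks that numerically and shows one possible way to map the exact values pi and -pi onto a single branch, (-pi, pi]. The helper names wrapPhase and mode are hypothetical, and nothing like this is part of the PR.

```python
import math


def wrapPhase(phase):
  """Map a phase in radians onto the interval (-pi, pi]."""
  wrapped = math.remainder(phase, 2.0 * math.pi)  # lands in [-pi, pi]
  if wrapped <= -math.pi:
    wrapped += 2.0 * math.pi
  return wrapped


def mode(t, amplitude, period, phase):
  """Evaluate one Fourier mode at time t."""
  return amplitude * math.sin(2.0 * math.pi * t / period + phase)


times = [0.25 * i for i in range(40)]
withPlusPi = [mode(t, 1.5, 4.0, math.pi) for t in times]
withMinusPi = [mode(t, 1.5, 4.0, -math.pi) for t in times]

# Largest pointwise gap between the two versions of the same mode.
maxDifference = max(abs(a - b) for a, b in zip(withPlusPi, withMinusPi))
```

The signals agree to floating-point precision, but a text comparison of the fitted phase values would still see pi and -pi as different, which is consistent with needing separate gold files per OS.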

Collaborator

Nope, I'm fine with this.

PaulTalbot-INL merged commit 137d53f into idaholab:devel Jul 10, 2023
Development

Successfully merging this pull request may close these issues.

[TASK] Add data transformations to TSA module
4 participants