Enable user to specify measurement controlType #97

Closed
MatloaItumeleng opened this issue Jul 21, 2021 · 2 comments · Fixed by #107

@MatloaItumeleng
Collaborator

MatloaItumeleng commented Jul 21, 2021

Enable users to choose which aggregate (AggregatedTotal, AbsAggregatedTotal or HashCrc32) Atum should use for a given input column. This applies to the method "withAggregateColumns".

Use case: the input for an aggregate column generates controlType.HashCrc32, but the requirement is to match the metrics received in the control file from the source. The user therefore needs to choose which aggregate Atum should use for that aggregate column.
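To make the mismatch concrete, here is a minimal Scala sketch, assuming AggregatedTotal is a plain sum of the column values and AbsAggregatedTotal is the sum of their absolute values. The object and method names are hypothetical helpers, not Atum's API.

```scala
// Hypothetical helpers illustrating the two numeric control types (assumed semantics).
object ControlTotals {
  // Assumed AggregatedTotal semantics: plain sum of the values.
  def aggregatedTotal(values: Seq[BigDecimal]): BigDecimal = values.sum

  // Assumed AbsAggregatedTotal semantics: sum of the absolute values.
  def absAggregatedTotal(values: Seq[BigDecimal]): BigDecimal = values.map(_.abs).sum

  // True when the computed total equals the value reported in the source's control file.
  def matchesControlFile(computed: BigDecimal, reported: BigDecimal): Boolean =
    computed == reported
}
```

For the values 100, -40 and 15 the plain total is 75, while the absolute total is 155. If the source's control file reports the plain total, a measurement computed with the absolute total can never be reconciled against it.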

@dk1844
Collaborator

dk1844 commented Jul 28, 2021

Hi @MatloaItumeleng, thanks for submitting this idea. I understand the current behavior of the ControlMeasureBuilder does not suit you (in essence, it currently chooses controlType.absAggregatedTotal for numeric types and controlType.HashCrc32 for non-numeric types).

So what granularity of setup do you need? I understand that you want to choose the controlType, but is it sufficient to specify one common controlType for all columns in aggregateColumns to create the control counts? (There could be a slight problem here if a numeric aggregation were also applied to non-numeric columns.)

Or do you imagine needing an option to specify the controlType for every column in aggregateColumns individually?

// Edit: perhaps check the outline in commit f526eec, which shows what it might look like on the API. It's just an API change/suggestion, without implementation. Or do we want to go deeper and design a more elaborate class for the specific case (where each aggregateColumn has its own defined controlType)?
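The two granularity options above could be modelled roughly as follows. This is a hypothetical Scala sketch: AggType, AggStrategy, UseDefault, CommonType, PerColumn and StrategyResolver are illustrative names, not the API that was eventually merged. It keeps the current default (numeric columns get AbsAggregatedTotal, others HashCrc32), lets the caller override it for all columns or per column, and rejects a numeric total on a non-numeric column, which is the pitfall of the common-controlType option.

```scala
// Hypothetical model of the options discussed in this issue; not Atum's merged API.
sealed trait AggType
case object AggregatedTotal extends AggType
case object AbsAggregatedTotal extends AggType
case object HashCrc32 extends AggType

sealed trait AggStrategy
// Keep the current behavior: the control type is chosen by the column's data type.
case object UseDefault extends AggStrategy
// One control type shared by every column in aggregateColumns.
final case class CommonType(aggType: AggType) extends AggStrategy
// The "specific" case: an explicit control type per column; unlisted columns use the default.
final case class PerColumn(types: Map[String, AggType]) extends AggStrategy

object StrategyResolver {
  // Current default as described earlier in the thread.
  private def defaultFor(isNumeric: Boolean): AggType =
    if (isNumeric) AbsAggregatedTotal else HashCrc32

  private def isNumericTotal(t: AggType): Boolean = t != HashCrc32

  /** Resolves a control type per column; `columns` maps column name -> whether it is numeric. */
  def resolve(columns: Map[String, Boolean], strategy: AggStrategy): Either[String, Map[String, AggType]] = {
    val chosen: Map[String, AggType] = columns.map { case (name, isNumeric) =>
      val aggType = strategy match {
        case UseDefault       => defaultFor(isNumeric)
        case CommonType(t)    => t
        case PerColumn(types) => types.getOrElse(name, defaultFor(isNumeric))
      }
      name -> aggType
    }
    // Reject numeric totals on non-numeric columns.
    val invalid = chosen.collect { case (name, t) if !columns(name) && isNumericTotal(t) => name }
    if (invalid.isEmpty) Right(chosen)
    else Left(s"Numeric aggregation requested for non-numeric column(s): ${invalid.toSeq.sorted.mkString(", ")}")
  }
}
```

For example, resolving the columns `Map("amount" -> true, "id" -> false)` with `PerColumn(Map("amount" -> AggregatedTotal))` yields AggregatedTotal for `amount` and keeps HashCrc32 for `id`, whereas `CommonType(AggregatedTotal)` is rejected because `id` is not numeric.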

@MatloaItumeleng
Collaborator Author

Hi @dk1844, for this use case only one column is currently passed in aggregateColumns, and I'd like to select a single controlType (in this instance AggregatedTotal) to be used when I pass it to ControlMeasureBuilder.forDF(df).withAggregateColumns.

dk1844 added a commit that referenced this issue Aug 3, 2021
…o needs testing, cleanup & documentation update

MeasurementProcessor split into object/class to offer generic processing methods to be reusable.
dk1844 added a commit that referenced this issue Aug 4, 2021
dk1844 added a commit that referenced this issue Aug 4, 2021
…al only-default `cmBuilder.calculateMeasurement` removed
@dk1844 dk1844 linked a pull request Aug 5, 2021 that will close this issue
@dk1844 dk1844 added this to the 3.6.0 milestone Aug 5, 2021
@dk1844 dk1844 self-assigned this Aug 5, 2021
dk1844 added a commit that referenced this issue Aug 6, 2021
dk1844 added a commit that referenced this issue Aug 12, 2021
* #97 AggregateControlTypeStrategy suggested API for ControlMeasureBuilder usage

* #97 ControlMeasureBuilder.withAggregateColumn(s) implementations. Todo needs testing, cleanup & documentation update
MeasurementProcessor split into object/class to offer generic processing methods to be reusable.

* #97 ControlMeasureBuilder.withAggregateColumn(s) unit tests added (regression guard)

* #97 ControlMeasureBuilder.withAggregateColumn(s) in README.md, original only-default `cmBuilder.calculateMeasurement` removed
@dk1844 dk1844 mentioned this issue Aug 12, 2021
dk1844 added a commit that referenced this issue Aug 13, 2021
dk1844 added a commit that referenced this issue Aug 24, 2021
dk1844 added a commit that referenced this issue Aug 24, 2021
dk1844 added a commit that referenced this issue Aug 24, 2021
* #97 readme update (related to #97, too)

* #97 maven central version badge added
Zejnilovic added a commit that referenced this issue Apr 13, 2022
* #88: Add some files to configure the project (#89)
* #88: Add some files configure the project
* Git configuration
* Scalastyle support
* Ensured no Scalastyle errors
* Added CRLF for Windows *.bat and *.cmd files to .editorconfig
* Added Spark 3.1 build into the `build-all.sh` script
* Created Windows `build-all.cmd` script
* Upgrade from Spark 3.1.1 to 3.1.2 (fixes several issues of the previous version)
* `--no-transfer-progress` added to build.yml
* #97 Aggregate control type strategy (#107)
* #97 AggregateControlTypeStrategy suggested API for ControlMeasureBuilder usage
* #97 ControlMeasureBuilder.withAggregateColumn(s) implementations. Todo needs testing, cleanup & documentation update
MeasurementProcessor split into object/class to offer generic processing methods to be reusable.
* #97 ControlMeasureBuilder.withAggregateColumn(s) unit tests added (regression guard)
* #97 ControlMeasureBuilder.withAggregateColumn(s) in README.md, original only-default `cmBuilder.calculateMeasurement` removed
* [maven-release-plugin] prepare release v3.6.0
* [maven-release-plugin] prepare for next development iteration
* #97 readme update - ControlMeasureBuilder API (#110)
* #97 readme update (related to #97, too)
* #97 maven central version badge added
* Feature/113 info permissions config (#114)
* #113 atum info file permissions for hdfs loaded from `atum.hdfs.info.file.permissions` config value
  - tests use MiniDfsCluster to assert controlled correct behavior
  - test update (custom MiniDfsCluster with umask 000 allows max permissions)
  - HdfsFileUtils.DefaultFilePermissions is now publicly exposed; the user is expected to compose the default and the configured value on their own, e.g.:
`HdfsFileUtils.getInfoFilePermissionsFromConfig().getOrElse(HdfsFileUtils.DefaultFilePermissions)`
* #77 Fix parameter handling bug in CreateInfoFileToolCSV (#78)

* #121 sbt cross compilation
* #121 multiversion build (scala, spark, json4s).
* #121 hadoop3 used for Spark3/Scala2.12
* #121 sbt github autobuild
* #125 publish, pgp/gpg plugin added; sbt-sonatype howto referenced; model, parent prefixed by `atum-` to conform to the mvn publish, cleanup
* Upgrade dependencies and remove MiniDFSCluster
* GH Action fix
* Remove pom.xml files
* Add licence header and header check
* Fix examples

Co-authored-by: David Benedeki <[email protected]>
Co-authored-by: Daniel K <[email protected]>
Co-authored-by: Jan Scherbaum <[email protected]>