
[Merged by Bors] - Feature/s3 extension defaults #192

Closed
wants to merge 15 commits into main from feature/s3-extension-defaults

Conversation

@adwk67 (Member) commented Mar 4, 2022

Description

It is not immediately evident from the Apache Druid documentation that only one S3 endpoint can be used per Druid cluster: the endpoint cannot be entered in the console UI and must be supplied as a runtime property (only buckets/baseKeys can be entered in the console). The credentials can be passed as environment variables, as runtime properties, or directly via the UI, so they do not necessarily need to be supplied as runtime properties.

The upshot is that the druid-s3-extension requires an S3 endpoint in order to be initialized: making the loading of the s3-extension conditional on an `s3:endpoint` being provided in our CR means that S3 functionality will only be activated in the console if it can actually be used (i.e. an endpoint is also available).
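As a concrete illustration, here is a minimal sketch of the Druid runtime properties involved, assuming a common runtime properties file. Only `druid.s3.endpoint.url` is named in this PR; `druid.extensions.loadList`, `druid.s3.accessKey` and `druid.s3.secretKey` are standard Apache Druid settings, and all values are placeholders.

```properties
# Load the S3 extension and point it at the single, cluster-wide endpoint.
# This endpoint cannot be set through the console UI and must live here.
druid.extensions.loadList=["druid-s3-extensions"]
druid.s3.endpoint.url=http://minio.example.svc:9000

# Credentials do not have to be runtime properties: they can also be supplied
# as environment variables or entered directly in the console UI.
# druid.s3.accessKey=<access key>
# druid.s3.secretKey=<secret key>
```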

The following have been tested:

  • quickstart: include the s3-extension in the load list and set a value for `druid.s3.endpoint.url`
    • ingest sample and S3 data (adding credentials/bucket in the UI)
  • operator + HDFS deep storage + without an s3 definition in the CR: ingestion of sample data
  • operator + HDFS deep storage + with an s3 definition in the CR: ingestion of S3 data (see the CR sketch below)
  • operator + S3 deep storage: ingestion of S3 and non-S3 data
  • existing integration tests
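To make the "with/without an s3 definition in the CR" cases above concrete, a hedged sketch of the relevant CR fragment is shown below; only the `s3`/`endpoint` pair is taken from this PR, and the surrounding field layout is illustrative rather than the actual CRD schema.

```yaml
# Hypothetical DruidCluster fragment: if spec.s3.endpoint is present, the operator
# loads druid-s3-extensions and derives druid.s3.endpoint.url from it; if the s3
# section is omitted, the extension is not activated.
spec:
  s3:
    endpoint: http://minio.example.svc:9000   # placeholder; only one endpoint per Druid cluster
```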

Review Checklist

  • Code contains useful comments
  • (Integration-)Test cases added (or not applicable)
  • Documentation added (or not applicable)
  • Changelog updated (or not applicable)
  • Cargo.toml only contains references to git tags (not specific commits or branches)
  • Helm chart can be installed and the deployed operator works (or not applicable)

Once the review is done, comment `bors r+` (or `bors merge`) to merge.

@adwk67 marked this pull request as ready for review March 4, 2022 17:56
@adwk67 self-assigned this Mar 4, 2022
@adwk67 requested a review from a team March 4, 2022 17:57
@fhennig (Contributor) commented Mar 7, 2022

I'm gonna review this after lunch

@fhennig (Contributor) left a comment

Hi, I just went through the Review Checklist; the only thing missing is an update to the changelog. I'll test the functionality now.

@fhennig (Contributor) commented Mar 8, 2022

I think the "Integration tests added" checklist item is a good point here: the current integration test uses a Druid cluster definition with the s3 section defined, so we should have a second test without the s3 section to cover both cases. I think that test is part of this ticket and would need to be created.

Edit: At the moment our ingestion test uses a dummy s3 config. We should remove that, since it should not be needed anymore with this PR. And I think it would be good to have two ingestion tests:

  • ingest the quickstart data once with the dummy s3 config
  • ingest once without the dummy s3 config

and use HDFS for both.

It would also be good to split out the smoke test again; I think it's good to have that as a separate test. Sorry for all my change requests!

@fhennig (Contributor) left a comment

LGTM now!

@adwk67 (Member, Author) commented Mar 9, 2022

bors merge

bors bot pushed a commit that referenced this pull request Mar 9, 2022
bors bot commented Mar 9, 2022

Pull request successfully merged into main.

Build succeeded:

bors bot changed the title Feature/s3 extension defaults [Merged by Bors] - Feature/s3 extension defaults Mar 9, 2022
bors bot closed this Mar 9, 2022
bors bot deleted the feature/s3-extension-defaults branch March 9, 2022 14:03