Add e2e test for train API #2199

Open
wants to merge 87 commits into base: master

Conversation

helenxie-bit
Contributor

What this PR does / why we need it:
Add an e2e test in test_e2e_train_api.py for the train API.
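
For context, the sketch below shows roughly what such an e2e test could exercise. It is a minimal illustration, assuming the kubeflow-training SDK's TrainingClient.train() surface and the HuggingFace parameter classes available around the time of this PR; the job name, model, dataset, and resource values are hypothetical and not taken from test_e2e_train_api.py.

```python
# Minimal sketch of an e2e check for the train API (assumed SDK surface:
# TrainingClient.train, wait_for_job_conditions, delete_job, and the
# HuggingFace parameter classes from kubeflow.storage_initializer).
import transformers
from kubeflow.training import TrainingClient
from kubeflow.training.constants import constants
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceDatasetParams,
    HuggingFaceTrainerParams,
)

client = TrainingClient()
job_name = "test-train-api"  # hypothetical job name

# train() creates a PyTorchJob whose init container (Storage Initializer image)
# downloads the model and dataset, and whose workers run the Trainer image.
client.train(
    name=job_name,
    num_workers=1,
    num_procs_per_worker=1,
    resources_per_worker={"cpu": 2, "memory": "10G"},
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://hf-internal-testing/tiny-random-BertForSequenceClassification",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="imdb",
        split="train[:100]",
    ),
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="/tmp/test-train-api",
            num_train_epochs=1,
        ),
    ),
)

# The test then waits for the created PyTorchJob to reach a Succeeded
# condition and cleans it up afterwards.
client.wait_for_job_conditions(
    name=job_name,
    job_kind=constants.PYTORCHJOB_KIND,
)
client.delete_job(name=job_name, job_kind=constants.PYTORCHJOB_KIND)
```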

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

@coveralls

coveralls commented Aug 9, 2024

Pull Request Test Coverage Report for Build 11021210555

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals (Coverage Status):
  • Change from base Build 10963743211: 0.0%
  • Covered Lines: 66
  • Relevant Lines: 66

💛 - Coveralls

@helenxie-bit helenxie-bit changed the title Add e2e test for train API [WIP]Add e2e test for train API Aug 9, 2024
@helenxie-bit helenxie-bit changed the title [WIP]Add e2e test for train API [WIP] Add e2e test for train API Aug 9, 2024
@google-oss-prow google-oss-prow bot removed the lgtm label Sep 3, 2024

New changes are detected. LGTM label has been removed.

@andreyvelich
Member

I think, as part of these E2Es, we need to build and verify that the Trainer and Storage Initializer images are functional with the PR changes.
Since we don't allow overriding those images right now, maybe we can solve this problem by implementing the logic that I described here: #2247 (comment)

WDYT @deepanker13 @helenxie-bit @kubeflow/wg-training-leads?

@tenzen-y
Member

I think, as part of these E2Es, we need to build and verify that the Trainer and Storage Initializer images are functional with the PR changes. Since we don't allow overriding those images right now, maybe we can solve this problem by implementing the logic that I described here: #2247 (comment)

WDYT @deepanker13 @helenxie-bit @kubeflow/wg-training-leads?

That sounds reasonable. Actually, #2247 has already been merged, so we can do it in this PR.
@helenxie-bit Could you update this PR to verify the images for every PR?

@helenxie-bit
Contributor Author

helenxie-bit commented Sep 22, 2024

I think, as part of these E2Es, we need to build and verify that the Trainer and Storage Initializer images are functional with the PR changes. Since we don't allow overriding those images right now, maybe we can solve this problem by implementing the logic that I described here: #2247 (comment)
WDYT @deepanker13 @helenxie-bit @kubeflow/wg-training-leads?

That sounds reasonable. Actually, #2247 has already been merged, so we can do it in this PR. @helenxie-bit Could you update this PR to verify the images for every PR?

Sure, I attempted to add the code for building and verifying the Storage Initializer and Trainer images, but the e2e tests failed due to a 'no space left on device' error when loading the images into the kind cluster. I’ve already tried pruning unused images and build caches, but there still isn’t enough space. Do you have any suggestions on how to resolve this? @andreyvelich @tenzen-y

@andreyvelich
Member

@tenzen-y Any thoughts on this? Do we need to migrate to Minikube, similar to Katib?
For Katib, we load a lot of images into the CI cluster without any issues: https://github.com/kubeflow/katib/blob/master/test/e2e/v1beta1/scripts/gh-actions/build-load.sh#L55

@helenxie-bit
Contributor Author

@andreyvelich @tenzen-y Following today's discussion, I tried "docker system prune -a -f", but it still doesn't fix the problem. Should we try building a smaller image specifically for the e2e tests? Do you have any suggestions for a base image we could use?
