Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport PR #2758 to release/v1.7 for add affinity to jobTemplate #2760

Conversation

vdaas-ci
Copy link
Collaborator

@vdaas-ci vdaas-ci commented Dec 3, 2024

Description

Add affinity, nodeSelector and torelations to jobTemplate in manager/index (corrector, creator, saver).

Related Issue

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced deployment flexibility with new affinity configurations for corrector, creator, and saver components.
    • Introduced nodeSelector and tolerations options for more granular pod scheduling control.
    • Expanded metrics and observability properties for improved monitoring and tracing.
  • Improvements

    • Updated configuration options in the values.yaml file for better resource management and deployment strategies.
    • Enhanced job template configurability with support for affinity, nodeSelector, and tolerations.

Copy link

cloudflare-workers-and-pages bot commented Dec 3, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: d09f0d4
Status: ✅  Deploy successful!
Preview URL: https://5ac2a86a.vald.pages.dev
Branch Preview URL: https://backport-release-v1-7-featur-imvm.vald.pages.dev

View logs

Copy link
Contributor

coderabbitai bot commented Dec 3, 2024

📝 Walkthrough

Walkthrough

The pull request introduces significant modifications to the valdrelease.yaml, _helpers.tpl, values.schema.json, and values.yaml files within the vald Helm chart. Key additions include properties for affinity, nodeSelector, and tolerations across various components, enhancing pod scheduling and resource management. These properties are defined in detail, allowing for specific configurations related to node affinity and toleration of taints. The changes aim to standardize configurations and expand deployment flexibility for Kubernetes resources.

Changes

File Path Change Summary
charts/vald-helm-operator/crds/valdrelease.yaml Added affinity, nodeSelector, and tolerations properties in spec.corrector, spec.creator, spec.saver, and spec.manager.
charts/vald/templates/_helpers.tpl Updated vald.index_job to include affinity, nodeSelector, and tolerations in job specifications.
charts/vald/values.schema.json Introduced affinity, nodeSelector, and tolerations properties for corrector, creator, and saver.
charts/vald/values.yaml Added nodeSelector, tolerations, and affinity configurations in manager.index.corrector and manager.index.creator.

Possibly related PRs

  • add affinity to jobTemplate #2758: This PR directly relates to the main PR as it also adds affinity, nodeSelector, and tolerations features to the jobTemplate, enhancing the scheduling and resource allocation for the same components (corrector, creator, and saver) in the valdrelease.yaml file.

Suggested labels

priority/low, size/XXXL, area/makefile, actions/backport/release/v1.7

Suggested reviewers

  • vankichi
  • kmrmt

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vdaas-ci
Copy link
Collaborator Author

vdaas-ci commented Dec 3, 2024

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
charts/vald/values.schema.json (2)

15230-15233: Consider enhancing nodeSelector schema validation

The nodeSelector schema is correctly implemented but could benefit from additional validation rules. Consider adding:

  • Pattern validation for keys (must be valid Kubernetes label keys)
  • Pattern validation for values (must be valid Kubernetes label values)

Example enhancement:

 "nodeSelector": {
   "type": "object",
   "description": "node selector",
+  "additionalProperties": {
+    "type": "string",
+    "pattern": "^[a-zA-Z0-9][-._a-zA-Z0-9]*$"
+  },
+  "propertyNames": {
+    "pattern": "^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*\\/)?[a-z0-9]([-a-z0-9]*[a-z0-9])?$"
+  }
 }

Also applies to: 17280-17283, 23051-23054


16452-16456: Consider enhancing tolerations schema validation

The tolerations schema is correctly implemented but could benefit from additional validation rules to match Kubernetes API specifications.

Example enhancement:

 "tolerations": {
   "type": "array",
   "description": "tolerations",
-  "items": { "type": "object" }
+  "items": {
+    "type": "object",
+    "properties": {
+      "key": { "type": "string" },
+      "operator": {
+        "type": "string",
+        "enum": ["Equal", "Exists"]
+      },
+      "value": { "type": "string" },
+      "effect": {
+        "type": "string",
+        "enum": ["NoSchedule", "PreferNoSchedule", "NoExecute"]
+      },
+      "tolerationSeconds": { "type": "integer" }
+    }
+  }
 }

Also applies to: 18502-18506, 24273-24277

charts/vald/values.yaml (3)

3430-3453: LGTM! Consider adding examples in comments for common affinity use cases.

The affinity configuration for the corrector job looks good and follows Kubernetes best practices. To improve usability, consider adding comment examples showing common affinity rules like:

  • Co-locating with specific services
  • Spreading across zones
  • Hardware requirements

3566-3589: LGTM! Consider adding examples in comments for common affinity use cases.

The affinity configuration for the creator job looks good and follows Kubernetes best practices. To improve usability, consider adding comment examples showing common affinity rules like:

  • Co-locating with specific services
  • Spreading across zones
  • Hardware requirements

3696-3719: LGTM! Consider adding examples in comments for common affinity use cases.

The affinity configuration for the saver job looks good and follows Kubernetes best practices. To improve usability, consider adding comment examples showing common affinity rules like:

  • Co-locating with specific services
  • Spreading across zones
  • Hardware requirements
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b9bd4ba and d09f0d4.

📒 Files selected for processing (4)
  • charts/vald-helm-operator/crds/valdrelease.yaml (9 hunks)
  • charts/vald/templates/_helpers.tpl (2 hunks)
  • charts/vald/values.schema.json (9 hunks)
  • charts/vald/values.yaml (3 hunks)
🔇 Additional comments (9)
charts/vald/templates/_helpers.tpl (3)

899-902: Addition of affinity settings to the job template is appropriate

The affinity field is correctly added to the job specification when .Job.affinity is provided, enabling node affinity configurations for job scheduling.


936-939: Addition of nodeSelector to the job template is appropriate

The nodeSelector field is properly included in the job specification when .Job.nodeSelector is defined, allowing for selection of nodes matching specified labels.


940-943: Addition of tolerations to the job template is appropriate

The tolerations field is correctly added to the job specification when .Job.tolerations is provided, allowing the job to tolerate specific node taints.

charts/vald/values.schema.json (2)

14074-14128: LGTM! Well-structured affinity configurations

The affinity schema definitions for corrector, creator, and saver components are well-structured and correctly follow the Kubernetes pod scheduling API. The schema includes all necessary affinity types (nodeAffinity, podAffinity, podAntiAffinity) with proper descriptions and type definitions.

Also applies to: 16471-16525, 22246-22300


Line range hint 14074-24277: Verify configuration examples in documentation

The schema changes for adding affinity, nodeSelector, and tolerations to the jobTemplate are well-implemented and consistent across all components (corrector, creator, and saver). Since this is a backport to release/v1.7, ensure that:

  1. The documentation includes examples of these new configuration options
  2. The changes are reflected in the default values.yaml
  3. Any upgrade notes mention these new configuration capabilities
charts/vald-helm-operator/crds/valdrelease.yaml (4)

8409-8453: LGTM: Affinity configuration follows Kubernetes standards

The affinity configuration structure correctly implements the standard Kubernetes affinity schema with all required fields:

  • nodeAffinity with preferred/required scheduling rules
  • podAffinity with preferred/required scheduling rules
  • podAntiAffinity with preferred/required scheduling rules

9085-9087: LGTM: NodeSelector configuration follows Kubernetes standards

The nodeSelector configuration is properly defined as an object with x-kubernetes-preserve-unknown-fields to allow arbitrary label selectors.


9765-9769: LGTM: Tolerations configuration follows Kubernetes standards

The tolerations configuration is correctly defined as an array of objects with x-kubernetes-preserve-unknown-fields to allow standard Kubernetes toleration rules.


Line range hint 8409-14284: Verify consistent implementation across components

The affinity, nodeSelector and tolerations configurations are consistently implemented across the corrector, creator and saver components.

✅ Verification successful

Based on the output from the shell scripts, I can see that the affinity configuration is consistently implemented across all components in the CRD, with identical structure for nodeAffinity, podAffinity and podAntiAffinity fields. Let me verify the nodeSelector and tolerations with a modified script.


Based on the verification results, I can now provide the final response:

Scheduling controls are consistently implemented across components
The affinity, nodeSelector, and tolerations configurations show consistent implementation across all components:

  • All affinity configurations follow the same structure with nodeAffinity, podAffinity, and podAntiAffinity
  • All nodeSelector fields are defined as objects with x-kubernetes-preserve-unknown-fields
  • All tolerations are defined as arrays of objects with x-kubernetes-preserve-unknown-fields
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent implementation of scheduling controls across components

# Check for consistent affinity structure
echo "Checking affinity structure..."
ast-grep --pattern 'affinity:
  type: object
  properties:
    nodeAffinity:
      $$$
    podAffinity:
      $$$
    podAntiAffinity:
      $$$'

# Check for consistent nodeSelector structure
echo "Checking nodeSelector structure..."
ast-grep --pattern 'nodeSelector:
  type: object
  x-kubernetes-preserve-unknown-fields: true'

# Check for consistent tolerations structure  
echo "Checking tolerations structure..."
ast-grep --pattern 'tolerations:
  type: array
  items:
    type: object
    x-kubernetes-preserve-unknown-fields: true'

Length of output: 66762


Script:

#!/bin/bash
# Check for consistent nodeSelector structure
echo "Checking nodeSelector structure..."
rg -A 1 'nodeSelector:' charts/vald-helm-operator/crds/valdrelease.yaml

# Check for consistent tolerations structure  
echo "Checking tolerations structure..."
rg -A 2 'tolerations:' charts/vald-helm-operator/crds/valdrelease.yaml

Length of output: 2242

@kmrmt kmrmt merged commit 27d4aea into release/v1.7 Dec 3, 2024
40 of 41 checks passed
@kmrmt kmrmt deleted the backport/release/v1.7/feature/manager-index/add-affinities-to-job-fix branch December 3, 2024 05:57
This was referenced Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants