
add affinity to jobTemplate #2758

Merged

Conversation

Contributor

@takuyaymd commented Dec 2, 2024

Description

Add affinity, nodeSelector, and tolerations to the jobTemplate in manager/index (corrector, creator, saver).
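For example, the new fields can be set per job (an illustrative sketch: the label key, taint, and values below are hypothetical; only the field paths under manager.index come from this change, and the creator and saver sections accept the same keys):

manager:
  index:
    corrector:
      # Pin the correction job to nodes carrying a given label (hypothetical label).
      nodeSelector:
        disktype: ssd
      # Allow scheduling onto tainted nodes (hypothetical taint).
      tolerations:
        - key: dedicated
          operator: Equal
          value: index-jobs
          effect: NoSchedule
      # Standard Kubernetes affinity stanza.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64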

Related Issue

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced deployment flexibility with new affinity configurations for corrector, creator, and saver components.
    • Introduced nodeSelector and tolerations options for more granular pod scheduling control.
    • Expanded metrics and observability properties for improved monitoring and tracing.
  • Improvements

    • Updated configuration options in the values.yaml file for better resource management and deployment strategies.
    • Enhanced job template configurability with support for affinity, nodeSelector, and tolerations.

Contributor

coderabbitai bot commented Dec 2, 2024

📝 Walkthrough

The changes in this pull request introduce significant enhancements to the ValdRelease custom resource and related Helm chart configurations. Key updates include the addition of affinity, nodeSelector, and tolerations properties for various components, improving pod scheduling flexibility. The Helm templates and values files have been updated to support these new configurations, allowing users to define scheduling constraints more granularly. Additionally, the discoverer field has been added to enhance integration with the discoverer service, improving operational efficiency.

Changes

  • charts/vald-helm-operator/crds/valdrelease.yaml: Added affinity, nodeSelector, and tolerations properties in spec.corrector, spec.creator, spec.saver, and spec.manager for ValdRelease.
  • charts/vald/templates/_helpers.tpl: Introduced affinity, nodeSelector, and tolerations keys in the job template definition.
  • charts/vald/values.schema.json: Added new properties for affinity, nodeSelector, and tolerations in the corrector, creator, and saver sections of the JSON schema.
  • charts/vald/values.yaml: Updated the manager.index.corrector, manager.index.creator, and manager.index.saver sections to include nodeSelector, tolerations, and affinity; a new discoverer field was also added.




@takuyaymd force-pushed the feature/manager-index/add-affinities-to-job-fix branch from 59f1a66 to 1736fc7 on December 2, 2024 09:22
Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
charts/vald/values.schema.json (4)

14074-14128: Schema structure follows Kubernetes conventions but could be more specific

The addition of affinity, nodeSelector, and tolerations configurations across corrector, creator, and saver components is well-structured and follows Kubernetes API conventions. However, there are opportunities for improvement:

  1. The schema uses generic { "type": "object" } for complex nested structures in affinity configurations. Consider defining these more explicitly to provide better validation and auto-completion support.
  2. The descriptions could be more detailed to guide users on proper usage.

Consider enhancing the schema with more specific definitions, for example:

 "preferredDuringSchedulingIgnoredDuringExecution": {
   "type": "array",
   "description": "node affinity preferred scheduling terms",
-  "items": { "type": "object" }
+  "items": {
+    "type": "object",
+    "required": ["weight", "preference"],
+    "properties": {
+      "weight": {
+        "type": "integer",
+        "description": "Weight associated with matching the corresponding nodeSelectorTerm"
+      },
+      "preference": {
+        "type": "object",
+        "properties": {
+          "matchExpressions": {
+            "type": "array",
+            "items": {
+              "type": "object",
+              "required": ["key", "operator"],
+              "properties": {
+                "key": { "type": "string" },
+                "operator": { "type": "string" },
+                "values": {
+                  "type": "array",
+                  "items": { "type": "string" }
+                }
+              }
+            }
+          }
+        }
+      }
+    }
+  }
 }

Also applies to: 15230-15233, 16452-16456, 16471-16525, 17280-17283, 18502-18506, 22246-22300, 23051-23054, 24273-24277


14074-14128: Consider documenting recommended affinity configurations

The addition of affinity configurations to index management components (corrector, creator, saver) provides valuable control over pod scheduling. However, users would benefit from guidance on recommended configurations for different scenarios.

Consider:

  1. Adding examples in the documentation for common use cases:
    • Co-locating index management jobs with specific agent pods
    • Spreading jobs across nodes for load balancing
    • Ensuring jobs run on nodes with specific hardware (e.g., SSD storage)
  2. Documenting best practices for affinity rules to prevent scheduling conflicts between components

Also applies to: 16471-16525, 22246-22300


15230-15233: Enhance nodeSelector description with examples

The nodeSelector schema is correctly implemented but could benefit from a more descriptive comment to guide users.

Consider enhancing the description:

 "nodeSelector": {
   "type": "object",
-  "description": "node selector"
+  "description": "nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec that specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). Example: { \"disktype\": \"ssd\", \"environment\": \"production\" }"
 }

Also applies to: 17280-17283, 23051-23054


16452-16456: Define specific schema for toleration objects

The tolerations schema could be more specific to provide better validation and documentation.

Consider enhancing the schema:

 "tolerations": {
   "type": "array",
   "description": "tolerations",
-  "items": { "type": "object" }
+  "items": {
+    "type": "object",
+    "properties": {
+      "key": {
+        "type": "string",
+        "description": "The taint key that the toleration applies to"
+      },
+      "operator": {
+        "type": "string",
+        "description": "Operator represents a key's relationship to the value",
+        "enum": ["Equal", "Exists"]
+      },
+      "value": {
+        "type": "string",
+        "description": "Value is the taint value the toleration matches to"
+      },
+      "effect": {
+        "type": "string",
+        "description": "Effect indicates the taint effect to match",
+        "enum": ["NoSchedule", "PreferNoSchedule", "NoExecute"]
+      },
+      "tolerationSeconds": {
+        "type": "integer",
+        "description": "TolerationSeconds represents the period of time a toleration tolerates the taint"
+      }
+    }
+  }
 }

Also applies to: 18502-18506, 24273-24277

charts/vald-helm-operator/crds/valdrelease.yaml (1)

Line range hint 8409-14284: Consider adding schema validation examples

While the schema is technically correct, it would be helpful to add examples in the CRD documentation to demonstrate proper usage of these scheduling controls.

Example:

# Example affinity configuration
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/e2e-az-name
          operator: In
          values:
          - e2e-az1
          - e2e-az2
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3d51a12 and 59f1a66.

📒 Files selected for processing (4)
  • charts/vald-helm-operator/crds/valdrelease.yaml (9 hunks)
  • charts/vald/templates/_helpers.tpl (2 hunks)
  • charts/vald/values.schema.json (9 hunks)
  • charts/vald/values.yaml (3 hunks)
🔇 Additional comments (10)
charts/vald/values.schema.json (1)

Line range hint 14074-24277: Document security implications of scheduling configurations

The addition of scheduling controls (affinity, nodeSelector, tolerations) could have security implications if not properly configured.

Consider adding security documentation that covers:

  1. How these configurations affect pod isolation
  2. Best practices for maintaining security boundaries
  3. Recommendations for production deployments

Run the following to check for existing security documentation:

charts/vald/templates/_helpers.tpl (3)

899-902: LGTM! Clean affinity implementation

The affinity section is well-implemented using the existing vald.affinity template for consistency across components.


936-939: LGTM! Clean nodeSelector implementation

The nodeSelector section is properly implemented using toYaml for map rendering.


940-943: LGTM! Clean tolerations implementation

The tolerations section is properly implemented using toYaml for array rendering.
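Taken together, the three hunks above suggest the job template now renders blocks along these lines (a hedged reconstruction, not verbatim _helpers.tpl source; the guard conditions and the exact vald.affinity call signature are assumptions):

{{- if .affinity }}
# Delegates to the existing shared vald.affinity helper (call signature assumed).
affinity:
  {{- include "vald.affinity" .affinity | nindent 2 }}
{{- end }}
{{- if .nodeSelector }}
# Map rendered with toYaml, as noted in the review comment above.
nodeSelector:
  {{- toYaml .nodeSelector | nindent 2 }}
{{- end }}
{{- if .tolerations }}
# Array rendered with toYaml, as noted in the review comment above.
tolerations:
  {{- toYaml .tolerations | nindent 2 }}
{{- end }}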

charts/vald/values.yaml (3)

3430-3453: LGTM! Well-structured scheduling configurations for corrector job

The scheduling configurations (nodeSelector, tolerations, affinity) are properly structured with appropriate empty defaults that can be configured per deployment needs.


3566-3589: LGTM! Well-structured scheduling configurations for creator job

The scheduling configurations mirror the corrector job structure, maintaining consistency across components.


3696-3719: LGTM! Well-structured scheduling configurations for saver job

The scheduling configurations maintain consistency with other index manager components, following the same pattern.

charts/vald-helm-operator/crds/valdrelease.yaml (3)

8409-8453: LGTM: Comprehensive affinity configuration added

The affinity configuration is well-structured and complete, including all necessary scheduling controls:

  • nodeAffinity with both preferred and required scheduling rules
  • podAffinity for co-location rules
  • podAntiAffinity for anti-co-location rules

Also applies to: 9778-9822, 13121-13165


9085-9087: LGTM: NodeSelector schema properly defined

The nodeSelector fields are correctly defined with x-kubernetes-preserve-unknown-fields to allow arbitrary node labels for flexible scheduling.

Also applies to: 10258-10260, 13599-13601


9765-9769: LGTM: Tolerations schema properly defined

The tolerations array schema is correctly defined with x-kubernetes-preserve-unknown-fields to support all Kubernetes toleration configurations.

Also applies to: 10939-10943, 14280-14284
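Concretely, the stanzas described in the two comments above presumably reduce to something like this in the CRD (a sketch inferred from the review text, not copied from valdrelease.yaml):

nodeSelector:
  type: object
  x-kubernetes-preserve-unknown-fields: true
tolerations:
  type: array
  items:
    type: object
    x-kubernetes-preserve-unknown-fields: true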

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
charts/vald/values.schema.json (3)

14074-14128: Consider documenting recommended affinity configurations for corrector

The scheduling configurations are well-structured, but since the corrector handles index corrections, consider:

  • Adding examples in documentation for recommended affinity rules to:
    • Co-locate with related index pods for better performance
    • Ensure high availability across zones
    • Handle resource-intensive correction operations

Would you like me to help draft documentation with these recommendations?

Also applies to: 15230-15233, 16452-16456


16471-16525: Consider performance implications of creator scheduling

The scheduling configurations are well-implemented. For the creator component:

  • Consider documenting recommended nodeSelector patterns for nodes with:
    • Higher CPU/memory resources for efficient index creation
    • Local SSDs for better I/O performance

Would you like me to help draft performance-oriented configuration examples?

Also applies to: 17280-17283, 18502-18506


22246-22300: Consider data locality for saver scheduling

The scheduling configurations are well-implemented. For the saver component:

  • Consider documenting recommended affinity rules for:
    • Co-location with persistent storage (a sketch follows this comment)
    • Ensuring index saves occur close to where data is stored
    • High availability patterns for saved indices

Would you like me to help draft data locality-oriented configuration examples?

Also applies to: 23051-23054, 24273-24277
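As a hedged example of the co-location rule sketched for the saver (the pod label is hypothetical; substitute whatever label the pods holding the index data actually carry):

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Require the saver job to land on the same node as pods matching this label.
      - labelSelector:
          matchLabels:
            app: vald-agent   # hypothetical label
        topologyKey: kubernetes.io/hostname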

charts/vald-helm-operator/crds/valdrelease.yaml (2)

8409-8453: LGTM! The affinity schema implementation looks correct.

The affinity configuration follows Kubernetes standards and is consistently implemented across corrector, creator, and saver components.

Consider adding description fields to document the purpose and usage of these scheduling features. For example:

 affinity:
+  description: "Scheduling constraints to control pod placement using node/pod affinity rules"
   type: object
   properties:
     nodeAffinity:
+      description: "Constraints for scheduling pods on specific nodes based on node labels"
       type: object

Also applies to: 9778-9822, 13121-13165


Line range hint 8409-14284: Consider documenting recommended scheduling patterns

These scheduling features enable powerful deployment scenarios. Consider documenting recommended patterns for:

  • Using pod affinity to co-locate related components for better performance
  • Using pod anti-affinity to spread components for high availability (sketched just below)
  • Using nodeSelector for hardware-specific requirements (e.g., GPU nodes)
  • Using tolerations to allow deployment on specialized nodes
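As a hedged illustration of the anti-affinity pattern above (the pod label is hypothetical):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Prefer spreading matching pods across distinct nodes.
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: vald-index-correction   # hypothetical label
          topologyKey: kubernetes.io/hostname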
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 59f1a66 and 1736fc7.

📒 Files selected for processing (4)
  • charts/vald-helm-operator/crds/valdrelease.yaml (9 hunks)
  • charts/vald/templates/_helpers.tpl (2 hunks)
  • charts/vald/values.schema.json (9 hunks)
  • charts/vald/values.yaml (3 hunks)
🔇 Additional comments (10)
charts/vald/values.schema.json (2)

14074-14128: Schema structure follows Kubernetes conventions correctly

The schema implementation for affinity, nodeSelector, and tolerations:

  • Correctly matches Kubernetes API object structures
  • Maintains consistency across all components (corrector, creator, saver)
  • Includes appropriate descriptions for all fields

Also applies to: 15230-15233, 16452-16456, 16471-16525, 17280-17283, 18502-18506, 22246-22300, 23051-23054, 24273-24277


Line range hint 14074-24277: Verify schema completeness against Kubernetes API

Let's verify that all standard Kubernetes scheduling fields are included.

✅ Verification successful

All standard Kubernetes scheduling fields are properly defined in the schema

Based on the verification results, the schema includes all required Kubernetes scheduling fields:

  • nodeSelector with proper type and description
  • tolerations with proper structure and description
  • affinity with all required subfields:
    • nodeAffinity (including preferred/required scheduling terms)
    • podAffinity (including preferred/required scheduling terms)
    • podAntiAffinity (including preferred/required scheduling terms)

Each field has appropriate descriptions and follows the standard Kubernetes scheduling schema structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify schema completeness against Kubernetes scheduling fields

# Check if we have all standard Kubernetes scheduling fields
echo "Checking for required scheduling fields..."
required_fields=("affinity" "nodeSelector" "tolerations")
for field in "${required_fields[@]}"; do
  echo "Checking $field..."
  rg -i "\"$field\".*:.*{?\s*\"type\"" charts/vald/values.schema.json
done

# Verify affinity subfields
echo "Checking affinity subfields..."
affinity_fields=("nodeAffinity" "podAffinity" "podAntiAffinity")
for field in "${affinity_fields[@]}"; do
  echo "Checking $field..."
  rg -i "\"$field\".*:.*{?\s*\"type\"" charts/vald/values.schema.json
done

Length of output: 2095


Script:

#!/bin/bash
# Description: Verify schema completeness against Kubernetes scheduling fields with fixed regex

# Check if we have all standard Kubernetes scheduling fields
echo "Checking for required scheduling fields..."
required_fields=("affinity" "nodeSelector" "tolerations")
for field in "${required_fields[@]}"; do
  echo "Checking $field..."
  rg -i "\"${field}\".*type" charts/vald/values.schema.json
done

# Verify affinity subfields
echo "Checking affinity subfields..."
affinity_fields=("nodeAffinity" "podAffinity" "podAntiAffinity")
for field in "${affinity_fields[@]}"; do
  echo "Checking $field..."
  rg -i "\"${field}\".*type" charts/vald/values.schema.json
done

# Also check for any scheduling related properties
echo "Checking overall scheduling properties..."
rg -i "scheduling|affinity|selector|toleration" charts/vald/values.schema.json

Length of output: 15576

charts/vald/templates/_helpers.tpl (3)

899-902: LGTM: Affinity configuration properly implemented

The affinity configuration is correctly added to the job template following Kubernetes best practices. The implementation allows for flexible pod scheduling control through nodeAffinity, podAffinity, and podAntiAffinity settings.


936-939: LGTM: NodeSelector configuration properly implemented

The nodeSelector configuration is correctly added to the job template, allowing for node-level scheduling constraints.


940-943: LGTM: Tolerations configuration properly implemented

The tolerations configuration is correctly added to the job template, enabling pods to be scheduled on nodes with matching taints.

charts/vald/values.yaml (3)

3430-3453: LGTM: Corrector job scheduling configuration properly implemented

The scheduling configuration (affinity, nodeSelector, tolerations) for the corrector job is well-structured with appropriate defaults and helpful comments.


3566-3589: LGTM: Creator job scheduling configuration properly implemented

The scheduling configuration (affinity, nodeSelector, tolerations) for the creator job is well-structured with appropriate defaults and helpful comments.


3696-3719: LGTM: Saver job scheduling configuration properly implemented

The scheduling configuration (affinity, nodeSelector, tolerations) for the saver job is well-structured with appropriate defaults and helpful comments.

charts/vald-helm-operator/crds/valdrelease.yaml (2)

9085-9087: LGTM! The nodeSelector schema is properly defined.

The nodeSelector field is correctly implemented with x-kubernetes-preserve-unknown-fields to allow arbitrary label selectors.

Also applies to: 10258-10260, 13599-13601


9765-9769: LGTM! The tolerations schema follows Kubernetes standards.

The tolerations array is properly defined to allow standard Kubernetes toleration objects.

Also applies to: 10939-10943, 14280-14284

@kmrmt merged commit 9bd9034 into vdaas:main on Dec 3, 2024
172 of 176 checks passed
@takuyaymd deleted the feature/manager-index/add-affinities-to-job-fix branch on December 3, 2024 05:07
vdaas-ci pushed a commit that referenced this pull request Dec 3, 2024
kmrmt pushed a commit that referenced this pull request Dec 3, 2024