Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use NVMe datadisk on Yellow if it's present on first boot #3686

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

sairon
Copy link
Member

@sairon sairon commented Nov 21, 2024

If HAOS on Yellow is booted for the first time with NVMe data disk present, it should be preferred over the empty eMMC data partition. This will ease reinstall of the system and migration from CM4 to CM5. All other data disks (e.g. if a USB drive is used for them) are still treated as before, requiring manual adoption using the Supervisor repair.

Summary by CodeRabbit

  • New Features

    • Improved handling of data partitions based on system configuration, allowing for more intelligent management.
    • Introduced conditional logic for distinguishing between internal eMMC and NVMe data partitions based on the VARIANT_ID.
  • Bug Fixes

    • Enhanced functionality to ensure only one type of internal data partition is active at a time.
  • Refactor

    • Restructured the script for better readability and maintainability by encapsulating logic into a new function.

@sairon sairon added the board/yellow Home Assistant Yellow label Nov 21, 2024
@sairon sairon requested a review from agners November 21, 2024 13:59
Copy link

coderabbitai bot commented Nov 21, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes involve a significant restructuring of the haos-data-disk-detach script to improve its functionality in managing data partitions. A new function, disable_data_partition, has been introduced to encapsulate the logic for disabling non-root data partitions. Additionally, the script now includes conditional checks based on the VARIANT_ID, allowing it to differentiate between data partitions on internal eMMC and NVMe disks, ensuring only one type is active. For non-"yellow" variants, the original logic for handling external disks remains unchanged.

Changes

File Path Change Summary
buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach Restructured script to introduce disable_data_partition(partition) function; added logic for handling data partitions based on VARIANT_ID.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script
    participant eMMC
    participant NVMe
    participant ExternalDisk

    User->>Script: Run detach script
    Script->>Script: Check VARIANT_ID
    alt VARIANT_ID is "yellow"
        Script->>eMMC: Check eMMC data partition
        Script->>NVMe: Check NVMe data partition
        alt Both partitions found
            Script->>eMMC: Disable eMMC data partition
        else Only one type found
            Script->>eMMC/NVMe: Keep active
        end
    else VARIANT_ID is not "yellow"
        Script->>ExternalDisk: Disable non-root external disks
    end
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach (2)

18-49: Optimize device enumeration wait time

The script includes a fixed 10-second sleep to wait for devices to enumerate (sleep 10s at line 6). This can unnecessarily slow down the boot process if devices are ready earlier.

Consider implementing a loop to poll for the presence of data partitions, reducing the wait time when devices are ready sooner and timing out after a maximum period. Here’s how you might modify the script:

 # Wait up to 10s for devices to enumerate
-sleep 10s
+timeout=10
+while [ $timeout -gt 0 ]; do
+    datapartitions=$(blkid --match-token LABEL="hassos-data" --output device)
+    [ -n "$datapartitions" ] && break
+    sleep 1
+    timeout=$((timeout - 1))
+done

This change will check every second for up to 10 seconds and proceed as soon as datapartitions is not empty.


38-41: Simplify condition check for disabling external data disks

The condition if [ "$rootdev" != "$datadev" ]; then is used within the default case to disable external data disks. Since this condition is common and used elsewhere, consider abstracting it or ensuring consistency throughout the script.

No code changes are necessary, but be mindful of maintaining consistency in condition checks to improve code readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between af833db and 58eccc8.

📒 Files selected for processing (1)
  • buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach (1 hunks)
🔇 Additional comments (2)
buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach (2)

12-12: Ensure VARIANT_ID is set before using it

The script sources /etc/os-release to access VARIANT_ID. However, if VARIANT_ID is not set or /etc/os-release is missing or corrupted, the script may not behave as expected.

Consider adding a check to ensure VARIANT_ID is set:

 . /etc/os-release

+if [ -z "$VARIANT_ID" ]; then
+    echo "Error: VARIANT_ID is not set. Exiting..."
+    exit 1
+fi

This ensures the script exits gracefully if VARIANT_ID is unavailable.


22-44: ⚠️ Potential issue

Ensure all non-root data partitions are properly disabled

In the "yellow" variant case, data partitions on mmc* devices are only assigned to emmc_data_partition if they are on the root device. However, data partitions on mmc* devices that are not the root device are not being disabled, potentially leaving unwanted data partitions active.

Consider modifying the script to disable data partitions on mmc* devices when they are not on the root device. Apply this diff to address the issue:

             case "${datadev}" in
                 mmc*)
                     # Data partition on internal eMMC
-                    if [ "$rootdev" = "$datadev" ]; then
+                    if [ "$rootdev" = "$datadev" ]; then
                         emmc_data_partition="${datapart}"
+                    else
+                        echo "Found extra eMMC data disk device on ${datapart}, marking it disabled..."
+                        disable_data_partition "${datapart}"
                     fi
                     ;;

To verify if any non-root mmc* data partitions are left enabled, run the following script:

This script will output any hassos-data partitions on mmc* devices that are not the root device, indicating they may not be properly disabled.

Comment on lines +14 to +17
disable_data_partition() {
e2label "${1}" hassos-data-dis
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling in disable_data_partition function

The disable_data_partition function does not handle possible errors from the e2label command. If e2label fails (e.g., due to a corrupted filesystem or permission issues), the script will continue without disabling the data partition, which could lead to inconsistent system states.

Consider adding error handling to check the exit status of e2label and take appropriate action if it fails. Apply this diff to enhance the function:

 disable_data_partition() {
     e2label "${1}" hassos-data-dis
+    if [ $? -ne 0 ]; then
+        echo "Error: Failed to disable data partition ${1}" >&2
+        # Optionally, exit or handle the error as needed
+    fi
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
disable_data_partition() {
e2label "${1}" hassos-data-dis
}
disable_data_partition() {
e2label "${1}" hassos-data-dis
if [ $? -ne 0 ]; then
echo "Error: Failed to disable data partition ${1}" >&2
# Optionally, exit or handle the error as needed
fi
}

If HAOS on Yellow is booted for the first time with NVMe data disk present, it
should be preferred over the empty eMMC data partition. This will ease
reinstall of the system and migration from CM4 to CM5. All other data disks
(e.g. if a USB drive is used for them) are still treated as before, requiring
manual adoption using the Supervisor repair.
@sairon sairon force-pushed the yellow-always-prefer-nvme-datadisk branch from 58eccc8 to 900407f Compare November 21, 2024 14:26
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach (3)

Line range hint 9-10: Consider implementing a more robust device enumeration wait

The hardcoded 10-second sleep might not be sufficient for all devices to enumerate, especially slower NVMe or USB devices. Consider implementing a polling mechanism with timeout.

Example approach:

-# Wait up to 10s for devices to enumerate
-sleep 10s
+# Wait up to 30s for devices to enumerate
+timeout=30
+while [ $timeout -gt 0 ] && [ -z "$(blkid)" ]; do
+    sleep 1
+    timeout=$((timeout - 1))
+done
+if [ $timeout -eq 0 ]; then
+    echo "Warning: Timeout waiting for devices to enumerate" >&2
+fi

47-50: Add logging for selected data partition

When choosing between eMMC and NVMe, the script should log which partition will be used for data storage.

 if [ -n "${emmc_data_partition}" ] && [ -n "${nvme_data_partition}" ]; then
     echo "Found both eMMC and NVMe data disk devices, marking eMMC as disabled"
     disable_data_partition "${emmc_data_partition}"
+    echo "System will use NVMe data partition: ${nvme_data_partition}"
+elif [ -n "${nvme_data_partition}" ]; then
+    echo "System will use NVMe data partition: ${nvme_data_partition}"
+elif [ -n "${emmc_data_partition}" ]; then
+    echo "System will use eMMC data partition: ${emmc_data_partition}"
 fi

Line range hint 1-62: Consider adding disk health validation

Before selecting or disabling data partitions, it would be beneficial to validate:

  1. Disk health (SMART status for NVMe)
  2. Filesystem integrity
  3. Partition table consistency

This would prevent selecting a potentially failing drive as the primary data partition.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 58eccc8 and 900407f.

📒 Files selected for processing (1)
  • buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach (2 hunks)
🔇 Additional comments (3)
buildroot-external/rootfs-overlay/usr/libexec/haos-data-disk-detach (3)

15-17: Error handling concern in previous review remains valid


51-62: LGTM: Non-Yellow device logic maintains backward compatibility

The handling of external data disks for non-Yellow devices remains unchanged, which aligns with the PR objectives.


33-36: Verify NVMe disk selection criteria

The script assumes the first NVMe disk (nvme0*) is the intended target. This might not be true in systems with multiple NVMe drives.

✅ Verification successful

NVMe disk selection is correctly implemented

The script's handling of NVMe disk selection is appropriate for its use case. The code implements a clear priority system where:

  1. It detects all partitions labeled "hassos-data"
  2. For Yellow variant, it specifically handles NVMe (nvme0*) and eMMC devices
  3. If both NVMe and eMMC data partitions exist, it prioritizes NVMe by disabling the eMMC partition
  4. Any other data partitions on devices that aren't the root device are disabled

The assumption of using the first NVMe disk (nvme0*) is intentional as part of the system's design, and the script includes proper safeguards to handle various disk configurations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there could be multiple NVMe drives in the system
if grep -q "nvme" /proc/partitions; then
    echo "NVMe drives found in system:"
    grep "nvme" /proc/partitions
fi

Length of output: 81


Script:

#!/bin/bash
# Let's analyze the codebase to understand the NVMe disk handling logic
# Search for files containing nvme-related logic
rg -l "nvme" --type sh

# Then look for patterns related to NVMe disk detection/handling
rg "nvme[0-9]" --type sh -C 3

Length of output: 54


Script:

#!/bin/bash
# Let's check the content of the file under review and related files
fd "haos-data-disk" --type f --exec cat {}

# Also check for any documentation or configuration related to disk handling
fd -e md -e conf -e yaml -e txt | grep -i "disk\|storage" | xargs cat

Length of output: 2712

Copy link
Member

@agners agners left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Really nice and clean 🤩

@sairon sairon merged commit 98ac7f0 into dev Nov 21, 2024
3 checks passed
@sairon sairon deleted the yellow-always-prefer-nvme-datadisk branch November 21, 2024 18:42
sairon added a commit that referenced this pull request Nov 22, 2024
If HAOS on Yellow is booted for the first time with NVMe data disk present, it
should be preferred over the empty eMMC data partition. This will ease
reinstall of the system and migration from CM4 to CM5. All other data disks
(e.g. if a USB drive is used for them) are still treated as before, requiring
manual adoption using the Supervisor repair.

(cherry picked from commit 98ac7f0)
@sairon sairon mentioned this pull request Nov 22, 2024
@sairon sairon mentioned this pull request Dec 3, 2024
sairon added a commit that referenced this pull request Dec 12, 2024
If data disk is adopted on Yellow using the mechanism added in #3686, it
contains RAUC version information that is very likely invalid. In such case,
remove the file on first boot and have it recreated by the raucdb-update
service.
sairon added a commit that referenced this pull request Dec 12, 2024
If data disk is adopted on Yellow using the mechanism added in #3686, it
contains RAUC version information that is very likely invalid. In such case,
remove the file on first boot and have it recreated by the raucdb-update
service.
This was referenced Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
board/yellow Home Assistant Yellow cla-signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants