Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed unnecessary migration of ROOT volumes with KVM #8908

Merged
merged 1 commit into from
Jun 12, 2024

Conversation

JoaoJandre
Copy link
Contributor

Description

Using KVM as a hypervisor, when performing a live migration of a VM with its datadisks, the ROOT volume is also migrated, even if the user has not requested this action. Since this migration of the root volume is unnecessary and causes greater latency for the live migrate process, this behavior has been changed. Now, when migrating a VM and its datadisks, only the volumes selected for migration will actually be migrated.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Description of tests

A new VM was created with a root and a data volume for testing; Several live migration scenarios were tested. The cases are described below.

The following migrations occurred successfully, in each case, when only the datadisk was moved, a dump of the VM XML was performed to verify that the ROOT volume had not been moved.

Volume Origin Destination
ROOT iSCSI1 iSCSI2
DATA1 iSCSI1 ---
Volume Origin Destination
ROOT iSCSI1 ---
DATA1 iSCSI1 iSCSI2
Volume Origin Destination
ROOT iSCSI1 iSCSI2
DATA1 iSCSI1 iSCSI2
Volume Origin Destination
ROOT iSCSI1 Local
DATA1 iSCSI1 Local
Volume Origin Destination
ROOT Local iSCSI1
DATA1 Local iSCSI1
Volume Origin Destination
ROOT iSCSI1 NFS1
DATA1 iSCSI1 NFS1
Volume Origin Destination
ROOT iSCSI1 NFS1
DATA1 iSCSI1 ---
Volume Origin Destination
ROOT iSCSI1 ---
DATA1 iSCSI1 NFS1
Volume Origin Destination
ROOT NFS1 NFS2
DATA1 NFS1 NFS2
Volume Origin Destination
ROOT NFS1 Local
DATA1 NFS1 Local

The following migrations ended in error. However, the error is expected, because, as described in #8907, when migrating the VM + volume and not migrating all NFS volumes, an error occurs. This error will be fixed in another PR.

Volume Origin Destination
ROOT NFS1 Local
DATA1 NFS1 ---
Volume Origin Destination
ROOT NFS1 ---
DATA1 NFS1 Local
Volume Origin Destination
ROOT NFS1 NFS2
DATA1 NFS1 ---
Volume Origin Destination
ROOT NFS1 NFS2
DATA1 NFS1 ---
Volume Origin Destination
ROOT ISCSI1 ---
DATA1 NFS1 ISCSI1
Volume Origin Destination
ROOT NFS1 ---
DATA1 ISCSI1 NFS1

The following migration ended in error; however, this case is due to another bug that was also described in #8907: when migrating to an NFS storage, ACS will validate if the destination host has access to the source storage. This causes an issue when migrating from local storage to NFS storage, as the destination host will never have direct access to the source host's local storage. This bug will be fixed in another PR.

Volume Origin Destination
ROOT Local NFS1
DATA1 Local NFS1

@codecov-commenter
Copy link

codecov-commenter commented Apr 12, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 17.12%. Comparing base (a5508ac) to head (42aea70).
Report is 561 commits behind head on main.

Files Patch % Lines
...n/java/com/cloud/vm/VirtualMachineManagerImpl.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8908      +/-   ##
============================================
+ Coverage     13.17%   17.12%   +3.95%     
- Complexity     9214    15262    +6048     
============================================
  Files          2725     4880    +2155     
  Lines        258235   327081   +68846     
  Branches      40249    46227    +5978     
============================================
+ Hits          34013    56027   +22014     
- Misses       219913   262982   +43069     
- Partials       4309     8072    +3763     
Flag Coverage Δ
simulator-marvin-tests 17.12% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sureshanaparti
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9254

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed make sense, thanks for the extensive test report

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@DaanHoogland
Copy link
Contributor

@blueorangutan test rocky8 kvm-rocky8

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (rocky8 mgmt + kvm-rocky8) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian Build Failed (tid-9831)

@DaanHoogland
Copy link
Contributor

@blueorangutan test alma9 kvm-alma9

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-9835)
Environment: kvm-rocky8 (x2), Advanced Networking with Mgmt server r8
Total time taken: 47181 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8908-t9835-kvm-rocky8.zip
Smoke tests completed. 127 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 310.81 test_events_resource.py
test_01_events_resource Error 310.82 test_events_resource.py
test_04_deploy_vm_for_other_user_and_test_vm_operations Failure 88.69 test_network_permissions.py
ContextSuite context=TestNetworkPermissions>:teardown Error 1.49 test_network_permissions.py

@blueorangutan
Copy link

[SF] Trillian test result (tid-9839)
Environment: kvm-alma9 (x2), Advanced Networking with Mgmt server a9
Total time taken: 50419 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8908-t9839-kvm-alma9.zip
Smoke tests completed. 127 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 323.90 test_events_resource.py
test_01_events_resource Error 323.91 test_events_resource.py
test_04_deploy_vm_for_other_user_and_test_vm_operations Failure 90.35 test_network_permissions.py
ContextSuite context=TestNetworkPermissions>:teardown Error 1.48 test_network_permissions.py

@blueorangutan
Copy link

[SF] Trillian test result (tid-9838)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 52231 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8908-t9838-kvm-centos7.zip
Smoke tests completed. 126 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 306.31 test_events_resource.py
test_01_events_resource Error 306.32 test_events_resource.py
test_04_deploy_vm_for_other_user_and_test_vm_operations Failure 90.42 test_network_permissions.py
ContextSuite context=TestNetworkPermissions>:teardown Error 1.36 test_network_permissions.py
test_01_redundant_vpc_site2site_vpn Failure 439.87 test_vpc_vpn.py

@JoaoJandre
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9292

@JoaoJandre
Copy link
Contributor Author

@DaanHoogland could we rerun the CI here?

@JoaoJandre
Copy link
Contributor Author

@blueorangutan package

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian Build Failed (tid-10208)

@JoaoJandre
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@JoaoJandre a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9639

@JoaoJandre
Copy link
Contributor Author

@DaanHoogland could we try again?

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-10248)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 50810 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8908-t10248-kvm-centos7.zip
Smoke tests completed. 128 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 421.49 test_events_resource.py
test_02_trigger_shutdown Failure 351.78 test_safe_shutdown.py
test_01_redundant_vpc_site2site_vpn Failure 446.44 test_vpc_vpn.py

Copy link
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the extensive testing @JoaoJandre

@borisstoyanov borisstoyanov merged commit 3b8d220 into apache:main Jun 12, 2024
5 of 23 checks passed
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jun 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants