Select the most appropriate VM OS image based on Levenshtein Distance #89

Merged on May 30, 2024 (1 commit)

yunkon-kim
Member

This PR adds an initial function that selects the most appropriate VM OS image based on Levenshtein distance.

  • Add a Levenshtein distance mechanism
  • Select keywords that represent the server OS information from the source computing infrastructure
  • Calculate the text similarity between the keywords and each VM OS image ID/description
  • Select the VM OS image with the highest similarity value

(see Levenshtein distance - wikipedia)

Background

Matching server OS information to VM OS image information is difficult. For example, consider the following two texts.

text1: "ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD"
: Keywords extracted and selected one by one from the physical machine information of the source computing infrastructure

text2: "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20220609"
: The name of a VM OS image on AWS

As the pairs "22.04.4" and "22.04", "x86_64" and "amd64", and "SSD" and "hvm-ssd" show, the values are similar but not identical. A plain string comparison would simply report them as different.

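Thus, we need the Levenshtein distance algorithm. It counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into the other, which lets us score how similar two texts are instead of only testing them for equality.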
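
To make the idea concrete, here is a minimal Python sketch of the Levenshtein distance and one common way to turn it into a similarity value. The function names `levenshtein` and `similarity`, and the normalization by the longer text's length, are assumptions for illustration only; this description does not show how the PR computes its scores.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical texts, lower for more edits."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Pairs taken from the two texts above (lowercased for comparison).
for left, right in [("22.04.4", "22.04"), ("x86_64", "amd64"), ("ssd", "hvm-ssd")]:
    print(left, right, left == right, round(similarity(left, right), 3))
```

An exact comparison returns `False` for all three pairs, whereas the similarity value is graded, so near-matches can still be ranked against each other.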

Example

The VM OS image in Case 2 is selected because its text similarity score is higher than that of Case 1.

Case 1:

  • Base text: "ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD"
  • VM OS image name 1: "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20191002"
  • Text similarity score: 27.150

Case 2:

  • Base text: "ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD"
  • VM OS image name 2: "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20220609"
  • Text similarity score: 29.625
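
The selection step in this example can be sketched as follows. This is a hypothetical Python illustration, not the PR's implementation: the tokenizer, the per-keyword best-match scoring, and the names `tokenize`, `score`, and `select_best_image` are all assumptions, so the numbers it produces will not match the scores 27.150 and 29.625 listed above.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (0 if ca == cb else 1),
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization into the range [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tokenize(text: str) -> list[str]:
    """Assumed rule: lowercase, then split on anything except letters, digits, and dots."""
    return [t for t in re.split(r"[^a-z0-9.]+", text.lower()) if t]


def score(base_text: str, image_name: str) -> float:
    """Sum, over the base keywords, of the best similarity to any image-name token."""
    image_tokens = tokenize(image_name)
    return sum(
        max((similarity(keyword, token) for token in image_tokens), default=0.0)
        for keyword in tokenize(base_text)
    )


def select_best_image(base_text: str, image_names: list[str]) -> str:
    """Return the image name with the highest score against the base text."""
    return max(image_names, key=lambda name: score(base_text, name))


base = "ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD"
images = [
    "ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20191002",
    "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20220609",
]
print(select_best_image(base, images))
```

With these assumed rules, the sketch also prefers the Case 2 image, because the keywords "jammy" and "22.04.4" find close matches only in that name.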

@yunkon-kim yunkon-kim requested a review from seokho-son as a code owner May 30, 2024 12:55
@yunkon-kim
Member Author

/approve

@github-actions github-actions bot added the approved This PR is approved and will be merged soon. label May 30, 2024
@cb-github-robot cb-github-robot merged commit 4873b50 into cloud-barista:main May 30, 2024
2 checks passed
@yunkon-kim yunkon-kim deleted the 240530-20 branch May 31, 2024 09:28