Select the most appropriate VM OS image based on Levenshtein Distance #89
This PR adds an initial function that selects the most appropriate VM OS image based on Levenshtein distance.
(see "Levenshtein distance" on Wikipedia)
Background
Matching a server's OS information against VM OS image information is difficult. For example, consider the following two texts.
text1: "ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD"
: It is built from values extracted and selected one by one from the physical machine info of the source computing infra.
text2: "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20220609"
: It is the name of an AWS VM OS image.
As you can see, the pairs "22.04.4" and "22.04", "x86_64" and "amd64", and "SSD" and "hvm-ssd" are similar but not identical. A plain string comparison would simply report these values as different.
Thus, we need the Levenshtein distance algorithm. It counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into the other, which lets us score how similar two texts are.
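To make the idea concrete, here is a minimal Python sketch of the approach. It is not the code in this PR: the function names (`levenshtein_distance`, `similarity`, `select_best_image`), the lowercasing step, and the normalization by the longer string's length are all assumptions chosen for illustration, and the second candidate image name in the usage part is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    a, b = a.lower(), b.lower()  # assumption: comparison is case-insensitive
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def select_best_image(server_os: str, image_names: list[str]) -> str:
    """Pick the image name with the highest similarity to the server OS text."""
    return max(image_names, key=lambda name: similarity(server_os, name))


if __name__ == "__main__":
    text1 = "ubuntu 22.04.4 LTS (Jammy Jellyfish) x86_64 SSD"
    candidates = [
        "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20220609",
        "example-windows-server-image",  # hypothetical name, for contrast only
    ]
    for name in candidates:
        print(f"{similarity(text1, name):.3f}  {name}")
    print("selected:", select_best_image(text1, candidates))
```

Normalizing by the longer string keeps the score in the range 0 to 1, so candidates of different lengths can be compared directly. Other normalizations are possible and may rank candidates differently.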
Example
The VM OS image ID in Case 2 is selected because Case 2 has a higher text similarity score than Case 1.
Case 1:
Case 2: