Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move-stdlib] Introduce efficient OrderedMap implementation #14872

Open
wants to merge 2 commits into
base: igor/native_compare_move_part
Choose a base branch
from

Conversation

igor-aptos
Copy link
Contributor

Description

Implement efficient OrderedMap implementation in move-stdlib, to replace/deprecate SimpleMap. It is an in-memory datastructure - i.e. doesn't store anything in separate resources.

Implementation is a SortedVectorMap, but given the efficiency improvements of vector operations with memcpy, it's performance characteristics are extremely good.

Running different sizes tests (these are full results), we get milliseconds needed for 100 inserts and 100 removals on maps of different sizes:

num elements  SimpleMap       OrderedMap::SortedVectorMap
10            6.4             6.8
100           25.4            10.3
1000          673.9           12.8
10000         too long        19.5

Basically achieving log(n) performance, with very simple implementation.

How Has This Been Tested?

provided unit tests

Key Areas to Review

Type of Change

  • New feature

Which Components or Systems Does This Change Impact?

  • Aptos Framework

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

Copy link

trunk-io bot commented Oct 4, 2024

⏱️ 4h 32m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
execution-performance / single-node-performance 1h 41m 🟥🟥🟥🟥
execution-performance / test-target-determinator 17m 🟩🟩🟩🟩🟩
rust-move-unit-coverage 16m 🟩
rust-move-unit-coverage 14m 🟩
test-target-determinator 14m 🟩🟩🟩🟩
rust-move-unit-coverage 14m 🟩
rust-cargo-deny 14m 🟩🟩🟩🟩🟩 (+3 more)
rust-move-unit-coverage 11m 🟩
rust-move-unit-coverage 10m 🟩
rust-move-unit-coverage 9m 🟩
check-dynamic-deps 8m 🟩🟩🟩🟩🟩 (+4 more)
rust-move-tests 4m 🟥
semgrep/ci 4m 🟩🟩🟩🟩🟩 (+5 more)
rust-move-tests 4m 🟥
rust-move-tests 4m 🟥

🚨 2 jobs on the last run were significantly faster/slower than expected

Job Duration vs 7d avg Delta
execution-performance / single-node-performance 37m 16m +127%
test-target-determinator 3m 4m -24%

settingsfeedbackdocs ⋅ learn more about trunk.io

/// Uses cmp::compare for ordering, which compares primitive types natively, and uses common
/// lexicographical sorting for complex types.
module aptos_std::ordered_map {
// friend aptos_std::big_ordered_map;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be uncommented in the PR with big_ordered_map

/// The OrderedMap datastructure.
enum OrderedMap<K, V> has drop, copy, store {
/// sorted-vector based implementation of OrderedMap
SortedVectorMap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

Copy link

codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.4%. Comparing base (41a0b13) to head (76e0b76).

Additional details and impacted files
@@                 Coverage Diff                  @@
##           igor/native_compare   #14872   +/-   ##
====================================================
  Coverage                 57.4%    57.4%           
====================================================
  Files                      859      859           
  Lines                   211663   211663           
====================================================
  Hits                    121527   121527           
  Misses                   90136    90136           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

let len = self.entries.length();
let index = binary_search(key, &self.entries, 0, len);
assert!(index < len, error::invalid_argument(EKEY_NOT_FOUND));
let Entry { key: old_key, value } = self.entries.remove(index);
Copy link
Contributor

@lightmark lightmark Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vector does not keep the order here? IIRC, it swap and pop_back. Did you change that behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove keeps the order. swap_remove doesn't

/// Aborts with EKEY_ALREADY_EXISTS if key already exist.
public fun add<K, V>(self: &mut OrderedMap<K, V>, key: K, value: V) {
let len = self.entries.length();
let index = binary_search(&key, &self.entries, 0, len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should introduce iterator to vector too so you don't need to know the length anymore for assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fine to me

/// Changes the key, with keeping the same value attached to it
/// Aborts with EKEY_NOT_FOUND if `old_key` doesn't exist.
/// Aborts with ENEW_KEY_NOT_IN_ORDER if `new_key` doesn't keep the order `old_key` was in.
public(friend) fun replace_key_inplace<K: drop, V>(self: &mut OrderedMap<K, V>, old_key: &K, new_key: K) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, any use case for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems not generally useful, but in BTreeMap keys here represent max_key of a child subtree, and so I need to modify it in place.

return;
};

// TODO: can be implemented more efficiently, as we know both maps are already sorted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but issue is here we have vectors, not lists, and so how to do it in-place, without moving too many elements.

Easiest is to can either create a new vector, populate, and then swap (I'll probably just do that)

if "vector" here had capacity, I could start from the end, without temporary storage, but I don't have a way to create empty values here (i.e. have this work without value being drop+copy)

f(iter.iter_borrow_key(self), iter.iter_borrow(self));
iter = iter.iter_next(self);
}
// vector::for_each_ref(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entry is not a public type, so when this get's inlined in the caller, it cannot do compile.

basically inline functions need to be implemented only using public API.

I can remove this snippet as well


// return index containing the key, or insert position.
// I.e. index of first element that has key larger or equal to the passed `key` argument.
fun binary_search<K, V>(key: &K, entries: &vector<Entry<K, V>>, start: u64, end: u64): u64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a standard Entry and implement binary search only once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am writing binary search only once.

If you mean for binary search to be reusable, so other's don't need to write it, maybe that should be an inline function in vector

Moving Entry to a standard separate file, makes all field accesses to require getter and setter functions, not sure that's best

Copy link
Contributor

@lightmark lightmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing work! Thanks @igor-aptos . You are the hero!

@igor-aptos igor-aptos force-pushed the igor/native_compare branch 2 times, most recently from 12807a4 to 1c15e62 Compare November 9, 2024 00:17
@igor-aptos igor-aptos force-pushed the igor/native_compare branch 5 times, most recently from 3144f47 to 73c4d4d Compare November 13, 2024 21:13
Base automatically changed from igor/native_compare to main November 13, 2024 21:46
@igor-aptos igor-aptos changed the base branch from main to igor/native_compare_move_part November 15, 2024 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants