You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Mar 22, 2023. It is now read-only.
Iterators are more user-friendly (especially in languages like Python or Java). The iterators API is also more 'composable'. Since finding an element and modifying/reading from it would be separate actions we would not have to expose functions like (get_lower, get_higher, update_lower, update_higher, etc.).
Description
The capabilities of iterators will be at least the same as of existing get/get_all/get_above functions. With iterators we can also implement update feature without callbacks (#387).
API Changes
C++
classdb {
template <bool Const>
iterator new_iterator(); // TODO: should be pass some configuration structure to this func to allow for future extensions?
};
namespaceinternal {
template <bool IsConst>
classiterator {
public:iterator(); // in ctor, we might take global locks if necessary (e.g. for csmap)~iterator(); // drops any locksiterator(const iterator&) = delete;
iterator(iterator&&);
iterator& operator=(iterator&&); // TODO: do we want move assignment op?
iterator& operator=(const iterator&) = delete;
status seek(string_view key);
status seek_lower(string_view key);
status seek_lower_eq(string_view key);
status seek_higher(string_view key);
status seek_higher_eq(string_view key);
status seek_to_first();
status seek_to_last();
status next();
status prev(); // it can return NOT_SUPPORTED (e.g, for csmap)
result<string_view, status> key();
result<string_view, status> value(); // only for IsConst
result<accessor, status> value(); // only for !IsConst, starts a transaction// TODO: should this function consume iterator?
};
/* For some engines (e.g. cmap) accessor takes locks internally and starts a transacion * which makes it impossible to create other iterators while accessor is live. For now, we should just * prohibit using multiple accessors for pmemobj engines. We can relax this requirement if there is * an active pmemkv transaction (see description below for batch update). */classaccessor {
result<slice<InputIt<char>>, status> read_range(size_t pos, size_t n); // only for reading
result<slice<OutputIt<char>>, status> write_range(size_t pos, size_t n); // only for writing (can point to the same memory as read_range or to WAL).// It is unspecified whether writes made through write_range are visible though read_range// TODO: should we also have read_for_update_range here (for MVCC based engines)?// TODO: if commit and abort consumes iterator (like proposed in C API) maybe it would be better// to make those functions as standalone functions taking rvalue ref, e.g.: status commit(accessor&&);
status commit();
voidabort();
};
}
// Based on https://doc.rust-lang.org/std/result/enum.Result.htmltemplate <typename Ok, typename Err>
classresult {
boolis_ok();
boolis_error();
// std::optional is available only since C++17 so we might need to think about some other approach
std::optional<Ok> ok(); // discards error, if any
std::optional<Err> err(); // discards value, if any
};
using read_iterator = internal::iterator<true>;
using write_iterator = internal::iterator<false>;
C
typedefstructpmemkv_read_iteratorpmemkv_read_iterator;
typedefstructpmemkv_write_iteratorpmemkv_write_iterator;
typedefstructpmemkv_accessorpmemkv_accessor;
pmemkv_read_iterator*pmemkv_read_iterator_new();
pmemkv_write_iterator*pmemkv_write_iterator_new();
voidpmemkv_read_iterator_delete(pmemkv_read_iterator*);
voidpmemkv_write_iterator_delete(pmemkv_write_iterator*);
// TODO: since we have two iterator types we either have to implement all those functions twice or take some generic// paramater like void* or pmemkv_iterator*. In case of pmemkv_iterator* user would have to explicitly cast the pointer. intpmemkv_iterator_seek(void*it, constchar*k, size_tkb);
intpmemkv_iterator_seek_lower(void*it, constchar*k, size_tkb);
intpmemkv_iterator_seek_lower_eq(void*it, constchar*k, size_tkb);
intpmemkv_iterator_seek_higher(void*it, constchar*k, size_tkb);
intpmemkv_iterator_seek_higher_eq(void*it, constchar*k, size_tkb);
intpmemkv_iterator_seek_to_first(void*it);
intpmemkv_iterator_seek_to_last(void*it);
intpmemkv_iterator_next(void*it);
intpmemkv_iterator_prev(void*it);
intpmemkv_iterator_key(void*it, constchar**k, size_t*kb);
intpmemkv_iterator_key(void*it, constchar**k, size_t*kb);
intpmemkv_read_iterator_value(pmemkv_read_iterator*it, constchar**val, size_t*kb);
intpmemkv_write_iterator_value(pmemkv_write_iterator*it, pmemkv_accessor**acc);
intpmemkv_accessor_read_range(pmemkv_accessor*acc, size_tpos, size_tn, constchar**data, size_t*kb);
intpmemkv_accessor_write_range(pmemkv_accessor*acc, size_tpos, size_tn, char**data, size_t*kb);
// both commit and abort will consume the accessor (destroy it)intpmemkv_accessor_commit(pmemkv_accessor*);
voidpmemkv_accessor_abort(pmemkv_accessor*);
// void pmemkv_accessor_delete(pmemkv_accessor*); // probably not needed if commit/abort consumes the iterator
Batch updates (Just an initial idea)
For current engines (at least cmap) we cannot use iterators inside of a transaction.
If we would like to add support for batch updates the only way would be to first obtain iterators
and only then start transaction/batch operation like this (this assumes we can hold several iterators per thread):
auto it1 = db.new_iterator<false>().seek("k1");
auto it2 = db.new_iterator<false>().seek("k2");
// auto acc0 = it1.get().accessor;// acc0.write_range(...);// TODO: if commit was not called before starting a tx, this is an error (or do we want the db.tx() to just create an inner tx?)auto tx = db.tx();
auto acc1 = it1.get().accessor;
auto r1 = acc1.write_range(0, 10);
std::memset(r1.begin(), 2, r1.size());
// acc1.commit() // TODO: this is an error since this would commit the outer tx OR this does nothingauto acc2 = it2.get().accessor;
auto r2 = acc2.write_range(0, 10);
std::memset(r2.begin(), 2, r2.size());
tx.update(std::move(acc1)); // consumes accessor and transfers any locks, any modifications to acc1 are visible only after tx.commit()
tx.update(std::move(acc2));
tx.commit();
Holding mutliple iterators
The biggest problem with holding multiple iterators (for concurrent engines) is that we can call lock on an already locked element. A solution to that might be keeping a list of all active iterators and if a user requests second iterator to the same element we just return him the same one (wrapped in some shared_ptr?). If then one of the iterators is modified (via this shared_ptr) we can use CoW approach to just modify one of them.
Other methods
The problem is similar to multiple iterators. If we call get(key) while already holding an iterator for key we might get undefined behavior. For put/get we could detect if we already hold an iterator and return appropriate status (calling get/put on different keys can suceed). For methods like defrag() we should probably always return an error in case any iterator is held.
@karczex are you suggesting begin() and end() for db class? I think it's better to have just seek_to_first/last in iterator class to avoid duplication. We wouldn'y be able to just replace new_iterator() with begin or end because that would decrease performance (we would have to lock the first or last element even if you just want to seek to some other one).
Well, traditionally begin()/end() returns you the iterator. Here, I already have an iterator and want to proceed.
Update from the meeting: any checks (for single accessor/single iterator per thread) should probably be done just in debug (asserts). Unless we want to actually track all iterator, then returning a status in case we call get() on key which is already hold by iterator would not introduce much additional overhead.
FEAT: Iterators
Rationale
Iterators are more user-friendly (especially in languages like Python or Java). The iterators API is also more 'composable'. Since finding an element and modifying/reading from it would be separate actions we would not have to expose functions like (get_lower, get_higher, update_lower, update_higher, etc.).
Description
The capabilities of iterators will be at least the same as of existing get/get_all/get_above functions. With iterators we can also implement update feature without callbacks (#387).
API Changes
C++
C
Batch updates (Just an initial idea)
For current engines (at least cmap) we cannot use iterators inside of a transaction.
If we would like to add support for batch updates the only way would be to first obtain iterators
and only then start transaction/batch operation like this (this assumes we can hold several iterators per thread):
Holding mutliple iterators
The biggest problem with holding multiple iterators (for concurrent engines) is that we can call lock on an already locked element. A solution to that might be keeping a list of all active iterators and if a user requests second iterator to the same element we just return him the same one (wrapped in some shared_ptr?). If then one of the iterators is modified (via this shared_ptr) we can use CoW approach to just modify one of them.
Other methods
The problem is similar to multiple iterators. If we call get(key) while already holding an iterator for
key
we might get undefined behavior. For put/get we could detect if we already hold an iterator and return appropriate status (calling get/put on different keys can suceed). For methods like defrag() we should probably always return an error in case any iterator is held.Ref:
#387
#713
The text was updated successfully, but these errors were encountered: