Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commitChanges async api #727

Merged
merged 11 commits into from
Nov 17, 2023

Conversation

AnkitaSuman07
Copy link

@AnkitaSuman07 AnkitaSuman07 commented Oct 24, 2023

Hi John,
Please review. I am testing SET commands and just did basic testing now. This might not be ready to merge yet until I validate the changes but please provide your review comments. Thanks

@AnkitaSuman07
Copy link
Author

Attaching logs file for deadlock issues observed.
448_endWriteBatch_async_crash.txt
logs_2.txt

@yzhao244
Copy link

@paulmchen @JohnSully Hi John, here is the PR for the async commit which we went over last Friday. Please help review when you get a chance. @AnkitaSuman07 For the issue you are facing today, please share more details in this PR with John.

@AnkitaSuman07
Copy link
Author

@paulmchen @JohnSully Hi John, here is the PR for the async commit which we went over last Friday. Please help review when you get a chance. @AnkitaSuman07 For the issue you are facing today, please share more details in this PR with John.

Hi John, I observed the AePostFunction callback function is not processed and waits forever. There was no keydb crash observed but there is this never ending wait in the task queue.
23639:23872:M 31 Oct 2023 02:51:33.952 # redisDbPersistentData::processStorageToken
23639:23872:M 31 Oct 2023 02:51:33.952 # RocksEncoderStorageProvider::beginWriteBatch
23639:23872:M 31 Oct 2023 02:51:33.952 # RocksEncoderStorageProvider::begin_endWriteBatch
23639:23873:M 31 Oct 2023 02:51:33.952 # redisDbPersistentData::processStorageToken
23639:23872:M 31 Oct 2023 02:51:33.952 # Populating token objects
23639:23873:M 31 Oct 2023 02:51:33.953 # RocksEncoderStorageProvider::beginWriteBatch
23639:23873:M 31 Oct 2023 02:51:33.953 # RocksEncoderStorageProvider::begin_endWriteBatch
23639:23873:M 31 Oct 2023 02:51:33.953 # Populating token objects
23639:23874:M 31 Oct 2023 02:51:33.990 # redisDbPersistentData::processStorageToken
23639:23874:M 31 Oct 2023 02:51:33.991 # redisDbPersistentData::processStorageToken
23639:23874:M 31 Oct 2023 02:51:33.991 # RocksEncoderStorageProvider::beginWriteBatch
23639:23874:M 31 Oct 2023 02:51:33.991 # RocksEncoderStorageProvider::begin_endWriteBatch
23639:23874:M 31 Oct 2023 02:51:33.991 # Populating token objects
23639:23871:M 31 Oct 2023 02:51:33.991 # redisDbPersistentData::processStorageToken
23639:23871:M 31 Oct 2023 02:51:33.992 # RocksEncoderStorageProvider::beginWriteBatch
23639:23871:M 31 Oct 2023 02:51:33.992 # RocksEncoderStorageProvider::begin_endWriteBatch
23639:23871:M 31 Oct 2023 02:51:33.992 # Populating token objects
23639:23873:M 31 Oct 2023 02:51:34.081 # redisDbPersistentData::processStorageToken
23639:23873:M 31 Oct 2023 02:51:34.081 # RocksEncoderStorageProvider::beginWriteBatch
23639:23873:M 31 Oct 2023 02:51:34.081 # RocksEncoderStorageProvider::begin_endWriteBatch
23639:23873:M 31 Oct 2023 02:51:34.081 # Populating token objects

image

Please let me know if you can see anything is not correct

@paulmchen
Copy link
Contributor

@JohnSully John, could you please take a look at this PR and see if the change makes sense. Thank you.

@yzhao244
Copy link

yzhao244 commented Nov 3, 2023

  1. What problem i have when applying this commit
    memtier got stuck in the middle, and it seems threads are hanging somewhere in the code logic so that either building a new redis-cli command or run a redis cli command is not possible, redis cli commands just got stuck

  2. How i can reproduce it
    When running the following memtiers on client machine, the memtier process may get stuck at a couple of percent or more and please note that this issue is not reproducible consistently because sometimes the same memtier script was able to finish the whole run.
    ./memtier_benchmark -s 172.25.0.89 -p 8000 -t 8 -c 50 -n allkeys -d 32 --random-data --randomize --key-minimum=1 --key-maximum=6000000 --ratio 1:0 --key-pattern=P:P --hide-histogram --cluster-mode
    ./memtier_benchmark -s 172.25.0.89 -p 8000 -c 2500 -t 4 -n 15000 --random-data --randomize --distinct-client-seed -d 32 --key-maximum=150000000 --key-minimum=1 --ratio=1:0 --key-pattern=P:P --show-config --cluster-mode

  3. What i am thinking could be, please check which part of change that i made
    The happy flow should be aePostFunction makes the callback to processStorageToken, and eventually invokes complete_endWriteBatch
    However, sometimes, occasionally, processStorageToken callback was not being invoked, neither the complete_endWriteBatch. And both keydb logs and memtier command just stuck there.
    We may suspect some locking things or maybe how writebatch used , may contribute to some complication which occasionally contributes to this threads hanging scenario.
    Really appreciate it if you can take a look at our implementation and see anything looks a bit suspcious.
    We have put a log entry in beginWriteBatch, processStorageToken, complete_endWriteBatch for tracking when the functions are executed.

the following is subset of log since we log everytime when those functions are invoked and the whole log is huge. The problem is not consistent reproducible so we had to let the memtier run a bit while, eventually it stuck at beginwritebatch.
image
keydb-subset.log

@paulmchen @JohnSully

@paulmchen
Copy link
Contributor

@JohnSully Could you please check the issue mentioned in the previous message by @yzhao244? We think the problem could be listed in 3, but we would like to hear your thoughts. Thank you.

@yzhao244
Copy link

yzhao244 commented Nov 5, 2023

@JohnSully Also, we would like to ask your opinion how to block clients in commitChanges.

void redisDbPersistentData::commitChanges(const redisDbPersistentDataSnapshot **psnapshotFree)
{

    std::unordered_set<client *> setcBlocked; //How to block clients in commitChanges
    if (m_pdbSnapshotStorageFlush)
    {
        dictIterator *di = dictGetIterator(m_dictChangedStorageFlush);
        dictEntry *de;
        while ((de = dictNext(di)) != nullptr)
        {
            serializeAndStoreChange(m_spstorage.get(), (redisDbPersistentData*)m_pdbSnapshotStorageFlush, (const char*)dictGetKey(de), (bool)dictGetVal(de));
        }
        dictReleaseIterator(di);
        dictRelease(m_dictChangedStorageFlush);
        m_dictChangedStorageFlush = nullptr;
        *psnapshotFree = m_pdbSnapshotStorageFlush;
        m_pdbSnapshotStorageFlush = nullptr;
    }
    
    if (m_spstorage != nullptr)
    {
        auto *tok = m_spstorage->begin_endWriteBatch(serverTL->el, storageLoadCallback);
        if (tok != nullptr)
        {
            for (client *c : setcBlocked) //need to check how to push client to blocked list
            {
                if (!(c->flags & CLIENT_BLOCKED))
                    blockClient(c, BLOCKED_STORAGE);
            }
           // tok->setc = std::move(setcBlocked);
            tok->db = this;
            tok->type = StorageToken::TokenType::BatchWrite;
        }
    }
}

src/db.cpp Outdated Show resolved Hide resolved
src/db.cpp Outdated
auto *tok = m_spstorage->begin_endWriteBatch(serverTL->el, storageLoadCallback);
if (tok != nullptr)
{
for (client *c : setcBlocked) //need to check how to push client to blocked list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code doesn't work lets remove it.

src/db.cpp Outdated Show resolved Hide resolved
src/db.cpp Show resolved Hide resolved
src/storage/rocksdb.cpp Outdated Show resolved Hide resolved
src/storage/rocksdb.cpp Outdated Show resolved Hide resolved
@yzhao244
Copy link

@JohnSully a quick update. Ankita has resolved code review comments. :)

@JohnSully JohnSully merged commit 1d59efb into Snapchat:async_flash Nov 17, 2023
1 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants