
Online Change Options

ZengJingtao edited this page Mar 30, 2023 · 4 revisions

Modify configuration online

With ToplingDB, configuration parameters are set in json/yaml. Normally they are fixed once the process starts and cannot be changed while it runs. In practice, however, modifying configuration parameters online is a very real requirement.

On 2022-03-04, we implemented online configuration modification through the embedded Web Service. The objects that can be modified fall into two main categories:

1. DBOptions & CFOptions

In RocksDB, DBOptions & CFOptions are each divided into two types: Mutable and Immutable. Only Mutable Options can be modified online. In theory, we should modify these configurations with the same syntax as json/yaml. RocksDB, however, possibly for interface-safety reasons, uses unordered_map<string, string> to modify the configuration instead of the corresponding Option class.

We keep RocksDB's approach for two reasons. First, simplicity: the not-so-simple alternative is to deeply modify the RocksDB code. Second, there is a practical creed in software engineering: try to make dangerous operations complicated and obscure. Modifying the configuration is a relatively dangerous operation, so it may be a good thing that it is complicated, obscure, and different from Topling's json configuration syntax. Therefore, for DBOptions & CFOptions, we use RocksDB's own syntax:

1.1. For DB_MultiCF (DB with multiple CFs), for example:

{
  "cfo": {
    "default": "max_write_buffer_number=10;level0_stop_writes_trigger=100"
  },
  "dbo": "max_background_compactions=17;delayed_write_rate=15M"
}

cfo means CFOptions (ColumnFamilyOptions) and dbo means DBOptions; both are optional fields. Save the above json as a file mdb-cf-opt.txt, then use the curl command to modify the configuration:

# When using the -d parameter, curl uses the HTTP POST method by default
curl -d @mdb-cf-opt.txt "http://somehost:port/db/strings?html=0"

The json body can also be put directly on the command line, like this:

curl -d '{"cfo": {"default": "max_write_buffer_number=5"}}' "http://somehost:port/db/strings?html=0"
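The request body above can also be assembled from individual key=value pairs. The following is a minimal sketch; `join_opts` and `multicf_body` are hypothetical helper names, not part of ToplingDB, and the sketch assumes option names and values contain no characters that need json escaping.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ToplingDB) that build a DB_MultiCF body.

# join_opts k1=v1 k2=v2 ...  ->  k1=v1;k2=v2  (RocksDB option-string syntax)
join_opts() {
  out=""
  for kv in "$@"; do
    if [ -z "$out" ]; then out="$kv"; else out="$out;$kv"; fi
  done
  printf '%s' "$out"
}

# multicf_body CF_NAME CF_OPTS DB_OPTS
# Emits a json object with "cfo" and/or "dbo"; an empty argument omits that
# field, since both fields are optional.
multicf_body() {
  cf_name=$1; cf_opts=$2; db_opts=$3
  body=""
  if [ -n "$cf_opts" ]; then
    body="\"cfo\": {\"$cf_name\": \"$cf_opts\"}"
  fi
  if [ -n "$db_opts" ]; then
    if [ -n "$body" ]; then body="$body, "; fi
    body="$body\"dbo\": \"$db_opts\""
  fi
  printf '{%s}' "$body"
}

cfo=$(join_opts max_write_buffer_number=10 level0_stop_writes_trigger=100)
dbo=$(join_opts max_background_compactions=17 delayed_write_rate=15M)
multicf_body default "$cfo" "$dbo"; echo
# To send it: curl -d "$(multicf_body default "$cfo" "$dbo")" "http://somehost:port/db/strings?html=0"
```

Building the body in one place makes it harder to send a malformed option string, which matters because this is a relatively dangerous operation.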

1.2. For DB (DB with only one CF), for example:

Note: The configuration string must be enclosed in double quotes. The corresponding curl command is as follows:

curl -d '"delayed_write_rate=8M;max_background_compactions=40"' 'http://somehost:port/db/strings?html=0'

The request body is parsed as json, so the double quotes make the configuration string a valid json string.
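When the option string comes from a shell variable, the quoting can be done by a small helper. This is only a sketch: `json_string` is a hypothetical name, and it escapes just backslashes and double quotes, which is assumed to be enough for typical RocksDB option strings.

```shell
#!/bin/sh
# json_string STR -> STR as a json string literal.
# Hypothetical helper: escapes only backslash and double quote.
json_string() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '"%s"' "$escaped"
}

opts='delayed_write_rate=8M;max_background_compactions=40'
json_string "$opts"; echo
# To send it: curl -d "$(json_string "$opts")" 'http://somehost:port/db/strings?html=0'
```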

1.3. Possible future improvements

With the current syntax, CFOptions & DBOptions are configured with strings. This leaves room for us to support ToplingDB's json syntax for online modification in the future: once the json syntax is supported, the current RocksDB string syntax will remain valid, so compatibility is maintained.

1.4. Tips

a. moderate manual compaction

curl 'http://somehost:port/db/dbname?html=0&compact=default'

This command also has a link in the WebView and can be triggered with a mouse click. It calls the CompactRange function. Its disadvantage is that it cannot run multiple compact jobs concurrently, so execution is very slow. We have submitted a Feature Request for concurrent execution to upstream RocksDB, but it has not yet been implemented.

b. faster manual compaction

curl -d '{"cfo": {"default":"max_bytes_for_level_base=1M"}}' 'http://somehost:port/db/dbname?html=0&indent=2'

Changing max_bytes_for_level_base to a small value triggers automatic compaction, and automatic compaction can fully use the concurrency set by max_background_compactions. With the support of distributed compact, the full compact can be completed quickly, and the parameter can be changed back after completion.

Note that although this method improves concurrency, it increases the amount of computation. Manual Compact compacts layer by layer from the top layer to the bottom layer, whereas this method compacts all layers at the same time. Some lower-layer compactions therefore run before the upper-layer compactions, which results in slightly more work than Manual Compact.
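The lower-then-restore workflow can be written down as a pair of requests. The sketch below only prints the two curl commands instead of running them. `cf_opt_body` and `fast_compact_plan` are hypothetical helper names, and `256M` is a placeholder for whatever value the column family had before, not a recommendation.

```shell
#!/bin/sh
# cf_opt_body CF KEY VALUE -> {"cfo": {"CF": "KEY=VALUE"}}
cf_opt_body() {
  printf '{"cfo": {"%s": "%s=%s"}}' "$1" "$2" "$3"
}

# fast_compact_plan URL CF SMALL ORIGINAL
# Prints (does not execute) the command that lowers max_bytes_for_level_base
# and the command that restores it afterwards.
fast_compact_plan() {
  url=$1; cf=$2; small=$3; original=$4
  printf "curl -d '%s' '%s'\n" \
    "$(cf_opt_body "$cf" max_bytes_for_level_base "$small")" "$url"
  printf '# ...wait until the compactions have finished, then restore:\n'
  printf "curl -d '%s' '%s'\n" \
    "$(cf_opt_body "$cf" max_bytes_for_level_base "$original")" "$url"
}

# 256M is an assumed previous value; substitute the real one.
fast_compact_plan 'http://somehost:port/db/dbname?html=0&indent=2' default 1M 256M
```

Printing the plan first gives a chance to review both commands, in particular the restore value, before anything is sent.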

2. Other objects

Other objects are modified using json syntax.

2.1. Modify MemTableRepFactory

Modifying the MemTableRepFactory online is of great value to us (ToplingDB), because it lets us see the superiority of ToplingCSPPMemTab over SkipList immediately. MemTableRepFactory is Immutable in RocksDB, but we can bypass this limitation by implementing an intermediate layer. When configuring:

"dyna": {
  "class": "Dyna",
  "params": {
    "real_fac": "${cspp}"
  }
}

When modifying online:

# change to cspp
curl -d '{"real_fac": "${cspp}"}' "http://somehost:port/memtable_factory/dyna?html=0"

# change to skiplist
curl -d '{"real_fac": "${skiplist}"}' "http://somehost:port/memtable_factory/dyna?html=0"
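One shell detail in the commands above: `${cspp}` must reach the server literally, so the body has to be single-quoted or the shell will try to expand it as a variable. A hypothetical helper (`real_fac_body` is not a ToplingDB name) that keeps the reference literal might look like this:

```shell
#!/bin/sh
# real_fac_body NAME -> {"real_fac": "${NAME}"}
# The format string is single-quoted, so ${...} is not expanded by the shell.
real_fac_body() {
  printf '{"real_fac": "${%s}"}' "$1"
}

real_fac_body cspp; echo
real_fac_body skiplist; echo
# To send it: curl -d "$(real_fac_body cspp)" "http://somehost:port/memtable_factory/dyna?html=0"
```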

In theory, we could also modify the RocksDB code directly and make memtable_factory Mutable. We chose not to, because that would be a "shotgun" modification: it requires changes in many places across many files and is very expensive. It could also conflict with later changes in upstream RocksDB, which would make it harder for us to keep following upstream.

2.2. Modify DispatcherTableFactory

Through DispatcherTableFactory, the TableFactory used by each layer of the LSM can be modified online, and the effect can be observed immediately:

curl -d '{"level_writers":["fast","fast","fast","fast"]}' "http://somehost:port/table_factory/dispatch?html=0"

The above command changes the first 4 layers (L0~L3) of the LSM to fast; the remaining layers L4~L6 are unchanged.

curl -d '{"level_writers":[null,null,null,"fast"]}' "http://somehost:port/table_factory/dispatch?html=0"

If L0~L2 are already fast, this command is equivalent to the previous one: null means skip, so the existing value for that layer is left unchanged.
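To change a single layer, the null padding can be generated instead of typed by hand. A minimal sketch follows, with the hypothetical helper name `level_writers_body`. It relies on the behaviour described above: null entries are skipped and layers beyond the end of the array are left unchanged.

```shell
#!/bin/sh
# level_writers_body LEVEL NAME
# Emits json that sets only layer LEVEL (0-based) to NAME and pads the
# preceding layers with null so they keep their current value.
level_writers_body() {
  level=$1; name=$2
  arr=""
  i=0
  while [ "$i" -lt "$level" ]; do
    arr="${arr}null,"
    i=$((i + 1))
  done
  printf '{"level_writers":[%s"%s"]}' "$arr" "$name"
}

level_writers_body 3 fast; echo   # prints {"level_writers":[null,null,null,"fast"]}
# To send it: curl -d "$(level_writers_body 3 fast)" "http://somehost:port/table_factory/dispatch?html=0"
```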

2.3. Operate BlockCache

Topling SSTs do not require BlockCache, but BlockBasedTable does. You can modify the capacity of the Cache, or clear objects in the Cache that are not in use. For example, for lru_cache:

Set capacity

curl -d '{"capacity":"8G","strict_capacity":true}' "http://somehost:port/cache/lru_cache?html=0"

Clear objects that are not in use (reference count is 0)

# Only the presence of EraseUnRefEntries in the json object matters; its value is ignored
curl -d '{"EraseUnRefEntries":1}' "http://somehost:port/cache/lru_cache?html=0"