A collection of mini-benchmarks and other scripts related to ArangoDB

dhly-etc/arangodb-etc

Some tiny benchmarks and other helpful scripts for ArangoDB

This is a collection of some scripts that I built up while working with ArangoDB. Some of them may be useful to others, so I figured I would share them.

benchmarks

This folder contains a few benchmarks for testing changes to some ArangoDB features: one that exercises our AQL condition normalization, one for geo indexes, one that tests our caching behavior using a small hotset within a large collection of data, and one for incremental replication. All but the last are stand-alone JS scripts. The last is meant to be added as a file within the replication_sync test suite and will require changes if it is to be used elsewhere.

build-scripts

This folder contains some scripts that I used when building and maintaining the main codebase. There are a number of build-* scripts, which are convenience wrappers that run cmake in a given directory to initialize a build folder. I found it much easier to remember to call mkdir build-enterprise; ~/work/scripts/build-enterprise build-enterprise ~/src/arangodb/arangodb than to remember all the different cmake flags I used for development. All of these scripts should be configured to use ccache along with either g++ or clang++. They may use some hard-coded paths from my system, but those should be easy to spot and modify.

One script here that might be quite useful is format. It's a one-liner that applies clang-format, with the correct project settings, only to your staged changes prior to a commit. For example:

git add *
format
git add *
git commit

It requires the git clang-format integration, which is maintained by the LLVM project.

include-analysis

These scripts are simple. They perform some analysis to determine how many files in the codebase depend on a given header. The first, get-includes.sh, scans all the files to extract which #include statements appear in which files. The second, process-includes.js, writes this data as a graph into ArangoDB and uses the graph to do the quantitative analysis. This analysis can be used while refactoring to determine how much code will have to recompile if you change a given header. It can also be used to determine what the central header files are, to better target any optimization efforts. For instance, I spent some time removing unnecessary includes (helped by include-what-you-use) from various central header files, replacing some includes with forward declarations, moving code from header to implementation files, and so on. As a result, I was able to achieve a considerable speedup in project compilation time and greatly reduce the number of files that need to be recompiled when various headers are changed.
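To illustrate the kind of question this analysis answers, here is a minimal in-memory sketch in plain JavaScript. It is not the code from process-includes.js, which stores the graph in ArangoDB. The input format (pairs of file and included header) and the function names buildReverseGraph and countDependents are assumptions made for this example.

```javascript
// Hypothetical input: [file, includedHeader] pairs, such as a scan of
// #include lines might produce. The real scripts may use another format.
function buildReverseGraph(pairs) {
  // Maps each header to the set of files that include it directly.
  const includedBy = new Map();
  for (const [file, header] of pairs) {
    if (!includedBy.has(header)) includedBy.set(header, new Set());
    includedBy.get(header).add(file);
  }
  return includedBy;
}

// Counts every file that depends on `header`, directly or transitively,
// using a breadth-first walk over the reverse include graph.
function countDependents(includedBy, header) {
  const seen = new Set();
  const queue = [header];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of includedBy.get(current) ?? []) {
      // Skip files already visited, and never count the header itself.
      if (!seen.has(dependent) && dependent !== header) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return seen.size;
}

const pairs = [
  ["a.cpp", "a.h"],
  ["a.h", "common.h"],
  ["b.cpp", "b.h"],
  ["b.h", "common.h"],
  ["c.cpp", "b.h"],
];
const graph = buildReverseGraph(pairs);
console.log(countDependents(graph, "common.h")); // 5
console.log(countDependents(graph, "b.h")); // 2
```

In this toy data, changing common.h forces all five other files to recompile, while changing b.h affects only two, which is the sort of ranking that identifies central headers.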

prepare-dump-for-pregel-customer

These scripts take data generated by arangodump from a single-server instance and prepare it to be restored using arangorestore into a cluster instance to be used with Pregel. In particular, they change the number of shards for the collections, distribute each of them like a prototype collection, and add a new shard key vertex for edge collections so that each edge ends up on the same shard as its source vertex (a requirement for Pregel). Each script operates on a single file in the dump directory at a time and generates a new file with the extension .fixed. It shouldn't be too hard to modify these scripts to run in a loop over all relevant files in the directory, and it also shouldn't be hard to write a short script to remove all the original files and rename the new files without the .fixed extension.
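The edge shard key step can be sketched as follows. This is an illustration, not the code from the scripts in this folder. It assumes that each line of the data file is a plain JSON edge document and that the vertex attribute should hold the key part of _from; an actual dump may wrap documents differently, so check the file format before adapting it. The function names addVertexShardKey and fixEdgeLines are hypothetical.

```javascript
// Assumption: the shard key attribute is named "vertex" and holds the key
// of the source vertex, i.e. the part of _from after the "/".
function addVertexShardKey(edge) {
  const from = edge._from;
  if (typeof from !== "string" || !from.includes("/")) {
    throw new Error(`edge has no usable _from: ${JSON.stringify(edge)}`);
  }
  const key = from.slice(from.indexOf("/") + 1);
  // Return a copy so the input object is left untouched.
  return { ...edge, vertex: key };
}

// Assumption: one JSON edge document per line; blank lines are skipped.
function fixEdgeLines(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.stringify(addVertexShardKey(JSON.parse(line))))
    .join("\n");
}

const input = '{"_key":"1","_from":"v/A","_to":"v/B"}\n';
console.log(fixEdgeLines(input));
// {"_key":"1","_from":"v/A","_to":"v/B","vertex":"A"}
```

Sharding the edge collection on this attribute, and distributing it like the vertex collection, is what places each edge on the shard of its source vertex.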

timeseries-tests

These are some old benchmarks that were used circa 2019 to compare three options for timeseries data ingestion: ArangoDB's then-current storage format, an optimized prototype, and the then-current release of TimescaleDB. To be honest, I don't remember too much about the tests, but I would guess they are straightforward to follow from the source code.
