-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Optimization
Here are the strategies or best ways to optimize SeaweedFS.
When starting volume server, you can specify the index type. By default it is using memory. This is fast when volume server is started, but the start up time can be long in order to be loaded file indexes into memory.
weed volume -index=leveldb
can change to leveldb. It is much faster to start up a volume server, at the cost of a little bit slower when accessing the files. Compared to network speed, the extra cost is not that much in most cases.
Note: there are 3 flavors of leveldb indexes for the volume server:
-
leveldb
: small memory footprint (4MB total, 1 write buffer, 2 block buffers) -
leveldbMedium
: medium memory footprint (8MB total, 2 write buffers, 4 block buffers) -
leveldbLarge
: large memory footprint (12MB total, 4 write buffers, 8 block buffers)
In some Linux file system, e.g., XFS, ext4, Btrfs, etc, SeaweedFS can optionally allocate disk space for the volume files. This ensures file data is on contiguous blocks, which can improve performance when files are large and may cover multiple extents.
To enable disk space preallcation, start the master with these options on a Linux OS with a supporting file system.
-volumePreallocate
Preallocate disk space for volumes.
-volumeSizeLimitMB uint
Master stops directing writes to oversized volumes. (default 30000)
By default, SeaweedFS grows the volumes automatically. For example, for no-replication volumes, there will be concurrently 7 writable volumes allocated.
If you want to distribute writes to more volumes, you can do so by instructing SeaweedFS master via this URL.
curl http://localhost:9333/vol/grow?count=12&replication=001
This will assign 12 volumes with 001 replication. Since 001 replication means 2 copies for the same data, this will actually consumes 24 physical volumes.
The volume can be pre-created with other parameters:
curl http://localhost:9333/vol/grow?count=12&collection=benchmark
curl http://localhost:9333/vol/grow?count=12&dataCenter=dc1
curl http://localhost:9333/vol/grow?count=12&dataCenter=dc1&rack=rack1
curl http://localhost:9333/vol/grow?count=12&dataCenter=dc1&rack=rack1&dataNode=node1
Another way to change the volume growth strategy is to use master.toml
generated by weed scaffold -config=master
. Adjust the following section:
[master.volume_growth]
copy_1 = 7 # create 1 x 7 = 7 actual volumes
copy_2 = 6 # create 2 x 6 = 12 actual volumes
copy_3 = 3 # create 3 x 3 = 9 actual volumes
copy_other = 1 # create n x 1 = n actual volumes
threshold = 0.9 # create threshold
Same as above, more volumes will increase read concurrency.
In addition, increasing the replication will also help. Having the same data stored on multiple servers will surely increase read concurrency.
More hard drives will give you better write/read throughput.
The SeaweedFS usually only open a few actual disk files. But the network file requests may exceed the default limit, usually default to 1024. For production, you will need root permission to increase the limit to something higher, e.g., "ulimit -n 10240".
For each volume server, there are 2 things impacts the memory requirements.
By default, the volume server uses in memory index to achieve O(1) disk read. Roughly about 20 bytes is needed for each file. If one 30GB volume has 1 million files of averaged 30KB, the volume can cost 20MB memory to hold the index. You can also use leveldb index to reduce memory consumption and speed up startup time. To use it, "weed server -volume.index=[memory|leveldb|leveldbMedium|leveldbLarge]", or "weed volume -index=[memory|leveldb|leveldbMedium|leveldbLarge]".
Another aspect is the total file size that are read at the same time. If serving 100KB-sized file for 1000 concurrent reads, 100MB memory is needed.
For large files uploaded through filer, auto-chunking would be applied, the default value is weed filer -maxMB=32
. So if your system is going to support 1000 concurrent reads, 1000 x 32MB
memory is needed.
The file id generation is actually pretty trivial and you could use your own way to generate the file keys.
A file key has 3 parts:
- volume id: a volume with free spaces
- needle id: a monotonously increasing and unique number
- file cookie: a random number, you can customize it in whichever way you want
You can directly ask master server to assign a file key, and replace the needle id part with your own unique id, e.g., user id.
Also you can get each volume's free space from the server status.
curl "http://localhost:9333/dir/status?pretty=y"
Once you are sure about the volume free spaces, you can use your own file ids. Just need to ensure the file key format is compatible.
The assigned file cookie can also be customized, as long as it is 32-bit.
"strict monotonously increasing" needle id is not necessary, but keeping needle id in a "mostly" increasing order helps to keep the in memory data structure efficient.
If the needle id has to be random, you can use leveldb as the index when starting volume server.
If files are large and network is slow, the server will take time to read the file. Please increase the "-idleTimeout=30" limit setting for volume server. It cut off the connection if uploading takes a longer time than the limit.
If the file is large, it's better to upload this way:
weed upload -maxMB=64 the_file_name
This will split the file into data chunks of 64MB each, and upload them separately. The file ids of all the data chunks are saved into an additional meta chunk. The meta chunk's file id are returned.
When downloading the file, just
weed download the_meta_chunk_file_id
The meta chunk has the list of file ids, with each file id on each line. So if you want to process them in parallel, you can download the meta chunk and deal with each data chunk directly.
When assigning file ids,
curl http://master:9333/dir/assign?collection=pictures
curl http://master:9333/dir/assign?collection=documents
will also generate a "pictures" collection and a "documents" collection if they are not created already. Each collection will have its dedicated volumes, and they will not share the same volume.
Actually, the actual data files have the collection name as the prefix, e.g., "pictures_1.dat", "documents_3.dat".
If you need to delete them later see https://github.com/seaweedfs/seaweedfs/wiki/Master-Server-API#delete-collection
By default, if there is a slow read request caused by network problems, this request will block other requests, such as writing.
Set volume.hasSlowRead
to true,this prevents slow reads from blocking other requests,but large file read P99 latency will increase.
Increasing volume.readBufferSizeMB
(for example, set to filer.maxMB
size) can make read requests require fewer locks , which can fix the P99 latency problem caused by volume.hasSlowRead
When going to production, you will want to collect the logs. SeaweedFS uses glog. Here are some examples:
weed -v=2 master
weed -logdir=. volume
- Replication
- Store file with a Time To Live
- Failover Master Server
- Erasure coding for warm storage
- Server Startup Setup
- Environment Variables
- Filer Setup
- Directories and Files
- Data Structure for Large Files
- Filer Data Encryption
- Filer Commands and Operations
- Filer JWT Use
- Filer Cassandra Setup
- Filer Redis Setup
- Super Large Directories
- Path-Specific Filer Store
- Choosing a Filer Store
- Customize Filer Store
- Migrate to Filer Store
- Add New Filer Store
- Filer Store Replication
- Filer Active Active cross cluster continuous synchronization
- Filer as a Key-Large-Value Store
- Path Specific Configuration
- Filer Change Data Capture
- Cloud Drive Benefits
- Cloud Drive Architecture
- Configure Remote Storage
- Mount Remote Storage
- Cache Remote Storage
- Cloud Drive Quick Setup
- Gateway to Remote Object Storage
- Amazon S3 API
- AWS CLI with SeaweedFS
- s3cmd with SeaweedFS
- rclone with SeaweedFS
- restic with SeaweedFS
- nodejs with Seaweed S3
- S3 API Benchmark
- S3 API FAQ
- S3 Bucket Quota
- S3 API Audit log
- S3 Nginx Proxy
- Docker Compose for S3
- Hadoop Compatible File System
- run Spark on SeaweedFS
- run HBase on SeaweedFS
- run Presto on SeaweedFS
- Hadoop Benchmark
- HDFS via S3 connector
- Async Replication to another Filer [Deprecated]
- Async Backup
- Async Filer Metadata Backup
- Async Replication to Cloud [Deprecated]
- Kubernetes Backups and Recovery with K8up