Skip to content

airnandez/cluefs

Repository files navigation

clueFS — a tool for tracing I/O activity at the file system level

Overview

cluefs is a lightweight utility to collect data on the I/O events induced by an application when interacting with a file system. It emits detailed, machine-parseable data on every file system-level operation.

The trace information emitted by this utility is meant to be analysed using tools not included in this package. You can find a collection of such tools in a separate project.

Motivation

The main goal of developing this utility is to observe and quantify the file I/O load induced by the software system being developed by the LSST data management team to process the data to be collected by the Large Synoptic Survey Telescope (LSST).

However, cluefs does not depend on LSST software system and can be used in several unrelated contexts. It may also be useful for other use cases, such as to get an overall understanding of how file systems work or to observe the (usually hidden and unexpected) operations performed when you mount a file system on your computer.

Although there are several tools for tracing system activity such as strace, DTrace, SystemTap or sysdig, for different reasons none of them was considered suitable for our particular use case.

How to use

Let's suppose you want to observe what file operations the command cat $HOME/data/hello.txt induces on the file system where the file hello.txt is actually located. You can use cluefs to expose the contents under the directory $HOME/data (the shadow directory) through a synthesized file system mounted on /tmp/trace. To mount the file system use the command:

$ cluefs --shadow=$HOME/data  --mount=/tmp/trace &

Once the file system is successfully mounted, when an application accesses a file or directory under /tmp/trace, cluefs emits an event for every call to the file system (e.g. access, open, read, close, etc.). For instance, the command:

$ cat /tmp/trace/hello.txt

will make cluefs emit the events below (one event per line):

...
2015-07-10T13:14:13.066799456Z,2015-07-10T13:14:13.066854171Z,54715,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,open,O_RDONLY,0000,14,4096,58
2015-07-10T13:14:13.067274118Z,2015-07-10T13:14:13.067287085Z,12967,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,read,14,0,4096,14,58
2015-07-10T13:14:13.067602625Z,2015-07-10T13:14:13.069215159Z,1612534,fabio,1000,fabio,1000,/bin/cat,28997,/home/fabio/data/hello.txt,file,flush,O_RDONLY,14,58
2015-07-10T13:14:13.069899802Z,2015-07-10T13:14:13.0699212Z,21398,root,0,root,0,,0,/home/fabio/data/hello.txt,file,release,58
...

To get detailed help on how to use this utility, including examples of usage, do:

$ cluefs

USAGE:
   cluefs --mount=<directory>  --shadow=<directory>  [--out=<file>]
           [(--csv | --json)]  [--ro]
   cluefs --help
   cluefs --version

Use 'cluefs --help' to get detailed information about options and
examples of usage.

When you are done collecting the trace information you want, you can unmount the file system created by cluefs with the command:

$ sudo umount /tmp/trace

Event formats

cluefs emits event records formatted in CSV or JSON. The format of each record is documented here.

How to install

Operating environment

This utility is tested on Scientific Linux v6 and v7, Ubuntu v14.04, CentOS v7 and MacOS X v10.9. It is possible cluefs also works on other systems or other versions of those operating systems where its dependencies are satisfied (see below).

Dependencies

To use cluefs you need Filesystem in Userspace (FUSE) installed on your system. To to that, please follow the installation instructions for your operating system according in the table below:

To install FUSE on ... ... follow the instructions below
Ubuntu $ sudo apt-get --yes install fuse
Scientific Linux, CentOS $ sudo yum install --assumeyes fuse
MacOS X install the latest stable version of FUSE for OS X

In addition, if you intend to build this software from sources you need both:

To install the Go tool chain please follow these detailed instructions. To install a C compiler please refer to the table below:

To install C compiler on ... ... follow the instructions below
Ubuntu $ sudo apt-get --yes install gcc
Scientific Linux, CentOS $ sudo yum install --assumeyes gcc
MacOS X download and install Xcode, including its command line tools

Installation

The recommended way to install this tool is to download one of the ready-to-use binary files available for your target execution platform. Those are self-contained executable files so you only need to download, unpack and you are ready to start using the tool.

Download binary releases here.

Alternatively, to build from sources do:

go get -u github.com/airnandez/cluefs

How this utility works

cluefs implements a synthesized file system which exposes all the files and directories existing on the underlying shadow file system. It intercepts each system call (e.g. open, read, etc.), emits a trace event about the call and forwards the operation to the appropriate file system for execution.cluefs collects the result of the operation and returns it to the calling application.

Although special attention has been given to make this utility as lightweight as possible, it is not intended to be permanently run in heavy-load I/O environments as there is an intrinsic non-zero performance penalty.

Known limitations

Currently, lock-related file system operations are not supported by cluefs. That is, it does not emit traces for those operations and makes them appear as unsupported by the file system. These are the operations induced by calling the fcntl(3) file system call using as second argument any of the values F_GETLK, F_SETLK or F_SETLKW.

You can contribute

Your contribution is more than welcome. There are several ways you can help:

  • Test this software on your particular environment and let us know how it works. If it does not work for you and you think it should, please provide all the relevant details when opening a new issue
  • If you find a bug, please report it by opening an issue
  • If you spot a defect either in this documentation or in the source code documentation we consider it a bug so please let us know
  • Providing feedback on how to improve this software by opening an issue

Roadmap

The items in our to-do list are documented separately.

Disclaimer

Although we have payed a lot attention to make this utility as reliable as possible, it is still experimental and surely contains undiscovered bugs that may adversely affect your data.

In particular, please note that cluefs does not protect you against any destructive operation you can normally perform on your data. Use it at your own risk.

Credits

Author

This software was developed and is maintained by Fabio Hernandez at IN2P3 / CNRS computing center (Lyon, France).

Acknowledgements

This work is based in other people's work, including:

License

Copyright 2015 Fabio Hernandez

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

A tool for tracing I/O activity at the file system level

Resources

License

Stars

Watchers

Forks

Packages

No packages published