diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1152891..d981a7b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -55,7 +55,7 @@ pip install -e .[docs] Build docs ```bash -make clean && make html +make html ``` Use [start-docs-host.sh](dev-tools/start-docs-host.sh) to deploy a local http server to view the docs @@ -66,7 +66,6 @@ cd ./dev-tools && ./start-docs-host.sh Access `http://localhost:8080` for docs. - ## Contributing a new tracer/filter/collector/analyzer 1. Create a new file in `duetector/tracer`, `duetector/filter`, `duetector/collector` or `duetector/analyzer` directory, with the name `{name}.py` diff --git a/README.md b/README.md index b79abfc..7b4fcd6 100644 --- a/README.md +++ b/README.md @@ -37,24 +37,33 @@ In the [ABAUC control model](https://github.com/hitsz-ids/dataucon), duetector c ## Feature -- [X] Plug-in system - - [X] Customized tracer support - - [X] Support for custom filters - - [X] Custom collector support - - [X] [Custom Plugin Examples](./examples/) -- [ ] Configuration Management +- Plug-in system support, see [examples](./examples/) for more details + - [X] Custom `Tracer` and `TracerManager` + - [X] Custom `Filters` and `FilterManager` + - [X] Custom `Collector` and `CollectorManager` + - [X] Custom `Analyzer` and `AnalyzerManager` +- Configuration Management - [X] Configuration using a single configuration file - [X] Generate Plugin Configuration - [ ] Support for dynamically loading configurations -- [ ] eBPF-based data usage probes - - [X] File Open Operation - - [ ] ...... -- [ ] Shell command probes - - [X] Kernel Information Probe - - [ ] ...... 
-- [X] Data collector with SQL database support -- [X] CLI Tools -- [ ] PIP Service +- `Tracer` Support + - [X] eBPF-based tracer + - [X] Shell command tracer + - [ ] Subprocess tracer +- `Filter` Support + - [X] Pattern matching, based on regular expressions +- `Collector` and `Analyzer` Support + - [X] SQL database + - [ ] OpenTelemetry +- Analyzer Support + - [X] SQL database support + - [ ] OpenTelemetry support +- User Interface + - [X] CLI Tools + - [X] PIP Service + - [ ] Control Panel +- Enhancements + - [ ] runc container identification The eBPF program requires kernel support, see [Kernel Support](./docs/kernel_config.md) @@ -158,7 +167,7 @@ Commands: ### Analyzing with analyzer -We provide an [Analyzer](https://duetector.readthedocs.io/en/latest/analyzer/index.html) that can query the data in storage, here we provide a [user case](./docs/usercases/simplest-open-count/README.md) +We provide an [Analyzer](https://duetector.readthedocs.io/en/latest/analyzer/index.html) that can query the data in storage; try it in this [user case](./docs/usercases/simplest-open-count/README.md) ### Using duetector server @@ -231,12 +240,10 @@ This project is initiated by **Institute of Data Security, Harbin Institute of T ## How to contribute -You are very welcome to join! [Raise an Issue](https://github.com/hitsz-ids/duetector/issues/new) or submit a Pull Request. +Start with the [good first issue](https://github.com/hitsz-ids/duetector/issues/70) and read our [contributing guidelines](./CONTRIBUTING.md). -Please refer to the [Developer Documentation](./CONTRIBUTING.md). - -Learn about the design ideas and architecture of this project here: [DESIGN DOCUMENTS](./docs/design/README.md). +Learn about the design and architecture of this project here: [docs/design](./docs/design/README.md). ## License -This project uses Apache-2.0 license, please refer to [LICENSE](https://github.com/hitsz-ids/duetector/blob/main/LICENSE). 
+This project uses Apache-2.0 license, please refer to [LICENSE](./LICENSE). diff --git a/README_zh.md b/README_zh.md index 6c7d961..4a30395 100644 --- a/README_zh.md +++ b/README_zh.md @@ -41,24 +41,28 @@ duetector🔍是一个基于可扩展的的数据使用探测器,它可以在L ## 主要特性 -- [X] 插件化系统 - - [X] 支持自定义tracer - - [X] 支持自定义filter - - [X] 支持自定义collector - - [X] [自定义插件示例](./examples/) -- [ ] 配置管理 +- 插件化系统,在[例子](./examples/)获取更多细节 + - [X] 支持自定义`Tracer`和`TracerManager` + - [X] 支持自定义`Filters`和`FilterManager` + - [X] 支持自定义`Collector`和`CollectorManager` + - [X] 支持自定义`Analyzer`和`AnalyzerManager` +- 配置管理 - [X] 使用单一配置文件配置 - [X] 支持生成插件配置 - [ ] 支持动态加载配置 -- [ ] 基于eBPF的数据使用探测器 - - [X] 文件打开操作 - - [ ] …… -- [ ] 基于Shell命令的探测器 - - [X] 内核信息探测 - - [ ] …… -- [X] 支持SQL数据库的数据收集器 -- [X] CLI工具 -- [ ] PIP服务 +- `Tracer`支持 + - [X] 基于eBPF的tracer + - [X] 基于shell命令的tracer + - [ ] 基于子进程的tracer +- `Filter`支持 + - [X] 支持正则的模式匹配 +- `Collector`和`Analyzer`支持 + - [X] SQL数据库 + - [ ] Opentelemetry +- 用户接口 + - [X] 命令行工具 + - [X] PIP服务 + - [ ] 控制平面 eBPF程序需要内核支持,详见[内核支持](./docs/kernel_config.md) @@ -235,19 +239,10 @@ Commands: ## 如何贡献 -非常欢迎您的加入![我们欢迎任何类型的Issue](https://github.com/hitsz-ids/duetector/issues/new),同时也期待您的PR +从[good first issue](https://github.com/hitsz-ids/duetector/issues/70)了解如何开始,并阅读我们的[贡献指南](./CONTRIBUTING.md)。 -我们提供了以下资料让您更快了解项目 - -- 开发环境配置和其他注意事项请参考:[开发者文档](./CONTRIBUTING.md) -- 在这里了解本项目的设计思路和架构:[设计文档](./docs/design/README.md) - -# 如何开发插件 - -目前,tracer、filter、collector都支持自定义插件开发,以Python包作为单个插件或多个插件,可以查看[自定义插件示例](./examples/)了解开发步骤 - -TODO: 提供一个插件的cookiecutter模板 +在这里了解本项目的设计思路和架构:[设计文档](./docs/design/README.md) ## 许可证 -本项目使用 Apache-2.0 license,有关协议请参考[LICENSE](https://github.com/hitsz-ids/duetector/blob/main/LICENSE)。 +本项目使用 Apache-2.0 license,有关协议请参考[LICENSE](./LICENSE)。 diff --git a/docs/README.md b/docs/README.md index e69de29..0dd12f9 100644 --- a/docs/README.md +++ b/docs/README.md @@ -0,0 +1,25 @@ +# Duetector's documentation + +## Contents + +- [design](./design.md): Duetector's design +- 
[how-to](./how-to/README.md): How to use Duetector +- [usercases](./usercases/README.md): User cases of using Duetector +- [kernel_config](./kernel_config.md): Kernel config requirements for Duetector +- [source](./source/README.md): Duetector's Sphinx source, see [build docs](#build-docs) for more details. + + + +## Build docs + +Install docs requirements: + +```bash +pip install -e ..[docs] +``` + +Build docs: + +```bash +make html +``` diff --git a/docs/design/README.md b/docs/design/README.md index 927a357..41403b3 100644 --- a/docs/design/README.md +++ b/docs/design/README.md @@ -4,19 +4,25 @@ Key Components and Features: -- [ ] **HTTP / RPC Server**: PIP Server, providing API for PDP to get data usage information. -- [ ] **Analyzer**: Analyze data usage information and generate data usage behavior. - - [ ] **DBAnalyzer**: Analyze data usage information from database. +- [ ] **Control Server**: Control plane, providing an API for administrators to manage duetector. +- [X] **Query Server**: PIP Server, providing an API for PDP to get data usage information. +- [X] **Analyzer**: Analyze data usage information and generate data usage behavior. + + - [X] **DBAnalyzer**: Analyze data usage information from database. - [X] **CLI**: CLI for administrator to manage duetector. - [X] **BccMonitor**: Monitor data usage behavior in kernel space. Use BCC to implement. - [X] **ShMonitor**: A general monitor for custom command. Polling the output of command. +- [ ] **SubprocessMonitor**: A subprocess monitor for subprocesses. Manage subprocesses and daemonize them. - [X] **TracerManager**: Manage tracers, support plugin. + - [X] **OpenTracer**: A `bcc` tracer, trace `open` syscall. - [ ] ... - [X] **FilterManager**: Manage filters, support plugin. + - [X] **DefaultFilter**: Filtering some meaningless information - [ ] ... - [X] **CollectorManager**: Manage collectors, support plugin. + - [X] **DBCollector**: Collect filted trackings and store them into database. - [ ] ... 
@@ -30,8 +36,5 @@ Current data flow implementation: 2. Once **Monitor's** `poll` is called, it will trigger **Tracer's** `callback` 3. **Tracer's** `callback` will call **Filter** to filter the data. 4. **Filter's** `filter` will call **Collector's** `emit` to collect the data. - -The following are not yet realized and may be subject to change. - -- [ ] **Analyzer**'s data stracture and API. -- [ ] **Query Service** will get data from **Analyzer**. +5. **Analyzer** restructures the data and provides query and analysis APIs. +6. **Query Service** exposes an API for **PDP** to query data usage information. diff --git a/docs/how-to/run-with-docker.md b/docs/how-to/run-with-docker.md index e587e6b..34dc1ec 100644 --- a/docs/how-to/run-with-docker.md +++ b/docs/how-to/run-with-docker.md @@ -1,6 +1,5 @@ # Run with docker - > Get the image from [dockerhub](https://hub.docker.com/r/dataucon/duetector/). BCC relies on kernel headers, either by turning on the kernel compilation parameter `CONFIG_IKHEADERS=m` or by installing the `kernel-development-package` provided by the distribution. 
@@ -14,6 +13,7 @@ There are two options: ```Bash docker run -it --rm --privileged \ -p 8888:8888 \ +-p 8120:8120 \ --entrypoint bash \ -v /sys/kernel/kheaders.tar.xz:/sys/kernel/kheaders.tar.xz \ -v /sys/kernel/debug:/sys/kernel/debug \ @@ -31,6 +31,7 @@ Then mount `/lib/modules` into the container: ```Bash docker run -it --rm --privileged \ -p 8888:8888 \ +-p 8120:8120 \ --entrypoint bash \ -v /lib/modules:/lib/modules \ -v /sys/kernel/debug:/sys/kernel/debug \ @@ -42,6 +43,7 @@ dataucon/duetector ```Bash docker run -it --rm --privileged \ -p 8888:8888 \ +-p 8120:8120 \ --entrypoint bash \ -v /lib/modules:/lib/modules \ -v /usr/src:/usr/src \ diff --git a/docs/how-to/run-with-kata-containers.md b/docs/how-to/run-with-kata-containers.md index 9fcd192..a5a1405 100644 --- a/docs/how-to/run-with-kata-containers.md +++ b/docs/how-to/run-with-kata-containers.md @@ -70,6 +70,7 @@ Linux 856b63ebabcc 6.1.38 #2 SMP Fri Aug 11 16:20:36 CST 2023 x86_64 Linux sudo nerdctl run \ -it \ -p 8888:8888 \ +-p 8120:8120 \ --runtime=io.containerd.kata.v2 \ --cap-add=sys_admin \ --entrypoint bash \ diff --git a/docs/usercases/README.md b/docs/usercases/README.md index 0d538fa..eb0ee2a 100644 --- a/docs/usercases/README.md +++ b/docs/usercases/README.md @@ -6,7 +6,8 @@ - Tracking ML jobs in kata containers. [tracking-mljob-in-kata-containers](./tracking-mljob-in-kata-containers/README.md) - # Configurations Reference -TODO: Wait for readthedocs +Docs available at: https://readthedocs.org/projects/duetector/ + +All configurable classes inherit from the [`Configuable` class](https://duetector.readthedocs.io/en/latest/config.html#duetector.config.Configuable). 
diff --git a/docs/usercases/tracking-mljob-in-kata-containers/README.md b/docs/usercases/tracking-mljob-in-kata-containers/README.md index dd550ba..b57d084 100644 --- a/docs/usercases/tracking-mljob-in-kata-containers/README.md +++ b/docs/usercases/tracking-mljob-in-kata-containers/README.md @@ -22,6 +22,7 @@ mkdir ./duetector-kata sudo nerdctl run \ -it --rm \ -p 8888:8888 \ +-p 8120:8120 \ -e DUETECTOR_DAEMON_WORKDIR=/duetector-kata \ -v $(pwd)/duetector-kata:/duetector-kata \ --runtime=io.containerd.kata.v2 \ diff --git a/duetector/config.py b/duetector/config.py index 62712dd..3b5f17e 100644 --- a/duetector/config.py +++ b/duetector/config.py @@ -172,8 +172,9 @@ def dump_config(self, config_dict: Dict[str, Any], path: Union[str, Path]): class Configuable: """ - A base class for all configuable classes + A base class for all configuable classes. + It's recommended to use the CLI to generate the config file, as ``config_scope`` may be masked by ``manager``. Attributes: default_config (Dict[str, Any]): default config for this class diff --git a/examples/README.md b/examples/README.md index e69de29..ede2b6f 100644 --- a/examples/README.md +++ b/examples/README.md @@ -0,0 +1,5 @@ +# Duetector's examples + +## Contents + +- [extension](./extension/): Duetector's extension examples diff --git a/examples/config/README.md b/examples/config/README.md deleted file mode 100644 index 028e8de..0000000 --- a/examples/config/README.md +++ /dev/null @@ -1,3 +0,0 @@ -The default configuration file is now included in the distribution. - -Please refre to [config.toml](../../duetector/static/config.toml)