Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/linux: syz-describe: auto generate syzlang #3129

Closed
wants to merge 0 commits into from
Closed

sys/linux: syz-describe: auto generate syzlang #3129

wants to merge 0 commits into from

Conversation

ZHYfeng
Copy link
Contributor

@ZHYfeng ZHYfeng commented May 7, 2022

Our tool~ (SyzDescribe) could automated static generate basic syscall descriptions~ (syzlang format) for Linux kernel drivers.
Based on our evaluation, SyzDescribe is far more accurate than DIFUZE in identifying kernel drivers~ (no FP and only FN) and their device file names~ (still have a few FP and FN).
We will open source SyzDescribe after the paper is accepted~ (already submitted, hope we can open source it soon).

(Related Issue (#590))

  • Scope of SyzDescribe:
    • Character or Block drivers in Linux kernel
      • syscall handler structure
      • device file names for open()
      • command value and argument type for ioctl()
    • Non-open file descriptor dependency from ioctl()
      • fd returned from ioctl() and used in new ioctl(), e.g., resource fd_kvmvm[fd] in KVM
  • Techniques of SyzDescribe:
    • Static analysis in C++ based on LLVM 13

The generated syscall descriptions could be directly used in syzkaller based on doc

make extract TARGETOS=linux SOURCEDIR=$KSRC
make generate
make

However, those auto generated syscall descriptions are not friendly to humans.
I am working on giving them some meaningful names or file paths now.

Current plan:

  • file name of those generated ones, e.g., syzdescribe + kernel version + source code relative path + something else?
  • give resource fd meaningful names
  • try to use macro instead of value of cmd in ioctl() (not sure whether this is possible for LLVM bitcode)
  • try to use name in source code of structure and their fields
  • add some comments in generated syscall description from debug info for human, e.g., source code relative path + line number

The commit is current results directly from SyzDescribe, which are mainly for drivers with existing syscall descriptions~ (will generate results for the whole kernel in the future).
How do you think about this and what else are needed to accept them in syzkaller?

@dvyukov
Copy link
Collaborator

dvyukov commented May 9, 2022

This is cool! I am running syz-manager with the following syscall filter:

	"enable_syscalls": [
		"openat$_*",
		"syz_open_dev$_*",
		"ioctl$_*"
	]

and it's slowly gaining a substantial amount of coverage.

Some initial thoughts:

  1. CI complains on missing copyright headers, I think it's reasonable to add "Code generated" comments to the generated files. It will prevent the warnings.
grep "Code generated" pkg/*/*.go
pkg/build/linux_generated.go:// Code generated by pkg/build/linux.go. DO NOT EDIT.
pkg/compiler/const_file.go:	fmt.Fprintf(buf, "# Code generated by syz-sysgen. DO NOT EDIT.\n")
pkg/compiler/const_file_test.go:	const serialized = `# Code generated by syz-sysgen. DO NOT EDIT.
pkg/csource/generated.go:// Code generated by gen.go from executor/*.h. DO NOT EDIT.
pkg/csource/gen.go:	fmt.Fprintf(out, "// Code generated by gen.go from executor/*.h. DO NOT EDIT.\n\n")
pkg/html/generated.go:// Code generated by pkg/html/html.go. DO NOT EDIT.
  1. Some other tests are failing on the CI. One way or another they need to pass. In some cases it will mean adjusting test expectations. The tests in tools/syz-trace2syz probably can be just disabled (it's a research work that I don't know that anybody used and it only causes test failures).

  2. Yes, having field names extracted from source would be useful. Maybe they are available in the llvm debug info nodes.
    But we also don't need to fix all of the readability improvements at once (there is an infinite amount of possible improvements).

  3. Some general transparency of the descriptions may help. Namely, what source files/functions/lines things come from. For example I am looking at this one:

syz_open_dev$__mpt3sas_init_ctl_fops_syzgen_3(dev ptr[in, string["/dev/host#"]], id intptr, flags flags[open_flags]) __mpt3sas_init_ctl_fops_fd

and "/dev/host" is not present in my VM and I can't easily find even what driver was supposed to create this and where this "host" name come from.
Emitting source info annotations per each syscall may pollute the descriptions too much. I don't what the right solution here. Maybe annotating only openat/syz_open_dev, e.g.:

# See foobar_init() in drivers/foo/bar/source.c:123
syz_open_dev$__mpt3sas_init_ctl_fops_syzgen_3(dev ptr[in, string["/dev/host#"]], id intptr, flags flags[open_flags]) __mpt3sas_init_ctl_fops_fd

This is probably what you meant by:

add some comments in generated syscall description from debug info for human, e.g., source code relative path + line number

@dvyukov
Copy link
Collaborator

dvyukov commented May 9, 2022

file name of those generated ones, e.g., syzdescribe + kernel version + source code relative path + something else?

Yes, we need some common prefix, or maybe put them into a separate dir. Using kernel version is not good. We will re-generate them periodically and names shouldn't change. We need to capture the revision somewhere else.
We also need some prefix for syscalls (the current "_" looks a bit hacky). Something generic, short and unique. Something like "auto", but "auto" can probably conflict with manual descriptions.

@dvyukov
Copy link
Collaborator

dvyukov commented May 9, 2022

  1. The problem with ioctl values is not just naming, the values are also arch-specific.
    We've recently got support for per-file metadata and per-arch descriptions:
    44a5ca6
    So think this should generate files like "sys/linux/*_amd64.txt" and include:
meta arches["amd64"]

also we could add:

meta autogenerated

if we need if for something.

@dvyukov
Copy link
Collaborator

dvyukov commented May 9, 2022

  1. Some issues I've spotted in the generated file names:
    There presumably should be turned into syz_open_dev since openat does not substitute "#":
disabling openat$_azx_driver_init_snd_ctl_f_ops_syzgen_3: open(/dev/hwC#D#) failed: no such file or directory
disabling openat$_azx_driver_init_snd_hwdep_f_ops_syzgen_3: open(/dev/snd/hwC#D#) failed: no such file or directory
disabling openat$_chr_dev_init_console_fops_syzgen_5: open(/dev/vcs%u) failed: no such file or directory
disabling openat$_chr_dev_init_console_fops_syzgen_7: open(/dev/vcsa%u) failed: no such file or directory
disabling openat$_chr_dev_init_console_fops_syzgen_9: open(/dev/vcsu%u) failed: no such file or directory
disabling openat$_init_ch_module_changer_fops_syzgen_1: open(/dev/s%s) failed: no such file or directory
disabling openat$_usb_init_usbdev_file_operations_syzgen_1: open(/dev/ep_%02x) failed: no such file or directory

tun is located at /dev/net/tun:

disabling openat$_tun_init_tun_fops_syzgen_1: open(/dev/tun) failed: no such file or directory

@codecov
Copy link

codecov bot commented May 18, 2022

Codecov Report

Merging #3129 (7847986) into master (e60b110) will increase coverage by 0.0%.
The diff coverage is n/a.

Impacted Files Coverage Δ
pkg/instance/instance.go 13.1% <0.0%> (-0.1%) ⬇️
dashboard/app/api.go 61.1% <0.0%> (-<0.1%) ⬇️
pkg/vcs/git.go 58.4% <0.0%> (ø)
dashboard/app/dashboard.go 0.0% <0.0%> (ø)
pkg/runtest/run.go 65.6% <0.0%> (+0.1%) ⬆️
dashboard/app/main.go 72.3% <0.0%> (+0.8%) ⬆️
dashboard/app/handler.go 79.6% <0.0%> (+4.9%) ⬆️
dashboard/app/access.go 66.3% <0.0%> (+5.0%) ⬆️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants