Seccomp, short for Secure Computing mode, is a Linux kernel security feature for filtering system calls. In its original (strict) form it restricts a process to a small set of system calls: exit(), sigreturn(), and read() and write() on already-open file descriptors. If the process attempts any other system call, the kernel terminates it with SIGKILL or SIGSYS. The mechanism does not virtualize resources; it isolates the process from them.
There are two ways to activate seccomp: the prctl(2) system call with PR_SET_SECCOMP or, on Linux 3.17 and later, the seccomp(2) system call. The older method of enabling seccomp by writing to /proc/self/seccomp has been deprecated in favor of prctl().
An enhancement, seccomp-bpf, adds the capability to filter system calls with a customizable policy, using Berkeley Packet Filter (BPF) rules. This extension is leveraged by software such as OpenSSH, vsftpd, and the Chrome/Chromium browsers on Chrome OS and Linux for flexible and efficient syscall filtering, offering an alternative to the now unsupported systrace for Linux.
Original/Strict Mode
In this mode, seccomp only allows the syscalls exit(), sigreturn(), and read() and write() on already-open file descriptors. If any other syscall is made, the process is killed with SIGKILL.
seccomp_strict.c
```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

// From https://sysdig.com/blog/selinux-seccomp-falco-technical-discussion/
// gcc seccomp_strict.c -o seccomp_strict

int main(int argc, char **argv)
{
    // O_CREAT so the demo also works when output.txt does not exist yet
    int output = open("output.txt", O_WRONLY | O_CREAT, 0644);
    const char *val = "test";

    // enables strict seccomp mode
    printf("Calling prctl() to set seccomp strict mode...\n");
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);

    // This is allowed as the file was already opened
    printf("Writing to an already open file...\n");
    write(output, val, strlen(val) + 1);

    // This isn't allowed
    printf("Trying to open file for reading...\n");
    int input = open("output.txt", O_RDONLY);

    printf("You will not see this message--the process will be killed first\n");
}
```
Seccomp-bpf
This mode allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules.
seccomp_bpf.c
```c
#include <seccomp.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

// https://security.stackexchange.com/questions/168452/how-is-sandboxing-implemented/175373
// gcc seccomp_bpf.c -o seccomp_bpf -lseccomp

int main(void)
{
    /* initialize the libseccomp context */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);

    /* allow exiting */
    printf("Adding rule : Allow exit_group\n");
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

    /* allow getting the current pid */
    //printf("Adding rule : Allow getpid\n");
    //seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(getpid), 0);

    printf("Adding rule : Deny getpid\n");
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EBADF), SCMP_SYS(getpid), 0);

    /* allow changing data segment size, as required by glibc */
    printf("Adding rule : Allow brk\n");
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(brk), 0);

    /* allow writing up to 512 bytes to fd 1 */
    printf("Adding rule : Allow write up to 512 bytes to FD 1\n");
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 2,
                     SCMP_A0(SCMP_CMP_EQ, 1),
                     SCMP_A2(SCMP_CMP_LE, 512));

    /* if writing to any other fd, return -EBADF */
    printf("Adding rule : Deny write to any FD except 1 \n");
    seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EBADF), SCMP_SYS(write), 1,
                     SCMP_A0(SCMP_CMP_NE, 1));

    /* load and enforce the filters */
    printf("Load rules and enforce \n");
    seccomp_load(ctx);
    seccomp_release(ctx);

    // Because getpid is denied, a weird number will be returned, like:
    // this process is -9
    printf("this process is %d\n", getpid());

    return 0;
}
```
For example, to forbid a container from executing a syscall such as uname, you could download the default profile from https://github.com/moby/moby/blob/master/profiles/seccomp/default.json and remove the uname string from the list.
To make sure that some binary doesn't work inside a Docker container, you could use strace to list the syscalls the binary uses and then forbid them.
In the following example, the syscalls used by uname are discovered:
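A custom profile is a JSON file. As a reference point, the profile embedded in the `docker inspect` output further down can be written out as the following sketch, reconstructed from that escaped string rather than taken from an original file:

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "name": "chmod",
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
```

A file like this is passed to Docker with `--security-opt seccomp=profile.json`, where `profile.json` is an assumed file name.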
In the above profile, the default action is set to "allow" and a denylist disables chmod. To be more secure, set the default action to deny (for example SCMP_ACT_ERRNO) and create an allowlist that selectively enables system calls.
The following output shows the chmod call returning an error because it is disabled in the seccomp profile:
The following output shows docker inspect displaying the profile:
"SecurityOpt": [ "seccomp:{\"defaultAction\":\"SCMP_ACT_ALLOW\",\"syscalls\":[{\"name\":\"chmod\",\"action\":\"SCMP_ACT_ERRNO\"}]}"