Performance Profiling
Flame Graph Profiling
FlameGraph is a great tool for generating interactive flamegraphs based collected profiling data. The github page has thorough documentation on how to use the tool, but an overview is presented below:
First, install perf
, which on Linux can be done via:
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
Tip: If you're on Windows using WSL2 with Ubuntu 20.04, you may find that the above fails with this error message:
E: Unable to locate package linux-tools-4.19.104-microsoft-standard E: Couldn't find any package by glob 'linux-tools-4.19.104-microsoft-standard' E: Couldn't find any package by regex 'linux-tools-4.19.104-microsoft-standard'
You may have luck building and installing
perf
yourself using Microsoft's WSL2 Linux Kernel sources. At the time of writing, this doesn't work with themaster
branch because theperf
build appears to be broken with newer versions of glibc. Instead, we can use thelinux-msft-wsl-5.10.16.3
tag:sudo apt install flex bison git clone https://github.com/microsoft/WSL2-Linux-Kernel --depth 1 --branch linux-msft-wsl-5.10.16.3 cd WSL2-Linux-Kernel/tools/perf make -j8 sudo cp perf /usr/local/bin
Technique courtesy of: https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13
Modify the system settings to allow perf
record to be run by a standard user:
sudo su # Run the following as root
cat /proc/sys/kernel/perf_event_paranoid
echo "1" > /proc/sys/kernel/perf_event_paranoid
exit
Invoke clamscan
via perf record
as follows, and run perf script
to collect the profiling data. Note that in this example, ClamAV was compiled in a build
subdirectory and installed to build/install
:
perf record -F 100 -g -- ./install/bin/clamscan -d ./unit_tests/clamav.hdb --allmatch ./test/
perf script > /tmp/out.perf
The -F
parameter indicates how many samples should be collected during program execution. If your scan will take a long time to run, a lower value should be sufficient. Otherwise, consider choosing a higher value (on Ubuntu 18.04, 7250 is the max frequency, but it can be increased via /proc/sys/kernel/perf_event_max_sample_rate
.
Clone out the FlameGraph project and run the following commands inside the FlameGraph directory to generate the flame graph:
git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph
perl stackcollapse-perf.pl /tmp/out.perf > /tmp/out.folded
perl flamegraph.pl /tmp/out.folded > /tmp/test.svg
The SVG that is generated is interactive, but some viewers don't support this. Be sure to open it in a web browser like Chrome to be able to take full advantage of it.
Here's an example flamegraph generated by scanning ClamAV test files:
Call Graph Profiling - Callgrind
Callgrind is a profiling tool included with valgrind
. This can be done by prepending valgrind --tool=callgrind
to the clamscan
command.
kcachegrind is a follow-on tool that will graphically present the profiling data and allow you to explore it visually, although if you don't already use KDE you'll have to install lots of extra packages to use it.
System Call Tracing / Fault Injection
strace can be used to track the system calls that are performed and provide the number of calls / time spent in each system call. This can be done by prepending strace -c
to a clamscan
command. Results will look something like this:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
95.04 0.831430 13 62518 read
3.22 0.028172 14 2053 munmap
0.69 0.006005 3 2102 mmap
0.28 0.002420 7 344 pread64
0.16 0.001415 5 305 1 openat
0.13 0.001108 3 405 write
0.11 0.000932 23 40 mprotect
0.07 0.000632 2 310 close
0.07 0.000583 9 67 30 access
0.05 0.000395 1 444 lseek
0.04 0.000344 2 162 fstat
0.04 0.000338 1 253 brk
0.03 0.000262 1 422 fcntl
0.02 0.000218 16 14 futex
0.01 0.000119 1 212 getpid
0.01 0.000086 14 6 getdents
0.00 0.000043 7 6 dup
0.00 0.000040 1 31 unlink
0.00 0.000038 19 2 rt_sigaction
0.00 0.000037 19 2 rt_sigprocmask
0.00 0.000029 1 37 stat
0.00 0.000022 11 2 prlimit64
0.00 0.000021 21 1 sysinfo
0.00 0.000020 1 33 clock_gettime
0.00 0.000019 19 1 arch_prctl
0.00 0.000018 18 1 set_tid_address
0.00 0.000018 18 1 set_robust_list
0.00 0.000013 0 60 lstat
0.00 0.000011 0 65 madvise
0.00 0.000002 0 68 geteuid
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 uname
0.00 0.000000 0 1 getcwd
------ ----------- ----------- --------- --------- ----------------
100.00 0.874790 69970 31 total
strace
can also be used for cool things like system call fault injection. For instance, let's say you are curious whether the read
bytecode API call is implemented in such a way that the underlying read
system call could handle EINTR
being returned (which can happen periodically). To test this, write the following bytecode rule:
VIRUSNAME_PREFIX("BC.Heuristic.Test.Read.Passed")
VIRUSNAMES("")
TARGET(0)
SIGNATURES_DECL_BEGIN
DECLARE_SIGNATURE(zeroes)
SIGNATURES_DECL_END
SIGNATURES_DEF_BEGIN
DEFINE_SIGNATURE(zeroes, "0:0000")
SIGNATURES_DEF_END
bool logical_trigger()
{
return matches(Signatures.zeroes);
}
#define READ_S(value, size) if (read(value, size) != size) return 0;
int entrypoint(void)
{
char buffer[65536];
int i;
for (i = 0; i < 256; i++)
{
debug(i);
debug("\n");
READ_S(buffer, sizeof(buffer));
}
foundVirus("");
return 0;
}
Compiled the rule, and make a test file to match against it. Then run it under strace
to determine what underlying read system call is being used for the bytecode read
function:
clambc-compiler read_test.bc
dd if=/dev/zero of=/tmp/zeroes bs=65535 count=256
strace clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes
It uses pread64
under the hood, so the following command could be used for fault injection:
strace -e fault=pread64:error=EINTR:when=20+10 clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes
This command tells strace
to skip the first 20 pread64
calls (these appear to be used by the loader, which didn't seem to handle EINTR
correctly) but to inject EINTR
for every 10th call afterward. We can see the injection in action and that the system call is retried successfully:
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15007744) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15073280) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15138816) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15204352) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15269888) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15335424) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15400960) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15466496) = 65536
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15532032) = 65536
pread64(3, 0x7f6a7ff43000, 65536, 15597568) = -1 EINTR (Interrupted system call) (INJECTED)
pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15597568) = 65536
More documentation on using strace
to perform system call fault injection, see this presentation from FOSDEM 2017.