Performance Profiling

Flame Graph Profiling

FlameGraph is a great tool for generating interactive flamegraphs based collected profiling data. The github page has thorough documentation on how to use the tool, but an overview is presented below:

First, install perf, which on Linux can be done via:

sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

Tip: If you're on Windows using WSL2 with Ubuntu 20.04, you may find that the above fails with this error message:

E: Unable to locate package linux-tools-4.19.104-microsoft-standard
E: Couldn't find any package by glob 'linux-tools-4.19.104-microsoft-standard'
E: Couldn't find any package by regex 'linux-tools-4.19.104-microsoft-standard'

You may have luck building and installing perf yourself using Microsoft's WSL2 Linux Kernel sources. At the time of writing, this doesn't work with the master branch because the perf build appears to be broken with newer versions of glibc. Instead, we can use the linux-msft-wsl-5.10.16.3 tag:

sudo apt install flex bison
git clone https://github.com/microsoft/WSL2-Linux-Kernel --depth 1 --branch linux-msft-wsl-5.10.16.3
cd WSL2-Linux-Kernel/tools/perf
make -j8
sudo cp perf /usr/local/bin

Technique courtesy of: https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13

Modify the system settings to allow perf record to be run by a standard user:

sudo su     # Run the following as root
cat /proc/sys/kernel/perf_event_paranoid
echo "1" > /proc/sys/kernel/perf_event_paranoid
exit

Invoke clamscan via perf record as follows, and run perf script to collect the profiling data. Note that in this example, ClamAV was compiled in a build subdirectory and installed to build/install:

perf record -F 100 -g -- ./install/bin/clamscan -d ./unit_tests/clamav.hdb --allmatch ./test/
perf script > /tmp/out.perf

The -F parameter indicates how many samples should be collected during program execution. If your scan will take a long time to run, a lower value should be sufficient. Otherwise, consider choosing a higher value (on Ubuntu 18.04, 7250 is the max frequency, but it can be increased via /proc/sys/kernel/perf_event_max_sample_rate.

Clone out the FlameGraph project and run the following commands inside the FlameGraph directory to generate the flame graph:

git clone https://github.com/brendangregg/FlameGraph.git
cd FlameGraph

perl stackcollapse-perf.pl /tmp/out.perf > /tmp/out.folded
perl flamegraph.pl /tmp/out.folded > /tmp/test.svg

The SVG that is generated is interactive, but some viewers don't support this. Be sure to open it in a web browser like Chrome to be able to take full advantage of it.

Here's an example flamegraph generated by scanning ClamAV test files:

Example Flamegraph

Call Graph Profiling - Callgrind

Callgrind is a profiling tool included with valgrind. This can be done by prepending valgrind --tool=callgrind to the clamscan command.

kcachegrind is a follow-on tool that will graphically present the profiling data and allow you to explore it visually, although if you don't already use KDE you'll have to install lots of extra packages to use it.

System Call Tracing / Fault Injection

strace can be used to track the system calls that are performed and provide the number of calls / time spent in each system call. This can be done by prepending strace -c to a clamscan command. Results will look something like this:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    95.04    0.831430          13     62518           read
    3.22    0.028172          14      2053           munmap
    0.69    0.006005           3      2102           mmap
    0.28    0.002420           7       344           pread64
    0.16    0.001415           5       305         1 openat
    0.13    0.001108           3       405           write
    0.11    0.000932          23        40           mprotect
    0.07    0.000632           2       310           close
    0.07    0.000583           9        67        30 access
    0.05    0.000395           1       444           lseek
    0.04    0.000344           2       162           fstat
    0.04    0.000338           1       253           brk
    0.03    0.000262           1       422           fcntl
    0.02    0.000218          16        14           futex
    0.01    0.000119           1       212           getpid
    0.01    0.000086          14         6           getdents
    0.00    0.000043           7         6           dup
    0.00    0.000040           1        31           unlink
    0.00    0.000038          19         2           rt_sigaction
    0.00    0.000037          19         2           rt_sigprocmask
    0.00    0.000029           1        37           stat
    0.00    0.000022          11         2           prlimit64
    0.00    0.000021          21         1           sysinfo
    0.00    0.000020           1        33           clock_gettime
    0.00    0.000019          19         1           arch_prctl
    0.00    0.000018          18         1           set_tid_address
    0.00    0.000018          18         1           set_robust_list
    0.00    0.000013           0        60           lstat
    0.00    0.000011           0        65           madvise
    0.00    0.000002           0        68           geteuid
    0.00    0.000000           0         1           execve
    0.00    0.000000           0         1           uname
    0.00    0.000000           0         1           getcwd
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.874790                 69970        31 total

strace can also be used for cool things like system call fault injection. For instance, let's say you are curious whether the read bytecode API call is implemented in such a way that the underlying read system call could handle EINTR being returned (which can happen periodically). To test this, write the following bytecode rule:

    VIRUSNAME_PREFIX("BC.Heuristic.Test.Read.Passed")
    VIRUSNAMES("")
    TARGET(0)

    SIGNATURES_DECL_BEGIN
    DECLARE_SIGNATURE(zeroes)
    SIGNATURES_DECL_END

    SIGNATURES_DEF_BEGIN
    DEFINE_SIGNATURE(zeroes, "0:0000")
    SIGNATURES_DEF_END

    bool logical_trigger()
    {
        return matches(Signatures.zeroes);
    }

    #define READ_S(value, size) if (read(value, size) != size) return 0;

    int entrypoint(void)
    {
        char buffer[65536];
        int i;

        for (i = 0; i < 256; i++)
        {
            debug(i);
            debug("\n");
            READ_S(buffer, sizeof(buffer));
        }

        foundVirus("");
        return 0;
    }

Compiled the rule, and make a test file to match against it. Then run it under strace to determine what underlying read system call is being used for the bytecode read function:

clambc-compiler read_test.bc
dd if=/dev/zero of=/tmp/zeroes bs=65535 count=256
strace clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes

It uses pread64 under the hood, so the following command could be used for fault injection:

strace -e fault=pread64:error=EINTR:when=20+10 clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes

This command tells strace to skip the first 20 pread64 calls (these appear to be used by the loader, which didn't seem to handle EINTR correctly) but to inject EINTR for every 10th call afterward. We can see the injection in action and that the system call is retried successfully:

    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15007744) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15073280) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15138816) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15204352) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15269888) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15335424) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15400960) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15466496) = 65536
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15532032) = 65536
    pread64(3, 0x7f6a7ff43000, 65536, 15597568) = -1 EINTR (Interrupted system call) (INJECTED)
    pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15597568) = 65536

More documentation on using strace to perform system call fault injection, see this presentation from FOSDEM 2017.