Libclamav

Libclamav provides an easy and effective way to add virus protection to your software. The library is thread-safe and transparently recognizes and scans within archives, mail files, MS Office document files, executables and other special formats.

License

Libclamav is licensed under the GNU GPL v2 license. This means you are not allowed to link commercial, closed-source software against it. All software using libclamav must be GPL compliant.

Supported formats and features

Executables

The library has built-in support for 32- and 64-bit Portable Executable, ELF and Mach-O files. Additionally, it can handle PE files compressed or obfuscated with the following tools:

  • Aspack (2.12)
  • UPX
    • PE (Windows) built-in
    • ELF, Mach-O enabled via bytecode signatures
  • FSG (1.3, 1.31, 1.33, 2.0)
  • Petite (2.x)
  • PeSpin (1.1)
  • NsPack
  • wwpack32 (1.20)
  • MEW
  • Upack
  • Y0da Cryptor (1.3)

Mail files

Libclamav can handle almost every mail file format including TNEF (winmail.dat) attachments.

Archives and compressed files

The following archive and compression formats are supported by internal handlers:

  • Zip (+ SFX)
  • RAR (+ SFX)
  • 7Zip
  • Tar
  • CPIO
  • Gzip
  • Bzip2
  • DMG
  • IMG
  • ISO 9660
  • PKG
  • HFS+ partition
  • HFSX partition
  • APM disk image
  • GPT disk image
  • MBR disk image
  • XAR
  • XZ
  • MS OLE2
  • MS Cabinet Files (+ SFX)
  • MS CHM (Compiled HTML)
  • MS SZDD compression format
  • BinHex
  • SIS (SymbianOS packages)
  • AutoIt
  • NSIS
  • InstallShield

Documents

The most popular file formats are supported:

  • MS Office and MacOffice files
  • RTF
  • PDF
  • HTML

In the case of Office, RTF and PDF files, libclamav only extracts the embedded objects and does not decode the text data itself. Text decoding and normalization are performed only for HTML files.

Data Loss Prevention

Libclamav includes a DLP module that can detect credit card numbers from the following issuers inside text files: AMEX, VISA, MasterCard, Discover, Diner’s Club, and JCB. It can also detect U.S. social security numbers.

Future versions of Libclamav may include additional features to detect other credit cards and other forms of PII (Personally Identifiable Information) which may be transmitted without the benefit of being encrypted.
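
The text above does not describe how candidate card numbers are validated. As an illustration only, the sketch below implements the Luhn checksum, the standard check-digit scheme used on payment card numbers. The function name luhn_is_valid() is hypothetical and is not part of the libclamav API; this is not presented as libclamav's own implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libclamav function.
 * Returns 1 if the digits in `number` pass the Luhn checksum, 0 otherwise.
 * Spaces and dashes are skipped; any other non-digit character fails. */
static int luhn_is_valid(const char *number)
{
    int sum = 0;
    int digits = 0;
    size_t i;

    assert(number != NULL);
    i = strlen(number);

    /* Walk right to left, doubling every second digit. */
    while (i-- > 0) {
        unsigned char c = (unsigned char)number[i];
        int d;

        if (c == ' ' || c == '-')
            continue;
        if (!isdigit(c))
            return 0;

        d = c - '0';
        if (digits % 2 == 1) {
            d *= 2;
            if (d > 9)
                d -= 9; /* same as adding the two digits of the product */
        }
        sum += d;
        digits++;
    }

    return digits > 0 && sum % 10 == 0;
}
```

A caller would pass a candidate string found in the scanned text, for example luhn_is_valid("4111 1111 1111 1111"), and treat a return value of 1 as "plausibly a card number" rather than as proof.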

Others

Libclamav also handles various obfuscators and encoders, as well as file types that are commonly used to carry exploits:

  • JPEG (exploit detection)
  • RIFF (exploit detection)
  • uuencode
  • ScrEnc obfuscation
  • CryptFF

API

Header file

Every program using libclamav must include the header file clamav.h:

    #include "clamav.h"

Initialization

Before using libclamav, you should call cl_init() to initialize it. CL_INIT_DEFAULT is a macro that can be passed to cl_init() representing the default initialization settings. When it’s done, you’re ready to create a new scan engine by calling cl_engine_new(). To free resources allocated by the engine use cl_engine_free(). Function prototypes:

    int cl_init(unsigned int options);
    struct cl_engine *cl_engine_new(void);
    int cl_engine_free(struct cl_engine *engine);

cl_init() and cl_engine_free() return CL_SUCCESS on success or another code on error. cl_engine_new() returns a pointer, or NULL if there’s not enough memory to allocate a new engine structure.

Database loading

The following set of functions provides an interface for loading the virus database:

    const char *cl_retdbdir(void);

    int cl_load(const char *path, struct cl_engine *engine,
            unsigned int *signo, unsigned int options);

cl_retdbdir() returns the default (hardcoded) path to the directory with ClamAV databases. cl_load() loads a single database file or all databases from a given directory (when path points to a directory). The second argument is a pointer to an engine previously allocated with cl_engine_new(). The number of loaded signatures is added to signo. The last argument can pass the following flags:

  • CL_DB_STDOPT This is an alias for a recommended set of database loading options.
  • CL_DB_PHISHING Load phishing signatures.
  • CL_DB_PHISHING_URLS Initialize the phishing detection module and load .wdb and .pdb files.
  • CL_DB_PUA Load signatures for Potentially Unwanted Applications.
  • CL_DB_OFFICIAL_ONLY Only load official signatures from digitally signed databases.
  • CL_DB_BYTECODE Load bytecode.

cl_load() returns CL_SUCCESS on success and another code on failure.

    ...
    struct cl_engine *engine;
    unsigned int sigs = 0;
    int ret;

    if((ret = cl_init(CL_INIT_DEFAULT)) != CL_SUCCESS) {
        printf("cl_init() error: %s\n", cl_strerror(ret));
        return 1;
    }

    if(!(engine = cl_engine_new())) {
        printf("Can't create new engine\n");
        return 1;
    }

    ret = cl_load(cl_retdbdir(), engine, &sigs, CL_DB_STDOPT);

Database verification

The cl_load() API verifies that a database is correctly signed. Note, however, that it also returns CL_SUCCESS for non-database files, which it simply does not load.

You can, however, use the cl_cvdverify() API to verify a database directly:

    /**
     * @brief Verify a CVD file by loading and unloading it.
     *
     * @param file          Filepath of CVD file.
     * @return cl_error_t   CL_SUCCESS if success, else a CL_E* error code.
     */
    extern cl_error_t cl_cvdverify(const char *file);

As the comment block explains, this will load-test the database. Be advised that for some larger databases, this may use a fair bit of system RAM.

Error handling

Use cl_strerror() to convert error codes into human-readable messages. The function returns a statically allocated string:

    if(ret != CL_SUCCESS) {
        printf("cl_load() error: %s\n", cl_strerror(ret));
        cl_engine_free(engine);
        return 1;
    }

Engine structure

When all required databases are loaded you should prepare the detection engine by calling cl_engine_compile(). In case of failure you should still free the memory allocated to the engine with cl_engine_free():

    int cl_engine_compile(struct cl_engine *engine);

In our example:

    if((ret = cl_engine_compile(engine)) != CL_SUCCESS) {
        printf("cl_engine_compile() error: %s\n", cl_strerror(ret));
        cl_engine_free(engine);
        return 1;
    }

Limits

When you create a new engine with cl_engine_new(), it will have all internal settings set to default values as recommended by the ClamAV authors. It’s possible to check and modify the values (numerical and strings) using the following set of functions:

    int cl_engine_set_num(struct cl_engine *engine,
        enum cl_engine_field field, long long num);

    long long cl_engine_get_num(const struct cl_engine *engine,
        enum cl_engine_field field, int *err);

    int cl_engine_set_str(struct cl_engine *engine,
        enum cl_engine_field field, const char *str);

    const char *cl_engine_get_str(const struct cl_engine *engine,
        enum cl_engine_field field, int *err);

Please don’t modify the default values unless you know what you’re doing. Refer to the ClamAV sources (clamscan, clamd) for examples.

Database checks

It’s very important to keep the internal instance of the database up to date. You can watch database changes with the cl_stat*() family of functions.

    int cl_statinidir(const char *dirname, struct cl_stat *dbstat);
    int cl_statchkdir(const struct cl_stat *dbstat);
    int cl_statfree(struct cl_stat *dbstat);

Initialization:

    ...
    struct cl_stat dbstat;

    memset(&dbstat, 0, sizeof(struct cl_stat));
    cl_statinidir(cl_retdbdir(), &dbstat);

To check for a change you just need to call cl_statchkdir and check its return value (0 - no change, 1 - some change occurred). Remember to reset the cl_stat structure after reloading the database.

    if(cl_statchkdir(&dbstat) == 1) {
        /* reload the database here */
        cl_statfree(&dbstat);
        cl_statinidir(cl_retdbdir(), &dbstat);
    }

Libclamav includes an additional call to check the number of signatures that can be loaded from a given directory:

    int cl_countsigs(const char *path, unsigned int countoptions,
        unsigned int *sigs);

The first argument points to the database directory, the second one specifies what signatures should be counted: CL_COUNTSIGS_OFFICIAL (official signatures), CL_COUNTSIGS_UNOFFICIAL (third party signatures), CL_COUNTSIGS_ALL (all signatures). The last argument points to the counter to which the number of detected signatures will be added (therefore the counter should be initially set to 0). The call returns CL_SUCCESS or an error code.

Data scan functions

It’s possible to scan a file or descriptor using:

    int cl_scanfile(
        const char *filename,
        const char **virname,
        unsigned long int *scanned,
        const struct cl_engine *engine,
        struct cl_scan_options *options);

    int cl_scandesc(
        int desc,
        const char *filename,
        const char **virname,
        unsigned long int *scanned,
        const struct cl_engine *engine,
        struct cl_scan_options *options);

Both functions store the virus name under the pointer virname. The virus name is part of the engine structure and must not be released directly. If the scanned argument is not NULL, the functions will increase its value by the size of the scanned data (in CL_COUNT_PRECISION units). The last argument (options) requires a pointer to a data structure that specifies the scan options. The data structure should be zeroed with memset() before any flags are set. Each variable in the structure is a bit-flag field. The structure definition is:

    struct cl_scan_options {
        uint32_t general;
        uint32_t parse;
        uint32_t heuristic;
        uint32_t mail;
        uint32_t dev;
    };

Supported flags for each of the fields are as follows:

general - General scanning options.

  • CL_SCAN_GENERAL_ALLMATCHES Scan in all-match mode
  • CL_SCAN_GENERAL_COLLECT_METADATA Collect metadata (--gen-json)
  • CL_SCAN_GENERAL_HEURISTICS Option to enable heuristic alerts. Required for any of the heuristic alerting options to work.

parse - Options to enable/disable specific parsing capabilities. Generally you will want to enable all parsers. The easiest way to do this is to set the parse flags to ~0.

  • CL_SCAN_PARSE_ARCHIVE This flag enables transparent scanning of various archive formats.
  • CL_SCAN_PARSE_ELF Enable support for ELF files.
  • CL_SCAN_PARSE_PDF Enables scanning within PDF files.
  • CL_SCAN_PARSE_SWF Enables scanning within SWF files, notably compressed SWF.
  • CL_SCAN_PARSE_HWP Enables scanning of Hangul Word Processor (HWP) files.
  • CL_SCAN_PARSE_XMLDOCS Enables scanning of XML-formatted documents (e.g. Word, Excel, PowerPoint, HWP).
  • CL_SCAN_PARSE_MAIL Enable support for mail files.
  • CL_SCAN_PARSE_OLE2 Enables support for OLE2 containers (used by MS Office and .msi files).
  • CL_SCAN_PARSE_HTML This flag enables HTML normalization (including ScrEnc decryption).
  • CL_SCAN_PARSE_PE This flag enables deep scanning of Portable Executable files and allows libclamav to unpack executables compressed with run-time unpackers.

heuristic - Options to enable specific heuristic alerts

  • CL_SCAN_GENERAL_HEURISTIC_PRECEDENCE Allow heuristic matches to take precedence. When enabled, a heuristic scan (such as phishingScan) that detects a possible virus or phish stops the scan immediately. Recommended, as it saves CPU scan time. When disabled, viruses and phish detected by heuristic scans are reported only at the end of the scan, so if an archive contains both a heuristically detected virus or phish and real malware, the real malware is reported. As its name indicates, this flag is set in the general field.
  • CL_SCAN_HEURISTIC_ENCRYPTED_ARCHIVE With this flag the library will mark encrypted archives as viruses (encrypted .zip, .7zip, .rar).
  • CL_SCAN_HEURISTIC_ENCRYPTED_DOC With this flag the library will mark encrypted documents as viruses (encrypted .pdf).
  • CL_SCAN_HEURISTIC_BROKEN libclamav will try to detect broken executables and mark them as Broken.Executable.
  • CL_SCAN_HEURISTIC_EXCEEDS_MAX Alert when the scan of any file exceeds maximums such as max-filesize, max-scansize, max-recursion level.
  • CL_SCAN_HEURISTIC_PHISHING_SSL_MISMATCH Heuristic for phishing module: alert on SSL mismatches in URLs.
  • CL_SCAN_HEURISTIC_PHISHING_CLOAK Heuristic for phishing module: alert on cloaked URLs.
  • CL_SCAN_HEURISTIC_MACROS OLE2 containers that contain VBA macros will be marked as infected (Heuristics.OLE2.ContainsMacros).
  • CL_SCAN_HEURISTIC_PARTITION_INTXN Alert if the partition table size doesn't make sense.
  • CL_SCAN_HEURISTIC_STRUCTURED Enable the data loss prevention (DLP) module, which scans for credit card numbers and SSNs, i.e. alert when detecting personal information.
  • CL_SCAN_HEURISTIC_STRUCTURED_SSN_NORMAL Search for [and alert when detecting] SSNs formatted as xxx-yy-zzzz.
  • CL_SCAN_HEURISTIC_STRUCTURED_SSN_STRIPPED Search for [and alert when detecting] SSNs formatted as xxxyyzzzz.

mail - Options to enable specific mail parsing features

  • CL_SCAN_MAIL_PARTIAL_MESSAGE Scan RFC1341 messages split over many emails. You will need to periodically clean up $TemporaryDirectory/clamav-partial directory.

dev - Options designed for use by ClamAV developers

  • CL_SCAN_DEV_COLLECT_SHA Enables hash output in sha-collect builds - for internal use only
  • CL_SCAN_DEV_COLLECT_PERFORMANCE_INFO Collect performance timings
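
The bit-flag handling described above can be sketched without linking against libclamav. The struct and the SKETCH_* macros below are stand-ins invented for this illustration, and their bit values are placeholders; real code should include clamav.h and use struct cl_scan_options with the CL_SCAN_* macros instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct cl_scan_options, reduced to the fields used here.
 * Hypothetical: real code gets the structure from clamav.h. */
struct sketch_scan_options {
    uint32_t general;
    uint32_t parse;
    uint32_t heuristic;
};

/* Placeholder bit values; the real values are defined in clamav.h. */
#define SKETCH_GENERAL_ALLMATCHES 0x1u
#define SKETCH_GENERAL_HEURISTICS 0x2u
#define SKETCH_HEURISTIC_MACROS   0x1u
#define SKETCH_HEURISTIC_BROKEN   0x2u

/* Zero the structure first, then switch on the wanted flags. */
static void sketch_default_options(struct sketch_scan_options *opts)
{
    assert(opts != NULL);
    memset(opts, 0, sizeof(*opts));   /* start with every flag off */
    opts->parse = ~(uint32_t)0;       /* enable all parsers */
    opts->general |= SKETCH_GENERAL_HEURISTICS;
    opts->heuristic |= SKETCH_HEURISTIC_MACROS | SKETCH_HEURISTIC_BROKEN;
}

/* Returns 1 only if every bit of `flag` is set in `field`. */
static int sketch_flag_enabled(uint32_t field, uint32_t flag)
{
    return (field & flag) == flag;
}
```

A single alert can later be switched off again with a mask, for example opts.heuristic &= ~SKETCH_HEURISTIC_BROKEN, without disturbing the other bits in the field.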

All functions return CL_CLEAN when the file seems clean, CL_VIRUS when a virus is detected and another value on failure.

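    ...
    const char *virname;
    struct cl_scan_options options;

    /* Zero the options, then enable all parsers and heuristic alerts. */
    memset(&options, 0, sizeof(struct cl_scan_options));
    options.parse |= ~0;
    options.general |= CL_SCAN_GENERAL_HEURISTICS;

    if((ret = cl_scanfile("/tmp/test.exe", &virname, NULL, engine,
                          &options)) == CL_VIRUS) {
        printf("Virus detected: %s\n", virname);
    } else {
        printf("No virus detected.\n");
        if(ret != CL_CLEAN)
            printf("Error: %s\n", cl_strerror(ret));
    }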
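
When a non-NULL scanned pointer is passed, the counter is expressed in CL_COUNT_PRECISION units rather than bytes. The sketch below shows one way to convert it for reporting. The helper name scanned_to_mib() is hypothetical, and the value 4096 is an assumption about the macro's definition; real code should pass CL_COUNT_PRECISION from clamav.h.

```c
#include <assert.h>

/* Assumed value of CL_COUNT_PRECISION, used only so this sketch compiles
 * on its own. Real code should use the macro from clamav.h. */
#define SKETCH_COUNT_PRECISION 4096UL

/* Hypothetical helper: convert a scanned counter, given in units of
 * `precision` bytes, into mebibytes. */
static double scanned_to_mib(unsigned long scanned, unsigned long precision)
{
    assert(precision > 0);
    return (double)scanned * (double)precision / (1024.0 * 1024.0);
}
```

With a counter accumulated across several scan calls, scanned_to_mib(scanned, SKETCH_COUNT_PRECISION) gives a total suitable for a summary line.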

Memory

Because the engine structure occupies a few megabytes of system memory, you should release it with cl_engine_free() if you no longer need to scan files.

Forking daemons

If you’re using libclamav with a forking daemon you should call srand() inside a forked child before making any calls to the libclamav functions. This will avoid possible collisions with temporary filenames created by other processes of the daemon. This procedure is not required for multi-threaded daemons.
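
A minimal sketch of this advice, using only POSIX calls so that it compiles without libclamav. The helpers child_seed(), spawn_worker() and noop_work() are hypothetical names chosen for the illustration; mixing the PID into the seed is one possible choice, made here so that children forked within the same second still receive different seeds.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: combine the clock with the PID. Multiplying by an
 * odd constant keeps distinct PIDs distinct in 32-bit arithmetic. */
static unsigned int child_seed(time_t now, pid_t pid)
{
    return (unsigned int)now ^ ((unsigned int)pid * 2654435761u);
}

/* Fork a worker, seed rand() in the child, then run `work` there.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t spawn_worker(int (*work)(void))
{
    pid_t pid;

    assert(work != NULL);
    pid = fork();
    if (pid == 0) {
        /* Seed before anything in the child calls into libclamav. */
        srand(child_seed(time(NULL), getpid()));
        _exit(work());
    }
    return pid;
}

/* Placeholder job; a real worker would call the scan functions here. */
static int noop_work(void)
{
    return 0;
}
```

The parent is still responsible for collecting each child with waitpid(), as in any forking daemon.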

clamav-config

Use clamav-config to check compilation information for libclamav.

    $ clamav-config --libs
    -L/usr/local/lib -lz -lbz2 -lgmp -lpthread

    $ clamav-config --cflags
    -I/usr/local/include -g -O2

Example

You will find an example scanner application in the clamav source under ./examples.

In ClamAV 0.104+, you can build the example programs alongside ClamAV by configuring with -D ENABLE_EXAMPLES=ON.

Or, if you have ClamAV already installed, execute the following to compile it:

    $ gcc -Wall ex1.c -o ex1 -lclamav

CVD format

CVD (ClamAV Virus Database) is a digitally signed tarball containing one or more databases. The header is a 512-byte string with colon-separated fields:

    ClamAV-VDB:build time:version:number of signatures:functionality
    level required:MD5 checksum:digital signature:builder name:build time (sec)
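
The field layout above can be read with plain string handling. The following is a minimal sketch, not libclamav's own parser: struct cvd_info and parse_cvd_header() are hypothetical names, and the code assumes that no field itself contains a colon. It only splits the header; it does not verify the checksum or the digital signature.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CVD_HEADER_SIZE 512
#define CVD_FIELD_COUNT 9

/* Hypothetical container for the fields this sketch extracts. */
struct cvd_info {
    char build_time[64];
    unsigned long version;
    unsigned long signatures;
    unsigned long functionality_level;
    char md5[33];
    char builder[64];
};

/* Split a CVD header into its colon-separated fields.
 * Returns 0 on success, -1 if the header is malformed. */
static int parse_cvd_header(const char *header, struct cvd_info *out)
{
    char buf[CVD_HEADER_SIZE + 1];
    char *fields[CVD_FIELD_COUNT];
    char *cursor = buf;
    int i;

    assert(header != NULL && out != NULL);

    /* Work on a NUL-terminated copy so the input is left untouched. */
    strncpy(buf, header, CVD_HEADER_SIZE);
    buf[CVD_HEADER_SIZE] = '\0';

    for (i = 0; i < CVD_FIELD_COUNT; i++) {
        fields[i] = cursor;
        if (i < CVD_FIELD_COUNT - 1) {
            char *colon = strchr(cursor, ':');
            if (colon == NULL)
                return -1; /* too few fields */
            *colon = '\0';
            cursor = colon + 1;
        }
    }

    if (strcmp(fields[0], "ClamAV-VDB") != 0)
        return -1; /* wrong magic string */

    snprintf(out->build_time, sizeof(out->build_time), "%s", fields[1]);
    out->version = strtoul(fields[2], NULL, 10);
    out->signatures = strtoul(fields[3], NULL, 10);
    out->functionality_level = strtoul(fields[4], NULL, 10);
    snprintf(out->md5, sizeof(out->md5), "%s", fields[5]);
    snprintf(out->builder, sizeof(out->builder), "%s", fields[7]);
    return 0;
}
```

A caller would read the first 512 bytes of a .cvd file into a buffer and pass that buffer to parse_cvd_header(); fields[6] (the digital signature) and fields[8] (the build time in seconds) are split out but deliberately not interpreted here.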

sigtool --info displays detailed information on CVD files:

    $ sigtool -i daily.cvd
    File: daily.cvd
    Build time: 10 Mar 2008 10:45 +0000
    Version: 6191
    Signatures: 59084
    Functionality level: 26
    Builder: ccordes
    MD5: 6e6e29dae36b4b7315932c921e568330
    Digital signature: zz9irc9irupR3z7yX6J+OR6XdFPUat4HIM9ERn3kAcOWpcMFxq
    Fs4toG5WJsHda0Jj92IUusZ7wAgYjpai1Nr+jFfXHsJxv0dBkS5/XWMntj0T1ctNgqmiF
    +RLU6V0VeTl4Oej3Aya0cVpd9K4XXevEO2eTTvzWNCAq0ZzWNdjc
    Verification OK.