Diploma Thesis University of Applied Sciences Furtwangen, Germany Faculty of Computer Science - Computer Networking

Server-based Virus-protection On Unix/Linux

by Rainer Link

<mail@rainer-link.de>

Advisor: Prof. Hannelore Frank

Advisor: Prof. Dr. Rainer Mueller

Finished: May 28, 2003

Public Release: August, 2003


Preface

Abstract

This thesis evaluates and develops server-based anti-virus solutions running on Linux/Unix that use the Internet Content Adaptation Protocol (ICAP). It covers proof-of-concept solutions for a web proxy (Squid), mail servers (sendmail/postfix) and a file server (Samba), focusing on the latter with the aim of providing a fully-featured product.

Motivation

On 07/21/1999, I sent the first patch to the maintainer of the AMaViS project (A Mail Virus Scanner, http://www.amavis.org/, GPL’ed¹), fixing the AntiViral Toolkit Pro/Linux call. Since then, among other things, I have written and maintained several anti-virus modules (and still do). So, with the help of other people, AMaViS supports a wide range of anti-virus products. But wouldn’t it be easier to maintain only one anti-virus module, implementing a common protocol, to support all those anti-virus scanners?

Also, back in 1999, I was looking for an on-access virus scanning solution for Samba file servers², receiving a first Linux kernel-based solution via email in June ’99. More than a year later, I came across the Samba Virtual File System (VFS)³. Half a year later, I dug into the Samba VFS and started to work on a small piece of code which eventually became the samba-vscan project: on-access file scanning directly integrated into Samba (GPL’ed, too).

As nearly all the code I wrote over the past years was put under an Open Source license, I decided to release this thesis under the terms of the GNU Free Documentation License.

¹ GNU General Public License, see http://www.gnu.org/copyleft/gpl.html

² see e.g. http://www.geocrawler.com/archives/3/281/1999/4/0/1652065/

³ see e.g. http://sourceforge.net/mailarchive/forum.php?thread_id=219140&forum_id=4829


Overview of the Thesis

Chapter 1 gives an overview of computer viruses and some other types of malware, as well as anti-virus technologies and anti-virus deployment.

Chapter 2 explains possible means to integrate third party anti-virus scanners into scripts and programs.

Chapter 3 discusses the Internet Content Adaptation Protocol (ICAP) with a focus on using this protocol for an anti-virus service. The “icap-client” utility, developed for scanning any file on disk using an ICAP anti-virus facility, will be dissected, too. The results of some performance tests will be discussed as well.

Chapter 4 explains briefly the use of AMaViS for protecting the mail server and the ICAP integration.

Chapter 5 shows two possible concepts for on-access, real-time scanning of Samba shares, focusing on the direct Samba integration as implemented by the samba-vscan project. Results of file retrieval tests illustrate the impact on performance.

Chapter 6 discusses concepts for protecting HTTP/FTP transfers.

Chapter 7 summarizes the results and gives a short outlook.

Credits

First of all, I’d like to thank my advisors Prof. Hannelore Frank and Prof. Dr. Rainer Mueller for their support, feedback and suggestions.

A professional thank you goes to the following persons and/or companies:

• SuSE Linux AG for funding this diploma thesis and my AMaViS work for three years.

• Travis Priest, Rui Ataide (Symantec USA) and Gerald Maronde (Symantec Germany) for providing me with the latest Symantec AntiVirus Engine product before it was publicly available and for various ICAP/Symantec AntiVirus Scan Engine related discussions.

• Martin Stecher (WebWasher AG) for some email exchange about ICAP and WebWasher; Oxana Herzog and Elka Plattmann for sending a special trial evaluation key for the WebWasher CSM suite.

• Christian Hofmann of DATSEC for offering the latest Kaspersky AntiVirus for File servers and a one year license key.


Feedback et al

Please send feedback, corrections, suggestions or even flames to

Rainer Link <mail@rainer-link.de>

I plan to maintain this thesis and release updated versions once in a while.

History

1.00 - final version (May 28, 2003); non-public

1.01 - changed title page, added history, added Appendix A (GNU FDL); released to public

1.01a - corrected the vfs options setting, thanks to Stefan Metzmacher for the report (August 9, 2003)

License

This document is licensed under the terms of the GNU Free Documentation License (see http://www.fsf.org/licenses/fdl.html).

Copyright (c) 2002-2003 Rainer Link, OpenAntiVirus.org.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant sections "History" and "Credits"; no Front-Cover Texts, and the Back-Cover Text "Diploma Thesis by Rainer Link. Published by OpenAntiVirus.org". A copy of the license is included in the section entitled "GNU Free Documentation License".


Introduction

This chapter gives an overview of computer viruses and anti-virus techniques. If the reader is interested in a particular topic, the given references are worth reading¹.

1.1 Computer Viruses and Malware

The term “computer virus” was first applied to self-reproducing computer programs by Len Adleman back in 1983. One year later, Fred Cohen “scientifically defined the term computer virus” ([EK2001, p. 6]):

”We define a computer ’virus’ as a program that can ’infect’ other programs by modifying them to include a possibly evolved copy of itself. With the infection property, a virus can spread throughout a computer system or network using the authorisations of every user using it to infect their programs. Every program that gets infected may also act as a virus and thus the infection grows” ([FC1984]).

So, in short, a virus is a program which is able to replicate with little or no user intervention, and the replicated programs are able to replicate further.

Like its biological counterpart, it needs a host. In general, a computer virus is platform-dependent, i.e. a virus written for MS-DOS will not run under Linux/Unix (but it may of course run under DOS emulators like DosEmu or “virtual PC systems” like VMWare). Exceptions are macro viruses (see below), Java viruses (like BeanHive, see [CR1999, pp. 9]) and a few viruses written for both Windows and Linux, like Lindose aka Winux ([JK2001, p. 150]). The first virus for the Apple II was the Elk Cloner virus back in 1981. In 1986, the first virus for IBM PCs and compatibles appeared: the Brain (aka Pakistani or Ashar) virus ([EK2001, p. 6], [AM2000]).

[EK2001], [FP2000] or [AM2000] cover history in depth.

¹ If possible, more than one reference has been mentioned. A bibliography reference without a page reference covers a topic solely and/or in depth.


1.1.1 Introduction & Definition

Generally speaking, a computer virus consists of three parts ([MR1995]):

• the infection mechanism,

• the trigger,

• the payload.

As mentioned above, a computer virus must at least have the infection mechanism part.

1.1.1.1 The Infection Mechanism

As the name already implies, the infection mechanism ([CS1995, p. 10]) searches for one or more suitable victims and, to avoid multiple infections, checks whether the host is already infected (not every virus does this; some viruses infect a host multiple times due to bugs). After that, simply speaking, the virus body is copied into the victim. The easiest method is to overwrite the code of the victim. Other methods place the code in front of or at the end of a file.

1.1.1.2 The Trigger

A trigger ([CS1995, p. 10]) starts the payload, if there is one: on a particular event, the payload is executed. Such an event could be a special day (Friday the 13th) or the infection counter reaching a pre-defined value.

1.1.1.3 The Payload

Figure 1.1: Payload of Ambulance Car virus, taken from [FSC2003]

A possible payload ([CS1995, p. 8]) causes transient or permanent damage, e.g. displaying an animation on the screen (such as an ambulance car moving along the screen, see figure 1.1), formatting the hard disk drive or manipulating data.


Of course, damage may even happen unintentionally, e.g. due to a programming error or if an old DOS virus causes trouble within the Windows environment (see e.g. [MO1997]). Damage may also be caused by an over-reaction of the user ([BSI1994, p. 1-8]).

1.1.2 Classification of Computer Viruses

Computer viruses ([MR1995]) can be classified in several ways:

• type of host victim,

• type of infection technique,

• special virus features.

1.1.2.1 Type of Host Victim

By type of host ([MR1995]), we can distinguish between

• boot sector (DBR) and master boot record (MBR) virus,

• file virus,

• companion virus,

• multipartite virus.

Figure 1.2: Overwriting / Appending virus


A boot virus infects the boot sector of a floppy disk and/or the master boot record or boot sector of a hard disk. Such a virus can infect the computer system when the computer is booted from an infected floppy disk. As the code in the MBR/DBR is started by the BIOS after the POST (Power On Self Test), the virus gets activated even before the operating system has been started and most likely “hooks” some particular interrupts (e.g. BIOS INT 13h or DOS INT 21h) for performing its tasks. Most boot sector viruses are memory-resident, so they can easily infect every non-write-protected floppy when it is accessed. Most viruses of this type save a copy of the original boot sector/master boot record in an unused sector of the disk. A boot virus may be “placed” into the computer system by a so-called “dropper”, i.e. a program which simply drops the boot virus.

A file virus infects (executable) files, either by overwriting the file (overwriting virus) or by adding the virus code at the beginning or end of the file (appending virus). An overwriting virus destroys the original file upon infection. Most appending viruses put their virus code at the end of the file and place a jump to the virus code at the beginning of the file, so that the virus code is started first upon execution.

A companion virus looks for programs with the extension .BAT or .EXE and then creates a .COM file with the same name (e.g. TETRIS.COM, if a program TETRIS.EXE exists). If only the program name is entered (here: TETRIS), DOS by default looks first for a matching .COM file, then for an .EXE file and finally for a .BAT file. So, TETRIS.COM will be started instead of TETRIS.EXE, which the user originally intended to run. Therefore, the companion virus is started first and can then start TETRIS.EXE ([RS1996, p. 24-25]).
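The lookup order that makes the companion technique work can be illustrated with a short sketch. It is only a model of the precedence described above, written in Python for readability; the function names and the plain list of file names standing in for a directory are assumptions made for this illustration.

```python
# Precedence DOS applies when only a bare program name is entered.
DOS_SEARCH_ORDER = (".COM", ".EXE", ".BAT")


def resolve_command(name, files_in_directory):
    """Return the file DOS would start for a bare command name, or None."""
    available = {f.upper() for f in files_in_directory}
    for extension in DOS_SEARCH_ORDER:
        candidate = name.upper() + extension
        if candidate in available:
            return candidate
    return None


def find_shadowed_programs(files_in_directory):
    """List .EXE/.BAT files that a same-named .COM file takes precedence over.

    Such pairs are not proof of an infection, but they are what a
    companion virus leaves behind, so they are worth a closer look.
    """
    names = [f.upper() for f in files_in_directory]
    com_stems = {n[:-4] for n in names if n.endswith(".COM")}
    return sorted(
        n for n in names
        if n.endswith((".EXE", ".BAT")) and n[:-4] in com_stems
    )
```

With only TETRIS.EXE present, `resolve_command("TETRIS", ...)` picks TETRIS.EXE; as soon as TETRIS.COM exists in the same directory, it wins, and `find_shadowed_programs` reports TETRIS.EXE as shadowed.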

A multipartite or hybrid virus uses more than one infection technique, e.g. a combination of a boot sector and file virus, and therefore infects the DBR/MBR and files. Other examples are viruses which infect Office files via Visual Basic for Applications (VBA) as well as Visual Basic Script (VBS) files (see section 1.1.3.1 on macro viruses), or viruses which infect Win32 files and Office files, like Win32/W97M.Beast ([PS1999, pp. 6]).

The basic infection techniques of file viruses for Windows systems are somewhat similar to those of DOS viruses (see above), but (much) more complicated, as the file format is more complex, too ([PS1998], [PS2000]). The same basically applies to Linux viruses ([JK2000], [MVO2000]). Even viruses for both platforms are possible ([JK2001]).

1.1.2.2 Type of (Infection) Technique

Basically, by technique ([MR1995]) one can distinguish between

• direct action (non-memory resident),


• memory resident.

A direct action virus ([MR1995]) does not stay in memory, so it is only active when an infected program has been started, and only then can it replicate. A direct action virus is not very complex and can therefore be very small (the Trivial-31 virus is only 31 bytes “big”). In most cases, a direct action virus does not spread as fast as a memory resident virus.

A memory resident virus ([MR1995]) installs itself into RAM and may be active as long as the computer is running. This can be achieved in several ways, depending on the operating system: as a “terminate-and-stay-resident” (TSR) program under DOS, as a “virtual device driver” (VxD) under Windows, as an NT service under Windows NT, or as a loadable kernel module under Linux. Only a memory resident virus may use some “modern” virus techniques like stealth capabilities. Among memory resident viruses, one can differentiate between a fast infector ([RS1996, p. 174]) and a slow infector ([CS1995, p. 19], [MR1995]). Both got their name from the speed at which they spread. The first one infects every program which is being accessed (read/write) or even all files being listed in a directory listing (e.g. when the “dir” command is executed). The latter one, in contrast, infects a file only when it is being written (e.g. during compilation of a new program, or because some older programs stored their configuration settings directly in the executable file). Therefore, a slow infector may bypass file integrity checkers.

1.1.2.3 Special Virus Features

The following special virus features ([MR1995]) will be explained briefly:

• stealth technique,

• retro capabilities,

• polymorphism.

Some special virus features can only be used by memory-resident viruses.

A stealth virus ([RS1996, p. 173], [MR1995], [CS1995, p. 18]) tries to hide itself by hooking several interrupts like BIOS INT 13h or DOS INT 21h. Suppose an anti-virus program reads the MBR via BIOS INT 13h to scan for viruses: the virus can intercept this and “redirect” the read call to the saved copy of the original, uninfected MBR. Therefore, the anti-virus program won’t find any virus. Or, if a virus scanner scans a file, this file must be opened first. The open call, “redefined” by the virus, will first remove the virus from the file and then invoke the original open call. After the scan has finished, the virus scanner closes the file, and the modified close call infects the file again.


A retro virus ([CN1999, p. 12]) avoids infecting files with particular names, like “scan.exe” or “f-prot.exe” (in most cases those files belong to McAfee VirusScan or F-Prot AntiVirus respectively), as most anti-virus software checks its own integrity upon start. This mechanism can of course be used by non-memory resident viruses, too. A resident virus may even intercept the execution of “scan.exe” and display a “faked” error message like “not enough memory”.

A polymorphic virus ([CS1995, pp. 20], [MR1995], [RS1996, pp. 174], [CN1996]) is “encrypted” and changes the shape and structure of its (de|en)cryption routine with each infection, but the basic functionality is always the same. Here is a very simple example to convey the basic idea: a CPU has a set of registers, e.g. the accumulator register AX. Suppose this register should be set to zero. This can be done by loading zero into the register, i.e. MOV AX, 0. Or by subtracting the AX register from itself, i.e. SUB AX, AX. Or by the exclusive-or operation, i.e. XOR AX, AX. In short, the effect is just the same, but each operation results in a different opcode. This technique is also known as “mutation”.
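From the scanner's point of view, the consequence of the register example can be made concrete with a small sketch. The byte values are one common 16-bit x86 encoding of each instruction (an assembler may choose an equivalent alternative, e.g. 2B C0 or 33 C0 for the register-to-register forms); the dictionary and function names are hypothetical and serve only as an illustration.

```python
# Three functionally equivalent ways to clear AX, with one common
# 16-bit x86 encoding each.
ZERO_AX_VARIANTS = {
    "MOV AX, 0": bytes([0xB8, 0x00, 0x00]),   # B8 iw: MOV AX, imm16
    "SUB AX, AX": bytes([0x29, 0xC0]),        # 29 /r: SUB r/m16, r16
    "XOR AX, AX": bytes([0x31, 0xC0]),        # 31 /r: XOR r/m16, r16
}


def variants_matching(signature):
    """Return the mnemonics whose encoding contains the byte signature."""
    return [
        mnemonic
        for mnemonic, encoding in ZERO_AX_VARIANTS.items()
        if signature in encoding
    ]


# A search string extracted from the XOR variant finds only that variant,
# although all three have the same effect on the register.
xor_signature = ZERO_AX_VARIANTS["XOR AX, AX"]
matched = variants_matching(xor_signature)
```

A fixed search string taken from one generation of the decryption routine therefore misses the next generation, which is why polymorphic viruses call for wildcards, several signatures or code emulation.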

1.1.3 Macro & Script Viruses

1.1.3.1 Macro Viruses

Probably the best definition of a macro virus has been given by Vesselin Bontchev ([VB1997, p. 178]): ”A macro virus is a set of one or more macros which set is capable of replication itself recursively”.

It is believed that the first macro viruses, written by a US security specialist, were “WM/DMV” and “XM/DMV”, the so-called “demo macro virus” for Microsoft Word (WM = Word Macro) and Excel (XM = Excel Macro), back in 1994. The first big “impact” was caused by WM/Concept (1995), because it was the first macro virus found “in-the-wild” ([MH1998, p. 289], [IM1999a, p. 13]), i.e. it was reported by end users; even Microsoft shipped it on CD to customers ([VB1996, p. 97]). We will focus on Word macro viruses here. For automating “tasks”, Microsoft invented a macro language called WordBasic (Word 2.0-Word97), and later Visual Basic for Applications. Those macros are stored within the “document” itself, and not in a separate file as e.g. in Ami Pro². So, if you get a document, you will receive its macros, too. Strictly speaking, only template files (.DOT files) can contain macros (Word 2-Word 6). Therefore, the virus has to convert a normal Word document (.DOC file) into a template first, infect it, and rename the file from .DOT to .DOC, so that the user thinks it is just a normal Word document. The DOC → DOT conversion is actually done by Word itself; the macro virus only has to set a specific flag. When the user opens such an infected “document”, the virus gets activated (mostly) automatically through one of several so-called auto macros (e.g. AutoOpen, AutoExec and AutoClose). The next step is to infect the global template file, in most cases the NORMAL.DOT file. As NORMAL.DOT is loaded every time Word is started, the macro virus is active every time and is able to infect every Word document. But the mentioned auto macros are not the only mechanism a macro virus can rely on. Other possibilities are shortcuts (like ALT+S), forms or buttons.

Many variants of macro viruses have been created automatically: when a Word file is saved, functions from OLE32.DLL (OLE = Object Linking and Embedding, DLL = Dynamic Link Library) are used. But as some versions of this DLL are buggy, a file may get slightly damaged when it is saved. Therefore, the macro code might be changed slightly, while the macro virus itself remains intact and is able to spread. As the macro code has changed, a new variant has been created ([VB1997, pp. 177], [SK1999]).

² a text processing software

With Office97, Microsoft introduced a new (macro) language: Visual Basic for Applications ([DAJ1997], [SK1999]). Now, even regular documents can contain macros, so the conversion from document to template is no longer necessary. Moreover, Word97 provides an upconversion feature for documents created with older Word releases ([DAJ1997], [VB1997, pp. 188], [VB1998]). So, in most cases macros are converted automatically, too. The commands are not really converted from WordBasic to VBA; simply speaking, just “WordBasic” is placed in front of every instruction. With Service Release 1 (SR1) for Office97, Microsoft tried to make it harder for macro viruses to spread: macros can no longer be copied from the global template into a Word document ([VB1998, pp. 157], [JK1998, pp. 144]). But this very weak kind of protection can be bypassed by some tricks, e.g. exporting the VBA code into a file and re-importing this file later into the Word document ([KT1999, p. 302]). Upconversion is also done in Office 2000 ([RJZ1999, pp. 223]).

Macro viruses can “snatch” ([VB1996, p. 115], [VB1997, pp. 180]) existing macros, i.e. a macro is replaced by a macro with the same name which is already present in the global template (e.g. user macros, or macros from another macro virus which has already infected the template). As an example, many macro viruses have snatched macros from Microsoft’s mostly useless macro protection tool ScanProt ([VB1997, p. 183]). By macro snatching, a macro virus can mutate into a new variant or a new macro virus. Multiple infections of the normal template can create problems for anti-virus tools, e.g. cleaning it may create a new macro virus if the anti-virus tool thinks it is infected by only one macro virus (see [VB1997, pp. 180] for details). Multiple infections by so-called Word class infectors³ are sometimes called “sandwiches” ([KT1999, p. 304]).

Some “advanced features”, already known from (DOS) viruses, have been “re-invented” by macro viruses, too. A stealth macro virus can display the error message “not enough memory” or a faked dialog box when the user calls the menu item “Tools/Macro” (which could be used to see which macros exist). The start of the VBA Editor could also be blocked, or the foreground and background colours are set to white, so that no lines of code can be viewed ([VB1996, pp. 115], [JK1999], [IM1999b, pp. 19]). A polymorphic macro virus can change its code with every infection, e.g. by adding a random comment to the code, changing some instructions and so on. The polymorphic capabilities are limited, because WordBasic/VBA is rather slow ([VB1996, pp. 112], [VB1997, pp. 192], [ANMK1999, pp. 14], [VBKT2002]). A macro virus is anti-heuristic if it tries to make detection more difficult, e.g. by hiding the macro code in (document) variables, autotexts or encrypted strings ([VB1996, p. 110]).

³ “Class infectors are Office97 macro viruses that consist of a single module - the class module, which is always named the same (usually ThisDocument).” ([KT1999, p. 301])

1.1.3.2 Scripting Viruses

Viruses written in VBScript (Visual Basic Script) ([VBM1998, pp. 13]) or JScript (Windows) or in a shell script language (like bash on Linux/Unix) are called script(ing) viruses. “Microsoft VBScript (VBS) is a subset of the Microsoft Visual Basic programming language [..] VBS files can be embedded into HTML documents to make them more interactive. [..]” ([MVO1999, pp. 227]). VBS is supported in MS Outlook, Microsoft Internet Explorer or via the Windows Scripting Host. It is even possible to write a virus which works with both VBA and VBS, i.e. it can infect Office files from a script file and vice versa ([KT1999, pp. 311]). Some other applications, like Corel Draw, offer their own scripting language, e.g. CorelScript, which is used by viruses, too ([NF1999, pp. 7]).

Script viruses for Linux/Unix can be written e.g. in a shell script language (like bash), Perl, Python and so on ([SB1996, p. xxviii]). Some advanced techniques known from DOS or macro viruses, e.g. polymorphism, are possible, too ([VBKT2002]). Figure 1.3 shows a screenshot of a VBS virus construction kit (construction kits are available for other types of viruses, too).

Figure 1.3: Screenshot of VBS virus/worm construction kit, taken from [FSC2003]

1.1.4 Worms

The basic difference between a computer virus and a worm is that a worm does not need a host ([RS1996, p. 4]). “The computer Worm is a program that is designed to copy itself from one computer to another, leveraging some network medium: email, TCP/IP, etc.” ([CN1999, p. 1]). Worms may use bugs, e.g. buffer overflows ([ECPZ2002]), in some server software to “infect” other machines via the Internet, e.g. in MS SQL Server or MS Internet Information Server ([ECPZ2002, pp. 86]), or in FTP daemons or the line printer daemon on Linux/Unix systems ([JK2001, pp. 152]).

According to [CN1999, p. 2-3], computer worms can be classified by their transport and launch mechanisms. One transport mechanism is email: the worm uses MS Outlook to send itself as an email attachment, or it implements its own SMTP module to create and send mails. A worm may also use arbitrary protocols like IRC (Internet Relay Chat), TCP/IP or peer-to-peer networks. If a worm does not “require user interaction in order to gain control of a system” ([CN1999, p. 3]), it is called a “self-launching worm”. A “user-launched” worm must be started by a user, e.g. by double-clicking on an infected email attachment. If a worm uses both mechanisms, it is called a “hybrid-launch worm”. Another classification approach was given by [TM2002, p. 237-242], who classifies worms by compiler and by method of replication. Most worms are compiled with Visual Basic, C and Delphi. Methods of replication include SMTP (the worm creates mail on its own), MAPI (Messaging API), Outlook or the network (e.g. via network shares).

Of course, a worm may implement advanced features, as discussed above, like retro capabilities or polymorphism⁴.

1.2 Anti-Virus Technologies

1.2.1 Scanner

Virus scanners are by far the most widely used method to detect (and clean) viruses. A scanner may work either on-demand (i.e. the user has to start the virus scanner) or on-access, which means the program runs in the background and scans a file while it is being accessed.

Modern virus scanners must be able to parse various file formats, e.g. different types of executables (such as DOS EXE files and Windows NE/PE files), Office files or MIME-encoded files, and of course self-extractor formats like PkExe or the nowadays very common UPX, and archive formats like RAR or ZIP ([KC2002, p. 3], [AM2002]). Therefore, exact file type recognition is needed. Some file types, e.g. images, are not susceptible to viruses, so they do not need to be scanned. Depending on the file type, only certain areas may be scanned first, which is faster than always doing a dumb, full scan from the beginning to the end of a file ([IM2000, p. 150]).

Several virus detection methods are possible; which ones are used may depend on the type of the virus and/or the file type (see above):

Pattern matching: for each known virus, a particular sequence of code is “extracted”, mostly called pattern, signature or search string, and stored in a virus definition file (some kind of database). The virus scanner then “is looking for an exact match which will identify the code as a virus” ([KC2002, p. 5]). To detect variants or minor modifications of a virus, a search string may contain wildcards. Not only the search string is stored, but also information on which file types can be infected by this particular virus and at which byte position/offset the search string may occur. This is used to speed up the virus scan process ([IM2000, p. 150]) and to avoid false positives⁵. Moreover, the virus definition file may contain not only the virus signatures, but also some machine code or pseudo-code for performing various scanning tasks ([IM2000, p. 146]). For identifying a virus, more than one signature could be used, once again to reduce the likelihood of a false positive ([FF2001, p. 408]). An exact identification is also important for cleaning a virus; otherwise the cleaning process may remove more than just the virus part(s). The basic advantage of pattern matching is that the virus can be named (e.g. “file infected with XYZ.A virus”), whereas heuristics (see below) may only report “file looks suspicious” back to the user. The basic disadvantage obviously is that only known viruses can be detected, i.e. those whose signature has been added to the virus definition file ([FF2001, p. 409]).

⁴ Here, polymorphism also refers to the generated mails, e.g. random subject lines.
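The combination of search string, wildcards, file type and offset restriction can be sketched as follows. This is a minimal illustration in Python and not the format of any real virus definition file; the `Signature` class, the “Demo.A” entry and its byte values are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    name: str                              # name reported to the user
    pattern: Tuple[Optional[int], ...]     # byte values; None is a wildcard
    file_types: Tuple[str, ...]            # file types this virus can infect
    max_offset: Optional[int] = None       # last start offset worth checking


def matches_at(data: bytes, offset: int, pattern: Sequence[Optional[int]]) -> bool:
    """Check whether the pattern matches the data at the given offset."""
    if offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def scan(data: bytes, file_type: str, signatures: Sequence[Signature]) -> Optional[str]:
    """Return the name of the first matching signature, or None."""
    for sig in signatures:
        if file_type not in sig.file_types:
            continue                       # skip: this virus cannot infect the type
        last_start = len(data) - len(sig.pattern)
        if sig.max_offset is not None:
            last_start = min(last_start, sig.max_offset)
        for offset in range(last_start + 1):
            if matches_at(data, offset, sig.pattern):
                return sig.name
    return None


# Made-up definition: four bytes, the third one is a wildcard, and the
# search string may only start within the first five bytes of a COM file.
DEFINITIONS = (
    Signature("Demo.A", (0xDE, 0xAD, None, 0xEF), ("com",), max_offset=4),
)
```

Restricting the search by file type and offset is what keeps the scan fast and reduces false positives: the same four bytes deep inside a file, or inside a file of another type, are not reported.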

Heuristics: heuristics are used to “detect” new viruses. Simply speaking, in the heuristic approach a program is “analysed” for instructions (or sets of instructions) which are known to be typical for viruses. Each such suspicious instruction is given a particular weight, and the weights are summed up. If the sum exceeds a particular threshold, the file is regarded as suspected of infection. Another approach is a rule-based system, which “simply compares found functionality with a set of rules. If a predefined rule is found within the code, the rule-based system returns with a positive result. Depending on the exactness of the complete system, results like generic virus or e.g. VBS/Loveletter variant are realizable” ([MS2002]). “There are two different ways of applying heuristic rules: static and dynamic. The static method checks the presence of suspicious code fragments (whether they are executed or not). The dynamic method emulates the program and checks which actions are really performed (that is simulation of a virus execution in a virtual environment, frequently called a sandbox or an emulator buffer)” ([IM2000, p. 146], [FF2001, p. 416-417]). Those methods are sometimes also referred to as the “passive” and “active” approach ([RS2002, p. 109]). Both can be combined, too ([FF2001, p. 417]). See also code emulation, below.
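The weight-based variant of the heuristic approach can be sketched in a few lines. The trait names, weights and threshold below are invented for illustration; real products use their own, much larger and carefully tuned sets.

```python
# Hypothetical suspicious traits and their weights.
SUSPICIOUS_TRAITS = {
    "writes_to_executable": 4,
    "hooks_interrupt_21h": 3,
    "searches_for_exe_files": 2,
    "contains_decryption_loop": 2,
    "terminates_and_stays_resident": 1,
}

# Hypothetical threshold; a file is suspicious only if the score exceeds it.
THRESHOLD = 6


def heuristic_score(observed_traits):
    """Sum the weights of all distinct observed traits (unknown ones count 0)."""
    return sum(SUSPICIOUS_TRAITS.get(trait, 0) for trait in set(observed_traits))


def is_suspicious(observed_traits, threshold=THRESHOLD):
    """Report a file as suspicious if its score exceeds the threshold."""
    return heuristic_score(observed_traits) > threshold
```

A program that writes to executables and hooks INT 21h scores 7 and is flagged, whereas a harmless TSR utility that merely searches for EXE files scores 3 and is not. Lowering the threshold catches more unknown viruses at the price of more false positives.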

Code emulation: code emulation was originally developed to detect polymorphic viruses ([FF2001, p. 419], [KC2002, p. 6]). When a program is scanned by the anti-virus program, it is executed in a virtual environment (aka “sandbox”). Therefore, “when a scanner loads a file infected by a polymorphic virus into this virtual computer, the virus decryption routine executes and decrypts the encrypted virus body. This exposes the virus body to the scanner, which can then search for signatures in the virus body that precisely identify the virus strain” ([CN1996, p. 5]). As mentioned above, code emulation is also used together with heuristics. Of course, code emulation is slow ([FF2001, p. 420]), so it should only be used when really needed ([IM2000, p. 150-151]).

⁵ False positive = a file is reported as infected, although it is clean. False negative = a file is reported as virus free, although it is not.

1.2.2 Integrity Checker

An integrity checker basically generates a checksum for files, sectors (e.g. the boot sector) and the macros stored in e.g. an Office document (it would obviously make no sense to create a CRC for the whole Office file). The checksums are stored in a kind of database and later compared against freshly computed ones. If a checksum does not match, the file has been modified (which could be caused by a virus infection). Obviously, it must be assured that the file is clean when the checksum is generated ([FF2001, p. 411-412], [AN1999]).
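The baseline-and-compare cycle of an integrity checker for plain files can be sketched as follows. The sketch uses SHA-256 from the Python standard library instead of a CRC, which is a choice made for this illustration; the function names and the in-memory dictionary standing in for the checksum database are assumptions as well.

```python
import hashlib
from pathlib import Path


def file_checksum(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record checksums of files that are assumed to be clean right now."""
    return {str(path): file_checksum(path) for path in paths}


def changed_files(baseline):
    """Return all baseline files that are missing or whose checksum differs."""
    changed = []
    for path, expected in baseline.items():
        if not Path(path).is_file() or file_checksum(path) != expected:
            changed.append(path)
    return sorted(changed)
```

A mismatch only says that a file has changed, not why: a software update triggers the same report as an appending virus, so the result still needs interpretation. The baseline itself must also be protected against modification.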

1.2.3 Behaviour Blocker

A behaviour blocker runs in the background and monitors the execution of the currently running program(s) on the computer. If a program tries to perform a suspicious action (e.g. opening a file and appending code, or formatting the hard disk), this will be intercepted. The behaviour blocker may then terminate this program or ask the user which action should be taken (e.g. allow, do not allow, move program into quarantine) ([CN2002], [FF2001, p. 410-411]). But for most users this decision is a “tough choice” and behaviour blocking may generate a high level of false positives ([FF2001, p. 411]), although some techniques are possible to reduce the likelihood of false positives ([LL2001]).

1.3 Anti-Virus Strategy

Nowadays, a basic anti-virus strategy is a 3-tier approach:

• tier 1: the desktop,

• tier 2: file & print, email or web server(s),

• tier 3: the internet gateway, like mail gateways or web proxy servers.

A virus should be stopped as early as possible, before it can enter the network (i.e. at tier 3). According to [ICSA2002, p. 24-25], since 2000 more than 80% of the virus incidents have been caused by infected email attachments, whereas diskettes as a source of infection are next to nothing. Of course, encrypted emails/attachments cannot be checked at this level (this has to be done by an on-access scanner running on the desktop). Virus scanning requires lots of resources, so this task should probably be off-loaded onto another machine. Anti-virus software on the gateway must take precautions so that it does not suffer from a Denial-of-Service (DoS) attack by specially crafted mails and/or mail attachments (e.g. the (in)famous 42.zip attack⁶).

⁶ see http://www.corpit.ru/pipermail/avcheck/2001q3/000110.html for a discussion on the avcheck and amavis-user mailing list
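One possible precaution against such crafted archives is to inspect the sizes declared in the archive directory before anything is unpacked. The following sketch uses Python's standard `zipfile` module; the limits and the function name are assumptions chosen for illustration and are not taken from any particular product.

```python
import io
import zipfile

# Hypothetical limits; suitable values depend on the gateway's resources.
MAX_TOTAL_BYTES = 100 * 1024 * 1024   # total uncompressed size
MAX_MEMBERS = 1000                    # number of files in the archive
MAX_RATIO = 100                       # uncompressed / compressed size


def zip_within_limits(data, max_total=MAX_TOTAL_BYTES,
                      max_members=MAX_MEMBERS, max_ratio=MAX_RATIO):
    """Return True if a ZIP archive looks safe to unpack for scanning."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            members = archive.infolist()
    except zipfile.BadZipFile:
        return False                  # unreadable: leave the decision to the caller
    if len(members) > max_members:
        return False
    total = sum(member.file_size for member in members)
    if total > max_total:
        return False
    compressed = sum(member.compress_size for member in members)
    if compressed and total / compressed > max_ratio:
        return False
    return True
```

The check covers a single archive level only and trusts the sizes declared in the archive directory. A scanner that unpacks nested archives would additionally need a recursion depth limit and a cap on the number of bytes actually written during extraction.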


As files, esp. documents, are shared via file servers, those are a vector for distributing infected documents (some viruses/worms use network shares to propagate themselves). Therefore, on-access scanning of file servers is the next line of defense (tier 2).

The last resort is the desktop, e.g. for scanning an encrypted file when it’s being decrypted.

A full anti-virus strategy is beyond the scope of this paper. Of course, a good backup/restore concept and user education play an important role in such a strategy. Only with a backup can erased data (lost by accident, hardware failure or the payload of a virus) be restored. User education should help to minimize the number of people who double-click on any email attachment.

To reduce the risk of virus infection (and propagation), you may choose ”safe” file formats (e.g. plain ASCII text or PDF ⁷ instead of Word), ”safe” applications (e.g. PegasusMail instead of Outlook or OpenOffice instead of MS Office) or Linux/FreeBSD instead of Windows. But, of course, changing software/OS may not be an easy task.

7

PostScript/PDF viruses are possible and some ”proof-of-concept” viruses exist. So, the PS/PDF reader must not execute malicious instructions.


Chapter 2

Server-based Virus Protection on Unix/Linux

This chapter outlines the requirements, the possible means to integrate 3rd party virus scanners, and whether each of them fulfils those requirements.

2.1 Requirements & Aims

As the title already implies, the thesis is focused on server-based anti-virus solutions running on Unix/Linux servers, i.e. protection for Internet gateways (tier 3) and file servers (tier 2), mainly to protect Windows clients within the internal network. So, we will discuss Open Source solutions for Linux/Unix servers acting as

• Mail servers, running sendmail or postfix (chapter 4, p. 49). This topic will be covered briefly, as many solutions exist (e.g. AMaViS, qmail-scanner, exiscan, MailScanner) and some of them have existed for many years (the AMaViS project was started back in 1997).

• File and print servers, running Samba (chapter 5, p. 55). Not many solutions are available so far; some features and implementation details of samba-vscan will be discussed, which up to now is the only Open Source solution supporting several anti-virus products.

• Proxy servers, running Squid (chapter 6, p. 71). Again, OSS solutions are rare. Three example concepts will be presented briefly.

As those OSS solutions are not anti-virus products per se, but act as ”glue code” between the service and one (or more) virus scanners, we will first discuss how the integration can be done. The following requirements should be fulfilled:

• easy integration, e.g. into shell or Perl scripts

• open, non-proprietary protocol, to be independent from anti-virus vendor(s)

• low implementation and maintenance efforts, to reduce costs

• load balancing / load separation, so that virus scanning does not consume resources needed by the main service (e.g. mail server) and slow it down

• high performance, to reduce the overall latency caused by the virus scan process

2.2 Integration of Anti-Virus Products

Several means to integrate an anti-virus product into any 3rd party application exist. Each has its pros and cons; some can be implemented very easily, others are more time consuming.

2.2.1 Command line Scanner

A command line scanner can be used on demand to scan a specific directory or the whole disk. It can also be run from a cron job, calling a specific script, for a scheduled scan, e.g. every day at 12pm.

Depending on the return value (or exit status) of the called program it’s possible to determine whether an infection was found or not. As an example, the list of return codes (shortened) for H+BEDV AntiVir/Linux (as of version 2.07):

0: Normal program termination, nothing found, no error

1: Found infected file or boot sector

2: A signature was found in memory

Calling a virus scanner is possible from any (shell) script, e.g.

#!/bin/sh

# Run the scanner and remember its exit status
/usr/sbin/antivir /path/to/check
ret=$?

if [ $ret -eq 0 ] ; then
    echo "No virus found"
elif [ $ret -eq 1 ] ; then
    echo "Virus found"
else
    echo "An error occurred"
fi

Checking only the return code has one major drawback: you can’t get the virus name(s). Of course, grepping the output is possible, as in the Perl example below, taken from the hbedv module ¹ of the AMaViS project ² (simplified).

[..]
chop($output = `$antivir -allfiles -noboot -s -z $TEMPDIR/parts`);
$errval = retcode($?);
do_log(2,$output);
if ($errval == 0) { # no errors, no viruses found
    $scanner_errors = 0;
} elsif ($errval == 1) { # no errors, viruses discovered
    $scanner_errors = 0;
    if ( $output =~ /ALERT:/ ) {
        @virusname = ($output =~ /ALERT: \[(\S+)\s.*?\]/g);
    }
}
[..]

1

full source at http://cvsweb.amavis.org/amavis/amavis/av/hbedv

2

A Mail Virus Scanner, http://www.amavis.org/ – Perl script for virus scanning at the email gateway level.

The basic format is ”ALERT: [<name> <type>] some text”, where <name> is the name of the virus and <type> is e.g. virus or dialer. But those types are subject to change ³ .
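The combination of exit status and output matching can also be sketched in Python. This is only an illustration and not part of AMaViS: the function names parse_alerts and interpret_scan are made up for this sketch, and the exit codes and the ”ALERT: [<name> <type>] some text” format are taken from the AntiVir description above, so they may not hold for other versions or products.

```python
import re
from typing import List, Tuple

# Matches lines of the form: ALERT: [<name> <type>] some text
# (same idea as the Perl expression above; the format is vendor specific).
ALERT_RE = re.compile(r"ALERT: \[(\S+)\s.*?\]")


def parse_alerts(output: str) -> List[str]:
    """Return all virus names found in the scanner output."""
    return ALERT_RE.findall(output)


def interpret_scan(exit_status: int, output: str) -> Tuple[str, List[str]]:
    """Map exit status and output to a (verdict, virus names) pair."""
    if exit_status == 0:       # nothing found, no error
        return "clean", []
    if exit_status == 1:       # infected file or boot sector found
        return "infected", parse_alerts(output)
    return "other", []         # any other code needs product-specific handling
```

Keeping the vendor-specific pattern and exit codes in one small function is what makes such glue code cheap to adapt when a vendor changes its output.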

So, are the requirements fulfilled?

✔ easy integration (partly), at least if used in scripts. Calling a command line scanner via the exec(2) family of functions in a C program may not always be possible (e.g. in the samba-vscan case).

✔ the implementation and maintenance efforts are in general low. Only information about the command line switches, the return values and the output to be matched is needed. From the author’s experience as the unofficial AMaViS av-subsystem maintainer: once in a while, anti-virus vendors change return values or the output without prior notice.

✘ no open ”protocol”, each anti-virus product has its own set of return values and output. Switching vendor means changing existing scripts.

✘ load balancing / separation is not possible. The virus scanning task has to be done on the same machine the service is running on and cannot be off-loaded to another machine.

✘ performance is low. The program start takes time, i.e. creating the program execution environment, the self-check and loading the anti-virus database.

2.2.2 Application Programming Interface

An application programming interface (API) allows a 3rd party developer to integrate a virus scanning facility into his program(s). Many anti-virus products offer such an API, like the one(s) from Sophos, Trend Micro, H+BEDV or Network Associates. But for most (nearly all) products, details about the API are not freely available, i.e. they are only provided after signing a Non-Disclosure Agreement (NDA).

3

based on an email from John Ogness of H+BEDV Datentechnik GmbH, Germany.

To my knowledge, the only vendor which provides complete documentation of its API is Sophos Plc; the API is called SAVI (Sophos AntiVirus Interface) ([SAVI, page 13]): ”SAVI consists of a set of interfaces and enumerators that provide access to various objects which are used internally by SAVI. [..]

The interfaces are retrieved:

• By querying the class factory, or

• By allowing COM to supply them automatically (only when using C++ syntax), or

• From the SAVI interface itself.

The interfaces can be used with C++ syntax or C syntax.”

SAVI provides various functions, e.g. Initialise for initialising a SAVI object, SweepFile to scan a single file for viruses or DisinfectFile to attempt to disinfect a file ([SAVI, page 63]). For illustration, sample code showing how to initialise SAVI using the C programming language ([SAVI, page 19-20]):

CISavi2* pSAVI;
CISweepClassFactory2* pFactory;
HRESULT hr;
const char* ClientName = "SAVIDemo";

/*
 * Load the SAVI DLL and then request a class factory
 * interface.
 */
hr = DllGetClassObject((REFIID)&SOPHOS_CLSID_SAVI2,
                       (REFIID)&SOPHOS_IID_CLASSFACTORY2,
                       (void **)&pFactory);

if( hr==SOPHOS_S_OK ) {
    /*
     * Ask the class factory for a CSAVI2 interface.
     */
    hr = pFactory->pVtbl->CreateInstance(pFactory, NULL,
                                         &SOPHOS_IID_SAVI2,
                                         (void**)&pSAVI);

    /*
     * Drop the factory immediately, we don't need it
     * again in this example.
     */
    pFactory->pVtbl->Release(pFactory);

    /*
     * Did we get the CSAVI2 interface we requested?
     */
    if( hr==SOPHOS_S_OK ) {
        /*
         * Ask SAVI to initialise itself.
         */
        hr = pSAVI->pVtbl->InitialiseWithMoniker(pSAVI, ClientName);

        /*
         * If the initialisation failed, then release the
         * SAVI interface and
         * set the pointer to NULL.
         */
        if( SOPHOS_FAILED(hr) ) {
            printf("ERROR: Initialise [%ld].", (long)hr);
            pSAVI->pVtbl->Release(pSAVI);
            pSAVI = NULL;
        }
    }
}

Now, let’s have a look at whether the requirements are met or not.

✔ good performance, as the library has to be loaded only once and can then be used. Loading a library is faster than starting a program.

✘ easy integration is not possible, e.g. it can not be used in scripts

✘ each anti-virus vendor has its own proprietary API. So, changing the anti-virus vendor means a rewrite from scratch.

✘ high implementation efforts. The developer needs to become familiar with the API; the efforts depend on the complexity of the API and on the quality of its documentation. If the API is stable, changes are rather unlikely, so maintenance efforts should be rather small, provided the API is not completely redesigned.

✘ load balancing / splitting not possible, as the library can only be called if it’s installed on the same host on which the service being protected is run.

2.2.3 Client-server Communication

Using a command line scanner has once again a major drawback: speed. Even if the OS/filesystem provides a good caching mechanism, the start takes some time. This can be avoided when the program is run as a daemon. ”Daemons are processes that live for a long time. They are often started when the system is bootstrapped and terminate only when the system is shutdown. We say they run in the background, because they don’t have a controlling terminal. Unix systems have numerous daemons that perform day-to-day activities” ([WRS1992, page 415]). So, the anti-virus daemon has to load the virus signature database only once on start-up, and not for each scan.

As an example, using Sophie (a daemon, using SAVI) with AMaViS, the speed-up is about 2.5x compared to the command line scanner Sophos Sweep:

”Based on the average delay times (as logged by postfix for the vscan transport ⁴ ), I am experiencing a roughly 2.5x speed-up in mail processing compared to Sophos sweep. This is on a real life, production mail server, not some fancy benchmark, and it’s the first time ever I have seen log entries with ’relay=vscan, delay=0’ :-)”[LH2001].

As the daemon provides a virus scanning service, we refer to it as a server, which waits for requests from (any) program. Such a program is called a client, as it sends a request to the server to scan a file or directory.

Client and server communicate via (BSD) sockets, either Unix Domain sockets or TCP sockets. By using Unix Domain sockets, communication is limited to the host, i.e. client and server have to run on the same host.

The protocol used for communication could be either a ”proprietary” or a standardised one (e.g. the Content Vectoring Protocol or the Internet Content Adaptation Protocol), which will be discussed in the following two sections.

2.2.3.1 Proprietary Protocols

Currently, most anti-virus products running as a daemon use a ”proprietary” protocol. samba-vscan (chapter 5, page 55), a program for on-access virus scanning with the Samba file server, supports seven anti-virus products, which means five different communication protocols (one pair use the same protocol).

On the one hand, this means some work for each virus scanner, but on the other hand most of those protocols are easy to implement.

The protocol for the OpenAntiVirus Scanner daemon is very simple and straightforward. It waits on port 8127 for a connection and expects e.g. the ”SCAN filename|path” command ⁵ . The response may be either ”OK” (file is clean), ”FOUND: virus-name” or ”ERROR: error message”. The connection is then closed by the server. For illustration, a simple telnet session to scan the file eicar.com, which contains the EICAR ⁶ Test File ⁷ (not a real virus).

$ telnet localhost 8127

Connected to localhost.

Escape character is ’^]’.

SCAN /tmp/eicar.com

FOUND: Eicar-Test-Signature

Connection closed by foreign host.
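The same exchange can be scripted. The following Python sketch is an illustration only, not code from samba-vscan or ScannerDaemon: the function names are invented, the newline terminator after the SCAN command is an assumption, and the reply parser accepts the ”FOUND” keyword with or without a trailing colon, because the telnet session above shows ”FOUND:”.

```python
import socket
from typing import Optional, Tuple


def parse_reply(line: str) -> Tuple[str, Optional[str]]:
    """Split a one-line daemon reply into (status, detail)."""
    line = line.strip()
    if line == "OK":
        return "clean", None
    if line.startswith("FOUND"):
        return "infected", line[len("FOUND"):].lstrip(": ").strip()
    if line.startswith("ERROR"):
        return "error", line[len("ERROR"):].lstrip(": ").strip()
    return "unknown", line


def scan_path(path: str, host: str = "localhost", port: int = 8127,
              timeout: float = 30.0) -> Tuple[str, Optional[str]]:
    """Send 'SCAN <path>' and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: the command is terminated by a newline.
        sock.sendall(("SCAN " + path + "\n").encode("utf-8"))
        chunks = []
        while True:            # the server closes the connection after replying
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_reply(b"".join(chunks).decode("utf-8", errors="replace"))
```

A call such as scan_path("/tmp/eicar.com") needs a running daemon on the same host, since only the file name is sent, not the file contents.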

Some products ”re-use” existing protocols for their own purpose, e.g. F-Prot Daemon uses the Hyper Text Transfer Protocol (HTTP) 1.0, as specified in RFC1945 [RFC1945]. The F-Prot Daemon binds to port 10200 (up to 10204).

F-Prot Daemon supports only the GET method to send the name of the file to be scanned. The request is therefore simpler than the one specified in RFC1945 [RFC1945, chapter 5]:

4

the vscan transport is the one calling AMaViS

5

6

European Institute for Computer Antivirus Research, http://www.eicar.org

7

available at http://www.eicar.org/anti_virus_test_file.htm


Request = Request-Line CRLF

The Request-Line is simplified, too:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Method = GET

Request-URI = abs_path

where abs_path is an absolute file name here. RFC1945 [RFC1945, section 5.1.2] mentions: ”The Request-URI is transmitted as an encoded string, where some characters may be escaped using the ”% HEX HEX” encoding defined by RFC 1738. The origin server must decode the Request-URI in order to properly interpret the request.” Of course, this applies here, too.

The response sent back by the daemon complies with the RFC1945 [RFC1945, section 4.1] ”Full Response” definition. The ”entity body” is XML ⁸ output, as demonstrated by the following simple telnet session:

$ telnet localhost 10200
Connected to localhost.
Escape character is ’^]’.
GET /tmp/eicar.com HTTP/1.0

HTTP/1.0 200 Ok
Server: fprotd
Date: Fri, 10 Jan 2003 14:00:52 GMT
Content-Type: text/plain
Connection: close

<?xml version="1.0" encoding="ISO-8859-1">
<!DOCTYPE fprot-results PUBLIC "" "">
<fprot-results version="0.0" engine="3.11b">
  <arguments>
    <arg></arg>
  </arguments>
  <filename>/tmp/eicar.com</filename>
  <detected type="malware">
    <name>EICAR_Test_File</name>
    <accuracy>8</accuracy>
    <disinfectable>yes</disinfectable>
  </detected>
  <summary code="11">infected</summary>
</fprot-results>
Connection closed by foreign host.
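Building such a request and evaluating the XML entity body takes only a few lines with standard libraries. The Python sketch below is an illustration under assumptions: the function names are invented, the element names (fprot-results, detected, name, summary) are taken from the session shown above, and the XML prolog and DOCTYPE are skipped rather than parsed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union
from urllib.parse import quote


def build_request(path: str) -> bytes:
    """Build the GET request; the path is %-escaped as RFC 1945 requires."""
    return ("GET " + quote(path) + " HTTP/1.0\r\n\r\n").encode("ascii")


def parse_response(raw: str) -> Dict[str, Union[Optional[str], List[Optional[str]]]]:
    """Extract summary code, summary text and detected names from a response."""
    # Skip the HTTP headers, XML prolog and DOCTYPE; parse only the root element.
    start = raw.index("<fprot-results")
    end = raw.index("</fprot-results>") + len("</fprot-results>")
    root = ET.fromstring(raw[start:end])
    summary = root.find("summary")
    return {
        "code": summary.get("code") if summary is not None else None,
        "summary": summary.text if summary is not None else None,
        "names": [name.text for name in root.findall("detected/name")],
    }
```

For the session above, parse_response would report the summary code "11", the summary text "infected" and the single name "EICAR_Test_File".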

Both protocols shown as examples here have at least one drawback: they accept a file name only (ScannerDaemon accepts a path name, too). So, they must run on the same host as the client. Therefore, load separation is not possible (i.e. running a Mail Transfer Agent on host X and the virus scanning facility on host Y).

8

Extensible Markup Language, see http://www.w3.org/XML/

Once again, are the requirements fulfilled?

✔ easy integration, mostly yes, as most protocols are simple and may even be used within scripts (e.g. with tools like netcat/nc).

✔ the implementation and maintenance efforts are relatively low, as the protocols are simple or existing code can be re-used (e.g. XML libs for parsing XML output).

✔ good performance, as forking a child is faster than creating a complete program context. Self-check and loading virus database must be done upon startup of the daemon only.

✘ although the protocols are open (e.g. derived from HTTP), they are still proprietary, as each anti-virus program uses its own protocol. Switching vendor means a rewrite of the client.

✘ load balancing / load separation cannot be done with very simple protocols, which only accept the file name (and not the file contents).

2.2.3.2 Generalized Frameworks

All mentioned techniques so far have two (major) drawbacks.

• Each anti-virus product has its own set of return codes, its own API or its own communication protocol. For a 3rd party application developer, this means a lot of work to develop and maintain support for each anti-virus product. For example, the AMaViS program ships with more than 25 antivirus-specific modules ⁹ .

• In some environments it may be desired to have service separation on a per host basis, i.e. running email server, proxy server and anti-virus server each on its own host. This is only possible if the complete data of the file to be scanned is transferred via the network. Of course, the network bandwidth could then become the bottleneck, but this can be avoided by using more than one virus-scanning host (i.e. per subnet/LAN segment) and load balancing.

Probably the most well-known protocol which isn’t ”flawed” by the issues mentioned above is the ”Content Vectoring Protocol” (CVP ¹⁰ ) by Checkpoint Software, which is only for use with their Firewall-1 product. The other one is the Internet Content Adaptation Protocol (ICAP), mainly developed by Network Appliance Inc. and Akamai Technologies. Of course, neither is limited to virus scanning; they provide a generalized framework for various kinds of content inspection and modification.

9

see http://cvsweb.amavis.org/amavis/amavis/av/

10

please do not confuse it with the ”Certificate Validation Protocol”, which is abbreviated as CVP, too. See http://www.ietf.org/internet-drafts/draft-ietf-pkix-cvp-01.txt

CVP is part of Checkpoint’s OPSEC (Open Platform for Security), http://www.opsec.com/. As taken from [CVP2002, page 2]: ”CVP (Content Vectoring Protocol) inspection is an integral component of VPN-1/FireWall-1’s Content Security feature. It enables third-party Content Vectoring Servers to examine all files transferred for various protocols and considerably reduces the vulnerability of protected hosts. CVP configuration (which files to inspect, how to handle the invalid files) is available for all resource definitions. All VPN-1/FireWall-1 auditing tools are available for logging CVP inspection and issuing alerts if necessary.” (see figure 2.1)

Figure 2.1: Connection invoking a Content Vectoring Server [CVP2002, page 2]

Figure 2.1 shows an FTP client, running on host ”Priscilla”, which is connected to the FTP server on ”Elvis” via the firewalled gateway ”Graceland”. We assume the client tries to retrieve a file from the FTP server, therefore the Firewall-1 Security Server invokes the Content Vectoring Server (on ”Opry”). So, VPN-1/Firewall-1 sends the file to the CVP server. The latter performs a virus scan and may optionally send back the auto-cleaned file. Depending on the policy setting, the Security Server allows or disallows the file transfer.

The data flow is illustrated in figure 2.2 [CVP2002, page 7].

1. Input flow – from the source of the connection to the CVP client

2. Server flow – from the CVP client to the CVP server

3. Client to destination – from the CVP client to the connection destination

4. Server to destination – from the CVP server to the connection destination

Destination flow is a combination of the client to destination and server to destination flows.

5. Source flow – from the CVP server to the source of the connection


Figure 2.2: CVP data flows [CVP2002, page 6]

Actually, CVP is a somewhat complex protocol (at least when compared with the ICAP protocol, as discussed below). Moreover, the SDK and documentation are written for developing a CVP server and not a CVP client. As an example, the SDK ships with several example servers in C, whereas an example client is only available as a binary. But for our purpose, a client must be developed, not a server. So, the documentation/SDK is not that valuable. It seems that Checkpoint’s aim is to not provide any information on how to write a client [KA2001].

In contrast, the specification of the Internet Content Adaptation Protocol (ICAP) is freely available, i.e. the technical documentation can be downloaded without prior registration. Moreover, the example server implementation is licensed under the terms of the GNU General Public License (GPL) ¹¹ , probably the most prominent and most widely used ¹² Open Source/Free Software license. The ICAP protocol is dissected in the next chapter.

11

http://www.fsf.org/licenses/gpl.html

12

see freshmeat.net license breakdown statistic, http://freshmeat.net/stats/license. ”freshmeat maintains the Web’s largest index of Unix and cross-platform software, themes and related ’eye-candy’, and Palm OS software”, taken from http://freshmeat.net/about/


Chapter 3

ICAP

In this chapter, the ICAP protocol will be dissected, i.e. who introduced it? What were the requirements? How does it work? At the end of this chapter, the example ”icap-client” will be discussed, i.e. its usage and implementation.

3.1 Overview

3.1.1 Introduction

ICAP was introduced by the so-called ICAP forum back in 1999. The ICAP forum is a coalition of Internet businesses and was co-founded (and is still co-chaired) by Network Appliance and Akamai Technologies. The requirements to be fulfilled are [ICAP01, page 2]:

• ”Be simple.

• Be scalable.

• Use existing infrastructure.

• Be modular in its service. That is, services must be able to be added and subtracted without affecting the intervening architecture or its performance.

• Use existing communication methods and standards.

• Provide resource savings by leveraging edge services.”

In short, ”ICAP in its most basic form is a ’lightweight’ HTTP based remote procedure call protocol. In other words, ICAP allows its clients to pass HTTP based (HTML) messages (Content) to ICAP servers for adaptation. Adaptation refers to performing the particular value added service (content manipulation) for the associated client request/response.” [ICAP01, page 2]

The benefits of ICAP are [ICAP01, page 3]:

• ”ICAP leverages existing equipment available today. In fact, if NetCache (a proxy appliance) proxies are already installed, then no new equipment is necessary, with the exception of the ICAP servers.

• ICAP is HTTP based, enabling access through security barriers that only allow port 80 traffic. Therefore, no security changes to the existing network are likely.

• ICAP is an open protocol and allows any server or application provider to implement it. ICAP is easy to implement since it leverages Apache code. ISPs and enterprises can then choose the appropriate value-added application provider.

• ICAP can also collect client interest information for use in targeting more focused advertising toward these individuals.

• ICAP off-loads these value-added services to ICAP servers, freeing up the resources of the Web servers. This reduces the access times on these sites.

• ICAP simplifies the implementation, reliability, and scalability of value-added services. ICAP leverages edge device and infrastructure to deliver edge-based value-added services that require content adaptation.”

Figure 3.1: Request Modification [ICAP01, page 7]

Figure 3.2: Request Satisfaction [ICAP01, page 7]

Services to be implemented by using ICAP are [ICAP01, page 4-6]:

• ”Virus scanning”. Virus scanning can be performed ”on-the-fly”. If used for scanning Web traffic, only new traffic will be scanned. Previously scanned content, ”marked” as virus free, can be cached by any web cache (like Squid), which improves performance.

• ”Markup Language Translation.

• Advertising Insertion

• Human Language Translation.

• Content Filtering

• Data Compression”

3.1.2 Architecture

ICAP basically offers four operations: two for modifications of an HTTP request (header), and two for HTTP response (body).

Via the Request Modification method the request issued by a client may be modified by an ICAP server before the request is eventually fulfilled by the origin server (see figure 3.1). Let’s assume the client wants to visit the banned-host.com web site, so the browser sends an HTTP GET request, which is redirected to the ICAP server by the ICAP ”switch box”. As the site ”banned-host.com” is in the list of banned URLs, the GET request is rewritten to retrieve an error message (stored on the proxy server). As another example, the ICAP server could ”filter out” sensitive data from the HTTP (GET) request before it is sent back to the proxy, which then sends this request to ”the outside world”. Request Satisfaction (see figure 3.2) works quite similarly, but the (possibly) modified request is sent directly to the origin server by the ICAP server (and not sent back to the proxy server), and the response from the origin server is also sent back via the ICAP server.

Figure 3.3: Response Modification [ICAP01, page 8]

Figure 3.4: Result Modification [ICAP01, page 8]

The Response Modification (see figure 3.3) and Result Modification (see figure 3.4) are very similar. Here, the request of a client is answered by the origin server and the response is then directed to the ICAP server for modification (if any).
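To make the response modification mode more concrete: in the terminology of RFC 3507 (the ICAP specification), the proxy hands the origin server's HTTP response to the ICAP server in a RESPMOD request. The following Python sketch shows how such a request could be framed according to my reading of RFC 3507. It is an illustration only, not the icap-client of this thesis; the host name icap.example.org and the service name avscan are hypothetical, and optional features such as previews are left out.

```python
def chunk(data: bytes) -> bytes:
    """Encode data as one chunk followed by the terminating zero-length chunk."""
    return b"%x\r\n" % len(data) + data + b"\r\n0\r\n\r\n"


def build_respmod(host: str, service: str,
                  http_headers: bytes, http_body: bytes) -> bytes:
    """Frame an HTTP response (header block plus body) as an ICAP RESPMOD request.

    http_headers must contain the status line and headers, including the
    blank line that ends the header block.
    """
    if http_body:
        # The offset of the body equals the length of the encapsulated header block.
        encapsulated = "res-hdr=0, res-body=%d" % len(http_headers)
        payload = http_headers + chunk(http_body)   # only the body is chunked
    else:
        encapsulated = "res-hdr=0, null-body=%d" % len(http_headers)
        payload = http_headers
    icap_head = (
        "RESPMOD icap://%s/%s ICAP/1.0\r\n"
        "Host: %s\r\n"
        "Encapsulated: %s\r\n"
        "\r\n" % (host, service, host, encapsulated)
    ).encode("ascii")
    return icap_head + payload
```

Because the file contents travel inside the request, the ICAP server does not need access to the client's file system, which is what makes load separation possible.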

Service                Request Modification / Request Satisfaction / Response Modification / Result Modification

Content Filtering      Yes Yes Yes Yes

Gateway Translation    Yes Yes

Language Translation   Yes Yes Yes

Virus Scanning         Yes

Ad Insertion           Yes Yes Yes Yes

Data Compression       Yes Yes

Table 3.1: Service Architecture Summary [ICAP01, page 9]


Product                                     Windows  Solaris  Linux

Symantec AntiVirus Scan Engine              Yes      Yes      Yes

WebWasher (NAI/CAI Engine)                  Yes      Yes      Yes

Finjan SurfinGate for Web                   Yes      Yes      No

TrendMicro InterScan WebProtect for ICAP    No       Yes      No

Table 3.2: Available ICAP AntiVirus Servers

3.1.3 ICAP-enabled Anti-Virus Solutions

For this thesis, the Symantec AntiVirus Scan Engine and WebWasher CSM (with the virus scan engine from Network Associates/McAfee) have been used.

Table 3.2 shows currently available ICAP AntiVirus Servers.

3.1.3.1 Symantec AntiVirus Scan Engine

Figure 3.5: Web-frontend of Symantec AntiVirus Scan Engine

1

see http://enterprisesecurity.symantec.com/products/products.cfm?ProductID=173&EID=0

The Symantec AntiVirus Scan Engine (SAVSE) 4.x ¹ was the first ICAP-enabled anti-virus solution I received from an anti-virus vendor which really had working ICAP support ² . So, most testing and development of the ICAP client has been done using SAVSE. According to [SAVSE02, p. 14], ”the Symantec AntiVirus Scan Engine provides virus scanning and repair capabilities to any application on an IP network, regardless of platform, using one of three protocols. Any application can pass files to the Symantec AntiVirus Scan Engine for scanning, which in turn scans the files for viruses and returns a cleaned file if necessary”. A scan request can be sent via SAVSE’s own native protocol, the Internet Content Adaptation Protocol (ICAP) or Remote Procedure Call (RPC) ³ ([SAVSE02, p. 15]). While it’s not possible to mention all features and configuration settings, I’d like to point out the following settings:

• ICAP scan policy allows you to configure the action taken by SAVSE, i.e. scan only, scan and delete, or scan and repair ([SAVSE02, p. 62])

• In-memory file limits, as SAVSE has its own in-memory file system used for decomposing and scanning container and archive files, which is faster than on-disk scanning ([SAVSE02, p. 74]).

• Limits for container files (e.g. archive files), i.e. maximum file size, maximum number of nested archives and maximum amount of time for decomposing ([SAVSE02, p. 87]).
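How container limits of this kind protect a scanner against archive bombs such as 42.zip can be illustrated with a small Python sketch. It is not SAVSE's implementation: the function name, the exception and the default limits are invented for this example, only ZIP archives are handled, the sizes declared in the archive directory are trusted, and a time limit is not covered.

```python
import io
import zipfile


class ArchiveLimitExceeded(Exception):
    """Raised when an archive violates one of the configured limits."""


def check_zip(data: bytes, max_total: int = 100 * 1024 * 1024,
              max_depth: int = 3, _depth: int = 1) -> int:
    """Return the declared total uncompressed size, or raise on a limit violation."""
    if _depth > max_depth:
        raise ArchiveLimitExceeded("archives nested too deeply")
    total = 0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            total += info.file_size          # size declared in the archive directory
            if total > max_total:
                raise ArchiveLimitExceeded("uncompressed size too large")
            if info.filename.lower().endswith(".zip"):
                # A nested archive is unpacked only after passing the size check.
                total += check_zip(zf.read(info), max_total - total,
                                   max_depth, _depth + 1)
    return total
```

The important design point is that the limits are checked before any member is decompressed, so a hostile archive is rejected cheaply instead of exhausting memory or disk space.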

As already mentioned, SAVSE may use the so-called native protocol, which ”is a request/reply-based protocol. The protocol version, the request, and the file are all transmitted by the client upon connection with the scan engine. The reply consists of a reply code, scan results, and the file if the file has been modified” ([SSS02, p. 72]). The basic syntax of a client request is:

Version 2<CRLF>

<socket-command><CRLF>

<filename><CRLF>

<filesize><CRLF>

<filesize-bytes-of-data>

And for the server response:

Reply-Code<CRLF>

<scan-results>

<receive-file>

3.1.3.2 WebWasher Content Security Management (CSM) Suite

The WebWasher (WW) Content Security Management (CSM) Suite provides the most features of the WebWasher product family ⁴ : Internet Access Management, Internet Content Filtering, E-Mail Filtering and Reporting. Spam Filtering and Virus Scanning are optional. I used the WW CSM suite with the Network Associates/McAfee virus engine. WW itself acts as an ICAP server, but also as an ICAP client (e.g. the Internet Content Filtering module can be configured to use an external ICAP server instead of the internal WW one). Moreover, WW CSM can be used with Squid, but also provides its own HTTP proxy (without caching capabilities).

Figure 3.6: Web-frontend of Web Washer CSM Suite

2

the previous version had broken ICAP support. And I missed the fact that WebWasher offers anti-virus capabilities via third party anti-virus modules.

3

proprietary implementation

4

see http://www.webwasher.com/enterprise/products/webwasher_products/index.html?lang=de_EN

3.2 Technical Specification

3.2.1 Overview

The ICAP protocol is specified in RFC3507, published in April 2003 as an informational memo. We will discuss relevant information for performing AV scanning tasks only. For full details please refer to RFC3507 and related RFCs, esp. RFC 2119 (keywords used in RFCs) and RFC 2616 (HTTP/1.1).



see e.g. http://www.geocrawler.com/archives/3/281/1999/4/0/1652065/

see e.g. http://sourceforge.net/mailarchive/forum.php?thread_id=219140&forum_id=4829

Overview of the Thesis

Chapter 1 gives an overview of computer viruses and some other types of malware, as well as of anti-virus technologies and anti-virus deployment.

Chapter 2 explains possible means to integrate third party anti-virus scanners into scripts and programs.

Chapter 4 explains briefly the use of AMaViS for protecting the mail server and the ICAP integration.

Chapter 5 shows two possible concepts for on-access, real-time scanning of Samba shares, with a focus on the direct Samba integration as implemented by the samba-vscan project. Results of file retrieval tests illustrate the impact on performance.

Chapter 6 discusses concepts for protecting HTTP/FTP transfers.

Chapter 7 summarizes the results and gives a short future outlook.

Credits

First of all, I’d like to thank my advisors Prof. Hannelore Frank and Prof. Dr. Rainer Mueller for their support, feedback and suggestions.

A professional thank you goes to the following persons and/or companies:

• SuSE Linux AG for funding this diploma thesis and my AMaViS work for three years.

• Travis Priest, Rui Ataide (Symantec USA) and Gerald Maronde (Symantec Germany) for providing me with the latest Symantec AntiVirus Engine product before it was publicly available and for various ICAP/Symantec AntiVirus Scan Engine related discussions.

• Martin Stecher (WebWasher AG) for some email exchange about ICAP and WebWasher; Oxana Herzog and Elka Plattmann for sending a special trial evaluation key for the WebWasher CSM suite.

• Christian Hofmann of DATSEC for offering the latest Kaspersky AntiVirus for File servers and a one year license key.

Feedback et al

Please send feedback, corrections, suggestions or even flames to

Rainer Link <mail@rainer-link.de>

I plan to maintain this thesis and release updated versions once in a while.

History

1.00 - final version (May, 28 2003); non-public

1.01 - changed title page, added history, added Appendix A (GNU FDL); released to public

1.01a - corrected the vfs options setting, thanks to Stefan Metzmacher for the report (August, 9 2003)

License

This document is licensed under the terms of the GNU Free Documentation License (see http://www.fsf.org/licenses/fdl.html).

Copyright (c) 2002-2003 Rainer Link, OpenAntiVirus.org.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections "History" and "Credits"; no Front-Cover Texts, and the Back-Cover Text "Diploma Thesis by Rainer Link. Published by OpenAntiVirus.org". A copy of the license is included in the section entitled "GNU Free Documentation License".

Contents

1 Introduction 1

1.1 Computer Viruses and Malware . . . . 1

1.1.1 Introduction & Definition . . . . 2

1.1.1.1 The Infection Mechanism . . . . 2

1.1.1.2 The Trigger . . . . 2

1.1.1.3 The Payload . . . . 2

1.1.2 Classification of Computer Viruses . . . . 3

1.1.2.1 Type of Host Victim . . . . 3

1.1.2.2 Type of (Infection) Technique . . . . 4

1.1.2.3 Special Virus Features . . . . 5

1.1.3 Macro & Script Viruses . . . . 6

1.1.3.1 Macro viruses . . . . 6

1.1.3.2 Scripting Viruses . . . . 8

1.1.4 Worms . . . . 8

1.2 Anti-Virus Technologies . . . . 9

1.2.1 Scanner . . . . 9

1.2.2 Integrity Checker . . . . 11

1.2.3 Behaviour Blocker . . . . 11

1.3 Anti-Virus Strategy . . . . 11

2 Server-based Virus Protection on Unix/Linux 13

2.1 Requirements & Aims . . . . 13

2.2 Integration of Anti-Virus Products . . . . 14

2.2.1 Command line Scanner . . . . 14

2.2.2 Application Programming Interface . . . . 15

2.2.3 Client-server Communication . . . . 17

2.2.3.1 Proprietary Protocols . . . . 18

2.2.3.2 Generalized Frameworks . . . . 20

3 ICAP 23

3.1 Overview . . . . 23

3.1.1 Introduction . . . . 23

This chapter gives an overview of computer viruses and anti-virus techniques. If the reader is interested in a particular topic, the given reference(s) are worth reading ¹ .