------------------------------------------------
 

SVMlight: Support Vector Machine

------------------------------------------------
 
Author: Thorsten Joachims <thorsten@ls8.informatik.uni-dortmund.de>
Version: 0.9
Date: 13.11.97

Overview

SVMlight is an implementation of Support Vector Machines (SVMs) in C. The main features of the program are:

Description

SVMlight is a fully functional implementation of Vapnik's Support Vector Machine [Vapnik, 1995]. The optimization algorithm is a refined version of the decomposition algorithm proposed in [Osuna, et al., 1997]. It will be described in detail in a forthcoming paper. The algorithm has modest memory requirements and can efficiently handle problems with many thousands of support vectors. So far the code has mainly been used for learning text classifiers [Joachims, 1997]. Text classification tasks have sparse instance vectors, and this implementation exploits that property to obtain a very compact and efficient representation.
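To illustrate why sparse instance vectors allow a compact representation, here is a minimal Python sketch. It is not taken from the SVMlight source (which is written in C); the type name SparseVector and the function sparse_dot are hypothetical. It assumes each vector is stored as a list of (feature number, value) pairs sorted by feature number, with zero-valued features left out, so a dot product only touches the non-zero entries.

```python
from typing import List, Tuple

# Hypothetical representation: (feature number, value) pairs,
# sorted by increasing feature number, zero-valued features omitted.
SparseVector = List[Tuple[int, float]]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two sparse vectors by merging their sorted index lists."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        index_a, value_a = a[i]
        index_b, value_b = b[j]
        if index_a == index_b:
            # Feature present in both vectors: it contributes to the product.
            total += value_a * value_b
            i += 1
            j += 1
        elif index_a < index_b:
            i += 1  # feature only in a, contributes nothing
        else:
            j += 1  # feature only in b, contributes nothing
    return total


# Two made-up documents over a vocabulary of several thousand word stems.
doc1 = [(3, 1.0), (17, 2.0), (9000, 0.5)]
doc2 = [(17, 4.0), (42, 1.0), (9000, 2.0)]
print(sparse_dot(doc1, doc2))  # 2.0*4.0 + 0.5*2.0 = 9.0
```

The cost of the product grows with the number of non-zero features in the two vectors, not with the size of the vocabulary.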

Getting SVMlight

The source code and binaries are available here. Both are limited to scientific use! If you get the binaries, please be aware that they include code from the DONLP2 optimization package written by P. Spellucci. Make sure you read his copyright information. For now, binaries are available for the following platforms:

The SunOS 4.1.3 version is less memory efficient, since it uses the f2c-converted version of DONLP2.

Source Code

The source code is available for scientific use only. It must not be modified and distributed without prior permission of the author. The implementation was developed on Solaris 2, but it also compiles on SunOS 3.1.4. Although I have not tried it yet, I see no reason why it should not run on other platforms, too. The source code is available at the following location:

SVMlight uses the DONLP2 optimization package written by P. Spellucci for solving intermediate quadratic programming problems. DONLP2 has to be downloaded separately. Please read its copyright information. DONLP2 is written in Fortran, but it compiles and links just fine into C code using gcc. If this does not work for you, you might want to download the f2c-converted version. It is less flexible and uses more memory, though.

Installation

To install SVMlight you need to download svm_light.tar.gz and donlp2.tar.gz (or donlp2_c.tar.gz). Create a new directory and move both files into it, then unzip and untar them. Now run the sh-batch file, which compiles the system and creates the two executables svm_learn and svm_classify. If the system does not compile because you do not have the f2c library, check here.

How to use?

svm_learn is called with the following parameters:

Available options are:

The input file example_file contains the training examples. The first line contains comments and is ignored. Each of the following lines represents one training example and is of the following format:

The class label and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped.
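The rules above can be sketched as a small Python helper that writes an example file. This is an illustration only and rests on several assumptions not stated in this text: that a feature/value pair is written as feature:value, that class labels are 1 and -1, and that the comment line may hold any text. The function names format_example and format_example_file are hypothetical.

```python
from typing import Dict, List, Tuple


def format_example(label: int, features: Dict[int, float]) -> str:
    """Format one training example as a single line.

    Assumption: a feature/value pair is written as "feature:value".
    """
    # Features with value zero can be skipped; pairs MUST be ordered
    # by increasing feature number.
    pairs = sorted((f, v) for f, v in features.items() if v != 0)
    fields = [str(label)] + [f"{f}:{v:g}" for f, v in pairs]
    # The label and the pairs are separated by single space characters.
    return " ".join(fields)


def format_example_file(examples: List[Tuple[int, Dict[int, float]]],
                        comment: str = "# training examples") -> str:
    """Build the file contents: one comment line, then one line per example."""
    lines = [comment] + [format_example(label, feats) for label, feats in examples]
    return "\n".join(lines) + "\n"


# Made-up data: feature 5 has value zero and is therefore dropped.
examples = [(1, {7: 0.5, 2: 1.0, 5: 0.0}), (-1, {3: 2.0})]
print(format_example_file(examples), end="")
```

Sorting inside the helper means the caller can build feature dictionaries in any order and still satisfy the ordering requirement.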

The result of svm_learn is the model which is learned from the training data in example_file. The model is written to model_file. To classify test examples, svm_classify reads this file. svm_classify is called with the following parameters:

Available options are:

All test examples in example_file are classified and the predicted classes are written to output_file. The example file has the same format as the one for svm_learn. Additionally, <class> can have the value zero, indicating that the class is unknown.
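As a rough sketch of how the two files relate, the following Python code parses example lines and scores predictions against them, leaving out examples whose class is zero (unknown). It is not part of SVMlight. It assumes that feature/value pairs are written as feature:value and that output_file holds one numeric value per example whose sign gives the predicted class; the layout of output_file is not specified in this text. The names parse_example and accuracy are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_example(line: str) -> Tuple[int, Dict[int, float]]:
    """Split one example line into its class label and feature dictionary.

    Assumption: pairs are written as "feature:value".
    """
    fields = line.split()
    label = int(fields[0])  # 0 means the class is unknown
    features = {}
    for field in fields[1:]:
        number, value = field.split(":")
        features[int(number)] = float(value)
    return label, features


def accuracy(true_labels: List[int], predictions: List[float]) -> Optional[float]:
    """Fraction of correctly classified examples with a known class.

    Assumption: the sign of each prediction gives the predicted class.
    """
    scored = [(t, p) for t, p in zip(true_labels, predictions) if t != 0]
    if not scored:
        return None  # nothing to score: every class was unknown
    correct = sum(1 for t, p in scored if (p > 0) == (t > 0))
    return correct / len(scored)


# Made-up test data: the third example has an unknown class and is skipped.
lines = ["1 2:1 7:0.5", "-1 3:2", "0 2:1", "1 9:1"]
labels = [parse_example(line)[0] for line in lines]
print(accuracy(labels, [0.8, -0.3, 0.9, -0.1]))  # 2 of 3 known examples correct
```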

Getting started: an Example Problem

You will find an example text classification problem at

Download this file into your svm_light directory and unpack it with

This will create a subdirectory example1. Documents are represented as feature vectors. Each feature corresponds to a word stem (9947 features). The task is to learn which Reuters articles are about "corporate acquisitions". There are 1000 positive and 1000 negative examples in the file train.dat. The file test.dat contains 600 test examples. The feature numbers correspond to the line numbers in the file words. To run the example, execute the commands:

The accuracy on the test set is printed to stdout.
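Because feature numbers correspond to line numbers in the file words, a feature vector can be translated back into word stems. The Python sketch below shows one way to do this. It assumes that line numbering starts at 1 and that each line of words holds exactly one stem; the three stems used here are made up, and the names load_vocabulary and describe are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def load_vocabulary(lines: Iterable[str]) -> Dict[int, str]:
    """Map feature number to word stem.

    Assumption: the first line of the file is feature number 1.
    """
    return {number: line.strip() for number, line in enumerate(lines, start=1)}


def describe(features: Dict[int, float],
             vocabulary: Dict[int, str]) -> List[Tuple[str, float]]:
    """Replace feature numbers by word stems, ordered by feature number."""
    return [(vocabulary.get(number, "<unknown>"), value)
            for number, value in sorted(features.items())]


# Made-up stand-in for the contents of the file words. With the real file,
# an open file object can be passed to load_vocabulary instead of a list.
vocabulary = load_vocabulary(["acquir\n", "bank\n", "share\n"])
print(describe({3: 0.5, 1: 1.0}, vocabulary))
```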

Questions and Bug Reports

If you find bugs or you have problems with the code you cannot solve by yourself, please contact me via email <thorsten@ls8.informatik.uni-dortmund.de>.

References

[Joachims, 1997]        T. Joachims, Text Categorization with Support Vector
                        Machines, to be published,
                        http://www.cs.cmu.edu/~thorsten/tcatsvm.ps, 1997.

[Osuna, et al., 1997]   E. Osuna, R. Freund, and F. Girosi, An Improved Training
                        Algorithm for Support Vector Machines, IEEE NNSP, 1997.

[Vapnik, 1995]          V. Vapnik, The Nature of Statistical Learning Theory,
                        Springer, New York, 1995.
 

------------------------------------------------

Last modified November 13th, 1997 by Thorsten Joachims <thorsten@ls8.informatik.uni-dortmund.de>