SVM-Dlight

What is SVM-Dlight?

SVMDlight is a slightly modified version of SVMLight that allows a programmer to create plugins for reading arbitrary data types (e.g. graphs or relational data) and computing a custom kernel on them. The two programs svm_Dlearn and svm_Dclassify behave exactly like T. Joachims' svm_learn and svm_classify, but accept an extra option (-D) for loading a custom dynamic library that supports a particular data type with an associated kernel function.

SVMDlight is more a recipe for writing new plugins than something interesting per se. You may find it useful if you wish to use SVMLight with your own data types and kernels.

Availability

SVMDlight can be downloaded from

http://www.dsi.unifi.it/neural/src/svm-Dlight/svm-Dlight.tgz

Usage

svm_Dlearn and svm_Dclassify accept the same options as svm_learn and svm_classify. Without the -D option they behave in exactly the same way as the original programs and operate on sparse vector data.

Suppose you have a new interesting data type, say 'gnats', and have just found a kernel function on gnats. To program your kernel in SVMDlight you need to take the following steps:

  1. Define a C struct (or C++ class) associated with your data type, say struct Gnat { ... };
  2. Define your own text format for representing each example in the data file. This format should resemble the text format used in SVMLight, i.e. the target (y) followed by a textual representation of the input (x). The entire example must be written on a single text line.
  3. Write some C code that parses the above data format and reads each example into your own in-memory data structure. Wrap this code in a C function with signature void* plugin_parse_document(char*, double*, long*, long*, double*, long*, long, char**); This function replaces the similar function in SVMLight (see the comments in the file plugin.c for details) and returns a void pointer to the constructed struct.
  4. Write some C code that writes a text line in the format defined in step 2, starting from a pointer to your data structure. Wrap this code in a function with signature void plugin_write(FILE*, void*);
  5. Write your own kernel function in C. Wrap it in a function with signature double plugin_kernel(void* a, void* b); At run time, the SVMLight optimizer calls this function with two pointers to the SVMLight struct SVECTOR. This struct in turn has a pointer member called words that points to your own data structure. For example, you can use a code fragment like struct Gnat* my_ptr_a = (struct Gnat*)(((SVECTOR*)a)->words); to access your own data.
  6. Write the following auxiliary functions:
  7. Compile your code into a dynamic library, e.g. gnats.so under Linux.
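
As an illustration of steps 1, 2, 3 and 5, here is a minimal sketch for a toy Gnat type. Several things in it are assumptions made only for this example: the two fields of struct Gnat, the text format "<target> <wing_length> <body_mass>", and the helper parse_gnat_line, which is hypothetical (the real entry point is plugin_parse_document, whose arguments are documented in plugin.c). The small SVECTOR definition is a stand-in that keeps the sketch self-contained; a real plugin would use the definition from SVMLight's headers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for SVMLight's SVECTOR, reduced to the one member used here.
   A real plugin would include SVMLight's header instead. */
typedef struct svector_stub {
    void *words;   /* points to the plugin's own data structure */
} SVECTOR;

/* Step 1: hypothetical data type, a gnat described by two numbers. */
struct Gnat {
    double wing_length;
    double body_mass;
};

/* Steps 2 and 3: parse one line "<target> <wing_length> <body_mass>".
   Hypothetical helper; returns a newly allocated Gnat, or NULL on error. */
struct Gnat *parse_gnat_line(const char *line, double *label)
{
    double wing, mass;
    struct Gnat *g;

    if (sscanf(line, "%lf %lf %lf", label, &wing, &mass) != 3)
        return NULL;
    g = malloc(sizeof *g);
    if (g == NULL)
        return NULL;
    g->wing_length = wing;
    g->body_mass = mass;
    return g;
}

/* Step 5: kernel on two examples. Both arguments arrive as SVECTOR
   pointers; the plugin data is reached through the words member.
   The kernel here is a plain dot product of the two fields. */
double plugin_kernel(void *a, void *b)
{
    struct Gnat *ga = (struct Gnat *)(((SVECTOR *)a)->words);
    struct Gnat *gb = (struct Gnat *)(((SVECTOR *)b)->words);

    return ga->wing_length * gb->wing_length
         + ga->body_mass * gb->body_mass;
}
```

The dot product is used only because it is obviously a valid kernel; any positive semidefinite function of two Gnat structs could take its place.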

At this point you can run SVMDlight as follows:

svm_Dlearn -D /path/to/your/lib/gnats.so [other opts] mygnats.trainset model

All the usual svm_learn options can still be given. When using -D, the kernel type option (-t) has a special meaning:

Testing works as usual:

svm_Dclassify -D /path/to/your/lib/gnats.so [other opts] mygnats.testset model

Additionally, you may pass parameters to your kernel with the -u command line switch. For example,

svm_Dlearn -D /path/to/your/lib/gnats.so -u "2 true" mygnats.trainset model

will pass the string "2 true", which contains hyperparameters specific to your own kernel. To parse this string, write some code in the function void plugin_kernel_setparm(char*);
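
A minimal sketch of such a parser is shown below. It assumes, purely for illustration, that the string "2 true" encodes an integer followed by a boolean flag; the variable names gnat_degree and gnat_normalize are hypothetical and do not come from the distribution.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical hyperparameters with their defaults. */
static int gnat_degree = 1;
static int gnat_normalize = 0;

/* Parse a -u string of the assumed form "<integer> <true|false>".
   Malformed or missing input leaves the defaults untouched. */
void plugin_kernel_setparm(char *parm)
{
    int degree;
    char flag[16];

    if (parm == NULL)
        return;
    if (sscanf(parm, "%d %15s", &degree, flag) == 2) {
        gnat_degree = degree;
        gnat_normalize = (strcmp(flag, "true") == 0);
    }
}
```

Storing the parsed values in file-scope variables lets plugin_kernel read them later without changing its two-argument signature.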

The files helloworld.h and helloworld.c contain a very simple plugin example in which the 'arbitrary' datum consists of a single real number and the kernel k(a,b) is either a*b or min(a,b), depending on the parameter passed with -u.
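
The switch between the two kernels can be pictured with the sketch below. It is not the code from helloworld.c: the helper names hello_set_mode and hello_kernel and the parameter value "min" are assumptions chosen for this illustration. In a plugin, such helpers would be called from plugin_kernel_setparm and plugin_kernel respectively.

```c
#include <string.h>

/* Hypothetical switch: 0 selects a*b, 1 selects min(a,b). */
static int hello_use_min = 0;

/* Select the kernel from the -u string; the value "min" is an
   assumption, any other string (or NULL) selects the product. */
void hello_set_mode(const char *parm)
{
    hello_use_min = (parm != NULL && strcmp(parm, "min") == 0);
}

/* Kernel on two real numbers, using the currently selected form. */
double hello_kernel(double a, double b)
{
    if (hello_use_min)
        return (a < b) ? a : b;
    return a * b;
}
```

Note that min(a,b) is positive semidefinite only for non-negative inputs, so data for that variant should be restricted accordingly.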

Platforms

SVMDlight has been tested on Linux Debian Sarge and MacOS X 10.4, but it should work (or should be easy to adapt) on any system that supports dynamic library loading via dlopen(), dlsym(), etc.
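
For readers unfamiliar with this mechanism, the sketch below shows how a host program can look up a plugin_kernel symbol in a shared library with dlopen() and dlsym(). It is an assumption about the general technique, not a copy of SVMDlight's loader; the function name load_plugin_kernel and the type kernel_fn are hypothetical. On some systems the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Type of the kernel entry point described in this document. */
typedef double (*kernel_fn)(void *, void *);

/* Open the library at lib_path and return its plugin_kernel function.
   On success the library handle is stored in *handle_out so that the
   caller can dlclose() it later. Returns NULL on any failure. */
kernel_fn load_plugin_kernel(const char *lib_path, void **handle_out)
{
    kernel_fn fn = NULL;
    const char *err;
    void *handle = dlopen(lib_path, RTLD_NOW);

    if (handle == NULL) {
        err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return NULL;
    }
    dlerror();  /* clear any stale error state before dlsym */
    /* POSIX idiom for storing a dlsym result in a function pointer. */
    *(void **)(&fn) = dlsym(handle, "plugin_kernel");
    if (fn == NULL) {
        err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    return fn;
}
```

RTLD_NOW resolves all symbols at load time, so a plugin with missing dependencies fails immediately rather than at the first kernel call.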

Credits, acknowledgements, license, etc.

This code is just a modest but convenient interface to SVMLight 6.0.1 and adds very little functionality to it. We are releasing it in the public domain because it is necessary to run experiments based on some kernels developed by our group. If you find this code useful for your own project, you should still give credit for SVMLight to T. Joachims, who kindly granted us permission to repost his modified code here. Our small additions can be freely used by anyone. However, the full working program should be regarded as subject to the same licensing conditions as SVMLight (see the enclosed license for SVMLight). To ask permission to modify and use this program in a commercial context, you should contact the author of SVMLight, not us.

Authors

Alessio Ceroni, Paolo Frasconi
Machine Learning and Neural Networks Group
Dipartimento di Sistemi e Informatica
Universita degli Studi di Firenze
Via Santa Marta, 3
50139 Firenze - Italy
Email: aceroni AT dsi.unifi.it

Generated on Wed Sep 5 17:26:48 2007 for SVM-Dlight by  doxygen 1.5.1