
Weighted Decomposition Kernel

0.9

Introduction

The code in wdk.h is a C++ implementation of the weighted decomposition kernel (WDK), a family of kernels on discrete data structures (e.g. graphs and sequences) within the general class of decomposition kernels. A WDK is computed by dividing objects into substructures indexed by a selector. Two substructures are matched if their selectors satisfy an equality predicate, and the importance of the match is determined by a probability kernel on local distributions fitted on the substructures. Under reasonable assumptions, a WDK can be computed efficiently and can avoid combinatorial explosion of the feature space.
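The matching scheme described above can be illustrated with a minimal sketch. It is not the code in wdk.h: the Substructure type, the toy_wdk function, and the choice of histogram intersection as the local kernel are all assumptions made here for illustration only.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical types for illustration; these are not the classes in wdk.h.
typedef std::map<int, double> Histogram;  // sparse histogram: bin index -> value

struct Substructure {
    int selector;       // discrete label that indexes the substructure
    Histogram context;  // local distribution fitted on the substructure
};

// Histogram intersection, used here as a stand-in local kernel (an assumption):
// sums the smaller of the two values over the bins present in both histograms.
double histogram_intersection(const Histogram& a, const Histogram& b) {
    double sum = 0.0;
    for (Histogram::const_iterator it = a.begin(); it != a.end(); ++it) {
        Histogram::const_iterator other = b.find(it->first);
        if (other != b.end()) sum += std::min(it->second, other->second);
    }
    return sum;
}

// Sums over all pairs of substructures. A pair contributes only when the two
// selectors are equal, and its contribution is the local kernel on the contexts.
double toy_wdk(const std::vector<Substructure>& x,
               const std::vector<Substructure>& y) {
    double k = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j)
            if (x[i].selector == y[j].selector)
                k += histogram_intersection(x[i].context, y[j].context);
    return k;
}
```

Because pairs with different selectors are skipped outright, the cost of this sketch is driven by the number of matching pairs rather than by an explicit enumeration of a feature space.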

Compilation and Usage

The WDK kernel comes with an interface to SVM-Dlight, a modified version of T. Joachims' SVM-Light that allows dynamic loading of plugins for arbitrary data types and kernels. SVM-Dlight is the required host application for running the present plugin. SVM-Dlight can be downloaded from:

http://www.dsi.unifi.it/neural/src/svm-Dlight/

This will give you two programs for training and testing: svm_Dlearn and svm_Dclassify.

The WDK code, together with some example dataset files, can be downloaded from:

http://www.dsi.unifi.it/neural/src/WDK/WDK0.9.tgz

To compile the WDK plugin, edit the enclosed Makefile to reflect details of your system and type:

make plugin

from a shell to produce wdk.so (Linux) or wdk.dylib (Mac). The plugin has been developed on Debian Sarge Linux and Mac OS X 10.4, but it should be easy to adapt it to any system that supports dynamic library loading via dlopen(), dlsym() etc.

Once the plugin is compiled, install it in a convenient location. To use the kernel, launch an SVM-Dlight program with the -D option, for example

svm_Dlearn -D /your/path/wdk.so [other options] <train_data_filename> <model_filename>

Documentation

To use the WDK code, just include "wdk.h" in your C++ source program. The main classes are listed below; see the full documentation, the code itself, and the paper [1] for more details.

WDKDataClass: container for an example. Public methods are:
  WDKDataClass(): constructs the null instance.
  WDKDataClass(const char*): constructs an instance from a formatted line of text.
  double operator*(const WDKDataClass& aInstance) const: calculates the WDK between *this and the argument.
  Stream operators << and >> are supported as usual.

Parameters

The -u option of svm_Dlearn passes the following parameters to the Weighted Decomposition Kernel plugin:

When the plugin is used through the UserKernelClass, these additional options are available:

Format

The input data set files have a special format that generalizes the sparse vector data format in svm-light. Each example is on a single line of text that is parsed according to the following EBNF:

Informally the data format is as follows:

Example: 1 dim:2 part:1 dim:1 attribute:1 dim:2 1:1 2:2 part:3 dim:2 attribute:1 dim:2 1:2 3:1 attribute:2 dim:3 1:2 2:2 3:2

Verbose explanation of the example: the data has target=1 and is made of 2 parts. The first part has type=1 and 1 attribute; that attribute has type=1 and is a histogram with 2 bins, with value=1 in bin=1 and value=2 in bin=2. The second part has type=3 and 2 attributes. Its first attribute has type=1 and is a histogram with 2 bins, with value=2 in bin=1 and value=1 in bin=3. Its second attribute has type=2 and is a histogram with 3 bins, with value=2 in each of bin=1, bin=2 and bin=3.
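The layout of the example line can be made concrete with a small parser sketch. It is inferred only from the informal example above, not from the grammar used by wdk.h, and the names Attribute, Part, Example, next_pair, expect_int and parse_example are hypothetical. It assumes that each dim value counts the items that immediately follow it.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical structures mirroring the informal description; not the types in wdk.h.
struct Attribute { int type; std::map<int, double> bins; };  // bin index -> value
struct Part { int type; std::vector<Attribute> attributes; };
struct Example { double target; std::vector<Part> parts; };

// Reads the next whitespace-separated token and splits it at the first ':'.
std::pair<std::string, std::string> next_pair(std::istringstream& in) {
    std::string token;
    if (!(in >> token)) throw std::runtime_error("unexpected end of line");
    const std::string::size_type colon = token.find(':');
    if (colon == std::string::npos)
        throw std::runtime_error("expected key:value, got '" + token + "'");
    return std::make_pair(token.substr(0, colon), token.substr(colon + 1));
}

// Reads a "key:value" token, checks the key, and returns the value as an int.
int expect_int(std::istringstream& in, const std::string& key) {
    const std::pair<std::string, std::string> kv = next_pair(in);
    if (kv.first != key)
        throw std::runtime_error("expected '" + key + "', got '" + kv.first + "'");
    return std::stoi(kv.second);
}

// Parses: target dim:P { part:T dim:A { attribute:T dim:B { bin:value } } }
Example parse_example(const std::string& line) {
    std::istringstream in(line);
    Example ex;
    if (!(in >> ex.target)) throw std::runtime_error("missing target");
    const int n_parts = expect_int(in, "dim");
    for (int p = 0; p < n_parts; ++p) {
        Part part;
        part.type = expect_int(in, "part");
        const int n_attributes = expect_int(in, "dim");
        for (int a = 0; a < n_attributes; ++a) {
            Attribute attr;
            attr.type = expect_int(in, "attribute");
            const int n_bins = expect_int(in, "dim");
            for (int b = 0; b < n_bins; ++b) {
                const std::pair<std::string, std::string> kv = next_pair(in);
                attr.bins[std::stoi(kv.first)] = std::stod(kv.second);
            }
            part.attributes.push_back(attr);
        }
        ex.parts.push_back(part);
    }
    return ex;
}
```

Calling parse_example on the example line (without the leading "Example:" label) yields two parts of types 1 and 3, holding one and two attributes respectively. A truncated line raises std::runtime_error in this sketch.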

References

  1. S. Menchetti, F. Costa, and P. Frasconi, Weighted Decomposition Kernels, ICML '05.
  2. T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schoelkopf and C. Burges and A. Smola (ed.), MIT Press, 1999.

License

Copyright (C) Machine Learning and Neural Networks Group, Universita' di Firenze, Italia 2005

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Author

Fabrizio Costa
Machine Learning and Neural Networks Group
Dipartimento di Sistemi e Informatica
Universita degli Studi di Firenze
Via Santa Marta, 3
50139 Firenze - Italy
Email: costa at dsi dot unifi dot it

Acknowledgments

Thanks to T. Joachims for the SVM-Light code, to Alessio Ceroni and Paolo Frasconi for the SVM-Dlight code that has made it possible to use the WDK plugin, and to Sauro Menchetti for the invaluable help in the debugging process.
Generated on Thu Aug 4 18:04:02 2005 for WeightedDecompositionalKernel by  doxygen 1.4.4