Weighted Decomposition Kernel
0.9
The code in wdk.h is a C++ implementation of the weighted decomposition kernel (WDK), a family of kernels for discrete data structures (e.g. graphs and sequences) within the general class of decomposition kernels. A WDK is computed by dividing objects into substructures indexed by a selector. Two substructures are matched if their selectors satisfy an equality predicate, and the importance of the match is determined by a probability kernel on local distributions fitted on the substructures. Under reasonable assumptions, a WDK can be computed efficiently and avoids the combinatorial explosion of the feature space.
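As a rough illustration of the computation described above, the following C++ sketch sums over all pairs of parts, keeps only the pairs whose selectors are equal, and weights each match with a kernel on the local histograms. The Part and Object types and the function names are hypothetical and are not taken from wdk.h; using histogram intersection as the weighting kernel and summing over attribute types are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical types for illustration; they are not the classes defined in wdk.h.
typedef std::map<int, double> Histogram;   // bin index -> count

struct Part {
    int selector;                          // substructure used for exact matching
    std::map<int, Histogram> contexts;     // attribute type -> local histogram
};

typedef std::vector<Part> Object;

// Histogram intersection: sum over shared bins of the smaller count.
double histogram_intersection(const Histogram& a, const Histogram& b) {
    double sum = 0.0;
    for (Histogram::const_iterator it = a.begin(); it != a.end(); ++it) {
        Histogram::const_iterator other = b.find(it->first);
        if (other != b.end()) sum += std::min(it->second, other->second);
    }
    return sum;
}

// Sum over all pairs of parts. A pair contributes only when the selectors are
// equal; its weight is the sum of the kernels on attributes of the same type.
double wdk(const Object& x, const Object& y) {
    double k = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < y.size(); ++j) {
            if (x[i].selector != y[j].selector) continue;
            const std::map<int, Histogram>& cx = x[i].contexts;
            const std::map<int, Histogram>& cy = y[j].contexts;
            for (std::map<int, Histogram>::const_iterator it = cx.begin(); it != cx.end(); ++it) {
                std::map<int, Histogram>::const_iterator other = cy.find(it->first);
                if (other != cy.end())
                    k += histogram_intersection(it->second, other->second);
            }
        }
    }
    return k;
}
```

The double loop is quadratic in the number of parts. Grouping parts by selector first would avoid comparing parts that can never match, which is one way the exact-match predicate keeps the computation cheap.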
The WDK comes with an interface to SVM-Dlight, a modified version of T. Joachims' SVM-Light that allows dynamic loading of plugins for arbitrary data types and kernels. SVM-Dlight is the required host application for running the present plugin. SVM-Dlight can be downloaded from:
http://www.dsi.unifi.it/neural/src/svm-Dlight/
This will give you two programs for training and testing: svm_Dlearn and svm_Dclassify.
The WDK code, with some example dataset files, can be downloaded from:
http://www.dsi.unifi.it/neural/src/WDK/WDK0.9.tgz
To compile the WDK plugin, edit the enclosed Makefile to reflect details of your system and type:
make plugin
from a shell to produce wdk.so (Linux) or wdk.dylib (Mac). The plugin has been developed on Linux Debian Sarge and MacOS X 10.4, but it should be easy to adapt to any system that supports dynamic library loading via dlopen(), dlsym(), etc.
Once the plugin is compiled, install it in a convenient location. To use the kernel, launch an SVM-Dlight program with the -D option, for example:
svm_Dlearn -D /your/path/wdk.so [other options] <train_data_filename> <model_filename>
To use the WDK code, just include "wdk.h" in your C++ source program. The main classes are listed below; see the full documentation, the code itself, and the paper [1] for more details.
WDKDataClass: container for an example. Public methods are:
- WDKDataClass() Constructs the null instance
- WDKDataClass(const char*) Constructs an instance from a formatted line of text
- double operator*(const WDKDataClass& aInstance) const Calculates the WDK between *this and the argument
- stream operators << and >> are supported as usual
Using the -u option of svm_Dlearn, you can pass the following parameters to the Weighted Decomposition Kernel plugin:
- -KDot Flag for choosing the DotProduct kernel type
- -KHistInt Flag for choosing the HistogramIntersection kernel type
- -KNoProdNorm Disable the normalization of the data kernel
- -KNoAttrNorm Disable the normalization of the attribute kernel
When the plugin is used through the UserKernelClass, these additional options are available:
- -Kd Degree coefficient for the polynomial kernel
- -Kr Constant coefficient for the polynomial kernel
- -Ks Multiplicative coefficient for the polynomial kernel
- -Kg Gamma coefficient for the exponential kernel
- -KNoTotNorm Disable the normalization of the overall kernel (e.g. the exponential over the HistogramIntersection)
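The options above name several kernel building blocks. The C++ sketch below shows the standard textbook forms of a dot product on sparse histograms, kernel normalization, and the polynomial and exponential compositions. It is an illustration only: the function names are hypothetical, the parameter roles (s, r, d, gamma) follow the usual SVM-Light convention, and how the plugin actually combines these pieces is an assumption, not something stated in this documentation.

```cpp
#include <cassert>
#include <cmath>
#include <map>

typedef std::map<int, double> Histogram;   // bin index -> count (hypothetical type)

// Dot product over the bins present in both sparse histograms.
double dot(const Histogram& a, const Histogram& b) {
    double sum = 0.0;
    for (Histogram::const_iterator it = a.begin(); it != a.end(); ++it) {
        Histogram::const_iterator other = b.find(it->first);
        if (other != b.end()) sum += it->second * other->second;
    }
    return sum;
}

// Normalization: k(a,b) / sqrt(k(a,a) * k(b,b)); returns 0 when the
// denominator is not positive.
double normalized(double kab, double kaa, double kbb) {
    double denom = std::sqrt(kaa * kbb);
    return denom > 0.0 ? kab / denom : 0.0;
}

// Polynomial composition: (s * k + r)^d.
double polynomial(double k, double s, double r, int d) {
    return std::pow(s * k + r, static_cast<double>(d));
}

// Exponential composition on the distance induced by the base kernel:
// exp(-gamma * (k(a,a) - 2 k(a,b) + k(b,b))).
double exponential(double kab, double kaa, double kbb, double gamma) {
    return std::exp(-gamma * (kaa - 2.0 * kab + kbb));
}
```

With normalization enabled, every object has unit self-similarity, so objects with many parts do not dominate the kernel values merely because of their size.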
The input data set files have a special format that generalizes the sparse vector data format of SVM-Light. Each example is on a single line of text that is parsed according to the following EBNF:
- Target::= 1|-1
- Dim::= 'dim:'Integer{Integer}
- Bin::= Integer{Integer}
- Count::= Integer{Integer}
- HistogramBin::= Bin':'Count
- Histogram::= HistogramBin {HistogramBin}
- Attribute::= 'attribute:'Integer{Integer} Dim Histogram
- Part::= 'part:'Integer{Integer} Dim Attribute {Attribute}
- Data::= Target Dim Part {Part}
Informally the data format is as follows:
- target
- keyword <dim:> and the number of parts
- keyword <part:> and an integer representing the type of part (selector)
- keyword <dim:> and the number of attributes
- keyword <attribute:> and an integer representing the type of attribute (context)
- keyword <dim:> and the number of bins for the attribute histogram
- sequence of pairs <bin:value>
Example: 1 dim:2 part:1 dim:1 attribute:1 dim:2 1:1 2:2 part:3 dim:2 attribute:1 dim:2 1:2 3:1 attribute:2 dim:3 1:2 2:2 3:2
Verbose explanation of the example: the data has target=1 and is made of 2 parts. The first part is of type=1 and has 1 attribute; this attribute is of type=1 and is a histogram with 2 bins, with value=1 in bin=1 and value=2 in bin=2. The second part is of type=3 and has 2 attributes. Its first attribute is of type=1 and is a histogram with 2 bins, with value=2 in bin=1 and value=1 in bin=3. Its second attribute is of type=2 and is a histogram with 3 bins, with value=2 in each of bin=1, bin=2, and bin=3.
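The format can be read with a small recursive reader that uses each dim: count to know how many items follow. The C++ sketch below is a minimal illustration under that assumption. The Attribute, Part, and Example structs and the functions read_keyed and parse_example are hypothetical names introduced here; the actual parser is the WDKDataClass(const char*) constructor in wdk.h, which may behave differently.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical containers for illustration; wdk.h defines its own classes.
struct Attribute {
    int type;                        // context type
    std::map<int, int> histogram;    // bin -> count
};
struct Part {
    int type;                        // selector
    std::vector<Attribute> attributes;
};
struct Example {
    int target;
    std::vector<Part> parts;
};

// Reads the next token, checks that it starts with `keyword`,
// and returns the integer that follows the keyword.
int read_keyed(std::istringstream& in, const std::string& keyword) {
    std::string tok;
    if (!(in >> tok) || tok.compare(0, keyword.size(), keyword) != 0)
        throw std::runtime_error("expected " + keyword);
    return std::stoi(tok.substr(keyword.size()));
}

// Parses one line: Target Dim Part {Part}.
Example parse_example(const std::string& line) {
    std::istringstream in(line);
    Example ex;
    if (!(in >> ex.target)) throw std::runtime_error("expected target");
    int n_parts = read_keyed(in, "dim:");
    for (int p = 0; p < n_parts; ++p) {
        Part part;
        part.type = read_keyed(in, "part:");
        int n_attrs = read_keyed(in, "dim:");
        for (int a = 0; a < n_attrs; ++a) {
            Attribute attr;
            attr.type = read_keyed(in, "attribute:");
            int n_bins = read_keyed(in, "dim:");
            for (int b = 0; b < n_bins; ++b) {
                std::string tok;
                if (!(in >> tok)) throw std::runtime_error("expected bin:count");
                std::size_t colon = tok.find(':');
                if (colon == std::string::npos)
                    throw std::runtime_error("expected bin:count");
                int bin = std::stoi(tok.substr(0, colon));
                attr.histogram[bin] = std::stoi(tok.substr(colon + 1));
            }
            part.attributes.push_back(attr);
        }
        ex.parts.push_back(part);
    }
    return ex;
}
```

Calling parse_example on the example line above yields two parts, of types 1 and 3, with one and two attributes respectively. Malformed input raises an exception rather than producing a partially filled structure.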
- [1] S. Menchetti, F. Costa, and P. Frasconi, Weighted Decomposition Kernels, ICML '05.
- [2] T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schoelkopf, C. Burges, and A. Smola (eds.), MIT Press, 1999.
Copyright (C) Machine Learning and Neural Networks Group, Universita' di Firenze, Italia 2005
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Fabrizio Costa
Machine Learning and Neural Networks Group
Dipartimento di Sistemi e Informatica
Universita degli Studi di Firenze
Via Santa Marta, 3
50139 Firenze - Italy
Email: costa at dsi dot unifi dot it
Thanks to T. Joachims for the SVM-Light code, to Alessio Ceroni and Paolo Frasconi for the SVM-Dlight code that made it possible to use the WDK plugin, and to Sauro Menchetti for invaluable help in the debugging process.
Generated on Thu Aug 4 18:04:02 2005 for WeightedDecompositionalKernel by Doxygen 1.4.4