Click for other NIPS*97 workshops
 
Breckenridge, Colorado, December 5 1997
Organized by P. Frasconi and A. Sperduti
 
 
Morning session
P. Frasconi Introduction
C. Goller Learning Search-Control Heuristics for Automated Deduction Systems
Y. Bengio Learning Multi-Layered Transducers
P. Smyth Probabilistic Independence Networks for Sequential Data
Z. Ghahramani Variational Methods for Learning in Dynamic Bayesian Networks
J. Pollack Large Capacity RAAMs by Exploiting the IFS Dynamics
 
Afternoon session
M. Casey Representational Structure for Continuous Systems Implied by Discrete Behavior. A Case Study of Recurrent Neural Networks and Algorithmic Computation
S.C. Kremer A Grammatical Induction Perspective on the Learning of Structures
A. Küchler On the Correspondence between Neural Folding Architectures and  Tree Automata
M. Maggini Extraction of Frontier to Root Automata from Recursive Neural Networks
R. P. Ñeco Training recurrent neural networks to perform asynchronous translations
Y. Kalinke Recurrent Neural Networks to Approximate the Semantics of  
 Acceptable Logic Programs
R. Bakker Neural Learning of Chaotic Dynamics: Propagating Prediction Errors to Extend Prediction Limits

ABSTRACTS

 
 
 

Paolo Frasconi

Alessandro Sperduti

Dipartimento di Sistemi e Informatica 
Università di Firenze 
ITALY 
Dipartimento di Informatica 
Universita di Pisa 
ITALY
 

An introduction to connectionist models for data structures.

For a general introduction to  the problem of learning data structure, please see
 

Christoph Goller 

Institut für Informatik 
TU München 
D-80290 München, GERMANY
 

A Connectionist Approach for Learning Search-Control Heuristics for Automated Deduction Systems

Automated deduction has a long tradition in computer science and most of the symbolic AI systems perform some kind of logic-based deductive inference. The central problem in automated deduction is the explosive growth of search spaces with deduction length. Methods of guiding and controlling the search process are indispensable.

Search-control heuristics for automated deduction systems are traditionally represented by heuristic evaluation functions for states or inference steps in the search space. These heuristic evaluation functions compute ratings for symbolic structures (e.g. graphs, trees, terms, formulas) of arbitrary size. Most of the previous approaches for learning heuristic evaluation functions use a fixed, a priori defined
set of features for symbolic structures. However, the definition and selection of features introduces a very strong bias and severely limits the class of relations that can be expressed and learned. I present a
very powerful approach for adaptive structure processing which automatically finds the features which are relevant for the task at hand: ``folding architecture networks'' (FA-networks) with the training method ``backpropagation through structure'' (BPTS).

Backpropagation-networks (such as FA-networks) find (learn) approximations for mappings from input to output patterns from a finite set of examples by minimizing a continuous differentiable error measure. Therefore, the task of learning heuristic evaluation functions from examples of successful deductions has to be formulated as minimization problem in order to make BP-networks applicable. It is shown that standard error measures are not very suitable for learning search-control heuristics. A new more adequate error measure is presented.

The new cost function and FA-networks have been combined in a prototypical system for learning search-control heuristics for the theorem prover SETHEO. Experiments in group theory are presented, which show a considerable performance improvement compared to the original system.
 


 

Yoshua Bengio 

Dept. I.R.O. 
Universith de Montreal 
CANADA
Joint work with: Leon Bottou  , and Yann Le Cun  (AT & T Bell Labs.)
 

Learning Multi-Layered Transducers

Multi-Layered transducers are extensions of the idea of multi-layered neural networks in which each layer, instead of mapping a fixed-sized vector to another fixed-sized vector, maps a weighted directed acyclic graph to another such graph. The numerical information attached to the arcs of these graphs can represent scores for particular local hypotheses about the interpretation of some data, and the graph itself may represent a weighted set of sequential hypotheses about the data. Each layer may have trainable parameters, and the whole multi-layer transducer may be trained jointly to optimize some global criterion. This idea formalizes previous work in NN/HMM hybrids and it has been successfully applied to caracter recognition in documents.

 

Padhraic Smyth 

Information and Computer Science 
University of California, Irvine 
CA 92697-3425 
 
 

Probabilistic Independence Networks for Sequential Data


 

Zoubin Ghahramani 

Dept. of Computer Science 
University of Toronto 
CANADA
 
 

Variational Methods for Learning in Dynamic Bayesian Networks

Bayesian networks are directed acyclic graphs which represent dependencies between the variables in a probabilistic model. Many time series models, including the hidden Markov models (HMMs) used in speech recognition and Kalman filter models used in filtering and control applications, can be viewed as examples of dynamic Bayesian networks. I will present some dynamic Bayesian network models which can capture much richer structure than HMMs and Kalman filters, including spatial and temporal multiresolution structure, distributed hidden state representations, and multiple switching linear regimes. While exact probabilistic inference is intractable in these networks, one can obtain tractable variational approximations which call as subroutines the forward-backward and Kalman filter recursions. These approximations can be used to learn the model parameters by maximizing a lower bound on the likelihood.

 

Jordan B. Pollack 

Computer Science Department 
Center for Complex Systems 
Brandeis University
 
 

Large Capacity RAAMs by Exploiting the IFS Dynamics


 

Mike Casey 

Department of Psychology 
Rutgers University 
Newark, NJ
 
 

Representational Structure for Continuous Systems Implied by Discrete Behavior: A Case Study of Recurrent Neural Networks and Algorithmic Computation

The input/output behavior of a system can sometimes be used to infer a minimal internal knowledge representational structure of a system producing that behavior. For such a theory to be useful, however, it must be sensitive to not only the input/output behavioral description, but also must be tailored to a desired level of system description. An example of this process will be given for the case of input/output equivalent to an algorithmic computation, and a system description on the level of noisy, discrete-time finite dimensional dynamical systems (of which recurrent neural networks form a special subclass). Other examples or behaviors or systems will be given as time permits.

 

Stefan C. Kremer 

Department of Computing and Information Science 
University of Guelph 
Guelph, Ontario N1G 2W1, CANADA
 
 

A Grammatical Induction Perspective on the Learning of Structures


SORRY
NO PICTURE
AVAILABLE 

Andreas Küchler 

Department of Neural Information Processing 
University of Ulm 
GERMANY
 
 

On the Correspondence between Neural Folding Architectures and Tree Automata

Abstract
The folding architecture (FA) provided with adequate supervised training algorithms is a special recurrent neural network model designed to solve inductive learning tasks on structured domains. Recently, the generic architecture has been proven to be a universal approximator of mappings between rooted labeled ordered trees and real vector spaces.
Here we explore some formal correspondencies to the automata theory in order to characterize the computational/representational power of different instances of the generic folding architecture. As the main result we prove that very simple instances of the FA have the computational power of at least the class of deterministic bottom-up tree automata which in turn implies a well-known correspondence to the class of context-free languages.
All proofs are carried out in a constructive way, an analysis of the space complexity is provided. One can observe that most results and concepts well-known in the case of discrete-time recurrent models for sequence processing can be lifted to the FA and tree processing.
We briefly discuss implications for the application of the FA model to structural pattern recognition tasks, give hints for the placement into the "knowledge injection->refinement->extraction" framework and take a critical view on the adequacy of the automata theory to capture the dynamical behaviour of this class of neural network models.

 

Marco Maggini 

Dipartimento di Ingegneria dell'Informazione Universit` di Siena 
ITALY
Joint work with: Marco Gori   , and Enrico Martinelli
 

Extraction of Frontier to Root Automata from Recursive Neural Networks

The computation of recursive networks rely on on a set of real-valued state and input variables which are processed using a set of continuous operators such as multipliers, adders and sigmoidal functions. When processing each node in a graph, a real valued state vector is computed using the node label and the state vector of each child. Thus the state memorizes the relevant information extracted from the node descendants and the information available at the current node. On the other side when the graph nodes are labelled with symbols, classical symbolic processing models can be used. In the case of Acyclic Directed Graphs, Frontier to Root Automata (FRA) are a possible extension of the computation performed by Finite State Automata (FSA) in the case of sequences. This category of computational devices is characterized by a finite number of internal states and by a finite input alphabet. The state transitions are thus obtained from a state transition table which permits to determine the node state given the states of its children and its label. The continuous state representation developed by a recursive neural network during learning can be given a symbolic interpretation in terms of a FRA. In particular, a proper quantization of the neuron activation trajectory makes it possible to extract the FRA states and transition table from the recursive dynamics. Algorithms based on clustering techniques can be used to obtain the discrete state transition table from the continuous state transition function of the recursive network. We present some examples of FRA extraction and we show how in some cases the extracted rules can be given a very clear meaning as explicit classification rules.
 
 

Ramon P. Ñeco

Department of Information Systems and Languages 
Universitat d'Alacant 
SPAIN
Joint work with: Mikel L. Forcada
 

Training recurrent neural networks to perform aynchronous translations

Recent work has explored the relation between discrete-time recurrent neural networks (DTRNN) and finite-state machines (FSM) either by showing their computational equivalence (Casey, 1996; Blair and Pollack, 1995) or by training them to perform as finite-state recognizers from examples (Cleeremans et al., 1989; Pollack, 1991; Giles et al., 1992; Watrous and Kuhn, 1992, Manolios and Fanelli, 1994; Forcada and Carrasco, 1995). Most of this work has focused on the class of deterministic FSM that can perform synchronous translations, that is, translations that produce output symbols at the same time as they read input symbols, such as Mealy and Moore machines (Hopcroft and Ullman 1979:45). We present and analyse two neural architectures to perform asynchronous translations: a simple DTRNN model inspired on a class of extended FSM Neco and Forcada, 1996) and a modification of Pollack's (1990) recursive auto-associative memories (RAAM) named recursive hetero-associative memories (RHAM, Forcada and Neco, 1997). We train the networks using derivative-free methods such as Alopex (Unnikrishnan and Venugopal, 1991) as well as derivative-based methods such as backpropagation through time. We study the training and generalization performance of the networks to inputs both in the same length range and longer, and relate these to the internal representations of inputs developed by these models. We also compare the capacity of these models with the capacity of other neural approaches such as dual-ported RAAMs (Chrisman, 1991) and of discrete algorithmic approaches such as OSTIA (Oncina et al., 1993) using formal translations and real translations in restricted domains.

 

Steffen Hölldobler 

Artificial Intelligence Institute 
Department of Computer Science 
Dresden University of Technology 
D-01062 Dresden, GERMANY
Joint work with: Yvonne Kalinke
 

Recurrent Neural Networks to Approximate the Semantics of Acceptable Logic Programs

In this talk we show that a feedforward neural network with at least one hidden layer can approximate the meaning function for an acceptable logic program. This is found by using the property of acceptable logic programs that for this class of programs the meaning function is a contraction mapping on the complete metric space of the interpretations for the programm as shown by Fitting [Fit94].

[Fit94] M. Fitting: Metric Methods - Three Examples and a Theorem. Journal of Logic Programming, 21(3), pp. 1113-127, 1994.


 

Rembrandt Bakker 

Chemical Reactor Engineering Lab. 
Delft University of Technology 
Nederland
 
Joint work with:  Jaap C. Schouten, Cor M. van den Bleek  (Delft U.), 

C. Lee Giles  (NECI, Princeton) 

 
 

Neural Learning of Chaotic Dynamics: Propagating Prediction Errors to Extend Prediction Limits

An algorithm is introduced that trains a neural network to identify chaotic dynamics from measured spatio-temporal data. The algorithm has four special features:
1. The state of the system is extracted by using delays from each time-series, followed by weighted Principal Component Analysis (PCA) data reduction to remove linear correlations in time and space.
2. The prediction model consists of a linear model, a multi-layer-perceptron, and a recurrent unit that updates the state with newly predicted time-series values.
3. The effective prediction horizon during training is user-adjustable, due to 'error propagation': prediction errors are partially propagated to the next time step.
4. A criterion is monitored during training to select the model that has a chaotic attractor most similar to the real system's attractor.
The algorithm is applied to univariate laser data from the Santa Fe time-series competition (set A), and to a dataset consisting of timeseries of bubble patterns, measured at six different locations in an experimental gas-solids fluidized bed. In both cases, the resulting model is able to generate time-series with similar chaotic characteristics as the measured data.

Go back to the workshop's home page.

Paolo Frasconi

Last modified: Thu Nov 6 19:27:34 MET 1997