
| P. Frasconi | Introduction |
| C. Goller | Learning Search-Control Heuristics for Automated Deduction Systems |
| Y. Bengio | Learning Multi-Layered Transducers |
| P. Smyth | Probabilistic Independence Networks for Sequential Data |
| Z. Ghahramani | Variational Methods for Learning in Dynamic Bayesian Networks |
| J. Pollack | Large Capacity RAAMs by Exploiting the IFS Dynamics |
|
|
|
Christoph GollerInstitut für InformatikTU München D-80290 München, GERMANY |
Automated deduction has a long tradition in computer science and most of the symbolic AI systems perform some kind of logic-based deductive inference. The central problem in automated deduction is the explosive growth of search spaces with deduction length. Methods of guiding and controlling the search process are indispensable.Search-control heuristics for automated deduction systems are traditionally represented by heuristic evaluation functions for states or inference steps in the search space. These heuristic evaluation functions compute ratings for symbolic structures (e.g. graphs, trees, terms, formulas) of arbitrary size. Most of the previous approaches for learning heuristic evaluation functions use a fixed, a priori defined
set of features for symbolic structures. However, the definition and selection of features introduces a very strong bias and severely limits the class of relations that can be expressed and learned. I present a
very powerful approach for adaptive structure processing which automatically finds the features which are relevant for the task at hand: ``folding architecture networks'' (FA-networks) with the training method ``backpropagation through structure'' (BPTS).Backpropagation-networks (such as FA-networks) find (learn) approximations for mappings from input to output patterns from a finite set of examples by minimizing a continuous differentiable error measure. Therefore, the task of learning heuristic evaluation functions from examples of successful deductions has to be formulated as minimization problem in order to make BP-networks applicable. It is shown that standard error measures are not very suitable for learning search-control heuristics. A new more adequate error measure is presented.
The new cost function and FA-networks have been combined in a prototypical system for learning search-control heuristics for the theorem prover SETHEO. Experiments in group theory are presented, which show a considerable performance improvement compared to the original system.
|
Yoshua BengioDept. I.R.O.Universith de Montreal CANADA |
Multi-Layered transducers are extensions of the idea of multi-layered neural networks in which each layer, instead of mapping a fixed-sized vector to another fixed-sized vector, maps a weighted directed acyclic graph to another such graph. The numerical information attached to the arcs of these graphs can represent scores for particular local hypotheses about the interpretation of some data, and the graph itself may represent a weighted set of sequential hypotheses about the data. Each layer may have trainable parameters, and the whole multi-layer transducer may be trained jointly to optimize some global criterion. This idea formalizes previous work in NN/HMM hybrids and it has been successfully applied to caracter recognition in documents.
|
Padhraic SmythInformation and Computer ScienceUniversity of California, Irvine CA 92697-3425 |
|
Zoubin GhahramaniDept. of Computer ScienceUniversity of Toronto CANADA |
Bayesian networks are directed acyclic graphs which represent dependencies between the variables in a probabilistic model. Many time series models, including the hidden Markov models (HMMs) used in speech recognition and Kalman filter models used in filtering and control applications, can be viewed as examples of dynamic Bayesian networks. I will present some dynamic Bayesian network models which can capture much richer structure than HMMs and Kalman filters, including spatial and temporal multiresolution structure, distributed hidden state representations, and multiple switching linear regimes. While exact probabilistic inference is intractable in these networks, one can obtain tractable variational approximations which call as subroutines the forward-backward and Kalman filter recursions. These approximations can be used to learn the model parameters by maximizing a lower bound on the likelihood.
|
Jordan B. PollackComputer Science DepartmentCenter for Complex Systems Brandeis University |
|
Mike CaseyDepartment of PsychologyRutgers University Newark, NJ |
The input/output behavior of a system can sometimes be used to infer a minimal internal knowledge representational structure of a system producing that behavior. For such a theory to be useful, however, it must be sensitive to not only the input/output behavioral description, but also must be tailored to a desired level of system description. An example of this process will be given for the case of input/output equivalent to an algorithmic computation, and a system description on the level of noisy, discrete-time finite dimensional dynamical systems (of which recurrent neural networks form a special subclass). Other examples or behaviors or systems will be given as time permits.
|
Stefan C. KremerDepartment of Computing and Information ScienceUniversity of Guelph Guelph, Ontario N1G 2W1, CANADA |
|
|
Andreas KüchlerDepartment of Neural Information ProcessingUniversity of Ulm GERMANY |
Abstract The folding architecture (FA) provided with adequate supervised training algorithms is a special recurrent neural network model designed to solve inductive learning tasks on structured domains. Recently, the generic architecture has been proven to be a universal approximator of mappings between rooted labeled ordered trees and real vector spaces.
Here we explore some formal correspondencies to the automata theory in order to characterize the computational/representational power of different instances of the generic folding architecture. As the main result we prove that very simple instances of the FA have the computational power of at least the class of deterministic bottom-up tree automata which in turn implies a well-known correspondence to the class of context-free languages.
All proofs are carried out in a constructive way, an analysis of the space complexity is provided. One can observe that most results and concepts well-known in the case of discrete-time recurrent models for sequence processing can be lifted to the FA and tree processing.
We briefly discuss implications for the application of the FA model to structural pattern recognition tasks, give hints for the placement into the "knowledge injection->refinement->extraction" framework and take a critical view on the adequacy of the automata theory to capture the dynamical behaviour of this class of neural network models.
|
Marco MagginiDipartimento di Ingegneria dell'Informazione Universit` di SienaITALY |
, and Enrico Martinelli
The computation of recursive networks rely on on a set of real-valued state and input variables which are processed using a set of continuous operators such as multipliers, adders and sigmoidal functions. When processing each node in a graph, a real valued state vector is computed using the node label and the state vector of each child. Thus the state memorizes the relevant information extracted from the node descendants and the information available at the current node. On the other side when the graph nodes are labelled with symbols, classical symbolic processing models can be used. In the case of Acyclic Directed Graphs, Frontier to Root Automata (FRA) are a possible extension of the computation performed by Finite State Automata (FSA) in the case of sequences. This category of computational devices is characterized by a finite number of internal states and by a finite input alphabet. The state transitions are thus obtained from a state transition table which permits to determine the node state given the states of its children and its label. The continuous state representation developed by a recursive neural network during learning can be given a symbolic interpretation in terms of a FRA. In particular, a proper quantization of the neuron activation trajectory makes it possible to extract the FRA states and transition table from the recursive dynamics. Algorithms based on clustering techniques can be used to obtain the discrete state transition table from the continuous state transition function of the recursive network. We present some examples of FRA extraction and we show how in some cases the extracted rules can be given a very clear meaning as explicit classification rules.
|
Ramon P. ÑecoDepartment of Information Systems and LanguagesUniversitat d'Alacant SPAIN |
Recent work has explored the relation between discrete-time recurrent neural networks (DTRNN) and finite-state machines (FSM) either by showing their computational equivalence (Casey, 1996; Blair and Pollack, 1995) or by training them to perform as finite-state recognizers from examples (Cleeremans et al., 1989; Pollack, 1991; Giles et al., 1992; Watrous and Kuhn, 1992, Manolios and Fanelli, 1994; Forcada and Carrasco, 1995). Most of this work has focused on the class of deterministic FSM that can perform synchronous translations, that is, translations that produce output symbols at the same time as they read input symbols, such as Mealy and Moore machines (Hopcroft and Ullman 1979:45). We present and analyse two neural architectures to perform asynchronous translations: a simple DTRNN model inspired on a class of extended FSM Neco and Forcada, 1996) and a modification of Pollack's (1990) recursive auto-associative memories (RAAM) named recursive hetero-associative memories (RHAM, Forcada and Neco, 1997). We train the networks using derivative-free methods such as Alopex (Unnikrishnan and Venugopal, 1991) as well as derivative-based methods such as backpropagation through time. We study the training and generalization performance of the networks to inputs both in the same length range and longer, and relate these to the internal representations of inputs developed by these models. We also compare the capacity of these models with the capacity of other neural approaches such as dual-ported RAAMs (Chrisman, 1991) and of discrete algorithmic approaches such as OSTIA (Oncina et al., 1993) using formal translations and real translations in restricted domains.
|
Steffen HölldoblerArtificial Intelligence InstituteDepartment of Computer Science Dresden University of Technology D-01062 Dresden, GERMANY |
In this talk we show that a feedforward neural network with at least one hidden layer can approximate the meaning function for an acceptable logic program. This is found by using the property of acceptable logic programs that for this class of programs the meaning function is a contraction mapping on the complete metric space of the interpretations for the programm as shown by Fitting [Fit94].[Fit94] M. Fitting: Metric Methods - Three Examples and a Theorem. Journal of Logic Programming, 21(3), pp. 1113-127, 1994.
|
Rembrandt BakkerChemical Reactor Engineering Lab.Delft University of Technology Nederland |
| Joint work with: | Jaap C. Schouten, Cor
M. van den Bleek
(Delft U.),
C. Lee Giles |
An algorithm is introduced that trains a neural network to identify chaotic dynamics from measured spatio-temporal data. The algorithm has four special features:
1. The state of the system is extracted by using delays from each time-series, followed by weighted Principal Component Analysis (PCA) data reduction to remove linear correlations in time and space.
2. The prediction model consists of a linear model, a multi-layer-perceptron, and a recurrent unit that updates the state with newly predicted time-series values.
3. The effective prediction horizon during training is user-adjustable, due to 'error propagation': prediction errors are partially propagated to the next time step.
4. A criterion is monitored during training to select the model that has a chaotic attractor most similar to the real system's attractor.
The algorithm is applied to univariate laser data from the Santa Fe time-series competition (set A), and to a dataset consisting of timeseries of bubble patterns, measured at six different locations in an experimental gas-solids fluidized bed. In both cases, the resulting model is able to generate time-series with similar chaotic characteristics as the measured data.