Rich Prior Knowledge in Learning for Natural Language Processing
We possess a wealth of prior knowledge about most prediction problems,
and particularly so for many of the fundamental tasks in natural
language processing. Unfortunately, it is often difficult to make
use of this type of information during learning, as it typically does
not come in the form of labeled examples, may be difficult to encode
as a prior on parameters in a Bayesian setting, and may be impossible
to incorporate into a tractable model. Instead, we usually have prior
knowledge about the values of output variables. For example, linguistic
knowledge or an out-of-domain parser may provide the locations of
likely syntactic dependencies for grammar induction. Motivated by
the prospect of being able to naturally leverage such knowledge, four
different groups have recently developed similar, general frameworks
for expressing and learning with side information about output variables.
These frameworks are Constraint-Driven Learning (UIUC), Posterior
Regularization (UPenn), Generalized Expectation Criteria (UMass Amherst),
and Learning from Measurements (UC Berkley).
This tutorial describes how to encode side information about output
variables, and how to leverage this encoding and an unannotated
corpus during learning. We survey the different frameworks, explaining
how they are connected and the trade-offs between them. We also survey
several applications that have been explored in the literature,
including applications to grammar and part-of-speech induction, word
alignment, information extraction, text classification, and multi-view
learning. Prior knowledge used in these applications ranges from
structural information that cannot be efficiently encoded in the model,
to knowledge about the approximate expectations of some features, to
knowledge of some incomplete and noisy labellings. These applications
also address several different problem settings, including unsupervised,
lightly supervised, and semi-supervised learning, and utilize both
generative and discriminative models. The diversity of tasks, types of
prior knowledge, and problem settings explored demonstrate the generality
of these approaches, and suggest that they will become an important tool
for researchers in natural language processing.
The tutorial will provide the audience with the theoretical background to
understand why these methods have been so effective, as well as practical
guidance on how to apply them. Specifically, we discuss issues that come
up in implementation, and describe a toolkit that provides “out-of-the-box”
support for the applications described in the tutorial, and is extensible
to other applications and new types of prior knowledge.
Date: 2011-May-27 Time: 15:30:00 Room: 336
For more information:
Workshop “Metabolism and mathematical models: Two for a tango” – 2nd Edition
Title: Workshop Metabolism and mathematical models: Two for a tango – 2nd Edition
Dates: October 25-26, 2022
Location: This workshop will be held in a virtual way
The topic of this workshop is metabolism in general, with a special focus, although not exclusive, on parasitology. Besides an exploration of the biological, biochemical and biomedical aspects, the workshop will also aim at presenting some of the mathematical modelling, algorithmic theory and software development that have become crucial to explore such aspects.
This workshop is being organised in the context of two projects, both with the Inria European Team Erable. One of the projects involves a partnership with the University of São Paulo (USP), in São Paulo, Brazil, more specifically the Institute of Mathematics and Statistics (IME) and the Institute of Biomedical Sciences – Inria Associated Team Capoeira – and the other involves the Inesc-ID/IST in Portugal, ETH in Zürich and EMBL in Heidelberg – H2020 Twinning Project Olissipo.
The workshop is open to all members of these two projects but also, importantly, to the community in general.
The program and more details are available here.