Perspective paper on second (and third and …) language learning as hierarchical inference


We’ve just submitted a perspective paper on second (and third and …) language learning as hierarchical inference that I hope might be of interest to some of you (feedback welcome).

Figure 1: Just as the implicit knowledge about different speakers and groups of speakers (such as dialects or accents) contains hierarchical relations across different language models, the implicit knowledge about multiple languages can be construed as a hierarchical inference process.

We’re building on Bozena’s thesis work on L2 acquisition with Eric Baković and Roger Levy and motivating it through work on L1 language processing (focusing on speech perception and syntactic processing). Specifically, we review evidence that L1 language processing can be construed as hierarchical inference over generative models. Roughly speaking, these are language models (or ‘mini grammars’, as some people called them during a workshop at the 2013 LSA Summer Institute) for specific contexts, speakers, etc. that are hierarchically organized, which allows them to capture generalizations across speakers and groups of speakers (a paper that further details this view for L1 speech perception is under revision; Kleinschmidt and Jaeger, draft available upon request). While tentative, we think that this view provides a unifying framework for a variety of otherwise unrelated phenomena in L1, L2, etc. language learning and processing in terms of inference under uncertainty at multiple levels. For more, well, read the paper. We provide an informal and, hopefully, somewhat intuitive introduction to this computational framework that draws on and incorporates earlier work on statistical / distributional learning in second language acquisition and on adaptation during speech perception and sentence processing in L1.

Here are some figures (some are also in the paper) that illustrate the idea for speech perception, courtesy of Dave Kleinschmidt.

Figure 2: The relationship between category distributions along a single cue dimension (in this case voice onset time [VOT]) and the classification function in phonological categorization according to an ideal observer model.
Figure 3: Differences in speaker-specific categorization boundaries (right) and their assumed corresponding differences in the underlying speaker-specific category distributions along the cue dimension.
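For readers who prefer code to figures, here is a minimal sketch of the ideal observer idea behind Figures 2 and 3. It is not the model from the paper: it assumes two Gaussian categories (/b/ and /p/) along VOT, and the means, standard deviations, speaker names, and function names are all made up for illustration. The classification function is just Bayes' rule applied to the two category distributions, and a speaker's category boundary is the VOT at which both categories are equally probable.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def p_voiceless(vot_ms, voiced, voiceless, prior_voiceless=0.5):
    """Posterior probability of the voiceless category given a VOT value.

    voiced and voiceless are (mean, sd) pairs in ms (hypothetical values).
    """
    joint_voiced = gaussian_pdf(vot_ms, *voiced) * (1.0 - prior_voiceless)
    joint_voiceless = gaussian_pdf(vot_ms, *voiceless) * prior_voiceless
    return joint_voiceless / (joint_voiceless + joint_voiced)


def category_boundary(voiced, voiceless, tol=1e-6):
    """VOT at which both categories are equally probable.

    Found by bisection between the two category means; this assumes the
    posterior crosses 0.5 exactly once in that interval.
    """
    lo, hi = voiced[0], voiceless[0]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_voiceless(mid, voiced, voiceless) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two hypothetical speakers, each with (mean, sd) in ms for /b/ and /p/.
speaker_a = {"voiced": (0.0, 12.0), "voiceless": (50.0, 12.0)}
speaker_b = {"voiced": (10.0, 12.0), "voiceless": (70.0, 12.0)}

for name, speaker in (("A", speaker_a), ("B", speaker_b)):
    boundary = category_boundary(speaker["voiced"], speaker["voiceless"])
    print(f"speaker {name}: boundary near {boundary:.1f} ms VOT")
```

With equal variances and equal priors, the boundary falls midway between the two category means, so the two made-up speakers end up with boundaries near 25 ms and 40 ms. Different underlying distributions yield different boundaries, which is the point of Figure 3.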
Figure 4: Knowledge (beliefs) about speaker-specific category distributions (if parametrically interpreted) can be seen as describing a speaker in the multi-dimensional parameter space (in the illustration in the right panel, only the uncertainty about the category-specific VOT means is shown).
Figure 5: Hierarchical knowledge (beliefs) about speakers and groups of speakers (e.g., accents or dialects; right panel) describes uncertainty about cue distributions (left panel). This knowledge is presumably being built throughout life, provided one keeps encountering new speakers and groups of speakers (for intriguing evidence that we can indeed induce such structure, see e.g. Bradlow and Bent 2008).
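The belief updating sketched in Figures 4 and 5 can also be illustrated with a toy example. The following is only a sketch under strong simplifying assumptions that are mine, not the paper's: a single category, tokens whose category is known, a known within-category standard deviation, and a Gaussian group-level prior over the speaker's category mean (standing in for what a listener knows about, say, an accent). All numbers and names are hypothetical. Under these assumptions the update is the standard conjugate normal-normal one.

```python
import math


def update_speaker_mean(prior_mean, prior_sd, noise_sd, tokens):
    """Posterior belief about one speaker's category mean (e.g. /p/ VOT).

    prior_mean, prior_sd: group-level belief about where speakers' means
        fall (hypothetical values standing in for accent-level knowledge).
    noise_sd: within-category sd of individual tokens, assumed known.
    tokens: observed cue values from this speaker, assumed correctly labeled.
    Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(tokens) / noise_sd ** 2
    posterior_var = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_var * (
        prior_mean * prior_precision + sum(tokens) / noise_sd ** 2
    )
    return posterior_mean, math.sqrt(posterior_var)


# Hypothetical group-level belief: /p/ means around 50 ms, give or take 10 ms.
group_mean, group_sd, token_sd = 50.0, 10.0, 12.0

# Made-up VOT tokens (ms) from a new speaker with longer-than-typical VOTs.
observed = [62.0, 58.0, 65.0, 60.0]

for n in range(len(observed) + 1):
    mean, sd = update_speaker_mean(group_mean, group_sd, token_sd, observed[:n])
    print(f"after {n} tokens: mean {mean:.1f} ms, sd {sd:.1f} ms")
```

Before any tokens are heard, the belief about the new speaker is simply the group-level belief. As tokens accumulate, the posterior mean moves from the group mean toward the speaker's own sample mean and the uncertainty shrinks. A full hierarchical model would also update the group-level beliefs themselves, which this sketch leaves out.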
Questions? Thoughts?
