Project Overview
The basic goal of our research is understanding the computational foundations of how humans extract information from language. If humans, especially young humans, are extracting information about the linguistic system underlying the observable language data, we tend to call this process language acquisition. We also tend to marvel at how good young humans are at it. If humans are extracting non-linguistic information about the world from the observable language data, we tend to call this process information extraction (or perhaps comprehension). When we try to make machines extract this same information from language data, guess what? We tend to marvel at how good humans are at it.
One reason humans may be so good at doing this is that they could have a proverbial leg up, in the form of biases about how to use the available language data. Computational models give us a way to explore this question precisely, by combining discrete hypotheses with probabilistic methods. Via computational modeling, we can examine what biases (both helpful and perhaps not-so-helpful) humans bring to different information extraction tasks, whether these biases are necessary for success, and what the nature of the necessary biases is.
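As a rough illustration of what "combining discrete hypotheses with probabilistic methods" can look like, here is a minimal sketch of Bayesian updating over a small hypothesis space, where a learner's bias is encoded as a prior. Everything in it is a made-up assumption for illustration: the two word-order hypotheses ("SVO", "SOV"), the observation labels ("VO", "OV"), and all of the probabilities. It is not the model used in any of the projects described here.

```python
def posterior(prior, likelihoods, data):
    """Return P(hypothesis | data) for a finite set of hypotheses.

    prior:       dict mapping hypothesis -> prior probability (the "bias")
    likelihoods: dict mapping hypothesis -> {observation: probability}
    data:        list of observations, treated as independent
    """
    unnormalized = {}
    for hypothesis, prior_prob in prior.items():
        likelihood = 1.0
        for observation in data:
            # An observation the hypothesis cannot generate gets probability 0.
            likelihood *= likelihoods[hypothesis].get(observation, 0.0)
        unnormalized[hypothesis] = prior_prob * likelihood

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No hypothesis can account for the data.")
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical likelihoods: how often each underlying word order would
# produce verb-object ("VO") versus object-verb ("OV") utterances.
LIKELIHOODS = {
    "SVO": {"VO": 0.9, "OV": 0.1},
    "SOV": {"VO": 0.3, "OV": 0.7},
}

# Hypothetical input: three utterances heard by the learner.
DATA = ["VO", "VO", "OV"]

# An unbiased learner versus one with a prior bias toward SOV.
unbiased_result = posterior({"SVO": 0.5, "SOV": 0.5}, LIKELIHOODS, DATA)
biased_result = posterior({"SVO": 0.2, "SOV": 0.8}, LIKELIHOODS, DATA)

print("unbiased:", unbiased_result)
print("biased:  ", biased_result)
```

With these invented numbers, the unbiased learner ends up favoring SVO while the biased learner still favors SOV after seeing the same data. That is the kind of contrast that lets a model ask whether a particular bias matters for the outcome.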
Some specific areas of research
Language Acquisition
Language acquisition (that is, learning the underlying linguistic systems responsible for the observable language data) is a classically difficult problem. This is particularly true when the correct linguistic systems are underdetermined by the available data, a situation often referred to as "poverty of the stimulus" or "the inductive problem of language learning". Research in this area explores how children arrive at the correct knowledge of their language from the data they actually encounter, and what they need in order to do so. Projects include studies of the acquisition of basic word order, referential elements, stress contours, syntactic islands, and free relative clauses, among other phenomena.
This work has been supported by the National Science Foundation, via grants BCS-0843896 and BCS-1347028, in collaboration with Jon Sprouse. Project summaries are here and here. Check out the results of BCS-0843896 and the results of BCS-1347028. Also, check out the Input & Syntactic Acquisition 2009 workshop held at UC Irvine, the Input & Syntactic Acquisition 2012 workshop held at the 2012 annual meeting of the Linguistic Society of America in Portland, Oregon, and the SynLinks 2016 workshop held at UConn.
Models of Acquirability
A partially overlapping line of research focuses on language learning models that are constrained in the ways that humans are when accomplishing a particular acquisition task. That is, instead of asking, "Can it be learned at all by a model?", these models ask, "Can it be learned by a model that uses the input humans use in the way that humans use that input?" Often, this involves adapting more general models of learnability so that they are models of "acquirability". Projects include studies of speech segmentation, categorization, the acquisition of referential elements, and the acquisition of syntactic knowledge.
Linguistic Cues to Information about the World
Beyond comprehending the straightforward meaning of language data (which isn't necessarily so straightforward to comprehend), people also seem able to extract more subtle information about social relationships, intentions, emotions, and attitudes, even in the absence of information such as facial expressions and auditory cues. When the only information available is the text itself, humans must be using linguistic cues to do so. This area of research focuses on what these cues are, whether there are additional informative cues available, and how machines can learn to be as good as humans are (or perhaps even better one day). Projects include studies of mental state inference, authorship, and message tone, in the general area of computational sociolinguistics.
This work has been supported by the UC Irvine Academic Senate Council on Research, Computing, and Libraries, via multi-investigator research grant MI 14B-2009-2010, in collaboration with Mark Steyvers and Padhraic Smyth, and by a Social Sciences Associate Professor Research Award in 2015. Short papers describing this kind of project are here and here, and more recent work can be found in the publications section.