EMLAR XX 2024

Speakers

Raquel G. Alhama

Modeling linguistic productivity: the case of determiners

Having heard “a pimwit”, English speakers know immediately that “the pimwit” is possible, even if they have never heard the phrase before. Researchers from diverse theoretical perspectives agree that this type of productivity can be explained with syntactic categories (namely, determiner and noun), but have long debated whether such categories must be assumed to be present from birth or whether they can instead be learned from the input. In our work, we track determiner production and the onset of productivity in a large sample of children. Our approach differs from previous work in at least four interrelated ways. First, we model determiner productivity with a data-driven model that is not pre-equipped with any notion of syntactic categories, and investigate to what extent this model, when trained solely on caregivers’ child-directed utterances, can reproduce the behavioral patterns of children. Second, rather than quantifying the strength of the evidence for abstract grammatical categories in children’s early speech, we propose a new metric that quantifies the onset of grammatical productivity for individual children. Third, to be able to observe the onset, we base our studies on a large longitudinal dataset that allows us to track determiner productivity at early learning stages. Finally, we use our model to identify instances of true generalization, i.e., determiner+noun productions that have not been seen in the input data (and hence cannot be the result of imitation). The results show a gradual emergence of determiner productivity in child language, suggesting that the syntactic category is learned from the input in a bottom-up fashion.
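The notion of true generalization described above (a determiner+noun production that never occurs in the input) can be illustrated with a minimal sketch. This is not the model or metric used in the talk: it is a plain set difference over toy data. The determiner inventory, the function names, and the example utterances are all assumptions made for illustration, and the word following a determiner is simply treated as the noun, with no part-of-speech tagging.

```python
# Hypothetical illustration only: the determiner list and the data are assumptions.
DETERMINERS = {"a", "an", "the"}

def det_noun_pairs(utterances):
    """Collect (determiner, following word) pairs from whitespace-tokenized utterances."""
    pairs = set()
    for utterance in utterances:
        tokens = utterance.lower().split()
        for first, second in zip(tokens, tokens[1:]):
            if first in DETERMINERS:
                pairs.add((first, second))
    return pairs

def novel_productions(caregiver_utterances, child_utterances):
    """Return child determiner+word pairs that never occur in the caregiver input."""
    return det_noun_pairs(child_utterances) - det_noun_pairs(caregiver_utterances)

# Toy data, not taken from the study.
caregiver = ["look at a pimwit", "the dog is here"]
child = ["the pimwit", "the dog"]
print(novel_productions(caregiver, child))
```

Here “the dog” could be imitation, since it appears in the caregiver input, whereas “the pimwit” is attested only with “a” in the input and so counts as a candidate generalization under these simplifying assumptions.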