American Child Texts

Overview

This umbrella project has the broad goal of collecting and analyzing American child-directed texts. Stage one involves collecting and aggregating child-directed texts into a text corpus. The long-term goal is to analyze this corpus for semantic, phonological, and orthographic properties that could inform our understanding of the early childhood literacy environment. One arm of this work examines the genderedness of the language contained in these books, in collaboration with Molly Lewis and Gary Lupyan. To develop a high-quality corpus for such research, we will rely on a number of resources to validate the quality of the texts chosen for the corpus.

Communication

Communication for this project takes place on a Slack team separate from the main LCNL Slack team. The URL for that team is readingteamlcnl@slack.com. If you need access to the team, contact Matt and he can add you. The main channel for this project on that team is #child_text_semantics, though other channels will be relevant as well.

Research Team

Mark Seidenberg, PI (Slack: Mark S; seidenberg@wisc.edu)
Molly Lewis (U Chicago/UW)
Matt Cooper Borkenhagen, graduate student (Slack: mcooperborkenhagen; cooperborken@wisc.edu)
Ellen Converse, project lead/RA (Slack: Ellen Converse)

Data

Summary data from the gender norming on MTurk
By-word and by-book analyses of genderedness of our corpus
Semantic space estimates for gender
Color distributions and genderedness of book covers
Character gender analyses
Cluster solutions for normed words

Documents and Procedures

The following resources point to web-based documentation of procedures and information pertaining to the project. If you have difficulty accessing any of these resources, notify Matt and he can grant you access. If any links are broken, let the project leadership team know.

The parent folder for the project can be found here
The folder specific to CDTC files is here
For acquiring and constructing the child-directed text corpus, follow this link
Here is the spreadsheet that outlines the titles we have targeted and their status

Articles

Montag et al. (2015) from Psych Science
Scott et al. on the Glasgow Norms
Hudson Kam & Mathewson (2016) in J. Child Lang.