This umbrella project has the broad goal of collecting and analyzing American child-directed texts. Stage one involves collecting and aggregating child-directed texts into a text corpus. The long-term goal is to analyze this corpus for semantic, phonological, and orthographic properties that could inform our understanding of the early childhood literacy environment. One arm of this work, a collaboration with Molly Lewis and Gary Lupyan, examines the genderedness of the language contained in these books. To develop a high-quality corpus for such research, we will rely on a number of resources to validate the quality of the texts chosen for the corpus.
Communication for this project takes place on a Slack team separate from the main LCNL Slack team. The URL for that team is email@example.com. If you need access to the team, contact Matt and he can add you. The main channel for this project on that team is #child_text_semantics, though other channels will be relevant as well.
- Mark Seidenberg, PI (Slack: Mark S; seidenberg@wisc.edu)
- Molly Lewis (U Chicago/UW)
- Matt Cooper Borkenhagen, graduate student (Slack: mcooperborkenhagen; firstname.lastname@example.org)
- Ellen Converse, project lead/RA (Slack: Ellen Converse)
- Summary data from the gender norming on MTurk
- By-word and by-book analyses of genderedness of our corpus
- Semantic space estimates for gender
- Color distributions and genderedness of book covers
- Character gender analyses
- Cluster solutions for normed words
Documents and Procedures
The following resources point to web-based documentation of procedures and information pertaining to the project. If you have difficulty accessing any of these resources, notify Matt and he can grant you access. If any links are broken, let the leadership team on the project know.
- The parent folder for the project can be found here
- The folder specific to CDTC files is here
- For acquiring and constructing the child-directed text corpus, follow this link
- Here is the spreadsheet that outlines the titles we have targeted and their status
- Montag et al. (2015) from Psych Science
- Scott et al. on the Glasgow Norms
- Hudson Kam & Mathewson (2016) in J. Child Lang.