Do Infants Learn Grammar with Algebra or Statistics?

The report "Rule learning by seven-month-old infants" by G. F. Marcus et al. (1 Jan., p. 77) adds to a growing body of evidence concerning the remarkable learning abilities of infants. This evidence indicates that children acquire much more knowledge of language from experience than one might assume (1). However, the conclusion by Marcus et al. that the infants had learned rules rather than merely statistical regularities is unwarranted.

In the experiments in the report by Marcus et al., infants were familiarized with sequences of syllables that conformed to patterns such as ABB or AAB (for example, "wo fe fe" versus "wo wo fe"). They were then tested on sequences containing different syllables that either matched these patterns or not. Infants preferred (2) novel sequences that violated the pattern to which they had been pre-exposed, and so were said to have learned the rule governing the sequences' "grammar." This conclusion rests on the fact that the test sequences contained novel syllables; thus, the infants could not have learned anything about their statistical properties. However, these "grammatical rules" created other statistical regularities. AAB, for example, indicated that a syllable would be followed by another instance of the same syllable and then a different syllable. Thus, in the pretraining phase, the infant was exposed to a statistical regularity governing sequences of perceptually similar and different events. The report's discussion focused on what the infants could learn about the particular syllables used in training, but there is no reason to deny these infants the capacity to learn these same-different contingencies.
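The same-different contingency described above can be made concrete with a small sketch. This is a toy illustration in Python, not the model referred to in note 3: the function names are hypothetical, and every syllable other than "wo" and "fe" is made up for the example. It recodes each sentence by whether adjacent syllables are the same or different, then asks whether a sentence built from entirely novel syllables yields a code that was seen during familiarization.

```python
from collections import Counter

def same_different_code(sentence):
    """Recode a sentence as same/different relations between adjacent syllables."""
    return tuple("same" if a == b else "different"
                 for a, b in zip(sentence, sentence[1:]))

def familiar_codes(familiarization):
    """Count how often each same/different code occurred during familiarization."""
    return Counter(same_different_code(s) for s in familiarization)

# Hypothetical AAB familiarization set (only "wo wo fe" comes from the text).
aab_familiarization = [("wo", "wo", "fe"), ("li", "li", "ga"), ("de", "de", "ko")]
counts = familiar_codes(aab_familiarization)

# Test sentences built from syllables that never appeared in familiarization.
novel_aab = ("ba", "ba", "po")
novel_abb = ("ba", "po", "po")

# A Counter returns 0 for codes it has never seen.
aab_is_familiar = counts[same_different_code(novel_aab)] > 0
abb_is_familiar = counts[same_different_code(novel_abb)] > 0
print(aab_is_familiar, abb_is_familiar)
```

Under this recoding the novel syllables carry no information at all; only the sequence of same and different relations does, and that is the regularity the familiarization phase exposes.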


Figure 1. "Wo fe fe" or "wo wo fe"? (Credit: Gary F. Marcus)


There is also no reason to deny connectionist neural network models this capacity. In our view, the goal of modeling is to understand children's behavior by endowing networks with the same capacities and experiences as children. The networks that Marcus et al. studied were not provided with either, so it is not unexpected that they behaved differently. A 7-month-old child has already developed a rich representation of the structure of acoustic and speech events on the basis of several thousand hours of exposure to examples, including the "novel" test syllables. In the model used by Marcus et al., in contrast, there was no knowledge of the structure of utterances, no exposure to these syllables, and no way to represent phonological similarity.

A model with the same kinds of capacities and experiences as infants will perform in a similar manner. To demonstrate this, we implemented a simple model (3), which is not a general account of all aspects of the phenomena, but serves to illustrate that the limitations that Marcus et al. described are not intrinsic to all connectionist models.

Rather than showing that rule learning is "there from the start" (4), the findings in Marcus et al.'s report indicate that infants are able to encode multiple types of statistical regularities. This feat places them squarely on the path toward acquiring a central aspect of the adult's linguistic competence (5).

Mark S. Seidenberg
Neuroscience Program,
University of Southern California,
Los Angeles, CA 90089-2520, USA.
E-mail: marks@gizmo.usc.edu

Jeff L. Elman
Department of Cognitive Science,
University of California, San Diego,
La Jolla, CA 92093-0515, USA

References and Notes
  1. N. Chomsky, Knowledge of Language (Praeger, New York, 1986).
  2. As described on page 78 of the report, preference was indicated by an infant "looking longer at the flashing side light during presentations of [novel] sentences."
  3. Discussion and model are at crl.ucsd.edu/~elman/Papers/MVRVsim.html .
  4. As stated in the Perspective "Out of the minds of babes" (S. Pinker, p. 40) that accompanied the report.
  5. M. S. Seidenberg, Science 275, 1599 (1997); J. L. Elman et al., Rethinking Innateness (MIT Press, Cambridge, MA, 1996).

Marcus et al. report that 7-month-old infants learned language tasks that required rule learning. They also state that these tasks are not learnable by statistical algorithms, including simple recurrent networks (SRNs). This statement is not correct.

After noting that some stimuli could be learned statistically (such as those used in experiment 1 in the report), Marcus et al. used a refined phoneme set (for their experiments 2 and 3). The fact that they assumed that the refined phoneme set was not statistically learnable indicates that their experimental paradigm was based on binary feature representations. For instance, vowel height (1, 2) would be represented by two features, +/-high and +/-low. However, if one adopts a continuous vowel height as in the cardinal vowel scale (English low, middle, and high vowels would be represented by 0.00, 0.67, and 1.00), statistical algorithms can accomplish the learning (3).

I conducted computer simulations using a variant of the SRN with continuous vowel height and place of articulation (POA) (3, 4). In all cases, as expected, the network made larger prediction errors with the inconsistent sentences (3). These results suggest that the report's experimental design does not exclude the possibility that children used a statistical learning strategy. I agree with Marcus et al. that standard SRNs cannot generalize learned rules to novel independent features; however, SRNs can apply learned mappings [for example, f(x, y) = x] to novel real values.

Michiro Negishi
Department of Cognitive and Neural Systems,
Boston University,
Boston, MA 02215, USA.
E-mail: negishi@cns.bu.edu

References and Notes
  1. Vowel height is the index of the vertical position of the tongue body with respect to the roof of the mouth: /i/ as in "bee" is a high vowel, whereas /a/ as in "Sam" is a low vowel (2).
  2. H. J. Giegerich, English Phonology: An Introduction (Cambridge Univ. Press, Cambridge, 1992).
  3. Examples, simulations, and results are at cns-web.bu.edu/pub/mnx/sci.html.
  4. The rationale for my using continuous POA, which helps the learning in experiment 1 in the report, but not in experiments 2 and 3, comes from sonority scale [section 6.2 in (2)] and the distribution of ejectives and implosives [J. Greenberg, Int. J. Am. Linguist. 36, 123 (1970)].

Marcus et al. propose that 7-month-old infants, when listening to speech, can extract abstract algebraic rules "that represent relationships between placeholders...such as 'the first item X is the same as the third item Y'...." Marcus et al. refer to an earlier report, "Statistical learning by 8-month-old infants" (13 Dec. 1996, p. 1926), in which Saffran et al. showed that infants of like age abstracted statistical relationships from speech in order to segment words. These two reports ascribe to infants (among other cognitive achievements) two powerful means to acquire language: associative and rule-learning procedures. However, the evidence for algebraic rule learning in the report by Marcus et al. is open to serious question.

Marcus et al. state (note 18 in the report), "In principle, an infant who paid attention only to the final two syllables [words] of each sentence could distinguish the AAB grammar from the ABB grammar purely on the basis of reduplication...." We would add that this is a strong possibility, in that syllables were separated by 250-millisecond pauses and each three-syllable sentence was separated by a 1-second pause. Moreover, there is evidence that 7-month-old infants can discriminate objects by means of the abstract relations, same or different (1). Marcus et al. then state, in note 18, "but [the infants] could not have succeeded in the experiment of Saffran et al." in demonstrating "word" segmentation if they had been using a strategy of reduplication. Consequently, Marcus et al. apparently did not explore or eliminate this possibility in their own studies of rule learning. This comparison, however, is highly problematic--there are important differences in procedural details between these two studies. Saffran et al. presented their infants with frequently repeated, randomly ordered sequences of four trisyllabic "words." There was, moreover, no pause between syllables or between words, and the syllables were coarticulated, making it highly unlikely, and perhaps impossible, that only the final two syllables of each word were perceived as the final two syllables of each sentence, as they might have been in the studies of Marcus et al.

A control study of the following nature is needed to begin to eliminate the strategy of reduplication as one that infants could be using: familiarize infants with an AAB sentence format and test with new sentences with BAB and AAB formats. If there is a preference for the novel format despite the unchanging arrangement of the final two syllables, as would be expected had infants acquired an algebraic rule, there would then be support for the conclusion made by Marcus et al. Until such control experiments are performed, we cannot conclude that infants at the age at which word segmentation has been evidenced are also able to acquire an algebraic rule.
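The logic of the proposed control can be sketched in a few lines of Python. This is an illustrative assumption about how the two strategies could be formalized, not a description of any experiment; the function names are hypothetical and the sentences reuse only the syllables "wo" and "fe". One function looks only at whether the final two syllables match; the other encodes the pattern of the whole sentence.

```python
def final_two_match(sentence):
    """Reduplication strategy: attend only to the last two syllables."""
    return sentence[-2] == sentence[-1]

def full_pattern(sentence):
    """Relabel syllables by order of first appearance, e.g. ('wo','wo','fe') -> 'AAB'."""
    labels = {}
    pattern = []
    for syllable in sentence:
        if syllable not in labels:
            labels[syllable] = "ABC"[len(labels)]
        pattern.append(labels[syllable])
    return "".join(pattern)

aab = ("wo", "wo", "fe")
abb = ("wo", "fe", "fe")
bab = ("fe", "wo", "fe")  # first and third syllables match; relabels to 'ABA'

# AAB versus ABB: the final-two strategy is enough to tell them apart.
original_contrast_solvable = final_two_match(aab) != final_two_match(abb)

# AAB versus BAB: the final two syllables differ in both, so the strategy fails.
control_contrast_solvable = final_two_match(aab) != final_two_match(bab)

# Encoding the whole sentence still separates the control pair.
control_patterns_differ = full_pattern(aab) != full_pattern(bab)
print(original_contrast_solvable, control_contrast_solvable, control_patterns_differ)
```

A preference for the novel format in the AAB versus BAB comparison therefore cannot be attributed to the final two syllables alone.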

Peter D. Eimas
Department of Cognitive and Linguistic Science,
Brown University,
Providence, RI 02912, USA.
E-mail: peter-eimas@brown.edu

References
  1. D. J. Tyrrell, L. B. Stauffer, L. B. Snowman, Infant Behav. Dev. 14, 125 (1991).

Response
Eimas suggests an additional control to rule out the possibility that infants could have relied only on the final two syllables. Although we maintain that such a control could bear only on the question of which rules an infant can learn, rather than the question of whether an infant could learn rules (because the generalization of identity itself requires a rule that holds for all instances in a class), we are grateful for the suggestion. We have now run that control, and the results (1) are consistent with our previous findings.

The other two letters state that various modifications of the simple recurrent network can handle our results, but no such network provides a genuine, empirically adequate alternative to our proposal. Seidenberg and Elman present a model that can capture our data, but only by resorting to a technique that Elman has criticized elsewhere (2): the incorporation of an all-knowing "external teacher" that provides the network with information that is not otherwise available in the environment. As we noted in our report, and as Negishi acknowledges in his letter, the standard version of the simple recurrent network--which uses a "prediction task" that does not depend on information that is not directly available in the environment--does not succeed in generalizing our ABA or ABB patterns to novel words (3). Seidenberg and Elman appear to abandon (without comment) the usual "prediction task" version of the network model in favor of a different kind of model, in which an external teacher decides whether each pair of successive words is identical. Such information is not "directly observable from the environment" (4); instead, it is provided by an external teacher (built by Seidenberg and Elman) that itself builds in an algebraic rule. Because, in the human, that external device must be something inside the child rather than something provided by the environment, Seidenberg and Elman have not gotten rid of the rule; they have simply hidden it (5).

We find Negishi's model to be more interesting. Negishi points out, quite rightly, that an SRN that uses real numbers rather than binary encoding can capture our results. Why should that be the case? As we noted in our report, "algebraic" rules are "open-ended abstract relationships for which we can substitute arbitrary items." Models that use real-number encoding use their nodes as variables and incorporate operations that treat all instances of a given variable equally. In other words, rather than presenting an alternative to rules, such devices wind up implementing them (6).

This is a subtle point, perhaps best understood in a comparison (3) between two models, one that represents numbers as sets of discrete binary features, and another that represents numbers as analog values, such as the identity function mentioned by Negishi, f(x) = x. Neither architecture is inherently superior: Models that represent inputs as sets of nonarbitrary discrete features can capture transitional probabilities between words such as would be present in the experiments in the 1996 report by Saffran et al., but cannot freely generalize the identity relationships that underlie our studies; models that use nodes as registers can freely generalize identity relationships, but cannot capture the transitional probabilities between words that underlie the experiments in that report. In some broad sense, both architectures might be characterized as "statistical," but the two architectures are suited to different problems.
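One half of this comparison, free generalization of identity, can be illustrated with a toy sketch in Python. It is not any of the models discussed in these letters: the training set (even numbers only), the 4-bit encoding, the learning rates, and all function names are assumptions chosen for the example. Both learners are trained on the identity function with a simple delta rule; the first represents a number as four binary features with one output unit per feature, and the second represents it as a single analog value.

```python
def to_bits(n, width=4):
    """Encode n as binary features, most significant bit first."""
    return [(n >> (width - 1 - i)) & 1 for i in range(width)]

def train_binary(train_numbers, width=4, lr=0.1, epochs=200):
    """Linear layer with one output unit per bit, trained by the delta rule."""
    w = [[0.0] * width for _ in range(width)]
    for _ in range(epochs):
        for n in train_numbers:
            x = to_bits(n, width)
            for j in range(width):
                out = sum(w[j][i] * x[i] for i in range(width))
                err = x[j] - out  # identity task: the target equals the input
                for i in range(width):
                    w[j][i] += lr * err * x[i]
    return w

def predict_binary(w, n, width=4):
    """Threshold each linear output unit at 0.5 to read off a bit."""
    x = to_bits(n, width)
    return [1 if sum(w[j][i] * x[i] for i in range(width)) > 0.5 else 0
            for j in range(width)]

def train_analog(train_numbers, scale=16.0, lr=0.5, epochs=200):
    """Single node holding one real value; learns y = w * x by the delta rule."""
    w = 0.0
    for _ in range(epochs):
        for n in train_numbers:
            x = n / scale
            w += lr * (x - w * x) * x
    return w

evens = [0, 2, 4, 6, 8, 10, 12, 14]  # the rightmost bit is never 1 in training
w_bin = train_binary(evens)
w_ana = train_analog(evens)

binary_guess = predict_binary(w_bin, 15)       # 15 is 1111 in binary
analog_guess = round(w_ana * (15 / 16.0) * 16)
print(binary_guess, analog_guess)
```

In the binary learner the weights attached to the rightmost feature never receive a training signal, so that bit is not copied for the novel input 15, whereas the analog learner applies the same learned operation to any value placed in its single node. The sketch does not address the other half of the comparison, namely capturing transitional probabilities between words.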

Our results, in tandem with those of Saffran et al., suggest that infants are capable of discerning both rules and transitional probabilities. As we said in our report (note 24), we aimed "not to deny the importance of neural networks but rather to try to characterize what properties the right sort of neural network architecture must have."

Gary F. Marcus
Department of Psychology,
New York University,
New York, NY 10003, USA.
E-mail: gary.marcus@nyu.edu

References and Notes
  1. In the control experiment, we trained eight 7-month-old infants on sentences from a BAB or an AAB grammar and tested on BAB and AAB sentences made up of novel words. Seven of the eight infants looked longer at the inconsistent sentences than at the consistent sentences.
  2. J. L. Elman, in Mind as Motion: Explorations in the Dynamics of Cognition, R. F. Port and T. van Gelder, Eds. (MIT Press, Cambridge, MA, 1995), pp. 195-223.
  3. Discussion, examples, and models at psych.nyu.edu/~gary/science/discussion.html.
  4. The model was also given "negative evidence"; that is, in the habituation phase, the model was told not only which sentences were ABB sentences (positive evidence), but also which sentences were not (negative evidence). In contrast, the infants in our experiment were given only positive evidence and were not exposed to examples of "ungrammatical patterns." Our experiment, but not the Elman-Seidenberg model, is consistent with the assumption that children are able to learn grammar without negative evidence [R. W. Brown and C. Hanlon, in Cognition and the Development of Language, R. Hayes, Ed. (Wiley, New York, 1970); J. L. Morgan and L. L. Travis, J. Child Lang. 16, 531 (1989); G. F. Marcus, Cognition 46, 53 (1993)].
  5. Seidenberg and Elman appear to use the term "statistics" to refer to regularity, thus counting rules as statistical regularities. Weakening the terminology in this way does not take away from our point that infants can learn rules.
  6. A recent paper of ours, cited in note 22 in our report, made this point explicitly (7, p. 275): "While most networks represent inputs by pattern of activation across sets of nodes, in principle one could use a single node to represent all possible inputs, assigning each possible input to some real number...incorporating what is a transparent implementation of a register....The node in question would represent a variable; its value would represent the instantiation of that variable." For two other examples of neural network architectures that explicitly implement relationships between variables and that could capture our findings without a hidden teacher or negative evidence, see K. J. Holyoak and J. E. Hummel, in Cognitive Dynamics: Conceptual Change in Humans and Machines, E. Deitrich and A. Markman, Eds. (Erlbaum, Mahwah, NJ, 1999) and L. Shastri and V. Ajjanagadde, Behav. Brain Sci. 16, 417 (1993). For further discussion, see (7) and G. F. Marcus, The Algebraic Mind (MIT Press, Cambridge, MA, in press).
  7. G. F. Marcus, Cognit. Psychol. 37, 243 (1998).

Volume 284, Number 5413, Issue of 16 Apr 1999, p. 433.
Copyright © 1999 by The American Association for the Advancement of Science. All rights reserved.