About this Issue

If you think you’re so smart, then why don’t you know what intelligence is? Because no one does! Is intelligence a unitary, general factor — the psychometrician’s famed g — or is it more plural and fragmented? What role do genes play in determining IQ? The environment? If intelligence is in the genes, then why do IQ scores continue to rise generation after generation all over the world? Are we actually getting smarter, or are we just getting better at taking tests? While these questions may seem recondite and academic, they are in fact central to ongoing, extremely heated controversies pertaining to education, welfare, and immigration policy. Which is why we have assembled a stellar panel of intelligence experts to delve into the IQ conundrum.

Leading off this month is James R. Flynn, discoverer of the famed “Flynn effect” and author of the new book What Is Intelligence? Beyond the Flynn Effect. Commenting on Flynn’s rich essay we have Linda Gottfredson, co-director of the Delaware-Johns Hopkins Project for the Study of Intelligence and Society; Stephen J. Ceci, the Helen L. Carr Professor of Developmental Psychology at Cornell University; and Eric Turkheimer, professor of psychology at the University of Virginia.

Lead Essay

Shattering Intelligence: Implications for Education and Interventions

The concept of a general intelligence or g factor has proved enormously fruitful in two respects. On the level of individual differences, it captures the fact that if one person outperforms another on one kind of conceptually demanding task, that advantage is likely to persist over a whole range of other cognitive tasks. On the level of group differences, we find that the average Full Scale IQ of two groups on a good IQ test often predicts things like their occupational profiles. Various occupations have minimum IQ thresholds. If 50 percent of Group A scores above, say, an IQ of 100, while only 16 percent of Group B does, then Group A will have a three-to-one ratio in its favor in the proportion of its members who are professionals, technicians, or managers.
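A back-of-the-envelope check of that arithmetic, under the standard assumption that IQ is normally distributed with a standard deviation of 15 (the group means below are illustrative, with Group B’s mean one SD below Group A’s):

```python
# A minimal sketch of the three-to-one ratio arithmetic, assuming
# normal distributions; the means and threshold are illustrative.
from statistics import NormalDist

THRESHOLD = 100  # hypothetical minimum IQ for professional/technical work

group_a = NormalDist(mu=100, sigma=15)
group_b = NormalDist(mu=85, sigma=15)

frac_a = 1 - group_a.cdf(THRESHOLD)  # ~0.50 of Group A above the threshold
frac_b = 1 - group_b.cdf(THRESHOLD)  # ~0.16 of Group B above the threshold

print(f"{frac_a:.2f}, {frac_b:.2f}, ratio {frac_a / frac_b:.1f}:1")
# 0.50, 0.16, ratio 3.2:1 -- roughly the three-to-one ratio in the text
```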

An example of a good IQ test is the WISC (Wechsler Intelligence Scale for Children). The reason it is good is that its ten subtests have enough cognitive complexity so that a high IQ person tends to beat the average person by a handy margin on all ten. That is equivalent to saying that it is a good measure of g. A test that included subtests of minimal cognitive complexity, let us say tying your shoes, would be a bad IQ test. The task is so simple that unintelligent people would perform it as well as intelligent people.

Shattering general intelligence

Despite all the triumphs of the concept of general intelligence, I believe intelligence is like the atom: you have to know both why its parts cohere and why they sometimes fly apart. Americans made massive IQ gains on the WISC between 1947 and 2002, amounting to almost 18 points of Full Scale IQ. These gains ranged from only 2 points on the WISC subtest called Information to 24 points on the subtest called Similarities (what do dogs and rabbits have in common?), despite the fact that both have the cognitive complexity that makes them good measures of g.

A bright person tends to accumulate more general information than a dull person at any given time and also tends to be better at classifying things (will say that dogs and rabbits are both mammals). But over time, we find that society can develop these conceptual skills quite independently of one another. Children may progress a lot over time in putting on scientific spectacles, which means that many more of them will say “both mammals” rather than say something like “I use my dog to hunt rabbits.” Meanwhile, thanks to the rise of a visual culture that discourages reading, the average child today may have no greater store of general information than children did 55 years ago.

Diagnosing how conceptual skills actually develop

The fact that various conceptual skills develop so independently over time has wide implications for education. At any given time, a person who beats another at arithmetical reasoning is likely also to do better on a test called Raven’s Progressive Matrices. You are given a pattern of shapes that make up a square and you have to see the logical pattern across the rows and down the columns. If row one is X O O O and row two is X X O O, then you must deduce that row three should be X X X O. Because g indicates that the skills go together, American schools since 1990 have been teaching children Matrices problems on the theory that doing so will develop their mathematical skills. But cognitive gains over time second-guess g: between 1947 and 2002, gains on the Arithmetic subtest of the WISC were only 4 points, while gains on Raven’s were about 27.5 points. Clearly there is no functional relationship here. Compare a case where there is one: a functional relationship holds between having a good vocabulary and reading serious literature, so if the ability to do the latter increased over time, the former would have to increase as well.

Thanks to the illusion fostered by g, the schools are wasting their time. I suspect that only for mathematicians is math a logical enterprise where all is proven and linked by logic. For 99 percent of us, it is a strange separate world in which very different rules hold than in everyday life. Therefore, we should try to isolate the fundamental concepts of that world, like measurement from a zero point, how numbers are created, why things are equivalent across an = sign, and so forth, and slowly get children to feel comfortable in the world of mathematics.

Once you break intelligence down into its autonomous components, many things become clear. For example, the Nation’s Report Card shows that today’s children are ahead of their parents in reading at early ages and then the gains fade away by the age of 17. How is that possible? The children are doing much better on heavily g-loaded IQ tests like the WISC at all ages. Should not brighter people be able to read adult novels better?

This mystery is solved when you look at IQ trends over time. Since 1972 (when the Nation’s Report Card began), the big IQ gains have been on certain subtests and not others. There have been virtually no gains in vocabulary and information. You cannot enjoy War and Peace very much if you have to run to the dictionary or encyclopedia every other paragraph. We are doing a better job of teaching children the mechanics of reading at early ages. But their parents had mastered the mechanics by age 17, and at that age neither generation has an information or vocabulary advantage. So we have made no progress in teaching young people how to enjoy adult literature.

A blinding obsession

The response of many g-men to IQ gains over time is to say: “You grant that gains on various subtests do not correspond to how well each subtest measures g. Well, that shows that the gains are hollow, that is, since they are not g gains they are not real intelligence gains and lack significance.” As you can see, this is just a way of saying that if all complex cognitive skills do not move together — that if they have different trends — then the trends just cannot be significant. This is the saddest result of the obsession with g: it makes the limitations of the concept no longer a matter of evidence. Any evidence that challenges the supremacy of g is not good evidence because it challenges the supremacy of g and that is that.

Note that we would not reason in this way in other areas. There is a musical g in the sense that whoever is better than me on the piano will probably outdo me on the organ. But skills could improve on one and not the other, and that would be of great significance to the world of music. There is a moral g in the sense that good people tend to be both more tolerant and more generous than the average. But over time, white Americans may have tended to become more tolerant of other races and no more generous in giving to charities. No black American would say that the trends were insignificant just because all of the components of moral g failed to move together.

The transience of intelligence

General intelligence or g has something to do with brain quality, and good genes have a lot to do with having an above-average brain. Therefore, there was a tendency in differential psychology to think that our gene-determined brain accompanies us throughout our lives and that environment, except in extreme conditions (living with wolves since infancy), merely leaves minor imprints on that brain. After all, twin studies showed that even when identical twins were separated at birth, they had IQs at adulthood that were much more similar than the IQs of randomly selected people. What better evidence did you need that genes/brain went through life and environment just did a bit of tinkering along the way?

But this created a dilemma: if genes were so dominant, how could IQ gains over time be so huge? Unless you thought that there was a large genetic upgrading from one generation to the next, large intelligence gains should be impossible. Yet they occurred, which implied that there were environmental factors of huge potency. How could environment be both so feeble and so potent?

The Dickens/Flynn model resolved this dilemma. Two twins raised apart, thanks to having slightly better genes than average, would both get into increasingly privileged environments. Both would get more teacher attention, would be encouraged to do more homework, would get into a top stream, and by adulthood, they would both be far above average. Moreover, thanks to their identical genes, their environmental histories would be very similar. Their identical genes were getting all of the credit for the combination of identical genes plus nearly identical enriching environmental factors! The environmental factors were not feeble at all: they just tended to be similar for identical twins when raised apart, which made them look feeble.

This means a huge shift in perspective. The g-man view was that environment makes little difference throughout life because it makes very little difference at any point in life. The Dickens/Flynn view is that environment makes a lot of difference, which means we have to look elsewhere for why its effects seem so transient. Our conclusion was that present environment swamps past environment in terms of effect on your level of cognitive functioning.

Cognitive exercise

The first implication of the new perspective is the benefit of persisting in cognitive exercise throughout life. There is the dramatic case of Richard Wetherill. He played chess in retirement and could think eight moves ahead. In 2001, he was alarmed because he could only think four moves ahead, but he continued an active mental life until his death in 2003. Autopsy showed that his brain was riddled with the plaques and tangles that are characteristic of Alzheimer’s. Most people would have been reduced to a state of total confusion. This does not mean that cognitive abilities fail to decline with age. After all, at any given age, an athlete is better off for training. But however hard you train, your times will get slower as you age.

The brain is much more like our muscles than we had thought, even in the sense that specialized exercise affects different parts of the brain. Autopsies show that the brains of London taxi-drivers are peculiar. They have an enlarged hippocampus, which is the brain area used for navigating three-dimensional space. Here we see spatial abilities being developed without comparable development of other cognitive skills. To develop a wide variety of cognitive skills you need a wide variety of cognitive exercises.

Interventions

Interventions that may enhance IQ include the efforts of parents, programs that afford an enriched environment to children at risk, adoption, and university study. Beginning with the family, the Dickens/Flynn model posits a tug of war between two environments: the environment parents impose, which is not directly correlated with the child’s unique genetic endowment; and the environment the child creates by interacting with the world, which does tend to match the child’s unique genetic endowment. With each passing year, a child transcends parental influence and becomes an autonomous actor. Parents cannot prevent their child from rebelling against a teacher with whom there is little rapport or getting in with the wrong crowd.

Preschool interventions also impose an environment on children that is uncorrelated with their genes, usually a uniformly enriched one that includes stimulation through educational toys, books, contact with sub-professionals, and so forth. If these terminate as children enter school, the intervention is likely to lose the tug of war even earlier than a child’s parents do. After all, the parents retain some influence after the preschool years. Since the imposed environment was far more enriched than any available at school, the children will begin to match environments whose quality falls further and further below it.

The most radical form of environmental intervention is adoption into a privileged home. Adoptive parents often wonder why the adopted child loses ground on their natural children. If their own children inherit elite genes and the adopted child has average genes, then as parents slowly lose the ability to impose an equally enriched environment on both, the individual differences in genes begin to dominate.

Finally, note that university education is a partial attempt to impose an enriched environment on students regardless of their genetic differences – that is, it constitutes a quasi-environmental intervention on the adult level. It too will see its effects on IQ fade unless quality of environment is maintained, for example, unless thanks to a good university education a student of average ability qualifies for a cognitively demanding profession. Then the job takes over the university’s role of imposing duties that foster the intellect.

Note that none of this means that interventions have no lasting effect; it is just that their non-IQ effects are likely to be more permanent than their IQ effects. If parents encourage persistence, honesty, and self-esteem, their children have a good start in life that may prove far more important than their gaining a few jumps on the IQ hierarchy. Similar characterological enhancement has been claimed for intervention programs like Head Start. Contacts made at a good university may confer an enhanced income and socioeconomic status throughout life.

These comments about interventions may seem to imply that no one can really hope to improve on his or her genetic endowment. This pessimism is no more in order than pessimism about whether people can improve on their physical endowment for running. To do so, you must either have good luck or make your own luck. Either a happy chain of circumstances will force you to train throughout your life or you can develop a love for running and train without compulsion. Training will not override genes entirely, of course. There are runners I cannot beat even when I train more than they do. But I can run rings around every couch potato within 20 years of my age.

There is one way in which individuals can make their own luck. Internalize the goal of seeking challenging cognitive environments — seek intellectual challenges all the way from choosing the right leisure activities to wanting to marry someone who is intellectually stimulating. The best chance of enjoying enhanced cognitive skills is to fall in love with ideas, or intelligent conversation, or intelligent books, or some intellectual pursuit. If I do that, I create within my own mind a stimulating mental environment that accompanies me wherever I go. Then I am relatively free of needing good luck to enjoy a rich cognitive environment. I have constant and instant access to a portable gymnasium that exercises the mind. Books and ideas and analyzing things are possessions easier to access than even the local gym.

If only we who teach could make more of our “subjects” fall in love with ideas. Then we would have truly effective interventions.

Three levels and three concepts

All of this has implications for the theory of intelligence. There is nothing really the matter with the concept of g; it is just that we have misused it by making it the omnipresent concept in our study of cognitive abilities. Intelligence is important on three levels, namely, brain physiology, individual differences, and social trends (collectively, BIDS). The core of a BIDS approach to intelligence is that each of those levels has its own organizing concept, and it is a mistake to impose the architectonic concept of one level on another. We have to realize that intelligence can act like a highly correlated set of abilities on one level and act like a set of functionally independent abilities on other levels.

Take the brain. Highly localized neural clusters are developed differentially as a result of specialized cognitive exercise, but there are also important factors that affect all neural clusters such as blood supply, dopamine as a substance that renders synapses receptive to registering experience, and the input of the stress-response system. When we map the brain’s structure, we find a mixture of commonality and neural decentralization. The commonality may well give rise to g on the individual differences level, while the decentralization leads to the phenomenon of various cognitive skills developing independently over time.

As for individual differences, that is the proper kingdom of g. There is simply no doubt that performance differences between individuals on a wide variety of cognitive tasks are correlated primarily in terms of the cognitive complexity of the task or the posited cognitive complexity of the path toward mastery. However, we need to avoid the mistake of thinking that the interaction between genes and environment is less complex than the reality.

On the social level, it is also beyond doubt that various real-world cognitive skills show different trends over time as a result of shifting social priorities. The appropriate dominant concept on this level is not g but something like social utility.

In closing, I want to stress that the BIDS approach does not aim at the abolition of g. It merely endorses a separation of powers that gives each dominant construct the potency needed to rebuff the other two. The U.S. Constitution attempts to make the President, Congress, and Supreme Court dominant in the executive, legislative, and judicial areas, respectively. I want the same kind of separation of powers for the three levels of intelligence.

James R. Flynn is emeritus professor of political studies at the University of Otago in New Zealand and author of What Is Intelligence? Beyond the Flynn Effect (Cambridge University Press, 2007), from which this essay is adapted.

Response Essays

Shattering Logic to Explain the Flynn Effect

Flynn brought world attention to the intriguing fact that IQ test scores rose steadily and rather dramatically throughout much of the Twentieth Century, at least in those countries for which we have good data. Years back, he interpreted such inexplicable increases as evidence that IQ tests must surely be flawed. Now he seems to accept unquestioningly their power to capture changes in human intelligence over the sweep of time. He has become the ultimate IQ-man.

The irony is that intelligence researchers themselves, Flynn’s “g-men,” do not accept IQ scores at face value. Unlike Flynn, they have no interest in debating the proper verbal definition of intelligence, but rather seek to understand a major discovery of the last century: the g factor. g refers to the continuum of differences among individuals in their general capacity to learn and reason, almost regardless of task content and context. IQ tests measure g well, and all mental tests measure mostly g, whatever their content. Only at the psychological-behavioral level is g unitary, however, and various disciplines are currently probing its multiple roots in genes and environments, its physiological manifestations in the brain, and its impact on the lives of individuals and nations (Gottfredson, 1997). If g-men are “obsessed,” it is with getting to the bottom of the phenomenon that is g. IQ tests are merely one tool in that endeavor.

Flynn’s Story

Flynn’s new book, What Is Intelligence?, details more fully the tale he sketches here at Cato Unbound. He first recounts how he discovered that performance on IQ tests was rising each decade despite the high heritability of IQ and then how, according to his account, he and William Dickens have resolved this most baffling mystery ever to confront intelligence researchers. He reports that he succeeded only by overthrowing the “conceptual imperialism” of g, which still leads g-men to deny all facts that threaten its hegemony. Once free of their “blinding obsession,” all became clear to him.

In his explanation, the industrial and scientific revolutions set in motion self-perpetuating feedback loops by which human intelligence not only ratcheted itself up, but also enlisted the power of our genes to do so. What many of us mistake as physiology or genetics at work is actually the imprint of shifting cultural priorities. Although recent generations do little better on IQ subtests such as Vocabulary, Arithmetic Reasoning, and General Information, mankind’s donning of “scientific spectacles” has enabled it to answer many more Raven’s Matrices and Similarities items than did earlier generations.

Flynn argues that we need not leave future advancement in human intelligence to chance: “Interventions that may enhance IQ include the efforts of parents, programs that afford an enriched environment to children at risk, adoption, and university study.” Readers might be perplexed how his novel insights point us back to old interventions already known not to raise IQ. He suggests that such socio-educational enrichment might work if “imposed” on us throughout our lives, regardless of our genetic differences. More self-directed individuals can create their own luck by “falling in love with ideas,” thus providing themselves constant access to a “portable gymnasium that exercises the mind.”

Flynn’s Argument

The chief riddle posed by the Flynn Effect is this: How can something so heritable as IQ change so quickly from one generation to the next? To my mind, this paradox signals that we have yet to learn something fundamental about intelligence or current measures of it. Although Flynn does not discuss the matter, there is no evidence that g itself has increased, let alone by strictly cultural factors. He can make his case for the latter only by denying that the empirical phenomenon of g is relevant, specifically, by seeming to reduce it to a collection of independent components for which he can generate separate explanations. Only in this way can he neuter the incontrovertible evidence for g’s existence as a highly replicable empirical phenomenon, its correlations with many aspects of brain physiology, the distributed nature of g-related brain activity, and the strong genetic basis of both g and brain physiology (Gottfredson, 1997; Jung & Haier, 2007)—all of which undercut a strictly cultural explanation for rising IQ scores.

Flynn (2007) makes his case mostly by appeal to analogy (usually sports), that which is “undoubtedly” true (an historical shift from pre- to post-scientific thinking caused an advance from concrete to formal thinking; p. 32), selected bits of evidence of uncertain relevance or accuracy (the brain “autopsies” for elite London cabbies, which were actually MRIs of living drivers), and a confusion of assumptions and metaphors. The g factor is “shattered” like an “atom” to let different cognitive skills “swim free.” An “imperialistic” g (pp. 55 ff.) must be restricted to its “proper kingdom” by maintaining a “separation of powers” between the physiological, individual differences, and social levels of intelligence, thereby “giving each dominant construct the potency needed to rebuff the other two”—yet allowing “cross-fertilization” among them. A personified brain is similarly said to “unravel g into its component parts” in order to “pick and choose from the bundle of cognitive abilities wrapped up together by g” (p. 66). Without any empirical referents from him, I don’t know what such claims really mean.

However, with g seemingly now dispersed into “separate components” at both the psychometric and physiological levels, all components at both levels now seem separately malleable: for example, cognitive skills on different IQ subtests may be differentially affected by shifting cultural priorities, and various parts of the brain can be subjected to “cognitive exercises” of different sorts, such as driving around London. Disconnected from their common core, g, these narrower cognitive “skills” can be examined without regard to the vast interlocking network of evidence implicating a cross-domain intelligence of great practical value in the social realm.

Flynn rules out biological influences on brain physiology for explaining rising IQs by appealing to the very sorts of evidence that would seem to confirm their importance. Specifically, he eliminates nutrition as a possible cause of rising IQ test performance by noting that the trends for height do not seem consistent, in his view, with the disproportionate gains in IQ in some countries at the lower end of the IQ distribution. However, the very fact that height and other biological traits have changed in tandem with overall increases in IQ in many countries would seem to exclude the strictly cultural explanation that Flynn favors, no matter how fecund the “social multipliers” that he and William Dickens postulate (Mingroni, 2007, p. 812).

Flynn’s Fallacies

With characteristic understatement, Flynn says that everything became clear to him when he awoke from “the spell of g” (pp. 41-42). The reader, feeling afloat in a rolling sea of images and warm words, might ask whether he succeeds only by loosing himself from the bonds of evidence and logic. More troubling, his core argument rests on logical fallacies that profoundly misinterpret the evidence. I describe three below. To be fair, they are among the common fallacies bedeviling debates over intelligence testing, and most reflect a failure to appreciate the inherent limitations of psychological tests, including tests of intelligence (Gottfredson, in press).

Averages vs. correlations

Taller people tend to weigh more; that is, height and weight are correlated. If everyone gained 10 pounds, this average gain would have no effect whatsoever on the correlation between height and weight. Taller people would still tend to be heavier. Likewise, the fact that average scores on the Similarities subtest have risen over time but average scores on Vocabulary and Arithmetic Reasoning have not says nothing about whether the correlation between them has changed. In fact, it remains very high. The case for having “shattered” g rests precisely on this confusion, however. The g factor is derived, via factor analysis, from the correlations among subtests. Averages do not affect the calculation of correlations. A subtest’s g loading is simply its correlation with the common factor, g, extracted from such correlations. It is an interesting empirical fact that demographic groups (e.g., ages, races, nationalities) yield the same g despite often very different average levels of performance (number of items answered correctly).
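The point is easy to verify numerically. A minimal sketch (the height and weight figures are invented for illustration): adding a constant to every score shifts the average but leaves the correlation untouched.

```python
# Demonstration that a uniform gain changes averages but not correlations.
# The height/weight numbers are illustrative, not real data.
import numpy as np

rng = np.random.default_rng(0)
height = rng.normal(170, 10, 1_000)                    # heights in cm
weight = 0.9 * height - 90 + rng.normal(0, 8, 1_000)   # correlated weights

r_before = np.corrcoef(height, weight)[0, 1]
r_after = np.corrcoef(height, weight + 10)[0, 1]       # everyone gains 10

print(np.isclose(r_before, r_after))                   # True: r is unchanged
```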

I agree with Flynn that it is intriguing that subtest averages have not changed in tandem with their g loadings. If g itself were rising over time, one would expect the most g-loaded tests to show the largest increases in raw scores. Because g constitutes the core of all mental abilities, one could construe these contrary results as evidence that it is not g that is increasing, but perhaps one of the subsidiary factors captured by IQ tests yet independent of g (e.g., see Carroll’s (1993) three-stratum hierarchical model, which illustrates how abilities differ primarily in their empirically determined generality of application across task domains).

Relative vs. absolute levels of ability

IQ tests are excellent measures of relative differences in a general proficiency to learn and reason, or g. But it is important to understand that they do so by providing deviation scores. That is, IQ scores are calculated relative to the average number of items answered correctly by everyone in one’s age group (the scores being transformed to have a mean of 100 and standard deviation of 15 or 16 for ease of use). Untransformed raw scores (numbers of items answered correctly) have no meaning by themselves, nor does the average difference between any two sets of raw scores. The best we can do, which Flynn does admirably, is to plug cross-generation differences in raw scores into the formula for calculating deviation IQs for the current generation. As noted above, however, we do not know whether the transported points represent an increase in g rather than something else.
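A minimal sketch of that deviation-score arithmetic (the raw scores below are invented; the transformation itself is the standard rescaling described above):

```python
# Converting raw scores to deviation IQs: scores are expressed relative to
# the age group's mean, then rescaled to mean 100 and SD 15.
import numpy as np

def deviation_iq(raw_scores, mean=100.0, sd=15.0):
    raw = np.asarray(raw_scores, dtype=float)
    z = (raw - raw.mean()) / raw.std()   # standing relative to the group
    return mean + sd * z

# Five hypothetical test-takers' raw scores (items answered correctly):
print(np.round(deviation_iq([30, 40, 50, 60, 70]), 1))
# [ 78.8  89.4 100.  110.6 121.2]
```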

In his book, Flynn thinks it pointless to continue research on elementary cognitive tasks (e.g., reaction time tests, such as with Jensen’s “button box”). But such tests may provide our first opportunity to measure g in absolute terms (on a ratio scale; Jensen, 2005). Performance on reaction time tests is scored in milliseconds. Unlike IQ scores, time has a zero point and equal-size units. Ratio-level measurement would finally allow us to chart patterns and rates of cognitive growth and decline over the life course as well as over decades. The Flynn Effect might have been explained long ago had we historical data of this sort.

Measure vs. the construct being measured

No one would mistake a thermometer for heat, nor try to glean heat’s properties from the device’s superficial appearance. Nor should one do so with IQ tests. But people often confuse the yardstick (IQ scores) with the construct (g) actually measured. The manifest content of ability test items provides no guide to the ability constructs they actually succeed in measuring. The active ingredient in tests of intelligence is the complexity of their items, and it is also the ingredient—“processing complexity”—in functional literacy tasks that makes some more difficult than others (more abstract content, more distracting information, required inferences, etc.). To oversimplify only a bit, as long as two tests have similar g loadings, both will predict the same achievement equally well (or poorly), no matter how different their content might seem (Gottfredson, 2002).

Flynn’s peering into the tea leaves of subtest items is a species of the old specificity hypothesis in personnel selection psychology, which held that each ability test measures a different ability and that different jobs and school subjects call upon quite different abilities. For example, it was once received wisdom (but mistaken) that tests of verbal ability would predict reading but not math achievement, whereas tests of arithmetic reasoning would do the reverse. Specificity theory was falsified decades ago, as can be seen in the large literature on “validity generalization” in employment testing. Professor Flynn may believe that the Similarities subtest measures the ability “to classify” and that Vocabulary assays a different cognitive “skill,” but he needs to provide evidence and not mere belief. Belief did not smash the atom. Belief cannot explain the Flynn Effect.

References

Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, UK: Cambridge University Press.

Gottfredson, L. S. (Ed.) (1997). Intelligence and social policy (special issue). Intelligence, 24(1), 1-320.

Gottfredson, L. S. (2002). g: Highly general and highly practical. Pages 331-380 in R. J. Sternberg & E. L. Grigorenko (Eds.), The general factor of intelligence: How general is it? Mahwah, NJ: Erlbaum.

Gottfredson, L. S. (in press). Logical fallacies used to impugn intelligence testing. In R. Phelps (Ed.), Anti-testing fallacies. Washington, DC: American Psychological Association.

Jung, R. E., & Haier, R. J. (2007). The Parieto-Frontal Integration Theory (P-FIT) of intelligence: Converging neuroimaging evidence. Behavioral and Brain Sciences (target article), 30, 135-187.

Mingroni, M. A. (2007). Resolving the IQ paradox: Heterosis as a cause of the Flynn effect and other trends. Psychological Review, 114(3), 806-829.

Linda S. Gottfredson is Professor of Education at the University of Delaware and co-director of the Delaware-Johns Hopkins Project for the Study of Intelligence and Society.

The Fundamental Intuition

Flynn says, “There is nothing really the matter with the concept of g,” as long as one is interested in the level of individual differences in ability, as opposed to development in time within individuals or across generations. You can’t help but notice that his concession doesn’t prevent him from being ripped, in Gottfredson’s reply, for not taking g seriously enough, for not placing it at the very center of the entire discussion. It’s the price he pays for giving up too much. There is plenty, indeed practically everything, wrong with the concept of g, even in its classical context of individual differences in ability among adults at a single point in time. Explaining why requires some slightly technical concepts—bear with me.

Here is the fundamental intuition: Since at any given time tests of ability “go together,” in the sense that people who score higher on one tend, on average, to score higher on the others as well, a single explanatory factor, g, must be invoked to account for their commonality. After all, if there were many abilities underlying performance on mental tests, why wouldn’t there be tests that didn’t go together with the others? The fundamental intuition states that universal positive relations among mental tests compel a single dominant explanatory construct, which has come to be called g. The fundamental intuition is wrong.

Let me try to shift your intuition a little. Suppose there were not one but two abilities underlying mental tests; call them h and i. Suppose further that these two abilities have nothing whatsoever to do with each other, that knowing your score on h tells you absolutely nothing about your score on i. Individual mental tests require various amounts of h and i. Some tests require a lot of h and not so much i, some the reverse, some require a little of both, but tests are always positively related to whatever mix of h and i they require, and there are no tests on which untalented people do better than talented people. (I am not giving anything back to the g-men here. A positive relation between tests and abilities, whether there is one ability or many, is a good way to define what an ability test is, as opposed to, say, a test of attitude. Ability questions have correct answers whereas attitude questions do not, and that is why ability questions all point in the same direction.)

So anyway, we have a set of tests related variously but positively to two completely unrelated underlying abilities. What will the relations among the individual tests look like? They will all be positive. Pairs of tests that both depend highly on h or highly on i will be strongly positively related; pairs in which one test depends mostly on one ability and the other test mostly on the other will be less strongly related, but why would any pair ever be related negatively? The g-men have confused two separate statistical issues about relations among sets of tests: their much-revered positivity, and what is known in statistics as their dimensionality, which refers to the number of underlying abilities (one according to the g-men, two in my example) required to explain their interrelationships. The fundamental intuition is that these are one and the same issue, but they are not; in fact, they have nothing at all to do with each other. Sets of all-positive relations among tests can require any number of dimensions to explain them.
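This thought experiment is easy to simulate. The sketch below invents six tests drawing on positive mixes of two independent abilities; every pairwise correlation comes out positive even though, by construction, no single factor underlies them.

```python
# Two independent abilities (h and i) still yield an all-positive
# correlation matrix. The tests and loadings are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
h = rng.normal(size=n)                  # ability h
i = rng.normal(size=n)                  # ability i, independent of h

# Each test = (h loading) * h + (i loading) * i + noise.
loadings = [(0.8, 0.1), (0.7, 0.2), (0.5, 0.5),
            (0.2, 0.7), (0.1, 0.8), (0.4, 0.4)]
tests = np.column_stack([a * h + b * i + 0.3 * rng.normal(size=n)
                         for a, b in loadings])

r = np.corrcoef(tests, rowvar=False)
off_diagonal = r[~np.eye(6, dtype=bool)]
print((off_diagonal > 0).all())         # True: positivity, yet two dimensions
```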

So the g-men want you to believe that all they need to show in order to establish the supremacy of g is that all ability tests are positively related, but that is incorrect. They need to show that interrelations among ability tests are unidimensional, and guess what? They are not. Not ever, not from the very beginning of the discussion. The giant in this field was Charles Spearman, who coined the term g and invented the statistical method called factor analysis to answer exactly this question. He started his investigation with the question of whether a single dimension could explain the positive interrelations among mental tests. It could not.

Proponents of g theory are no doubt waiting for me to explain why, if Spearman himself showed that sets of ability tests are not unidimensional, he nevertheless went on to describe g and expound the theory for the rest of his career. Can g be saved if ability is multidimensional? A description of the attempt to do so, which defines the field of intelligence between Spearman and today, requires me to get back into a few technical details.

The task of quantifying ability becomes vastly more complex if anything more than a single dimension is required to account for it. If there is only one dimension, the problem is simple, because everyone can be ordered along that dimension, as if placed on a line. But if there is more than one dimension, we are not ordering people on a line but placing them on (for starters) a two-dimensional map. How do we locate people on two-dimensional maps? The obvious answer is with latitude and longitude, but there is a hitch: latitude and longitude, though mathematically sufficient and perfectly convenient once you are used to them, are also arbitrary and man-made. One could define new lines that ran from northwest to southeast, or in any other direction, and they would do just as good a job of accounting mathematically for points on maps. That is why geologists are not peering at satellite photos to find great circles etched into the planet: latitude and longitude are a useful contrivance in the interest of human convenience, not a given aspect of the natural world.

Once ability is multidimensional, g is like longitude. In a multidimensional set of interrelations among tests, one axis can be found that accounts for as much of the interrelatedness as possible, even when it is known that more dimensions are required. The g-men have defined that largest dimension as g. They haven’t discovered it, as they are fond of saying, any more than the Greenwich Meridian was discovered by the International Meridian Conference in 1884. Any set of interrelated tests has to have a largest dimension, so under this definition the existence of g is no longer a matter of empirical dispute. Rather, it has simply been defined into existence. But it has no special status. Defining multidimensional ability with a big g-factor and some number of smaller sub-factors is just one out of an infinite number of ways that ability could be aligned along dimensions.
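The longitude analogy can be made concrete. Continuing the invented two-ability example above: rotate the factor axes by any angle and the rotated loadings reproduce exactly the same test correlations, so choosing the axis that soaks up the most variance (a g-like first factor) is a coordinate convention, not a discovery.

```python
# Rotating the factor axes leaves the model-implied covariances unchanged,
# so no orientation of the axes is privileged. Loadings are illustrative.
import numpy as np

L = np.array([[0.8, 0.1], [0.7, 0.2], [0.5, 0.5],
              [0.2, 0.7], [0.1, 0.8], [0.4, 0.4]])   # tests x (h, i)

def implied_common_cov(loadings):
    return loadings @ loadings.T   # covariance due to the common factors

theta = np.deg2rad(30)             # rotate the axes by 30 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(implied_common_cov(L), implied_common_cov(L @ R)))  # True
```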

So let’s return to Flynn. He thinks that g used to hold together, as long as our focus was on relations among tests at a single point in time, and that it came apart only once he started to examine differential changes in the components of ability over time. But the coherence of g was an illusion, founded on the false intuition that positivity of relations among ability tests was sufficient evidence of unidimensionality. In fact, pace Gottfredson, it would be possible to define separate ability domains for abstract thinking and practical knowledge within a single time point, and these traits would then correspond closely to the courses of generational change that interest Flynn. Such traits would not be the correct way to divide up ability, any more than g is. They would be a plausible solution in a domain where a certain amount of indeterminacy is part of the scientific landscape, and they would be a convenient tool for studying the Flynn effect. In the same way, g is useful for many things, especially for broad-stroke prediction of outcomes like job performance. The trick is not to get hooked on any particular way of dividing up the pie, because it is a short step from there to trying to find the Greenwich Meridian at the bottom of the North Atlantic.

Actually, psychologists don’t look for lines of longitude in the seabed; they look for mental factors in the brain and genome. Flynn’s over-commitment to the reality of g leads him to be distressingly cavalier about how human ability might be represented neurologically or genetically. “General intelligence or g,” he says, “has something to do with brain quality, and good genes have a lot to do with having an above average brain.” That sounds safe enough, but wait a minute: How do we know a quality brain or a good gene when we see one? And presumably not only general intelligence but abstract reasoning ability has something to do with the brain, the environmental Flynn effect notwithstanding. When we start looking for human intelligence in the brain and the genes, what exactly should we look for? General intelligence? Specific abilities? Morality? Which way do those lines really run again?

There is nothing wrong with studying the neurology or genetics of differences in ability, but these investigations will proceed on their own neurological and genetic terms, and we should not look to them for biological vindication of the psychological expediencies that help us tame the nearly overwhelming complexity of human behavior. Literal-mindedness about the details of psychological statistics may seem harmless when the discussion is just about what goes with what and when, but history has shown us only too clearly what can happen when simplistic views of human ability make poorly informed contact with biology and genetics. I am by training a behavioral geneticist, and as such I am too well-acquainted with the ugly places to which oversimplified thinking about human ability and genetics can lead to let the phrase “good genes” pass without a shiver. It is best to be careful from the beginning.

Bibliographic note:

Many of the ideas discussed here were first expressed by the great psychometrician Louis Guttman (1916-1987). Responsibility for any errors of interpretation rests with me.

Eric Turkheimer is a professor of psychology at the University of Virginia.

The Significance of the Flynn Effect

The significance of Flynn’s assault on the meaning of general intelligence cannot be overstated. General intelligence, or g, is inferred to underlie performance on a battery of diverse tasks that seem to be quite dissimilar but which turn out to be moderately correlated. For example, students who score above average on vocabulary usually score above average on math, spatial ability, general information, puzzles, and comprehension. They may be much higher on some things than others (due to the operation of specific abilities) but the fact that they are so alike on such disparate tasks is seen as a manifestation of g.

For nearly 100 years, psychometric researchers have been enamored with g, touting its ability to tie together myriad, seemingly unrelated phenomena. For instance, a person’s level of g is the single best predictor of his school performance, occupational success, and a host of other outcomes. Importantly, it is a far better predictor than are specific cognitive and personality measures, and it remains substantially, if imperfectly, stable over an individual’s lifetime—from around early elementary school until old age. In one longitudinal analysis of individuals who were given IQ tests at age 11 in 1932 and retested at age 77, the corrected correlation between their two IQ scores was .74, showing substantial stability over their lifetimes.

To the extent that an intelligence test measures general intelligence (or, in the jargon of psychometrics, “loads” on g), it is not only a better predictor of performance in school and in the workplace, but it is also more heritable, and it is more closely related to a number of physiological indices such as neural efficiency and brain volume. IQ tests are heavily g-loaded, including ones that seem fair to children across all cultures, such as matrices involving nonsense shapes that no child, regardless of social class or culture, encounters at school or at home. The IQ mafia has interpreted this constellation of associations as evidence that g reflects an underlying biologically based and stable intellectual ability, rather than a specific skill learned in school or taught by parents. It (g) permeates nearly all complex tasks, and this is allegedly why IQ scores are so highly correlated with all other complex cognitive tests, such as the SATs, Civil Service exams, Armed Services Vocational Aptitude Batteries, and the GREs. The claim is that they all measure g and they all predict important life outcomes, while being highly heritable. It is but a short stone’s throw to a genetic-meritocracy syllogism:

• An underlying ability (called g) is needed for all forms of cognitive performance

• g is manifest in any broad cognitive battery, such as an IQ test

• g is related to many types of biological markers and is highly heritable

• Large individual and group differences exist in g

• Variation in g predicts differential life outcomes

• Therefore, variation in life outcomes is at least partly rooted in biological differences in g

Putting these pieces together leads some to argue that inequality in the distribution of wealth, prestige, and educational attainment is, in part, a consequence of unequal distribution of the intellectual capacity needed for high levels of functioning. Psychologists have gathered impressive data that seem to accord with each prong of this syllogism. So when Flynn revealed massive IQ gains over the course of the 20th century, he threw a spanner into the syllogism by revealing several paradoxes. How can IQ be a test of general intelligence (g) that is biologically driven and highly heritable and yet improve so quickly—often rising dramatically within a single generation?

Putting aside whether one agrees with (1) Flynn’s own attempt to resolve paradoxes such as how large IQ gains are nevertheless compatible with high heritability estimates for IQ or (2) the claim that the gains are actually the result of improvements in g (not everyone does; see Rushton & Jensen, 2006), the fact remains that he has shown beyond doubt that measured intelligence fluctuates systematically over time, and this cannot be due to our having better genes than our grandparents. Each of us gains approximately .3 of an IQ point every year (6 IQ points every twenty years), and this has been found for nearly 30 nations. It was a secret before Flynn and others made this discovery because IQ tests were periodically re-normed and the average score was reset to 100 even if the average person had actually scored 106 on the old norms. The size of the IQ gain is smaller on tests that are more directly taught in school and at home (e.g., vocabulary, arithmetic) and larger on tests that would seem unrelated to schooling (e.g., matrices, detecting similarities).
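A toy sketch of the re-norming arithmetic (the raw-score numbers are invented; only the 106 echoes the figure above):

```python
# How periodic re-norming hides the Flynn effect. Assume, hypothetically,
# a raw-score SD of 10 and a 4-point raw gain between norming cycles.
OLD_NORM_MEAN = 50      # raw score that earned IQ 100 under the old norms
NEW_COHORT_MEAN = 54    # today's average raw score on the same items
RAW_SD = 10

# Scored against the OLD norms, today's average person looks smarter:
iq_on_old_norms = 100 + 15 * (NEW_COHORT_MEAN - OLD_NORM_MEAN) / RAW_SD
print(iq_on_old_norms)  # 106.0

# After re-norming, a raw score of 54 is simply defined as IQ 100 again,
# so the 6-point gain vanishes from the published scale.
```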

This is not what one might expect if gains were the result of environmental improvements such as more or higher-quality schooling. But it highlights the curious path from everyday activities to intellectual performance: It is one thing if a child’s IQ is elevated over time because she is drilled daily on vocabulary and basic number facts (two of the subtests of major IQ batteries). But Flynn and others have shown that these are not the areas where IQ has risen much. The rise is in what Flynn refers to as “on-the-spot reasoning” about relations between objects that are either totally familiar to everyone, so that no one can be claimed to have a prior advantage (e.g., arranging familiar pictures so they tell a coherent story), or totally unfamiliar to everyone (e.g., nonsense shapes that have to be seriated). On these types of tests the IQ gains have been enormous. If we gave our grandparents today’s tests, they’d score near the mentally retarded range, something that neither Flynn nor most researchers believe reflects their intelligence, notwithstanding their low scores.

A relatively unexplored question is the causal pathways running from early environments to later performance on g-loaded tests. Granted, most of us do not directly teach our children how to arrange pictures to tell a story or how to seriate or cross-classify a multidimensional matrix of shapes, but perhaps there are activities that indirectly foster elevated scores on such tests. And perhaps these activities are more common with each subsequent generation, leading to the Flynn effect. There is some support for this view. For example, research with Brazilian children demonstrates that every year of formal school attendance yields an improvement in their performance on Raven’s Matrices, the quintessential g measure. Raven’s Matrices are associated with the largest IQ gains of the 20th century, so there is clearly something associated with being in school that aids performance on this highly g-loaded test.

Similarly, researchers have shown that differences in the ways boys and girls spend their time (e.g., playing with Legos) (Bornstein et al., 1999), toy selection (Goldstein, 1994), and computer videogame experience (Quaiser-Pohl et al., 2006) are responsible for differences in their spatial abilities, which also load on g. In a recent well-controlled study, Feng et al. (2007) found that playing action videogames significantly narrowed gender differences in mental rotation, in which perspective drawings are shown at different orientations and one must determine whether they depict the same object, and on tasks in which one must judge whether a two-dimensional piece of paper can be folded into a given 3-D shape. In this study, both males and females who were asked to play action videogames improved their mental rotation scores, but the improvement was much larger for females, and the performance of the females after playing such games was indistinguishable from that of the males who did not play them. Mental rotation is a g-loaded task that is related to math ability. But clearly it can be improved with certain everyday experiences that some individuals engage in.

Lest one imagine that g is driven exclusively by schooling, however, in a direct comparison of the increases in performance on tests of general intelligence across educational age (years of schooling) versus chronological age, Brouwers and his colleagues demonstrated that school attendance by rural Indian children has a substantially smaller impact than the natural stimulation provided by their everyday experiences (herding, running errands, etc.). On average, the increase in general intelligence that results from one year of chronological age is twice the increase that results from one year of attending school. This research revealed that attending school affects tests of cognitive ability primarily in academic domains (e.g., arithmetic) as opposed to on-the-spot reasoning.

We do not know where the Flynn effect is headed. I doubt it will continue at the .3-point-per-year pace of the 20th century, though my gut suspicion is that it will rear its head in undeveloped nations that have not yet had access to the environmental improvements (schooling, challenging games, parental investments) that drove the increases in developed nations. Regardless of whether this hunch proves accurate, all of us owe Flynn a deep debt of gratitude for complicating what had started to seem like a closed case.

References

Bornstein, M. H., Haynes, O., Pascual, L., Painter, K., & Galperin, C. (1999). Play in two societies: Pervasiveness of process, specificity of structure. Child Development, 70, 317-331.

Brouwers, S. A., Mishra, R. C., & Van de Vijver, F. J. R. (2006). Schooling and everyday cognitive development among Kharwar children in India: A natural experiment. International Journal of Behavioral Development, 30, 559-567.

Feng, J., Spence, I., & Pratt, J. (2007). Playing an action video game reduces gender differences in spatial cognition. Psychological Science, 18, 850-855.

Goldstein, J. H. (1994). Sex differences in toy play and use of video games. In J. H. Goldstein (Ed.), Toys, play, and child development (pp. 110-129). New York: Cambridge University Press.

Quaiser-Pohl, C., Geiser, C., & Lehmann, W. (2006). The relationship between computer game preference, gender, and mental-rotation ability. Personality and Individual Differences, 40, 609-619.

Rushton, J.P., & Jensen, A.R. (2006). The totality of available evidence shows the race IQ gap still remains. Psychological Science, 17, 921–922.

Stephen J. Ceci is the Helen L. Carr Professor of Developmental Psychology at Cornell University.

The Conversation

Measuring Two Different Things: People and Trends

It is important to recognize that an instrument measures something aside from its own measurements and to be clear about what that something is. Take the early thermometers. To say that heat was the readings they provided would be absurd. Heat was quite a separate thing, namely, what you felt on your skin on a warm day. As for what the early thermometers measured: before they were perfected, they confounded measuring heat with registering atmospheric pressure. Later, two separate instruments were developed that disentangled the two: mercury thermometers for heat alone, and barometers for atmospheric pressure alone.

My book develops a simple thesis in three parts. First, IQ tests, inclusive of Raven’s and the ten WISC subtests, are instruments of measurement. Second, during the 20th century in America, they have been measuring two distinct things. Comparing individuals at any given time, they have recorded a tendency for a high-IQ person to be superior to the average person on all of these tests, which has led us to say they measure a general intelligence factor called g. Comparing generations over time, they have measured something quite different, namely, various cognitive abilities either remaining stable or being enhanced. Third, the concept of g sheds no light on why these trends have occurred or on their significance.

The third point is simply a matter of fact. The various cognitive abilities measured by different tests or subtests show differing magnitudes of gain that have nothing to do with their excellence as measures of g. They reflect social priorities that have shifted over time. Although we have done a better job of teaching children the mechanics of reading, they have, thanks to a visual culture, no larger vocabularies, and thus the Vocabulary subtests show minimal gains over time. Thanks to the enhanced demand for people who wear scientific spectacles, people are much better today at classifying the world and using logic to analyze it, that is, using logic as a tool that can deal with abstract categories. Therefore, there have been huge gains on the Similarities subtests and Raven’s. The shifting priorities of society do not reflect g loadings because society does not value cognitive abilities in terms of how much a gifted person beats an average person on them. Why should it? Lumber may be a humbler thing than a symphony but more necessary.

Once we stop using g to try to make sense of cognitive trends over time, each trend becomes interesting in its own right. Why we have no larger vocabularies to deal with everyday life is interesting, and why we tend to classify the world rather than merely manipulate it is interesting. A fascinating history emerges. It tells us how our minds have responded to the evolving cognitive demands of industrial society. It is not a matter of some fixed cognitive factor trying to do new things; rather, cognition itself is evolving to meet new demands. Our minds are not like a baseball bat that has remained unchanged over 100 years, with only fast balls to cope with up to 1950 and the curve ball coming along at that date. Our minds are like cars. Today’s cars have evolved beyond the Model-T Ford because we now expect more than a means of transport. We expect cars to go faster and to have a stereo system, a direction finder, and so forth.

There is a certain sense in which g is stable over time. At Time A, high-IQ people are superior to the average person more on one cognitive skill than another, and they beat the average person on all of them. This is the kind of inter-correlation of performances on tests that engenders g. Over time, some skills are enhanced and others are not, quite independently of g. But at Time B, a high-IQ person may still be better than the average person on all skills, and still be better on the various skills in much the same pattern. So the inter-correlation we call g emerges once again. Therefore, absolute changes in average skills over time are quite consistent with the persistence of correlations calculated at any given time. However, we want to know what happened between the two times, and the correlations are not informative.
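A small simulation makes the point. The sketch below gives four subtests gain sizes echoing the WISC figures cited earlier (2, 24, 4, and 27.5 points on an SD-15 scale); because each gain is a constant added to everyone’s score, the within-generation correlations, and hence the extracted g, are identical at Time A and Time B.

```python
# Unequal mean gains between generations leave the correlation structure,
# and therefore g, untouched. Loadings and gains are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
g = rng.normal(size=n)
loadings = np.array([0.8, 0.7, 0.6, 0.5])            # four subtests
time_a = np.outer(g, loadings) + 0.5 * rng.normal(size=(n, 4))

gains = np.array([2.0, 24.0, 4.0, 27.5]) / 15.0      # gains in SD units
time_b = time_a + gains                              # everyone shifts up

r_a = np.corrcoef(time_a, rowvar=False)
r_b = np.corrcoef(time_b, rowvar=False)
print(np.allclose(r_a, r_b))                         # True: same g again
```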

There is a difference between measuring things on an absolute and on a relative scale. Psychometricians posit latent traits that do not exist as functional traits in the real world. People cannot actually “do” g or “do” full-scale IQ, so we can only assign them numbers in a pecking order that ranks them. In real-world functional terms, you can only read, speak, calculate, classify, or do logic. Things in the real world divide into entities we can measure on an absolute scale, a quasi-absolute scale, or a relative scale. But even the last kind of scale can sometimes be translated into a scale of rough absolute judgments about real-world competencies, if its links to those competencies are strong enough.

Time, space, and counting things of the same kind can all be measured absolutely. A ruler has a zero point, and we can use it to measure whether something is one or two or three inches long. We can count whether we have no beans or one or two or three. Measuring climate (as distinct from the temperature of other things) with a thermometer gives a quasi-absolute scale. It has a zero point (absolute zero), but the degrees do not mean exactly the same thing in terms of climate, which has to do with human comfort. The degrees as the weather approaches freezing or boiling hot are more significant than those between 15 and 20 degrees centigrade, but it is easy to make allowance for this, so no harm is done. The WISC subtests and Raven’s can of course be used simply to rank people against one another on a relative scale consisting of deviation “IQs”. But if we set such scores aside for a moment, we will see that getting the items correct is close enough to a prerequisite for real-world competencies that the tests imply a scale of absolute judgments.

We could have an absolute scale for vocabulary by counting the number of words someone can define, from zero up to any number you like. But since our object is to assess how competent someone is in speaking and reading English (in non-specialized speech), it is better to include a sample of the most frequently used words, up to say the 5,000 most common, with less representation of the less frequently used words. The WISC Vocabulary subtest approximates this. We can then make a series of judgments: this person cannot even read the Bobbsey Twins; that one can, but not Hemingway; that one can read Hemingway, but not Huxley. This gives us a scale of absolute judgments running from illiteracy to “can read War and Peace.” The connection between command of vocabulary and these competencies is strong enough to bridge time. Someone with a 500-word vocabulary could no more have read a serious novel in 1900 than they could today.

As for Similarities and Raven’s, we hypothesize a scale of competencies that links their items to the ability to classify (using dichotomies rather than utilitarian likeness) and to use logic to deal with formal symbols. We posit an absolute scale ranging from “this person lacks even the scientific spectacles to do elementary algebra” to “this person can do elementary algebra but not formal logic” to “this person can do formal logic but not tertiary science.” Once again, I posit that the links are strong enough to persist over time. Whether the average person can classify the world only in terms of categories of everyday utilitarian significance, or can also classify it using the categories that underpin modern science, is assumed to have persistent real-world significance.

In contrast, when full-scale IQ gives us a relative ranking of people, the link between their score and real-world competencies is not robust enough to persist over time. It simply lumps together too many things with differing functional significance. We need to know whether the IQ is high because of an unusually large vocabulary or an unusual ability to do three-dimensional jigsaw puzzles. The latent trait called g is equally useless for making absolute judgments over time. Insofar as it ranks people functioning in the real world, it merely tells us that full-scale IQ is a pretty good measure of how much better you are than the average person on a whole range of tasks lumped together. So again, we get no information that would establish strong links between test performance and functional competencies.

We may dramatize score gains by using the scales of the past; that is, we may say that the average person today would be at the 84th percentile of people at some past time in terms of Similarities or Raven’s. But that is unnecessary. All we need do is say that people today are a lot better at one cognitive skill (classifying and detaching logic from the concrete) and only marginally better at another (reading serious novels). We dramatize these trends only to counter anyone who might say they are trivial. But the fact remains that they allow us to derive a rough substitute for absolute measurements. And the fact remains that g is useless in analyzing trends over time, because it takes its very meaning from a pattern of relative rankings and lacks the specificity to transcend that limitation.
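
For readers who want the arithmetic behind the “84th percentile” figure, here is a minimal sketch (mine, not Flynn’s), assuming IQ is normally distributed with a standard deviation of 15:

    # Percentile rank of today's average scorer against an earlier cohort.
    # Assumes normality with SD = 15, the convention on modern IQ tests.
    from math import erf, sqrt

    def percentile_on_old_norms(gain_points, sd=15.0):
        z = gain_points / sd                         # gain in SD units
        return 100 * 0.5 * (1 + erf(z / sqrt(2)))    # standard normal CDF

    print(round(percentile_on_old_norms(15)))   # one-SD gain -> 84th percentile
    print(round(percentile_on_old_norms(24)))   # a Similarities-sized gain -> 95th

Nothing deeper than the normal curve is involved: a one-standard-deviation gain places today’s average scorer at the 84th percentile of the earlier cohort.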

Now to deal with the three commentaries in turn.

Linda Gottfredson

I will stick to the usual language of scholarly discourse because that is the best way forward. This is not difficult, because I have met Gottfredson and both like and respect her. It should now be clear that I commit none of the fallacies she alleges. I do not confuse averages and correlations. The fact that there is an “absolute” gain in average performances over time does not affect the fact that performances at a given time inter-correlate (that some people do well on everything) and thus engender g. I do not confuse absolute and relative scales; rather, I use scales with rough absolute significance. I do not confuse the instrument of measurement with what is being measured. The only one who has done that in this ongoing debate is Jensen (1972, p. 76), when he said that intelligence is what IQ tests measure. Thanks to his high level of intelligence (capacity to learn), he quickly abandoned that position (Jensen, 1979). Absolutely nothing in my case for why g is useless in understanding trends over time depends on such confusions.

That case in no way detracts from the efforts of some psychologists to understand g, its roots in genes and environment, and hypotheses that different full-scale IQs in different nations will causally interact with their economic development. My point is simply that those lines of research will not illuminate the cognitive history of the United States. Gottfredson speculates that while gains over time are not caught by g, they might constitute gains on some subsidiary factor revealed by factor analysis of IQ tests. That is a dead end. IQ trends over time are not factor invariant on ANY of the factors that performance at a given time reveals, that is, not on verbal, spatial, or memory factors (Wicherts et al., 2004). The way forward is not factor analysis of static performance but sociology, which can show us society and its demands evolving in all of their complexity.

It is good that Gottfredson agrees that the theory of intelligence must address the problems posed by massive IQ gains over time. Those who think g theory adequate should come forward with their solutions. The best way to replace a defective theory is by way of a better one. I will bet that whatever emerges will depend on sociology rather than g and, since it addresses a historical problem, will best mine by greater plausibility rather than by hard evidence.

A few other points:

  1. Gottfredson implies that I am sanguine about the success of interventions designed to raise IQ. The opposite is the case, as the Dickens-Flynn model of cognitive development entails. I see nothing wrong with the speculation that those who fall in love with demanding cognitive pursuits will make the most of their potential.
  2. She rejects my attempt to show that nutrition does not account for IQ gains over the last half century in America and that hybrid vigor (the Mingroni hypothesis) does not account for those of the last century. But since she does not rehearse my evidence, she says nothing to refute it. The reader can read Chapter 5 of my book and judge for himself (Flynn, 2007).
  3. She is unimpressed by the Dickens-Flynn model but offers no alternative explanation of the puzzle it attempts to solve: how environment is so feeble in the twin studies but so potent over time.
  4. To understand Blair’s results on the decentralization of brain functions, she should read Blair if my account is insufficient.
  5. Reaction time studies may make absolute assessments of trends over time, but they are theoretically bankrupt because they do not measure brain physiology or neural speed. We could measure whether people can stuff more beans up their nose today on an absolute scale, but that would not advance our understanding of cognition.
  6. Anyone who doubts that the Similarities subtest measures the ability to classify should simply take the test and look at the scoring directions. If anyone doubts that Raven’s measures the ability to use logic to make sense of the sequences of shapes, they should do the same.

Eric Turkheimer

He takes issue with only one assertion in my book. I believe that g may well signal something “real” when we compare the cognitive performance of one person to another at a given time. Beyond dispute is that some people do better than the average person at a whole range of cognitive tasks. That poses three distinct questions.

In the realm of individual differences, does g have predictive validity? That is, if someone gets a high full-scale IQ on a test with cognitively complex tasks, can we use that score to predict their fate? Turkheimer does not dispute this. Your full-scale WISC IQ gives significant predictions as to whether you will get good grades or qualify for an elite profession. As I point out, that does not mean it cannot be improved on. Sternberg has gotten better predictions by supplementing conventional IQ tests with more creative tasks, such as writing an essay on the octopus’s sneakers. Jim Heckman has shown that non-cognitive measures of social skills, self-control, and the like are equally powerful predictors.

Does g show that different real-world cognitive tasks have something functional in common? I argue that it does not. The same kind of person may do well at the high jump and at sprints. But increasing your sprint speed will not improve your high jump performance. Similarly, the same person may do well on Raven’s and on the Arithmetic subtest of the WISC. But enhancing your ability to solve matrix problems will not improve your ability to do arithmetic. The same people are good at both, but the two have little in common functionally.

Does the fact that some people do better than others on a variety of cognitive tasks have a cause in brain physiology? Turkheimer shows that the fact that g emerges from factor analysis, together with what we know about g’s correlates (that it is heritable and so forth), leaves this an open question. I agree, and did not explore this in my book because I felt I had quite enough new things to sell. But my suspicion is that progress in brain physiology may show that certain individual differences underlie generally high performance. Some people best me at all sports thanks to a better cardiovascular system and a faster reflex arc. Some people may beat me at all cognitive skills because of a better blood supply to the brain and better dopamine sprayers (dopamine strengthens whatever neural connections are in use when we learn things).

Stephen Ceci

Steve Ceci has done me the service of making an important point that is in danger of getting lost. Unless massive IQ gains over time are dismissed as mere test sophistication, and there is conclusive evidence that they are not, their mere existence demands a re-evaluation of theories of intelligence based on g. The g factor was supposed to be so robust as to bridge even differences between species (Jensen, 1980). Some explanation must be offered as to how the malleability of IQ and the persistence of g are compatible. Talk about one being an instrument and the other being the thing measured is just a way of saying “something has gone wrong, but I know not what.” My theory offers a solution, but at the price of confining the relevance of g to individual differences and rendering it irrelevant to cognitive history. Once again, I will wager that any better solution will have the same effect.

References

Flynn, J.R. (2007). What is intelligence? Beyond the Flynn effect. Cambridge: Cambridge University Press.

Jensen, A.R. (1972). Genetics and education. London: Methuen.

Jensen, A.R. (1979). The nature of intelligence and its relation to learning. Journal of Research and Development in Education, 12: 79-85.

Jensen, A.R. (1980). Bias in mental testing. London: Methuen.

Wicherts, J.M., Dolan, C.V., Hessen, D.J., Oosterveld, P., van Baal, G.C.N., Boomsma, D.I., and Span, M.M. (2004). Are intelligence tests measurement invariant over time? Investigating the Flynn effect. Intelligence, 32: 509-538.

The Black-White IQ Gap

We aren’t the only ones discussing IQ right now. Over at Slate, William Saletan argues that the black-white IQ gap should not be a taboo subject and should be judged on the evidence. In this, he is entirely correct, and I will be debating this issue with Charles Murray at the Manhattan Institute in New York City on the last Wednesday in November. However, he presents (lucidly) only the evidence for a genetic component. In my recent book, What is Intelligence? (see amazon.com), I note that blacks have cut the gap by one-third over the last 30 years. I also note that black IQ (with white set at 100) declines with age: from parity in infancy, to a deficit of a few points at age 4, to a deficit of almost 17 points at age 24. In New York, I will show that we can give good environmental reasons for that decline with age.

The Chilling Effect of IQ Taboos

William Saletan aptly summarizes the research on the genetic and environmental sources of variability in IQ scores. It’s refreshing to read someone so balanced on a topic where no one feels neutral and no one is dispassionate. Having said this, one can quibble with his take on some of the evidence. For example, the claim that the average sub-Saharan IQ is only 70 is based on two reviews that have been criticized. Everything in this field gets criticized, of course, but this particular criticism is quite compelling. I think the true sub-Saharan IQ is much closer to 85, and one can begin to see that this is not a bad IQ score for children living in austere rural conditions, with poor or nonexistent schools, heavy parasite infestation, and malnutrition. I am not blaming Saletan, because his synopsis accurately reflects what those two reviews claim and what many intelligence researchers have parroted. I am pointing out that the reviews are wrong. My best guess is that the black-white IQ gap in the U.S. has been converging by around 3-4 points, and that African IQs will show much larger gains as sub-Saharan countries experience modernization and the things that go with it (e.g., higher quality mandatory schooling).

James Watson may be the most illustrious scholar to have his career ended because of his reckless language on this topic, but he is far from alone: proponents of racial differences in underlying intelligence have been physically threatened, censured, and in a number of cases had their appointments revoked (for examples, see Gottfredson, 2005; Ellis, 2001; Hunt, 1999). Watson opined that “all our policies are based on the fact that their [African] intelligence is the same as ours whereas all the testing says not really,” going on to state that although he hoped that everybody was equal, “people who had to deal with black employees find this is not true.” He instantly plummeted from A-list Nobelist to outcast, removed from his post as Chancellor of Cold Spring Harbor Laboratory. In the aftermath he was roundly excoriated in the media as well as in the corridors of the academy.

Saletan defends Watson on the grounds that the research supports his claim. I have a very different worry: namely, that the fear of being labeled racist, the enmity of colleagues and students, and occasional reprimands that threaten advancement or even tenure can be sufficient to muzzle those who do not accept racial equivalence in intelligence. As Gottfredson (2005) remarked:

… the farther one goes into forbidden territory, the more numerous and more severe the sanctions become: first the looks of disapproval and occasional accusations of racism, then greater difficulty getting promoted, funding, or papers published, and eventually being shunned, persecuted, or fired. (p. 159)

I personally believe that IQ scores do not reflect the range of cognitive complexity captured by the label “intelligence”, and that any racial differences in IQ scores are due to differing ecological challenges faced by white and black Americans, not to genetic or other biological differences. I also believe that current racial gaps in IQ scores can close over time (and have been closing over time).

I am fortunate that these are my views, because they are politically correct and garner me praise, speaking and writing invitations, and book adoptions, at the same time that those who disagree with me are demeaned, ostracized, and in some cases threatened with tenure revocation, even though their science is as reasonable as mine. Don’t get me wrong: I think their positions are incorrect, and I have relished aiming my pen at what I regard to be their leaps of logic and flawed analysis. But they believe just as deeply that I am wrong. The problem is that I can tell my side far more easily than they are permitted to tell theirs, through invitations to speak at meetings, to contribute chapters and articles, and so on. This offends my sense of fairness and cannot be good for science. I think Saletan would agree with me on this.

References

Ellis, F. (2001). Race and IQ. In Derek Jones (ed.), Censorship: A World Encyclopedia, vol. 3, pp. 2008-2010. Fitzroy Dearborn Publishers.

Gottfredson, L. S. (2005). Suppressing intelligence research. In R. H. Wright & N. A. Cummings (eds.), Destructive trends in mental health: The well-intentioned path to harm, pp. 155-86. Routledge.

Hunt, M. (1999). The new know-nothings: The political foes of the scientific study of human nature. Transaction Press.

Race and IQ

The Theory of Innate Differences

Ceci and Flynn, while expressing their skepticism about the possibility of genetic differences between the races for IQ, agree that the question is a legitimate matter for scientific inquiry, to be settled by cool-headed evaluation of the empirical evidence. I disagree. I contend that:

  1. The important questions about the role of genetics in the explanation of racial differences in ability are not empirical, but theoretical and philosophical, and,
  2. When the theoretical questions are properly understood, proponents of race science, while entitled to their freedom of inquiry and expression, deserve the vigorous disapprobation they often receive.

To understand the theoretical conundrums underlying this race question, the hypothesis must be stated plainly: genetic differences between groups of people produce innate differences in their cognitive abilities. The key is the word “innate.” What exactly it means for a characteristic to be innate is one of the great questions in the philosophy of biology, and like most great philosophical questions it gets deeper and more puzzling the longer you think about it. Our initial intuition is that something innate just happens as a matter of biological development, regardless of environmental inputs. Dogs have an innate tendency to bark; ducks, to quack.

But the intuitive view turns out to be incoherent on more than superficial examination. A point of view that is sometimes called developmentalism points out that absolutely no aspect of biology or genetics comes into being automatically, without rich interaction with the environment. Ducks raised in the complete absence of auditory input from other ducks don’t quack, and in general organisms raised in the absence of environmental inputs don’t do anything at all. So the difference between learning to play the oboe and learning to walk is not that the former requires environmental input while the latter, being innate, does not. They both emerge from a complex interplay of genetics and environment, and thinking of walking as innate is a distraction from the real scientific question of how the extraordinarily complex process actually comes about. Once you start to think this way, it gets difficult to say that any difference between two organisms is innate. The contention about Africans and IQ has to be that their genetic makeup is such that they will be lower than other races in IQ not only in the current environment, but in all imaginable alternative environments. And how could we possibly know that?

The developmentalist argument, however, is susceptible to a reductio ad absurdum. If taken too literally, it could lead you to conclude that there is nothing to be said about biological differences between organisms. You couldn’t say that people are more intelligent than turtles, or that Africans have darker skin than Scandinavians, because it is always possible in principle to imagine an environment where that isn’t the case. And there is a difference between learning to play the oboe and learning to walk: walking may depend in complex ways on environmental input, but it nevertheless develops in a very wide range of environments, whereas oboe playing requires a very specific environment to develop. Geneticists have come up with the concept of a reaction norm to describe the range of environments in which a genetic trait might develop. Under this view, characteristics of organisms are not either innate or learned: they vary in the width of the reaction norm describing the kind of environmental inputs they require. The race and IQ question then comes down to whether African IQ deficits are like dark African skin, so pervasive across all imaginable environments that calling them innate is perfectly reasonable as a first approximation, or more like African-American success in popular music, which we require no scientific evidence to attribute to the particular combination of history, culture, and sociology of the present time. We are justifiably offended by a hypothesis involving, say, an innate gift of rhythm.

Why Race Science is Objectionable

If I may address my fellow Jews for a moment, consider this. How would you feel about a line of research into the question of whether Jews have a genetic tendency to be more concerned with money than other groups? Nothing anti-Semitic, mind you, just a rational investigation of the scientific evidence. It wouldn’t be difficult to measure interest in money and materialism, and it wouldn’t surprise me if, as an empirical matter, Jews scored a little higher on the resulting test than other groups. As a behavioral geneticist I can assure you without reservation that the trait would be heritable and, if anyone bothered to take the time to find out, that specific genes would have small associations with it. Of course, this research program has already been carried out, at least to the extent the relevant technology was available in 1939. While we are at it, we could open a whole institute for the scientific study of racial stereotypes, and finally pull together the evidence on sneaky Japanese, drunken Irish, unintelligent Poles, overemotional women, and lazy Italians.

Hopefully I am beginning to offend you. Why? Why don’t we accept racial stereotypes as reasonable hypotheses, okay to consider until they have been scientifically proven false? They are offensive precisely because they violate our intuition about the balance between innateness and self-determination of the moral and cultural qualities of human beings. No reasonable person would be offended by the observation that African people have curlier hair than the Chinese, notwithstanding the possibility of some future environment in which it is no longer true. But we can recognize a contention that Chinese people are genetically predisposed to be better table tennis players than Africans as silly, and the contention that they are smarter than Africans as ugly, because it is a matter of ethical principle that individual and cultural accomplishment is not tied to the genes in the same way as the appearance of our hair.

If the question of African IQ is a matter of empirical science, exactly what piece of evidence are we waiting for? What would finally convince the racialists that they are wrong? Nothing, it seems to me, except the arrival of the day when the IQ gap disappears, and that is going to take a while. The history of Africans in the modern West is roughly as follows: millennia of minding their own business in Africa, followed by 200 years of enslavement by a foreign civilization, followed by 100 years of Jim Crow oppression, followed by 50 years of very incomplete equality and freedom. And now the scientific establishment, apparently even the progressive scientific establishment, is impatient enough with Africans’ social development that it seems reasonable to ask whether the problem lies in the genes of the descendants of our former slaves. If that isn’t offensive I don’t know what is.

Having said that, I should add that I believe absolutely in freedom of expression and inquiry everywhere, but especially in academia. The racialists are entitled to their tenure and their speaking engagements and their promotions, but they are not entitled to my encouragement or respect.

I will close with a word on Watson. He is not really a racial scientist to any significant degree; he just expressed a point of view that I think is false and destructive. No one deserves to be punished for expressing a point of view, but there is another consideration here. Watson is a legitimately respected and famous person on the basis of his great scientific accomplishments and the awards they have won for him, but those accomplishments don’t have very much to do with racial differences in intelligence, except that both domains involve the concept of “genes” in a very general way. It is safe to say that he does not know anything more about the subject than anyone writing here. He is, of course, still entitled to his opinion, but famous scientists and intellectuals have some responsibility not to use their fame in the service of dangerous ideas that are ultimately outside their real expertise. Watson got in trouble for casually stating poorly informed opinions about a deeply serious subject. He is still the great scientist he always was, and I admired the apparent sincerity of his apology, but he deserved most of the criticism he got.

Rational Discussion of the Offensive is Okay

Reviewing the contributions of Ceci and Turkheimer, I still agree with Ceci, for the following reasons:

(1) What is offensive should not be discouraged as a subject for rational discussion. I do not doubt that discussion of group differences is offensive, particularly to groups about whom the hypotheses reflect unfavorable stereotypes. Until recently, anything that contradicted scripture was offensive: “do you mean to say these people were lying about Christ?”

(2) Scientists have no veto power over what most people say about groups. Group stereotypes existed in simplistic form prior to the rise of science. Does anyone believe they would be less vicious today if Turkheimer’s view had always inhibited rational discussion of the abilities of different races? What about his discussion of the fallacies of talking about “innate”? Was that in bad taste? He may call it a philosophical point but he brings a galaxy of scientific findings to bear to debunk it.

(3) Or are we just to discourage discussions about race that various groups dislike rather than like? As for my group, Irish-Americans, I welcome a no-holds-barred discussion of the roots of our high rates of alcoholism.

(4) When we do not like a discussion, there is always a temptation to raise the bar as to what would count as evidence to a height no social science can meet. Turkheimer should not ask what would convince the race scientists they were wrong, rather he should ask what would convince good scientists that their hypotheses were wrong. His question introduces an ad hominem element in that it implies that his opponents are biased in some way that a good scientist is not. Otherwise there is no reason to single them out as particularly recalcitrant.

(5) I have little doubt as to what would make me conclude with finality that they were wrong (as distinct from my present position that they are probably wrong): evidence of the sort we got from Eyferth’s study of the black and white children of occupation troops in Germany, but drawn from a number of nations, with larger samples, ages going to maturity, and fewer ambiguities.

(6) Is it not legitimate to ask exactly what would convince people like me and Turkheimer that we were wrong? If we are to ask this of others, a prerequisite would be to first answer it ourselves. I have done so by implication (negative results from a series of studies like Eyferth’s but better), but I can imagine data from progressive knowledge of the human genome. Can Turkheimer not imagine evidence that would settle the issue for any fair-minded person? And if not, what a peculiar exception to all the questions posed by the sciences of man.

Arthur Jensen and John Stuart Mill

Before we abandon the topic of full discussion of race differences, everyone should read Mill’s classic, On Liberty. Mill warns us that governmental censorship is often not the main threat; the force of public opinion is. He reviews the historical record and shows that all past restrictions on open debate were counterproductive, which raises the question: do we live in the one time that is an exception to the rule?

Mill also emphasizes the positive value of a challenge to even true beliefs. Without this, they become stale Sunday truths without vitality because no one is practiced in defending them. If there is any belief that had become a Sunday truth among U.S. social scientists by 1960, it was that all groups were equal on average in cognitive ability. Everyone took it for granted, just as a conventional Christian went unthinkingly to church to practice the faith of his parents.

Imagine that Arthur Jensen had been persuaded not to publish his 1969 article on race. Not by law, but by subservience to the public opinion of his colleagues. I would never have made any contribution to psychology. It was arguing against Jensen that led me to investigate IQ trends over time, led me to formulate a more sophisticated defense of affirmative action, and led me to my book What Is Intelligence?, which (I hope) has done much to unfreeze our thinking about g, or the general intelligence factor.

It was arguing against Jensen that led Bill Dickens step by step towards the Dickens-Flynn model, which in turn has inspired Maas’ new model. Would Eric Turkheimer have ever put forward his wonderfully acute analysis of g without the flow of ideas that originated with Jensen’s article? If there is any debate that illustrates Mill’s point, it is the Jensen debate.

Reality is like a ball of twine: when debate pulls one thread loose, it begins to unravel and much is understood. Full debate is far more productive of truth than a prize fight in which force (law) legislates “truth”. It is also far better than a shouting match in which someone is intimidated by disapproval, by difficulty in finding a journal in which to publish, or by fears about promotion.

It may be said that truth is not the only value. A colleague once told me that all that has been learned in the wake of Jensen was not worth hundreds of acts of discrimination, those that may have occurred from naive racists feeding on his views. Rousseau said the same thing about those who argue for atheism: is this single truth worth the danger that thousands of people of simple faith will become immoral rather than acting responsibly?

I side with Mill: if only one person held a view, however repugnant or wrongheaded, and all mankind were united against him, he would have as much right to free speech as they. And no one should discourage the exercise of that right.

What Is the Alternative to Civil Discourse?

Our field is riddled with theoretical conundrums, but that is neither unique to our field nor does it imply that progress must await a total closing of the racial IQ gap. If research were to demonstrate that, for example, the racial gap in IQ has narrowed by .25 SD in the past few decades, and that such narrowing coincided with hypothesized causal events (e.g., more per-pupil spending in minority schools, a reduction in single-parent families, etc.), then this would be very suggestive of a nongenetic basis for the racial IQ gap. (And Flynn’s own research does show a racial gap closing of this magnitude since the 1970s.) Are we better off if race scientists never publicly made their arguments about the genetic basis of racial IQ gaps but kept their beliefs to themselves, perhaps sharing them with private audiences, thus never provoking disconfirmatory evidence? I think not.

Turkheimer states that race scientists “while entitled to their freedom of inquiry and expression, deserve the vigorous disapprobation they often receive.” Perhaps it is this phrase “vigorous disapprobation” that is hanging me up, for I know Turkheimer’s views on the substantive issues and my own are compatible. Suppose it means that race scientists deserve vigorous condemnation or disgrace for exercising their freedom to express their ideas. If this is the intended meaning, then it is hardly worth saying they have freedom at all, if the cost is personal disgrace. We can easily think of analogies showing that one is not truly free if the exercise of a freedom carries with it condemnation or disgrace. Does Turkheimer really mean that race scientists should be disgraced for expressing their views? Isn’t it enough that their views be refuted? Why must they be condemned?

A similar argument can be made regarding “gender scientists,” who argue that the underrepresentation of women at the extreme right tail of the mathematics distribution (roughly 5-to-1 in favor of males) is the result not of cultural factors such as stereotype threat and socialization but of biological differences between males and females—brain structure and volumetric capacity, both organizing and subsequent hormones, etc., perhaps the result of evolutionary pressures that favored males for spatial rotation ability. Those of us who have responded to such arguments have done so civilly, on a point-by-point basis, providing what we deem to be reasonably compelling refutations of some, albeit not yet all, of the gender scientists’ assertions. (Interestingly, some of the same race scientists are also key gender scientists, e.g., Rushton, Ankney, Lynn; and some of the rivals are the same rivals, e.g., Flynn, Steele, myself.) My female coauthors and I can understand why some women might feel that such assertions are inappropriate and deserve vigorous responses, but no one claims the gender scientists themselves merit opprobrium, at least not as I understand the term. To take that position is to tiptoe toward a one-party scientific platform. I know Turkheimer would not want this enforced orthodoxy any more than Flynn and I do. We should let diverse views be expressed, for that is the best way to challenge our positions and grow a body of knowledge.

One can always maintain that the knowledge we strive to grow is unknowable in the near term, or that we already know it and the race scientists are absolutely and irredeemably wrong, so wrong that it smacks of a moral flaw in their character, thus warranting disgrace. None of us ought to have the hubris to imagine we have settled these issues definitively. We haven’t. Flynn and others have aimed their logic and data at the race scientists in debates and writings, and there is some evidence that they are making good progress. If race scientists could not publish their claims without incurring personal disgrace, those claims would have stayed private, resting on assumptions immune to refutation, and the progress that Flynn and others have made would have been impeded. Their assumptions were effectively refuted precisely because they were allowed to publish them in the first place without fear of being disgraced or condemned. It is telling that many of the strongest detractors of Flynn’s claims have come around to agreeing with him, at least on some points (see the names of the people who blurbed his latest book: they include many former rivals who, although still at odds with him on some issues, have seemingly backed off a number of claims he has refuted. This is progress!)

Turkheimer raises many forms of stereotypical claims that all of us have recognized since childhood and none of us find compelling. Like Flynn, I welcome the opportunity to tackle such claims head on, raising counter-evidence. This is superior to having stereotypical beliefs closeted, immune to disconfirmation. Yes, they can be hurtful, and surely it is possible to calculate a heritability for eating pasta, for example, or for being lazy, or for getting divorced. There are real costs to freedom of expression, and we should insist that it always be exercised in the most respectful manner. The alternative is to muzzle scholars whose beliefs we find distasteful. As Saletan notes, depending on your stance, the beliefs in question can be those of race scientists, evolutionary theorists, etc. I’d hate to think of a world where those in charge had the hubris to think they had validated creationism, and critics arguing for evolution were free to express their beliefs only so long as they endured personal disgrace for doing so. For me, that is what it comes down to: engaging in civil discourse with my rivals and expecting civility in return. I have not always gotten the latter, and so I know how hurtful that can be. But at least I have not been made to endure disgrace or condemnation for my beliefs, replete with attributions about my moral flaws. Watson engaged in reckless language, as I noted in my posting. But his recklessness lay in the second part of his statement, where he lapsed into undocumented hearsay and impressions, not in the evidentiary claim in the first part, to the effect that there was research showing racial gaps in IQ. Research exists to support that claim; it is published in journals and in a book, and it can be refuted (and is being refuted).

I return to my position: it is better to have exchanges such as this Cato Unbound, between opposing beliefs, in public, with both sides accorded access to journals and conferences, than to have only one side present its story. Today that one side is the one that Turkheimer, Flynn, and I are on (the non-genetic basis of the racial gap in IQ), but tomorrow it could be another side, and I’d want to be able to argue my case without being humiliated and made the object of personal attributions.

Flynn, Ceci, and Turkheimer on Race and Intelligence: Opening Moves

William Saletan may be the first journalist to so directly acknowledge the scientific evidence suggesting a non-trivial genetic basis for racial differences in IQ (hereafter, the partly-genetic hypothesis) and to be allowed to publish his views. Publication of his 3-part article in Slate (Nov. 18-20, 2007) represents a stunning departure from usual journalistic practice, which is to enforce the taboo against voicing such thoughts. But Saletan does something more important. He urges us to look the dreaded possibility straight in the eye to see whether it is really as fearsome as we imagine. He thinks not, and I agree.

Flynn, Ceci, and Turkheimer comment on both issues raised by Saletan: How strong is the scientific evidence favoring a partly-genetic hypothesis? And is it wise to speak openly and honestly about it? All three reject Saletan’s judgment favoring the genetic hypothesis, though with varying degrees of finality. On the second question, Turkheimer opposes “cool-headed evaluation of the empirical evidence” because the hypothesis is, in his view, “offensive” and “dangerous.” Flynn and Ceci side with Saletan in allowing such evaluation, but Flynn echoes Turkheimer’s moral distaste when he labels as “offensive” the IQ-based scholarship with which all three disagree.

The series of posts by these three authors illustrates, in microcosm, the melange of criticism commonly marshaled against mainstream science on intelligence in order to seem to discredit it without actually engaging its large interlocking body of evidence. Indeed, the criticism succeeds precisely by avoiding such engagement. There are two general strategies for avoiding the totality of relevant evidence: (1) create doubt about some small portion of it as if that isolated doubt nullified the totality of evidence, and (2) put unwelcome evidence off-limits by labeling it immoral or ill-motivated.

What Is the Debate About — Phenotypic or Genotypic Differences in Intelligence?

IQ tests measure only phenotypic (developed) differences in general intelligence (g), not what causes them. Ability differences among groups can be real without necessarily being genetic, either in whole or part. When scientists speak of race or sex differences, this is short-hand for average differences between the groups—just as when we refer to men being taller than women or Americans being fatter today than decades back. There is no implication that the observed differences are caused by genetic rather than non-genetic differences between the groups.

Public commentary on black-white IQ differences generally confuses these two questions: whether blacks and whites differ in average developed intelligence level, and, if so, why. The first question has been scientifically settled for decades. As a task force of the American Psychological Association wrote in its flagship journal in 1996, “The differential between the mean intelligence test scores of [Western] Blacks and Whites … does not result from any obvious biases in test construction and administration” (Neisser et al., 1996). The gap in abilities is real, and it has real-world, practical consequences for the individuals and societies involved (Gottfredson, 1997). Racial-ethnic differences in phenotypic intelligence are the rule, not the exception, worldwide.

In contrast, the “why” question—are these differences partly genetic?—remains a matter of much scientific debate. For instance, despite agreeing that the American black-white IQ gap reflects a gap in general learning and reasoning abilities, intelligence researchers differ on the extent to which the average difference is genetic in origin. Some have written that current evidence makes the black-white gap more plausibly 80% genetic than 0% genetic and others that 50% genetic is more plausible than 0% genetic, and yet others have argued that the available evidence is equally consistent with 0% genetic or is not yet sufficient to venture an opinion. I suspect that most experts on the topic now believe that the gap is at least somewhat genetic, because that was the plurality judgment when solicited confidentially twenty years ago (Snyderman & Rothman, 1988). My view is that the current weight of evidence favors a 50-80 percent rather than a zero percent genetic component (Gottfredson, 2005b).

Gambits for Seeming to Discredit the Genetic Hypothesis Without Actually Doing So

The question Saletan raises is thus whether the part-genetic hypothesis explains average racial differences in phenotypic intelligence better than does a no-genetic hypothesis. Flynn, Turkheimer, and Ceci seem to favor the latter. Their arguments for it, to the extent they provide any, rest on creating doubt about the evidence for g and its relation to race.

The part-genetic hypothesis becomes plausible only if intelligence differences among members of the same race are demonstrated to be real, important, and at least partly genetic. It gains further plausibility when the difference in average intelligence between two races is also real, important, and stubborn. The longer and stronger this chain of evidence for both blacks and whites, the more scientifically plausible a genetic component to their average IQ difference becomes.

Each link in the chain of evidence has been vigorously tested many times because all were once controversial hypotheses among the scientists testing them. Often contrary to the researchers’ initial expectations, each has proven robust. That is why many critics of the genetic hypothesis will concede one or more of the links, at least implicitly, but then assert that crucial other links have been broken.

Table 1 lists a series of seven “yes-but” gambits for such purposes. The first five deny the validity of basic facts about intelligence: the existence of intelligence (g), its fair measurement, its practical importance, its stability (lack of malleability), and its high heritability. None of these facts relates to race per se. The last two gambits concede all five prior facts but assert that racial differences in IQ are not genetic or must not be thought so. The first gambit, Non-Existence, is to concede racial differences in scores on IQ tests but then to assert that IQ tests do not measure intelligence because, for instance, “intelligence” is only that which a particular culture chooses to value. In contrast, the Mismeasured (Test-Bias) rejoinder concedes what the Non-Existence rejoinder disputes (“Yes, IQ tests do measure intelligence”) but then targets a different link in the chain of plausibility (“but they produce falsely low readings for blacks”). Both rejoinders, however, urge us to abolish IQ tests or question testers’ motives rather than take the IQ gaps seriously.

Table 1. “Yes But” Rejoinders Used to Ignore Scientific Findings of Intelligence and Race


Flynn, Turkheimer, and Ceci make mutually inconsistent claims about which well-established facts we should doubt. Turkheimer deploys primarily the Non-Existence and Unthinkable gambits. Ceci rejects both and relies primarily on Malleability. Flynn also deploys Malleability, but has it both ways on heritability (secular changes, but not individual differences, are Just Environmental) and on the Non-Existence of g (he “shatters” it across generations but not within them), hints at Unimportance, and explicitly rejects Unthinkable while providing reasons to accept it.

The “but” segments of the “yes-but” rejoinders in Table 1 have all been plausible scientific hypotheses at one time or another—for instance, that IQ tests are biased against non-whites or that intelligence is highly malleable. That is why there has been so much research on them over the last half century. A once-plausible “but” becomes a misleading gambit, however, when it is falsely asserted to be proved or to enjoy a scientific consensus, especially once the weight of high-quality evidence has clearly tipped in the opposite direction. The three commentators sometimes accomplish this by simply asserting counterfactual claims as “clearly” true.

As the center of scientific inquiry has moved along the chain of plausibility, from newly-confirmed links to still-contested ones, those who would quash the part-genetic hypothesis have shifted their “yes-but” rejoinders accordingly. Where once they could seem to discredit the reports of racial differences in intelligence simply by asserting that intelligence (or fair measurement or genetic influence within race) does not exist, they now frequently assert that the existence of biological races has been disproved. If race is only a social construct, without any biological component, then any genetic component to average racial differences in intelligence is ruled out. Readers are thus invited to ignore or disparage all evidence in the entire chain of plausibility. Our commentators do not comment on the biological reality of race, perhaps because they all deny one or more prior links in the chain of plausibility.

Each “yes-but” gambit contradicts prior ones in the list but enlarges the menu of options for impugning the science. Their mutual inconsistency is muted by the “yes” concessions usually being left implicit. While a “but” criticism can draw public accolades, a “yes” to bedrock facts can draw fire. So, while our commentators sometimes disagree sharply about which links are weak, they join in pronouncing the chain broken.

Those making “but” statements generally justify doing so by providing an evidentiary “because.” The last column of Table 1 exemplifies such appeals to seeming evidence. Many are themselves once-plausible hypotheses, now disproved; others are straw men or emotional appeals designed to arouse fear and disgust. What catches public attention, however, is the great variety of ways in which the science seems to be rendered deficient, or worse.

The trump card is the Unthinkable gambit. It gains popularity when the empirical evidence starts to seem incontestable. This gambit moves the partly-genetic hypothesis out of the scientific realm and into the moral arena. There, we need not consider whether the hypothesis is true or false (that is, supported or not by the weight of evidence) but whether accepting it makes one good or evil (or, at least, feel good rather than guilty). Scientific truth is no defense in the moral realm and, indeed, an unwelcome idea may be attacked all the more fiercely for possibly having truth on its side. By this code, moral duty requires suppressing and censuring evil speech and mandating good speech, whether true or false. Turkheimer plays this card openly.

Critics sometimes apply moral standards in the guise of scientific rigor, for example, when they demand that unwelcome conclusions be proved beyond all possible doubt. Besides falsely suggesting that moral blame is the question, this demand shifts the standard for adjudicating competing claims in science from the preponderance of the evidence (which side made the best factual case?) to burdening the unwelcome side with impossible standards of proof, as Flynn himself notes in one post. The favored side is presumed true until the disfavored side answers an infinite regress of manufactured doubt. All our commentators explicitly reject that double standard, but they all enforce it implicitly by manufacturing and celebrating doubt about some facts while ignoring the fuller, interleaved network of evidence.

In the court of moral judgment, defending the scientific validity of an unwelcome conclusion amounts to confessing moral guilt. Scientific strengths are rendered moral flaws (objectivity becomes insensitivity; integrity under fire, a taste for controversy). The innocuous is made suspicious (dedication is depicted as a fetish, collaboration as cabal). Scientific conclusions on what “is” (racial differences in IQ) are construed as personal preferences for what “ought to be.” Appeals to academic freedom are derogated as self-serving attempts to escape accountability or well-deserved opprobrium. Thus, while all our commentators defend rights to academic freedom and free speech, they impugn the characters of opponents to make them seem less credible.

Let us turn now to the specifics in our three commentators’ recent posts.

Avoiding the Totality of Evidence on g

Ceci is the most candid because he concedes there is “impressive data” for the first six bedrock facts in the chain of plausibility in Table 1. They “seem to accord with each prong of the [following] syllogism:”

  • An underlying ability (called g) is needed for all forms of cognitive performance
  • g is manifest in any broad cognitive battery such as IQ
  • g is related to many types of biological markers and is highly heritable
  • Large individual and group differences exist in g
  • Variation in g predicts differential life outcomes
  • Therefore, variation in life outcomes is at least partly rooted in biological differences in g

He is also candid about wishing to reject the syllogism’s conclusion that “g reflects an underlying biologically-based and stable intellectual ability, rather than a specific skill learned in school or taught by parents.” This is why he says “all of us owe Flynn a deep debt of gratitude for complicating what had started to seem like a closed case”—for “throwing a spanner” of doubt. Ceci has pinpointed here why Flynn is feted worldwide by the critics of mainstream research on intelligence: not for disproving any part of the syllogism, and not for solving the mystery of why certain scores on certain IQ subtests have risen (it remains unsolved), but for cultivating doubt about the syllogism.

In fact, the syllogism has already been empirically verified. Level of education, occupation, and income are themselves moderately heritable (60-70%, 50%, and 40-50%, respectively), and these heritabilities overlap that for g by at least half (yielding 40%, 25%, and 20% of the phenotypic variation in the three life outcomes being jointly heritable with g; e.g., Rowe, Vesterdal, & Rodgers, 1998).

Ceci implies that Flynn has rendered Table 1’s bedrock facts 5 and 6 (demonstrated lack of malleability, high heritability) irrelevant when he says Flynn “has shown beyond doubt that general intelligence fluctuates systematically over time and this cannot be due to our having better genes than our grandparents.” Flynn has shown neither, of course. There is no evidence that g itself has risen much over time, if at all. Nor is there any evidence that rules out genetic change (e.g., heterosis) or that rules in environmental influences of any particular kind, if any at all, for the observed increases on certain subtests, the limited nature of which suggests that only a narrow band of skills has been affected.

For Ceci, Flynn provides a life-line of doubt to escape the decades of evidence showing that intelligence (g) is not malleable. Now freed to search for levers of malleability, Ceci proposes education. He acknowledges that the patterns of change in IQ subtest scores are opposite of what one would expect if school instruction were the cause, so he suggests that there is “clearly” something “indirect” about having more schooling that enhances intelligence. He also notes, however, that some children show greater cognitive growth in a year of schooling than do others. He interprets this, not as expected differences in children’s maturation rates, but as evidence that “clearly” the “everyday experiences that some individuals engage in” improve intelligence. For empirical support he must turn away from the large body of contrary evidence in the US from standard educational and work settings and toward bits of research from odd corners of the globe (rural Kharwar Indian and Brazilian children) or forms of “natural stimulation” during childhood (herding, running errands, playing videogames or Legos).

Turkheimer selects a different target for scientific doubt: g itself (bedrock fact 1). “Practically everything [is] wrong with the concept of g.” He chides Flynn for conceding too much to me by accepting g as an empirical phenomenon. His strategy is to assign an “intuition” to g’s advocates which he then demolishes with a thought experiment. The experiment is that he can construct a universe in which abilities are independent. He shows us that factor analysis will extract a g factor nonetheless. His conclusion is that g is therefore an “illusion.” His other-worldly demonstration does nothing, however, to invalidate the independent bodies of evidence showing that this “illusion” exists beyond factor analysis as a biological (Jensen, 1998) and cross-species phenomenon (Chabris, 2007), and that it has more predictive substance in the real world than does any other cognitive ability. In any case, Humphreys (1986) and others put his hypothesis to the test half a century ago when they earnestly tried, but failed, to build useful ability tests that did not also measure general intelligence.
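
Gottfredson’s description of the thought experiment can be made concrete. The sketch below is my reconstruction, not Turkheimer’s actual demonstration: latent abilities are generated independently, each test samples several of them with positive weights, and the first factor of the resulting correlation matrix looks like g anyway:

    # Reconstruction (not Turkheimer's code) of the thought experiment:
    # independent abilities, tests that each tap several of them, and a
    # g-like first factor emerges from the correlations regardless.
    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_abilities, n_tests = 5000, 20, 10

    abilities = rng.standard_normal((n_people, n_abilities))   # independent by construction
    weights = rng.uniform(0, 1, (n_abilities, n_tests))        # each test draws on many abilities
    scores = abilities @ weights + rng.standard_normal((n_people, n_tests))

    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    loadings = np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])   # first-factor loadings

    print(corr.min() > 0)       # True: positive manifold among all tests
    print(loadings.round(2))    # every test loads substantially on the first factor

Whether this shows g to be an “illusion,” or only that factor analysis by itself cannot settle the question, is precisely what Gottfredson and Turkheimer dispute.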

Flynn seems less concerned with breaking any particular link in the empirical chain of evidence than with simply declaring the most troublesome ones irrelevant. The most troublesome are g and its high heritability, which constitute the core of the syllogism that Ceci would like to see nullified. So, while Flynn does not deny the existence of g or its high heritability, he attempts to “shatter” it with secular changes in IQ. Explaining the secular increases seems secondary, except that the proffered explanations must emphasize the cultural malleability of “intelligence.”

He accomplishes this “shattering” by reversing perhaps the most important advance in intelligence measurement of the 20th Century, which was to separate the latent trait, g, from the specific devices used to measure it, including IQ tests. This separation is what allows us to describe different mental tests according to their “g loading” (their correlation with the latent trait, g). All mental tests measure mostly g, whatever their original intent. They often differ in the narrower skills and abilities that also contribute somewhat to good performance on them, but their power to predict important life outcomes has been shown to reside almost exclusively with their common g component.
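
To make “g loading” concrete: given a battery’s correlation matrix, each test’s loading is, to a first approximation, its correlation with the first unrotated factor. In the sketch below the four-subtest matrix is invented for illustration, and the principal-component shortcut stands in for a proper factor extraction.

```python
import numpy as np

# Invented correlation matrix for a hypothetical four-subtest battery.
R = np.array([
    [1.00, 0.55, 0.45, 0.40],
    [0.55, 1.00, 0.50, 0.45],
    [0.45, 0.50, 1.00, 0.35],
    [0.40, 0.45, 0.35, 1.00],
])

# First-principal-component approximation to the g loadings:
# loading_i = eigenvector_i * sqrt(largest eigenvalue).
vals, vecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
g_loadings = vecs[:, -1] * np.sqrt(vals[-1])
g_loadings *= np.sign(g_loadings.sum())        # eigenvector sign is arbitrary
print(g_loadings)  # each subtest's correlation with the extracted "g"
```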

How does Flynn “shatter” this psychometrically unitary phenomenon? He ignores latent traits and focuses on the measurement device. First, he notes that raw scores on IQ tests have risen over time because people are doing dramatically better on a few of the various subtests that contribute points to the overall IQ score. He cannot say that g has increased, because scores on some g-loaded subtests have increased a lot (e.g., Similarities) but others hardly at all (e.g., Vocabulary), so he merely says “intelligence” has increased. Because the score increase rests on only a few subtests, he pulls them out for special attention as the source of this increase in “intelligence.” This pulling apart of the measurement device, as if it were a pulling apart of “intelligence,” is presented as the “shattering” of g. The impression is solidified by presenting sports analogies in which the athletic “skills” in question are “functionally independent.”

Flynn can now examine these shattered pieces of the IQ battery as “interesting in their own right,” as if they no longer measured mostly g. He points to selected surface characteristics of their items to generate scenarios for very general cultural factors that could have increased performance on them. These mechanisms are but metaphors, such as the donning of “scientific spectacles” by the population at large owing to the industrial and scientific revolutions. By these means “intelligence” is not only made malleable, but made amenable to change via “cultural priorities” which we can presumably manipulate at will.

The problem with his explanation, however, is that IQ (g) remains highly heritable. Flynn “solves this paradox” by postulating, in effect, that environments mimic genetic influence. For this he turns to William Dickens’ proposed environmental “multipliers.” On this model, any one of our independent abilities will trigger environmental reinforcement of all our abilities. That is, environmental demands for exercising our abilities come bundled together—trigger one demand and you trigger all, which boosts all abilities in tandem. The result is to reconstitute g from the outside in. Instead of g being internally generated, it is now apparently the product of a positive manifold among opportunities and demands for cognitive development in the external environment. The resulting g is highly heritable because of the single, genetically influenced ability that happened to recruit a very powerful environment. Dickens’s model is a valiant attempt to make g seem malleable by having genes themselves recruit environments to make it so.
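
A toy version of the multiplier model (my sketch, not Dickens and Flynn’s actual equations; all coefficients are invented) shows how this reconstitution works. The direct genetic path to ability is deliberately weak, but because ability recruits a matching environment that feeds back into ability, the final gene-ability correlation, and hence apparent heritability, comes out high.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

genes = rng.standard_normal(n)     # modest initial genetic differences
ability = genes.copy()

# Iterate the multiplier: ability recruits a matching environment,
# and the environment does most of the proximate work on ability.
for _ in range(20):
    env = 0.8 * ability + 0.2 * rng.standard_normal(n)  # environment tracks ability
    ability = 0.3 * genes + 0.7 * env                   # weak direct genetic path

r = np.corrcoef(genes, ability)[0, 1]
print("gene-ability correlation:", r)        # high at these settings
print("variance aligned with genes:", r**2)  # looks like heritability above 0.9
```

At these settings the gene-ability correlation exceeds 0.9 even though, within each iteration, the environment supplies most of the proximate influence on ability, which is precisely the sense in which g is rebuilt “from the outside in.”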

Moral Denigration of Politically Unwelcome Research

Ceci argues that the current taboo against discussing one side of the argument on race and intelligence is not good for science. Indeed, he describes how the taboo can produce one-party science: all professional advantages and accolades flow toward scientists who explicitly disavow the politically incorrect side, as he has. I appreciate his candor.

I am fortunate that these are my views because they are politically correct and garner me praise, speaking and writing invitations, and book adoptions at the same time those who disagree with me are demeaned, ostracized, and in some cases threatened with tenure revocation even though their science is as reasonable as mine…. I can tell my side far more easily than they are permitted to tell theirs.

Unlike Ceci, who believes that this double standard is “unfair,” Turkheimer contends that those who disagree with their views “deserve the vigorous disapprobation they often receive.” The more eminent the disagreeing scientist, presumably the more vigorous the opprobrium ought to be—a principle exemplified by the recent vilification of Nobel laureate James Watson.

Turkheimer’s stance allows Flynn to claim the scientific high ground by turning Turkheimer’s peremptory question to “racial scientists” against him: is there any evidence that you, Turkheimer, would be willing to accept as disproving your view? Unlike Turkheimer, Flynn would allow “rational discussion of offensive ideas”—it is “okay,” he says. Flynn yields none of the moral high ground to Turkheimer, however, because he, too, labels the research conclusions with which they all disagree as being “offensive.” Thus, Turkheimer and Flynn seem to disagree only on how immoral the politically incorrect ideas are (“dangerous” or merely “offensive”) and thus whether the “offensive” may be allowed to compete in the marketplace of ideas. Flynn grants entry.

One supposes I should thank Flynn for his magnanimity in allowing others to rationally discuss my “offensive” research. My writings certainly meet his offensiveness criterion, if only because I review evidence and draw conclusions without regard to political correctness (e.g., Gottfredson, 2005a; Wainer & Robinson, in press; see www.udel.edu/educ/gottfredson/reprints for other publications). Perhaps he will show others who also believe I am mistaken how to dissect my evidence and argument to locate my errors and prove me wrong. I have waited decades for someone to do so.

In view of his stance on Watson, Turkheimer seems more likely to favor “vigorous [opprobrium],” which is much easier as well. Moral opprobrium requires no mucking about hip-deep in data or the complications of analyzing it properly. It is accomplished most easily by simply evoking the specter of genocide, as Turkheimer does in his first posting.

[H]istory has shown us only too clearly what can happen when simplistic views of human ability make poorly informed contact with biology and genetics….I am by training a behavior geneticist, and as such I am too well-acquainted with the ugly places oversimplified thinking about human ability and genetics can lead to let the phrase “good genes” pass without a shiver.

Lest readers miss his allusion to genocide, he refers to researchers who study group differences in (phenotypic) intelligence as “racial scientists” and “racialists” and, moreover, in the context of “200 years of [black] enslavement…, followed by 100 years of Jim Crow oppression.”

Turkheimer begins his post on race and IQ by moving the genetic question beyond the reach of empirical inquiry: “The important questions about the role of genetics in the explanation of racial differences in ability are not empirical, but theoretical and philosophical.” This he does by substituting the word “innate” for “genetic,” and then criticizing the substituted term. By this logic, we could shift his own field of study—behavior genetics—into his university’s philosophy department.

Turkheimer continues to invoke scientific expertise, however, when it serves to amplify Watson’s moral culpability or his own moral authority (as in the quotation above). Specifically, he chastises the great geneticist for speaking on scientific questions that are, according to Turkheimer, “ultimately outside [his] real expertise.” I do not know whether Turkheimer would grant me expertise for my decades investigating the nature and consequences of racial disparities in intelligence, or that he would claim any special expertise of his own, but I suspect he would avoid the question altogether by reverting to his opening argument, in which he renders scientific expertise irrelevant.

Criticisms of research into intelligence differences have become more strident and ad hominem as the research base has become, in Ceci’s words, ever more “impressive.” The postings by Flynn, Turkheimer, and even Ceci are laced with aspersions on the characters of the researchers with whom they disagree and with efforts, explicit or not, to encourage moral panic in the service of suppressing unwelcome truths.

Turkheimer describes the “possibility of genetic differences between the races for IQ” as “ugly,” “offensive,” “destructive,” and “dangerous.” He heightens our sense of outrage by listing slurs against various ethnic groups. He does not, however, explain why the genetic hypothesis is “dangerous”—because it is false, or because it might be true? Again, I await rational analyses by those who encourage moral panic.

Flynn is less direct in alleging that unwelcome conclusions about racial differences do harm, yet he specifies how he thinks they could do so. He prefaces his allegation by telling us why we are fortunate that political pressure did not dissuade Arthur Jensen from posing the part-genetic hypothesis in 1969. It prompted Flynn to greatness.

I would never have made any contribution to psychology. It was arguing versus Jensen that led me to investigate IQ trends over time, led me to formulate a more sophisticated defense of affirmative action, led me to my book What is Intelligence, which (I hope) has done much to unfreeze our thinking about g, or the general intelligence factor.

Flynn then plays the trump card for suppressing unwelcome inquiry: “It may be said that truth is not the only value.” For support, he refers to imaginings of the social harm possibly done by Jensen’s work, seemingly because his conclusions were true.

A colleague once told me that all that has been learned in the wake of Jensen was not worth hundreds of acts of discrimination, those that may have occurred from naïve racists feeding on his views.

For additional support, Flynn cites 18th century philosopher Jean-Jacques Rousseau on the dangers of atheism: “Is this single truth worth the danger that thousands of people of simple faith will become immoral, rather than acting responsibly?” In so doing, Flynn echoes Turkheimer’s implied fear that bad ideas about race will push average Americans down the slippery slope toward racial genocide. By branding such ideas as socially dangerous, both Flynn and Turkheimer encourage their suppression—despite Flynn’s ostensible admonition that we ought not do so.

All three commentators belittle as well as morally taint the unspecified scientists with whom they disagree. Flynn consistently describes the “g-men” and their ideas in derogatory terms (see especially his book). They have a “blinding obsession” for an “imperialistic” g, empirical challenges to whose “supremacy” even the best of them “strongly resist” and therefore must be irrationally wedded. One might conclude after reading Flynn’s book that Arthur Jensen, arguably the greatest intelligence researcher of the 20th Century (Detterman, 1998), is but a buffoon. Turkheimer also speaks of researchers who study the general intelligence factor as “g-men” who “revere” the positive correlations among ability tests and wish, thereby, to establish the “supremacy of g.” Ceci likewise derogates the “IQ mafia” as being “enamored with g” and “touting” its ability to tie together myriad seemingly unrelated phenomena—as if surely exaggerating g’s explanatory power.

Moral slander and fear mongering stifle scientific dialog, as is their purpose. They also preclude rational inquiry into the social costs of telling untruths about racial disparities in intelligence.

The Truth—What Possible Good? Lies—What Possible Harm?

Flynn implies that the public interest would be served by withholding the truth about race and IQ. But exactly which truth? It cannot be the part-genetic hypothesis because he rejects it as false. He uses Jensen’s research post-1969 to argue his point, but that research focused almost entirely on differences in phenotypes. So, Flynn is encouraging either the suppression of well-established facts about racial differences in phenotypic intelligence or else the common confusion between genotype and phenotype, which serves the same purpose.

The game here is not to suppress discussion of genetic differences but to suppress knowledge of phenotypic differences. The latter make the former more plausible, so the specter of genetic causation is used as a club to beat back scientific knowledge about racial disparities in developed abilities, whatever their origins.

None of our commentators mentions the large, persisting, and socially important black-white differences in phenotypic g, except to suggest that they are rapidly disappearing (something I’ve heard now for over thirty years). Instead, Flynn, Turkheimer, and Ceci focus almost entirely on cultivating doubt about the scientific edifice for g itself. If g is an illusion, then there can be no racial differences in g. If the scores on some IQ subtests rise over time, then perhaps we can declare “intelligence” malleable and disregard the high heritability of g. Nature’s constraints may not be so constraining after all, our commentators imply, so we have no lessons to learn from the many prior educational interventions and social policies that failed because they presumed ready malleability. If there is a lesson, says Flynn, it is that we should “impose” such interventions on individuals more vigorously and into their adult years. We begin to envision the state, guided by a firm hand and correct priorities, trying to reshape us to eliminate cognitive diversity for egalitarian ends.

Proponents of the taboo on discussing race and IQ assume that the taboo is all for the common good, but whose good, exactly, is served? It is most certainly not individuals of below-average intelligence, who face a tremendous uphill battle in modern, literate societies where life becomes increasingly complex by the day. General intelligence (g) is simply a general proficiency to learn and reason. Put another way, it is the ability to deal with complexity or avoid cognitive error. Virtually everything in life requires some learning or reasoning and thus confers an advantage on brighter individuals. Life is complex, and complexity operates like a headwind that impedes progress more strongly for individuals lower on the IQ continuum. Everyone makes cognitive mistakes, but lower intelligence increases the risk of error.

Take, for example, health care. Patients differ enormously in intelligence level, and these differences have life and death consequences for them. Individuals of lower health literacy, or IQ, are less likely to seek preventive care even when it is free, use curative care effectively when they get it, understand and adhere to treatment regimens, or avoid health-damaging behavior. They have worse health, more accidental injuries, higher health costs, and die sooner—regardless of income, insurance coverage, or quality of health care. Health care matters, as do material resources and motivation, but mental resources matter too. They are critical in the prevention and self-management of chronic illnesses such as diabetes and heart disease. Health self-care is an increasingly complex life-long job for all of us, which becomes even more complex as we age and experience more health problems.

It overstates only slightly to say that health care providers currently pay no attention to patient differences in the ability to learn and understand. As health literacy researchers have shown, however, a sizeable fraction of patients in urban hospital outpatient clinics are unable to understand an appointment slip (when to come back), a label indicating how to take four pills a day, or, among insulin-dependent diabetics, the signs of low (or high) blood sugar and what action to take to bring it back under control. Do proportionately more blacks have such problems? Yes, many more. Is that a reason to continue ignoring or disputing individual and group differences in g?

References

Chabris, C. F. (2007). Cognitive and neurobiological mechanisms of the law of general intelligence. In M. Roberts (Ed.), Integrating the mind: Domain general versus domain specific processes in higher cognition. Hove, UK: Psychology Press.

Detterman, D. K. (Ed.). (1998). A king among men: Arthur Jensen [Special issue]. Intelligence, 26(3).

Gottfredson, L. S. (Ed.) (1997). Intelligence and social policy [Special issue]. Intelligence, 24(1).

Gottfredson, L. S. (2005a). Implications of cognitive differences for schooling within diverse societies. In C. L. Frisby & C. R. Reynolds (Eds.), Comprehensive handbook of multicultural school psychology (pp. 517–554). New York: Wiley.

Gottfredson, L. S. (2005b). What if the hereditarian hypothesis is true? Psychology, Public Policy, and Law, 11, 311-319.

Humphreys, L. G. (1986). Commentary. Journal of Vocational Behavior, 29(3), 421-437.

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Neisser, U., Boodoo, G., Bouchard, T.J., Jr., Boykin, A.W., Brody, N., Ceci, S.J., et al. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.

Rowe, D. C., Vesterdal, W. J., & Rodgers, J. L. (1998). Herrnstein’s syllogism: Genetic and shared environmental influences on IQ, education, and income. Intelligence, 26, 405-423.

Snyderman, M., & Rothman, S. (1988). The IQ controversy, the media and public policy. New Brunswick, NJ: Transaction Books.

Wainer, H., & Robinson, D. (in press). Interview with Linda Gottfredson. Journal of Educational and Behavior Research. (See also http://www.udel.edu/educ/gottfredson/reprints/2007gottfredsoninterview.pdf)

I Reject Rousseau

Just two comments in reply to Linda. Read what I said about Rousseau with a calm eye. It says just the opposite of what you think you read: that we should NOT place supposed unwelcome consequences over the search for truth. I have never referred to Linda’s research as obnoxious and do not think it so. But it is undeniable that some people find it obnoxious. Since I cannot affect their emotions, I said that despite their feelings they should not discourage her research.