Contemporary Issues in Neuropsychology: Race-Based Norms

Brian Panichella


Controversies regarding test performance among ethnic minorities have a long history in the United States, and that history makes manifest an unfortunate fact: it is minorities who suffer the deleterious effects of misused information. In the 1920s, more immigrants were admitted to the U.S. from countries that scored higher on the Stanford-Binet intelligence test, and in the 1930s, employers tested intelligence and hired the individuals with higher scores. Earlier, during the First World War, the Stanford-Binet test had determined job placement in the workforce, where racially prejudiced Americans exploited those who were not educated enough to do well on the test (Moore 2007). In 1935, however, Otto Klineberg examined test data from former WWI recruits, data that had previously been used as proof of the intellectual inferiority of African Americans and immigrants. He demonstrated that African Americans educated in the Northern states scored notably higher than those educated in the South, showing that the scores reflected differences in acculturation (Manly 2007).

Interestingly enough, the IQ test originated in Europe, where it was a complete failure. Once taken up by Lewis Terman at Stanford University, however, it caught on like wildfire in the U.S. As psychology professor Michael Moore puts it, “We [Americans] are a culture consumed with the measurement of individual differences. Competition is the name of our game” (Moore 2007). The intelligence quotient simply assigned each person a number that differentiated them from others; society did not necessarily try to understand intelligence, only to measure it. When David Wechsler, creator of the WISC (Wechsler Intelligence Scale for Children), was once asked, “What is intelligence?”, he answered in jest, “What my test measures …” (Moore 2007).

Society has come a long way since the 1920s, yet the role of race in test performance, test construction and norm development remains controversial and important. Within neuropsychology, normative data take on special significance because of the ongoing debate over race. But what exactly is neuropsychology?

Neuropsychology is defined as a “branch of psychology dealing with the relationship between neural and mental or behavioral processes” (OED 2007). Those who perform neuropsychological examinations, that is, comprehensive assessments of cognitive and behavioral functions using standardized tests and procedures, are called neuropsychologists. An assortment of mental functions may be systematically assessed, including, but not limited to, intelligence, problem solving, conceptualization, planning, organization, attention, memory, learning, language, perceptual and motor ability, emotion, behavior, and personality (Stern 2007). In short, virtually every cognitive domain can be assessed through neuropsychological tests.

Neuropsychologists are increasingly sensitive to the unique considerations that attend the assessment, treatment and research of individuals from various ethnic, cultural and linguistic backgrounds. Neuropsychology, as a division of psychology, adheres to the American Psychological Association’s (APA) code of ethics, which lays down concrete rules and regulations for members of the association (Harris 12). Nevertheless, one does not have to be a member of the APA to be a psychologist; to practice psychology, one need only be licensed by the state. In any discipline, a clearly stated ethical rule or guideline should be interpreted appropriately; however, whether or not such a rule exists, there will invariably be some misconduct (Stern 2007). Human nature seems to include a tendency to set boundaries and then to stretch and defy them.

Among the various ethical considerations in neuropsychology, the issue of race-specific norms is highly controversial. What exactly are neuropsychological norms? What do norms actually mean in psychological research? What are the ethical issues relating to race in test construction and validity? Why have many neuropsychologists followed the trend of using race-specific norms? Once these questions have been addressed, the arguments raised against race-based norms come into focus. There are essentially two: (1) race-based norms ignore the underlying cultural and educational factors that lead to differences in acculturation, and (2) by lowering the cutoffs for impairment, they are overly lenient and can deny members of many racial groups much-needed services (Manly 2). Among the possible solutions, Dr. Jennifer Manly of Columbia University has suggested literacy-based norms, applied regardless of race, which some neuropsychologists believe are the key to extricating the field from the race debate.

Among the many things unique to humans is the degree to which their routine behavior is governed by multifaceted rules and principles, collectively referred to by psychologists as norms. Norms are based on healthy individuals in a particular community and may be used in a variety of disciplines. Normative data delimit the bounds of appropriate behavior and ability in a host of domains, forming a hidden network of normative constructs that covers nearly all aspects of life and cognitive function.

The majority of tests in psychology are norm-referenced; that is, the subject’s score is evaluated against typical test scores from a representative group of individuals (Spreen 44). Implicit in test development is the assumption that scores are normally distributed and that the subject’s score is interpreted against group norms (Spreen 45). Because every score is judged relative to a reference group, all conclusions drawn from norm-referenced assessments are comparative. Normative data are usually supplied by the test developer, who may further update and improve the norms through research (Lezak 22). A race-based norm is one developed from a sample of a particular race, for use when evaluating members of that race.
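A minimal Python sketch can make norm-referenced scoring concrete. It assumes, as the paragraph does, that normative scores are roughly normally distributed, and it converts a raw score into a z-score and a percentile relative to a normative sample. The function name `norm_referenced_score` and all normative means and standard deviations below are hypothetical, invented for illustration rather than taken from any real test.

```python
from statistics import NormalDist


def norm_referenced_score(raw: float, norm_mean: float, norm_sd: float) -> tuple[float, float]:
    """Return (z-score, percentile) of a raw score relative to a normative sample.

    Assumes the normative scores are approximately normally distributed.
    """
    if norm_sd <= 0:
        raise ValueError("normative standard deviation must be positive")
    z = (raw - norm_mean) / norm_sd
    # Percentile: share of the normative sample expected to score lower.
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Two hypothetical normative samples for the same test (illustrative numbers only).
young_college_norms = {"norm_mean": 52.0, "norm_sd": 6.0}
older_matched_norms = {"norm_mean": 44.0, "norm_sd": 8.0}

raw = 41.0
for label, norms in [("young, college-educated", young_college_norms),
                     ("age/education-matched", older_matched_norms)]:
    z, pct = norm_referenced_score(raw, **norms)
    print(f"{label}: z = {z:+.2f}, percentile = {pct:.1f}")
```

With these made-up numbers, a raw score of 41 falls near the 3rd percentile against the young, college-educated sample but near the 35th percentile against the age- and education-matched one, which is why the choice of normative sample matters so much.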

As in other subfields of psychology, ethical issues arise in test construction and validity. “The ethics of test construction and validity is a major problem,” maintains Dr. Stern of Boston University's Alzheimer's Disease Research Center, “and this is the reason I spent eight years of my life creating the NAB (Neuropsychological Assessment Battery)” (Stern 2007). Within neuropsychology, much of what is done rests on comparing an individual’s performance to available normative data, derived from neurologically intact individuals, that is, people who perform well on neuropsychological tests in all cognitive domains. If the person in question performs well below these norms, one may draw corresponding inferences about that individual’s neurological functioning.

What does it mean, then, to compare someone to a set of norms developed from people who have nothing in common with the individual in question? For example, a practitioner may give Test A, whose best available norms come from a group of twenty-four-year-old, college-educated, white males. Furthermore, this may be all that is available for normative comparison of healthy performance. In testing a 70-year-old African American woman with an eighth-grade education, how can one say that comparing her test results to these norms is appropriate? Dr. Stern bluntly addresses the issue in remarking on the quality of normative data: “Most norms out there, for neuropsychological tests stink, and are not representative of any major group of people” (Stern 2007). Most individual tests are normed separately, which creates a distinct problem in comparing scores. A clinician may assemble a battery of ten different tests to give to a patient, each with its own set of norms derived from a different sample. Comparing the patient’s scores across those tests is then essentially comparing apples and oranges, not because of the functions measured, but because of the norms used.

To address the problem of comparison, Dr. Stern and his colleagues created the NAB. The goal was a new set of tests standardized on a single normative sample representative of the U.S. population. Thus, every test is standardized on the same group of 1,500 people drawn from all over the country, spanning all age, educational and racial groups, with an equal gender distribution. The NAB comprises thirty-three neuropsychological tests, each with an equivalent form, measuring a wide range of cognitive functioning in adults. It is divided into six modules, starting with a screening module, which is meant to be a relatively brief, forty-five-minute assessment of most cognitive functions. The remaining modules are more specific batteries for particular functions: an attention module, a language module, a spatial module, an executive functioning module and a memory module. Each module includes one ecologically valid test, meaning a test related to real-world functioning rather than to some construct hypothetically derived by psychologists. The battery can be given in a variety of ways (in its entirety, in different subgroups of tests, or by module) to adults with suspected neurological disorders (Stern 2007).

In the development of the NAB, one of Dr. Stern’s consultants was a prominent leader in race-based norms. Dr. Robert Heaton of the University of California, San Diego, is the developer of norms for a variety of psychological tests, including race-based African American norms. Dr. Heaton is ambivalent on the race issue: in some cases, he believes it is better not to use racial norms, yet relying only on norms derived from white samples can occasionally produce unreasonable results for someone outside that group. Thus, while constructing the NAB, Dr. Heaton and Dr. Stern wrestled with whether to use race-based norms, and they eventually decided to create two different sets of norms. The first set was demographically corrected, meaning that the raw scores were entered into a computer or table and corrected for the impact of age, education and gender, but not race, which had a positive effect on the scores. The second set was based on the U.S. census: instead of correcting for age, education and gender, it was stratified by age alone, and within each age group the normative sample matched the census breakdown by race, age, gender, education and region. Dr. Stern believes this is a way to say, “I have a seventy-year-old white male, and I am comparing this seventy-year-old white male to the typical seventy-year-old person in the U.S.” (Stern 2007). Here, a typical person is one who reflects the combined influences of the country’s different ethnic, gender and educational groups. In this way, one may see how an individual is performing in the larger scheme of things, rather than relative to a specific subgroup (Stern 2007).
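The paragraph above does not describe how the NAB corrections are actually computed. The sketch below assumes one common approach, a linear regression that predicts a healthy person's expected raw score from age, education and gender, and contrasts it with the census-matched approach, which simply compares the raw score with the mean and standard deviation of a census-matched age group. Both results are expressed as T-scores (mean 50, SD 10). The class and function names (`RegressionNorms`, `demographically_corrected_t`, `census_matched_t`) and every coefficient are hypothetical placeholders, not the NAB's real norms.

```python
from dataclasses import dataclass


@dataclass
class RegressionNorms:
    """Hypothetical regression-based norms predicting a healthy person's raw score."""
    intercept: float
    per_year_of_age: float
    per_year_of_education: float
    female_offset: float
    residual_sd: float  # spread of healthy scores around the predicted value

    def expected_raw(self, age: float, education: float, female: bool) -> float:
        """Expected raw score for a healthy person with these demographics."""
        return (self.intercept
                + self.per_year_of_age * age
                + self.per_year_of_education * education
                + (self.female_offset if female else 0.0))


def to_t_score(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def demographically_corrected_t(raw: float, age: float, education: float,
                                female: bool, norms: RegressionNorms) -> float:
    # Compare the patient with healthy people of the same age, education and gender.
    # Race is deliberately not a predictor, as in the first norm set described above.
    expected = norms.expected_raw(age, education, female)
    return to_t_score((raw - expected) / norms.residual_sd)


def census_matched_t(raw: float, age_group_mean: float, age_group_sd: float) -> float:
    # Compare the patient with a census-matched sample of the same age group only;
    # race, gender, education and region shape the sample, not the calculation.
    return to_t_score((raw - age_group_mean) / age_group_sd)


# Illustrative use with made-up coefficients for a 70-year-old woman
# with eight years of education who obtains a raw score of 45.
norms = RegressionNorms(intercept=60.0, per_year_of_age=-0.25,
                        per_year_of_education=0.8, female_offset=0.5,
                        residual_sd=7.0)
print(round(demographically_corrected_t(45.0, 70, 8, True, norms), 1))
print(round(census_matched_t(45.0, age_group_mean=48.0, age_group_sd=8.0), 1))
```

With these invented values the same raw score yields a T-score of about 44 under the demographic correction and about 46 under the census-matched comparison. The two numbers differ because they answer different questions: how the patient compares with demographically similar healthy people, versus how she compares with the typical person her age in the U.S.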

Nonetheless, race-specific norms remain highly contentious, and the field of neuropsychology is essentially split on whether they should be used. Dr. Stern maintains that “there are cultural, racial, and ethnic differences in test performance” (Stern 2007) that may or may not have anything to do with underlying neurological or cognitive functioning. In a study of 161 patients of various ethnicities who were referred to a hospital for evaluation, a team of researchers at Harbor-UCLA Medical Center found significant group differences in several cognitive domains (Boone 355). In the study, lower neuropsychological scores were associated with certain ethnic groups, confirming that test performance differs across groups. However, the researchers offer further counsel to practitioners in the field:
Results from the present study suggest that ethnic differences in test performance are not attenuated by presence of psychiatric or neurologic illness. The findings further caution that normative data derived on Caucasian samples may not be appropriate for use with other ethnic groups, particularly for measures of language, attention, processing speed, constructional skill, and select executive skills; application of Caucasian-derived norms will result in overpathologizing of cognitive disorder in ethnic minorities. Additionally, the fact that all non-Caucasian groups performed consistently lower on the Boston Naming Test suggests that the test stimuli themselves may be systematically biased against those groups. In a multicultural society, the development of appropriate norms may be insufficient in and of themselves unless specific tests/test-items that are reflective of a diverse cultural experience are developed simultaneously with normative data. (Boone 360)
In spite of this, Dr. Stern does not advocate the use of race-based norms. He asserts, “There is no such thing as racially based performance” (Stern 2007). A Hispanic person in Los Angeles is not necessarily going to have the same acculturation as a Hispanic person in upstate New York, and an African American in southern Florida is not going to have the same acculturation as an African American in Seattle, Washington. Therefore, instead of addressing race-based norms directly, Dr. Stern encourages identifying the sources of acculturation differences among minorities and correcting them. Society should look to remedy the source of the problem, not mask its deficiencies by lowering standards.

One solution to the race-based norms problem has been suggested by one of Dr. Stern’s colleagues and former trainees, Dr. Jennifer Manly. As one of the most notable voices on the issue of race-based norms, Dr. Manly has been working to promote awareness of acculturation in neuropsychological testing and of the impact of race in dementia assessment. Dr. Manly advocates using a measure of literacy to control for differences in neuropsychological performance. Rather than race-based norms, she proposes literacy-based norms, built on some measure or estimate of a person’s English literacy obtained through word knowledge (Stern 2007). In an article published in the Archives of Clinical Neuropsychology, Dr. Manly examines the race-based norms question through the lens of another field of medical investigation: hypertension research.

The central point of comparison with hypertension research is construct validity. Essentially, a test has construct validity if it measures the construct it claims to measure (Manly 1). Researchers compared the indirect auscultatory measure of blood pressure (cuff and stethoscope) with a direct method of inserting a cannula into the artery, and they found that the indirect method was relatively accurate. Using this indirect method, researchers found a “significantly high[er] prevalence of hypertension among African Americans than among Caucasians (Cooper & Rotimi 1997)” (Manly 4). Despite this difference, researchers have not demonstrated risk factors for hypertension that are unique to African Americans. Manly encapsulates her analogy between construct validity in hypertension research and in neuropsychology succinctly and eruditely, so it is worth quoting her findings at some length:
In summary, prior research on hypertension has established that having blood pressure above the common cutoff means the same thing across race, ethnicity, and geographic region. […] Construct validity of blood pressure measurement has been further demonstrated in that the biological and environmental causes of high blood pressure are common across racial groups and that a diagnosis of hypertension using these cutoffs is associated with similar cross-sectional and longitudinal health outcomes across racial groups.
[However,] Neuropsychologists cannot make the same assertions about neuropsychological test performance. It has not been demonstrated that performance below a common, race-independent cutoff means the same thing across race, ethnicity, and geographic region. We are yet unclear whether our measures of any particular cognitive domain have equivalent construct validity across racial groups (Helms, 1992). There is now a plethora of data that suggests that common cutoffs result in poor specificity of neuropsychological tests for true cognitive impairment (Manly and Jacobs, 2001). Whether common, race-independent test cutoffs have similar longitudinal health outcomes across race has not yet been fully investigated; thus this is clearly an important direction for future research. (Manly 6)
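Manly's point about the poor specificity of common cutoffs can be illustrated with a small calculation. Assume, purely hypothetically, that cognitively healthy members of two groups have normally distributed scores (in z-score units of a reference sample) whose means differ by half a standard deviation for reasons unrelated to brain health, such as differences in educational quality, and that a common cutoff of 1.5 SD below the reference mean flags impairment. Specificity is the proportion of healthy people correctly not flagged. The group labels, the 0.5 SD gap and the cutoff are illustrative assumptions, not empirical values.

```python
from statistics import NormalDist


def specificity(cutoff: float, healthy_mean: float, healthy_sd: float) -> float:
    """Proportion of cognitively healthy people scoring at or above the cutoff.

    Anyone below the cutoff is flagged as impaired, so healthy people below it
    are false positives.
    """
    return 1.0 - NormalDist(healthy_mean, healthy_sd).cdf(cutoff)


# Common, race-independent cutoff: 1.5 SD below the reference sample's mean.
COMMON_CUTOFF = -1.5

# Hypothetical healthy groups: one matches the reference sample; the other
# scores 0.5 SD lower on average for reasons unrelated to brain health.
groups = {"reference-like group": 0.0, "group with 0.5 SD lower mean": -0.5}

for label, mean in groups.items():
    spec = specificity(COMMON_CUTOFF, mean, 1.0)
    print(f"{label}: specificity = {spec:.1%}, false-positive rate = {1 - spec:.1%}")
```

Under these assumptions, the same cutoff misclassifies roughly 7% of healthy people in the first group but about 16% in the second, the kind of overpathologizing described in the Boone passage quoted earlier.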
Additionally, Manly notes that it is important to distinguish between actual and perceived racial discrimination. Among ethnic minorities, both actual and perceived discrimination have significant associations with poorer health outcomes such as hypertension (Manly 6). This has a direct connection with neuropsychological testing. If an African American woman perceives an examiner to be discriminating against her, whether or not the examiner actually thinks she is intellectually inferior, she is likely to perform worse on a neuropsychological test. A closely related phenomenon, known as stereotype threat, was documented in a groundbreaking Stanford University study:
Steele and his colleagues demonstrated that when a test consisting of difficult verbal GRE exam items was described as measuring intellectual ability, Black undergraduates at Stanford University performed significantly worse than did SAT score-matched Whites. However, when the same test was described as a “laboratory problem-solving task” or a “challenging test” which was unrelated to intellectual ability, scores of African Americans matched those of White students. Using similar methods, another study showed ... [the same] effect of stereotype threat (McKay, 2003). (Manly 8)

The fundamental connection between hypertension research and the discussion of race-based norms is the search for the underlying explanation of racial differences. Researchers have long known that socioeconomic status (SES) is related to better health and that SES is correlated with race. Yet even after taking into account indicators of SES such as years of education and income, the racial discrepancies in hypertension persisted. Eventually, hypertension researchers incorporated additional measures such as assets, debt, use of public assistance and further indicators of income, which eliminated the discrepancies (Manly 8).

Similarly, even after formal education and age are taken into account, racial discrepancies persist in performance on neuropsychological tests. Manly notes, “disparate school experiences, with accompanying different bases of problem-solving strategies, knowledge, familiarity and practice could explain why some ethnic minorities obtain lower scores on cognitive measures even after controlling for years of education” (Manly 9). Years of schooling, in other words, do not capture the quality of that schooling; a measure of literacy may reflect it more directly. It thus becomes apparent why Manly seeks another factor that yields better construct validity in neuropsychological tests: she believes literacy-based norms are a better way to disentangle the field from the racial debate.

In conclusion, the predicament with race-based norms is that race is not the only issue: one would need a separate normative group for every factor that might affect performance, and such narrowly defined groups would, by definition, no longer provide normative data. However, many neuropsychologists argue that race-based norms improve the sensitivity and specificity of neuropsychological measures in detecting cognitive impairment, which is why the issue remains difficult. In her conclusion, Manly offers advice to her fellow practitioners: “Clinicians must continue to struggle when choosing whether to apply race-specific norms to a specific clinical case. Although there is clearly no right answer, there is no question that careful deliberation regarding the application of norms in each case is far superior to [any] blanket statements [concerning use or disuse]” (Manly 12). Despite diverse opinions on how to deal with the current situation, calls to address differences in acculturation are common to the responses of neuropsychologists on every side. Essentially, there is no single answer to the question; in each case, the examiner has to make a judgment call. Are race-based norms appropriate, or not?

Works Cited