THERE are few issues in foreign language education that provoke as much anxiety, among professionals and students alike, as language testing. This is as it should be: educators should never forget that any decisions they base on the results of these tests can have far-reaching consequences. Bachman has argued that, in our litigious culture, testers even run the risk of being taken to court and forced to defend the instruments they use, say, as college entrance or exit requirements or for employment purposes. Thus it is no surprise that the ACTFL oral-proficiency interview (OPI), which is enjoying increasing popularity and widespread acceptance, has become embroiled in controversy. Freed, Chastain, and Barnwell each provide interesting perspectives on the OPI debate in a recent issue of the ADFL Bulletin. Freed, for example, points out that a number of well-known scholars have vehemently expressed compelling and coherent objections to the OPI and to the proficiency movement in general (53). She argues correctly that the teaching profession would do well to rethink the assumptions of proficiency-based testing and its companion teaching methodology on an ongoing basis.
The purpose of this paper is to respond to some of the criticism of the OPI that Freed, Barnwell, and Chastain cite. I will not argue that the OPI is a problem-free test. In fact I agree with what I take to be the main thrust of their articles: the OPI, like any other test, needs research into its reliability and validity. However, the arguments expressed by its critics are not as compelling and coherent as they are reported to be. I will focus on two that are frequently cited in the proficiency debate. The first is that the OPI is logically unsound because it is an elaborate form of question begging (see the essays by Lantolf and Frawley). Let us call this the logical argument, for convenience. The second is that the OPI overemphasizes grammar at the expense of other skills, like sociolinguistic and discourse competence (Savignon; Valdman, Introduction; Raffaldini). Again for convenience, let us call this the narrow-view argument, following Savignon. In addition I want to address a larger issue that has been raised in both arguments and indeed in nearly all the criticism of the OPI; namely, that the proficiency guidelines are theoretically or scientifically unsound and empirically unsupported.
I intend to show (1) that the logical argument is based on a misrepresentation of several concepts of sentential logic; (2) that the narrow-view argument implies a standard of test validation that simply cannot be met, by the OPI or by any other test, given our limited understanding of linguistic ability; and (3) that the arguments about the theoretical underpinnings of the OPI are themselves based on unsound and unsupported assumptions about the scopes of linguistic theory, of second-language-acquisition (SLA) research, and of empirical investigation. When academia endorses a test, it must make any number of suppositions about what demands should be imposed on students. But much of the debate over these suppositions should fall in the domains of curriculum development and academic policy-making. Thus the debate is an axiological matter that cannot be resolved only by appeals to logic and linguistic theory.
Space limitations will not allow a comprehensive review of the pros and cons of proficiency-based testing, and in any case I can add little. Omaggio has written a good synopsis of the arguments in favor, and the lion's share of a 1988 issue of a professional journal, Studies in Second Language Acquisition 10, as well as the proceedings of a 1987 symposium on foreign language testing (see Valdman), is devoted to arguments against. Thus I will assume that the reader has some familiarity with the main points of the debate and turn directly to the logical argument, which alleges circularity in the reasoning behind the OPI guidelines. This position has been forcefully expressed in two papers by Lantolf and Frawley, who note that the issues raised by their work have largely gone unaddressed (Understanding the Construct 181). For that reason their papers deserve special attention here. In Critical Analysis they introduce and define a number of terms that are crucial to their argument and therefore necessary to introduce here. First, they claim that the OPI guidelines state analytic truths. An analytic truth, they say, is something that is true by definition, convention, or the rules of language; a synthetic truth is something that is true by empirical test. Analytic truths, because they can be arbitrary, are generally suspect (334). Second, they claim that this analytic logic yields criterion-reductive tests; that is, the logic of the levels and of their criteria is symmetric implication. [T]he criteria are the levels and vice versa (340). They say that symmetric implication is equivalent to equality, or mutual substitutability, when X implies Y and Y implies X (345). Next, they say that the guidelines argue for true entailment, which they define as true or false implication. If X is true, then Y is true. If X is false, then Y is false (345). From all this they conclude that the definitions of levels in the guidelines are circular and therefore not empirically based.
Before taking this argument at face value let us sharpen our terminology. In the first place, analytic sentences are inherently neither more nor less arbitrary than synthetic sentences. The sentence "All Advanced-level speakers can narrate with paragraph-length, connected discourse," which is analytic, is arbitrary inasmuch as ACTFL proficiency testers use a particular term for a particular type of speaker, though surely some other term could be used. A sentence like "All Advanced-level speakers at our university have names that are nonpalindromic," which is synthetic (i.e., can only be verified empirically), is arbitrary insofar as it says nothing of theoretical import about the students in question. Curiously, Lantolf and Frawley cite Quine's Word and Object in support of their position, but in that work Quine says nothing about analytic sentences being more arbitrary than synthetic sentences. He says that an analytic sentence is one that is falsifiable purely by meaning and independently of collateral information (65). If that is so, such sentences are hardly suspect. Their truth value is quite easy to determine.
Furthermore, Lantolf and Frawley's definition of entailment as "true or false implication" is meaningless. It is not true that all examples of entailment have one of the two forms given in Lantolf and Frawley's definition. Perhaps the most perspicuous definition of entailment is that X entails Y when the statement "X is true and Y is false" is a contradiction (Kegley and Kegley 291). As such, entailment is not synonymous with all if-then statements. Manicus and Kruger say that "[t]he confusion between the two is a common one and must be guarded against at all costs" (75). To illustrate, the sentence "If my French 101 students are Novices then they are not fluent" is an if-then statement, and it is an example of entailment. The sentence "If my French 101 students are Novices then only thirty-three percent of all my students must be Intermediates" is an if-then statement, but it is not an example of entailment. Lantolf and Frawley illustrate their use of the term by saying "fly entails leave the ground but leave the ground does not entail fly since it could entail jump or leap" (338). This is an unfortunate example because "leave the ground" simply does not entail "jump" or "leap"; the entailment is the other way around. It means nothing to say that "leave the ground" could entail "jump" or "leap."
Now, in some logic texts "implication" serves as a synonym for "entailment," while in others it applies to if-then statements whether or not they are examples of entailment. If Lantolf and Frawley mean that entailment and implication are the same, then they have merely used two words where one would suffice. If they mean that implication is any if-then statement, then they have simply defined the term inaccurately. To be precise, a statement of the form "if X, then Y" is false if the antecedent X is true and the consequent Y is false; otherwise it is true. Thus it would be misleading to characterize a sentence like "If it is true that you must be able to narrate in order to be a Superior, then it must be true that all Superiors are novelists" as "true implication," according to Lantolf and Frawley's definition, because the second part of the sentence is false as a matter of fact; thus the statement as a whole is false. In other words, implication is a logical operation. Conjunction and disjunction are other operations, just as multiplication is a mathematical operation. "True or false implication" is no more meaningful than "true or false multiplication."
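The distinction drawn here between an if-then statement and entailment can be made concrete with a minimal sketch in Python. The function names `implies` and `entails` are illustrative assumptions, not terms from the logic texts cited, and the sketch covers only truth-functional (propositional) entailment; it does not model entailments that rest on word meaning, such as the "fly" example above.

```python
from itertools import product


def implies(x: bool, y: bool) -> bool:
    """Material implication: false only when x is true and y is false."""
    return (not x) or y


def entails(premise, conclusion, n_vars: int) -> bool:
    """Propositional entailment (illustrative helper).

    The premise entails the conclusion when "premise is true and
    conclusion is false" is a contradiction, i.e. when no assignment
    of truth values makes the premise true and the conclusion false.
    """
    return all(
        implies(premise(*values), conclusion(*values))
        for values in product([True, False], repeat=n_vars)
    )


# Truth table for "if X, then Y": one row per assignment of X and Y.
truth_table = [(x, y, implies(x, y)) for x, y in product([True, False], repeat=2)]
for x, y, result in truth_table:
    print(f"X={x!s:5} Y={y!s:5} if X then Y = {result}")

# "P and Q" entails "P": no assignment makes the first true and the second false.
conjunction_entails_conjunct = entails(lambda p, q: p and q, lambda p, q: p, 2)

# "P" does not entail "Q", even though "if P, then Q" comes out true
# under some assignments (for example, whenever P is false).
p_entails_q = entails(lambda p, q: p, lambda p, q: q, 2)

print(conjunction_entails_conjunct, p_entails_q)
```

On this sketch, an if-then statement has a truth value for each assignment, whereas entailment is a relation that either holds across all assignments or does not; calling implication itself "true or false" conflates the operation with its result.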
Finally, a circular definition is one in which the term being defined is repeated in the definition (e.g., "a gardener is someone who works in a garden"). It is not one in which the definiendum (the term to be defined) and the definiens (the expression that defines the term) are mutually substitutable. In fact, they are supposed to be mutually substitutable in a valid definition, as Manicus and Kruger point out (28). If the OPI levels and their criteria were not mutually substitutable, the guidelines would be either too broad or too narrow in scope. Lantolf and Frawley are criticizing the guidelines for having the very property that makes for good definitions.
In short, Lantolf and Frawley do not present a coherent logical framework within which the OPI or any other language test can be evaluated. Perhaps what they have in mind is that some definitions are more useful or more theoretically meaningful than others. This is a reasonable argument to pursue and one that deserves careful attention. For instance, suppose we posit the construct botanicolinguistic competence (i.e., knowledge of the words used to denote plants) and say that one who knows such-and-such number of words pertaining to plants is botanicolinguistically competent. There is nothing wrong with the internal logic of this definition. Its terms are mutually substitutable, and it is precise enough to permit empirical tests of hypotheses about language learners. The problem is that it is not obvious what this construct has to do with language learning or language testing. Thus SLA research must, of course, allow that some test other than the OPI can be devised, that such a test may specify, say, ten levels instead of the OPI's five, that its criteria may be different from those in the guidelines, that some of its definitions may capture intuitions about speaking ability better than do those in the guidelines, and so on. To my knowledge no one connected with the proficiency movement has denied this. Moreover, it is legitimate to argue that, for example, one test or another includes spurious criteria or weighs some criteria too heavily. This is essentially the narrow-view argument that identifies what Valdman calls the Achilles' heel of the OPI (Introduction 126). That is, since grammatical competence is only one of many important skills, and since the OPI emphasizes grammar more than it should, we must conclude that it is not a good test.
But already we are begging important questions. Given two or more methods of measuring language ability, what sort of arguments or evidence can sanction one over the others? How can we know the degree to which one skill is more important than others? Two things come to mind. First, let us suppose that we are to determine the correctness, or appropriateness, of testing criteria on linguistic-theoretical or psycholinguistic grounds, as is often proposed. Second, let us leave open the possibility that this matter primarily concerns setting educational standards or promoting the skills deemed appropriate, by convention, for particular circumstances (the position implicit in Chastain's paper). In other words, perhaps the criteria for any language test should be treated as stipulative definitions, therefore to be appraised as useful or misleading, clear or obscure, but not as true or false (Kegley and Kegley 93).
Now most of the critics have come down strongly in favor of the psycholinguistic approach, though they have used different terms to get at this idea. In general the suggestion is that a test of speaking ability should measure some objectively real language behavior without recourse to subjective criteria based on what the tester thinks a learner should know. Lantolf and Frawley, for example, argue that the guidelines do not reflect the world (Critical Analysis 341), Raffaldini speaks of the differences between what the OPI measures and authentic communication (201), and Barnwell asks whether samples gleaned from the OPI genuinely mirror the process of the native speaker (46). Valdman has explicitly claimed that the matter is to be settled on these grounds:
Few SLA specialists have considered research questions surrounding the OPI, such as how the various linguistic levels (phonology, morphosyntax, lexicon, discourse organization, etc.) contribute to relative levels of communicative ability. As a result, it is fair to say that although the OPI may be experientially based, its theoretical underpinnings are shaky and its empirical support, scanty. (Introduction 121)
Is it possible that these questions have been neglected because they are too vaguely formulated to be treated in linguistic theory? The feasibility of the research program Valdman is alluding to is at least worth reflecting on.
Let us suppose, for the sake of argument, that the OPI does emphasize grammar more than it emphasizes sociolinguistic ability and that we know this because we have done extensive empirical research with other tests that measure these constructs independently. Results of this sort might be interpreted in one of two ways: (1) in the real world sociolinguistic competence is a relatively minor factor in explaining language ability, so that the OPI is justified in not stressing this area, or (2) in the real world sociolinguistic competence is very important, and therefore the OPI doesn't measure real-world language behavior. There will be nothing in the empirical findings themselves, whatever they are, to show either of these interpretations to be correct. The study by Raffaldini factor-analyzed results from the OPI and other tests, including one, the multiple-choice situation test (MCST), that is intended to measure language use in a range of socio-cultural contexts and communicative functions. Raffaldini states that the MCST tests the ability of learners to identify, among the choices provided, what they should say in a given situation, rather than what they would say (204) and argues that since the OPI does not measure what the MCST measures, it must not include many sociolinguistic criteria.
In the first place, stating a priori that someone should say such and such in a particular situation presupposes that a tester knows that a speaker intends to conform to or to violate sociolinguistic conventions, that the speaker interprets the situation in the same manner as the tester, and so on. These are, of course, value judgments that undermine any effort to keep the argument within objective linguistic theory. In the second place, it would be a mistake to conclude that Raffaldini's results alone demonstrate that the OPI is at odds with the real world. They may prove, for all anyone knows, that the OPI places just the right amount of importance on sociolinguistic competence. Raffaldini mentions a taxonomy of communicative functions reported in van Ek and states that
[w]hen compared with this taxonomy, the majority of communicative functions described in the guidelines refer to the exchange of factual information and intellectual attitudes. Other types of communicative functions, such as socializing or suasion, or functions that express emotional or moral attitudes, are mentioned here and there at various levels in the guidelines. (20)
But what percentage of one's time in real life is spent socializing or discussing attitudes? What percentage of real-world speech is made up of sociolinguistic competence? Raffaldini does not say. Thus even if the OPI does stress grammatical competence (a claim its advocates deny, incidentally; see Byrnes), we cannot know on any theoretical or empirical grounds or from looking at one of many possible taxonomies of functions that the decision to stress grammatical competence is incorrect.
What is needed to support the narrow-view argument theoretically or empirically is first of all a clearer notion of what is meant by Valdman's phrase contribute to relative levels of communicative ability. Surprisingly, some who have written under the ACTFL proficiency rubric have come close to proposing the strongest imaginable interpretation of this notion. For instance, Bragger discusses the five linguistic components of the Higgs and Clifford relative contribution model and states that at the level of the Educated Native Speaker, these factors contribute 20 percent each to make up the total speech act (102; emphasis mine). Does Bragger mean that researchers are to analyze test results by adding up the rules of grammar, the word meanings, the discourse rules, and so on? Do all grammar rules carry the same weight as discourse rules (if, indeed, both these categories can be adequately defined in the first place)? Is it safe to assume homogeneity across all speech acts, contexts, and speakers? Is it possible to factor out irrelevant or redundant information? Surely there is no theory that even approaches such a degree of specificity.
Unfortunately, no one has given a better definition of the slippery term contribute. Not many doubt that the various linguistic levels differ qualitatively; for example, grammatical phenomena differ fundamentally from sociolinguistic phenomena. But whether or not one can even hope to show how they differ quantitatively is another matter. Perhaps this can be done by (1) developing an objective global measure of real-world communicative ability, (2) developing measures of particular traits like grammatical competence and discourse competence, (3) using the measures of components to predict and explain the global measure, and (4) comparing the results to similar studies of the OPI and other tests. But this approach, as well, seems fraught with difficulty, and Valdman himself is skeptical. He points out that one of the most challenging enterprises in SLA research is to show how the variables of a model interrelate in accounting for various types of verbal communication but admits that
[u]nfortunately, our limited understanding of the process of language learning and the absence of any integrated theory of communicative competence make this interrelating process exploratory at best. (Introduction 124; emphasis mine)
Why, then, should we demand that ACTFL engage in a research program that is doomed to failure? If there is no integrated theory of communicative competence, there cannot be an integrated definition of real-life speaking ability, and the type of research Valdman alludes to is out of the question. In other words, the narrow-view argument is simply untestable. Neither ACTFL nor anyone else should try to support testing criteria on linguistic or psycholinguistic theory if no one can show that reliable results are possible, at least in principle.
A forthright evaluation of the debate on proficiency thus shows that much of the controversy stems from terms like theoretical and empirical, as well as from a number of terms of sentential logic, and that none of these terms has been accorded the precision it deserves. Chastain aptly notes that the result is the fuzzy thinking that characterizes our discussions (48). For example, Lantolf and Frawley urge that the Guidelines be prevented from penetrating any further into the foreign language curriculum than they already have until a sound theory of proficiency has been established and adequately evaluated (Understanding the Construct 184). Later in the same paper they state their intentions to outline a picture of what a serious theory of [oral proficiency] should address (186). They present no such theory, however, and they give few indications that one will come about in the foreseeable future. Indeed, much of what they write exemplifies the understandable frustration that SLA researchers seem to feel in seeking theoretical bases for language tests. At one point Lantolf and Frawley claim that
there is a difference between individuals who simply react to their world and those who actively create that world. The former, the preferred output or result of all idealized, algorithmically based models of communicative competence, respond deterministically to others who respond deterministically to them. (187)
They do not say what sort of theory might allow researchers to test objectively and empirically the hypothesis that some individuals create their world, and they do not spell out the details for incorporating such an opaque claim into a theory of language ability or even show that the claim is relevant to the business of language testing.
I am not proposing the exclusion of all empirical research and theoretical considerations from the process of test validation. It is legitimate, indeed necessary, to ask whether those who have developed a test can demonstrate its reliability to a satisfactory degree. It is legitimate to wonder whether a test score has as much to do with interviewers as it does with interviewees, and if so whether the elicitation techniques or the training procedures are to blame. Bachman has addressed this issue in depth, and his arguments must be taken seriously. It is equally legitimate, however, to raise a number of questions about the appropriateness of test criteria for particular instructional objectives at particular schools, about potential biases for or against speakers with particular backgrounds (recall the Educated Native Speaker debate; it is one thing to say that a test favors the educated, quite another to say that this favoritism is good or bad), about the type of linguistic behavior a test encourages, about the wisdom of relying on linguists and academicians to establish test criteria. None of these issues can be settled by empirical research and linguistic theory alone.
While I disagree with Freed's and Barnwell's positive assessments of the criticism of the OPI up to now, it is refreshing to note that they, along with Chastain, have moved the debate away from the poorly defined union of linguistic theory and test validation and into an arena that permits more meaningful discussion of the merits and shortcomings of proficiency testing. Barnwell's article deserves special attention for its insightful remarks on judgments by and about native speakers. And it is hardly possible to disagree with Freed's call for far-sighted leadership and careful and multifocused research (56). Moreover, Freed is historically correct in saying that the guidelines have been severely criticized for failing to acknowledge the underlying notion of communicative competence (54). But such criticism is wide of the mark because, as Valdman says, there is no integrated theory of communicative competence and, in Savignon's words, linguists and psychometricians are far from reaching a consensus on exactly what communicative competence is (130).
At least for now oral-proficiency testing cannot be analogous to judging a 100-meter sprint, where performance can be accurately documented with a stopwatch. Proficiency is much more like judging a gymnast's routine on the parallel bars, where a score is based on the impressions of an observer and is in keeping with a set of conventions and standards. Such conventions and standards cannot be adequately debated within the confines of linguistic theory. These are matters of educational policy and should be treated as such. If the line of reasoning laid out in the numerous critiques of proficiency testing were carried to its logical conclusion, the language-teaching profession would end up with a sort of linguistic solipsism in which no evaluation could take place until linguists somehow provided an objective account of the totality of language behavior.
The author is Lecturer in the Department of French at the University of California, Los Angeles.
American Council on the Teaching of Foreign Languages. ACTFL Proficiency Guidelines. Hastings-on-Hudson: ACTFL, 1986.
---. ACTFL Provisional Proficiency Guidelines. Hastings-on-Hudson: ACTFL, 1982.
Bachman, Lyle. Problems in Examining the Validity of the Oral Proficiency Interview. Valdman, Proceedings 29–43.
Barnwell, David. Proficiency and the Native Speaker. ADFL Bulletin 20.2 (1989): 42–46.
Bragger, Jeannette D. Materials Development for the Proficiency-Oriented Classroom. Foreign Language Proficiency in the Classroom and Beyond: ACTFL Foreign Language Education. Ed. Charles J. James. Lincolnwood: Natl. Textbook, 1985. 79–115.
Byrnes, Heidi. Second Language Acquisition: Insights from a Proficiency Orientation. Defining and Developing Proficiency: Guidelines, Implementations, and Concepts. Ed. Heidi Byrnes and Michael Canale. Lincolnwood: Natl. Textbook, 1987. 107–31.
Chastain, Kenneth. The ACTFL Proficiency Guidelines: A Selected Sample of Opinions. ADFL Bulletin 20.2 (1989): 47–51.
Freed, Barbara F. Perspectives on the Future of Proficiency-Based Teaching and Testing. ADFL Bulletin 20.2 (1989): 52–57.
Higgs, Theodore V., and Ray Clifford. The Push toward Communication. Curriculum, Competence and the Foreign Language Teacher. Ed. Higgs. Lincolnwood: Natl. Textbook, 1982. 57–79.
Kegley, Charles, and Jacquelyn Anne Kegley. Introduction to Logic. Lanham: Bell, 1984.
Lantolf, James P., and William Frawley. Oral Proficiency Testing: A Critical Analysis. Modern Language Journal 69 (1985): 337–45.
---. Proficiency: Understanding the Construct. Studies in Second Language Acquisition 10 (1988): 181–95.
Manicus, Peter T., and Arthur N. Kruger. Logic: The Essentials. New York: McGraw, 1976.
Omaggio, Alice C. Teaching Language in Context: Proficiency-Oriented Instruction. Boston: Heinle, 1986.
Quine, Willard V. O. Word and Object. Cambridge: MIT P, 1960.
Raffaldini, Tina. The Use of Situation Tests as Measures of Communicative Ability. Studies in Second Language Acquisition 10 (1988): 197–216.
Savignon, Sandra J. Evaluation of Communicative Competence: The ACTFL Provisional Proficiency Guidelines. Modern Language Journal 69 (1985): 129–34.
Valdman, Albert. Introduction. Studies in Second Language Acquisition 10 (1988): 121–28.
---, ed. Proceedings of the Symposium on the Evaluation of Foreign Language Proficiency. Bloomington: Committee for Research and Development in Language Instruction, Indiana U, 1987.
van Ek, J. A. The Threshold Level in a European Unit/Credit System for Modern Language Learning by Adults. Strasbourg: Council of Europe, 1975.
© 1990 by the Association of Departments of Foreign Languages. All Rights Reserved.