Native- and nonnative-speaking EFL teachers’ evaluation of Chinese students’ English writing

Ling Shi
University of British Columbia

This study examined differences between native and nonnative EFL (English as a Foreign Language) teachers’ ratings of the English writing of Chinese university students. I explored whether two groups of teachers – expatriates who typically speak English as their first language and ethnic Chinese with proficiency in English – gave similar scores to the same writing task and used the same criteria in their judgements. Forty-six teachers – 23 Chinese and 23 English-background – rated 10 expository essays using a 10-point scale, then wrote and ranked three reasons for their ratings. I coded their reported reasons as positive or negative criteria under five major categories: general, content, organization, language and length. MANOVA showed no significant differences between the two groups in their scores for the 10 essays. Chi-square tests, however, showed that the English-background teachers attended more positively in their criteria to the content and language, whereas the Chinese teachers attended more negatively to the organization and length of the essays. The Chinese teachers were also more concerned with content and organization in their first criteria, whereas English-background teachers focused more on language in their third criteria. The results raise questions about the validity of holistic ratings as well as the underlying differences between native and nonnative EFL teachers in their instructional goals for second language (L2) writing.

I Introduction

The fairness and construct validity of the ratings of different groups of raters using the same scale in assessing ESL/EFL (English as a Second/Foreign Language) writing is a major concern in second language (L2) writing instruction and testing (Hamp-Lyons, 1991a; Connor-Linton, 1995a; Silva, 1997).
Exploring whether L2 writers’ texts meet the expectations of native English readers, a growing number of studies have investigated whether native English speakers (NES) and nonnative English speakers (NNS) share the same judgement of the students’ writing (James, 1977; Hughes and Lascaratou, 1982; Machi, 1988; Santos, 1988; Kobayashi, 1992; Hinkel, 1994; Connor-Linton, 1995b; Kobayashi and Rinnert, 1996; Zhang, 1999; Hamp-Lyons and Zhang, 2001). Most studies on the evaluation of L2 writing have focused on either heterogeneous ESL students in British (Hamp-Lyons, 1989) and North American universities (Santos, 1988; Cumming, 1990; Brown, 1991; Vaughan, 1991; Hinkel, 1994) or a homogeneous EFL group, such as Arab university students (Khalil, 1985), Greek high school students (Hughes and Lascaratou, 1982), Israeli high school students (Shohamy et al., 1992), Chinese university students (Zhang, 1999; Hamp-Lyons and Zhang, 2001) or Japanese adult students (Machi, 1988; Kobayashi, 1992; Connor-Linton, 1995b; Kobayashi and Rinnert, 1996). Following in this tradition, the present study investigates differences between NES and NNS EFL teachers in their judgments of English writing of Chinese university English majors. It aims not only to verify previous findings but also to explore the issue of cooperation between NNS and NES EFL teachers in countries like China where writing programs are commonly taught jointly by both groups of teachers.

Address for correspondence: Ling Shi, Department of Language and Literacy Education, 2034 Lower Mall Road, UBC, Vancouver, BC, Canada V6T 1Z2; email: Ling.Shi

Language Testing 2001 18 (3) 303–325   0265-5322(01)LT206OA   © 2001 Arnold

Previous research suggests that we know little about whether the NES and NNS raters/teachers give similar holistic ratings and qualitative evaluations to the same ESL/EFL essays.
In a study comparing the evaluative criteria of 26 American ESL and 29 Japanese EFL instructors in their ratings of 10 compositions written by adult L1 Japanese EFL students, Connor-Linton (1995b) asked half of the raters from each group to rate the compositions holistically and the other half to rate the compositions with an analytical evaluation profile and then to state three reasons for their ratings. The results suggested similarities in ratings between the two groups, although in their qualitative judgments NES teachers tended to focus on intersentential discourse features, whereas NNS teachers focused on accuracy (Connor-Linton, 1995b).

In contrast to Connor-Linton (1995b), other researchers have used analytical scoring to investigate the extent to which NES and NNS teachers/raters attend to the same features and value the same qualities in student writing. In these studies, NES teachers were found to favour American English rhetorical patterns (Kobayashi and Rinnert, 1996) and clarity of meaning and organization (Kobayashi, 1992). They also differed systematically from NNS raters in their judgments in terms of students’ paragraph structuring and political/social stance (Hamp-Lyons and Zhang, 2001), in terms of ‘purpose and audience, specificity, clarity and adequate support’ (Hinkel, 1994), and in terms of ‘overall organization, supporting evidence, use of conjunctions, register, objectivity and persuasiveness’ (Zhang, 1999). Furthermore, differences were found in the raters’ judgments of error gravity. In general, NES teachers were found to be more tolerant of students’ errors than NNS teachers (James, 1977; Hughes and Lascaratou, 1982; Santos, 1988). Kobayashi (1992), however, observed that NNS instructors were more willing than NESs to accept grammatically correct but awkward sentences. These findings were all based on pre-determined categorical evaluations which might have restricted or mandated teachers’/raters’ judgments.
Some, for example, used decontextualized or edited student writing to direct the raters’ attention (Santos, 1988; Hinkel, 1994; Kobayashi and Rinnert, 1996). Research is therefore needed, using authentic writing samples and no predetermined evaluation criteria, to verify accurately whether NES and NNS teachers score L2 essays for different reasons. In addition, as these studies indicating differences in teachers’/raters’ qualitative evaluations did not compare the analytic judgments using holistic scoring, I thought it would be worth trying to verify the findings using both qualitative and quantitative judgments.

Set in the context of Mainland China, the present study parallels Connor-Linton’s (1995b) study to compare holistic scores and self-reported reasons from teachers with different ethnolinguistic backgrounds. I planned the study with a two-fold purpose, viz. to verify whether NES and NNS teacher raters differ in (1) their holistic ratings and (2) their analytical reasons or qualitative judgments in evaluating EFL students’ writing.

II Method

1 Written samples

Ten written samples were selected randomly in the fall of 1998 from writings of third-year students in the English department of a university in Mainland China. From a pool of 86 in-class written assignments gathered in no particular order, every eighth essay was selected for the study. Students were asked to write, within a 50-minute session (a common length of a lesson period in Chinese universities), a 250-word expository essay on the topic of TV and newspapers. The decision on the 250-word length was made based on the suggestion of the three class teachers who administered the task, in an attempt to match the present task with other existing writing tasks in the program. Most students, however, produced much longer essays. The 10 essays selected averaged 292 words. (For Essay 1 as an example of essays in the middle range of scores, see Appendix 1.)
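The every-eighth-essay selection from the pool of 86 assignments can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a record of how the selection was actually carried out: the function name `systematic_sample` is hypothetical, and the text does not say where counting began, so the sketch assumes the eighth essay in the pile was the first one chosen.

```python
def systematic_sample(pool_size: int, step: int) -> list[int]:
    """Return the 1-based positions of every `step`-th item in a pool.

    Assumption (not stated in the study): counting starts at the top of
    the pile, so the first selected position equals `step`.
    """
    return list(range(step, pool_size + 1, step))


# Figures taken from the text: 86 assignments, every eighth one selected.
positions = systematic_sample(pool_size=86, step=8)

print(positions)       # [8, 16, 24, 32, 40, 48, 56, 64, 72, 80]
print(len(positions))  # 10
```

Under this assumption the procedure yields exactly 10 essays, which agrees with the sample size reported for the study.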
Apart from length, the writing prompt was also a result of negotiation with the class teachers to incorporate the task into their teaching routines. Here is the prompt that was used in this study:

Nowadays with the popularity of televisions people gain daily news more conveniently. Some people even begin to play down the advantage of newspapers, arguing that it is time that they were replaced by television. To what extent do you agree or disagree with this statement? Give support for your argument.

2 Teacher raters

The 10 written samples were first sent, together with a letter of invitation to participate in the study and a background questionnaire, to 70 English expatriates teaching in various tertiary institutions in all parts of China. All of them had by then been teaching in China for a minimum of six months and a maximum of about a year. The name list was provided by Amity Foundation, a Christian organization that helps Chinese universities to recruit native English teachers. Twenty-four of the NES EFL teachers responded to the letter and sent back the completed questionnaires and their quantitative and qualitative evaluations of the essays. I then invited 24 EFL NNS teachers, who were either working or taking an in-service teacher training program in the university where data were collected, to evaluate the same set of essays. In each of the groups, there was one rater who missed ratings for several of the essays, which made for an equal number of 23 raters in each group for the study.

The information summarized from the questionnaire suggested that the participating raters were mostly experienced teachers with some professional training. At the time of data collection, the 46 raters, as they reported in the questionnaires, were teaching at 23 tertiary institutes in 12 cities of China. Of the 23 NES teachers, 14 were US citizens, 5 British, 2 Canadian, and 2 Norwegian. (The two Norwegians are considered NESs in this study because they were expatriates who had learnt and used English since elementary school. With the emphasis on multilingualism in the school system and English being the second most important language, many educated Norwegians are bilingual.) Summarizing the rater profiles, Table 1 shows that more than half (frequency of 31, 67%) of the raters were female and 15 (33%) were male. In terms of educational background, about half (frequency of 23, 50%) had a master’s degree and 17 (37%) had an undergraduate degree. Most raters (frequency of 34, 74%) said they had teacher training, whereas 11 (8 NNSs and 5 NESs) reported that they had no such training. Of the 46 raters, 18 (39.1%) had taught English for 1 to 5 years, 11 (23.9%) 6 to 9 years, and 17 (37%) about 10 years. In general, the NES group appeared to be more educated than the NNSs (4 NES teachers had a PhD whereas 1 NNS had only a high school diploma) although the latter seemed to have more teaching experience than the former. (12 NNSs had taught for about 10 years compared with only 5 NESs who had similar experience.)

