Language Testing

Case study of a speaking ability test
Establishing test form and individual task comparability: a case study of a semi-direct speaking test
Cyril J. Weir, University of Luton, and Jessica R.W. Wu, Language Training and Testing Center, Taiwan
Language Testing 2006; 23; 167. DOI: 10.1191/0265532206lt326oa
© 2006 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. Downloaded from http://ltj.sagepub.com at SWETS WISE ONLINE CONTENT on October 6, 2007.

Examination boards are often criticized for their failure to provide evidence of comparability across forms, and few such studies are publicly available. This study aims to investigate the extent to which three forms of the General English Proficiency Test Intermediate Speaking Test (GEPTS-I) are parallel in terms of two types of validity evidence: parallel-forms reliability and content validity. The three trial test forms, each containing three different task types (read-aloud, answering questions and picture description), were administered to 120 intermediate-level EFL learners in Taiwan. The performance data from the different test forms were analysed using classical procedures and Multi-Faceted Rasch Measurement (MFRM). Various checklists were also employed to compare the tasks in different forms qualitatively in terms of content. The results showed that all three test forms were statistically parallel overall and Forms 2 and 3 could also be considered parallel at the individual task level.
Moreover, sources of variation to account for the variable difficulty of tasks in Form 1 were identified by the checklists. Results of the study provide insights for further improvement in parallel-form reliability of the GEPTS-I at the task level and offer a set of methodological procedures for other exam boards to consider.

Language Testing 2006 23 (2) 167–197. 10.1191/0265532206lt326oa. © 2006 Edward Arnold (Publishers) Ltd
Address for correspondence: Cyril Weir, Powdrill Chair in English Language Acquisition, Luton Business School, Putteridge Bury, Hitchin Road, Luton, LU2 8LE, UK

I Introduction

In 1999, to promote the concept of life-long learning and to further encourage the study of English, the Ministry of Education in Taiwan commissioned the Language Training and Testing Center (LTTC) to develop the General English Proficiency Test (GEPT), with the aim of offering Taiwanese learners of English a fair and reliable English test at all levels of proficiency. The test is administered at five levels – Elementary, Intermediate, High-Intermediate, Advanced, and Superior – each level including listening, reading, writing, and speaking components.

A major consideration in developing a speaking proficiency component for use within the GEPT program was that it be amenable to large-scale standardized administration at GEPT test centers island-wide. For the GEPT Intermediate Speaking Test (GEPTS-I), with normally over 20,000 candidates in each administration, it was considered too costly and impractical to use face-to-face interviews, involving direct interaction between the candidate and an interlocutor who would have had to be a trained native or near-native speaker of English.
A semi-direct tape-mediated test conducted in a language laboratory environment was more feasible.

In tests such as the GEPTS-I, limited availability of language laboratory facilities necessitates the use of different test forms in multiple administrations to enhance test security. As a consequence, demonstrating the comparability of these forms is essential to avoid criticisms of potential test unfairness. The administration of multiple forms of a test in independent sessions provides alternate-form coefficients, which can be seen as an estimate of the degree of overlap between the multiple forms. Thus, in the quantitative aspect of this study, candidates’ scores achieved on one form were compared statistically with scores achieved by them on an alternate form.

Parallel-form reliability may be influenced by errors of measurement that reside in testing conditions and other contextual factors. The quantitative analysis of test score difficulty was thus complemented in this study by collection of data on rater perceptions of a number of contextual parameters with regard to each individual task type in the GEPTS-I tests.

Skehan (1996) attempted to identify factors that can affect the difficulty of a given task and which can be manipulated so as to change (increase or decrease) task difficulty. Skehan proposes that difficulty is a function of code complexity (lexical and syntactic difficulty), cognitive complexity (information processing and familiarity), and communicative demand (time pressure).

A number of empirical findings have revealed that altering task difficulty along a number of these dimensions can have an effect on performance, as measured in the three areas of accuracy, fluency, and complexity (Robinson, 1995; Foster and Skehan, 1996; 1999; Skehan, 1996; 1998; Skehan & Foster, 1997; 1999; Mehnert, 1998; Norris et al., 1998; Ortega, 1999; O’Sullivan et al., 2001; Wigglesworth, 1997). Recent research such as that by Iwashita et al.
(2001) has raised some doubts over the findings on some of the effects of these variables on performance and generated interest in possible reasons for such differences in findings.

However, the focus for our particular study is not the actual effects on performance of intra-task variation in terms of these difficulty parameters but whether these variables are in fact equivalent in the three comparable tasks under review. Therefore, in evaluating whether the three trial test forms of the GEPTS-I and the three tasks within them are equally difficult, it was thought useful to determine equivalence according to the parameters established earlier in this intra-task variability research: code complexity, cognitive complexity, and communicative demand.

II Comparability of forms and tasks

1 Establishing evidence of statistical equivalence

Exam boards have been the subject of criticism for not demonstrating the parallelness of forms (or tasks within these) used in and across administrations. Spolsky (1995) makes this criticism of examinations, echoing Bachman et al. (1995), and it was repeated in Chalhoub-Deville and Turner (2000). Failure to address this issue must cast serious doubt on the comparability of tests/examinations across forms and raise concern over their value for end users.

In addition to establishing parallelness of test forms we are also concerned in this article with establishing parallelness at the task level across test forms. The type of intra-task variation research referred to above must be contingent on having two parallel tasks and then manipulating a variable of interest in one of them; otherwise, all subsequent comparisons are flawed. In much of this research (one of the few exceptions being Iwashita et al., 2001), there appears to be little evidence that the parallelness of the tasks employed had been established prior to any manipulation in respect of a single variable, which must cast some doubt on findings.
In research by exam boards to explore the effects of any potential changes to a test task – e.g. increasing planning time, providing structured prompts, or providing an addressee – it is obviously a sine qua non that two parallel tasks first need to be established before the effect of the change can be investigated.

The administration of parallel (alternate) forms of a test in independent sessions provides us with alternate-form coefficients. The two tests are normally given to an appropriate single group of learners with a short period between the two administrations.
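
As a minimal sketch of the alternate-form coefficient described above: one common way to estimate it is the Pearson correlation between the scores that the same candidates obtain on two forms. The text does not state at this point which statistic was computed, so the choice of Pearson's r, the function name alternate_form_coefficient, and the five pairs of scores below are all assumptions made for illustration. The scores are invented and are not data from the study.

```python
from math import sqrt


def alternate_form_coefficient(form_a, form_b):
    """Pearson correlation between paired scores on two test forms.

    form_a[i] and form_b[i] must be the scores of the same candidate.
    (Hypothetical helper for illustration; not the authors' procedure.)
    """
    if len(form_a) != len(form_b):
        raise ValueError("each candidate needs a score on both forms")
    n = len(form_a)
    if n < 2:
        raise ValueError("need at least two candidates")

    mean_a = sum(form_a) / n
    mean_b = sum(form_b) / n

    # Sums of cross-products and squared deviations from each form's mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(form_a, form_b))
    var_a = sum((a - mean_a) ** 2 for a in form_a)
    var_b = sum((b - mean_b) ** 2 for b in form_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores on each form must vary across candidates")

    return cov / sqrt(var_a * var_b)


# Invented scores for five candidates who took both forms (assumption).
form_1 = [3, 4, 2, 5, 4]
form_2 = [3, 5, 2, 4, 4]

r = alternate_form_coefficient(form_1, form_2)
print(f"alternate-form coefficient: {r:.2f}")
```

A coefficient near 1 would indicate that the two forms rank candidates in much the same way. It does not by itself show that the forms are equally difficult, because a correlation is unaffected when every score on one form is shifted by a constant; a comparison of mean scores is needed as well.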


Jul 23, 2017