Murat Polat


Scoring language learners’ writing exams is a difficult task for graders because many task-relevant and task-irrelevant variables are involved in the process, such as the user-friendliness of the rubric, the difficulty of the task, students’ handwriting, and grader characteristics (being too lenient or too harsh). To obtain valid and reliable scores, institutions need to study the variables that affect scoring procedures and seek ways to control and minimise them, so that learners can be assured that their scores are genuine and assigned with as little subjectivity as possible. Analysing grader behaviour during scoring and identifying the stringent and lenient graders in the rater pool is therefore important, not only for setting the best matches of graders where multiple scoring or cross-marking sessions are applied but also for making raters aware of their own scoring habits. In this exploratory study, six writing graders, each with more than 10 years of experience in grading writing, voluntarily scored 20 student essays covering two separate tasks. Many-Facet Rasch Measurement (MFRM) was used to explore graders’ marking behaviours and to discover how those behaviours affect language learners’ test scores. The results showed that the graders, although they all used the same rubric and had sufficient expertise in grading, produced significantly different scores and displayed a significant level of stringency when scoring essays.
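For orientation, the rating-scale form of the many-facet Rasch model that is commonly used in rater studies can be written as follows. This is a general sketch and not necessarily the exact specification fitted in this study. The symbols are assumptions introduced here for illustration: $\theta_n$ is the writing ability of student $n$, $\delta_i$ the difficulty of task $i$, $\alpha_j$ the severity of grader $j$, and $\tau_k$ the difficulty of being rated in category $k$ rather than $k-1$.

```latex
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \alpha_j - \tau_k
\]
% P_{nijk}:     probability that student n receives category k
%               from grader j on task i
% P_{nij(k-1)}: probability of the adjacent lower category k-1
```

Under this formulation a larger $\alpha_j$ lowers the log-odds of the higher category for every student and task, which is what it means for a grader to be more stringent; a more lenient grader has a smaller $\alpha_j$.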





testing, subjectivity, reliability, severe graders, lenient graders, Rasch analysis, rater effect








Copyright © 2015 - 2023. European Journal of Foreign Language Teaching (ISSN 2537-1754) is a registered trademark of Open Access Publishing Group. All rights reserved.

This journal is a serial publication uniquely identified by an International Standard Serial Number (ISSN) certificate issued by the Romanian National Library (Biblioteca Nationala a Romaniei). All research works are uniquely identified by a CrossRef DOI (digital object identifier) supplied by indexing and repository platforms.

All research works published in this journal meet the Open Access Publishing requirements and can be freely accessed, shared, modified, distributed and used for educational, commercial and non-commercial purposes under a Creative Commons Attribution 4.0 International License (CC BY 4.0).