European Journal of English Language Teaching
ISSN: 2501-7136, ISSN-L: 2501-7136
Available online at: www.oapub.org/edu
Volume 2, Issue 1, 2017
doi: 10.5281/zenodo.268576

TRANSITIONING TO AN ALTERNATIVE ASSESSMENT: COMPUTER-BASED TESTING AND KEY FACTORS RELATED TO TESTING MODE

Hooshang Khoshsima (1), Seyyed Morteza Hashemi Toroujeni (2)
1 Ph.D., Associate Professor, English Language Department, Chabahar Maritime University, Chabahar, Iran
2 M.A. in TEFL, English Language Department, Chabahar Maritime University, Chabahar, Iran

Abstract:
Computer-Based Testing (CBT) is becoming widespread because of its many recognized merits, including productive item development, flexible test delivery, self-selection options for test takers, immediate feedback, results management, and standard setting. Transitioning to CBT has raised concern over the effect of administration mode on test takers' scores compared with Paper-and-Pencil-Based Testing (PPT). In this comparability study, we compared the effects of the two media (CBT vs. PPT) by investigating the score comparability of a General English test taken by Iranian graduate students studying at Chabahar Maritime University, to see whether the scores obtained from the two testing modes differed. To this end, two versions of the same test were administered to 100 intermediate-level test takers organized in one testing group on two separate testing occasions. A paired-samples t-test comparing the means revealed an advantage of CBT over PPT, with a mean difference of .03 at p < .05. An ANOVA indicated that the two external moderator factors, prior computer familiarity and attitudes towards computers, had no significant effect on test takers' CBT scores. Furthermore, the greatest percentage of test takers preferred the test features presented in the computerized version of the test.

Keywords: computer-based testing, paper-and-pencil-based testing, computer familiarity, computer attitude, test preference

Copyright © The Author(s). All Rights Reserved. © 2015 – 2017 Open Access Publishing Group

1. Introduction

Advances in technology have always played an impressive role in the development of human life. Sometimes technological developments influence human life so strongly that scholars and sociologists categorize human history according to the technological tools produced. Technology has been changing the way we live, work, think, communicate and interact with others, and its strong, continuous impact on all aspects of our lives is obvious (Challoner, 2009). According to the assessment researcher Bennett, the transformative impact of new technology on the assessment domain makes it possible to build tests around carefully conceptualized preconditions and qualifications and thereby assess whether someone can manage a task well and satisfactorily. He also notes that technological assessment tools allow test takers' performance to be assessed practically through computer-based simulations, support item and item-bank creation, and streamline the scoring process. In addition, large-scale test delivery is made possible by using technology and computers in the assessment domain (Bennett, 1999, p. 11).
New types of assessment have been taken up in educational settings in the USA in order to incorporate CBT into the assessment field and to help test designers provide the same test conditions as a paper-based test for all test takers, regardless of test population size (Al-Amri, 2009). Although serious discussion of the development of Computer-Based Testing (henceforth CBT), and a great deal of research on developing and implementing high-stakes computerized testing programs, began in the 1970s with leading works such as the ASVAB (Armed Services Vocational Aptitude Battery) program run by the US Department of Defense, the Graduate Record Examination (GRE) and the Test of English as a Foreign Language (TOEFL), the real history of computerized fixed-length testing goes back to the 1930s. The IBM Model 805 machine used in 1935 is recorded as the first attempt to use machines in the testing domain; it was designed to score the objective tests of millions of American test takers each year. The use of computers in language testing gave birth to an independent discipline, CBT, whose growth has been expedited by CAL (Computer-Assisted Learning). CBT has changed the nature of the language assessment field with its potential benefits and capabilities. In fact, CBT may assist the field by helping to overcome many common administrative and logistical problems that are widespread in the traditional fixed-length testing environment. By offering new approaches and basic advantages such as easier and more precise test scoring and reporting, item innovation, item generation, greater security, standardization, test efficiency, elimination of test booklets and answer sheets, more flexible scheduling and reduced measurement error, CBT has opened new windows and laid foundations for the future of educational testing.

In examining perceptions of CBT, several issues have been identified that organize its advantages and challenges. The most important benefits of CBT are the innovation, efficiency and productivity that can be achieved in this area (Al-Amri, 2009). In CBT, input materials can be presented as text, graphics, audio and video, which simulate target-language situations and enhance the authenticity of test tasks by increasing the interaction between test takers and tasks. In education, CBT is also used to administer tests that evaluate the language proficiency of English learners (Fleming & Hiple, 2004). CBT assesses a test taker's language ability accurately by providing more efficient standardization of test administration conditions (Al-Amri, 2009). The same, consistent test conditions are provided by test developers in CBT (Al-Amri, 2009), and the same instructions, materials and information are presented in a consistent and uniform way to all test takers, regardless of the test population's size or the place and time of testing. Moreover, unlike paper examinations in conventional classrooms, CBT can display scores on screen immediately, giving test takers instant feedback.
Immediate feedback, accurate test result reports and the possibility of printing basic testing statistics are further advantages of using computers in assessment, and they enable test takers to take the test at any time (Mojarrad et al., 2013). CBT provides improved test security, requires less time to finish (Laurier, 1999), creates a more positive attitude towards the test (Madsen, 1986) and individualizes the test experience.

The issue that currently needs more attention and prompt investigation is the effect of testing mode on the comparability and equivalency of data obtained from the two modes of presentation, i.e. traditional paper-and-pencil testing (PPT) and computerized testing. Comparability studies of second language tests are in short supply, and the importance of conducting comparability studies in local settings to detect any potential test-delivery-medium effect, especially when a traditional PPT is converted to a computerized one, should be recognized. The critical issue of establishing the comparability and equivalency of a computerized test with its paper-and-pencil counterpart is of prime importance. Some research has focused on the equivalency of computer- and paper-administered tests in terms of scores (Choi, Kim, & Boo, 2003; Kenyon & Malabonga, 2001; Khoshsima & Hashemi, 2017). Recent studies indicate that, in order for a computer-based test to replace its conventional paper-and-pencil counterpart, we need to show that the two versions of the test are comparable; in other words, that the validity and reliability of the computerized counterpart are not violated. In practice, however, there is no agreed-upon theoretical explanation for test mode effects. Comparability is achieved through equivalent scores on the two test versions. Since computerized testing in Iran is still at an early, experimental stage, the present study was conducted to provide helpful and informative findings for learners, teachers, testing practitioners and researchers who want to know whether paper-and-pencil tests can be replaced with computerized ones. In this study, the effect of testing mode on the final performance of test takers is investigated to show whether there is any significant difference between the two versions of the same test, that is, whether there is any discrepancy that violates the reliability and validity of the computerized counterpart, the version that is intended to replace the conventional paper-and-pencil test. In the study by Choi, Kim and Boo (2003), cross-mode differences in the means of the listening, grammar and vocabulary subtests were examined, and the largest cross-mode discrepancy was observed in the reading comprehension subtest. Concerning the relationship between computer familiarity, the most frequently cited contributor to score differences, and examinee performance on both forms of testing, Wallace and Clariana (2005) reported that learner characteristics such as computer experience were associated with higher post-test performance on the computerized (in their case, web-based) test. They also found that lower-ability learners were less familiar with computers.
Watson (2001) also reported that, although there was no relationship between age or sex and students' performance, students with higher academic attainment and those who used computers more frequently benefited most from computer-based instruction. In addition, other studies show that students with a good knowledge of computer use feel freer and more comfortable with computerized testing (O'Malley, Kirkpatrick, Sherwood, Burdick, Hsieh, & Sanford, 2005; Poggio et al., 2005). Prior computer experience can therefore be introduced as one of the most critical variables causing discrepancies in performance across testing modes. Other studies, however, have reached indefinite conclusions about the impact of computer familiarity on performance. In one investigation, Lee (1986) distributed a computer experience questionnaire among participants and administered an arithmetic reasoning test on paper and on computer, concluding that low- and high-computer-use groups showed no significant differences in performance.

Furthermore, individual characteristics of test takers may provide a cornerstone for a theory explaining the foundational aspects of test performance in two different testing modes. Inevitable questions about test takers' reactions and attitudes towards the computerized version of a paper-and-pencil test were raised after the worldwide introduction of the computerized Test of English as a Foreign Language, which evaluates the general English proficiency of those whose native language is not English. Some of the factors that determine attitudes towards the use of computers in a testing setting are computer familiarity, knowledge level, skills and abilities, ease of access to computers, formal computer training, gender and others. Because these issues may affect test-taking motivation and test performance, and thereby test validity, they are of prime importance (Ryan & Ployhart, 2000). Student preference may be considered another factor whose relationship with test takers' performance on CBT should be examined. Some students already have the necessary familiarity and experience with computers from playing games and receiving some of their instruction through computers. Because the assessment can be customized to personal preferences, some people prefer to take the CBT version of a test; for instance, students may be able to select their own background color and font size on the computer screen. Although some students may prefer CBT, others may prefer the paper-and-pencil-based test (Cater et al., 2010; Russell et al., 2010). Some test takers prefer the paper-based testing process because they are accustomed to taking notes and circling questions and/or answers for later review.

2. Literature Review

Computerized testing has been implemented increasingly widely across the world. Countries such as the United States of America and the United Kingdom have been using computers in their testing and assessment environments for about three decades. When computerized versions of examinations appeared, researchers began making comparisons between PPT and CBT, and comparability studies were consequently conducted to study the testing mode effect.
Translating a paper-and-pencil assessment into a computerized version usually requires that the computerized form be comparable to its conventional paper-and-pencil counterpart and that the scores and results obtained from the two identical test forms approximate each other. Interchangeability is required when students may take the same test in either mode. In fact, the validity of the computerized version of a test must be confirmed by the same methods used to establish the validity of its traditional counterpart. According to the American Educational Research Association (AERA), when more than one method is used to administer a test or to record the marks obtained from it (such as marking the right answers in a booklet, on a separate answer sheet, or on screen), the guidelines and instructions should state clearly that the scores received from these methods are equivalent and interchangeable (American Educational Research Association, 1999, p. 70). Empirical research on cross-mode comparability should be conducted to determine whether test scores are equivalent across modes before CBT replaces PPT. Although CBT offers some benefits over its traditional counterpart (Poggio, Glasnapp, Yang, & Poggio, 2005), the comparability and equivalency of test scores between the two administration modes have been a real concern for educators, scholars, practitioners and designers in the assessment field (Lottridge, Nicewander, Schulz, & Mitzel, 2008). Evaluating the comparability of CBT and PPT scores is critical before introducing computerized assessment into any educational context. The main objective of a comparability study is to determine whether the results obtained from two versions of the same test are equivalent. The International Guidelines on Computer-Based Testing (International Test Commission, 2006) state that scores received from CBT and its conventional counterpart should be equivalent. The standards stated by the International Test Commission are also supported by classical true-score test theory, which is considered the cornerstone of both CBT and PPT (Allen & Yen, 1979). According to this theory, a test taker is expected to receive nearly the same test scores in the two modes of administration. These standards have been examined in many comparability studies and supported by some of the empirical ones (e.g. OECD, 2010). According to Boo and Vispoel (2012), the scores obtained from computer- and paper-based tests were comparable in terms of internal consistency, criterion and construct validity, means and standard deviations; test takers also preferred the computer counterpart of the conventional paper-based test and had positive attitudes towards it. Choi, Kim, and Boo (2003) reported that the results of the paper and computer versions of a standardized English language test administered to postsecondary-level language learners were comparable across the listening and reading comprehension, grammar and vocabulary subtests, which confirmatory factor analysis showed to measure the same constructs. A more comprehensive and detailed investigation of these subtests indicated that the reading comprehension and grammar subtests showed the weakest and strongest comparability, respectively (p. 316).
In a recent comparability study, Khoshsima and Hashemi (2017) concluded that test takers' scores did not differ between PPT and CBT; their findings confirmed the equivalency of the scores obtained from the two testing modes. The Florida Department of Education (2006) reported that early examinations of the relationship between computer familiarity and test performance showed a significant difference: empirical evidence confirmed lower scores for test takers who had less experience and familiarity with computers. It also asserted, however, that recent studies show no relationship between the two (Florida Department of Education, 2006). In another study, no relationship was found between prior computer experience and performance on the computerized TOEFL (Taylor et al., 1999). Since some students cite unfamiliarity with the computerized testing mode as the main reason for their failure on this kind of test and complain that their computerized test score does not represent their real language proficiency, further examination of prior frequent computer use as a moderator variable in CBT is needed.

Attitudes towards the computerized test also play a crucial role in implementing CBT successfully. Attitudes towards computers can be influenced by contextual factors such as age, gender and socioeconomic status. Although prior attitudes towards computers may be directly related to prior computer experience, the two constructs are distinct. According to Eagly and Chaiken (1998), attitude is a positive or negative feeling towards a psychological object. In another definition, Loyd and Gressard (1985) name four components that make up attitude towards computers: anxiety, confidence, liking and usefulness. Al-Amri (2009) used selected sections of the CAS questionnaire to study learners' attitudes towards computer use; although his students showed a high preference for CBT, his findings indicate no relationship between learners' attitudes and their performance on CBT. A similar study was conducted by Yurdabakan and Uzunkavak (2012) on learners' attitudes towards computers and CBT in both private and state schools. A piloted, researcher-made attitude scale was distributed among the 784 Turkish primary school learners who participated in the study. The collected data indicated no significant difference in attitudes towards computers, although the students of state schools showed more positive attitudes towards CBT. Overall, no association was found between attitudes towards CBT and performance (Yurdabakan & Uzunkavak, 2012).

In addition to computer familiarity and computer attitude, test takers' testing mode preference, typically examined in relation to high-stakes standardized test administration, has attracted much attention in recent research. Like this study, many studies have examined test takers' preference for a testing administration mode (Al-Amri, 2009; Flowers et al., 2011; Higgins et al., 2005; Khoshsima et al., 2017; Yurdabakan & Uzunkavak, 2012). Testing mode preference is a contributing factor that should be considered in comparability studies. In the research conducted by Flowers et al.
(2011), there was a high preference for CBT, and test takers' preference was negatively correlated with their performance on CBT: although test takers showed a high preference for taking CBT, they performed better on PPT. According to Al-Amri (2009), although test takers preferred to take CBT, their performance was better on PPT; his findings show no relationship between test performance and testing mode preference. In another similar study, no correlation was found between test takers' testing mode preference and their testing performance (Khoshsima & Hashemi, 2017).

Since evaluating the comparability of paper-based and computer-based tests is crucial before introducing computer-aided assessment into any context, the present study first examines cross-mode effects on test takers' General English scores. Its second purpose is to examine the relationship of computer familiarity, prior attitudes towards computers and testing mode preference with testing performance on the CBT version. Considering both theoretical and pedagogical perspectives, the following questions are addressed:

RQ1. Is there any statistically significant difference between computer-based language testing and paper-and-pencil-based testing when assessing the General English of Iranian graduate students?
RQ2. Is there any relationship between the two external variables of computer familiarity and prior attitudes towards computers and Iranian graduate students' testing performance on the CBT version of the test?
RQ3. Do participants' prior testing mode preferences affect their performance on CBT?

3. Methodology

3.1 Research design
The present research, which comprised both a comparison study and a correlational study, explored the comparability of paper- and computer-based testing in a General English context and the correlation between testing performance on computer-based language testing, as compared with the paper-based version, and several external moderator factors, namely test taker characteristics such as computer attitude, prior computer experience and testing mode preference, which were believed to be meaningfully related to performance (Warner, 2013). In order to reach more solid conclusions, a mixed-methods approach including both qualitative and quantitative instruments was used to investigate the difference between test results, owing to its advantages such as easy and fast data collection, consistency and accuracy of the collected data, and suitable descriptive and inferential results. The mixed-methods design of the study combined the multiple-choice achievement tests, questionnaires and interviews that were employed.

3.2 Participants
The participants in the present study were 100 graduate students of Chabahar Maritime University. After the New Interchange placement test was administered to 186 graduate students to identify intermediate-level students, 128 homogeneous students were selected.
Twenty-eight participants were removed because they were unwilling or unable to complete the study. Of the remaining 100 participants, who were assigned to one testing group to take the two versions of the same test, slightly more were female (60%) than male (40%). The ages of the 100 students who signed the consent form ranged from 23 to 28 years, and the mean age was 24.5 (Table 1).

Table 1: Gender frequency distribution
                Testing group one             Testing group two
Gender       frequency   percentage       frequency   percentage
Male             22          44                18          36
Female           28          56                32          64
Total            50         100                50         100

3.3 Instruments
The New Interchange Placement Test was administered to the participants to check their homogeneity and to make sure that they were homogeneous in terms of general English knowledge and proficiency. The testing group took two versions of a test derived from a General English book in separate testing sessions with a four-week interval, which was intended to mitigate practice, fatigue and testing effects. The study employed a General English multiple-choice achievement test as the main research instrument for comparing the mean scores received from the two testing modes. The paper version of the test was converted into a computer version using the ClassMarker.com website. Unlike the paper-based format, in which all the question items were presented on three pages, the CBT version presented test takers with one question per screen. When a question item appeared, the test taker clicked on the letter of the chosen answer and then proceeded to the next item. As in PPT, test takers could review previously answered questions and change their answers, given the nature of this kind of computerized testing. The order of the items was the same in both versions of the test. To examine the internal consistency (Cronbach's alpha) of the test in each testing mode, the responses of the testing group were analyzed, and relatively high reliability coefficients (α = . and α = . for PPT and CBT, respectively) were obtained.

The second procedure employed in this research addressed research question two: whether there was any relationship between the two external moderator variables, computer familiarity and prior computer attitudes, and test takers' performance on CBT. To meet this objective, the standard Loyd-Gressard Computer Attitude Scale (Loyd & Gressard, 1985), validated by Berberoglu and Calikoglu (1992), was distributed to the test takers after the CBT version of the test had been administered. A high reliability coefficient for the total score was reported by Loyd and Gressard (1985), and Christensen and Knezek (1996) also reported a high reliability coefficient of .95 and stable factorial validity. After the internal consistency of the CAS questionnaire distributed to the participants was examined, a reliability coefficient of .84 was obtained for this study.
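For readers who wish to reproduce this kind of reliability check, the short sketch below shows how a Cronbach's alpha coefficient can be computed from an item-response matrix. It is only an illustration under assumed inputs: the 0/1 response matrix is hypothetical, not the study's data.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 0/1 responses of 100 test takers to 50 multiple-choice items;
# real item-level data would be needed to obtain a meaningful coefficient.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(100, 50))
print(f"alpha = {cronbach_alpha(responses):.2f}")
```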
Another instrument, used to collect data for the third research question, was a simple question placed at the bottom of the exam paper and of the screen, namely "Would you prefer taking the test: on paper – no difference – on computer?", intended to examine the relationship between testing mode preference and performance. Given the importance of this relationship when conducting PPT and CBT, the third research question examined the correlation between test takers' testing mode preference and their performance on either testing mode. The last, qualitative, instrument was a formal semi-structured interview through which a series of data was collected and coded for quantitative analysis. The qualitative data gathered to support the quantitative data came from semi-structured interviews with 30 participants who were randomly selected from the testing group. Based on the previous literature, the interview questions were developed by the researcher and then content-analyzed by two TEFL instructors at CMU.

3.4 Procedure
The New Interchange Placement Test was administered to 186 graduate students to check their homogeneity, and the intermediate-level students were selected to participate in the research. The testing group was then given both versions of the General English multiple-choice achievement test in two separate testing sessions with a four-week interval. At the end of each exam, the testing group answered the simple question "Would you prefer taking the test: on paper – no difference – on computer?" so that the relationship between testing mode preference and testing performance could be explored. Before taking the CBT version, test takers completed a simple sample computerized task and received oral instruction on how to take the computerized version of the test. After becoming familiar with the CBT environment, every test taker was given a unique registration code to register in the group created on the website. Test takers had 40 minutes to answer 50 question items (the time given to complete the sample exercise before the CBT administration was not included). In the on-screen test, students received one question per screen, clicked on the letter of the answer of their choice and then proceeded to the next question. As in paper-based testing, students could go back, review and change previously answered questions in CBT. In the last stage, formal semi-structured interviews were conducted, through which related qualitative data were collected and coded for quantitative analysis. The participants were asked about their attitudes towards the features of the two modes of test administration, their testing mode preference, the development of positive or negative attitudes, and their reasons for any change of mode preference; participants who changed their preference after taking CBT were also asked about their reasons for doing so. In the focus-group semi-structured interviews, the participants were asked a series of pre-determined, open-ended questions based on a list of topics in a particular order (the interview guide). The researcher used the interview guide, printed on paper, during the conversations in order not to stray from the interview procedure. The interview with each participant took about 7-10 minutes.
In total, the 30 interviews took about 250 minutes in one session. The components of the interview were a brief introduction to CBT and its history, some questions about the participants' testing mode preference, and their comments about the features of CBT and PPT.

4. Results and Discussion

The usual criteria for comparability are psychometric characteristics such as the distribution, rank and correlation of scores on the two tests (Choi et al., 2003), the shape of the score distribution, reliability, and the conditional standard error of measurement (Wang & Kolen, 2001). These criteria, usually considered in comparability studies of CBT and PPT, are compatible with those declared by testing organizations such as the International Test Commission (ITC) and the American Psychological Association (APA). The ITC states that the designers of computerized tests should produce interchangeable scores whose means and standard deviations are the same as those of their PPT counterparts (International Test Commission, 2006, pp. 156-157). Most research comparing PPT and CBT has focused on differences in means and standard deviations (e.g. Khoshsima et al., 2017; Makiney, Rosen, & Davis, 2003; Pinsoneault, 1996).

Before exploring the comparability of paper- and computer-based testing in the General English context with a paired-samples t-test, we examined the normality of the data distribution. The Shapiro-Wilk and Kolmogorov-Smirnov tests were used to provide an objective judgment of normality; the results are displayed in Table 2.

Table 2: Normality distribution test
         Kolmogorov-Smirnov                  Shapiro-Wilk
         Statistic   D.F.   Sig.             Statistic   D.F.   Sig.
PPT        .115      100    .890               .931      100    .912
CBT        .165      100    .868               .954      100    .951

From Table 2, it was concluded that the data obtained from the PPT and CBT versions of the General English test, administered to the testing group of graduate students in two separate testing sessions, were normally distributed. We continued the analysis with a paired t-test, the main goal of which was to examine whether there was any statistically significant difference in participants' performance on PPT and CBT. According to the results, the mean score of test takers on PPT (M = 2.48, SD = .16135) was lower than their mean score on CBT (M = 2.51, SD = .15982) (Table 3). Of the two versions of the test, the higher mean score was therefore found for the performance of the testing group on CBT; furthermore, the higher standard deviation for PPT indicated that the dispersion of scores around the mean was lower for CBT.

Table 3: Descriptive statistics of test scores in both PPT and CBT
         Mean      N     Std. Deviation   Std. Error Mean
PPT     2.4815    100       .16135            .03574
CBT     2.5155    100       .15982            .03608

According to the inferential analysis, there was a statistically significant difference between test takers' mean scores on PPT (M = 2.48, SD = .16) and CBT (M = 2.51, SD = .15); t(98) = -4.773, p = .000 (Table 4). It can be concluded that there is a statistically significant difference between the mean scores of the graduate students on the PPT and CBT versions of the test. The results of the paired t-test comparing the mean scores of the two test modes are shown in Table 4. The aim of this test was to gather further evidence on whether the two testing administration modes produced interchangeable results. According to Table 4, the t-test yielded a t value of -4.773 with 98 degrees of freedom, and the corresponding two-tailed p-value of .000 was smaller than 0.05.

Table 4: Paired t-test results for both PPT and CBT modes of administration
Paired differences (PPT - CBT)
Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper     t       D.F.   Sig. (2-tailed)
-.03400      .03185            .00712          -.04891        -.01909     -4.773     98        .000

To answer the second research question, an ANOVA was used to examine whether computer familiarity and computer attitudes had a significant effect on students' testing performance. The results in Table 5 indicate that the observed F value for students' prior computer familiarity and CBT performance is 1.82 (p = .895 > 0.05). Based on these results, it can be concluded that students' computer familiarity does not have any significant interactive effect on CBT performance. Additionally, the observed F value for the effect of prior attitudes towards computers on CBT performance is 1.87 (p = .456 > 0.05); therefore, it can also be concluded that prior computer attitudes do not have any significant influence on test takers' CBT performance. Based on these findings, no significant relationship was seen between the participants' attitudes towards computers and their CBT performance.

Table 5: ANOVA results for the interactive effect of computer familiarity and attitudes on CBT performance
Source                                                        D.F.     F      Sig.
Mode x computer familiarity scale (Sphericity Assumed)         1      1.82    .895
Mode x computer attitude scale (Sphericity Assumed)            1      1.87    .456

To answer research question three, the relationship between testing mode preference and testing performance was examined. To this end, the correlation between participants' responses to the simple question appearing at the end of the PPT exam, i.e. "Would you prefer taking the test: 1. On paper 2. No difference 3. On computer", and their mean scores on the CBT version of the test was examined. The answers that participants gave were coded as 1, 2 and 3 for "On paper", "No difference" and "On computer", respectively. Tables 6 and 7 display the correlations between the pre- and post-CBT testing mode preferences and CBT testing performance.
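As a rough illustration of the quantitative analyses reported in this section, the sketch below runs a normality check, a paired-samples t-test across the two administration modes, and a Pearson correlation between coded preference responses and CBT scores with SciPy. The arrays are hypothetical stand-ins, not the study's data, and the repeated-measures ANOVA on the moderator variables is not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical paired scores of the same 100 test takers on PPT and CBT
ppt = rng.normal(loc=2.48, scale=0.16, size=100)
cbt = rng.normal(loc=2.52, scale=0.16, size=100)

# Normality of each score distribution (Shapiro-Wilk)
print(stats.shapiro(ppt))
print(stats.shapiro(cbt))

# Paired-samples t-test comparing the two testing modes
t_stat, p_value = stats.ttest_rel(ppt, cbt)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Preference coded 1 = paper, 2 = no difference, 3 = computer (hypothetical)
preference = rng.integers(1, 4, size=100)
r, p = stats.pearsonr(preference, cbt)
print(f"r = {r:.3f}, p = {p:.3f}")
```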
Table 6: Correlations of pre-CBT testing mode preference and mean of CBT scores
                                                        Mean of CBT
Pre-CBT testing mode preference   Pearson Correlation      .142
                                  Sig. (2-tailed)          .312
                                  N                         100

The Pearson product-moment correlation was run to examine the relationship between pre-CBT testing mode preference and testing performance. According to the results, for the testing group, participants' answers to the first testing mode preference question (M = 1.86, SD = .89) and their CBT performance (M = 2.48, SD = .161) were not significantly correlated, r(98) = .142, p = .312 > .05. It can therefore be concluded that pre-CBT testing mode preference is not correlated with test takers' scores on CBT.

Table 7: Correlations of post-CBT testing mode preference and mean of CBT scores
                                                        Mean of CBT
Post-CBT testing mode preference  Pearson Correlation      .192
                                  Sig. (2-tailed)          .436
                                  N                         100

The Pearson product-moment correlation was also run to examine the relationship between post-CBT testing mode preference and CBT testing performance. According to the results, for the testing group, participants' answers to the second testing mode preference question (M = 2.46, SD = .81) and their CBT performance (M = 2.51, SD = .159) were not significantly correlated, r(98) = .192, p = .436 > .05.

In the next stage, we examined whether test takers performed better on their preferred testing mode, according to their pre- and post-CBT testing mode preference and testing performance. The descriptive statistics are shown in Table 8.

Table 8: Descriptive statistics of the testing group's performance according to participants' pre- and post-CBT preference and testing performance in the two testing administration modes
Testing session   Preferred testing mode    N    Mean (pre-CBT/PPT)   Mean (post-CBT/CBT)   SD (pre-CBT/PPT)   SD (post-CBT/CBT)
Pre-CBT (PPT)     Paper                    55         36.12                 38.19                10.74              17.20
Pre-CBT (PPT)     No difference            15         39                    45.12                 6.77              10.93
Pre-CBT (PPT)     Onscreen                 30         48.76                 42                   25.93              15.94
Post-CBT (CBT)    Paper                    15         46                    58                   12.52              49.33
Post-CBT (CBT)    No difference            10         48                    48                   13.85              13.85
Post-CBT (CBT)    Onscreen                 75         56                    44.35                16.87              16.87

According to the findings, in the paper-based testing session, participants who preferred to take the paper-based version of the test nevertheless performed better on CBT (M = 38.19), and those who preferred to take the computerized version performed better on PPT (M = 48.76). After the CBT version of the test had been administered, the answers to the testing mode preference question at the bottom of the screen were analyzed. As Table 8 shows, those participants of the computer-based testing session who preferred the PPT version performed better on CBT (M = 58), and those who preferred to take the test on CBT performed better on PPT (M = 56). The findings indicated that there was no interaction between participants' testing mode preference and their testing performance; it can therefore be concluded that testing mode preference does not affect test validity.
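The group summaries in Table 8 are ordinary descriptive statistics by preference group. A minimal pandas sketch of how such a breakdown could be produced is shown below, again with hypothetical data and column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical records: one row per test taker, with the preference given at one session
df = pd.DataFrame({
    "preference": rng.choice(["Paper", "No difference", "Onscreen"], size=100),
    "ppt_score": rng.normal(45, 12, size=100),
    "cbt_score": rng.normal(46, 13, size=100),
})

# Count, mean and standard deviation of PPT and CBT scores per preferred testing mode
summary = df.groupby("preference")[["ppt_score", "cbt_score"]].agg(["count", "mean", "std"])
print(summary.round(2))
```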
The qualitative data collected to support the quantitative data came from semi-structured interviews with 30 participants who were randomly selected from the testing group. In the interview session, a participant who had changed his or her testing mode preference after taking CBT was asked about the reasons for that change. To analyze the qualitative data, the interview conversations were transcribed, with only the relevant sections of the recorded conversations retained. Once transcription was complete, content analysis was conducted on the transcribed data by identifying all the main concepts; the content analysis involved a thematic analysis of the received data. In the thematic analysis, similar statements and responses to the same question were coded and categorized under a common theme (Seidman, 1998). The main relevant and meaningful notions and concepts were identified and categorized under common themes.

5. Conclusion

The purpose of the current study was to investigate the equivalency of test results in CBT and PPT by comparing the results of the two modes of test administration among graduate students of Chabahar Maritime University in Iran. Moreover, it sought to probe the possible relationship of prior computer familiarity, attitude towards the use of computers and testing mode preference with testing performance on CBT. The study therefore employed a quantitative design to determine whether there was any difference between test scores on PPT and CBT, and whether there was any relationship between the aforementioned moderator factors and test results on CBT. It also employed a qualitative design, using focus-group interviews, to find out which test mode the test takers preferred and their justifications for their preferences.

For the first research question, which aimed at investigating the comparability of the scores obtained through the PPT and CBT versions of the test, a paired t-test was conducted to compare the means of the two sets of scores obtained by the testing group in the two testing sessions. Based on the findings, it was concluded that there was a statistically significant difference in the mean scores of the testing group across the two testing sessions as a whole (p = .000). The findings for research question one were compatible with the results of Coniam (2006) and Fulcher (1999), who claim that assessments are not comparable across modes. In comparability studies of CBT and PPT, it is important to take into account the factors influencing results on computerized tests, especially when there is a significant, or even a slight, difference between test scores. Two of these external variables, investigated by many researchers because of the increasing use of and interest in computers, are computer familiarity and attitude towards the use of computers. This is why, in this study, the second main question examined the relationship between these variables and test performance on CBT: if there were such a relationship, the difference between the two test modes could be attributed to the influence of these construct-irrelevant variables on the CBT result. The findings revealed no interactive effect of the computer attitude and computer familiarity variables on participants' CBT performance. In other words, whether test takers have high or low prior computer familiarity, or positive or negative attitudes towards computers, they gain no advantage or disadvantage when performing on CBT. This also supports the construct validity of CBT, since these construct-irrelevant variables are not part of the construct measured by the CBT version of the test.
Moreover, the overall descriptive statistics of prior testing mode preference and the testing performance of the different preference groups answered research question 3 negatively. These findings indicated that there was not necessarily any positive interaction between testing mode preference and testing performance; the reason might be the novelty of CBT in the target setting. The findings of the present study were consistent with the result of Khoshsima et al.'s (2017) study, which found that test takers with positive attitudes towards the use of computers did not perform better on CBT. The testing mode preference of the test takers was examined before and after exposure to CBT, and was accordingly categorized into pre-CBT and post-CBT testing mode preference. Analysis of the pre- and post-CBT questionnaires of testing group one, carried out to study possible changes in testing mode preference, revealed that only 15% of the test takers still preferred the PPT version of the test, while just 10% did not mind taking the test in either mode; the greatest percentage (75%) comprised the test takers who opted for the computer as their preferred testing mode. We concluded that the numbers of participants who preferred PPT and who did not mind taking the test in either mode had changed in favor of the test takers who chose "On computer" as their preferred testing mode.

References

1. Al-Amri, S. (2009). Computer-based testing vs. paper-based testing: Establishing the comparability of reading tests through the evolution of a new comparability model in a Saudi EFL context. Unpublished doctoral dissertation, University of Essex, England.
2. Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
3. American Educational Research Association. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
4. Bennett, R. E. (1999). How the Internet will help large-scale assessment reinvent itself. Education Policy Analysis Archives, 9(5), 1-25.
5. Berberoglu, G., & Calikoglu, G. (1992). The construction of a Turkish computer attitude scale. Studies in Educational Evaluation, 24(2), 841-845.
6. Boo, J., & Vispoel, W. (2012). Computer versus paper-and-pencil assessment of educational development: A comparison of psychometric features and examinee preferences. Psychological Reports, 111, 443-460.
7. Cater, K., Rose, D., Thille, C., & Shaffer, D. (2010, June). Innovations in the classroom. Presentation at the Council of Chief State School Officers (CCSSO) National Conference on Student Assessment, Detroit, MI.
8. Challoner, J. (2009). 1001 inventions that changed the world. Cassell Illustrated.
9. Choi, I. C., Kim, K. S., & Boo, J. (2003). Comparability of a paper-based language test and a computer-based language test. Language Testing, 20, 295-320.
10. Christensen, R., & Knezek, G. (1996). Constructing the Teachers' Attitudes toward Computers (TAC) questionnaire. ERIC Document Reproduction Service No. ED398244.
11. Coniam, D. (2006).
Evaluating computer-based and paper-based versions of an English language listening test. ReCALL, 18, 193-211.
12. Eagly, A. H., & Chaiken, S. (1998). Attitude structure and function. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), Handbook of social psychology (pp. 269-322). New York: McGraw-Hill.
13. Fleming, S., & Hiple, D. (2004). Foreign language distance education at the University of Hawai'i. In C. A. Spreen (Ed.), New technologies and language learning: Issues and options (Tech. Rep. No. 25) (pp. 13-54). Honolulu, HI: University of Hawai'i, Second Language Teaching & Curriculum Center.
14. Florida Department of Education. (2006, September 4). What do we know about choosing to take a high-stakes test on a computer? Retrieved May 15, 2010, from http://www.fldoe.org/asp/k12memo/pdf/WhatDoWeKnowAboutChoosingToTakeAHighStakesTestOnAComputer.pdf
15. Flowers, C., Do-Hong, K., Lewis, P., & Davis, V. C. (2011). A comparison of computer-based testing and pencil-and-paper testing for students with a read-aloud accommodation. Journal of Special Education Technology, 26(1), 1-12.
16. Fulcher, G. (1999). Computerizing an English language placement test. ELT Journal, 53(4), 289-299.
17. Higgins, J., Russell, M., & Hoffmann, T. (2005). Examining the effect of computer-based passage presentation on reading test performance. Journal of Technology, Learning, and Assessment, 3(4). Retrieved July 5, 2005, from http://www.jtla.org
18. International Test Commission. (2006). International guidelines on computer-based and Internet delivered testing. International Journal of Testing, 6, 143-171.
19. Kenyon, D. M., & Malabonga, V. (2001). Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments. Language Learning and Technology, 5(2), 60-83.
20. Khoshsima, H., & Hashemi, M. (2017). Cross-mode comparability of computer-based testing (CBT) versus paper and pencil-based testing (PPT): An investigation of testing administration mode among Iranian intermediate EFL learners. English Language Teaching, 10(2).
21. Laurier, M. (1999). The development of an adaptive test for placement in French. In M. Chalhoub-Deville (Ed.), Development and research in computer adaptive language testing (pp. 122-135). Cambridge: University of Cambridge Examinations Syndicate/Cambridge University Press.
22. Lee, J., Moreno, K. E., & Sympson, J. B. (1986). The effects of mode of test administration on test performance. Educational and Psychological Measurement, 46, 467-473.
23. Lottridge, S., Nicewander, A., Schulz, M., & Mitzel, H. (2008). Comparability of paper-based and computer-based tests: A review of the methodology. Monterey, CA: Pacific Metrics Corporation.
24. Loyd, B. H., & Gressard, C. (1985). The reliability and validity of an instrument for the assessment of computer attitudes. Educational and Psychological Measurement, 45(4), 903-908.
25. Madsen, H. S., & Larson, J. W. (1986). Computerized Rasch analysis of item bias in ESL tests. In C. W. Stansfield (Ed.), Technology and language testing: A collection of papers from the annual colloquium on language testing research. Princeton, NJ.
26. Makiney, J. D., Rosen, C., Davis, B. W., Tinios, K., & Young, P. (2003).
Examining the measurement equivalence of paper and computerized job analyses scales. Paper presented at the 18th Annual Conference of the Society for Industrial and Organizational Psychology, Orlando, FL.
27. Mojarrad, H., Hemmati, F., Jafari Gohar, M., & Sadeghi, A. (2013). Computer-based assessment (CBA) vs. paper/pencil-based assessment (PPBA): An investigation into the performance and attitude of Iranian EFL learners' reading comprehension. International Journal of Language Learning and Applied Linguistics World, 4(4), 418-428.
28. OECD. (2010). PISA computer-based assessment of student skills in science. http://www.oecd.org/publishing/corrigenda (accessed September 21, 2014).
29. O'Malley, K. J., Kirkpatrick, R., Sherwood, W., Burdick, H. J., Hsieh, M. C., & Sanford, E. E. (2005, April). Comparability of a paper based and computer based reading test in early elementary grades. Paper presented at the AERA Division D Graduate Student Seminar, Montreal, Canada.
30. Pinsoneault, T. B. (1996). Equivalency of computer-assisted and paper-and-pencil administered versions of the Minnesota Multiphasic Personality Inventory-2. Computers in Human Behavior, 12, 291-300.
31. Poggio, J., Glasnapp, D., Yang, X., & Poggio, A. (2005). A comparative evaluation of score results from computerized and paper & pencil mathematics testing in a large scale state assessment program. The Journal of Technology, Learning and Assessment, 3(6), 5-30.
32. Russell, M., Almond, P., Higgins, J., Clarke-Midura, J., Johnstone, C., Bechard, S., & Fedorchak, G. (2010, June). Technology enabled assessments: Examining the potential for universal access and better measurement in achievement. Presentation at the Council of Chief State School Officers (CCSSO) National Conference on Student Assessment, Detroit, MI.
33. Ryan, A. M., & Ployhart, R. E. (2000). Applicants' perceptions of selection procedures and decisions: A critical review and agenda for the future. Journal of Management, 26, 565-606.
34. Seidman, I. (1998). Interviewing as qualitative research: A guide for researchers in education and the social sciences (2nd ed.). New York: Teachers College Press.
35. Taylor, C., Kirsch, I., Eignor, D., & Jamieson, J. (1999). Examining the relationship between computer familiarity and performance on computer-based language tasks. Language Learning, 49, 219-274.
36. Wallace, P., & Clariana, R. (2005). Perception versus reality – Determining business students' computer literacy skills and need for instruction in information concepts and technology. Journal of Information Technology Education, 4, 141-151. Retrieved March 26, 2008, from http://jite.org/documents/Vol4/v4p141-151Wallace59.pdf
37. Wang, T., & Kolen, M. J. (2001). Evaluating comparability in computerized adaptive testing: Issues, criteria and an example. Journal of Educational Measurement, 38, 19-49.
38. Warner, R. M. (2013). Applied statistics: From bivariate through multivariate techniques (2nd ed.). USA: SAGE Publications, Inc.
39. Watson, B. (2001). Key factors affecting conceptual gains from CAL. British Journal of Educational Technology, 32(5), 587-593.
40. Yurdabakan, I., & Uzunkavak, C. (2012). Primary school students' attitudes towards computer based testing and assessment in Turkey.
Turkish Online Journal of Distance Education, 13(3), 177-188.