

 
ORIGINAL ARTICLE
Year : 2020  |  Volume : 9  |  Issue : 2  |  Page : 84-87

Relationship of text length of multiple-choice questions on item psychometric properties – A retrospective study


1 Department of Orthodontics, Batterjee Medical College, Jeddah, Saudi Arabia
2 Department of Community Dentistry and Research, Batterjee Medical College, Jeddah, Saudi Arabia
3 Department of Medical Education, Batterjee Medical College, Jeddah, Saudi Arabia
4 Operative Dentistry and Head of the Program, Dentistry Program, Batterjee Medical College, Jeddah, Saudi Arabia

Date of Submission: 20-May-2020
Date of Decision: 30-May-2020
Date of Acceptance: 05-Jun-2020
Date of Web Publication: 21-Jul-2020

Correspondence Address:
Fawaz Pullishery
Department of Community Dentistry and Research, Batterjee Medical College, P. O. Box 6231, Jeddah 21442
Saudi Arabia

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/sjhs.sjhs_76_20

Abstract


Background: Item-writing flaws in constructing multiple-choice questions (MCQs) have a serious impact on the psychometric properties of a test. This study aimed to evaluate the relationship of the text length of MCQ items with the difficulty factor (DF), discrimination index (DI), and Point Biserial (rPB) in a dental program assessment. Materials and Methods: This cross-sectional study included 627 MCQs. The data were analyzed from the report generated by ExamSoft software. The questions were divided into long (>100 words), medium (70–100 words), and short (<70 words). We divided the DF into hard (DF <0.3), average (DF = 0.3–0.8), and easy (DF >0.8); the DI into negative (DI <0), DI = 0–0.2, and DI >0.2; and the Point Biserial into negative (rPB <0), rPB = 0–0.2, and rPB >0.2. Pearson's Chi-square test was used to find a relationship between the length of a question and the other variables. Results: The analysis yielded 31 long, 56 medium, and 540 short MCQs. A statistically significant association was found between the DF and the length of the questions (P < 0.05). No significant relationship was found between the length of the questions and the DI or Point Biserial. The median DF was 0.63 (interquartile range [IQR] 0.41). The median length of the MCQs was 35.0 words (IQR 25.0). Conclusion: The study showed that the length of a question has an impact on the DF but not necessarily on the DI or Point Biserial.

Keywords: Difficulty factor, medical education, multiple choice questions, validity


How to cite this article:
Aljehani DK, Pullishery F, Osman OA, Abuzenada BM. Relationship of text length of multiple-choice questions on item psychometric properties – A retrospective study. Saudi J Health Sci 2020;9:84-7

How to cite this URL:
Aljehani DK, Pullishery F, Osman OA, Abuzenada BM. Relationship of text length of multiple-choice questions on item psychometric properties – A retrospective study. Saudi J Health Sci [serial online] 2020 [cited 2020 Sep 29];9:84-7. Available from: http://www.saudijhealthsci.org/text.asp?2020/9/2/84/290322




Introduction


In the field of health professions education, competency assessment has become a cornerstone in evaluating the clinical abilities and overall skills of students. A good assessment method plays a vital role and offers insight into students' approaches to learning and performance. Throughout the world, multiple-choice questions (MCQs) are the most common format used to assess students in dental and other allied health science disciplines. This format allows the faculty to efficiently evaluate a large number of candidates and also helps to test a wide range of topics.[1],[2]

MCQs, when constructed properly, are one of the best tools to assess cognitive skills and can efficiently discriminate between high and low achievers. A very good MCQ should have few or no item-writing flaws (technical errors); if present, such flaws affect student performance, thereby reducing the validity and reliability of the assessment process.[3]

The item analysis report (IAR) of an assessment is an important and easy method to yield information regarding the reliability and validity of a test item. In item analysis, the commonly measured properties of an MCQ are the difficulty index (facility value), the discrimination index (DI), and the Point Biserial (rPB). The difficulty index, sometimes denoted as the difficulty factor (DF) or p value, tells us the percentage of examinees who correctly answered an item, and it ranges from 0% to 100%. The optimal range of difficulty is from 30% to 80% (0.30–0.80). Items with a difficulty index below 30% (<0.30) are considered difficult, and those above 80% (>0.80) are easy items.[4],[5]
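The difficulty index described above is a simple proportion, and the cut-offs used in this study can be sketched in a few lines of Python (a minimal illustration; the function names and the 0/1 response encoding are assumptions, not part of the ExamSoft report):

```python
def difficulty_factor(responses):
    """Proportion of examinees answering the item correctly.

    `responses` is a list of 1 (correct) / 0 (incorrect) marks for one item.
    """
    return sum(responses) / len(responses)

def classify_difficulty(df):
    """Bucket an item by the study's cut-offs: <0.30 hard, >0.80 easy."""
    if df < 0.30:
        return "hard"
    if df > 0.80:
        return "easy"
    return "average"

# Example: 6 of 10 examinees answered correctly -> DF = 0.60, an "average" item.
df = difficulty_factor([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
```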

The DI can be defined as the ability of an item to discriminate between students who score high and those who score low. In short, it is a measure of how well good performers answer a particular item compared with poor performers, and it ranges from −1.00 to +1.00. The Point Biserial (rPB) is also used as a measure of item discrimination; the only difference between the DI and rPB is that the DI compares the proportion of correct responses for an item between the high and low performers on the test as a whole, whereas rPB is the correlation between students' overall examination scores and an individual question score.[6],[7] Items with a DI or rPB of 0.40 or more are considered "very good," 0.30–0.39 "reasonably good," 0.20–0.29 "marginal," and <0.20 "poor."[6],[8],[9] Many factors have an effect on the DI and DF. Some of them include language, grammar, areas of controversy, the types and number of distractors, and unfocused questions.[3],[10]
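Both discrimination measures can be illustrated with a minimal sketch (the upper/lower 27% split is a common convention in classical test theory, not something the article specifies; function names are illustrative):

```python
import statistics

def discrimination_index(item_scores, total_scores, frac=0.27):
    """Upper-minus-lower discrimination: proportion correct in the top `frac`
    of examinees (ranked by total score) minus the proportion in the bottom
    `frac`. Ranges from -1.00 to +1.00."""
    n = max(1, round(frac * len(total_scores)))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:n], order[-n:]
    return (sum(item_scores[i] for i in high) -
            sum(item_scores[i] for i in low)) / n

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a dichotomous item score (0/1)
    and the total examination score."""
    mi, mt = statistics.mean(item_scores), statistics.mean(total_scores)
    cov = sum((x - mi) * (y - mt)
              for x, y in zip(item_scores, total_scores)) / len(item_scores)
    return cov / (statistics.pstdev(item_scores) * statistics.pstdev(total_scores))
```

An item answered correctly only by the top scorers yields a DI near +1.00 and a strongly positive rPB; an item that the weakest students answer more often than the strongest yields negative values, flagging it for review.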

Even though Arabic is the first language used in the Kingdom of Saudi Arabia, most of the health professional courses are taught and evaluated in English. English is considered a foreign language, and students start to learn it in the fourth year of the primary level, with a total of four weekly sessions. Experts are of the opinion that this duration is not sufficient for students to acquire enough proficiency in the language for higher studies.[11] Thus, students may perceive some difficulties when attending health professional courses, as most of the courses are taught and evaluated in English. There are no known studies on the relationship of the text length of MCQs with item psychometric properties. In countries where English is the first language, students may not face this difficulty, as they have excellent proficiency in their language skills. However, in Arabic-speaking countries, students may face challenges, as English is usually taught in the first or preparatory year of health sciences courses, after which they are thrown into a sea of medical terms and texts.[12],[13],[14] Hence, this study aimed to assess the impact of the text length of MCQs on the DF, DI, and Point Biserial in a final assessment for undergraduate dentistry program students conducted in one of the dental colleges in the Kingdom of Saudi Arabia.


Materials and Methods


This study was carried out in the dentistry program of a private dental school as part of the final assessment of the 2018–2019 academic year. Ethical approval and permission were obtained from the Institutional Research and Ethics Committee (BMC-Res-2018-0026), and informed written consent was taken from the Medical Education department to use the analysis report of the assessment. The assessment included a total of 627 MCQs taken from the final assessments of eleven dental courses, all of the four-option type. The current dental program is a 7-year program, comprising one preparatory year, 3 years of preclinical courses, 2 years of in-depth clinical courses, and a 1-year internship. The MCQs were randomly chosen out of 952 and were classified based on text length into long (>100 words), medium (70–100 words), and short (<70 words). We analyzed the relationship of the length of the MCQs with the level of difficulty (DF) and the power of discrimination measured by the DI and Point Biserial (rPB).

The IAR was calculated and supplied in reports generated by ExamSoft software (ExamSoft Worldwide, Inc., USA). The IAR included three properties, namely DF, DI, and rPB; the text length of each MCQ was calculated using a Microsoft Excel worksheet. MCQs were categorized into easy, moderate, and hard questions based on the difficulty index. The DI and Point Biserial were categorized into very good, reasonably good, marginal, poor, and negative. Data were analyzed using SPSS ver. 23 (IBM SPSS Statistics for Windows, Version 23.0, Armonk, NY: IBM Corp.), and Pearson's Chi-square test was used to find a relationship of text length with the DF, DI, and rPB. P < 0.05 was considered statistically significant.
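The length classification and Pearson's Chi-square test of independence can be sketched as follows (a pure-Python stand-in for the SPSS computation, under the assumption that each question's stem and options are available as plain text; the helper names are illustrative):

```python
def classify_length(question_text):
    """Bucket an MCQ by word count using the study's cut-offs."""
    n = len(question_text.split())
    if n > 100:
        return "long"
    if n >= 70:
        return "medium"
    return "short"

def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table
    (e.g., rows = length category, columns = difficulty category)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

The statistic is then compared against the chi-square distribution with (r−1)(c−1) degrees of freedom to obtain the P value, which SPSS reports directly.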


Results


The median Difficulty Index was found to be 0.63 (interquartile range [IQR] 0.41), the median DI was 0.25 (IQR 0.27), and the median Point Biserial was 0.27 (IQR 0.23). The median length was found to be 35 words (IQR 25). The maximum number of words for an MCQ was 165, and the minimum was 12 [Table 1].
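Medians and IQRs like these can be reproduced with the Python standard library (shown on illustrative data, since the study's raw item statistics are not published):

```python
import statistics

def median_iqr(values):
    """Return (median, interquartile range) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return q2, q3 - q1

# Example with hypothetical per-item difficulty factors.
med, iqr = median_iqr([0.12, 0.30, 0.45, 0.63, 0.70, 0.81, 0.95])
```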
Table 1: Descriptive statistics for the parameters used in the study



There were a total of 31 (4.9%) long, 56 (8.9%) medium, and 540 (86.1%) short MCQs out of 627. When the MCQs were assessed for their difficulty index, we found that out of 627 MCQs, 167 (26.6%) were easy, 348 (55.5%) were moderate, and 112 (17.8%) were hard questions. In our analysis, we found that of the 540 short questions, only 100 (18.5%) were hard, 299 (55.4%) were moderate, and 141 (26.1%) were easy. Among the 56 medium-length MCQs, 10 (17.8%) were hard, 21 (37.5%) were easy, and 25 (44.6%) were moderate. When the long MCQs were assessed for the DF, we noticed that only 2 (6.4%) were hard, 5 (16.1%) were easy, and 24 (77.4%) were moderate. When the relationship of text length and difficulty index was analyzed, the results were statistically significant (P = 0.039) [Table 2].
Table 2: Relationship of text length with difficulty factor



When the DI was classified based on text length, it was found that 142 (22.6%) MCQs were very good, 101 (16.1%) were reasonably good, 120 (19.1%) were marginal, and 175 (27.9%) were poor. We also found that 89 (14.2%) MCQs discriminated negatively; the association with text length was not statistically significant (P = 0.216) [Table 3]. The assessment of the relationship of text length with the Point Biserial also did not show a significant association (P = 0.739) [Table 3].
Table 3: Relationship of text length with discrimination index and point biserial




Discussion


This study was a pilot project on MCQs used in the summative examinations of eleven clinical dental subjects. To the best of our knowledge, no published research has addressed this topic in any similar local or regional institute, and we identified an obvious scarcity of relevant data in the literature. A total of 627 MCQ items were classified into three categories of item length and three levels of item difficulty. When the relationship of text length and difficulty index was analyzed, the results were statistically significant. This showed that long questions are associated with increased item difficulty compared with questions of short and moderate length. This finding could be explained on the basis that items with more text demand more reading comprehension from students, so the assessment may deviate from its actual aim toward a test of language skills. This sort of added difficulty should be considered a type of construct-irrelevant variance (CIV).[15] Evidence shows that poorly constructed, low-quality questions can cause construct-irrelevant variance.[16] Tests developed locally by teachers, as in this study, are considered more vulnerable to CIV.[17] This is in addition to the fact that our students are non-native English speakers. Studies done on students with limited English proficiency showed that their test results were profoundly affected by their limited vocabulary.[18],[19] However, this effect of reading comprehension is not limited to students with limited English proficiency.[17] The ultimate effect of such an increase in item difficulty due to long text is the risk of threatening the validity of the scores and of the decisions made about students' mastery based on them, because the students' ability to correctly answer a question was not limited by their level of learning alone.[20] Thus, the test measured an additional ability that it was not intended to measure.

To reduce the risk to validity, elimination or control of CIV is essential. This could be achieved through faculty training that addresses this particular factor.[17]

The analysis of the effect of item text length on both the DI and rPB revealed no significant relationship. It is relevant to mention here that the length of the question and the required reading comprehension might not be the only sources of CIV in this study. Other sources of CIV, including anxiety and test administration conditions, could have affected the long questions and contributed to the added difficulty.[15] It should also be noted that most of the students were not native speakers of the examination language, and proficiency in the language was not matched among the examination takers, which may pose some confounding bias. Furthermore, an important limitation of the current study is that the questions were sourced from clinical dental courses; therefore, the results may not generalize to basic science courses, other medical disciplines, or other institutes. However, the study confirmed what had been found in a similar previous study.[21] More studies are recommended across health sciences programs, involving larger samples, preferably at a multicenter level.


Conclusion


Item-writing flaws (IWFs) in test items have already been shown to have a potential impact on psychometric properties. In our study, text length had a significant impact on the Difficulty Index but not on the DI or Point Biserial. Whether text length should be considered an IWF when constructing test items (especially MCQs) for non-English-speaking examinees taking an exam constructed in English is still not clear. There is a need for a wider study in this area, and it remains a topic of discussion among experts in dental education.

Acknowledgment

All the authors would like to express their gratitude to Dr. Osama Kensara, the dean of Batterjee Medical College and the Medical Education department for extending their support for this research.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.



 
References

1. Downing SM. Assessment of knowledge with written test forms. In: Norman GR, van der Vleuten C, Newble DI, editors. International Handbook of Research in Medical Education. Dordrecht: Kluwer Academic Publishers; 2002. p. 647-72.
2. McCoubrie P. Improving the fairness of multiple-choice questions: A literature review. Med Teach 2004;26:709-12.
3. Tarrant M, Ware J. Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med Educ 2008;42:198-206.
4. Miller MD, Linn RL, Gronlund NE, editors. Measurement and Assessment in Teaching. 10th ed. Upper Saddle River, NJ: Prentice Hall; 2009.
5. Ebel RL, Frisbie DA. Essentials of Educational Measurement. 5th ed. Englewood Cliffs, NJ: Prentice-Hall Inc.; 1991.
6. Engelhardt PV. An introduction to classical test theory as applied to conceptual multiple-choice tests. In: Henderson C, Harper KA, editors. Getting Started in PER. Vol. 2. College Park: American Association of Physics Teachers, Reviews in PER; 2009.
7. De Champlain AF. A primer on classical test theory and item response theory for assessments in medical education. Med Educ 2010;44:109-17.
8. Rahim AF. What those Number Mean? 1st ed. Kubang Kerian: KKMED; 2010. Available from: http://www.medic.usm.my/dme/images/stories/staff/KKMED/2010/item_analysis_guide.pdf. [Last accessed on 2019 May 25].
9. Odukoya JA, Adekeye O, Igbinoba AO, Afolabi A. Item analysis of university-wide multiple choice objective examinations: The experience of a Nigerian private university. Qual Quant 2018;52:983-97.
10. Dufresne RJ, Leonard WJ, Gerace WJ. Making sense of students' answers to multiple-choice questions. Phys Teach 2002;40:174-80.
11. Alhmadi NS. English speaking learning barriers in Saudi Arabia: A case study of Tibah University. Arab World Engl J 2014;5:38-53.
12. Rass RA. Challenges face Arab students in writing well-developed paragraphs in English. Engl Lang Teach 2015;8:49.
13. Malcolm D. Reading strategy awareness of Arabic-speaking medical students studying in English. System 2009;37:640-51.
14. Mourtaga KR. Some reading problems of Arab EFL students. J Al-Aqsa Univer 2006;10:75-91.
15. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999.
16. Ware J, Kattan TE, Siddiqui I, Mohammed AM. The perfect MCQ exam. J Health Spec 2014;2:94-9.
17. Downing SM. Threats to the validity of locally developed multiple-choice tests in medical education: Construct-irrelevant variance and construct underrepresentation. Adv Health Sci Educ Theory Pract 2002;7:235-41.
18. Abedi J, Lord C, Hofstetter C, Baker E. Impact of accommodation strategies on English language learners' test performance. Educ Meas 2000;19:16-26.
19. Fitzgerald J. English-as-a-second-language learners' cognitive reading processes: A review of research in the United States. Rev Educ Res 1995;65:145-90.
20. Premadasa IG. A reappraisal of the use of multiple choice questions. Med Teach 1993;15:237-42.
21. Loudon C, Macias-Muñoz A. Item statistics derived from three-option versions of multiple-choice questions are usually as robust as four- or five-option versions: Implications for exam design. Adv Physiol Educ 2018;42:565-75.



 
 
    Tables

  [Table 1], [Table 2], [Table 3]



 
