An application of image processing in optical mark recognition

Abstract

Optical Mark Recognition (OMR) is very popular with universities for reading multiple-choice questions. In this article, we present a software system, based on digital image processing, for processing surveys at the Vietnam National University of Agriculture. The software was built using MATLAB and is easy to use. The surveys were digitized using a scanner and sent to the software tool. In this study, we tested more than 170 surveys of nine different types. The software tool correctly detected all the valid answers. It was also able to detect all questions with no marks or multiple marks.

Vietnam Journal of Agricultural Sciences (https://vjas.vnua.edu.vn/)
Tran Vu Ha & Nguyen Thi Thu (2020)
Received: April 17, 2020
Accepted: December 5, 2020
Correspondence to: ntthu@vnua.edu.vn
ORCID: Le Thanh Ha, https://orcid.org/0000-0001-5090-5491

There are many applications of OMR, including multiple-choice examinations (for students and pupils) and feedback collection (from customers, students, users, etc.). In universities (e.g., Vietnam National University of Agriculture), collecting feedback from students plays an important role in evaluating and improving the quality of education.

Nowadays, many commercial solutions for OMR are available (e.g., the OpScan Series from SCANTRON). In general, these products require a dedicated scanner and answer sheets, which motivates the search for cheaper solutions. Hong Duc University created a software tool named TickREC for this purpose (Hong-Duc University, 2014). The Vietnam Forestry University also has its own software solution (Mai Ha An, 2014). An increasing number of methods for mark detection have been published. Gaikwad (2015) applied a
template matching algorithm after finding the region of interest to find the marked answers. Loke et al. (2018) proposed a method based on pixel counting and simple thresholding that can be used under a variety of conditions. Another method, by Belag et al. (2018), was developed based on the creation of template answer sheets and key point detection algorithms. Each of these methods (and the corresponding software tools) has its own advantages and disadvantages. For example, Belag's tool used a dedicated answer sheet that also had checkmarks, which helped when the scanned image was rotated. This kind of sheet is suitable for tests but not for surveys. TickREC and the tool of Mai Ha An (2014) could process sheets that contained both questions and answers. Because each software tool works with a certain type of answer sheet, designed as needed by its authors, it is not possible to apply these tools directly to the surveys at the Vietnam National University of Agriculture.

Hence, in this work, we created a software tool for processing surveys at the Vietnam National University of Agriculture. The surveys were scanned by an ordinary scanner and sent to the software for processing. The software was designed to be easy to use, with no special training required. The system was cost-effective because no dedicated machine or answer sheets were required.

Materials and Methods

Materials

In this project, we used nine different types of questionnaires. All of these were used by the Center for Quality Assurance, Vietnam National University of Agriculture:

(i) Employee feedback about the operation of a number of divisions
(ii) Member feedback about the support of the Ho Chi Minh Communist Youth Union
(iii) Student feedback about the support of a number of divisions
(iv) Student feedback about an advanced education program
(v) Master student feedback about a specific course
(vi) Graduate student feedback about an educational program
(vii) Student feedback about a theoretical course of an ordinary education program
(viii) Student feedback about a practical course of an ordinary education program
(ix) Student feedback about a theoretical course of a Professional Oriented to Higher Education (POHE) program

For each type of questionnaire, there were more than 30 sheets that were randomly filled. All of the sheets were scanned with an HP scanner (ScanJet Pro 3000 s3). The output file format was normally JPEG but could also be PNG, BMP, or another format supported by MATLAB (see the Methods section for more details). The width and height of the images were 1655 and 2338 pixels, respectively (these dimensions could differ slightly depending on the scanner). Examples of surveys are shown in Figures 1 and 2.

Figure 1. Example of surveys with one page: (a) a survey for employees; (b) a survey for students

Figure 2. Example of surveys with two pages: (a) the first page of a student survey; (b) the second page of a student survey

Methods

MATLAB - Environment for software development

MATLAB (short for matrix laboratory) was developed in the 1970s by Cleve Moler (Haigh, 2008). Most of the code of MATLAB was written by Cleve Moler in FORTRAN. Jack Little and Steve Bangert then reprogrammed MATLAB in C. Together with Cleve Moler, they founded MathWorks in California in 1984. MathWorks has since developed, maintained, and distributed MATLAB as a commercial product (Sandeep, 2017). Nowadays, MATLAB supports various platforms such as Linux, Windows, and macOS. With MATLAB, users write a few lines of code to obtain instant results without involving a compiler. MATLAB is used for data analysis and visualization. It supports multiple types of data (audio, images, videos, CSV, and
different databases). MATLAB also provides the App Designer tool, which allows users to build a GUI (Graphical User Interface) for their programs (Educba, 2020). For these reasons, we used MATLAB to develop our software tool for data processing.

Processing workflow

Figure 3 shows the basic steps needed for processing one scanned page of a questionnaire. In the first step, the selected machine (ScanJet Pro 3000 s3) scanned multiple pages in a single run. After that, our software tool came into play.

Figure 3. The proposed stages for data processing

Because our questionnaires were printed in monochrome and then filled in using black or blue ink (the colors of most ballpoint pens), converting the images to binary saved memory and processing time. With MATLAB, converting images to binary was straightforward: we only needed to call the im2bw function with the original image as a parameter, and the function returned a binary image.

To extract the region of interest (ROI), the region in which people filled in the options, we used a special image called a mask. As shown in Figure 4a, a mask contained only filled options. Our program would then find the ROI. The position and size of the ROI (the region inside the red rectangle, Figure 4b) were then used to crop the other scanned images.

Figure 4. Mask image: (a) an example of a mask image; (b) the ROI on the mask image (the area inside the red rectangle)

With the imfindcircles function from MATLAB, we were able to locate all the options on the cropped images. The number of black pixels in each circle indicated which option was selected.

Our software tool then output the selected options for every question on the sheet. The output was eventually stored in a plain text file.

Results and Discussion

The software tool

Figure 5. The main user interface of the program

Figure 5 shows the main graphical user interface (GUI) of the program. The user first needed to specify the directory of scanned images by clicking the Select image folder button
(area 1). All images in the selected directory would be listed in the area below the button (area 2). The user then selected the mask file by clicking the Select mask button (area 3). Depending on the type of questionnaire, two masks might need to be selected if the questionnaire contained two pages. To start processing the images, the user clicked the Start button (area 4). The result would be displayed at the bottom right of the window (area 5).

Processing questionnaires

Table 1 shows a summary of the analysis of 179 questionnaires belonging to nine different types. Our tool correctly detected all valid questions (questions having exactly one option filled). It correctly identified all questions that were not filled (not evaluated by students, as shown in Figure 6a). The tool could also detect questions that had multiple options filled (the students changed their mind and chose another option) (Figure 6b).

Because the number of black pixels in each option was used to identify which options were filled, our tool might not work correctly in some cases, which are listed after Table 1.
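The pixel-counting rule described above can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB, and it is not the authors' implementation: the grayscale cutoff of 128, the fill ratio of 0.5, the 'MULTIPLE' label, and the helper names binarize, dark_fraction, classify_question, and make_sheet are all assumptions chosen for illustration. The option circles are assumed to have been located already (the role imfindcircles plays in the paper).

```python
# Illustrative sketch only: thresholds, labels, and helper names are
# assumptions, not values or functions taken from the paper.

def binarize(gray, cutoff=128):
    """Turn a grayscale image (rows of 0-255 ints) into 1 = black, 0 = white."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def dark_fraction(binary, cx, cy, r):
    """Fraction of the pixels inside the circle (cx, cy, r) that are black."""
    inside = dark = 0
    for y in range(max(0, cy - r), min(len(binary), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(binary[0]), cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                inside += 1
                dark += binary[y][x]
    return dark / inside if inside else 0.0


def classify_question(binary, circles, min_fill=0.5):
    """Return the index of the single filled option, 'NULL' if no option is
    filled, or 'MULTIPLE' if more than one option is filled."""
    filled = [i for i, (cx, cy, r) in enumerate(circles)
              if dark_fraction(binary, cx, cy, r) >= min_fill]
    if not filled:
        return "NULL"
    if len(filled) > 1:
        return "MULTIPLE"
    return filled[0]


def make_sheet(filled_options, n_options=4, r=3, gap=10):
    """Build a tiny synthetic grayscale row of option circles for testing."""
    height, width = 2 * r + 5, gap * n_options + r
    gray = [[255] * width for _ in range(height)]
    circles = [(gap * i + r + 2, r + 2, r) for i in range(n_options)]
    for i in filled_options:
        cx, cy, _ = circles[i]
        for y in range(cy - r, cy + r + 1):
            for x in range(cx - r, cx + r + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    gray[y][x] = 30  # dark ink
    return gray, circles
```

For example, building a sheet with make_sheet([2]) and calling classify_question(binarize(gray), circles) returns 2. A circle containing only a couple of dark pixels, such as a thin tick, stays below min_fill and yields 'NULL', which is the kind of miss a pixel-counting rule is prone to.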
Table 1. Results of data processing

Type of questionnaire | Number of questions in the questionnaire | Number of questionnaires | Total number of questions | Number of correctly detected questions | Number of unfilled questions detected | Number of multiple filled questions detected
Employee feedback about the operation of a number of divisions | 10 | 35 | 350 | 339 | 11 | 0
Member feedback about the support of Ho Chi Minh Communist Youth Union | 10 | 34 | 340 | 338 | 1 | 1
Student feedback about the support of a number of divisions | 10 | 35 | 350 | 342 | 5 | 3
Student feedback about an advanced education program | 25 | 35 | 875 | 866 | 2 | 7
Master student feedback about a specific course | 23 | 35 | 805 | 800 | 3 | 2
Graduate student feedback about an educational program | 43 | 35 | 1505 | 1498 | 2 | 5
Student feedback about a theoretical course of an ordinary education program | 22 | 35 | 770 | 769 | 0 | 1
Student feedback about a practical course of an ordinary education program | 18 | 35 | 630 | 628 | 1 | 1
Student feedback about a theoretical course of a POHE program | 18 | 35 | 630 | 629 | 0 | 1
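The rows of Table 1 can be cross-checked with simple arithmetic: the total number of questions should equal the number of questions per questionnaire multiplied by the number of questionnaires, and the three detection counts should add up to that total. The following Python sketch performs this check. The row tuples are transcribed from Table 1, while the names ROWS, check_row, ALL_CONSISTENT, and TOTAL_QUESTIONS are assumptions introduced only for illustration.

```python
# Rows transcribed from Table 1, in table order:
# (questions per questionnaire, questionnaires, total questions,
#  correctly detected, unfilled detected, multiple filled detected)
ROWS = [
    (10, 35, 350, 339, 11, 0),
    (10, 34, 340, 338, 1, 1),
    (10, 35, 350, 342, 5, 3),
    (25, 35, 875, 866, 2, 7),
    (23, 35, 805, 800, 3, 2),
    (43, 35, 1505, 1498, 2, 5),
    (22, 35, 770, 769, 0, 1),
    (18, 35, 630, 628, 1, 1),
    (18, 35, 630, 629, 0, 1),
]


def check_row(row):
    """True if the row's total matches both the product and the sum."""
    per_sheet, sheets, total, detected, unfilled, multiple = row
    return (per_sheet * sheets == total
            and detected + unfilled + multiple == total)


ALL_CONSISTENT = all(check_row(row) for row in ROWS)
TOTAL_QUESTIONS = sum(row[2] for row in ROWS)
```

Every row passes both checks. Across the nine types the table covers 6255 questions: 6209 with exactly one option detected, 25 unfilled, and 21 with multiple options filled.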
(i) Instead of filling in the option, the user used a checkmark (tick) or an x mark (cross) to mark the selected option (Figure 6c). The number of black pixels inside a checked option might not be enough for it to count as a validly filled option.

(ii) Options were not completely filled (Figure 6d). Similar to the previous case, the option might not be dark enough to be recognized as marked.

(iii) The user used light colors to mark the selected option. In this case, filled areas might become unfilled because of the conversion from color images to binary images.

In such cases, our tool marked the question as NULL in the result area. The user could easily see this and check the answer sheet manually by selecting the corresponding image from the list of images. After checking the images, the user was able to make direct modifications in the result area before exporting the final result to the output file.

Figure 6. Problems with questionnaires and scanned images: (a) no options filled; (b) multiple options filled; (c) checkmarks used; (d) options not completely filled; (e) cropping the wrong area due to image rotation

If the scanned images were rotated because of the scanning or copying process, our tool might encounter a problem. In particular, when the crop area did not contain all the options, the program could not obtain enough data for analysis (Figure 6e). In a future update, we will give a warning for this kind of sheet. One possible solution to this problem is using checkmarks. Checkmarks are black-filled rectangles or squares located at the corners and the margins of
the sheet. By first detecting the checkmarks, it is possible to identify whether the sheet is rotated too much: this is the case if one or more checkmarks at the corners are absent. If the checkmarks at all four corners are detected, we can calculate the rotation angle of the sheet. We can then rotate the scanned sheet by the reverse angle before detecting the options.

Conclusions

In this study, we have proposed a solution for optical mark recognition problems that does not require a dedicated machine or answer sheet. Instead, we used ordinary scanners and printers with A4 paper. We have built a software program that works with different image formats. It can detect filled options and questions with no or multiple filled options. The output of the program is in plain text and can be easily opened in various software packages, including Microsoft Excel. While other tools only work with one-page questionnaires, our tool can work with surveys that contain two pages. The first results look promising, but there is still room for improvement. Most of the questionnaires contain an area for other ideas (and comments), which may contain handwritten text. In the next version, we intend for our software tool to use the latest achievements of artificial intelligence to solve this problem, or at least to warn users about handwritten text on questionnaires. We also want to solve the problem of rotated images. This can be done by detecting rectangles on the questionnaires. The problem then becomes selecting the right one (the rectangle that has the options inside), because there are multiple, overlapping rectangles on a single sheet. Another solution for the rotation problem that we plan to apply is using checkmarks (bold rectangles located at the corners and the margins of the questionnaires).

Acknowledgments

We would like to thank the Vietnam National University of Agriculture for funding this project.

References

Belag I. A., Gulpete Y. & Elmanti T. M. (2018). An Image Processing Based Optical Mark Recognition with the Help of Scanner. International Journal of Engineering Innovation and Research. 7(2): 5.
Educba W. (2020). Matlab Features [Online]. Retrieved from https://www.educba.com/matlab-features/ on April 19, 2020.
Gaikwad S. B. (2015). Image Processing Based OMR Sheet Scanning. International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE). 4.
Haigh T. (2008). Cleve Moler: Mathematical Software Pioneer and Creator of Matlab. IEEE Annals of the History of Computing. 30(1): 87-91.
Hong-Duc University. (2014). An introduction to TickREC - an automatic survey processing tool [Online]. Hong-Duc University. Retrieved from vn/4/3030/Gioi-thieu-phan-mem-xu-ly-phieu-dieu-tra-tu-dong-TickREC.html on April 22.
Loke S. C., Kasmiran K. A. & Haron S. A. (2018). A new method of mark detection for software-based optical mark recognition. PLOS ONE. 13(11): e0206420.
Mai Ha An (2014). Research and applying image processing techniques to process the survey questionnaire on training of Vietnam forestry university. Journal of Forestry Science and Technology. 1(1): 6.
Sandeep N. (2017). Introduction to MATLAB for Engineers and Scientists: Solutions for Numerical Computation and Modeling. Apress. 222 pages.