Jordan University of Science and Technology
Automated System for Arabic Optical Character Recognition
Authors:
Inad Aljarrah, Osama Al-Khaleel, Khaldoon Mhaidat, Mu’ath Alrefai, Abdullah Alzu’bi, and Mohammad Rabab’ah
Abstract:
In this paper, an Arabic Optical Character Recognition (OCR) system is implemented. The system takes a scanned image of Arabic text as input and produces editable text from it. It first segments the document image into lines, then segments each line into words, and further segments each word into sub-words. Each word or sub-word is then segmented into individual characters, and a feature extraction process is applied to each character to compute its feature vector. This feature vector is compared with template feature vectors for each letter of the Arabic alphabet and its variant forms, and a minimum distance classifier is used in the classification stage. Promising results are achieved, even though Arabic OCR is considered far harder than its counterparts in other languages such as English, because letters within the same word are connected.
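The abstract does not say how lines, words, and sub-words are separated, nor which distance the minimum distance classifier uses. The following is a minimal Python sketch under two assumptions that are not confirmed by the paper: segmentation by projection profiles on a binarized image (a common technique), and Euclidean distance between feature vectors. Character segmentation and feature extraction are not shown. The function names (`ink_runs`, `segment_lines`, `segment_columns`, `classify`), the gap threshold, and any template values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Binary image as a list of rows: 1 = ink (text) pixel, 0 = background.
Image = List[List[int]]


def ink_runs(profile: Sequence[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where the projection profile has ink.

    Runs separated by fewer than `min_gap` empty positions are merged, so a
    larger `min_gap` bridges small gaps (e.g. the gaps between sub-words).
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same unit
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Image]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in img]
    return [img[s:e] for s, e in ink_runs(row_profile)]


def segment_columns(region: Image, min_gap: int) -> List[Image]:
    """Split a line (or word) using the vertical projection profile.

    Pieces are returned right to left, matching Arabic reading order.
    A hypothetical word-gap threshold (min_gap > 1) yields words;
    min_gap=1 splits at every empty column, yielding sub-words.
    """
    col_profile = [sum(col) for col in zip(*region)]
    runs = ink_runs(col_profile, min_gap)
    return [[row[s:e] for row in region] for s, e in reversed(runs)]


def classify(features: Sequence[float],
             templates: Dict[str, List[Sequence[float]]]) -> str:
    """Minimum distance classifier: return the label of the nearest template.

    Each letter may have several templates (e.g. isolated, initial, medial,
    and final forms). Euclidean distance is an assumption of this sketch.
    """
    best_label, best_dist = "", math.inf
    for label, vectors in templates.items():
        for vec in vectors:
            d = math.dist(features, vec)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

For example, with the hypothetical templates `{"alef": [[1.0, 0.0], [0.9, 0.1]], "ba": [[0.0, 1.0]]}`, `classify([0.8, 0.2], templates)` returns `"alef"`, because the second `alef` variant is the nearest template. Storing several templates per letter lets the classifier cover the positional forms of each Arabic letter without a separate form-detection step; in practice the word-gap threshold would have to be tuned to the scan resolution.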