Jordan University of Science and Technology

Automated System for Arabic Optical Character Recognition

Inad Aljarrah, Osama Al-Khaleel, Khaldoon Mhaidat, Mu’ath Alrefai, Abdullah Alzu’bi, and Mohammad Rabab’ah

This paper presents an implementation of an Arabic Optical Character Recognition (OCR) system. The system takes a scanned image of Arabic text as input and produces editable text. It first segments the document image into lines, then segments each line into words and each word into sub-words. Each word or sub-word is then segmented into individual characters, and a feature extraction process computes a feature vector for each character. This feature vector is compared against template feature vectors for every letter of the Arabic alphabet and its variations, and a minimum distance classifier assigns the character to the closest template. The system achieves promising results, even though Arabic OCR is often considered many times harder than OCR for languages such as English because the letters within a word are connected.
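
The classification stage described above can be illustrated with a minimal sketch of a minimum distance classifier. The abstract does not specify the distance metric, the feature set, or the template format, so everything concrete here is an assumption made for illustration: the Euclidean distance, the three-element feature vectors, and the names `TEMPLATES`, `euclidean`, and `classify` are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical templates: each letter maps to one feature vector per
# variation of that letter. The values are toy numbers, not features
# extracted from real character images.
TEMPLATES: Dict[str, List[List[float]]] = {
    "alef": [[1.0, 0.0, 0.2]],
    "ba": [[0.2, 0.8, 0.5], [0.3, 0.9, 0.1]],
}


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors.

    The metric is an assumption; the source only names a minimum
    distance classifier.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(
    features: Sequence[float],
    templates: Dict[str, List[List[float]]],
) -> Tuple[str, float]:
    """Return the label of the nearest template and its distance."""
    best_label = None
    best_distance = math.inf
    for label, variants in templates.items():
        for template in variants:
            distance = euclidean(features, template)
            if distance < best_distance:
                best_label, best_distance = label, distance
    if best_label is None:
        raise ValueError("no templates were provided")
    return best_label, best_distance
```

Under these assumptions, `classify([0.9, 0.1, 0.2], TEMPLATES)` selects the `"alef"` template, and `classify([0.3, 0.85, 0.2], TEMPLATES)` selects `"ba"` by matching its second variation. Storing several template vectors per letter is one simple way to account for the variations of each letter mentioned in the abstract.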