Abstract:
Most research on Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and
may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to dealing with this
issue by introducing a new algorithm for extracting Arabic words' roots. The proposed algorithm, which is called the Word Substring
Stemming (WSS) Algorithm, does not remove affixes during the extraction process. Rather, it is based on producing the set of all substrings
of an Arabic word, and uses the Arabic roots file, the Arabic patterns file and a concrete set of rules to extract correct roots from
substrings. Experiments show that the proposed approach is competitive, with an accuracy of 83.9%. Furthermore, its accuracy
can be improved, since for about 9.9% of the tested words the WSS algorithm retrieves two candidates (in
most cases) for the correct root.
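
To make the substring idea concrete, the following is a minimal Python sketch and not the algorithm as evaluated in the paper. It assumes that "substrings" means contiguous substrings, that candidate roots are three or four letters long, and that the roots file can be represented as an in-memory set. The function names, the length bounds, and the tiny roots set are hypothetical; the patterns file and the rule set are not modeled here.

```python
def all_substrings(word):
    """Return the set of all contiguous, non-empty substrings of a word."""
    return {
        word[start:end]
        for start in range(len(word))
        for end in range(start + 1, len(word) + 1)
    }


def candidate_roots(word, roots, min_len=3, max_len=4):
    """Return substrings of the word that appear in the roots set.

    Assumption: roots are min_len to max_len letters long. The real
    algorithm also consults a patterns file and a set of rules, which
    this sketch omits.
    """
    matches = {
        s for s in all_substrings(word)
        if min_len <= len(s) <= max_len and s in roots
    }
    return sorted(matches)


# Hypothetical stand-in for the Arabic roots file.
SAMPLE_ROOTS = {"درس", "كتب", "علم"}

if __name__ == "__main__":
    # "المدرسة" (the school) contains the root "درس" as a substring.
    print(candidate_roots("المدرسة", SAMPLE_ROOTS))
```

No letters are stripped from the word at any point: the affixes simply fail to match any entry in the roots set. Words whose root letters are not adjacent would need the pattern matching and rules that this sketch leaves out.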