ArIdentifier: Identify Arabic Text Segments


Khaled Al-Sham'aa

Understanding the language and encoding of a given document is an essential step in working with unstructured multilingual text. Without this basic knowledge, applications such as information retrieval and text mining cannot process data accurately, and important information may be missed entirely or misrouted.

Any application that works with Arabic in multilingual documents can benefit from the ArIdentifier class. Using this class, applications can take a fully automated approach to processing Arabic text by quickly and accurately locating the Arabic text segments within a multilingual document.
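
To make the idea of an "Arabic text segment" concrete, the following is a minimal sketch in Python. It is not the ArIdentifier implementation, whose algorithm and encoding handling are not described here. It assumes the input is already decoded to a Unicode string, and that a segment is a run of characters from the basic Arabic Unicode block (U+0600 to U+06FF), with whitespace allowed between Arabic words. The function name find_arabic_segments is hypothetical.

```python
import re

# Assumption: a segment is one or more Arabic-block words (U+0600-U+06FF)
# separated only by whitespace. This is an illustration, not ArIdentifier's method.
ARABIC_RUN = re.compile(r"[\u0600-\u06FF]+(?:\s+[\u0600-\u06FF]+)*")


def find_arabic_segments(text):
    """Return (start, end) character offsets of each Arabic run in text."""
    return [(m.start(), m.end()) for m in ARABIC_RUN.finditer(text)]


if __name__ == "__main__":
    sample = "Hello مرحبا بالعالم world"
    for start, end in find_arabic_segments(sample):
        # Slice with the offsets to recover the segment itself.
        print(start, end, sample[start:end])
```

Returning offsets rather than the matched strings lets a caller wrap, extract, or skip the Arabic portions without altering the surrounding text.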