Arabic Auto Summarize Class
This class identifies the key points in an Arabic document
for you to share with others or quickly scan. The class determines key points
by analyzing an Arabic document and assigning a score to each sentence. Sentences
that contain words used frequently in the document are given a higher score. You
can then choose a percentage of the highest-scoring sentences to display in the
summary. The "ArAutoSummarize" class works best on well-structured documents
such as reports, articles, and scientific papers.
The "ArAutoSummarize" class cuts wordy copy to the bone by counting
words and ranking sentences. First, the class identifies
the most common words in the document (barring "هو", "هي",
"في", "حتى", "من" and the like) and assigns
a "score" to each word--the more frequently a word is used, the higher
the score.
Then, it "averages" each sentence by adding the scores of its words
and dividing the sum by the number of words in the sentence--the higher the
average, the higher the rank of the sentence. The "ArAutoSummarize"
class can summarize a text to a specific number of sentences or to a percentage
of the original copy.
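The scoring steps described above can be sketched as follows. This is a minimal
illustration in Python, not the class's actual implementation: the function
names, the naive sentence splitter, the five-word stop list, and the default
ratio of 0.25 are all assumptions made for the example. It also assumes
undiacritized text, since the simple tokenizer splits words at diacritic marks.

```python
import re
from collections import Counter

# Hypothetical stop list: only the five words named in the text.
# A real list would be much longer.
STOP_WORDS = {"هو", "هي", "في", "حتى", "من"}


def split_sentences(text):
    # Naive split on '.', '!', '?', the Arabic question mark, and newlines.
    parts = re.split(r"[.!?؟\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # \w+ matches runs of Unicode letters and digits, including Arabic letters.
    return re.findall(r"\w+", sentence)


def summarize(text, count=None, ratio=None):
    """Return the top-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]

    # Word score = frequency in the whole document, stop words excluded.
    freq = Counter(w for ws in tokens for w in ws if w not in STOP_WORDS)

    # Sentence score = sum of its word scores / number of words in it.
    # Stop words contribute 0 because Counter returns 0 for missing keys.
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0
              for ws in tokens]

    if count is None:
        if ratio is None:
            ratio = 0.25  # assumed default, not taken from the class
        count = max(1, round(len(sentences) * ratio))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]
```

Calling summarize(text, count=3) keeps the three highest-ranked sentences, and
summarize(text, ratio=0.2) keeps roughly a fifth of them. Returning the
selection in document order, rather than in rank order, is a design choice of
this sketch that keeps the summary readable.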
The class uses a statistical approach, with some attention paid to:
- Location: leading sentences of paragraph/document, title, introduction, and
conclusion.
- Fixed phrases: "خصوصا", "نتيجة", "خلاصة", "تحقيقات", "هام", in-text
summaries, etc.
- Frequencies of words, phrases, proper names
- Contextual material: query, title, headline, initial paragraph
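One way such cues could be combined with a frequency-based score is sketched
below in Python. The function name, the multiplier scheme, and both bonus
weights are hypothetical and chosen only for illustration; the text does not
say how the class weights these factors. Only the cue phrases themselves come
from the list above.

```python
# Cue phrases taken from the list above.
CUE_PHRASES = ("خصوصا", "نتيجة", "خلاصة", "تحقيقات", "هام")


def adjusted_score(base_score, sentence, index, total,
                   position_bonus=0.5, cue_bonus=0.25):
    """Boost a sentence score for its location and for fixed phrases.

    index is the sentence's position in the document and total is the
    number of sentences. Both bonus weights are made-up example values.
    """
    multiplier = 1.0
    # Location: first and last sentences stand in for introduction/conclusion.
    if index == 0 or index == total - 1:
        multiplier += position_bonus
    # Fixed phrases: naive substring match, applied at most once per sentence.
    if any(phrase in sentence for phrase in CUE_PHRASES):
        multiplier += cue_bonus
    return base_score * multiplier
```

The substring test is deliberately simple and will also fire on longer words
that happen to contain a cue phrase; matching on whole tokens would be
stricter.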
The motivation for this class is the range of applications for key phrases.
The list below shows that key phrases have many uses, so a class that
automatically generates good key phrases should have a sizable market.
- Mini-summary: Automatic key phrase extraction can provide a quick mini-summary
for a long document. For example, it could be a feature of a web site: just
click the summarize button when browsing a long web page.
- Highlights: It can highlight key phrases in a long document, to facilitate
skimming the document.
- Author Assistance: Automatic key phrase extraction can help an author or editor
who wants to supply a list of key phrases for a document. For example, the administrator
of a web site might want to have a key phrase list at the top of each web page.
The automatically extracted phrases can be a starting point for further manual
refinement by the author or editor.
- Text Compression: On a device with limited display capacity or limited bandwidth,
key phrases can be a substitute for the full text. For example, an email message
could be reduced to a set of key phrases for display on a pager; a web page
could be reduced for display on a portable wireless web browser.
This list is not intended to be exhaustive, and there may be some overlap in
the items.