Towards the Machine Reading of Arabic Calligraphy: A Letters Dataset and Corresponding Corpus of Text

Seetah Alsalamah, Ross King

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Arabic calligraphy is one of the great art forms of the world. It displays Arabic phrases, commonly taken from the Holy Quran, in beautiful two-dimensional form. The use of two dimensions and the interweaving of letters and words make reading a far greater challenge for Artificial Intelligence (AI) than reading standard printed or handwritten Arabic. To approach this challenge, we have constructed a dataset of Arabic calligraphic letters, along with a corresponding corpus of phrases and quotes. The letters dataset contains a total of 3,467 images across 32 categories of Arabic calligraphic letters. The associated text corpus contains 544 unique quoted phrases. These data were collected from various open sources on the web and include examples from several Arabic calligraphic styles. We have also undertaken both an exploratory statistical analysis of these data and initial machine learning investigations. These analyses suggest that combining knowledge of a limited variety of Arabic calligraphy texts with a successful machine will be sufficient for the machine reading of forms of Arabic calligraphy.
Original language: English
Title of host publication: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)
Place of Publication: London, UK
Publisher: IEEE
Pages: 19-23
ISBN (Electronic): 978-1-5386-1459-4
Publication status: Published - 4 Oct 2018

Keywords

  • Arabic language
  • corpora
  • pattern recognition
  • Arabic dataset
  • calligraphy
