Arabic Text Diacritization Using Deep Neural Networks | Paper Summary

Eman Elrefai
May 5, 2021

Many people find research papers difficult to read. In this story, I summarize this paper in clear points so that it is easy to follow. The paper link: https://arxiv.org/pdf/1905.01965.pdf

I. INTRODUCTION

  • Arabic differs from most other languages in its right-to-left (RTL) writing style and the addition of diacritics to each letter.
  • Standard Arabic is also split into two categories: Classical Arabic (CA) and Modern Standard Arabic (MSA). CA is mainly used in the Holy Quran (HQ), old books, old poetry, etc., while MSA is used in news, lectures, letters, formal speeches, etc. Colloquial Arabic is used in daily life.
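To make the idea of diacritics attached to letters concrete, here is a minimal Python sketch. It is not from the paper: the function names are hypothetical, and it assumes the eight core Unicode diacritic marks in the range U+064B to U+0652. It shows that a diacritized word is stored as base letters, each followed by its marks, which is why diacritization can be framed as predicting marks for each character.

```python
# Assumption: only the eight core Arabic diacritic marks (U+064B..U+0652)
# are treated as diacritics: fathatan, dammatan, kasratan, fatha, damma,
# kasra, shadda, sukun.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text):
    """Return the text with the diacritic marks removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_letters_and_marks(text):
    """Pair each base character with the diacritic marks that follow it."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)  # attach mark to the previous letter
        else:
            pairs.append((ch, ""))
    return pairs


# The word "kataba" (he wrote), written with escapes to avoid RTL rendering
# issues: kaf + fatha, ta + fatha, ba + fatha.
word = "\u0643\u064E\u062A\u064E\u0628\u064E"

print(strip_diacritics(word))               # the same word without diacritics
for base, marks in split_letters_and_marks(word):
    print(repr(base), [hex(ord(m)) for m in marks])
```

A diacritization system does the reverse of `strip_diacritics`: given the bare letters, it restores the marks for each one.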

II. DIACRITIZATION SYSTEMS AND APPROACHES

A. Non-neural approaches:

Some published work takes non-neural approaches. Such works are mainly based on linguistic rules and statistical methods. For example,

  • The MADAMIRA analyzer, built by Pasha et al., provides diacritization, tokenization, and part-of-speech tagging.
  • Other Arabic language processing tools use morphological analysis. Elshafei et al. applied a statistical approach using the hidden Markov…
