
EASY OCR SYSTEM FOR NINE INDIAN LANGUAGES

What’s in the news?

Taking a cue from European languages, several of which have the same (Roman letter–based) script, a team at IIT Madras has, over the last decade, developed a unified script for nine Indian languages, named the Bharati script.

Building on the script, the team has now developed a method for reading documents written in Bharati using a multilingual optical character recognition (OCR) scheme.

A Look at Specifics:

  • The team has also created a finger-spelling method that can be used to generate a sign language for hearing-impaired persons.
  • In collaboration with TCS Mumbai, the researchers have found a way for persons with hearing disability to generate signatures using this finger-spelling technique.
  • The scripts that have been integrated are Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil. English and Urdu have not been integrated so far.
  • The Urdu and English alphabets have a very different phonetic organisation from these scripts, but a mapping to Bharati is still possible.
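The idea of one script covering nine writing systems can be illustrated with Unicode itself, where the blocks for these nine scripts are laid out in parallel so that a letter such as "ka" sits at the same offset in each block. The sketch below uses that layout to move a letter between scripts. It is only an illustration of a shared phonetic layout, not the Bharati mapping itself; the names `SCRIPT_BASES`, `script_and_offset` and `transliterate` are made up for this example, and not every offset is assigned in every script.

```python
# Start of each script's Unicode block; each block spans 0x80 code points.
SCRIPT_BASES = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def script_and_offset(ch):
    """Return (script name, offset inside that script's block) for one letter."""
    cp = ord(ch)
    for name, base in SCRIPT_BASES.items():
        if base <= cp < base + BLOCK_SIZE:
            return name, cp - base
    raise ValueError(f"{ch!r} is not in one of the nine scripts")


def transliterate(ch, target):
    """Move a letter to the same offset in the target script's block.

    Caveat: some offsets are unassigned in some scripts (Tamil, for
    example, has fewer consonants), so this is not a full transliterator.
    """
    _, offset = script_and_offset(ch)
    return chr(SCRIPT_BASES[target] + offset)


# Devanagari "ka" (U+0915) and Tamil "ka" (U+0B95) share offset 0x15.
print(script_and_offset("\u0915"))
print(transliterate("\u0915", "Tamil"))
```

A unified script goes one step further than this table: instead of nine glyphs at matching offsets, the reader sees a single glyph for each sound.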

What does OCR Involve?

  • In general, optical character recognition schemes involve first separating (or segmenting) the document into text and non-text.
  • The text is then segmented into paragraphs, sentences, words and letters.
  • Each letter has to be recognised and encoded as a character in a standard format such as ASCII or Unicode.
  • A letter can have several components, such as the basic consonant, consonant modifiers and vowels.
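The segmentation steps above can be sketched with projection profiles, a classic OCR technique in which blank rows separate text lines and blank columns separate glyphs. This is a toy example on a tiny binary "page" and is not claimed to be the method used by the IIT Madras team; the `runs` and `segment` helpers and the `#`/`.` page format are assumptions made for illustration.

```python
def runs(flags):
    """Return (start, end) index pairs for each run of True values."""
    spans, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment(page):
    """Split a binary page into glyph boxes (top, bottom, left, right).

    `page` is a list of equal-length strings where '#' marks ink.
    Rows without ink separate lines; columns without ink separate glyphs.
    """
    boxes = []
    row_has_ink = ["#" in row for row in page]
    for top, bottom in runs(row_has_ink):
        band = page[top:bottom]
        col_has_ink = [
            any(row[x] == "#" for row in band) for x in range(len(band[0]))
        ]
        for left, right in runs(col_has_ink):
            boxes.append((top, bottom, left, right))
    return boxes


PAGE = [
    "##..#",
    "##..#",
    ".....",
    "#.##.",
]
for box in segment(PAGE):
    print(box)
```

Each box would then be passed to a recogniser that assigns it a character code. Scripts whose modifiers touch or overlap the consonant break the blank-column assumption, which is exactly the difficulty described in the next section.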

Easy to read:

  • The scripts of Indian languages pose a problem for character recognition because the vowel and consonant-modifier components are attached to the main consonant.
  • The Bharati script removes this difficulty and is easy to read.
  • In Bharati characters, the different components are segmentable by design, so OCR works quite accurately.

Three-tiered structure:

  • The ease in design comes about because the Bharati characters are made up of three tiers stacked vertically.
  • The consonant at the root of the letter is placed in the centre and the modifiers are in the top and bottom tiers.
  • Currently, the team has developed a universal finger-spelling language for the nine Indian languages.
  • The team is now working on a system that lets people sign documents using the finger-spelling method. Future plans include developing a new Braille system based on the Bharati script.
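To see why the three-tier layout helps OCR, consider a minimal sketch that cuts a glyph bitmap into top, centre and bottom bands so that each band can be recognised separately. It assumes the three tiers have equal height, which is a simplification made here and not something the source states; `split_tiers`, `has_ink` and `describe` are hypothetical helper names.

```python
def split_tiers(glyph):
    """Split a glyph bitmap (list of row strings) into three vertical bands.

    Assumption: the tiers are of equal height, so the glyph height must be
    a multiple of 3. Real Bharati glyph proportions may differ.
    """
    if len(glyph) % 3 != 0:
        raise ValueError("glyph height must be a multiple of 3 in this sketch")
    h = len(glyph) // 3
    return glyph[:h], glyph[h:2 * h], glyph[2 * h:]


def has_ink(band):
    """True if any pixel in the band is marked with '#'."""
    return any("#" in row for row in band)


def describe(glyph):
    """Report which tiers of the glyph contain ink."""
    top, centre, bottom = split_tiers(glyph)
    return {
        "top_modifier": has_ink(top),
        "consonant": has_ink(centre),
        "bottom_modifier": has_ink(bottom),
    }


GLYPH = [
    "..#..",  # top tier: a modifier mark
    "#####",  # centre tier: the consonant
    ".....",  # bottom tier: empty
]
print(describe(GLYPH))
```

Because the tiers never overlap, each band can be sent to its own small classifier instead of one classifier having to learn every consonant and modifier combination.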

https://www.iasipstnpsc.in/iit-madras-team-develops-easy-ocr-system-for-nine-indian-languages/

https://www.deccanherald.com/national/dd-panel-censors-cpi-leaders-speech-729364.html
