Code-Mixed Hinglish to English Language Translation Framework | IEEE Conference Publication | IEEE Xplore

Code-Mixed Hinglish to English Language Translation Framework


Abstract:

Indians, like many other non-English speakers around the world, avoid using a single code in their social media conversations. They use transliteration and blend multiple languages (English-Hindi, English-Spanish, etc.), randomly mixing in English words to exhibit their linguistic proficiency. As a result, the wide use of social media applications generates a large amount of unstructured text. Code-mixing (CM) is a fast-evolving field of study in the domain of text mining. Social media posts, blogs, and reviews now make heavy use of code-mixed messages, because code-mixing is a modern yet localized way of speaking. Linguistic codes from various languages are used for different purposes. Mixing Hindi and English is a typical practice in India's day-to-day language usage. Most people have already started to regard this mixture as a new language, termed “Hinglish”. Hinglish is used mainly by the younger generation, as observed in code-mixed data obtained from social sites and various other platforms. This mixing of languages poses a new challenge for machine translation: it is important to recognize the foreign elements in a language and process them appropriately. A translation mechanism is therefore needed to assist monolingual users and to make such text easier for language processing models to comprehend. This paper proposes a pipelined mechanism for machine translation of a bilingual language, i.e. Hinglish, into monolingual English.
Date of Conference: 07-09 April 2022
Date Added to IEEE Xplore: 27 April 2022
Conference Location: Erode, India

I. Introduction

India is a linguistically diverse country with numerous languages spoken across its different regions. Moreover, owing to the country's long history of international contact, English has become an essential part of India's education system, and a population that is comfortable communicating bilingually has emerged. Such bilingualism leads to frequent code-mixing in informal conversations. The rapid rise of social media has made this form of communication even more popular. For instance, on social media platforms like Facebook, WhatsApp, or Twitter, we have noticed that people are not particular about expressing themselves in monolingual Hindi or English; instead, they prefer to mix the two languages. Translating such data manually is burdensome, so using machines for this task is more desirable.
