Abstract:
Indians, like many other non-English speakers around the world, rarely use a single code in their social media conversations. They transliterate and blend multiple languages (English-Hindi, English-Spanish, etc.), randomly merging in English words to exhibit their linguistic proficiency. As a result, the wide use of social media applications generates a large amount of unstructured text. Code-mixing (CM) is a fast-evolving field of study in the domain of text mining. Social media posts, blogs, and reviews now make heavy use of code-mixed messages because of their modern yet localized way of speaking. Linguistic codes from various languages are used for different purposes. Mixing Hindi and English is a typical practice in India's day-to-day language usage. Most people have already started to regard this mixture as a new language, termed “Hinglish”. Hinglish is used mainly by the younger generation, as observed in code-mixed data obtained from social sites and various other platforms. This mixing of languages poses a new challenge for machine translation. It is important to recognize the foreign elements in a language and process them appropriately. A translation mechanism is therefore needed to assist monolingual users and to make such text easier for language processing models to comprehend. This paper proposes a pipelined mechanism for machine translation of a bilingual language, i.e. Hinglish, into monolingual English.
Published in: 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS)
Date of Conference: 07-09 April 2022
Date Added to IEEE Xplore: 27 April 2022