computational linguistics

Exploring the field of computational linguistics, the Schwa leads the way into a more inclusive, diverse, equitable, and accessible world.

Whether you’re a researcher, linguistics expert, language enthusiast, or programmer, join us on this journey as we explore the depths of language technologies and shape the future.

research

IruMozhi: Automatically classifying diglossia in Tamil

Findings of the Association for Computational Linguistics (ACL) Journal (NAACL ’24)

Tamil, a Dravidian language of South Asia, is a highly diglossic language with two very different registers in everyday use: Literary Tamil (preferred in writing and formal communication) and Spoken Tamil (confined to speech and informal media). Spoken Tamil is under-studied in modern NLP systems compared to Literary Tamil written in the Tamil script, as evidenced by a lack of datasets explicitly targeting the Spoken variety. In this paper, we release IruMozhi, a human-translated dataset of parallel text in Literary and Spoken Tamil. Using IruMozhi, we train classifiers on the task of identifying which Tamil variety a text belongs to. We use these models to gauge the availability of pretraining data in Spoken Tamil, to audit the composition of existing labelled datasets for Tamil, and to encourage future work on the variety.

MasterMind: A Novel Multi-Output Model Approach for Detecting Mental Illnesses through Natural Language Processing

Virginia Junior Academy of Science Journal

Identifying and diagnosing mental health conditions such as depression, anxiety, and suicidal thoughts is growing in importance, with Natural Language Processing (NLP) emerging as a viable means of doing so. Using publicly available, labeled data obtained from platforms like X (formerly Twitter) and Reddit, this research focuses on developing individual NLP models using Naïve Bayes, Random Forest, XGBoost, and KNN classifiers. These models are then integrated into a comprehensive multi-output model intended to optimize efficiency. Although the individual models displayed a tendency to overfit, they demonstrated high accuracy and effectiveness in mental health detection, particularly the Random Forest classifier. The tested multi-output models presented a range of low to high ROC-AUC scores, leading to promising results overall. This research underscores the potential of NLP in mental health analysis and its practical application in medical and psychiatric practices to accurately address the needs of those showing signs of these conditions in their writing.

USING SOFTWARE TO EXTRACT KEY FINDINGS FROM SCIENTIFIC RESEARCH PAPERS: A CASE STUDY USING RESEARCH ON DIETS

International Journal of Social Science and Economic Research (IJSSER) Journal @ DSC US Health ’22 Conference

With the explosion of information on the Internet, search engine users still find themselves having to weed through a myriad of websites to ensure that they find the relevant information. This is even more cumbersome in dynamic subject areas, such as scientific research, where research findings may not be stable and may even be contradictory. Laypeople are especially burdened since they may lack the knowledge to evaluate what scientific papers are actually concluding. The present paper describes software that reads scientific papers and distills their principal findings in a format that laypeople can understand. This software is evaluated in the topic area of research on diet. A separate paper evaluates this software in the topic area of research on Covid-19.

products

EzhuthuMātri

Accurate transliteration tool from the Tamil Script to the Latin alphabet

LibiMāttam

Accurate transliteration tool from the Malayalam Script to the Latin alphabet

Cognate Detector

Tool to determine cognates between Tamil, Malayalam, and Sanskrit

email info@linguistics.world for demos and more information

In the Schwa’s podcast, Kabi interviews people with experience across artificial intelligence and linguistics, covering topics such as computational linguistics, historical linguistics, and natural language processing.

about us

The Schwa is an initiative dedicated to the intersection of linguistics and technology. Come together with us to explore the frontiers of this dynamic field and create practical solutions that transform the way we interact with language.

Whether it's through research, engaging workshops, product development, podcasts, blogs, or an annual conference, it is the Schwa's goal to share our knowledge, insights, and discoveries with the greater computational linguistics community. Join us as we push the boundaries of what's possible in computational linguistics and unlock new opportunities for language technologies.

Kabilan Prasanna

  • Founder

mentors

...

Mr. Aryaman Arora

...

Mrs. Michele Sambiase

...

Mr. Sundaram K Thirukkurungudi