Sciences
Michela Marchini, Computer Science
Natural Language Processing and Spanglish: Approaches Towards Part of Speech Tagging Code-Switched Language
Project advisor: Lisa Ballesteros
Natural language processing (NLP) is a field dedicated to the computational understanding of human language. Through computational analysis of human text and speech, the field has made great strides and produced tools that many people use daily, such as Siri and Google Translate. At the core of these powerful systems are a number of common NLP tasks that are key to a system’s ability to process language. One of these tasks is part of speech (POS) tagging, which involves assigning a part of speech to each word in an input text.
Although NLP has been a heavily researched subfield of computing since the 1950s, to date the vast majority of analysis in the field has focused on monolingual language. While there has been significant research into multilingual text and speech, this research has historically not included code-switched language. Code-switching, sometimes referred to as code-mixing, occurs when multilingual speakers alternate between two or more languages or dialects within a single conversation or utterance (the smallest unit of speech). Despite the prevalence of code-switching within multilingual communities, the vast majority of NLP tools and products are optimized for monolingual input, ignoring the natural speech and writing patterns of many communities. POS tagging research, specifically, has yielded many different models and methods for tagging monolingual data, but much less work has been done on POS tagging code-switched language.
This project takes one of these models, the Stanford Part of Speech Tagger, and analyzes different approaches for improving its performance on Spanglish (code-switched Spanish and English) input. Specifically, two models are developed and analyzed: 1) a multilingual approach that integrates separate monolingual models for the matrix and secondary languages (the language splitting model) and 2) an approach in which the code-switched language is translated before tagging (the translation model). Both models achieved lower accuracy than the monolingual-input baseline, though they greatly improved upon the baselines for tagging the code-switched data.
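As a rough illustration of the language splitting idea, the sketch below routes each run of same-language tokens to a separate monolingual tagger. It is a minimal sketch under stated assumptions, not the project's implementation: the toy lexicons stand in for trained monolingual taggers (such as Stanford Tagger models), and the names `identify_language`, `make_lexicon_tagger`, and `tag_code_switched`, as well as the fallback to the matrix language for unrecognized words, are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tagged = List[Tuple[str, str]]
Tagger = Callable[[List[str]], Tagged]

# Hypothetical toy lexicons standing in for trained monolingual taggers.
EN_LEXICON = {"i": "PRON", "want": "VERB", "to": "PART", "eat": "VERB",
              "the": "DET", "with": "ADP"}
ES_LEXICON = {"yo": "PRON", "quiero": "VERB", "comer": "VERB", "tacos": "NOUN",
              "con": "ADP", "mis": "DET", "amigos": "NOUN"}


def make_lexicon_tagger(lexicon: Dict[str, str]) -> Tagger:
    """Build a toy tagger: look each word up, tag unknown words as 'X'."""
    def tag(tokens: List[str]) -> Tagged:
        return [(tok, lexicon.get(tok.lower(), "X")) for tok in tokens]
    return tag


def identify_language(token: str) -> str:
    """Toy word-level language ID (assumption: simple lexicon lookup)."""
    low = token.lower()
    if low in ES_LEXICON:
        return "es"
    if low in EN_LEXICON:
        return "en"
    return "unk"


def tag_code_switched(tokens: List[str], taggers: Dict[str, Tagger],
                      matrix_lang: str = "en") -> Tagged:
    """Split tokens into same-language runs and tag each run with its tagger."""
    tagged: Tagged = []
    run: List[str] = []
    run_lang: Optional[str] = None
    for token in tokens:
        lang = identify_language(token)
        if lang == "unk":
            # Unrecognized words stay in the current run, or default
            # to the matrix language at the start of the sentence.
            lang = run_lang or matrix_lang
        if lang != run_lang and run:
            tagged.extend(taggers[run_lang](run))  # flush the finished run
            run = []
        run_lang = lang
        run.append(token)
    if run:
        tagged.extend(taggers[run_lang](run))
    return tagged


TAGGERS = {"en": make_lexicon_tagger(EN_LEXICON),
           "es": make_lexicon_tagger(ES_LEXICON)}

if __name__ == "__main__":
    sentence = "I want to eat tacos con mis amigos".split()
    print(tag_code_switched(sentence, TAGGERS))
```

Grouping tokens into runs, rather than tagging word by word, lets each monolingual tagger see as much same-language context as possible, which matters for real sequence taggers even though the toy lookup here ignores context.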
Nana Aba Turkson, Computer Science
Toward a Multifunctional Microbot for Liver-Targeted Delivery and Parasite Tracking
Project advisor: Lisa Ballesteros
Plasmodium vivax, one of the Plasmodium parasites that cause malaria, is characterized by periodic relapses of symptomatic blood-stage infection, which are likely initiated by the activation of dormant hypnozoites in the liver.1 The lack of a method for the continuous growth and breeding of P. vivax in in vitro culture poses a challenge: it has kept researchers from understanding in full detail how P. vivax operates inside the liver and how it invades reticulocytes to cause relapsing malaria.1
Microrobots offer great promise for minimally invasive, targeted medical applications in hard-to-access regions inside the human body.2 Research on microrobots has accelerated thanks to several advances in fabrication techniques, materials, actuation, and imaging of micromachines.2 These advances have enabled microrobots to operate in superficial tissues such as the eye, in locations with relatively easy access routes (e.g., the gastrointestinal tract), and in stagnant or low-velocity fluidic environments inside the human body.2 Unfortunately, no successful clinical-stage advances have been reported for deeper locations inside the human body, owing to the harsh and dense conditions created by blood flow.2 It has been harder to navigate the circulatory system, which is the ideal route for reaching deeper locations such as the liver.
For researchers to gain access to information about the dormant liver-stage hypnozoites that cause relapsing malaria, microrobots must be able to enter the liver successfully under these harsh and dense conditions and investigate the biology of liver-stage development and hypnozoite dormancy. This might aid in the discovery of new methodologies that could lead to the development of new drugs and vaccines for the elimination of malaria infection.
This research focuses on designing and building a waterproof microrobot capable of swimming in, and withstanding, the environmental conditions of the circulatory system (such as blood flow and a densely crowded, heterogeneous fluidic environment)3 around and inside the liver. The device will be programmed to travel to a targeted area of interest inside the liver and, under these harsh conditions, to track and record key information, such as which cells the Plasmodium parasite invades in the liver and anything else that might help explain how it causes the recurrence of malaria.
Due to time constraints, this project is limited to a proof of concept. An experiment will recreate conditions and an environment similar to those of the liver, and a version of the device scaled up to the macro level will be designed to move against strong drag forces in water, analogous to those of blood flow, and to follow a specific target device while reporting information such as the target device’s movement.
1 “Plasmodium vivax Liver Stage Development and Hypnozoite Persistence in Human Liver-Chimeric Mice.” Cell Host & Microbe, vol. 17, no. 4, 8 Apr. 2015, pp. 526–535, www.sciencedirect.com/science/article/pii/S1931312815000682, 10.1016/j.chom.2015.02.011. Accessed 28 Sept. 2020.
2 Alapan, Yunus, et al. “Multifunctional Surface Microrollers for Targeted Cargo Delivery in Physiological Blood Flow.” Science Robotics, vol. 5, no. 42, 13 May 2020, 10.1126/scirobotics.aba5726.
3 Kim, Shin‐Young, et al. “Design and Usability Evaluations of a 3D‐Printed Implantable Drug Delivery Device for Acute Liver Failure in Preclinical Settings.” Advanced Healthcare Materials, vol. 10, no. 14, 23 June 2021, p. 2100497, 10.1002/adhm.202100497. Accessed 26 Mar. 2022.
Nayantara Das, Psychology and Education
The Prosody of Ambiguous Coordinate Structures in Hindi-English Bilinguals
Project advisor: Mara Breen
Across languages, speakers use different prosodic cues – mainly duration, intensity, and pitch – to signal information structure. Speakers of English, an intonation language, freely manipulate pitch accents and boundary tones to convey semantic and pragmatic content.1 Hindi, being a phrase language, is less variable – sentence melody arises primarily from phrasal tones, making prosodic cues less sensitive to information structure.2 A previous language production experiment on the prosody of ambiguous coordinate structures in German and Hindi elucidated this difference.3 German, like English, is an intonation language, and its speakers used duration and pitch to signal differences in syntactic structure, resulting in a strong correlation between syntax and prosody. Hindi speakers, on the other hand, showed a lack of correlation between the two – the difference in prosody across six separate syntactic conditions was not significant, and speakers did not successfully resolve ambiguity across sentences. Although these conclusions can be explained by the overall difference in the use of intonation between these two kinds of languages, they also suggest that the proposed mappings among syntax, semantics, and prosody are not a universal feature of languages, but instead depend on the language family in question.
Given that intonation and phrase languages use prosody in such different ways, there is a valuable opportunity to study speakers who are fluent in both. This study aims to investigate the differences in prosody between English and Hindi as spoken by bilinguals. Taking inspiration from Féry and Kentner (2010), we conducted a comparable speech production experiment to analyze the differences in speakers’ prosodic realizations of sentences that contain ambiguous coordinate structures. These structures were distributed across four sets of names and six separate syntactic conditions. Participants were shown images on a screen, each with four faces separated by coordinations, along with the target sentence written at the bottom of the image. They were then presented with a question at the top of their screen and were required to respond using the target sentence. In our ongoing analysis of these productions, we focus on the differences between pitch contours and duration values at syntactic boundaries, both within and across English and Hindi. Since the participants are bilingual, we also investigate the role of language dominance and transfer, and their relationship to the transparency of the syntax-prosody mapping in these productions.
1 Féry, C. (2017). Intonation and prosodic structure. Cambridge University Press.
2 Patil, U., Kentner, G., Gollrad, A., Kügler, F., Féry, C. and Vasishth, S. (2008). Focus, word order and
intonation in Hindi. Journal of South Asian Linguistics 1, 55-72.
3 Féry, C., & Kentner, G. (2010). The prosody of embedded coordinations in German and Hindi. In Speech
Prosody 2010-Fifth International Conference.