Career Profile
I am currently double majoring in the School of Computing and the School of Electrical Engineering at KAIST.
My research and experience have focused on large language models and automatic speech recognition (ASR), with topics broadly divided into (1) AI-driven education and (2) AI fairness. While working at Ringle, an English-speaking education startup, I spearheaded the planning and development of an AI-driven engine that predicts IELTS/TOEFL speaking scores by analyzing conversations between tutors and learners. Recognizing the low performance of ASR for second language learners, I was dispatched to KAIST Interaction Lab (KIXLAB) as a visiting researcher affiliated with Ringle to conduct related research. I constructed a dataset of non-native English learners’ spontaneous speech and analyzed the reasons behind the low ASR performance for second language learners. The resulting paper has been accepted to INTERSPEECH 2024. Furthermore, I conducted a coursework project proposing a new evaluation benchmark task that measures the correlation between the knowledge level of large language models and their fairness. I have also worked on projects such as generating 3D meshes from single-view, multi-object images, winning best poster awards.
Currently, I am interested in delving more mathematically into the field of artificial intelligence, especially minority group data augmentation and convex optimization.
Experiences
While working at Ringle, an English-speaking education startup, I spearheaded the planning and development of an AI-driven engine, named CAF, that predicts IELTS/TOEFL speaking scores by analyzing conversations between tutors and learners.
- Defined features related to the English proficiency of learners.
- Conducted analysis on both spoken and written language.
- Analyzed the performance of ASR for non-native English learners.
- Developed sentence structure classification algorithms through part-of-speech (POS) and dependency analysis.
- Collected ground truth data for IELTS speaking levels and addressed data imbalance issues through data augmentation.
- Conducted feature engineering based on the defined features.
- Developed a final prediction model by using the augmented data and the extracted features.
Recognizing the low performance of ASR for second language (L2) learners, I was dispatched to KAIST Interaction Lab (KIXLAB) as a visiting researcher affiliated with Ringle. I published a first-author paper on ASR for L2 spontaneous speech, which has been accepted to INTERSPEECH 2024. Additionally, I filed two patents, titled "L2 Speaker Annotation System" and "Pronunciation Diagnosis and Learning System".
- Constructed a dataset consisting of 50.04 hours of fully transcribed audio.
- Defined new features attributed to ASR errors from L2 speech through linguistic analysis of transcriptions.
- Designed a human annotation experiment to measure the newly defined features.
- Developed an automatic system to detect these features.
- Improved the performance of state-of-the-art models by fine-tuning with the constructed dataset.
- Attempted to synthesize non-native speaker-like audio that can mimic the defined features.
Publications
Projects
This project addresses the problem of synthesizing meshes from single-view images containing multiple objects. We inferred the relative distances between objects using depth maps, image segmentation, and Stable Diffusion inpainting, and used these distances to reassemble the individually synthesized meshes in a 3D scene.
This project proposes a new benchmark task for evaluating large language models (LLMs) by demonstrating the correlation between the level of knowledge in LLMs and their degree of bias. We use the concept of markedness and a statistical method, the Fightin’ Words method, to measure it.
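For readers unfamiliar with the Fightin’ Words method, the following is a minimal sketch of its core statistic: the z-scored log-odds ratio of a word between two corpora, smoothed with a Dirichlet prior. It illustrates the general method only and is not the project's code; the function name `fightin_words_z`, the argument layout, and the assumption that every counted word appears in `prior` are choices made for this example.

```python
import math
from collections import Counter


def fightin_words_z(counts_a, counts_b, prior):
    """Z-scored log-odds ratio with a Dirichlet prior (illustrative sketch).

    counts_a, counts_b: word -> count in each corpus (assumed to contain
                        only words that also appear in `prior`).
    prior:              word -> positive pseudo-count.
    A positive z-score means the word is more associated with corpus A.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = sum(prior.values())
    z = {}
    for word, a_w in prior.items():
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log-odds of the word within each corpus.
        log_odds_a = math.log((y_a + a_w) / (n_a + a0 - y_a - a_w))
        log_odds_b = math.log((y_b + a_w) / (n_b + a0 - y_b - a_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the log-odds difference.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        z[word] = delta / math.sqrt(variance)
    return z
```

With toy counts such as `Counter({"x": 10, "y": 5})` and `Counter({"x": 5, "y": 10})` and a uniform prior of 1.0 per word, `"x"` receives a positive score and `"y"` the mirrored negative score.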
This project demonstrates, both theoretically and experimentally, that message-passing GNN models, especially GCN, lose cyclic information. We propose a novel approach using cycle nodes and a cycle-size dimension, which performs significantly better than prior works.
This project addresses the issue of low performance in automatic speech recognition (ASR) for long-form spontaneous speech. By employing forced alignment, we identified the positions of silence and filler words in the original audio and either removed them or replaced them with shorter silences, resulting in a 56% improvement in ASR performance.
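The trimming step can be sketched as follows, assuming a forced aligner has already produced `(label, start, end)` segments in seconds, with an empty label marking silence. The filler list, the 0.3-second silence cap, and the function names are hypothetical choices for illustration, not the project's actual settings; here fillers are dropped and silences are shortened.

```python
FILLERS = {"uh", "um"}  # hypothetical filler inventory


def trim_segments(segments, max_silence=0.3, fillers=FILLERS):
    """Return the (start, end) intervals of the original audio to keep.

    segments: list of (label, start, end) from a forced aligner;
              label "" is assumed to mark silence.
    """
    keep = []
    for label, start, end in segments:
        if label.lower() in fillers:
            continue  # drop filler words entirely
        if label == "":
            end = min(end, start + max_silence)  # shorten long silences
        if end > start:
            keep.append((start, end))
    return keep


def cut_audio(samples, sample_rate, keep):
    """Concatenate the kept intervals of a mono sample sequence."""
    out = []
    for start, end in keep:
        first = int(round(start * sample_rate))
        last = int(round(end * sample_rate))
        out.extend(samples[first:last])
    return out
```

The shortened audio returned by `cut_audio` would then be passed to the ASR model in place of the original recording.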
This project involves applying StyleGAN at the application level. We took real soccer players’ images as inputs and carried out the task of transforming them into images in the style of FIFA online game graphics. Throughout this project, I acquired an understanding of GANs and learned techniques for efficient training by freezing each layer of the GAN.