About Me
I am an M.S. candidate at the Machine Learning and Intelligence Lab (MLILAB) at the Korea Advanced Institute of Science and Technology (KAIST), advised by Prof. Eunho Yang.
My research broadly focuses on large language models. Recently, I have concentrated on improving data efficiency in reinforcement learning and addressing the overthinking problem in LLMs through reinforcement learning-based approaches.
Experiences
While working at Ringle, a startup for English-speaking education, I spearheaded the planning, development, and deployment of an AI-driven engine, CAF, which predicts IELTS and TOEFL speaking scores by analyzing conversations between tutors and learners.
- Created a speech dataset labeled with IELTS speaking levels and addressed data imbalance through data augmentation.
- Analyzed the causes of low automatic speech recognition (ASR) performance for non-native English learners and improved it via supervised fine-tuning.
- Defined features related to English proficiency and developed algorithms to extract these features from both speech and transcribed text.
- Developed and deployed a model to predict IELTS and TOEFL speaking levels based on the extracted features.
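As a rough illustration of the feature-extraction step, the sketch below computes two simple fluency-side features from a transcript and its audio duration. The feature names and the filler list are illustrative assumptions of mine, not the actual feature set used in CAF:

```python
def fluency_features(words, duration_sec, fillers=("um", "uh", "like")):
    """Compute simple fluency features from transcript tokens and audio length.

    words: list of transcript tokens; duration_sec: audio length in seconds.
    Returns speech rate (words per minute) and the fraction of filler tokens.
    """
    n = len(words)
    n_fillers = sum(w.lower() in fillers for w in words)
    return {
        "speech_rate_wpm": 60.0 * n / duration_sec,
        "filler_ratio": n_fillers / n if n else 0.0,
    }
```

In practice, features like these would be combined with accuracy- and complexity-side features before being fed to the score-prediction model.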
I was dispatched to the KAIST Interaction Lab (KIXLAB) as a visiting researcher affiliated with Ringle, under the supervision of Prof. Juho Kim. I published a paper, “LearnerVoice: A Dataset of Non-Native English Learners’ Spontaneous Speech”, as first author; it was accepted to INTERSPEECH 2024. Additionally, I filed two patents: “System for Learning English Speaking and Method Thereof” and “System for Diagnosing and Learning Pronunciation and Method Thereof”.
- Constructed a dataset of L2 learners’ spontaneous speech with complete transcriptions.
- Defined new features related to ASR errors in L2 speech through linguistic analysis of transcriptions and human annotation.
- Improved the performance of state-of-the-art models by fine-tuning them on the constructed dataset.
- Explored synthesizing non-native speaker-like audio that can mimic the distributions of defined features.
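Quantifying ASR errors on L2 speech, as in the work above, typically starts from word error rate (WER). A minimal self-contained implementation (standard Levenshtein distance over words, not the project's actual evaluation code):

```python
def word_error_rate(ref, hyp):
    """WER: word-level edit distance between reference and hypothesis,
    normalized by the number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

Per-utterance WER like this can then be broken down by the linguistically defined error features to see where models struggle on non-native speech.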
Publications
Projects
This project addresses the problem of synthesizing meshes from single-view images that contain multiple objects. We inferred the relative distances between objects using depth maps, image segmentation, and Stable Diffusion inpainting, and used these distances to reassemble each synthesized mesh into a single 3D scene.
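The distance-inference step can be sketched as follows: given a per-pixel depth map and segmentation masks, take the median depth inside each mask and order objects front to back. This is a minimal illustration of the idea; the function and object names are my own:

```python
import statistics

def order_objects_by_depth(depth, masks):
    """Order segmented objects front-to-back by median depth.

    depth: 2D list of per-pixel depth values (e.g. from a monocular
    depth estimator); masks: {object_name: iterable of (row, col)}
    pixel coordinates from segmentation.
    """
    med = {name: statistics.median(depth[r][c] for r, c in pix)
           for name, pix in masks.items()}
    return sorted(med, key=med.get)  # smallest (closest) depth first
```

The median is used rather than the mean so that a few mislabeled boundary pixels do not skew an object's estimated distance.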
This project proposes a new benchmark task for evaluating large language models (LLMs) by demonstrating the correlation between an LLM's level of knowledge and its degree of bias. We measure this using the concept of markedness and a statistical method, the Fightin’ Words method.
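For reference, the Fightin’ Words method (Monroe et al.'s weighted log-odds with an informative Dirichlet prior) can be implemented in a few lines. This is a generic sketch of the statistic, not the project's code; the `prior_scale` value is an illustrative choice:

```python
import math
from collections import Counter

def fightin_words(tokens_a, tokens_b, prior_scale=10.0):
    """Weighted log-odds ratio with an informative Dirichlet prior.

    Positive z-scores mark words over-represented in corpus A,
    negative ones words over-represented in corpus B.
    """
    ya, yb = Counter(tokens_a), Counter(tokens_b)
    vocab = set(ya) | set(yb)
    na, nb = sum(ya.values()), sum(yb.values())
    # prior proportional to each word's overall corpus frequency
    total = na + nb
    alpha = {w: prior_scale * (ya[w] + yb[w]) / total for w in vocab}
    a0 = sum(alpha.values())
    z = {}
    for w in vocab:
        la = math.log((ya[w] + alpha[w]) / (na + a0 - ya[w] - alpha[w]))
        lb = math.log((yb[w] + alpha[w]) / (nb + a0 - yb[w] - alpha[w]))
        var = 1.0 / (ya[w] + alpha[w]) + 1.0 / (yb[w] + alpha[w])
        z[w] = (la - lb) / math.sqrt(var)
    return z
```

Dividing by the estimated variance keeps rare words from dominating the ranking, which is the method's main advantage over raw log-odds.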
This project demonstrates, both theoretically and experimentally, that message-passing GNN models, especially GCN, lose cyclic information. We propose a novel approach using cycle nodes and a cycle-size dimension, which performs significantly better than prior works.
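The graph-augmentation idea can be sketched as follows: for each detected cycle, add one virtual "cycle node" connected to every node on that cycle, carrying the cycle's size as a feature. The cycles themselves would come from a cycle-basis routine; here they are passed in precomputed, and all names are illustrative:

```python
def add_cycle_nodes(edges, cycles):
    """Augment a graph with one virtual cycle node per cycle.

    edges: list of (u, v) pairs with integer node ids;
    cycles: list of cycles, each a list of member node ids
    (e.g. from a cycle-basis algorithm).
    Returns (augmented edge list, {cycle_node_id: cycle_size}).
    """
    nodes = {u for e in edges for u in e}
    next_id = max(nodes) + 1
    new_edges = list(edges)
    cycle_size = {}
    for cyc in cycles:
        c = next_id
        next_id += 1
        cycle_size[c] = len(cyc)  # cycle-size feature for this node
        new_edges += [(c, v) for v in cyc]  # connect to every member
    return new_edges, cycle_size
```

After augmentation, ordinary message passing can propagate cycle membership and cycle size in a single hop, which a vanilla GCN on the original graph cannot recover.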
This project addresses the issue of low performance in automatic speech recognition (ASR) for long-form spontaneous speech. By employing forced alignment, we identified the positions of silence and filler words in the original audio and either removed them or replaced them with shorter silences, resulting in a 56% improvement in ASR performance.
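The editing step above can be sketched as a pure planning function: given forced-alignment output as (word, start, end) tuples, drop filler tokens and merge neighboring words into keep-spans, cutting any silence longer than a threshold. The filler list and threshold are illustrative, not the project's actual settings:

```python
def plan_silence_edits(aligned, fillers=frozenset({"um", "uh"}), max_gap=0.3):
    """Plan which audio spans to keep after removing fillers and long silences.

    aligned: list of (word, start, end) tuples in seconds from a forced
    aligner. Words separated by at most max_gap seconds of silence are
    merged into one span; longer gaps (and filler words) are cut.
    Returns a list of (start, end) spans of the original audio to keep.
    """
    kept = [(s, e) for w, s, e in aligned if w.lower() not in fillers]
    spans = []
    for s, e in kept:
        if spans and s - spans[-1][1] <= max_gap:
            spans[-1] = (spans[-1][0], e)  # extend the current span
        else:
            spans.append((s, e))  # silence too long: start a new span
    return spans
```

The spans would then be concatenated (optionally with a short fixed pause between them) to produce the edited audio fed to the ASR model.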
This project applies StyleGAN at the application level. We took real soccer players’ images as input and transformed them into images in the style of FIFA Online game graphics. Through this project, I gained an understanding of GANs and learned techniques for efficient training by freezing individual layers of the GAN.