CV | Kevin Galim

Contact Information

Name	Kevin Galim
Professional Title	Senior AI Research Engineer
Email	galimkevin@gmail.com
Website	https://kevingalim.com

Professional Summary

Machine learning researcher working on efficient LLM inference, post-training systems, and accelerator-aware generative model pipelines. First/co-first author of publications at ICLR, ICML, ACL, ECCV, and WACV, with work spanning KV-cache and prompt/context compression, diffusion LLM parallel decoding, and parameter-efficient adaptation for state space models.

Experience

2021 - present

Seoul, South Korea
Senior AI Research Engineer

FuriosaAI
- Conducted research on large-scale generative models and LLMs, including efficient inference, KV-cache optimization, diffusion LLMs, PEFT, state space models, and advanced architectures.
- Studied post-training systems including asynchronous on-policy distillation (OPD) and RL-style pipelines, stale rollout effects, teacher-cache constraints, and throughput–quality trade-offs.
- Published multiple first- and co-first-author papers at top-tier conferences (ICLR, ICML, ACL, CVPR, ECCV, WACV).
- Designed end-to-end pipelines for training and evaluating LLMs, including accelerator-aware rollout generation and inference pipelines on custom AI hardware.
2020 - 2021

Seoul, South Korea
AI / Computer Vision Research and Development Engineer

Funzin

Worked on applied computer vision systems and deep learning models for autonomous and embedded platforms.
- Developed computer-vision models for object detection, segmentation, gesture detection, and autonomous golf cart perception.
- Optimized models for embedded deployment using TensorRT, OpenVINO, Coral, DSP acceleration, and ARM NEON.
- Built and demonstrated a real-time OpenGL 3D surround-view system at CES 2021.
2019 - 2020

Munich, Germany
Web / AR Developer (Freelance)

Dowosoft | Premium Software Development
- Built AR mobile apps, cloud-backed web apps, and cross-platform mobile apps using Unity3D, Flutter, AWS, and Google Cloud.
2015 - 2016

Munich, Germany
C++ / CUDA Software Engineer

ARRI
- Developed GPU-accelerated image-processing algorithms using CUDA and OpenCL.
- Built C++/OpenGL visualization and image-analysis tools for digital cinema workflows.

Ongoing Work

2026

AsyncOPD: How Stale Can On-Policy Distillation Be?

Submitted manuscript.

Studies stale rollouts, KL-direction sensitivity, teacher-cache constraints, estimator design, and throughput–quality trade-offs in asynchronous on-policy distillation pipelines.

Publications

2026

Draft-based Approximate Inference for LLMs

International Conference on Learning Representations (ICLR)

We present a unified framework for approximate inference in long-context LLMs using small draft models to predict token and KV-cache importance. We introduce SpecKV, SpecPC, and SpecKV-PC, enabling more accurate KV-cache and prompt compression while preserving the same efficiency gains in memory usage, latency, and throughput.
2026

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

International Conference on Learning Representations (ICLR)

Analyzes the fundamental limitations of parallel decoding in diffusion LLMs and introduces ParallelBench, the first benchmark designed to measure quality degradation caused by token dependency violations. Reveals key speed–quality trade-offs and highlights the need for new decoding strategies.
2026

Inference-Aligned SFT for Diffusion LLMs via Group-based Trajectory Sampling

ICLR Workshop on Decoding and Generation with Language Models (DeLTa)
2026

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

Conference of the European Chapter of the Association for Computational Linguistics (EACL)

We study speculative decoding for Large Vision-Language Models (LVLMs) and benchmark existing drafting strategies across diverse multimodal scenarios. We propose TABED, a training-free adaptive ensemble drafting method that dynamically combines batched drafts, achieving up to 1.74× inference speedup and improved robustness over single-draft approaches.
2026

UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

IEEE Access

We study compositional text-to-image generation in Masked Generative Transformers, which remain underexplored compared to diffusion models. We propose UNCAGE, a training-free method that leverages contrastive attention guidance to prioritize object-representative tokens during unmasking, improving compositional fidelity with negligible inference overhead.
2025

State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models

Annual Meeting of the Association for Computational Linguistics (ACL)

We investigate parameter-efficient fine-tuning for State Space Models, where prompt-based approaches such as prompt tuning and prefix tuning are ineffective. We propose state-based PEFT methods, including State-offset Tuning, which directly adjusts model states at each timestep to improve adaptation.
2025

Parameter-Efficient Fine-Tuning of State Space Models

International Conference on Machine Learning (ICML)

Introduces Sparse Dimension Tuning (SDT), a parameter-efficient fine-tuning method specifically designed for state space models such as Mamba. By combining SDT for SSM modules with LoRA for projection layers, achieves state-of-the-art performance for adapting SSM-based language models with minimal additional parameters.
2025

Counting Guidance for High Fidelity Text-to-Image Synthesis

Winter Conference on Applications of Computer Vision (WACV) — Oral

We address the challenge of generating the correct number of objects in text-to-image diffusion models. We propose a guidance method that leverages a reference-less counting network and attention-based object masks to steer the denoising process, improving object-count fidelity in generated images.
2024

Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing

European Conference on Computer Vision (ECCV)

We study diffusion inversion for real image editing, where existing methods struggle to balance faithfulness to the source image and alignment with the edit prompt. We propose a diffusion inversion method with time- and region-dependent η control, enabling flexible editing while preserving image fidelity.
2020

Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

We address the generalization challenge in depth estimation, where models trained on synthetic data often fail on real-world scenes due to domain gaps. We propose a method that leverages domain-invariant defocus blur cues and a permutation-invariant network, enabling models trained on synthetic data to generalize effectively to real images.

Education

2016 - 2019

Munich, Germany
Master's Degree

Technical University of Munich

Informatics — Games Engineering
- Specialization combining computer science, computer vision, and high-performance CPU/GPU programming.
- Master’s Thesis: Deep Learning for Video Depth Estimation from Defocus (Grade 1.0) — later published at CVPR 2020.
2017 - 2018

Tokyo, Japan
Research (Semester Abroad)

The University of Tokyo

Computer Graphics
- Implemented a voxel-based rendering engine using C++ and OpenGL.
- Developed an anti-aliasing approach for ray tracing in voxel scenes.
2013 - 2016

Munich, Germany
Bachelor's Degree

Technical University of Munich

Informatics — Games Engineering
- Bachelor’s Thesis: Preconditioners for Tikhonov Regularization in Image Deblurring.
- Research focused on numerical optimization and inverse problems in image restoration.

Skills

LLM Inference & Post-Training (Expert): Speculative Decoding, KV-Cache Optimization, Prompt/Context Compression, On-Policy Distillation, Efficient Inference

Machine Learning Research (Expert): Large Language Models, Diffusion Models, Parameter-Efficient Fine-Tuning, State Space Models, Diffusion LLMs

Computer Vision (Expert): Object Detection, Image Segmentation, Depth Estimation, Generative Models

Programming (Expert): Python, C++, CUDA, PyTorch, TensorFlow, JavaScript

Languages

German: Native or bilingual proficiency

English: Full professional proficiency

Korean: Professional working proficiency (TOPIK Level 5)

Certificates

TOPIK (Test of Proficiency in Korean), Level 5, National Institute for International Education (2020)
