publications | Kevin Galim

* denotes equal contribution.

2026

ICLR

Draft-based Approximate Inference for LLMs

Kevin Galim*, Ethan Ewer*, Wonjun Kang, and 3 more authors

In International Conference on Learning Representations, 2026

We present a unified framework for approximate inference in long-context LLMs using small draft models to predict token and KV-cache importance. We introduce SpecKV, SpecPC, and SpecKV-PC, enabling more accurate KV-cache and prompt compression while preserving the same efficiency gains in memory usage, latency, and throughput.

ICLR

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs

Wonjun Kang*, Kevin Galim*, Seunghyuk Oh*, and 8 more authors

In International Conference on Learning Representations, 2026

We analyze the fundamental limitations of parallel decoding in diffusion LLMs and introduce ParallelBench, the first benchmark designed to measure quality degradation caused by token dependency violations. Our analysis reveals key speed–quality trade-offs and highlights the need for new decoding strategies.

ICLR WS

Inference-Aligned SFT for Diffusion LLMs via Group-based Trajectory Sampling

Seunghyuk Oh, Minjae Lee, Kevin Galim, and 5 more authors

In ICLR Workshop on Decoding and Generation with Language Models (DeLTa), 2026

EACL

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

Minjae Lee*, Wonjun Kang*, Byeongkeun Ahn, and 6 more authors

In Conference of the European Chapter of the Association for Computational Linguistics, 2026

We study speculative decoding for Large Vision-Language Models (LVLMs) and benchmark existing drafting strategies across diverse multimodal scenarios. We propose TABED, a training-free adaptive ensemble drafting method that dynamically combines batched drafts, achieving up to 1.74× inference speedup and improved robustness over single-draft approaches.

IEEE Access

UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation

Wonjun Kang, Byeongkeun Ahn, Minjae Lee, and 4 more authors

IEEE Access, 2026

We study compositional text-to-image generation in Masked Generative Transformers, which remain underexplored compared to diffusion models. We propose UNCAGE, a training-free method that leverages contrastive attention guidance to prioritize object-representative tokens during unmasking, improving compositional fidelity with negligible inference overhead.

2025

ACL

State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models

Wonjun Kang*, Kevin Galim*, Yuchen Zeng*, and 3 more authors

In Annual Meeting of the Association for Computational Linguistics, 2025

We investigate parameter-efficient fine-tuning for State Space Models, where prompt-based approaches such as prompt tuning and prefix tuning are ineffective. We propose state-based PEFT methods, including State-offset Tuning, which directly adjusts model states at each timestep to improve adaptation.

ICML

Parameter-Efficient Fine-Tuning of State Space Models

Kevin Galim*, Wonjun Kang*, Yuchen Zeng*, and 2 more authors

In International Conference on Machine Learning, 2025

We introduce Sparse Dimension Tuning (SDT), a parameter-efficient fine-tuning method specifically designed for state space models such as Mamba. Combining SDT for SSM modules with LoRA for projection layers achieves state-of-the-art performance for adapting SSM-based language models with minimal additional parameters.

WACV

Counting Guidance for High Fidelity Text-to-Image Synthesis

Wonjun Kang*, Kevin Galim*, Hyung Il Koo, and 1 more author

In Winter Conference on Applications of Computer Vision, 2025

Oral

We address the challenge of generating the correct number of objects in text-to-image diffusion models. We propose a guidance method that leverages a reference-less counting network and attention-based object masks to steer the denoising process, improving object-count fidelity in generated images.

2024

ECCV

Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing

Wonjun Kang*, Kevin Galim*, and Hyung Il Koo

In European Conference on Computer Vision, 2024

We study diffusion inversion for real image editing, where existing methods struggle to balance faithfulness to the source image and alignment with the edit prompt. We propose a diffusion inversion method with time- and region-dependent η control, enabling flexible editing while preserving image fidelity.

2020

CVPR

Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation

Maxim Maximov, Kevin Galim, and Laura Leal-Taixé

In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

We address the generalization challenge in depth estimation, where models trained on synthetic data often fail on real-world scenes due to domain gaps. We propose a method that leverages domain-invariant defocus blur cues and a permutation-invariant network, enabling models trained on synthetic data to generalize effectively to real images.