CV

Contact Information

Name: Kevin Galim
Professional Title: Senior AI Research Engineer
Email: galimkevin@gmail.com

Professional Summary

Machine learning researcher specializing in efficient inference for large-scale generative models and LLM systems. Author of 10+ publications at major ML conferences including ICLR, ICML, ACL, CVPR, ECCV, and WACV.

Experience

  • 2021 - present

    Seoul, South Korea

    Senior AI Research Engineer
    FuriosaAI
    • Conducted research on large-scale generative models and LLMs, including efficient inference (KV-cache optimization, diffusion LLMs), scalable training (PEFT, state space models), and advanced architectures.
    • Published multiple first-author and co-first-author papers at top-tier conferences (ICLR, ICML, ACL, CVPR, ECCV, WACV).
    • Designed end-to-end pipelines for training, evaluating, and deploying LLMs on custom AI accelerators.
    • Built real-time computer vision demos on custom hardware and presented them at CVPR 2022 and 2023.
  • 2020 - 2021

    Seoul, South Korea

    AI / Computer Vision Research and Development Engineer
    Funzin
    Worked on applied computer vision systems and deep learning models for autonomous and embedded platforms.
    • Developed and trained deep learning models for object detection, road/sidewalk segmentation, and gesture detection using PyTorch and TensorFlow.
    • Built perception systems for an autonomous golf cart platform, including detection and segmentation pipelines.
    • Optimized models for embedded deployment using TensorRT, OpenVINO, Coral, DSP acceleration, and ARM NEON.
    • Developed and calibrated a real-time 3D surround-view monitoring (SVM) system in OpenGL for embedded hardware, demonstrated at CES 2021.
  • 2019 - 2020

    Munich, Germany

    Web / AR Developer (Freelance)
    Dowosoft | Premium Software Development
    • Built AR mobile applications using Unity3D.
    • Developed cloud-backed web applications using AWS and Google Cloud.
    • Created cross-platform mobile apps using Flutter.
  • 2015 - 2016

    Munich, Germany

    C++ / CUDA Software Engineer
    ARRI
    • Developed GPU-accelerated image processing algorithms using CUDA and OpenCL.
    • Built real-time visualization and image analysis tools using C++ and OpenGL.
    • Contributed to software used in professional digital cinema workflows.

Education

  • 2016 - 2019

    Munich, Germany

    Master's Degree
    Technical University of Munich
    Informatics — Games Engineering
    • Specialization combining computer science, computer vision, and high-performance CPU/GPU programming.
    • Master’s Thesis: Deep Learning for Video Depth Estimation from Defocus (Grade 1.0) — later published at CVPR 2020.
  • 2017 - 2018

    Tokyo, Japan

    Research (Semester Abroad)
    The University of Tokyo
    Computer Graphics
    • Implemented a voxel-based rendering engine using C++ and OpenGL.
    • Developed an anti-aliasing approach for ray tracing in voxel scenes.
  • 2013 - 2016

    Munich, Germany

    Bachelor's Degree
    Technical University of Munich
    Informatics — Games Engineering
    • Bachelor’s Thesis: Preconditioners for Tikhonov Regularization in Image Deblurring.
    • Research focused on numerical optimization and inverse problems in image restoration.

Publications

  • 2026
    Draft-based Approximate Inference for LLMs
    International Conference on Learning Representations (ICLR)

    We present a unified framework for approximate inference in long-context LLMs that uses small draft models to predict token and KV-cache importance. We introduce SpecKV, SpecPC, and SpecKV-PC, which enable more accurate KV-cache and prompt compression while retaining the memory, latency, and throughput gains that such compression provides.

  • 2026
    ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
    International Conference on Learning Representations (ICLR)

    We analyze the fundamental limitations of parallel decoding in diffusion LLMs and introduce ParallelBench, the first benchmark designed to measure quality degradation caused by token dependency violations. Our results reveal key speed–quality trade-offs and highlight the need for new decoding strategies.

  • 2026
    Inference-Aligned SFT for Diffusion LLMs via Group-based Trajectory Sampling
    ICLR Workshop on Decoding and Generation with Language Models (DeLTa)
  • 2026
    TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs
    Conference of the European Chapter of the Association for Computational Linguistics (EACL)

    We study speculative decoding for Large Vision-Language Models (LVLMs) and benchmark existing drafting strategies across diverse multimodal scenarios. We propose TABED, a training-free adaptive ensemble drafting method that dynamically combines batched drafts, achieving up to 1.74× inference speedup and improved robustness over single-draft approaches.

  • 2026
    UNCAGE: Contrastive Attention Guidance for Masked Generative Transformers in Text-to-Image Generation
    IEEE Access

    We study compositional text-to-image generation in Masked Generative Transformers, which remain underexplored compared to diffusion models. We propose UNCAGE, a training-free method that leverages contrastive attention guidance to prioritize object-representative tokens during unmasking, improving compositional fidelity with negligible inference overhead.

  • 2025
    State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
    Annual Meeting of the Association for Computational Linguistics (ACL)

    We investigate parameter-efficient fine-tuning for State Space Models, where prompt-based approaches such as prompt tuning and prefix tuning are ineffective. We propose state-based PEFT methods, including State-offset Tuning, which directly adjusts model states at each timestep to improve adaptation.

  • 2025
    Parameter-Efficient Fine-Tuning of State Space Models
    International Conference on Machine Learning (ICML)

    We introduce Sparse Dimension Tuning (SDT), a parameter-efficient fine-tuning method designed specifically for state space models such as Mamba. By combining SDT for SSM modules with LoRA for projection layers, we achieve state-of-the-art performance in adapting SSM-based language models with minimal additional parameters.

  • 2025
    Counting Guidance for High Fidelity Text-to-Image Synthesis
    Winter Conference on Applications of Computer Vision (WACV) — Oral

    We address the challenge of generating the correct number of objects in text-to-image diffusion models. We propose a guidance method that leverages a reference-less counting network and attention-based object masks to steer the denoising process, improving object-count fidelity in generated images.

  • 2024
    Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
    European Conference on Computer Vision (ECCV)

    We study diffusion inversion for real image editing, where existing methods struggle to balance faithfulness to the source image and alignment with the edit prompt. We propose a diffusion inversion method with time- and region-dependent η control, enabling flexible editing while preserving image fidelity.

  • 2020
    Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

    We address the generalization challenge in depth estimation, where models trained on synthetic data often fail on real-world scenes due to domain gaps. We propose a method that leverages domain-invariant defocus blur cues and a permutation-invariant network, enabling models trained on synthetic data to generalize effectively to real images.

Skills

LLM Inference Optimization (Expert): Speculative Decoding, KV-Cache Optimization, Diffusion LLMs, Efficient Inference
Machine Learning Research (Expert): Parameter-Efficient Fine-Tuning, State Space Models, Large Language Models, Diffusion Models
Computer Vision (Expert): Object Detection, Depth Estimation, Image Segmentation, Generative Models
Systems & Hardware (Advanced): CUDA, TensorRT, OpenVINO, ARM NEON, Custom AI Accelerators
Programming (Expert): Python, PyTorch, C++, CUDA, TensorFlow, JavaScript

Languages

German: Native or bilingual proficiency
English: Full professional proficiency
Korean: Professional working proficiency (TOPIK Level 5)

Certificates

  • TOPIK (Test of Proficiency in Korean), Level 5, National Institute for International Education (2020)