I am a final-year CS PhD student at EPFL, working with Michael Kapralov. I am broadly interested in agents, LLM post-training, and efficient inference, with recent work on data selection for on-policy LLM distillation, fast attention, KV cache compression, and quantization for scalable vector retrieval. In the past, I have also worked on fast algorithms for large-scale, high-dimensional data analysis and numerical linear algebra.

I am on the industry job market for Applied AI and LLM post-training roles!

My recent work focuses on:

  • Data selection for post-training LLM distillation and fine-tuning.
  • Efficient kernels for TurboQuant and other quantization methods for scalable graph-based vector retrieval.
  • Fast attention, KV cache compression, and long-context inference.

Experience:

  • Student Researcher, Google Research, Zurich, Switzerland (October 2025-February 2026): Developed efficient kernels for dot-product vector quantization in graph-based vector retrieval algorithms, and efficient data selection methods for post-training LLM distillation.
  • Applied Science Intern, Amazon, Luxembourg, Luxembourg (July 2024-January 2025): Developed applied ML and optimization tools for Amazon's internal customers to reduce operational costs.
  • Research Intern, Center for Data Driven Discovery, California Institute of Technology, Pasadena, USA (May 2017-July 2017): Developed software for processing telescope data about astronomical objects.

News

  • October 2025: Started as a Student Researcher at Google Research.
  • September 2025: Paper on BalanceKV, a novel KV cache compression method, accepted to NeurIPS 2025 as a Spotlight.
  • January 2025: “Improved Algorithms for Kernel Matrix-Vector Multiplication” accepted to ICLR 2025 (first author).
  • July 2024: “Improved Algorithms for Kernel Matrix-Vector Multiplication” won Best Paper at the ICML 2024 Workshop on Long Context Foundation Models.
  • July 2024: Started as an Applied Science Intern at Amazon Research.

Publications

Full list also on Google Scholar.

Streaming Attention Approximation via Discrepancy Theory.
Ekaterina Kochetkova, Kshiteej Sheth, Insu Han, Amir Zandieh, Michael Kapralov.
NeurIPS 2025 (Spotlight).
[arXiv | Code]

Improved Algorithms for Kernel Matrix-Vector Multiplication.
(alphabetical) Piotr Indyk, Michael Kapralov, Kshiteej Sheth, Tal Wagner.
ICLR 2025 (Poster; I was the first author). Best Paper at the ICML 2024 Workshop on Long Context Foundation Models.
[OpenReview | Workshop]

Sublinear Time Low-Rank Approximation of Hankel Matrices.
Michael Kapralov, Cameron Musco, Kshiteej Sheth.
SODA 2026.
[arXiv]

Sublinear Time Low-Rank Approximation of Toeplitz Matrices.
Cameron Musco, Kshiteej Sheth.
SODA 2024.
[arXiv]

Toeplitz Low-Rank Approximation with Sublinear Query Complexity.
Michael Kapralov, Hannah Lawrence, Mikhail Makarov, Cameron Musco, Kshiteej Sheth.
SODA 2023.
[arXiv]

Towards Non-Uniform k-Center with Constant Types of Radii.
Xinrui Jia, Lars Rohwedder, Kshiteej Sheth, Ola Svensson.
SOSA 2022.
[arXiv]

Fair Colorful k-Center Clustering.
Xinrui Jia, Kshiteej Sheth, Ola Svensson.
Mathematical Programming, 2021. Preliminary version in IPCO, 2020.
[arXiv | Talk | Journal]

Improved linear embeddings via Lagrange duality.
Kshiteej Sheth, Dinesh Garg, Anirban Dasgupta.
Machine Learning (Springer), 2019.
[Paper]

Deep-learnt classification of light curves.
Ashish Mahabal, Kshiteej Sheth, Fabian Gieseke, Akshay Pai, S. George Djorgovski, Andrew J. Drake, Matthew J. Graham.
SSCI 2017.
[arXiv]

Service

  • Conference reviewer: ICLR, NeurIPS, ICML.