I am a postdoc at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Simon S. Du. Previously, I was a postdoctoral researcher in the Department of Electrical and Computer Engineering at Princeton University, advised by Yuxin Chen and Jason D. Lee. Before that, I received my Ph.D. in Control Science and Engineering and my B.S. in Automation from the Department of Automation at Tsinghua University, where I was supervised by Prof. Xiangyang Ji.
I have broad interests in machine learning theory, including reinforcement learning, online learning, bandit theory and game theory.
Selected Works
- Settling the Sample Complexity of Online Reinforcement Learning
  Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2024.
  We present a new model-based algorithm for learning in the fundamental paradigm of reinforcement learning: the tabular Markov Decision Process (MDP) setting. Our algorithm achieves a regret bound of Õ(min{√(SAH³K), HK}), where S is the number of states, A is the number of actions, H is the horizon length, and K is the number of episodes. Notably, this is the first result to match the minimax lower bound over the entire range of sample sizes, settling a long-standing open problem.
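  To make the guarantee concrete, here is the standard notion of regret in episodic tabular RL that the bound refers to (a worked statement under the usual notation, where π^k is the policy deployed in episode k and s₁^k is its initial state):

  ```latex
  % Episodic regret after K episodes, with the bound above (up to log factors):
  \mathrm{Regret}(K)
    = \sum_{k=1}^{K} \left( V_1^{\star}(s_1^k) - V_1^{\pi^k}(s_1^k) \right)
    \le \widetilde{O}\!\left( \min\left\{ \sqrt{S A H^3 K},\; H K \right\} \right).
  ```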
- Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
  Zihan Zhang, Yuan Zhou, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2020.
  We prove the first regret bound for a model-free algorithm in finite-horizon MDPs that asymptotically matches the information-theoretic lower bound. The reference-advantage technique developed in this paper has since been widely used in the literature.
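  As a rough sketch of the technique (my paraphrase, not the paper's exact formulation): the unknown next-state value is split into a reference part, which is estimated to high accuracy once, and a residual "advantage" part whose small magnitude keeps its variance, and hence its estimation error, low:

  ```latex
  % Reference-advantage decomposition (paraphrased sketch):
  % V^{ref} is a fixed, accurately estimated reference value function,
  % and the residual V - V^{ref} is kept small in magnitude.
  P_{s,a} V = P_{s,a} V^{\mathrm{ref}} + P_{s,a}\!\left( V - V^{\mathrm{ref}} \right)
  ```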
- Optimal Multi-Distribution Learning
  Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee.
  Annual Conference on Learning Theory (COLT), 2024.
  We propose a novel algorithm that achieves a sample complexity on the order of (d + k)/ε² for the multi-distribution learning problem, where d is the VC dimension, k is the number of distributions, and ε is the target accuracy. This bound matches the best-known lower bound and settles an open problem from COLT 2023.
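  For reference, the objective this bound applies to can be stated as follows (a standard formulation of multi-distribution learning; err_{D_i} denotes the error of a hypothesis under distribution D_i):

  ```latex
  % Multi-distribution learning over D_1, ..., D_k with hypothesis class
  % \mathcal{H} of VC dimension d: output \hat{h} satisfying
  \max_{i \in [k]} \mathrm{err}_{D_i}(\hat{h})
    \le \min_{h \in \mathcal{H}} \max_{i \in [k]} \mathrm{err}_{D_i}(h) + \varepsilon
  % using \widetilde{O}\big( (d + k) / \varepsilon^2 \big) samples in total.
  ```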
- Anytime Acceleration of Gradient Descent
  Zihan Zhang, Jason D. Lee, Simon S. Du, Yuxin Chen.
  Preprint.
  For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve a convergence guarantee of O(1/T^{1.03}) for any stopping time T, where the schedule is predetermined without prior knowledge of the stopping time. This gives an affirmative answer to a COLT 2024 open problem asking whether stepsize-based acceleration can yield anytime convergence rates of o(1/T).
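  To illustrate what "anytime" means here, below is a minimal sketch of gradient descent driven by a stepsize schedule fixed in advance. The constant 1/L schedule shown is only a placeholder (it gives the classical O(1/T) rate); the accelerating schedule constructed in the paper is more intricate and is not reproduced here.

  ```python
  import numpy as np

  def anytime_gradient_descent(grad, x0, num_steps, schedule):
      """Gradient descent with a stepsize schedule fixed before any stopping time is known.

      Because the schedule does not depend on a target horizon, whatever
      guarantee it carries applies to every prefix of the iterate sequence.
      """
      x = np.asarray(x0, dtype=float)
      iterates = [x.copy()]
      for t in range(num_steps):
          x = x - schedule(t) * grad(x)  # one descent step with stepsize schedule(t)
          iterates.append(x.copy())
      return iterates

  # Toy example: minimize f(x) = 0.5 * ||Ax - b||^2, whose gradient is A^T (Ax - b).
  A = np.array([[2.0, 0.0], [0.0, 1.0]])
  b = np.array([1.0, -1.0])
  L = np.linalg.norm(A.T @ A, 2)          # smoothness constant of f (spectral norm)
  constant_schedule = lambda t: 1.0 / L   # placeholder schedule, not the paper's
  xs = anytime_gradient_descent(lambda x: A.T @ (A @ x - b), np.zeros(2), 200, constant_schedule)
  ```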
Publications and Preprints
* denotes equal contribution.

- Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
  Victor Boone, Zihan Zhang.
  Conference on Neural Information Processing Systems (NeurIPS), 2024.
- Settling the Sample Complexity of Online Reinforcement Learning
  Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2024.
- Optimal Multi-Distribution Learning
  Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee.
  Annual Conference on Learning Theory (COLT), 2024.
- Horizon-Free Regret for Linear Markov Decision Processes
  Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du.
  International Conference on Learning Representations (ICLR), 2024.
- Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
  Zihan Zhang, Qiaomin Xie.
  Annual Conference on Learning Theory (COLT), 2023.
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
  Runlong Zhou, Zihan Zhang, Simon S. Du.
  International Conference on Machine Learning (ICML), 2023.
- Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
  Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2022.
- Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
  Zihan Zhang, Xiangyang Ji, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2022.
- Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
  Zihan Zhang*, Jiaqi Yang*, Xiangyang Ji, Simon S. Du.
  Conference on Neural Information Processing Systems (NeurIPS), 2021.
- Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
  Zihan Zhang, Xiangyang Ji, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2021.
- Model-Free Reinforcement Learning: from Pseudo-Regret to Sample Complexity
  Zihan Zhang, Yuan Zhou, Xiangyang Ji.
  International Conference on Machine Learning (ICML), 2021.
- Near Optimal Reward-Free Reinforcement Learning
  Zihan Zhang, Simon S. Du, Xiangyang Ji.
  International Conference on Machine Learning (ICML), 2021.
- Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
  Zihan Zhang, Yuan Zhou, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2020.
- Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
  Zihan Zhang, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2019.
Teaching
- Teaching Assistant, Foundations of Machine Learning (80250943), Fall 2018 and Spring 2019, Tsinghua University (Instructor: Prof. Xiangyang Ji).
Services
- Reviewer for ICML, ICLR, NeurIPS, COLT, ALT and AISTATS.
Work Experience

- University of Washington, Sep 2024 - Present
  Postdoc in the Paul G. Allen School of Computer Science and Engineering
- Princeton University, Apr 2023 - Aug 2024
  Postdoc in the Department of Electrical and Computer Engineering
Education
- Tsinghua University, Aug 2017 - Oct 2022
  Ph.D. in Control Science and Engineering
- Tsinghua University, Aug 2013 - Jul 2017
  B.S. in Automation