I am a postdoc at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Simon S. Du. Previously, I was a postdoctoral researcher in the Department of Electrical and Computer Engineering at Princeton University, advised by Yuxin Chen and Jason D. Lee. Before that, I received my Ph.D. in Control Science and Engineering and my B.S. in Automation from the Department of Automation at Tsinghua University, where I was supervised by Prof. Xiangyang Ji.
I have broad interests in machine learning theory, including reinforcement learning, online learning, bandit theory and game theory.
Selected Works
- Settling the Sample Complexity of Online Reinforcement Learning
  Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2024.
  We present a new model-based algorithm for learning in the fundamental paradigm of reinforcement learning: the tabular Markov Decision Process (MDP) setting. Our algorithm achieves a regret bound of Õ(min{√(SAH³K), HK}), where S is the number of states, A is the number of actions, H is the horizon length, and K is the number of episodes. Notably, this is the first result to match the minimax lower bound over the entire range of sample sizes, settling a long-standing open problem.
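  To make the guarantee concrete, here is the standard notion of regret in episodic tabular RL that the bound refers to (a worked statement under the usual notation, where π^k is the policy deployed in episode k and s₁^k is its initial state):

  ```latex
  % Episodic regret after K episodes, with the bound above (up to log factors):
  \mathrm{Regret}(K)
    = \sum_{k=1}^{K} \left( V_1^{\star}(s_1^k) - V_1^{\pi^k}(s_1^k) \right)
    \le \widetilde{O}\!\left( \min\left\{ \sqrt{S A H^3 K},\; H K \right\} \right).
  ```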
- Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
  Zihan Zhang, Yuan Zhou, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2020.
  We prove the first regret bound for a model-free algorithm in finite-horizon MDPs that asymptotically matches the information-theoretic lower bound. The reference-advantage technique developed in this paper has since been widely used in the literature.
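  As a rough sketch of the technique (my paraphrase, not the paper's exact formulation): the unknown next-state value is split into a reference part, which is estimated to high accuracy once, and a residual "advantage" part whose small magnitude keeps its variance, and hence its estimation error, low:

  ```latex
  % Reference-advantage decomposition (paraphrased sketch):
  % V^{ref} is a fixed, accurately estimated reference value function,
  % and the residual V - V^{ref} is kept small in magnitude.
  P_{s,a} V = P_{s,a} V^{\mathrm{ref}} + P_{s,a}\!\left( V - V^{\mathrm{ref}} \right)
  ```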
- Optimal Multi-Distribution Learning
  Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee.
  Annual Conference on Learning Theory (COLT), 2024.
  We propose a novel algorithm that achieves a sample complexity on the order of (d + k)/ε² for the multi-distribution learning problem, where d is the VC dimension, k is the number of distributions, and ε is the target accuracy. This bound matches the best-known lower bound and settles an open problem from COLT 2023.
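  For reference, the objective this bound applies to can be stated as follows (a standard formulation of multi-distribution learning; err_{D_i} denotes the error of a hypothesis under distribution D_i):

  ```latex
  % Multi-distribution learning over D_1, ..., D_k with hypothesis class
  % \mathcal{H} of VC dimension d: output \hat{h} satisfying
  \max_{i \in [k]} \mathrm{err}_{D_i}(\hat{h})
    \le \min_{h \in \mathcal{H}} \max_{i \in [k]} \mathrm{err}_{D_i}(h) + \varepsilon
  % using \widetilde{O}\big( (d + k) / \varepsilon^2 \big) samples in total.
  ```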
- Anytime Acceleration of Gradient Descent
  Zihan Zhang, Jason D. Lee, Simon S. Du, Yuxin Chen.
  Preprint.
  For smooth (non-strongly) convex optimization, we propose a stepsize schedule that allows gradient descent to achieve a convergence guarantee of O(1/T^{1.03}) for any stopping time T, where the schedule is predetermined without prior knowledge of the stopping time. This gives an affirmative answer to a COLT 2024 open problem asking whether stepsize-based acceleration can yield anytime convergence rates of o(1/T).
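  To illustrate what "anytime" means here, below is a minimal sketch of gradient descent driven by a stepsize schedule fixed in advance. The constant 1/L schedule shown is only a placeholder (it gives the classical O(1/T) rate); the accelerating schedule constructed in the paper is more intricate and is not reproduced here.

  ```python
  import numpy as np

  def anytime_gradient_descent(grad, x0, num_steps, schedule):
      """Gradient descent with a stepsize schedule fixed before any stopping time is known.

      Because the schedule does not depend on a target horizon, whatever
      guarantee it carries applies to every prefix of the iterate sequence.
      """
      x = np.asarray(x0, dtype=float)
      iterates = [x.copy()]
      for t in range(num_steps):
          x = x - schedule(t) * grad(x)  # one descent step with stepsize schedule(t)
          iterates.append(x.copy())
      return iterates

  # Toy example: minimize f(x) = 0.5 * ||Ax - b||^2, whose gradient is A^T (Ax - b).
  A = np.array([[2.0, 0.0], [0.0, 1.0]])
  b = np.array([1.0, -1.0])
  L = np.linalg.norm(A.T @ A, 2)          # smoothness constant of f (spectral norm)
  constant_schedule = lambda t: 1.0 / L   # placeholder schedule, not the paper's
  xs = anytime_gradient_descent(lambda x: A.T @ (A @ x - b), np.zeros(2), 200, constant_schedule)
  ```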
Publications and Preprints
* denotes equal contribution.

- Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
  Victor Boone, Zihan Zhang.
  Conference on Neural Information Processing Systems (NeurIPS), 2024.
- Settling the Sample Complexity of Online Reinforcement Learning
  Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2024.
- Optimal Multi-Distribution Learning
  Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, Jason D. Lee.
  Annual Conference on Learning Theory (COLT), 2024.
- Horizon-Free Regret for Linear Markov Decision Processes
  Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du.
  International Conference on Learning Representations (ICLR), 2024.
- Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
  Zihan Zhang, Qiaomin Xie.
  Annual Conference on Learning Theory (COLT), 2023.
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
  Runlong Zhou, Zihan Zhang, Simon S. Du.
  International Conference on Machine Learning (ICML), 2023.
- Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
  Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2022.
- Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
  Zihan Zhang, Xiangyang Ji, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2022.
- Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
  Zihan Zhang*, Jiaqi Yang*, Xiangyang Ji, Simon S. Du.
  Conference on Neural Information Processing Systems (NeurIPS), 2021.
- Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
  Zihan Zhang, Xiangyang Ji, Simon S. Du.
  Annual Conference on Learning Theory (COLT), 2021.
- Model-Free Reinforcement Learning: from Pseudo-Regret to Sample Complexity
  Zihan Zhang, Yuan Zhou, Xiangyang Ji.
  International Conference on Machine Learning (ICML), 2021.
- Near Optimal Reward-Free Reinforcement Learning
  Zihan Zhang, Simon S. Du, Xiangyang Ji.
  International Conference on Machine Learning (ICML), 2021.
- Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
  Zihan Zhang, Yuan Zhou, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2020.
- Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
  Zihan Zhang, Xiangyang Ji.
  Conference on Neural Information Processing Systems (NeurIPS), 2019.
Teaching
- Teaching Assistant, Foundations of Machine Learning (80250943), Fall 2018 and Spring 2019, Tsinghua University (Instructor: Prof. Xiangyang Ji).
Services
- Reviewer for ICML, ICLR, NeurIPS, COLT, ALT and AISTATS.
Work Experience

- University of Washington, Sep 2024 - Present
  Postdoc in the Paul G. Allen School of Computer Science and Engineering
- Princeton University, Apr 2023 - Aug 2024
  Postdoc in the Department of Electrical and Computer Engineering
Education
- Tsinghua University, Aug 2017 - Oct 2022
  Ph.D. in Control Science and Engineering
- Tsinghua University, Aug 2013 - Jul 2017
  B.S. in Automation