Papers
A General Neural Backbone for Mixed-Integer Linear Optimization via Dual Attention
Mixed-integer linear programming (MILP), a widely used modeling framework for combinatorial optimization, is central to many scientific and engineering applications, yet remains computationally chall...
Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions
Stochastic gradient methods are central to large-scale learning, yet their analysis typically treats mini-batch gradients as unbiased estimators of the population gradient. In high-dimensional setting...
Divide and Learn: Multi-Objective Combinatorial Optimization at Scale
Multi-objective combinatorial optimization seeks Pareto-optimal solutions over exponentially large discrete spaces, yet existing methods sacrifice generality, scalability, or theoretical guarantees. W...
The Effect of Mini-Batch Noise on the Implicit Bias of Adam
With limited high-quality data and growing compute, multi-epoch training is regaining importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many task...