Methods for Nonlinear and Noisy Optimization

In this thesis, we aim to develop efficient algorithms with theoretical guarantees for noisy nonlinear optimization problems, with and without constraints, under a variety of assumptions. Apart from Chapter 1, which provides the relevant background, the remainder of the thesis is divided into four chapters.

In Chapter 2, we establish theoretical convergence results for a modified BFGS method when bounded noise is present in both function and gradient evaluations. To guarantee the stability of the BFGS update in the presence of such noise, we modify the classical BFGS method by incorporating a procedure called lengthening, which, when necessary, spaces out the points at which gradient differences are collected. We show that on strongly convex functions, the proposed BFGS method converges globally to a neighborhood of the minimizer whose size depends on the noise level.

The lengthening procedure described in Chapter 2 requires knowledge of the strong convexity parameter of the objective function, which makes the method difficult to apply in realistic settings. In Chapter 3, we build on the ideas of Chapter 2 and propose a practical noise-tolerant BFGS/limited-memory BFGS (L-BFGS) method that addresses this difficulty. We design a new line search that performs adaptive lengthening without requiring exogenous function information. We prove that the proposed method is globally convergent on strongly convex functions, and we show that it outperforms unmodified BFGS/L-BFGS on noisy optimization problems in both efficiency and the accuracy it attains.

In Chapter 4, we consider the derivative-free optimization (DFO) setting, in which we have access only to the noisy objective function and not to its derivatives. One approach in this setting is to approximate the derivative by finite differences, and for this approach to be efficient the finite-difference interval must be chosen appropriately. A typical choice is the square root of machine epsilon, but this choice can perform poorly when the function evaluations are noisy. To address this issue, we propose an estimation procedure that performs a bisection search on the finite-difference interval to find a near-optimal differencing interval. We show that, under suitable assumptions, the procedure terminates in a finite number of iterations and outputs a differencing interval that is optimal up to a constant. Moreover, the procedure generalizes to any finite-difference scheme beyond the forward and central differences, including schemes for estimating higher-order derivatives. We also demonstrate its reliability and accuracy on a number of noisy functions.

Beyond the bounded noise considered in the previous chapters, another frequently encountered type of noise is stochastic noise. In Chapter 5, we consider a constrained stochastic optimization problem in which the objective function is given in expectation and the variables are subject to deterministic, simple convex constraints. A common method for such problems is the mini-batch stochastic proximal gradient method, which computes a stochastic gradient of the objective from a mini-batch of samples and then applies a proximal gradient step. One difficulty with this method is choosing the mini-batch size appropriately so that the overall sample complexity is kept to a minimum. To achieve this goal, we describe an adaptive sampling strategy that increases the batch size as the iterates approach the minimizer. We present global convergence results for the proposed algorithm on both strongly convex and general convex objective functions, and we report numerical experiments that demonstrate the efficacy of the proposed method.
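
To make the Chapter 4 idea more concrete, the sketch below shows one way a bisection-style search for the forward-difference interval could look. It assumes a known bound eps_f on the noise in function evaluations, and the second-difference test statistic and the acceptance band [r_low, r_high] are illustrative choices, not the exact procedure analyzed in the thesis.

import math

def forward_difference_interval(f, x, eps_f, h0=1e-6, r_low=2.0, r_high=10.0,
                                max_iter=50):
    # Bisection-style search for a forward-difference interval h that balances
    # truncation error (about L*h/2) against noise error (about 2*eps_f/h).
    # eps_f is an assumed-known bound on the noise in evaluations of f.
    h, lo, hi = h0, 0.0, math.inf
    for _ in range(max_iter):
        # Second-difference proxy for the curvature term: roughly L*h^2 when
        # the noise contribution is small relative to it.
        delta = abs(f(x + 2 * h) - 2 * f(x + h) + f(x))
        r = delta / eps_f if eps_f > 0 else math.inf
        if r < r_low:        # noise dominates the test: enlarge h
            lo = h
            h = 2 * h if math.isinf(hi) else 0.5 * (lo + hi)
        elif r > r_high:     # curvature dominates the test: shrink h
            hi = h
            h = 0.5 * (lo + hi)
        else:                # the two error sources are balanced: accept h
            break
    # Return the selected interval and the corresponding derivative estimate.
    return h, (f(x + h) - f(x)) / h

# Hypothetical usage on a function with noise of size about 1e-6:
# h, d = forward_difference_interval(lambda t: t * t + 1e-6 * math.sin(1e6 * t),
#                                    1.0, 1e-6)

In practice the noise level eps_f would itself have to be estimated from function evaluations rather than assumed; the sketch simply takes it as given.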
