I. Introduction
Discrete choice models (DCMs) have become an essential operational tool for modeling individual behavior. Many success stories have been reported in scientific studies in transportation, marketing, health, and economics, among others. Estimating the parameters of these models requires solving an optimization problem, and yet optimization algorithms are rarely mentioned in the discrete choice literature. One reason may be that classic nonlinear optimization algorithms (e.g., the Newton-Raphson method) have been rather successful in estimating discrete choice parameters on the available data sets of limited size. Thanks to recent advances in data collection, abundant data about choice situations are becoming increasingly available. While offering rich potential for a better understanding of human choices, these new data sources also bring new challenges for the community. Indeed, the algorithms classically embedded in state-of-the-art discrete choice software packages (such as Biogeme [1] or Larch [2]) can be computationally burdensome on these massive data sets.