Working Papers
- Econometric Inference with Machine-Learned Proxies: Partial Identification via Data Combination
This version: Apr 2026
Abstract Empirical researchers increasingly use upstream machine-learning (ML) methods to construct proxies for latent target variables from complex, unstructured data. A naive plug-in use of such proxies in downstream econometric models, however, can lead to biased estimation and invalid inference. This paper develops a framework for partial identification and inference in general moment models with ML-generated proxies. Our approach does not require restrictive assumptions on the upstream ML procedure, such as consistency or known convergence rates, nor does it require a complete validation sample containing all variables used in the downstream analysis. Instead, we assume access to two datasets: a downstream sample containing observed covariates and the proxy, and an auxiliary validation sample containing joint observations on the proxy and its target variable. We treat the proxy as a linking variable between these two samples, rather than as a literal noisy substitute for the latent target variable. Building on this idea, we develop a sharp identification strategy based on an unconditional optimal transport characterization and an inference procedure that controls asymptotic size using analytical critical values without resampling. Monte Carlo simulations show reliable size control and informative confidence sets across a range of predictive-accuracy scenarios.
- Finite Sample Inference in Incomplete Models (with Marc Henry)
Revise and Resubmit, Journal of Political Economy. This Version: Oct 2025
Abstract We propose confidence regions for the parameters of incomplete models with exact coverage of the true parameter in finite samples. Our confidence region inverts a test, which generalizes Monte Carlo tests to incomplete models. The test statistic is a discrete analogue of a new optimal transport characterization of the sharp identified region. Both the test statistic and the critical values rely on simulation drawn from the distribution of latent variables and are computed using solutions to discrete optimal transport, hence linear programming problems. We also propose a fast preliminary search in the parameter space with an alternative, more conservative yet consistent test, based on a parameter-free critical value.
- Production Function Estimation without Invertibility: Imperfectly Competitive Environments and Demand Shocks (with Ulrich Doraszelski)
Revise and Resubmit, Econometrica. This Version: July 2025
Abstract We advance the proxy variable approach to production function estimation. We show that the invertibility assumption at its heart is testable. We characterize what goes wrong if invertibility fails and what can still be done. We show that rethinking how the estimation procedure is implemented either eliminates or mitigates the bias that arises if invertibility fails. In particular, a simple change to the first step of the estimation procedure provides a first-order bias correction for the GMM estimator in the second step. Furthermore, a modification of the moment condition in the second step ensures Neyman orthogonality and enhances efficiency and robustness by rendering the asymptotic distribution of the GMM estimator invariant to estimation noise from the first step.
- Robust Counterfactuals in Centralized Schools Choice Systems: Addressing Gender Inequality in STEM Education (with Ismaël Mourifié)
This Version: Dec 2025
Abstract Counterfactual analysis is central to education market design and provides a foundation for credible policy recommendations. We develop a novel methodology for counterfactual analysis in Gale-Shapley deferred-acceptance (DA) assignment mechanisms under a weaker set of assumptions than those typically imposed in existing empirical work. Instead of fully specifying utility functions or students’ beliefs about admission probabilities, we rely on interpretable restrictions on behavior that yield an incomplete but flexible model of preferences. We address the core challenge that partial identification poses for counterfactual analysis by showing that sharp bounds on counterfactual stable matching outcomes can be computed efficiently through a combination of algorithmic techniques and integer programming. We illustrate the methodology by evaluating policies aimed at increasing female enrollment in STEM fields in Chile.
- Identification and Counterfactual Analysis in Incomplete Models with Support and Moment Restrictions
This Version: Mar 2026
(this paper supersedes my job market paper, originally circulated under the title “Identification of Structural and Counterfactual Parameters in a Large Class of Structural Econometric Models.” [old version])
Abstract This paper develops a unified identification framework for counterfactual analysis in incomplete models characterized by support and moment restrictions. I demonstrate that identifying structural parameters and conducting counterfactual analyses are isomorphic tasks. By embedding counterfactual restrictions within an augmented structural model specification, this approach bypasses the conventional “estimate-then-simulate” workflow and the need to simulate outcomes from models with set predictions. To make this approach operational, I extend sharp identification results for the support-function approach beyond the integrable boundedness condition that is imposed in sharp random-set characterizations but may be violated in economically relevant counterfactual analyses. Under minimal regularity conditions, I prove that the support-function approach remains sharp for the moment closure of the identified set. Furthermore, I introduce an irreducibility condition requiring all support implications to be made explicit. I show that for irreducible models, the identified set and its moment closure are statistically indistinguishable in finite samples. Together, these results justify using support-function methods in counterfactual settings where traditional sharpness fails and clarify the distinct roles of support and moment restrictions in empirical practice.
- A Generalized Control Function Approach to Production Function Estimation (with Ulrich Doraszelski)
This Version: Dec 2025
Abstract We develop a generalized control function approach to production function estimation. Our approach accommodates settings in which productivity evolves jointly with other unobservable factors, such as latent demand shocks, and in which the invertibility assumption underpinning the traditional proxy variable approach fails. We provide conditions under which the output elasticity of the variable input, and hence the markup, is nonparametrically point-identified. A Neyman orthogonal moment condition ensures oracle efficiency of our GMM estimator. A Monte Carlo exercise shows a large bias for the traditional approach, whereas the bias of our generalized control function approach decreases rapidly and nearly vanishes.
Publications
- Discordant relaxations of misspecified models (with Désiré Kédagni and Ismaël Mourifié) Quantitative Economics 15 (2): 331–79.
Work in Progress
- A General Method for Demand Inversion
- A Moment Inequality Approach via Optimal Transport Duality