
Generating Instrumental Variables via Random Forest to Address Endogeneity due to Prediction (Measurement) Error in Data-Mined Variables

Posted: 2018-11-29




【Speaker】Mochen Yang (杨漠尘), Assistant Professor, Kelley School of Business, Indiana University

【Topic】Generating Instrumental Variables via Random Forest to Address Endogeneity due to Prediction (Measurement) Error in Data-Mined Variables

【Time】Thursday, Nov. 29, 2018, 15:00-16:30

【Venue】Room 513, Weilun Building, Tsinghua SEM

【Language】English

【Abstract】Combining machine learning with econometric analysis is increasingly prevalent in both research and practice. In this work, we address one common example: using predictive modeling techniques to "mine" variables of interest from unstructured data (e.g., predicting sentiment from text) and then including those variables in an econometric model in order to draw statistical inferences. Recent work highlights that, because the predictions from machine learning models are inevitably imperfect, econometric analyses involving the predicted variables suffer from biases and endogeneity arising from measurement error. We propose a novel approach that mitigates these biases by leveraging instrumental variables generated from an ensemble learning technique known as the random forest. A random forest performs best when it is composed of trees that are individually accurate and that make "different" mistakes, i.e., have weakly correlated prediction errors. A key observation is that these properties closely parallel the relevance and exclusion requirements for a valid instrumental variable. We design a data-driven procedure to select tuples of individual trees from a random forest, in which one tree serves as the endogenous covariate and the other trees serve as its instruments. Simulation experiments demonstrate that the proposed approach mitigates estimation biases and outperforms an alternative method proposed in prior work for this problem (simulation-extrapolation).
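The core intuition in the abstract, that a second imperfect prediction whose errors are uncorrelated with the first can act as an instrument for it, can be illustrated with a small simulation. This is a minimal sketch, not the speaker's tree-selection procedure: it assumes three noisy predictions of a latent variable with mutually independent errors, standing in for individual trees (trees in a real random forest have correlated errors, which is why a selection procedure is needed). It uses NumPy only, and the names `simulate`, `ols` and `two_sls` are hypothetical. It compares a naive OLS slope, which measurement error biases toward zero, with a two-stage least squares (2SLS) slope that uses the other predictions as instruments.

```python
import numpy as np


def simulate(n=20000, beta=2.0, noise_sd=0.8, n_preds=3, seed=0):
    """Simulate an outcome y and several imperfect predictions of a latent x.

    Hypothetical setup: each prediction is x plus independent noise, a
    stand-in for trees whose prediction errors are uncorrelated.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # latent "true" variable (e.g. true sentiment)
    y = 1.0 + beta * x + rng.normal(size=n)   # outcome in the econometric model
    preds = [x + rng.normal(scale=noise_sd, size=n) for _ in range(n_preds)]
    return y, preds


def ols(y, x):
    """Slope from regressing y on [1, x]; attenuated if x is mismeasured."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


def two_sls(y, x_endog, instruments):
    """2SLS point estimate of the slope of y on x_endog, using the given instruments."""
    ones = np.ones(len(y))
    Z = np.column_stack([ones, *instruments])
    # First stage: project the mismeasured covariate onto the instruments.
    gamma, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    x_fit = Z @ gamma
    # Second stage: regress the outcome on the fitted covariate.
    # (The point estimate is correct; naive second-stage standard errors are not.)
    X2 = np.column_stack([ones, x_fit])
    coef, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return float(coef[1])


if __name__ == "__main__":
    y, preds = simulate()
    print("OLS slope: ", ols(y, preds[0]))
    print("2SLS slope:", two_sls(y, preds[0], preds[1:]))
```

With the default settings the measurement-error variance is 0.64 against a signal variance of 1, so the OLS slope should land near 2 / 1.64 ≈ 1.22, while the 2SLS slope should be close to the true value of 2. The sketch only holds because the simulated errors are independent by construction; with correlated errors the exclusion requirement fails and the IV estimate is biased as well.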


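Address: Room 447, Weilun Building, School of Economics and Management, Tsinghua University (100084)

Email: rccm@mail.tsinghua.edu.cn

Tel: 010-62771663

Fax: 010-62784555

Copyright 2025 Research Center for Contemporary Management, Tsinghua University. All rights reserved.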