Abstract: In the context of building predictive models, predictability is usually considered a blessing. After all – that is the goal: build the model that has the highest predictive performance. The rise of ‘big data’ has in fact vastly improved our ability to predict human behavior thanks to the introduction of much more informative features. However, in practice things are more differentiated than that. For many applications, the relevant outcome is observed for very different reasons. In such mixed scenarios, the model will automatically gravitate to the one that is easiest to predict at the expense of the others. This even holds if the predictable scenario is by far less common or relevant. We present a number of applications where this happens: clicks on ads being performed ‘intentionally’ vs. ‘accidentally’, consumers visiting store locations vs. their phones pretending to be there, and finally customers filling out online forms vs. bots defrauding the advertising industry. In conclusion, the combination of different and highly informative features can have significantly negative impact on the usefulness of predictive modeling.
Biography: Claudia Perlich leads the machine learning efforts that power Dstillery’s digital intelligence for marketers and media companies. With more than 50 published scientific articles, she is a widely acclaimed expert on big data and machine learning applications, and an active speaker at data science and marketing conferences around the world.
Claudia is the past winner of the Advertising Research Foundation’s (ARF) Grand Innovation Award and has been selected for Crain’s New York’s 40 Under 40 list, Wired Magazine’s Smart List, and Fast Company’s 100 Most Creative People.
Claudia holds multiple patents in machine learning. She has won many data mining competitions and awards at Knowledge Discovery and Data Mining (KDD) conferences, and served as the organization’s General Chair in 2014.
Prior to joining Dstillery in 2010, Claudia worked at IBM’s Watson Research Center, focusing on data analytics and machine learning. She holds a PhD in Information Systems from New York University (where she continues to teach at the Stern School of Business), and an MA in Computer Science from the University of Colorado.