Here I open a discussion of the key challenge in applying ML methods to social and economic problems: the quality and quantity of data. There is surprisingly little data in economic research. More often than not, when dealing with macroeconomic problems, economists rely on various, often questionable, assumptions and forcing variables that allow them to fit ever more complex models to the handful of observations at hand, which is clearly a dead end. Moreover, the existing system of economic statistics, commissioned around WWII, is based on many untested preconceptions and arbitrary choices (take GDP, for example). It is fraught with methodological inconsistencies across time and geographies and prone to regular drastic revisions. For example, when we recently selected data for a rather advanced deep model, we found that among thousands of relevant economic time series only a few satisfied our strict consistency criteria.

It’s painful to accept, but all these problems make existing economic statistics virtually unsuitable for advanced predictive modeling. Sparsity, inconsistency and untested preconceptions do not allow confident use of economic statistics either as feature variables x or as target variables y. In addition, there are serious doubts that economic statistics in their historic form measure the right things in the modern world (should we measure welfare or happiness instead of GDP? what is the meaning of unemployment in the age of robots? and so on). These things make me suspect that trying to fit serious models to legacy mid-20th-century statistics is a bit like charging a Tesla with power from burning fossil fuel. Illogical, but that is how it is.

Should we replace statistics with big data entirely? Probably not, because we will need the old data for policy continuity, at least initially. So some combination of the two is probably the answer. In any case, a broader and more systematic approach to gathering and processing data in the modern age is clearly needed.

A good article on the subject is below:
