WHY DATA-ECONOMIST?

Why do we need a special focus on applying novel data-analysis techniques to economics? Because it has been a problem looking for a solution from the very start. Several people in this group work in data science and ML, including my friend Dhruv Agrawal, who with his colleagues co-founded “Shipsy”, a company that provides ML solutions in logistics to clients like FedEx and DHL, covering routing, fraud detection, and data capture and preprocessing. Yet none of those people has dealt with problems that can be broadly classified as “economic”. On the other hand, we economics folks still have little appreciation of the opportunities provided by ML, which has made giant progress over the past 10 years. As my co-pilot Петр Гринцер can attest, econometrics remains the workhorse of any economic analysis, even when the task involves making predictions rather than estimating sensitivities.

I really have the impression that those two worlds are so far apart that they are barely aware of each other's existence. Moreover, even the world of quant finance, with its abundance of data and tooth-and-claw competition between funds, mostly relies on more traditional methods based on hand-crafted features rather than ML, and as far as I’m aware ML initiatives don’t produce immediately superior results. The case of quant trading can probably be explained, to a certain degree, by the level of competition in the industry: all the easy pickings have already been taken, and it is very difficult to find new opportunities whatever methods you use. But there is clearly something else that prevents those worlds from coming together.

I can see a number of reasons, some related to the objective situation in economic science and others educational or psychological.

It’s easy to start with the latter. First of all, there is a sort of arrogance on both sides. ML folks have very little appreciation of industry and data specifics. It’s very common to hear from folks around Kaggle something like “I don’t really care what the data is, I can deal with anything”. Or ML people who want to include some macro variables in a model often just come and ask: “give me some economic data!” – “What data?” – “Ah, anything! The model will sort it out”. For many reasons, in economics it simply doesn’t work like that. You need to be an industry expert here; you really need to know your data and its limitations to get something meaningful. Later I will explain why, with examples. On the other hand, economists despise so-called “black box” methods. And here they are also wrong: the fact that your model chooses relevant features for you doesn’t mean it’s a black box, unless you are using some library with no clue what’s under the hood. The data will always speak to you if you know what to ask and what you are doing.

Another reason is purely educational. The current cohort of professional economists is simply not trained in ML, while ML people mostly come from computer science; there is very little intersection.

The next set of factors is more or less objective, and the biggest of them is the problem of data quality and data frequency. We need a complete revision of our approach to collecting and processing economic statistics so that it satisfies modern requirements. This topic is so huge that I won’t dwell on it now, but it is something we will come back to discuss time and again.

Next comes the quality evaluation problem. It’s no accident that the biggest breakthroughs in ML are related to image classification, text processing and speech recognition. All of those are areas where human performance is a clear and very strong benchmark, and you can tell whether a model is bad or good by simply looking at its results. Those novel methods are then applied to similar tasks. In reality, detecting cancer on an X-ray image is not so different from recognizing a cat in a picture, and the task of telling what a person is seeing based on their brain activity is pretty similar to style transfer or semantic image inpainting. All of that is low-hanging fruit. Meanwhile, the authors of AI papers often acknowledge that their models produce visually better results than other models which nevertheless score better on objective quality metrics. The absence of good quality metrics is one reason why AI methods often stall in areas where good old human judgement cannot be applied. Economics is one of those areas. You can’t tell the quality of an inflation forecast, or get a quick idea of what to fix in the model, by simply looking at its output matrix.
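To make the point about forecasts a bit more concrete, here is a minimal sketch of what evaluating one typically involves: since eyeballing the output tells you nothing, you score it against a benchmark. The sketch assumes a random-walk ("no change") benchmark and RMSE as the metric; both are my choices for illustration, the function names are hypothetical, and the numbers are made up rather than real inflation data.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("sequences must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def naive_forecast(series):
    """Random-walk benchmark: predict each period with the previous value."""
    return series[:-1]


def relative_rmse(series, model_forecast):
    """RMSE of the model divided by RMSE of the naive benchmark.

    model_forecast[i] is the prediction for series[i + 1].
    Values below 1 mean the model beats the benchmark on this sample.
    """
    actual = series[1:]
    return rmse(actual, model_forecast) / rmse(actual, naive_forecast(series))


# Made-up illustrative numbers, NOT real inflation data.
inflation = [2.0, 2.5, 3.0, 2.5, 2.0]
# Hypothetical model output for periods 2..5.
model = [2.25, 2.75, 2.75, 2.25]

print(relative_rmse(inflation, model))
```

Even then, a single ratio like this only says whether the model beat the benchmark on one sample; it gives no hint of what to fix, which is exactly the difficulty described above.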

Another reason why ML has advanced so little in economics is that a lot of what economists have to deal with are so-called “one-shot problems”. It’s not chess, Go or an Atari game; central banks can’t replay policy meetings again and again to figure out the best strategy. Every situation is unique and one-off, and there is no clear metric of success. Normally, if you face a one-shot problem or insufficient data in ML, you try transfer learning. In economics you have nothing to transfer from.

This is a brief set of reasons why ML and AI have hardly advanced in economics beyond sentiment analysis based on a flow of tweets or estimating house prices from a set of features. That said, none of the problems above is insurmountable with enough focus applied. It’s my belief that ML can enrich economic science if we focus hard enough on overcoming these difficulties.
