To Explore or Exploit – That Is the Question
The exploration-exploitation trade-off is both a cognitive and algorithmic problem. Sometimes, computers have it easier.
Whether you like it or not, the holiday season is always a time of reflection. Looking back on the passing year, we wonder: have we learned new things? Are we any closer to our goals than we were a year ago? Has this been a year of execution or incubation? Prompted by a mathematical dilemma that had us thinking hard this year, my own New Year reflections centered on the exploration-exploitation trade-off.
The exploration-exploitation dichotomy captures the debate we all have when faced with the question of whether to make a certain decision (or to act) now, based on the partial knowledge we currently possess (“exploitation”), or rather wait and invest more time and effort in accumulating further information, with the hope that a broader perspective will lead to a better decision in the future (“exploration”). In machine learning, this trade-off applies to systems that want to maximize their reward but cannot do so until they undergo a process of experimentation to learn how to accomplish that maximization.
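To make the machine learning version of the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a simulated multi-armed bandit, a textbook way of balancing the two strategies. It is an illustration only and not a description of how Optibot works. The function name, the success rates and the epsilon value are all assumptions made up for this example.

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, rounds=10_000, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    true_rates: hypothetical success probability of each option;
                the agent never sees these, it only observes rewards.
    epsilon:    fraction of rounds spent exploring at random.
    Returns (total_reward, estimates, counts).
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n        # how often each option was tried
    estimates = [0.0] * n   # running average reward per option
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a random option to learn more about it.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the option that looks best so far
            # (ties go to the lowest index).
            arm = max(range(n), key=lambda i: estimates[i])
        reward = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        # Incremental update of the running average.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, estimates, counts

if __name__ == "__main__":
    rates = [0.2, 0.5, 0.8]  # made-up conversion rates for three offers
    for eps in (0.0, 0.1):
        total, _, counts = epsilon_greedy(rates, epsilon=eps)
        print(f"epsilon={eps}: reward={total}, pulls per option={counts}")
```

With epsilon set to 0 the agent only exploits, so in this sketch it never leaves the first option it tried, whether or not that option is the best one. A small positive epsilon lets it discover the better options at the price of a few deliberately "wasted" rounds.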
Working with our clients in a world of increasing personalization, we often see them facing two competing needs: maximizing their revenues on the one hand, and learning what makes their customers tick on the other. The former is the bottom-line objective, yet it cannot be fully achieved without the latter. This had our brains going for quite a while, eventually culminating in the inner workings of Optibot, Optimove’s marketing optimization bot that debuted recently. But as this is both a human and a computer issue, it made me very much aware of the role this trade-off plays in my own life.
Fall Back or Venture into the Unknown?
We all navigate the exploration-exploitation trade-off all the time, even when we’re not aware of it. When faced with a decision, in order to maximize our reward we sometimes fall back on our acquired knowledge, exploiting our experience. Other times we venture into the unknown, choosing to explore unfamiliar domains.
At a restaurant, do you order the dish you know and love, or go for something new? At work, do you go through the same routines or do you try to find new solutions to your daily challenges? In fact, do you even stay at that job or step outside your comfort zone and explore a new vocation? Do you try extremely different things to explore what the optimal way is to achieve your goals, or do you exploit your current situation and knowledge to make the best out of it?
The equilibrium of this trade-off changes from person to person. The exploitation strategy reflects the character of the decision maker, who tends to make decisions quickly based on whatever she knows at the moment, while the exploration strategy reflects the attitude of the scientist or the mathematician, who strives to discover a globally optimal decision, even if it takes her forever.
Our personal inclination towards exploration or exploitation also changes as life progresses. Consider a baby – her path is that of total exploration, simply because she still has no knowledge, or information, to exploit. She will put anything in her mouth, even if it’s gruesome. She will reach out her arm to anything – even if it’s a scalding hot oven. With time, knowledge builds up. A school-age child won’t touch the stove top or put a shoe in his mouth, because he can already exploit his hard-earned knowledge.
For the first part of our life, exploration usually reigns supreme. That’s the reason teenagers engage in so much risky behavior and college students are open to new experiences. The older we get, however, the more likely we are to settle down, having found a set of habits and heuristics that seem to work best for us. At that point we move mainly to an exploitation strategy, as we feel that the effort and uncertainty involved in further searching is no longer worth it.
Using a Cost-Benefit Analysis to Understand Which Is Better
When facing any task that aims at maximizing some quantity, neither of the two strategies is uniformly superior to the other, and the debate is compounded by an inherent cost-benefit trade-off. Exploration is likely to help us identify the optimal decision, yet this may be true only in the long run, after gathering sufficient amounts of data. At the same time, delaying a decision has its costs (referred to as “opportunity cost” in economics, for the opportunity missed while waiting). Moreover, there’s no guarantee that exploring just slightly more will yield a clearly better decision.
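The cost-benefit tension can be sketched with a simple "explore-then-commit" simulation: try every option a fixed number of times, then commit to whichever looked best for the remaining rounds. This is a hypothetical illustration under stated assumptions (made-up success rates, a fixed time horizon, hypothetical function names), not a recommendation for any particular setting.

```python
import random

def explore_then_commit(true_rates, explore_per_option, rounds=1_000, seed=0):
    """Try each option `explore_per_option` times, then commit to the best.

    true_rates: hypothetical success probabilities, unknown to the agent.
    Returns the total reward collected over `rounds` decisions.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n
    successes = [0] * n
    total = 0

    def pull(arm):
        return 1 if rng.random() < true_rates[arm] else 0

    # Exploration phase: cycle through the options to gather data.
    explore_rounds = min(rounds, explore_per_option * n)
    for t in range(explore_rounds):
        arm = t % n
        reward = pull(arm)
        counts[arm] += 1
        successes[arm] += reward
        total += reward

    # Commit phase: exploit the option with the best observed average
    # (untried options count as 0; ties go to the lowest index).
    def observed_mean(i):
        return successes[i] / counts[i] if counts[i] else 0.0

    best = max(range(n), key=observed_mean)
    for _ in range(rounds - explore_rounds):
        total += pull(best)
    return total

def average_reward(true_rates, explore_per_option, rounds=1_000, trials=200):
    """Average the total reward over several independent simulated runs."""
    runs = [explore_then_commit(true_rates, explore_per_option, rounds, seed=s)
            for s in range(trials)]
    return sum(runs) / trials

if __name__ == "__main__":
    rates = [0.45, 0.50, 0.55]  # made-up, deliberately close together
    for m in (0, 5, 50, 300):
        print(f"explore {m:>3} times per option: "
              f"average reward {average_reward(rates, m):.1f}")
```

Committing immediately risks locking in the wrong option, while exploring for nearly the whole horizon leaves almost no rounds in which to cash in on what was learned. Where the sweet spot lies depends on how different the options are and how many decisions remain.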
Young people who cannot seem to decide which career path to pursue, and end up procrastinating for many years, manifest a form of prolonged exploration and risk wasting their professional lives. Exploitation, on the other hand, means we act based on the knowledge we currently have. This has its own drawbacks: if we are not familiar with all the possibilities, the chances of making a sub-optimal decision, or even a starkly erroneous one, may be substantial. People who lack the patience to consider alternatives, and therefore act impulsively on life decisions, may be said to manifest strong exploitation behavior.
Which strategy is better has a lot to do with our goal, and the trick is to strike a satisfactory balance between the two. Algorithms dealing with this trade-off have a set goal, for example: increase revenues. But which goal are we trying to maximize in life? What’s the right metric to optimize for? Our health or safety? How much money we make? The amount of fun we’re having?
Computers have it easy – they have a predetermined goal and a trade-off strategy. For us humans, defining our goals and becoming aware of the trade-offs we’re making in order to achieve them is a lifelong journey.