You ain’t getting a free lunch!
We sometimes hear that in life there is no such as thing as a “free lunch”. Usually what people mean by this is that whenever you are given something that seems like a one-way gain for yourself, actually it is not. The morale goes that in fact, more often than not, you end up paying more for it but in other (often subtler) ways.
In Computer Science
David Wolpert and William G. Macready introduced the expression in Computer Science in connection with Machine Learning and Search Optimisation between 1996 and 1997. Loosely speaking their research showed that with problems of search and optimisation, such as finding a solution by means of a Machine Learning method, no algorithm works equally well for all problems.
In the video below, filmed at the 2016 World Science Festivaland, Cognitive psychologist Gary Marcus discusses the “No Free Lunch Theorem” and summarises why it indicates that there’s no such thing as an unbiased algorithm.
This general consideration about biased algorithms can be particularly relevant in the light of recent findings by Princeton University that found machine learning “copying” human prejudices like sexism or racism when learning natural language. As Wired reported in a recent article, researchers found that whenever we build an intelligent system that learns enough about the properties of language to be able to understand and produce it, in the process it will also acquire historic cultural associations, some of which can be objectionable.
And if we built a “better” algorithm?
Prof. Pablo Domingo is a computer scientist and the author of The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. It is his view that one day, by developing and combining the algorithms and the techniques learned from different areas of Machine Learning,we will eventually arrive to an ultimate (or Master) Algorithm: an algorithm able to compute quick and effective solutions for all the problems we can think of.
Prof. Domingo classifies automated search algorithms or Machine Learning into five areas: the connectionists branch, the evolutionary branch, the Bayesian branch, the symbolist branch and the analogist branch. The connectionist branch of would be the branch that started this race for the ultimate algorithm in the first place. This school of thought takes its name from the belief that all our knowledge is encoded in the connections between neurons. Its founder, Geoff Hinton, discovered backpropagation and kickstarted the belief that by creating a complex and optimised enough neural network program we would eventually obtain an ultimate general purpose algorithm for finding solutions in any search space.
Unfortunately, neural network developers have yet to validate this claim. Thus, other computer scientists started to explore the possibility that by simulating evolution, rather than neural connections, we could obtain this single learning program capable of outperforming all other search optimisation algorithms. Under the name of Genetic Algorithms, these programs have been particularly good in certain areas such as specific parts of robotics, but again they have been unable to come up with an ultimate Master Algorithm.
The story pretty much repeats itself for the other fields of Machine Learning that don’t directly look at nature for inspiration. Whether an algorithm was derived by efficiently implementing Bayes’s theorem, a mathematical rule for updating our degree of belief in a hypothesis when we see new evidence, or by simulating deductive “symbolic” reasoning to validate and discover rules, or by finding out solutions based on analogy with previous cases, we have so far only been able to discover algorithms that are good for certain tasks but perform poorly in others.
Whilst scientists such as Domingo think that this inconsistency of performance is only due to the relatively short lifespan of Machine Learning as a field of research, others such as Wolpert and Macready claim that there is a much more fundamental reason. According to the research that brough to their No Free Lunch Theorems (NFL), the connection between effective optimization algorithms and the problems they are solving involves certain statistical considerations that bar them from being good at every possible problem solving search. In other words, if you find an algorithms that is great at finding certain solutions, you can be sure that it will be terrible at finding others.
Statistically speaking, this means that if we consider all possible problem instances of a certain problem, all the fancy optimizer methods used in Machine Learning to “improve” their own capacity to guess “the right answer” are just as good as any other optimizer that consider the same amount of candidate solutions, even Random Walk for example. Now, this does not necessarily mean that guessing random solutions is always as good as trying to discern patterns in data and then optimising the guessing function so as to be biased towards “right” solutions. The general purpose optimisers in fact will probably work well on many popular/real life problem instances, but these do not represent well the entire space of possible problem instances. Therefore, the very strength of learning in self-optimising algorithms is what brings about the drawbacks of being biased only towards certain specific applications.
Mathematically as well as intuitively speaking, here is a short video that shows us some math behind the NFL theorems and then goes on to compare search algorithms to small carpets in a huge room. The example of the carpets is to illustrate how, individually, the carpets are unable to cover the whole floor, which represents all possible solutions to a certain class of problems:
Reflections on having No Free Lunch and Natural Computing
What does it all mean in practice? Well, it pretty much means that if we don’t have any previous knowledge about the problem that we want to solve, we might as well try random guesses rather than using a search algorithm. Previous knowledge here is the key because if we know from experience about the characteristics of a given problem that we are interested in, we can then choose a priori to give a certain type of heuristic bias to our search method. This, as it’s been shown by the achievements of machine learning, can actually help us solve problems in the real world. However, it also means that our programs will always have biases (just as we humans are biased) and will never be able to implement a “supreme” algorithm that can feasibly crack all types of problems.
YouTuber Osiris Salazar some time ago published a video about applying the right heuristics (or bias) to solve a certain type of problems and his reference to the No Free Lunch theorems might help explain this last point better:
Based on what we have seen, we could say that every solution-search method such as a machine learning model for a specific dataset/problem is in some ways a highly specialized heuristic method. Any method based on learning would in fact progressively introduce a certain bias on its initially random search for solutions, until it optimises to solve certain problems “well enough” within a reasonable timeframe. However, the same solution cannot be applied everywhere and specialising in performing well under certain circumstances means taking a loss elsewhere. It also means that we still have no underlying mathematical model for solving in general the whole class of problems. As a matter of fact, any optimising (or learning) algorithm cannot escape a principle that was known even to Hume back in the XVI century:
“even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience”
In other words we can’t be sure that what has happened in the past will happen again in the future just because we have observed a certain repetition of causes and effects in previous examples, at least not deterministically. Of course, in real life problems we don’t need mathematical proofs to perform well as human beings, and in general nature has been finding working and viable solutions to serious and complex problems out there for thousands, no, MILLIONS of years! So is there a lesson for us in this apparent contradiction between the limitations to search optimisation methods and the success of heuristics in the natural world of people and nature?
As Prof. Mohammad Majid al-Rifaie, lecturer at Goldsmiths University of London, stated in the first lecture of his Natural Computing course this year, problems that computers can solve must (1) be well-defined, (2) have reasonable computation time (with serial computers), and (3) be about somehow predictable phenomena. However, even leaving aside the metaphysical dilemmas that we can’t even appropriately define, for problems that we assume to be well-defined by our models too, we often can’t find a mathematical formula to solve them.
Computationally hard problems that are easily verifiable (NP) but do not seem to have a direct algorithmic solution include solving a Sudoku, the Travelling Salesman Problem, cracking the SHA-256 Cryptographic Hash Algorithm, certain games such as Chess or Go, etc. These problems have all solutions that we can quickly verify to work once they are found, but we don’t have (and we don’t know if we ever will have) a formula that can directly compute this solution given the initial state of the system.
There are then problems such as real-world movements of autonomous robots or business planning and management or reading and predicting market movements, which are saturated with unpredictable “unforeseen circumstances”. For this type of problems, every time we find a mathematical model that works in theory, noise in the data and unaccounted variables make sure that the model fail when trying to find solutions in real-life applications.
What we can see though, in the world around us, is that bees have in practice gotten around the travelling salesman problem by learning to fly the shortest route between flowers discovered in random order. We can also witness that humans often beat supercomputers at the game of chess, or that experienced stock traders know how to solve market prediction problems. Somehow.
Furthemrmore, research has shown that in many cases it is a more or less instinctual application of some heuristics that makes natural collections of cells (beings) or natural collections of beings (groups of beings) behave in a way that in practice solves problems. It may not give the individuals or the group of beings the mathematical key to solve all problems of a same type, but it enables them to approximately work out something that works for the actual cases they probably end up encountering in the real world.
To try to achieve a similar result with computers many scientists have now given up the task of disproving the No Free Lunch theorem with a single Master Algorithm. What these researchers are interested instead, is to find and work with an array of algorithms inspired by nature that can help them in specific (rather than general) real case scenarios if applied appropriately. This field has now come to be known as Natural Computing.
Quoting from Wikipedia’s definition:
Natural computing, also called natural computation, is a terminology introduced to encompass three classes of methods: 1) those that take inspiration from nature for the development of novel problem-solving techniques; 2) those that are based on the use of computers to synthesize natural phenomena; and 3) those that employ natural materials (e.g., molecules) to compute. The main fields of research that compose these three branches are artificial neural networks, evolutionary algorithms, swarm intelligence, artificial immune systems, fractal geometry, artificial life, DNA computing, and quantum computing, among others.
Computational paradigms studied by natural computing are abstracted from natural phenomena as diverse as self-replication, the functioning of the brain, Darwinian evolution, group behavior, the immune system, the defining properties of life forms, cell membranes, and morphogenesis. Besides traditional electronic hardware, these computational paradigms can be implemented on alternative physical media such as biomolecules (DNA, RNA), or trapped-ion quantum computing devices.
In the natural world life is not itself interested in finding the optimal solution to all problems. Life is usually content of finding practical systems of achieving what needs to be achieved to survive and thrive. This seemingly humble principle with its surprisingly random inception back when proteins formed and the first reproducible cells started to spread, has allowed nature to grow in strength and complexity all the way to the self-conscious humans. The way I visualise why this is possible and how it works, is by comparing Usain Bolt with a PhD in muscle science. Bolt might not know all the physiological theoretical details of what makes the muscle perform best. But he surely knows how to make it perform best. Somehow.
Of course, for the very same reason that Natural Computing normally accepts to work under the constraint of a-priori specialisation and of “good enough” heuristic methods, its algorithms may never become an ultimate general-purpose A.I. capable of solving the very mystery of why life and the universe came about. But after all… We might never really solve that one because… perhaps it is the universe that free lunch we were looking for!
Notes & References
The following references are listed in the order they appear along the blog post
- Various Authors, (last edited July 2017) No free lunch in search and optimization, Wikipedia, retrieved 4 October 2017
- Burgess, M (2017) Just like humans, artificial intelligence can be sexist and racist, Wired, article consulted online on 4 October 20017
- Domingos, P (2016) The race for the master algorithm has begun, Weird – magazine issue January 2016
- Wolpert, D. H. (1997) The Cost of Dinner, NASA Ames Research Center, retrieved online on 4 October 2017
- Sewell, M. (web edit.) No Free Lunch Theorems, http://www.no-free-lunch.org/, retrieved 4 October 2017
- Wolpert and Macready (1997) No Free Lunch Theorems for Optimization, IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 1, NO. 1, APRIL 1997, Retrieved online
- Majid al-Rifaie, M. (2017) Natural Computing Lecture 1, Goldsmiths University of London – Student VLE (not normally available to the general public), lecture attended 2 October 2017
- Editorial (2010), Bees’ tiny brains beat computers, study finds, The Guardian – Sunday 24 October 2010
- Various Authors, (last edited September 2017) Natural Computing, Wikipedia, retrieved 4 October 2017
- Gary, Marcus (2017) Are Algorithms Biased?, World Scence Festival YouTube Channel, viewed 4 October 2017
- Hoang, L.N. (2016) The No-Free Lunch Theorem, Wandida.Com YouTube Channel, viewed 4 October 2017
- Salazar, Osiris (2012) 9 3 Heuristics 929, Osiris Salazar YouTube Channel, viewed 4 October 2017
- Kaku, M. (2013) Space Bubble Baths and the Free Universe, Big Think YouTube Channel, viewed 4 October 2017
- Powell (2002) Rock, Fish Cartoons ™, retrieved from http://christianfunnypictures.com/
- Raschka, S. (2015) “No Free Lunch”, Roughly An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture, Retrieved at https://www.slideshare.net on 4 October 2017