Reinforcement learning – when machines learn to think

Google invests in many different sectors and projects, especially when it comes to future technologies. In the area of artificial intelligence (AI), the internet company already has one iron in the fire with DeepMind. The idea is to develop AI programs further until they are able to solve complex problems without any human intervention. Reinforcement learning, a branch of machine learning, is an essential component of the continued development of AI.

What is reinforcement learning?

The term “reinforcement learning” describes a method in the field of machine learning. Alongside supervised learning and unsupervised learning, reinforcement learning is the third major approach to training algorithms so that they can make decisions on their own. The focus is on developing intelligent solutions for complex control problems.

However, in contrast to supervised and unsupervised learning, reinforcement learning does not require a pre-existing dataset for training. With the first two methods, programs are fed data first; in reinforcement learning, this step is omitted entirely. Instead, the data is generated by trial and error during training and evaluated at the same time. The program therefore goes through a large number of runs in a simulation environment until it delivers a sufficiently accurate result. So, instead of being shown the correct results during training (as in supervised learning), the system is guided only by feedback signals, i.e. rewards and penalties.
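
To illustrate how the data arises during training rather than beforehand, here is a minimal Python sketch. The corridor environment, its reward values, and the names `step` and `collect_episode` are hypothetical, chosen only for illustration; the agent simply acts at random and records each action together with the feedback it produced.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4.
# The agent starts in cell 0; reaching cell 4 gives +1,
# every other step gives -0.01 (a small penalty).
GOAL = 4
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False


def collect_episode(rng, max_steps=100):
    """Generate training data by trial and error: no dataset exists beforehand."""
    transitions = []
    state = 0
    for _ in range(max_steps):
        action = rng.choice(ACTIONS)  # try something
        next_state, reward, done = step(state, action)
        # Each record pairs an action with the feedback it produced
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions


rng = random.Random(0)
episode = collect_episode(rng)
print(len(episode), episode[-1])
```

Each tuple plays the role that a labeled example plays in supervised learning, except that the "label" is only a reward, not the correct action.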

The desired result of this training is an artificial intelligence that can solve very complex control problems on its own, without prior knowledge provided by humans. Compared with conventionally engineered solutions, this approach can be faster and more efficient and can produce better results.

Research into reinforcement learning is often conducted using games. Video games provide an ideal basis for researching and understanding reinforcement learning, because they generally offer a predefined simulation environment, a range of possible actions, and an interactive environment. In addition, most games pose complex problems or tasks that have to be completed over various stages of play. Most games also include a scoring system, which is similar to the reward system used in reinforcement learning.

Leading experts in the field of artificial intelligence consider reinforcement learning a very promising route to artificial general intelligence (AGI). AGI would enable a machine to make rational decisions like a person and to successfully carry out any number of tasks. The machine observes and learns and, in this way, is able to solve problems independently.


To summarize, reinforcement learning is a method by which a machine learns through interactions with its environment and then uses what it has learned to solve complex problems without the need for any manual input from humans.

How does reinforcement learning work?

Reinforcement learning is an umbrella term for numerous methods through which an algorithm, or software agent, learns strategies autonomously. The goal is to maximize the rewards it collects within a simulation environment. In that environment, the agent executes an action and then receives the relevant feedback. The software agent is not told in advance which action is the most promising and has to work out which approach to take on its own through trial and error.
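
One common way to balance trial and error against using what has already been learned is the epsilon-greedy rule. The article does not prescribe a particular exploration strategy, so treat the following Python sketch as an assumption: the function name `epsilon_greedy` and the parameter `epsilon` are illustrative.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a list of estimated action values.

    With probability epsilon the agent explores (random action);
    otherwise it exploits the action with the highest current estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: index of the largest value (the first one wins on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(42)
estimates = [0.1, 0.5, 0.2]
choices = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
print(choices.count(1) / len(choices))
```

With `epsilon = 0.1`, the agent takes its best-known action most of the time and picks a random action about 10% of the time; in practice `epsilon` is often reduced gradually as training progresses.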

Along the way, the agent receives rewards at various points, and these shape its strategy. Through these events, the software agent learns to evaluate the consequences of its actions within the simulation environment. This forms the basis for developing long-term strategies that maximize its total reward.
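
Maximizing rewards in the long run is usually formalized as maximizing the discounted return, in which a reward received k steps in the future is weighted by gamma^k for a discount factor 0 ≤ gamma < 1. The discount factor is not mentioned above, so the following Python sketch shows a standard textbook formalization rather than the article's own definition; `gamma = 0.9` is an arbitrary choice.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma*r_1 + gamma**2*r_2 + ... for a list of rewards."""
    total = 0.0
    # Work backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A strategy that waits for a larger reward can beat one that grabs a small one now
greedy_now = [1.0, 0.0, 0.0, 0.0]
patient = [0.0, 0.0, 0.0, 5.0]
print(discounted_return(greedy_now), discounted_return(patient))
```

Here the patient sequence is worth about 3.645 (5 × 0.9³), more than the 1.0 from the immediate reward, which is what allows an agent to prefer long-term strategies.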

Q-learning is one of the most widely used methods for training a reinforcement learning system. It is named after the Q-function, which estimates the expected cumulative future reward of taking a given action in a given state. The goal of reinforcement learning is to find an optimal policy, i.e. one that maximizes the expected reward over time. The term “policy” describes the learned behavior of a software agent: it tells the agent which action to perform in any given state (observation) of the learning environment.
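
In the usual textbook notation (not used in the article itself), the Q-function of a policy π, the standard Q-learning update, and the policy derived from Q can be written as follows. Here s is a state, a an action, r a reward, α a learning rate, and γ a discount factor; α and γ are assumptions introduced for this illustration.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s,\; a_t = a\Big]

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]

\pi(s) = \arg\max_{a} Q(s, a)
```

The bracketed term in the update is the difference between what the agent just observed (reward plus the discounted best estimate for the next state) and what it previously expected; each update moves the estimate a fraction α of the way toward that target.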

In basic (tabular) Q-learning, these estimates are stored in a Q-table in which the rows represent all possible observations and the columns all possible actions. During training, the cells are filled with values that indicate the expected future reward, and the policy follows from the table: in each state, the agent picks the action with the highest value.
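
A compact Python sketch of tabular Q-learning makes the Q-table concrete. The five-cell corridor, the reward of 1 for reaching the last cell, and the hyperparameters `alpha`, `gamma`, and `epsilon` are hypothetical choices for illustration, not values from the article. The table is a list of rows (one per observation) with one column per action.

```python
import random

# Hypothetical corridor: states 0..4, goal at 4. Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
MOVES = (-1, +1)


def step(state, action):
    """Return (next_state, reward, done); only reaching the goal is rewarded."""
    next_state = min(max(state + MOVES[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per observation (state), one column per action, all zeros at first
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties randomly so untrained rows are not biased
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: reward plus discounted best value of the next state
            # (just the reward if the episode ended)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
# Policy = best action in each non-goal row of the table
policy = [max((0, 1), key=lambda a: row[a]) for row in q_table[:GOAL]]
print(policy)
```

After training, reading off the best action in each row gives the learned policy; in this toy corridor it should be "move right" (action 1) in every non-goal cell.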

However, Q-tables have their limitations. They only work well when the space of observations and actions is small, because the table needs one cell for every combination. If there is a very large number of possibilities, the agent instead needs a function that approximates the Q-values, typically a neural network (as in deep Q-learning).
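
A quick back-of-the-envelope calculation shows why. The table needs one cell per observation-action pair. The two scenarios below are hypothetical examples: a 4 × 4 grid world with four moves, and raw 84 × 84 grayscale game screens with 256 brightness levels per pixel and 18 possible controller inputs.

```python
def q_table_cells(num_observations, num_actions):
    """Number of cells in a Q-table: one row per observation, one column per action."""
    return num_observations * num_actions


# Small grid world: 4 x 4 cells, 4 moves (up, down, left, right)
small = q_table_cells(4 * 4, 4)

# Raw 84 x 84 grayscale screen, 256 brightness levels per pixel, 18 controller inputs
screen_observations = 256 ** (84 * 84)
large = q_table_cells(screen_observations, 18)

print(small)               # 64
print(large.bit_length())  # 56453 bits just to write down the cell count
```

The small grid needs only 64 cells, while the screen-based table would need a number of cells roughly 17,000 decimal digits long, far beyond any memory. A neural network avoids this by generalizing across similar observations instead of storing each one separately.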


Where and when is reinforcement learning used?

Google is among the companies already using this machine learning method: it uses reinforcement learning to control the cooling systems in its data centers. According to Google, these AI technologies have reduced the amount of energy required to cool its servers by 40%.

Reinforcement learning is also used to control complex systems such as smart traffic systems and to deliver intelligent solutions for quality control. Further applications include smart power grids, robot control, supply chain optimization for logistics companies, and factory automation.

For consumers, the most concrete examples of reinforcement learning are parking assistants, which utilize AI to recognize objects and then display the optimal parking path to a user.

Before a new reinforcement learning algorithm works properly, it has to go through numerous training runs, because rewards are sometimes found only slowly. Nevertheless, reinforcement learning is expected to control many processes and solve complex problems in the future.
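
Why rewards can be found only slowly is easy to simulate. In the following hypothetical Python sketch, an agent explores a corridor purely at random and receives its only reward at the far end; the corridor lengths and trial counts are arbitrary. For this particular random walk, the average number of steps before the first reward grows roughly with the square of the corridor length.

```python
import random


def steps_to_first_reward(length, rng, max_steps=1_000_000):
    """Random exploration in a corridor 0..length; the only reward sits at the far end."""
    state, steps = 0, 0
    while state < length and steps < max_steps:
        # Move left or right at random; the left wall keeps the agent at 0
        state = max(state + rng.choice((-1, 1)), 0)
        steps += 1
    return steps


def average_steps(length, trials=500, seed=0):
    """Average number of random steps needed before the reward is first reached."""
    rng = random.Random(seed)
    return sum(steps_to_first_reward(length, rng) for _ in range(trials)) / trials


for n in (3, 10, 20):
    print(n, average_steps(n))
```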
