Google invests in many different sectors and projects, especially when it comes to future technologies. In the field of artificial intelligence (AI), the company already has a major stake through DeepMind. The idea is to develop AI programs further until they are able to solve complex problems without any human intervention. Reinforcement learning, a branch of machine learning, is an essential component of this continued development of AI.

What is reinforcement learning?

The term “reinforcement learning” describes a method in the field of machine learning. Alongside supervised learning and unsupervised learning, reinforcement learning is the third main approach to training algorithms so that they are able to make decisions on their own. The focus is on developing intelligent solutions to complex control problems.

However, in contrast to supervised and unsupervised learning, this approach does not require a prepared training dataset. With the first two methods, programs are first fed with existing data. Reinforcement learning omits this step entirely. Instead, the data is generated through trial and error during training, and each action is evaluated as it happens. The program goes through a large number of test runs in a simulation environment until it produces a sufficiently accurate result. So, instead of being shown the correct results during training (as in supervised learning), the system is guided only by feedback signals, namely rewards and penalties.
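To make the contrast with supervised learning concrete, the following Python sketch shows where the training data comes from in reinforcement learning: the agent acts in a simulated environment and records each step as a (state, action, reward, next state) tuple. The five-cell corridor environment, its reward of 1 at the right end, and the function names `step` and `collect_episode` are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical simulation environment: a corridor of cells 0..4.
# The agent starts in cell 0; reaching cell 4 ends the episode with reward 1.
N_CELLS = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    if next_state == N_CELLS - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def collect_episode(rng, max_steps=100):
    """Run one trial-and-error episode with random actions.

    No dataset is supplied up front: every transition is generated by
    interacting with the environment and is scored by the reward signal.
    """
    transitions = []
    state = 0
    for _ in range(max_steps):
        action = rng.choice(ACTIONS)
        next_state, reward, done = step(state, action)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions


rng = random.Random(0)
episode = collect_episode(rng)
print(len(episode), "transitions, total reward:", sum(r for _, _, r, _ in episode))
```

Each tuple plays the role that a labeled example plays in supervised learning, except that the "label" is only a reward, not the correct answer.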

The desired outcome of this training is an artificial intelligence that can solve very complex control problems on its own, without any prior knowledge provided by humans. Compared to conventional engineering approaches, this can be faster and more efficient and can lead to better solutions.

Research into reinforcement learning is often conducted through games. Video games are well suited to researching and understanding reinforcement learning because they generally provide a predefined, interactive simulation environment and various control options. In addition, most games present complex problems or tasks that must be completed over various stages of play. Most games also include a points system, which is similar to the reward system used in reinforcement learning.

Leading experts in artificial intelligence consider reinforcement learning to be a very promising route toward artificial general intelligence. This would enable a machine to make rational decisions on its own, as a person does, and to successfully carry out any number of tasks. The machine observes and learns and, in this way, is able to solve problems independently.

Fact

To summarize, reinforcement learning is a method by which a machine learns through interactions with its environment and then uses what it has learned to solve complex problems without the need for any manual input from humans.

How does reinforcement learning work?

Reinforcement learning covers numerous individual methods through which an algorithm or software agent learns strategies autonomously. The goal is to maximize rewards within a simulation environment. In that environment, the agent executes an action and then receives the corresponding feedback. The agent receives no prior information about which action is the most promising and has to work out its approach on its own through trial and error.

Instead, at various points, it receives rewards (or penalties) that shape its strategy. Through these signals, the software agent learns to evaluate the consequences of particular actions within the simulation environment. This creates the basis for the agent to develop long-term strategies that maximize its total reward.
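The trial-and-error idea can be illustrated with one of the simplest reinforcement learning settings, a multi-armed bandit, which is swapped in here purely for illustration and is not mentioned in the original text. In this Python sketch the agent starts with no knowledge of which action pays off best, mostly exploits its current estimates, and occasionally explores at random (an "epsilon-greedy" strategy). The three hidden reward probabilities in `TRUE_PROBS` and the value of `epsilon` are arbitrary assumptions.

```python
import random

# Hypothetical environment: three actions ("arms") with hidden payout probabilities.
# The agent never sees these numbers; it only observes the rewards it receives.
TRUE_PROBS = [0.2, 0.5, 0.8]


def run_bandit(steps=5000, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    estimates = [0.0] * len(TRUE_PROBS)  # learned value of each action
    counts = [0] * len(TRUE_PROBS)       # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(TRUE_PROBS))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(TRUE_PROBS)), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < TRUE_PROBS[action] else 0.0
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = run_bandit()
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimates:", [round(e, 2) for e in estimates], "best action:", best)
```

Exploration matters here: a purely greedy agent could lock onto the first action that happens to pay out and never discover the better one.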

A widely used method for training a reinforcement learning system is Q-learning. It is named after the Q-function, which estimates the expected cumulative future reward of taking a given action in a given state. The goal of reinforcement learning is to find an optimal policy. Here, the term “policy” refers to the learned behavior of the software agent: it tells the agent which action to perform for each possible observation (state) within the learning environment.
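The definitions in this paragraph can be written compactly. The following is the standard textbook formulation, assuming a discount factor γ between 0 and 1 that weights near-term rewards more heavily than distant ones; the symbols are introduced here and do not appear in the original text. The Q-function gives the expected discounted sum of future rewards r when the agent takes action a in state s and then follows policy π, and an optimal policy picks, in each state, the action with the highest optimal Q-value Q*:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \;\middle|\; s_t = s,\ a_t = a\right],
\qquad
\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```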

In Q-learning, what the agent has learned can be stored in a Q-table, in which the rows represent all possible observations and the columns all possible actions. During training, each cell is filled with a value that indicates the expected future reward of taking that action in that state. The policy then selects, in each row, the action with the highest value.
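A Q-table of this kind, and the Q-learning update that fills it in, can be sketched in a few lines of Python. The sketch assumes a hypothetical six-cell corridor with a reward of 1 at the right end and uses the standard tabular update Q(s, a) ← Q(s, a) + α[r + γ·max Q(s′, ·) − Q(s, a)]; the learning rate `ALPHA`, discount `GAMMA`, and exploration rate `EPSILON` are arbitrary illustrative values.

```python
import random

# Hypothetical environment: corridor cells 0..5, goal (reward 1) at cell 5.
N_STATES = 6
ACTIONS = (-1, +1)  # column 0 = move left, column 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state (observation), one column per action.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Terminal states have no future reward to bootstrap from.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)  # learned action for each non-terminal cell
```

After training, the "move right" column holds values that shrink by a factor of `GAMMA` per cell of distance from the goal, which is exactly the "expected future reward" each cell is meant to capture.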

However, Q-tables have their limitations. They are only practical when the space of possible observations and actions is small. If there is a very large number of possibilities, the software agent has to approximate the Q-function instead, for example with a neural network.
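A quick calculation shows why the table approach breaks down: the number of cells in a Q-table is the number of distinct observations multiplied by the number of actions. The two scenarios below are hypothetical examples, a small grid world and an agent that observes a raw 84 × 84 grayscale game screen with 256 brightness levels per pixel. The helper `q_table_cells` is introduced only for this illustration.

```python
import math


def q_table_cells(num_observations: int, num_actions: int) -> int:
    """Cells in a Q-table = (possible observations) x (possible actions)."""
    return num_observations * num_actions


# Hypothetical small grid world: 10 x 10 positions, 4 moves.
small = q_table_cells(10 * 10, 4)
print(small)  # 400

# Hypothetical screen input: each of the 84 * 84 pixels can take 256 values,
# so there are 256 ** (84 * 84) distinct screens. Working in log10 avoids
# building (and printing) an integer with thousands of digits.
pixels = 84 * 84
log10_screens = pixels * math.log10(256)      # log10 of 256 ** pixels
log10_cells = log10_screens + math.log10(4)   # times 4 actions
print(f"about 10^{log10_cells:.0f} cells")
```

No table of that size can be stored or filled in by trial and error, which is why an approximator such as a neural network is used instead: it takes the observation as input, outputs a value per action, and can generalize across similar observations it has never seen.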

Where and when is reinforcement learning used?

Google is among the companies already using this machine learning method. The company uses reinforcement learning to control the air conditioning in its data centers. Using AI technologies, Google has been able to reduce the amount of energy required to cool its servers by 40%.

Reinforcement learning is also used to manage complex systems, such as smart traffic systems, and to deliver intelligent solutions for quality control. In addition, it is used in smart power grids, to control robots, to optimize supply chains for logistics companies, and in factory automation.

For consumers, the most tangible examples of reinforcement learning are parking assistants, which use AI to recognize objects and then display the optimal parking path to the driver.

Before a new reinforcement learning algorithm works properly, it has to go through numerous test runs, since rewards can be sparse and are sometimes discovered only slowly. Nevertheless, reinforcement learning is expected to control many processes and solve complex problems in the future.
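Why rewards can be found slowly is easy to see in a small, hypothetical experiment. Suppose an agent explores purely at random in a corridor, starting next to a wall, and is rewarded only at a cell n steps away. A short calculation for this setup gives an expected n(n + 1) steps before the first reward, so doubling the distance roughly quadruples the exploration needed. The Python sketch below checks this by simulation; the corridor lengths, the number of trials, and the function name `steps_until_first_reward` are arbitrary choices for illustration.

```python
import random


def steps_until_first_reward(n, rng):
    """Random exploration in corridor cells 0..n; the only reward is at cell n."""
    position, steps = 0, 0
    while position < n:
        move = rng.choice((-1, +1))
        position = max(position + move, 0)  # a wall at cell 0 blocks moving left
        steps += 1
    return steps


rng = random.Random(42)
for n in (5, 10, 20):
    trials = 2000
    average = sum(steps_until_first_reward(n, rng) for _ in range(trials)) / trials
    print(f"goal {n} cells away: about {average:.0f} steps (theory: {n * (n + 1)})")
```

Until the agent stumbles on that first reward, it has no signal to learn from at all, which is one reason practical systems need many test runs and often add better exploration strategies.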
