Long short-term memory (LSTM) is a neural network architecture that retains important information over long periods. The technology combines a short-term and a long-term memory for this purpose and is of crucial importance for the further development of artificial intelligence.

What is long short-term memory (LSTM)?

Long short-term memory (LSTM) is a technique used to store information within a neural network over a longer period of time. This is particularly important when processing sequential data. Long short-term memory allows the network to access previous events and take them into account in new calculations. This distinguishes it from simple Recurrent Neural Networks (RNNs), whose memory of earlier inputs fades quickly; an LSTM is itself a recurrent network, but it extends the simple design. Instead of only a “short-term memory”, the LSTM has an additional “long-term memory” in which selected information is stored over a longer period of time.

Networks with long short-term memory can therefore store information over long periods of time and thus recognize long-term dependencies. This is particularly important in the fields of deep learning and AI. The basis for this is a set of gates, whose workings we explain in more detail later in this article. The networks provide efficient models for prediction and processing based on time series data.


Which elements make up an LSTM cell?

A cell that has a long short-term memory consists of different building blocks that offer the network various options. It must be able to store information over a long period of time and link it to new information as required. At the same time, it’s important that the cell independently deletes unimportant or outdated knowledge from its “memory”. For this reason, it consists of four different components:

  • Input gate: The input gate decides which new information is added to the memory and how.
  • Forget gate: The forget gate determines which information in the cell’s memory is retained and which is discarded.
  • Output gate: The output gate determines how values are output from a cell. The decision is based on the current cell state and the respective input information.

The fourth component is the cell interior. This follows its own linking logic, which regulates how the other components interact and how information flows and storage processes are handled.
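Because each of the four components carries its own set of weights, an LSTM layer has roughly four times as many parameters as a simple recurrent layer of the same size. A quick sketch (the dimensions below are hypothetical, chosen only for illustration):

```python
# Each of the four components (input gate, forget gate, output gate and
# the cell-interior update) has its own weight matrix over the combined
# [hidden state, input] vector, plus a bias vector.
def lstm_param_count(input_dim: int, hidden_dim: int) -> int:
    per_component = (input_dim + hidden_dim) * hidden_dim + hidden_dim
    return 4 * per_component

# A simple recurrent cell has only one such weight set:
def simple_rnn_param_count(input_dim: int, hidden_dim: int) -> int:
    return (input_dim + hidden_dim) * hidden_dim + hidden_dim

print(lstm_param_count(32, 64))  # 4 * ((32 + 64) * 64 + 64) = 24832
```

This factor of four is a handy sanity check when inspecting layer summaries in deep learning frameworks.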

How does long short-term memory work?

Similar to the aforementioned Recurrent Neural Network or the simpler Feedforward Neural Network (FNN), cells with long short-term memory are also arranged in layers. Unlike other networks, however, they store information over long periods of time and can subsequently process or retrieve it. To do this, each LSTM cell uses the three gates mentioned above as well as a type of short-term memory and long-term memory.

  • Short-term memory, i.e., the memory in which information from previous calculation steps is stored for a short time, is also known from other networks. In the case of long short-term memory, it’s called the hidden state. Unlike other networks, however, an LSTM cell can also retain information in the long term. This information is stored in the so-called cell state. New information now passes through the three gates.
  • In the input gate, the current input and the previous hidden state are combined with learned weights. This is how the input gate decides how valuable the new input is. The important parts are then added to the previous cell state to create the new cell state.
  • The forget gate is used to decide which information should continue to be used and which should be removed. The last hidden state and the current input are taken into account. This decision is made using a sigmoid function (an S-shaped curve) that outputs values between 0 and 1: 0 means that previous information is forgotten, while 1 means that it is fully retained. The result is multiplied elementwise by the current cell state, so entries scaled by 0 are dropped.
  • The final output is then calculated in the output gate. A sigmoid function over the previous hidden state and the current input decides which parts may pass, the cell state is passed through a tanh function (hyperbolic tangent), and the two results are multiplied to produce the new hidden state, which also serves as the cell’s output.
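The steps above can be sketched as a single LSTM time step in NumPy (a minimal, illustrative implementation; the variable names, dimensions and weight layout are our own, not taken from any particular library):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps the combined [h_prev, x] vector to the
    four stacked pre-activations (input, forget, output, candidate)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0 * n:1 * n])   # input gate: how much new info to write
    f = sigmoid(z[1 * n:2 * n])   # forget gate: 0 = forget, 1 = keep
    o = sigmoid(z[2 * n:3 * n])   # output gate: what to expose
    g = np.tanh(z[3 * n:4 * n])   # candidate values for the cell state
    c = f * c_prev + i * g        # update the long-term memory (cell state)
    h = o * np.tanh(c)            # new short-term memory (hidden state)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Because the hidden state is the product of a sigmoid output and a tanh output, each of its entries always lies strictly between -1 and 1, while the cell state itself is unbounded and can accumulate information across many steps.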

What different architectures are there?

While this basic mode of operation is the same for all networks with long short-term memory, there are significant differences in the architecture of LSTM variants. Peephole LSTMs, for example, are widely used; they owe their name to the fact that the individual gates can “peek” at the state of the respective cell. An alternative is the peephole convolutional LSTM, which uses discrete convolution in addition to matrix multiplication to calculate the activity of a neuron.
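To make the peephole idea concrete, here is a sketch of a forget gate with and without a peephole connection (hypothetical NumPy code; `pf` is an elementwise peephole weight vector we introduce purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Standard forget gate: sees only the previous hidden state and the input.
def forget_gate(Wf, bf, h_prev, x):
    return sigmoid(Wf @ np.concatenate([h_prev, x]) + bf)

# Peephole forget gate: an extra elementwise weight vector pf lets the
# gate "peek" at the previous cell state before deciding what to forget.
def peephole_forget_gate(Wf, bf, pf, h_prev, x, c_prev):
    return sigmoid(Wf @ np.concatenate([h_prev, x]) + bf + pf * c_prev)

rng = np.random.default_rng(1)
n_hid, n_in = 4, 3
Wf = rng.standard_normal((n_hid, n_hid + n_in))
g_plain = forget_gate(Wf, np.zeros(n_hid), np.zeros(n_hid), np.ones(n_in))
# With a zero peephole weight the variant reduces to the standard gate:
g_peep = peephole_forget_gate(Wf, np.zeros(n_hid), np.zeros(n_hid),
                              np.zeros(n_hid), np.ones(n_in), np.ones(n_hid))
```

The only change is the extra `pf * c_prev` term in the pre-activation; the same modification can be applied to the input and output gates.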

What are the most important areas of application for long short-term memory?

Countless applications now rely entirely or partially on neural networks with long short-term memory. The areas of application are very diverse. The technology makes a valuable contribution in the following areas:

  • Automated text generation
  • Analysis of time series data
  • Speech recognition
  • Forecasting stock market developments
  • Music composition

Long short-term memory is also used to detect anomalies, for example in cases of attempted fraud or attacks on networks. Corresponding applications can also recommend media such as films, series, bands or books based on user data, or analyze videos, images or songs. In this way, the technology not only increases security but can also significantly reduce costs.

Numerous large corporations use long short-term memory for their services and products. Google uses corresponding networks for its smart assistant systems, the translation service Google Translate, the game-playing software AlphaGo and speech recognition on smartphones. The voice-controlled assistants Siri (Apple) and Alexa (Amazon) are also based on long short-term memory, as is Apple’s keyboard autocompletion.
