Around 300 images of the chosen person’s face, ideally taken from as many angles as possible, should be enough to get a decent result. The deepfakes code contains a neural network known as an autoencoder: a network that is trained to compress data and then decompress it again. During decompression, the autoencoder tries to produce a result that is as close as possible to the original. To achieve this, the network learns during compression to distinguish between important and unimportant data.
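The compress-and-decompress idea can be sketched with a deliberately tiny example. This is not the deepfakes code: it is a toy linear autoencoder in plain Python that squeezes three input values into a single number and expands them again. The data, the function names (`encode`, `decode`, `train`), the starting weights, and the learning rate are all assumptions chosen for illustration.

```python
# Toy linear autoencoder: 3 input values -> 1-number code -> 3 reconstructed values.
# All names and numbers are illustrative; none of them come from the deepfakes code.

def encode(w_enc, x):
    """Compress a 3-value input into a single number (the bottleneck)."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(w_dec, z):
    """Decompress the single number back into 3 values."""
    return [w * z for w in w_dec]

def mean_loss(w_enc, w_dec, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((h - xi) ** 2 for h, xi in zip(x_hat, x))
    return total / len(data)

def train(data, steps=500, lr=0.1):
    """Full-batch gradient descent on the reconstruction error."""
    w_enc = [0.1, 0.1, 0.1]
    w_dec = [0.1, 0.1, 0.1]
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0] * 3
        g_dec = [0.0] * 3
        for x in data:
            z = encode(w_enc, x)
            x_hat = decode(w_dec, z)
            # Derivative of the squared error with respect to each output value.
            err = [2.0 * (h - xi) for h, xi in zip(x_hat, x)]
            # Derivative with respect to the code z (chain rule through the decoder).
            dz = sum(e * w for e, w in zip(err, w_dec))
            for i in range(3):
                g_dec[i] += err[i] * z / n
                g_enc[i] += dz * x[i] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Inputs that all lie along one direction, so one number is enough to describe them.
direction = (0.5, 1.0, -0.5)
data = [[t * d for d in direction] for t in (-1.0, -0.5, 0.5, 1.0)]

initial = mean_loss([0.1, 0.1, 0.1], [0.1, 0.1, 0.1], data)
w_enc, w_dec = train(data)
final = mean_loss(w_enc, w_dec, data)
print(f"reconstruction error before: {initial:.4f}, after: {final:.6f}")
```

Because the network can only pass one number through its middle, it has to learn which combination of the inputs matters for rebuilding them. A real autoencoder for faces works on images with far larger, non-linear networks, but the training goal is the same: make the output match the input.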
When the algorithm is fed images of dogs, the artificial neural network learns to focus only on the dog and to ignore anything in the background (noise). The autoencoder can then create its own dog from the data. This is also how face swaps work with deepfakes: the neural network learns what the person’s face looks like and can then create it independently, even if the head and mouth are moving at the same time, for example.
To swap faces effectively, two faces need to be learned: the one that appears in the original material and the one that you want to replace it with. For this, one input stage (the encoder) and two output stages (the decoders) are used. The single encoder analyzes all the material, while the two decoders each generate a different output: face A or face B.
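The one-encoder, two-decoder arrangement can be sketched in the same toy style. Everything here is an assumption made for illustration: each "image" is just three numbers, a person's look is a fixed direction (`FACE_A`, `FACE_B`), an expression is a single strength value, and the networks are linear. The point is only the wiring: both people's images pass through the same encoder weights, but each person's images are reconstructed by that person's own decoder.

```python
# Toy stand-in for "one encoder, two decoders". Every name and number is illustrative.
FACE_A = (1.0, 1.0, 0.0)            # hypothetical look of person A
FACE_B = (0.0, 1.0, 1.0)            # hypothetical look of person B
EXPRESSIONS = (-1.0, -0.5, 0.5, 1.0)

def make_images(face):
    """One 3-value 'image' per expression strength."""
    return [[t * f for f in face] for t in EXPRESSIONS]

def encode(w_enc, x):
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(w_dec, z):
    return [w * z for w in w_dec]

def mean_loss(w_enc, w_dec, images):
    total = 0.0
    for x in images:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((h - xi) ** 2 for h, xi in zip(x_hat, x))
    return total / len(images)

def train(images_a, images_b, steps=2000, lr=0.05):
    w_enc = [0.1, 0.1, 0.1]                                  # shared by both people
    decoders = {"A": [0.1, 0.1, 0.1], "B": [0.1, 0.1, 0.1]}  # one decoder per person
    datasets = {"A": images_a, "B": images_b}
    for _ in range(steps):
        g_enc = [0.0] * 3
        g_dec = {"A": [0.0] * 3, "B": [0.0] * 3}
        for person, images in datasets.items():
            w_dec = decoders[person]
            for x in images:
                z = encode(w_enc, x)
                err = [2.0 * (h - xi) for h, xi in zip(decode(w_dec, z), x)]
                dz = sum(e * w for e, w in zip(err, w_dec))
                for i in range(3):
                    # Only this person's decoder receives this gradient ...
                    g_dec[person][i] += err[i] * z / len(images)
                    # ... but the shared encoder is updated by both people.
                    g_enc[i] += dz * x[i] / len(images)
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        for person in decoders:
            decoders[person] = [
                w - lr * g for w, g in zip(decoders[person], g_dec[person])
            ]
    return w_enc, decoders

images_a, images_b = make_images(FACE_A), make_images(FACE_B)
w_enc, decoders = train(images_a, images_b)
print("error on A:", mean_loss(w_enc, decoders["A"], images_a))
print("error on B:", mean_loss(w_enc, decoders["B"], images_b))
```

Sharing the encoder is the design choice that matters: because it has to serve both decoders, it is pushed toward a code that describes what the two sets of images have in common, while whatever is specific to each person ends up in that person's decoder.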
In the end, the algorithm inserts face B into the video instead of face A, even though face B doesn’t belong there at all. To do so, a frame showing face A is passed through the shared encoder and then through the decoder that was trained on face B. This is what distinguishes deepfakes from other fakes, in which a face is cut out of one image, retouched or adjusted, and inserted into another. With deepfakes, the image material isn’t copied into another image: entirely new material is generated. This is the only way to match the facial expressions of the original face.
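The difference between generating and copying can be made concrete with a small sketch. The weights below are hand-set stand-ins for an already trained model, not learned values, and the three-number "frames", the names `deepfake_swap` and `cut_and_paste`, and the expression strengths are all hypothetical.

```python
# Hand-set stand-ins for an already trained model (assumed, not learned here).
W_ENC = (0.5, 0.5, 0.5)   # shared encoder
DEC_A = (1.0, 1.0, 0.0)   # decoder that renders person A
DEC_B = (0.0, 1.0, 1.0)   # decoder that renders person B

def encode(x):
    """Reduce a 3-value frame to one number: the expression strength."""
    return sum(w * xi for w, xi in zip(W_ENC, x))

def decode(w_dec, z):
    """Render an expression strength with one person's decoder."""
    return [w * z for w in w_dec]

def frame_of_a(expression):
    """A toy video frame showing person A with the given expression strength."""
    return [expression * v for v in DEC_A]

def deepfake_swap(frame):
    """Encode a frame of A, then generate a new frame with B's decoder."""
    return decode(DEC_B, encode(frame))

def cut_and_paste(frame, stored_b_image):
    """Classic fake: paste a fixed image of B, ignoring what the frame shows."""
    return list(stored_b_image)

video = [frame_of_a(t) for t in (0.2, 0.6, 1.0)]
stored_b = decode(DEC_B, 0.5)   # a single existing picture of B

for frame in video:
    print("generated:", deepfake_swap(frame), "pasted:", cut_and_paste(frame, stored_b))
```

In this toy setup the generated frames change from frame to frame because they are rebuilt from the expression found in A's frame, whereas the pasted image stays the same throughout. The clean hand-off from A's code to B's decoder works here only because the weights were chosen to line up; a trained model has to learn that alignment from the data.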
This also explains why some errors occur with deepfakes: the neural networks reach their limits when it comes to unusual facial movements. If there isn’t enough material from the relevant perspective, the frame will appear blurry. The algorithm tries to generate an image from the little source material it has, and the result inevitably lacks detail.