The disruptive effect of AI-enhanced media manipulation

Artificial intelligence techniques such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), which often use CNNs as building blocks, can be used to create extremely convincing fake images, voices and videos. The technologies are being developed within research institutions, enterprises and the open source community, and are being deployed in multiple applications – for good (such as Hollywood fantasy blockbusters) and ill (such as fake celebrity videos).

AI improves so markedly on the traditional ways of creating manipulated media that it has the potential to disrupt whole sectors of commerce, as well as presenting a challenge to news organisations and publishers (see A new way of thinking about media forensics). Two industries likely to feel the impact are cyber security and entertainment.

Can biometric authentication be made secure?

Biometric authentication – which makes use of images of faces, fingerprints and iris patterns, as well as speech and video – is an important part of the cyber security landscape, and has become particularly significant in the financial services sector, where it is often used to replace or complement passwords. The benefits are clear: you always have your credentials with you, you don’t need to remember anything, and many biometric indicators are – if not unique – then highly individual, providing reduced-friction access to secure services. But the drawbacks are well known too – for instance, it’s not easy to change your biometric indicators if their security is somehow compromised.
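
To make that trade-off concrete, the sketch below shows how threshold-based biometric verification works in principle: an enrolled embedding (a numerical “template” derived from a face, voice or fingerprint) is compared with a fresh capture using a similarity score. This is a minimal, generic Python illustration, not any vendor’s implementation; the embedding model, threshold value and names are assumptions chosen for brevity. It also makes plain why a leaked template is so hard to “reset”.

    # Minimal sketch of threshold-based biometric verification (illustrative only:
    # the embedding model, threshold and names below are assumptions, not a
    # description of any vendor's product).
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Similarity between two biometric embeddings, in the range [-1, 1]."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    class BiometricVerifier:
        def __init__(self, threshold: float = 0.85):
            self.threshold = threshold   # trades false accepts against false rejects
            self.templates = {}          # user_id -> enrolled embedding

        def enrol(self, user_id: str, embedding: np.ndarray) -> None:
            # The enrolled template is, in effect, a password the user cannot change:
            # if it leaks, the underlying face, voice or fingerprint stays the same.
            self.templates[user_id] = embedding

        def verify(self, user_id: str, candidate: np.ndarray) -> bool:
            enrolled = self.templates.get(user_id)
            if enrolled is None:
                return False
            return cosine_similarity(enrolled, candidate) >= self.threshold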

With AI now able to create realistic new images of celebrities after learning their features from publicly available pictures, to clone a voice from a few minutes of recordings, or to “face swap” one person on to another in a realistic video, how do biometric authentication specialists respond?

AimBrain – a UK biometric security specialist – has identified “liveness” as one of the keys to its server-based service, which is used by a number of finance and other enterprises. Andrius Sutas, its CEO, told me: “Our LipSync feature provides stronger authentication because matching voice and video accurately is computationally extremely difficult to do, especially if you present randomized challenges for the person to respond to, so any attempt to fool the system has to be done on the fly.” AimBrain also combines multiple techniques in a modular and intelligent way to increase security in specific contexts. As a research-intensive developer of AI, the company understands the improving quality of image and video fakes – and the arms race between fakers and fake-detectors – but Sutas points out that even with GANs, the “discriminator” (the AI that determines whether an image is real or not) always outperforms the “generator” (the AI that creates the new image).
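
The “arms race” Sutas describes maps directly on to how a GAN is trained: a discriminator learns to tell real samples from generated ones, while a generator learns to produce samples that fool it. The sketch below is a minimal, generic PyTorch training loop shown only to illustrate that contest; the toy data, tiny networks and hyperparameters are assumptions chosen for brevity, and this is neither AimBrain’s system nor a production deepfake model.

    # Generic GAN training loop (PyTorch): the discriminator D and generator G are
    # trained against each other. Toy vectors stand in for images or voice features.
    import torch
    import torch.nn as nn

    latent_dim, data_dim, batch = 16, 64, 32
    G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    real_batch = torch.randn(batch, data_dim)  # stand-in for a batch of real media

    for step in range(1000):
        # 1. Discriminator step: label real samples 1 and generated samples 0.
        fake_batch = G(torch.randn(batch, latent_dim)).detach()
        d_loss = bce(D(real_batch), torch.ones(batch, 1)) + \
                 bce(D(fake_batch), torch.zeros(batch, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # 2. Generator step: update G so that D scores its output as "real".
        g_loss = bce(D(G(torch.randn(batch, latent_dim))), torch.ones(batch, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

In this setup the discriminator is always trained against both real data and the generator’s latest output, which is one reason detection specialists argue it retains the upper hand.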

Bringing dead stars back to life

Would you spend your cash on supporting new talent, or would you prefer to go to a gig featuring Freddie Mercury? Or attend an audience with Carrie Fisher? Digital Domain, the Hollywood special effects company responsible for turning the actor Josh Brolin into Thanos in Avengers: Infinity War, has also worked for many years on systems that can capture motion and expression and render 3D digital humans in real time at 60 frames per second. It famously brought virtual versions of rapper Tupac and singer Teresa Teng to the stage for live performances in front of an audience. Compared with this capability, Deepfakes’ two-dimensional face-swapping videos look quite straightforward.

Darren Hendler, head of Digital Domain's Digital Humans Group, points out that “a 2D face swap is very different from a full 3D head that can be rendered from every angle with the right lighting.” His company’s work – which uses self-developed deep learning algorithms and convolutional neural networks – currently focuses on capture and rendering (the virtual representation mimics an actor), rather than on using semantic inputs to control the movements and expressions of its digital humans. However, Digital Domain and other companies, for instance those involved in computer gaming, are working on AI for the semantic control of avatars and digital humans, and the technologies could come together to enable a realistic virtual representation of any character, controlled by a human at a keyboard and using AI to behave realistically.

Hendler says that today’s technology is not yet sophisticated enough to fool a biometrics-based authentication system when presented with real-time challenges: “The state of the art involves sub-dermal blood flow monitoring and extreme fidelity of skin tone and facial movement, and in the movie world, rendering takes 5 or 6 hours per frame. But what we can do in 20ms for real-time systems is improving all the time as the speed of GPUs (graphics processing units) is tripling every year.” He says the processing power of the rendering hardware is still the main constraint, but adds that in a year or two the technology to produce extremely realistic fakes will be accessible to anyone.
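
As rough arithmetic on the figures Hendler quotes (and nothing more precise than that, since film-quality offline rendering and real-time rendering target very different levels of fidelity), the gap between hours-per-frame movie rendering and a 20ms real-time budget is around six orders of magnitude:

    # Back-of-the-envelope comparison using the figures quoted above.
    film_frame_s = 5.5 * 3600        # roughly 5-6 hours per frame in movie work
    realtime_frame_s = 0.020         # roughly 20 ms per frame for real-time systems
    budget_60fps_ms = 1000 / 60      # time available per frame at 60 fps

    print(f"Offline vs real-time per-frame gap: ~{film_frame_s / realtime_frame_s:,.0f}x")
    print(f"Per-frame budget at 60 frames per second: {budget_60fps_ms:.1f} ms")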

If fakes are becoming harder to detect – especially as GANs allow completely new images, videos and voices to be created that don’t resemble media already in the public domain – then sectors such as authentication, entertainment and publishing will have to consider how they need to evolve, in both technical and business model terms. There is likely to be some disruption in the next 2–5 years.

 

