Introducing DALL-E 2: An AI-powered 'artist'

Have you ever wanted to paint a multi-coloured elephant shopping for milk in a supermarket?

Felt like creating an oil painting of teddy bears playing poker in the style of da Vinci?

Or always planned to Photoshop an astronaut on a horse in a photorealistic style, but never quite found the time?

DALL-E 2 is here to make that happen, turning simple text descriptions into mind-blowing AI art quicker than you can say ‘Terminator 3’.

Created by the AI research lab OpenAI, DALL-E 2 is a new artificial intelligence application that can instantly create images in any artistic style or medium. This groundbreaking technology leaves you with realistic, original artwork that is seriously impressive.

And it's certainly making waves in the digital world. Just a month after DALL-E 2 was announced, Google Research unveiled its own image-making AI called Imagen.

With results that are reportedly even better than DALL-E 2's, Imagen is shining an even brighter spotlight on AI art.

But how does a text-to-image generator like DALL-E 2 work? What are the real-world use cases for AI art? And is DALL-E 2 capable of replacing art with a human touch?

Creating AI art from text

In January 2021, OpenAI announced the launch of DALL-E, an AI system that can create realistic images and art from a simple text description. The name ‘DALL-E’ fittingly comes from a mash-up of Pixar’s robot WALL-E and the 20th-century Surrealist artist Salvador Dalí.

Image created by DALL-E 2 of Salvador Dalí mixed with the WALL-E robot

Introducing DALL-E 2

Fast-forward 15 months from the release of DALL-E, and DALL-E 2 was already here. The new and improved version is capable of producing 4x higher resolution images with breathtaking results.

Image of a fox created by DALL-E 1 compared to an image of a fox created by DALL-E 2

Unlike its predecessor, DALL-E 2 allows you to retouch and edit images. Want to move your pink flamingo from outside the pool to inside? No problem.

Two images of a blue pool with pink background, with a flamingo placed in different places

The AI system generates several options for you based on your original text description, giving you the power to choose the image that best suits your needs.

Image created by DALL-E 2 showing variations of the 'Girl with a Pearl Earring' painting

So, how does it work?

The underlying model of DALL-E 2 is based on two core technologies: CLIP and diffusion.

CLIP

DALL-E has learnt to link textual semantics with their visual representations. This is achieved through an OpenAI model called CLIP (Contrastive Language-Image Pre-training).

CLIP was trained on hundreds of millions of images from the internet, each paired with a text label, and it learns which labels belong with which images.

These labels are often the alt text or meta-text that users have attached to images. But they can also be implied or unintentional labels, such as a group of wedding-themed images added to a wedding Pinterest board.

Over time, DALL-E has amassed a huge collection of specifically labelled images. And it studies these images to learn about different colours, visual elements and artistic styles.

Through machine learning, CLIP can recognise how much an image relates to its given label. And it calls on this knowledge to link textual and visual representations of the same object.
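To make the idea of 'how much an image relates to its label' a little more concrete, here is a minimal sketch of similarity scoring. It is not OpenAI's code. The three-number embeddings are made up by hand for illustration (a real model produces vectors with hundreds of dimensions from trained neural networks), and the function names and the temperature value are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (-1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(image_embedding, label_embeddings, temperature=0.07):
    """Score every candidate label against one image, best match first.

    The temperature is an assumed value; it only sharpens the ranking.
    """
    labels = list(label_embeddings)
    scores = [
        cosine_similarity(image_embedding, label_embeddings[label]) / temperature
        for label in labels
    ]
    probabilities = softmax(scores)
    return sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written embeddings standing in for real model outputs.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a photo of a fox": [0.8, 0.2, 0.1],
    "a photo of a flamingo": [0.1, 0.9, 0.3],
    "a photo of a teddy bear": [0.2, 0.3, 0.9],
}

ranking = rank_labels(image_embedding, label_embeddings)
for label, probability in ranking:
    print(f"{label}: {probability:.3f}")
```

The label whose vector points in the most similar direction to the image's vector comes out on top. Training a model like CLIP amounts to adjusting the networks so that matching image and text pairs end up close together in this way.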

Recognising the link between text and images is just the first step though. DALL-E still needs a way to generate images. That’s where the second building block comes in.

Diffusion

Diffusion models work by taking a piece of data, like a photo, and scrambling its pixels until it's no longer recognisable. The photo is turned into pure noise.

The model then works backwards, unscrambling the pixels step by step, until it reconstructs the original image (or something similar to it). This leaves you with a model that has learnt how to generate an image.
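The scrambling and unscrambling can be sketched with a toy example. This is a simplified illustration under stated assumptions, not DALL-E 2's implementation: the 'image' is just five numbers, the function names and the `keep` parameter are hypothetical, and the true noise stands in for the trained neural network that a real diffusion model uses to predict the noise.

```python
import math
import random


def add_noise(pixels, noise, keep):
    """Forward step: blend the clean pixels with random noise.

    keep is the share of the original signal retained
    (close to 1.0 = almost clean, close to 0.0 = almost pure noise).
    """
    return [
        math.sqrt(keep) * p + math.sqrt(1.0 - keep) * n
        for p, n in zip(pixels, noise)
    ]


def remove_noise(noisy, predicted_noise, keep):
    """Reverse step: subtract the predicted noise and rescale."""
    return [
        (x - math.sqrt(1.0 - keep) * n) / math.sqrt(keep)
        for x, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy 5-pixel 'photo'
noise = [rng.gauss(0.0, 1.0) for _ in image]

# Scramble the image more and more as keep shrinks.
for keep in (0.9, 0.5, 0.1):
    noisy = add_noise(image, noise, keep)
    print(f"keep={keep}: {[round(x, 2) for x in noisy]}")

# A trained network would have to guess the noise. Here we cheat and
# pass in the true noise, which is why the recovery is exact.
heavily_noised = add_noise(image, noise, keep=0.1)
recovered = remove_noise(heavily_noised, noise, keep=0.1)
print([round(x, 2) for x in recovered])
```

In a real diffusion model the noise prediction is imperfect, so the unscrambling is done in many small steps rather than one jump, and the result is a plausible new image rather than an exact copy of a training photo.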

These two technologies work together to create images from text descriptions. OpenAI’s website describes the process as follows:

DALL-E 2 has learned the relationship between images and the text used to describe them. It uses a process called “diffusion”, which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognises specific aspects of that image.

The exact way it works is obviously much more complicated than described here. But it gives you the gist of the mechanics behind the solution.

Image created by DALL-E 2 of an anxious tiger at the dentist

Using DALL-E 2 in the real world

DALL-E 2 isn’t available for public use yet but, once it is, how could you use it in the real world?

Photo Editing

Editing your photos could become much quicker and easier by using DALL-E 2. It’ll allow you to cut down the time spent on editing tasks by seamlessly changing your imagery almost instantly. And it’ll enable non-technical users to get creative without the need for any specialist editing skills.

Stock Photography

From a designer’s perspective, DALL-E 2 could be a great alternative to stock photography. Instead of trawling through image libraries to find the right picture, you could just ask for what you need and voilà! This would also be useful for bloggers or anyone with a website looking for a quick way to generate images to sit alongside their content. Cosmopolitan magazine recently took this a step further by creating the world’s first artificially intelligent magazine cover using DALL-E 2.

Recreating Photographs

Been on holiday but come home to realise that some of your favourite photos are duds? Why not use DALL-E 2 to recreate those special moments? After all, the memories are there, you just don’t have an accurate reflection of them to share yet. That’s exactly what one user did, mixing his real holiday snaps with DALL-E 2 ones in a Facebook album, and seeing if anyone spotted the difference.

Gaming

An exciting evolution of DALL-E 2 could lie in the gaming world. Creating virtual worlds is a laborious task - and we all know that time is money. DALL-E 2 could be used by developers to speed up the creation of these worlds. Games would get to market quicker and there’s potential to save a huge amount of money in development.

The Metaverse

Seeing as we’re talking about virtual worlds, it’d be remiss not to mention the Metaverse. In the broadest of terms, the Metaverse is an immersive 3D version of the internet. An advanced version of DALL-E 2 could be developed to help everyday users create their own space in the Metaverse.

Ethical challenges of text-to-image generators

As impressive as it is, DALL-E 2 doesn’t come without its flaws. And the same can be said for other text-to-image generators like Google’s Imagen.

The core limitation of technologies like DALL-E 2 is biased images.

Think about it. DALL-E has learned the link between images and their labels by scraping millions of images from the internet.

Negative stereotypes, social bias and racism have all been fed into the model. DALL-E’s outputs are limited by its inputs. And a poorly curated data set will clearly result in images that automate discrimination.

OpenAI is aware of the ethical issues with DALL-E 2. To combat this, they’ve put together a ‘red team’ - a group of external experts who look for limitations in DALL-E 2 before it's made publicly available.

Initial ‘red team’ findings, however, aren’t promising. Early tests have shown that DALL-E 2 leans toward generating images of white men, reinforces racial stereotypes and overly sexualises images of women.

More needs to be done to tackle these biased outputs to ensure that images generated by DALL-E 2 do not have a negative societal impact.

Is art really art without a human touch?

Bias aside, there’s no doubt that the capabilities of DALL-E 2 are impressive.

Its ability to compose images in a way that makes sense to us makes it feel like there’s real imagination and thought behind the process.

But can we call it art?

The reality is that there is no emotion behind DALL-E 2’s images.

There’s no deep and meaningful story behind the mad scientist teddy bears below. Or at least, not one that can be understood by the machine that created them. (Unless you’re of the school of thought that AI is becoming sentient.)

Image created by DALL-E 2 of mad teddy bear scientists doing chemistry

Final thoughts

DALL-E 2 is miles ahead of the original DALL-E application.

It could prove to be very useful for creating and editing imagery in a quick and simple way. And its potential to further evolve as a cost-saving tool in the gaming industry and the Metaverse makes it one to keep a close eye on.

But, whether DALL-E 2 can be called an artist is down to opinion, and depends on what the word ‘art’ means to you.

For me, art tells a story. It captures an experience. Offers a glimpse into the artist’s imagination. Or is fuelled by emotion.

Arguably, DALL-E 2 does that because the images created come from a human description. But, in my view, whilst this is an impressive tool, AI art can never replace the human artist.

And there’s certainly work still to be done to remove the harmful bias that it comes with.

___________________________________________________

What does the word ‘art’ mean to you? Do you think AI art counts as ‘real’ art? Share this article on your socials with your thoughts.