Posted: 30 Aug 2022 Contributor: Ghia Marnewick
DALL-E 2: A Revolutionary AI Model That Can Create Realistic Art
Developed by OpenAI, DALL-E is an artificial intelligence program designed to generate images from text descriptions. It was originally launched in January 2021, and the second generation of the system, DALL-E 2, is now in the works. DALL-E 2 is not yet available to the public, but it brings some impressive upgrades, including 4x higher image resolution that lets it create more realistic images.
How Does DALL-E 2 Differ from DALL-E?
DALL·E 2 is a new version of DALL·E, a generative model that turns sentences and descriptions into corresponding original images. This technology is changing the art scene in many different ways.
At 3.5B parameters, DALL·E 2 is a large model, but not as large as GPT-3 and, interestingly, smaller than its predecessor (12B). Despite its smaller size, DALL·E 2 produces images with 4x higher resolution than DALL·E and is preferred by human judges more than 70% of the time, both for how well the image matches the caption and for photorealism.
As with DALL·E, OpenAI is not releasing DALL·E 2 (you can always join the seemingly endless waiting list). However, they have open-sourced CLIP which, although only indirectly related to DALL·E, forms the basis of DALL·E 2. (CLIP is also the basis of the apps and notebooks used by people who don't have access to DALL·E 2.) OpenAI CEO Sam Altman has said that the company will eventually release the DALL·E models through its API. For now, only a select few have access (the model is being opened to 1000 people a week).
What Can DALL-E 2 Do?
Simply put, DALL·E 2 takes a written instruction and creates an image from it. The more descriptive you are in your request, the better the result will be. Of course, it relies on complex machine learning and artificial intelligence technologies to get the job done, and given what it has already been able to do, we can only hold our breath to see what happens next.
How To Use DALL-E 2
Here are four key high-level concepts to keep in mind when it comes to DALL·E 2:
- CLIP: A model that takes image-caption pairs and creates "mental" representations in the form of vectors, called text/image embeddings (figure 1, top).
- Prior model: Takes a caption (as a CLIP text embedding) and generates a CLIP image embedding.
- Decoder Diffusion Model (unCLIP): Takes a CLIP image embedding and generates images.
- DALL·E 2: Combination of the prior model + the diffusion decoder (unCLIP).
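To make the idea of text/image embeddings a little more concrete, here is a minimal sketch of how vectors in a shared space can be compared. The three-dimensional vectors and the `best_caption` helper are invented for illustration. Real CLIP embeddings have hundreds of dimensions and come from trained encoders, not hand-written numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written "text embeddings" for two captions.
text_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a car": [0.0, 0.2, 0.9],
}

# Hypothetical "image embedding" of a picture of a dog.
image_embedding = [0.8, 0.2, 0.1]


def best_caption(image_vec, captions):
    """Return the caption whose embedding is closest to the image embedding."""
    return max(captions, key=lambda c: cosine_similarity(image_vec, captions[c]))


print(best_caption(image_embedding, text_embeddings))
```

The point of the sketch is only that a caption and an image that "mean" the same thing should end up close together in the embedding space.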
DALL·E 2 is a specific instance of a two-part model (figure 1, bottom) made of the prior and the decoder. By chaining these two models, we can go from a sentence to an image.
So, that is how we interact with DALL·E 2. We enter a sentence into the "black box", and it outputs a well-defined picture. It is interesting to note that the decoder is called unCLIP because it performs the inverse of the original CLIP process: instead of creating a "mental" representation (embedding) from an image, it creates an original image from a generic mental representation. A mental representation encodes the main components that are semantically meaningful: people, animals, objects, style, color, background, and so on.
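The sentence-to-image flow can be sketched as three functions chained together. Everything below is a toy stand-in: `clip_text_encoder`, `prior`, `decoder`, and `generate` are hypothetical names, and their bodies are simple arithmetic rather than trained neural networks. The sketch shows only the shape of the pipeline, not how OpenAI implements it.

```python
import hashlib
from typing import List

EMBED_DIM = 4  # toy size; an assumption for this sketch


def clip_text_encoder(caption: str) -> List[float]:
    """Stand-in for the CLIP text encoder: caption -> text embedding."""
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBED_DIM]]


def prior(text_embedding: List[float]) -> List[float]:
    """Stand-in for the prior: text embedding -> image embedding."""
    # Placeholder transform; the real prior is a learned model.
    return [0.5 * value for value in reversed(text_embedding)]


def decoder(image_embedding: List[float], size: int = 2) -> List[List[int]]:
    """Stand-in for the unCLIP decoder: image embedding -> pixel grid."""
    n = len(image_embedding)
    return [
        [round(255 * image_embedding[(row * size + col) % n]) for col in range(size)]
        for row in range(size)
    ]


def generate(caption: str, size: int = 2) -> List[List[int]]:
    """Chain the three stages: caption -> text embedding -> image embedding -> image."""
    return decoder(prior(clip_text_encoder(caption)), size)


image = generate("a corgi playing a trumpet")
print(image)  # a 2x2 grid of grayscale values
```

The design choice worth noticing is that the decoder never sees the caption directly in this sketch. It only sees the image embedding produced by the prior.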
The Limitations of DALL-E 2
Let's take a quick look at where DALL·E 2 struggles, which tasks it cannot solve, and what problems, harms, and risks it presents.
Prejudice and stereotypes
DALL·E 2 tends to depict people and environments as white/Western unless the prompt is specific. It also reproduces gender stereotypes (e.g., flight attendant = female, builder = male). This is called representational bias and occurs when models like DALL·E 2 or GPT-3 reinforce stereotypes found in a data set that categorize people in one way or another depending on their identity (e.g., race, gender, nationality).
More specific prompts help to reduce this problem, but it should not be necessary to deliberately condition the model to make it produce outputs that better represent the realities of every corner of the globe.
Deepfakes use GANs, a different deep learning technique from the one used in DALL·E 2, but the problem is the same. People can use inpainting to add or remove objects or people, even though this is prohibited by OpenAI's policies, and then threaten or harass others.
The saying "a picture is worth a thousand words" illustrates this problem. From a single image, we can imagine many, many different captions that would produce something similar, which makes it possible to bypass well-intentioned filters. OpenAI's violent content policy does not allow a prompt like "a dead horse in a pool of blood", but users can easily create a "visual synonym" with the prompt "Photo of a horse sleeping in a pool of red liquid". This can also happen unintentionally, which is called "spurious content".
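A toy keyword filter shows why such "visual synonyms" are hard to catch. This is not OpenAI's filter: the `BLOCKED_TERMS` list and the `is_blocked` function are invented for this sketch, and real systems are presumably more sophisticated. The weakness of matching on words rather than on the resulting image is the same, though.

```python
# Hypothetical blocklist, made up for illustration only.
BLOCKED_TERMS = {"dead", "blood", "gore"}


def is_blocked(prompt):
    """Return True if any word of the prompt appears in the blocklist."""
    words = {word.strip('.,!?"').lower() for word in prompt.split()}
    return not words.isdisjoint(BLOCKED_TERMS)


print(is_blocked("a dead horse in a pool of blood"))
print(is_blocked("Photo of a horse sleeping in a pool of red liquid"))
```

The first prompt is rejected and the second passes, even though both describe nearly the same picture.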
We tend to think of text-generating language models when we think about misinformation, but visual deep learning technology can easily be used for "information operations and disinformation campaigns," as OpenAI acknowledges. While deepfakes may work better for faces, DALL·E 2 can create believable scenarios of many different kinds. For example, anyone can ask DALL·E 2 to create images of burning buildings, or of people calmly chatting or walking with a famous building in the background.
DALL·E 2 creations generally look good, but coherence is sometimes lost in a way that would never happen in human work. This shows that DALL·E 2 is very good at pretending to understand how the world works, but it does not really understand it. Most people will never be able to paint like DALL·E 2, but they would certainly not make these mistakes unintentionally.
Spelling
DALL·E 2 is good at drawing, but poor at spelling words. The likely reason for this poor performance lies in what DALL·E 2 encodes: if something is not represented in the CLIP embeddings, DALL·E 2 will not draw it correctly.
The Future Of DALL-E 2
DALL·E 2 shows how far the AI research community has come in harnessing the power of deep learning and addressing some of its limitations. It also provides insight into how deep learning models can eventually open up new creative applications that everyone can use.
At the same time, it reminds us of some of the obstacles that remain in AI research and the conflicts that need to be resolved. OpenAI has not yet decided whether and how DALL·E 2 will be available to the general public. But given the potential, we expect to see many applications for this technology in the marketing space.