Use cases for AI-generated images
2022 has been a wild year for artificial intelligence. In January, most mainstream debates about AI centered around a Westworld- or 2001: A Space Odyssey-style existential reckoning for mankind. Should we allow military equipment to decide who to murder based on algorithms? What are the dangers of an AI becoming sentient? Can we guarantee that advanced AIs can be controlled?
But the release of several image generators (DALL·E 2, Midjourney, Stable Diffusion) made it clear that long before computers gain sentience, they will compete for creative jobs. These AI image generators went from nowhere to everywhere in a matter of months & debates quickly pivoted to labor rights & copyright ownership. Now the technology has creatives questioning the future of their industry & what their role will be in the tech-enabled tomorrow.
I've spent the year messing around with Midjourney, so I have some thoughts. And while I think the tech is revolutionary & will change creative industries, some of the more hysterical pieces I've read seem to be a bit overblown. I think the technology is mostly akin to the microwave. You can accomplish some pretty amazing things in a short amount of time with a microwave. Wrap a hot dog in a wet paper towel & you can have a delicious steamed wiener in less than a minute. And sometimes that's plenty good enough. But you do run up against its limitations pretty quickly. So 25 years after the mass adoption of the microwave, there still exist plenty of fully staffed kitchens, the best of which have little use for the charming, radioactive box.
Consider the image below...
These painting robots pretty clearly capture the anxiety plaguing the creative class. The aprons give the robots a human-like quality, as if they feel consciously responsible for their own appearance, suggesting they're a suitable full-on replacement for illustrators. And yet, despite its emotional resonance, much of the portrait is totally incoherent, & it holds up to very little close scrutiny.
The image was generated with Midjourney v3, which is known for such artifacts. And while later Midjourney models actually provide more coherent images, they do so by sacrificing sophistication.
This image, generated with the latest Midjourney test algorithm, is less chaotic, but only marginally so. The table legs are still a mess, as are the chairs & the perspective of the robot figures. It's also a great deal less emotionally resonant. And this, it turns out, is the tradeoff: coherence for creativity. Asking the algorithm to simply generate a table is no problem. But a portrait of a table with robots sitting at it, learning to paint, in a painterly style is bound to create chaos because the algorithm doesn't actually know what any of these things mean. It can approximate a table, but it doesn't understand the object's use or relationship to gravity, so it cannot determine whether or not the output is logical. And yet, the more the algorithm is trained on isolated images of objects, the more likely its outputs are to reflect those boring, isolated objects as opposed to the "scenes" that are emotionally resonant but difficult for an algorithm to deconstruct.
If you ask for something very simple & direct, however, like "robots learning to paint," the v4 model is more than willing to accommodate with a bunch of Pixar-lookalike images.
The less direction you provide the model, the better your outcome will be. It understands modern, digital illustration. And it's great at executing on themes in this style. Which means you have basically two options when using AI to generate imagery: embrace the default style of the model or embrace the chaos inherent to straying from that style.
Consider this music video from the band Disturbed.
It's created by generating a series of scenes through AI & producing dozens of variants of each scene, strung together to give the appearance of motion. When you're seeing several photos per second, you miss the artifacts: disfigured faces, drum sets that make no logical sense, mangled hands, missing body parts, multiple sets of teeth, etc. Pausing on individual frames reveals the chaos within. But for some uses, that's totally fine. This is a microwaved music video, after all. And yet, my bet is that it wasn't trivial to make.
The portrait of the robot painters at the top of this page, for all its flaws, took the better part of an evening to generate. It was chosen from over 200 images created by 50 prompts. And even then, the portrait required some photoshopping. If I had to do this for a 3-minute video, I might question whether or not the novelty of the effect was worth all the effort. (There are numerous ways to create an economical video, after all.)
The age of the so-called prompt artist
There's a growing number of AI enthusiasts trying to fashion themselves as "prompt artists," but that seems very similar to calling oneself a "microwave chef." The reality is that the more coherent & ostensibly impressive AI images were created by pulling the lever on a slot machine & hoping something nice pops out. There is very little control over the layout or perspective of the piece, & only vague references to style & substance. And this is precisely why there is so much overlap in the Midjourney Community Feed (requires an account to view).
These are impressive if you assume that the "prompt artist" had a specific vision in mind before they started typing. But that's rarely the case. The algorithm's idea of a woman is through the gaze of a modern digital illustrator: fair skin, tiny noses, pointy chins, a very specific eye shape... & for some reason a lot of cat ears. The artist may have input "anime-style girl" or "emocore girl" or "cyberpunk girl," but the output will vary little from the idealized "AI girl." Which means it's difficult, if not impossible, to add character to the output beyond the entirely superficial (freckles, an eyepatch, etc.).
Consider these tweets by TiMi Studio art director Nicolas Bouvier...
What's become apparent is that the "free details" filled in by the model when you prompt a basic phrase like "anime-style girl" actually have a cost in that they can be easily copied & are hard to adjust. A concept artist, for instance, often considers every detail as they construct a character. What is the shape of a character's eyes? What should their body shape be? What is the significance of their jewelry? How worn is their clothing? What style of shoes do they wear? How does their resting mouth position reflect their personality? But when you get into trying to adjust these specific things via text prompt, you find very quickly that it's terribly limiting.
This is, in other words, the difference between microwaving a pre-made, frozen pizza & creating one from scratch.
What's more, it turns out that AI-generated artwork cannot be copyrighted. This makes it impractical for many commercial uses. If the generated image is edited by a designer or illustrator after the fact, THAT work can be copyrighted, but there's nothing that prevents anyone else from generating that same image or even finding the original generated image in the community feed (all Midjourney-generated images are available to anyone with a user account) & using it themselves, perhaps even applying the same edits. For this reason, the community feed search is oftentimes a more convenient source of imagery than generating new images. It's all copyright-free, after all, despite what some "prompt artists" try to claim.
None of this is to say that AI image generation won't continue to improve. A lot of the shortcomings reflect a homogeneity within digital illustration broadly, & a thoughtfully curated training dataset could minimize the influence of, for instance, the idealized "AI girl," normalizing other face shapes & characteristics & randomizing the outputs. This would be an essential evolution, in fact, if AI-generated imagery is to see common use. All of these models inherit & harbor the biases of society, which must be filtered out of the training data, lest they be institutionalized.
Certain technologies are being developed to let users describe the composition of the output, which will give AI enthusiasts much more control over what's generated. And more tools are sure to come. But the reality is that for AI to actually replace serious illustration, the prompt would need to be so detailed & the training dataset so highly curated that it just wouldn't be worth the effort.
What might be more practical, but seems further away, is an AI component integrated into modelling software, where one could describe a hairstyle or a glove type in a text prompt and use that as a starting point for further modelling. But at the end of the day, it's the intentionality that AI cannot replace.
The state of AI copyright
Consider this set of horror movie character portraits in the style of expressionist painter Egon Schiele, which I generated with Midjourney v3.
Many of these turned out great & the imperfections give them a lot of character. I did have to remove some chaotic elements in Photoshop... extra eyes or weird skin folds. But I intentionally chose a style that allowed me to lean into the chaos. Even generating these images, though, was akin to a crap shoot, forcing me to work around pollution in the training dataset.
I was simply unable, for instance, to generate a recognizable portrait of Ghostface, the serial killer from the Scream series, because the model kept fusing the concept of "Ghostface" with "Ghostface Killah," the rapper from the Wu-Tang Clan. No amount of detail added to the prompt or parameters meant to minimize the influence of "Killah" or "Wu-Tang" produced an acceptable output. And even for the characters that didn't suffer from such glaring dataset pollution, I oftentimes had to make a concerted effort to minimize the influence of Egon Schiele's own self-portraits.
Of course, it's not out of the question that the models could eventually learn to distinguish between identical phrases in these tricky edge cases, or to minimize the influence of self-portraits of artists when invoking an artist's style. But it seems that the ideal solution is actually to train one's own model on a curated dataset for a project like this. If a model were trained on all of Schiele's paintings, minus his self-portraits, & only on photos of the characters intended to be output, a lot of really great images could be generated. But that's a LOT more effort than simply typing "portrait of Steve Harrington in the style of Egon Schiele" into a pre-trained model. This is no longer akin to using a microwave, but instead building out an automated kitchen. Which is quite a bit of work.
There are ongoing debates about whether this constitutes theft. And it might, but not in the way that many artists are claiming.
The reality is that if I were to buy a couple of Egon Schiele books, I could study them & learn to paint in his style. I could also legally scan them & index them on my computer. And, theoretically, I could also train a computer to produce art in his style. What I couldn't do is copy & paste parts of Schiele's paintings & claim them as my own. But that's different from deconstructing his style & recreating it. The only sticking point here is whether or not I legally purchased the books to gain access to the images.
Imitation is inherent to creative production. We imitate our parents' recipes. We imitate the Beatles' songwriting ideas. We imitate the language of our peers. And this is not a zero-sum activity. It's not theft. It's a process that any artist--without exception--has participated in, & it's the way we develop shared culture. So it seems hypocritical of artists to claim, as many are, that an algorithm doing the same is theft. The only sticking point is whether the training data was legally acquired.
Which raises the question: is scraping a publicly available image off of a site like ArtStation.com for AI training legal? And, perhaps more importantly, SHOULD it be legal?
The irony of the situation is that, whether it's ArtStation or a design-centric platform like Dribbble or even an old-school community like DeviantArt, creators have long been learning from & copying one another, gaining the skills to start careers without posing an existential threat to one another & without paying for methods gleaned. Recognizable styles develop because exposure is the opening salvo of influence. But many within the illustration community have decided that they would rather not influence the robots.
ArtStation has responded by updating its terms of service & giving the author control over whether or not they "permit" the art they post to be scraped for AI training purposes. But it remains to be seen whether that will hold up in court, or if that even truly matters on an internet where protecting anything from downloading has been shown to be a fool's errand. When users can curate their own training dataset & obscure the source, the legality becomes moot.
The courts are therefore more likely to crack down on copyrighted output than input. For instance, one can draw Mickey Mouse in the privacy of their own home all they want. But the moment you publish an image of Mickey Mouse, you're likely to receive a letter from Disney lawyers. In this way, the copyright onus is largely on the publisher to determine what is & isn't legal to publish. How the image was produced matters less. In other words, it's perfectly fine to write a song that sounds like the Beatles. But using a sample of Paul McCartney's voice will likely land you in court.
So what's bound to happen, I believe, is that demand for art requiring a low level of intentionality will be subsumed by AI, while demand for art requiring a higher level of intentionality will be aided by AI. And there will be a new prestige for art that steers clear of AI entirely.
Artists have long enjoyed the benefits of technological advances. First, digital tools did away with the cost & inconvenience of keeping art supplies on hand. Then, tools like cloning & copy-and-paste sped up the process. And finally, AI-driven tools like content-aware fill gave artists the ability to eliminate mistakes with a single click. These tools now allow an artist to generate theoretically infinite (though in practice quite limited) objects or patterns to use in their work. Each of these advancements has made creating art more accessible to the masses, as is also true with music, photography, & film. But they have also (arguably) pushed each medium forward. The definition of an artist periodically changes relative to the technology of the time, but each cry of existential threat tends to be overblown.
A hundred years ago, landscape photography was Ansel Adams lugging his tripod out into the American West, carefully capturing the untrodden wilderness & then painstakingly developing his prints to share with the world. For this, he was awarded a Presidential Medal of Freedom. Now hashtag van life Instagram influencers are capturing those same images & sharing them with the entire world instantly. All of the minutiae of tonal range & exposure are handled by algorithms within their phones. And yet, the professional photographer still exists, even if the job description has evolved.
This gives me hope that digital artists might, in a post-AI world, strive to make recognizably human work, differentiating themselves from the homogeneity produced by AI. Optimistically, the advent of AI image generation could create a moment like Nirvana displacing all of the hair metal bands in 1992. A new "punk" era of illustration defying the neon, pixie-faced art of the mainstream. What could that look like? The possibilities are infinite.
And yet, use cases that require low intentionality will inevitably be subsumed by AI. And this might not be a bad thing. The outcome may be higher quality art in lower value situations.
AI as a replacement for stock imagery
I recently used Midjourney to generate cover art for a series of Spotify playlists. This was a situation where there was no value proposition for me to create original artwork or to pay someone else to do it. Oftentimes this type of situation falls to free stock imagery or imagery stolen from a Google search. In this case, Midjourney recognized "Spotify playlist cover art" as a specific style, which allowed me to quickly generate a number of images in a similar style that were better than what I would've otherwise used.
Of course, nothing is preventing anyone from using similar prompt phrases & generating similar images. This may be inevitable, in fact. But this is a case where "looking the part" may actually be better than standing out from the crowd, & the stakes are low enough that it doesn't really matter, either way.
Playlist cover imagery, blog post cover graphics, online profile pictures... use cases that are already rife with theft as a function of the value proposition are ripe for AI disruption, & the web will likely be a better place for it. And designers will be able to create nicer graphics, assisted by AI, within a timeframe that makes sense relative to the image's use.
My positions on these issues may change as the technology evolves, but a number of technological innovations over the years lead me to believe that the tech may be transformative but not existential, be it the microwave, the digital camera, digital illustration, or any of a number of other sea-changing inventions. Even the carriage driver, after a century of technological change, lives on as the Uber driver. Because the need for human intervention will always persist, despite our technological ambitions & even our greatest concerns.