rewks

AI Art - A gateway to our own imagination

2023-03-06

Introduction

For anyone not living under a rock it is clear that the availability of advanced "AI" models has exploded in just the last year. ChatGPT in particular has garnered a lot of mainstream media attention, with panicked stories about it outperforming humans at certain tasks, its potential for spreading misinformation, aiding malware authors and bringing the integrity of university students' coursework into question. All eyes are focused on this space and questions are being asked about how it will affect our lives going forward. The subject of this post, however, is a slightly different application of machine learning - the generation of artwork based on user-defined prompts.

japanese-man-bike-tokeo.png

Are human artists becoming surplus to requirements? At first glance the images that people post online from these tools look incredible. If anyone can just run some words through a program and get art of this quality, what do we need artists for? Surely their jobs have been automated away?

In my opinion the answer is no. Not yet. Perhaps not ever.

Interspersed throughout the post I will include some of the images that I, or rather the AI, generated. Admittedly this is partly to show off, as even though my involvement in their creation is limited I still get that urge to say "hey, look what I made!" Is that weird? I think it is, for it took no talent on my part; I just input some text and adjusted a few settings until a nice image came out. This does raise an interesting philosophical question - who should the credit for such images be attributed to? Would I be justified in feeling aggrieved should someone take the images, reuse them and claim them as their own? It's a strange new world we're on the edge of.

Stable Diffusion

There is a saying, "When it rains, it pours", which applies to many things in life, and AI art generators are no different. As soon as they started being made available for public use, more and more started appearing. Often they seem to be an entrepreneurial endeavour, with limited free access and a subscription model to remove rate-limiting or increase output resolution. There was no way I was handing money over to someone who had probably just put a third-party paywall in front of the actual source of the art, so I went looking. It did not take long to come across Stable Diffusion and Midjourney. I can't comment on Midjourney as so far I have only tried Stable Diffusion, but after the initial setup and seeing what I was able to produce, I was blown away.

woman-asian-temples.png

So if such high quality images are able to be generated in seconds, how can I argue that artists are not in danger?

Training data limitations

What I think is important to understand is that everything that is output is based on a finite set of training data. When you want to generate some images you need to choose a checkpoint file to use. In the simplest terms a checkpoint file is the end result of training on a set of data, and the creators probably had a particular style in mind when collating it all. Maybe it focuses on animals, photorealistic humans, RPG style art, anime, cyberpunk and so on. The larger the training data set, the more varied the outputs may be, but a good checkpoint file has to strike a balance. Too much variation and you risk more chaotic and malformed images. Because the checkpoint is bounded in this way, if you generate enough images using the same prompts and checkpoint you will notice striking similarities despite each output using a randomised seed.

An obvious example is facial features. If I ask you to picture a completely imaginary 25 year old white woman, her face would probably be distinctly different from what anyone else had imagined if asked to do the same. There are infinite variations that could be made. In a training data set however, this is not the case. The AI is limited in a way a human artist would not be - it does not have an imagination, just a set of reference material. Generate a bunch of images of a 25 year old white woman and you'll probably notice similar noses, jaw line, eye shape or any other feature across many of them. This can be countered with carefully crafted prompts to an extent, but not completely.
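To make the point about seeds and a finite set of reference material a little more concrete, here is a toy Python sketch. It is not how Stable Diffusion works internally; `TOY_CHECKPOINT`, `generate_face` and `distinct_faces` are names made up for illustration, and the "checkpoint" is just a small fixed pool of features. It only shows that the same seed reproduces the same result, and that no amount of random seeds can produce something outside the pool.

```python
import random

# Hypothetical stand-in for a checkpoint: a small, fixed pool of "learned" features.
# A real checkpoint is a large set of model weights, not a lookup table.
TOY_CHECKPOINT = {
    "nose": ["narrow", "button", "aquiline"],
    "jaw": ["square", "soft", "pointed"],
    "eyes": ["almond", "round"],
}


def generate_face(seed):
    """Pick one option per feature using a random generator seeded with `seed`."""
    rng = random.Random(seed)
    return {feature: rng.choice(options) for feature, options in TOY_CHECKPOINT.items()}


def distinct_faces(seeds):
    """Return the set of distinct faces produced across all the given seeds."""
    return {tuple(sorted(generate_face(seed).items())) for seed in seeds}


if __name__ == "__main__":
    # The same seed always gives the same face.
    print(generate_face(42) == generate_face(42))
    # A thousand seeds can never yield more than 3 * 3 * 2 = 18 combinations.
    print(len(distinct_faces(range(1000))), "distinct faces from 1000 seeds")
```

A real model has vastly more room to vary than this, but the principle is the same: the seed changes which result you get, not what the checkpoint is capable of producing.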

old-man-in-forest.png

Understanding of the user's desires

AI does what I consider to be a seriously impressive job of taking in text prompts and making images that more or less conform to them. However, the results will not be truly accurate, nor the process as flexible, as working with a human can be. You can feed in the most descriptive prompt in the world but not everything you describe will be in the end result. You can assign weightings to keywords to improve the likelihood of them influencing the output, use ControlNets, inpainting, etc., but in the end you're still at the mercy of randomness. I think this is part of the excitement; the AI's interpretation of your prompt may even result in an amazing image that you hadn't actually thought of. However, if you have an image in your mind that you want recreated as accurately as possible, your best bet is to work with a human who can take feedback throughout the development of the artwork and make changes based on your direction.
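As a rough illustration of keyword weightings, the sketch below parses a prompt written in the `(keyword:1.4)` style that some Stable Diffusion web interfaces accept. Which interface and syntax you have is an assumption here, and `parse_prompt` is a hypothetical helper, not part of any real tool. It only shows what a weighting is: a number attached to part of the prompt, with unmarked parts defaulting to 1.0.

```python
import re

# Assumed syntax: "(keyword:1.4)" attaches a weight; plain text gets weight 1.0.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_prompt(prompt):
    """Split a comma-separated prompt into (text, weight) pairs."""
    parts = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces such as a trailing comma
        match = WEIGHTED.fullmatch(chunk)
        if match:
            parts.append((match.group(1).strip(), float(match.group(2))))
        else:
            parts.append((chunk, 1.0))
    return parts


if __name__ == "__main__":
    for text, weight in parse_prompt("angel, (golden wings:1.4), white dress"):
        print(f"{text!r} -> {weight}")
```

Even with a higher weight, the keyword is only nudged towards appearing in the output. It is not a guarantee, which is the randomness described above.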

female-white-gold-angel.png

Flaws in the image

It is not hard to find examples of glaring flaws in images, even if they are perfect at a glance. If you scroll back up to the first image I included on this page you can see half of his index finger is missing! From minor flaws like this, or a missing earring, or liquidy eyes to major mistakes like an arm bending the wrong way or fireplaces blending into each other through some weird extradimensional plane, there is almost always something you can point to and say "yep, this is AI made". Whilst all this may be acceptable for the average person having fun with this as a hobby, or making a super cool character portrait for their Dungeons and Dragons character, it just will not cut it for professional use cases.

female-dark-wings-fire.png

Thoughts on the future

So will AI replace artists? I don't think so. At least not completely. The main reasons why are:

  1. Training data limitations result in limited output (e.g. common facial structures on people across images)
  2. Output is random, not an actual replication of what you are imagining, so it may come close but will never be exact.
  3. There are always flaws. An extra finger here, a disfigured eye there. A missing earring. Broken backgrounds.

AI art is only going to get better. In fact the improvements that have been made in just the last six months to a year are phenomenal. I still don't believe it will reach a point where artists are made redundant though. It is more likely that digital artists will adapt their workflow to include AI in some form, just like we are seeing in other industries where things like ChatGPT are being used as another tool to help a professional do specific tasks more efficiently.