I made the decision to get into the world of Artificial Intelligence and Machine Learning!
I'm starting from a fun spot, too: A.I. Generated Artwork
This seems like a great entry point, and it's even better since I was told of an easy pre-built environment to get going: Automatic1111 Stable Diffusion WebUI which can be found here:
I started playing around with the WebUI, trying different combinations of keywords, or "prompts" as they're called. Really, I was just tinkering to learn the UI. I hadn't done anything meaningful yet, just a bunch of random stuff to see what would happen. For example, a friend mentioned the game "Stray" in a conversation, so I put together a prompt like "vaporware stray cat in a cyberpunk dystopian alley" and got this cool 4-image bundle:
I was feeling the itch. By this point, I already knew I was hooked.
But now I wanted to do something a bit more involved than just throwing random words together and seeing what the system spits out. I needed to have a specific direction I wanted to go.
I had recently downloaded the time-killing mobile paint-by-numbers app, Dark Colors, and had saved one of its pieces of artwork of the DC Comics villainess, Harley Quinn:
I had this image on my phone and wanted to try a few things out anyway, so I connected to my network and used the Img2Img feature, making sure to set the "Denoising Strength" all the way down to 0 (zero) so I could see just how exactly the output matched. In other words, is this still AI generated, or is it simply an image copy?
It has some minor flaws, but I knew this was a real deal A.I. generated image, and not just a photocopy of the original.
Looks pretty damn good, doesn't it? You can notice things like the eyes or the letters in the watermark, even the fishnets are a bit washed out. But this was a great test! Also, it put the image in my render history, so now I could just pull the image from there using the "Send to img2img" feature on the "History" tab of my WebUI instance.
If you're curious why I wanted to bring this into my history and work from there when I could already use my WebUI instance on my phone, the answer is actually pretty simple.
I had a concept I really wanted to do involving this depiction of Harley Quinn, and with her holding that baseball bat, I wanted to put her in a ballpark... but... using this image as-is was going to cause 1 of 2 things to happen:
Either I could turn the denoising value up to get a more relevant background, which would completely change Harley's appearance, or
I could turn the denoising value down to keep Harley looking more like this, but the background wouldn't look at all correct
I also noticed that whatever I was doing, the background was heavily influenced by the background of the original image.
High Denoising Value (0.84)
Low Denoising Value (0.49)
The high denoising value gets a closer background to what I was wanting, but Harley just feels out of place here.
And the low denoising value has all kinds of other problems, mostly that while Harley looks closer to the original image, she has problems with what she's holding, the changes in her clothes, etc. But more than that, the background is awful.
I mostly hated how dark and red the backgrounds kept ending up.
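A rough way to picture why the denoising value trades likeness for a new background: in img2img, the value roughly controls how much of the sampling schedule is actually run on top of the original image. The sketch below is a simplified mental model and not the WebUI's exact code. The function name is made up for illustration, and the real behavior can differ by sampler and version.

```python
def approx_img2img_steps(total_steps: int, denoising_strength: float) -> int:
    """Estimate how many sampling steps get applied to the input image.

    Assumption: the applied steps scale linearly with the denoising
    strength, capped just below the full step count.
    """
    if denoising_strength <= 0:
        return 0  # nothing is re-drawn, so the output stays close to the input
    return int(min(denoising_strength, 0.999) * total_steps)


# The values used in this post, with Steps set to 20
for strength in (0.0, 0.49, 0.84, 1.0):
    print(f"strength {strength}: about {approx_img2img_steps(20, strength)} of 20 steps")
```

Under this model, 0.49 only re-draws about half of the schedule, so the original dark red background keeps bleeding through, while 0.84 re-draws most of it and Harley drifts along with everything else.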
This is where we get into another feature: Inpaint
Inpaint gives you the capability to draw a mask on top of your image before using that image for processing. You also have a few cool options with Inpaint that allow you to either:
Only inpaint everything that has been masked, or
Only inpaint everything that has not been masked.
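The two options can be pictured as a per-pixel choice between the original image and the newly generated one. This is a conceptual sketch only: the tiny "images" are lists of labels instead of real pixels, and the function and variable names are invented for illustration.

```python
def composite(original, generated, mask, invert_mask=False):
    """Keep original pixels, except where the (possibly inverted) mask says repaint."""
    result = []
    for orig_px, gen_px, masked in zip(original, generated, mask):
        repaint = (not masked) if invert_mask else masked
        result.append(gen_px if repaint else orig_px)
    return result


original = ["harley", "harley", "background", "background"]
generated = ["new", "new", "new", "new"]
mask = [True, True, False, False]  # the mask traced over Harley

# Option 1: repaint only what is masked (Harley is replaced)
print(composite(original, generated, mask))
# Option 2: repaint only what is NOT masked (the background is replaced)
print(composite(original, generated, mask, invert_mask=True))
```

With the mask drawn over the figure, the second option leaves Harley alone and hands everything else to the model.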
It was easier for me to trace over the figure of Harley and go with option 2:
Now that I had Harley masked, I decided to crank the Denoising strength all the way up. Also, since I was effectively removing Harley from the new render, I removed her from the prompt, making the key subject matter "standing in baseball park."
My full settings were as follows:
Prompt: Masterpiece, best quality, standing in baseball park
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 3445035576, Face restoration: GFPGAN, Size: 512x512, Model hash: 587f05f3, Denoising strength: 1, Mask blur: 4
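If you want to compare settings between renders, a line like the one above is easy to turn into a dictionary. This is a minimal sketch that assumes every setting is a "Name: value" pair separated by a comma and a space, which holds for the lines in this post but would break on values that contain commas. The helper name is my own.

```python
def parse_settings(line: str) -> dict:
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    settings = {}
    for part in line.split(", "):
        key, _, value = part.partition(": ")
        settings[key.strip()] = value.strip()
    return settings


line = (
    "Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 3445035576, "
    "Face restoration: GFPGAN, Size: 512x512, Model hash: 587f05f3, "
    "Denoising strength: 1, Mask blur: 4"
)
settings = parse_settings(line)
print(settings["Seed"], settings["Denoising strength"])
```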
And this gave me something I was ready to work with!
Hell yeah! We have Harley standing on the field! And she looks good!
Good... but not "great." And I'm not cool with my girl Harley only being "good" so I gotta do something about that...
As you can see, the background was a success, but I missed or didn't adequately cover areas with my mask, so she's a bit rough around the edges.
Also, I wanted to push this. I wanted to see this flat image as a 3D styled render with some realism.
So I added that to my prompt. And since we now wanted the AI to recognize Harley for who she is, I also put her name in the prompt.
The complete settings were:
Prompt: Masterpiece, best quality, harley_quinn satisfied standing at ballpark 3d realism
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 3679158294, Face restoration: GFPGAN, Size: 512x512, Model hash: 587f05f3, Denoising strength: 0.5, Mask blur: 4
And the results... just wow!
Okay, her eyes aren't great, her right arm isn't properly attached at the elbow, and she may not be holding the bat anymore, but the likeness between the images is unmistakable!
Now that I have this creation, I'm like "what else can I do with my mask?" So I started swapping "ballpark" for all kinds of places!
I had these ideas of her busting someone up in a club or being in the middle of a heist!
Nightclub
Bank
Bank Vault
Inside Bank Vault
These were good...
Again, not great... good... but what wasn't right about them?
The problem with these was the subject size relative to the canvas. Because Harley takes up so much of the frame, it leaves very little room for anything to fit well. So I decided I needed to change my approach on the theme.
I thought, "how can I frame her in a way that sticks to the crime-in-progress theme?"
Grand Theft Auto
So now we're not just looking at a picture as an observer of art... we're engaged in the art! We're about to get carjacked!
New settings:
Prompt: Masterpiece, best quality, POV looking into car
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 3445035576, Face restoration: GFPGAN, Size: 512x512, Model hash: 587f05f3, Denoising strength: 1, Mask blur: 4
Result:
Well... shit... I didn't even think that POV looking into car would be interpreted as "from my point of view as if I'm looking into a car"
Quick fix on the prompt: Masterpiece, best quality, POV looking out from car
This didn't quite look right, so I modified the prompt just a little bit more:
Masterpiece, best quality, POV looking out from car window
I knew at this point I probably needed to consider updating the mask so she would appear on the other side of the door, but I figured I'd see what I could make happen with this as my new baseline image.
She's outside a car, so let's update the prompt to hailing a cab
Prompt: harley quinn hailing a cab
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 2341660601, Size: 512x512, Model hash: 587f05f3, Denoising strength: 0.75, Mask blur: 4
Okay, that's cool but it changed a bit too much... let's tone down our denoising...
Prompt: harley quinn hailing a cab
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 2341660601, Size: 512x512, Model hash: 587f05f3, Denoising strength: 0.61, Mask blur: 4
I wanted to keep these settings and try out a few different actions besides "hailing a cab."
Side note: starting below, I updated "harley quinn" to "harley_quinn" in the prompt just so the 2 words were treated as a single name.
asking for a ride
stealing car
stealing a car
asking for money
On that last one, I started thinking about different actions. I noticed there were slight changes between "stealing car" and "stealing a car," so I thought there might be larger differences depending on which verb I used in place of "asking [for]."
The results were pretty cool:
begging for money
pleading for money
stealing money
taking money
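The experiment above boils down to holding everything fixed and changing only the action in the prompt. Here is a small sketch of that idea as a list of render "jobs." The dictionary keys are my own labels mirroring the settings lines, not an official WebUI format, and I am assuming the seed stayed at the "hailing a cab" value for every variation.

```python
# Settings held constant across every variation (taken from the lines above)
BASE = {
    "steps": 20,
    "sampler": "Euler a",
    "cfg_scale": 12,
    "seed": 2341660601,
    "denoising_strength": 0.61,
}

ACTIONS = [
    "asking for a ride", "stealing car", "stealing a car", "asking for money",
    "begging for money", "pleading for money", "stealing money", "taking money",
]


def build_jobs(actions, base, subject="harley_quinn"):
    """Return one settings dict per action, differing only in the prompt."""
    return [{**base, "prompt": f"{subject} {action}"} for action in actions]


jobs = build_jobs(ACTIONS, BASE)
for job in jobs:
    print(job["prompt"], "| seed", job["seed"], "| denoise", job["denoising_strength"])
```

Because the seed and denoising strength never change, any difference between two renders comes from the wording alone.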
All of those images were at a "Denoising Strength" of 0.61.
I saw how similar each of these images was to the next, with some definite noticeable differences. Especially in that last one with "taking money."
So I decided to bump the value up, but only a little, to 0.66
This is when things took an interesting turn. Being a car enthusiast, I am a strong opponent of self-driving cars. But this was a ride I was excited to go on with AI behind the wheel!
Check out this new set of results:
taking money
stealing money
begging for money
That last one really struck me for some reason. There just appears to be a level of emotion in her expression.
So now I wanted to see what results I could get using that image, but with a different prompt.
I really wanted to keep a close likeness, so I turned my Denoising settings down to 0.51. I also set my seed value to -1 to randomize the seed so I could generate many different images using the same prompt.
The core formula for the next series of renders is as follows:
Prompt: Masterpiece, best quality, harley_quinn [keyword]
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: [seed_value], Size: 512x512, Model hash: 587f05f3, Denoising strength: 0.51, Mask blur: 4
I will put each keyword and seed_value in each image caption:
keyword: thinking
seed_value: 1571408106
keyword: begging
seed_value: 1149304386
keyword: sad
seed_value: 1352919633
keyword: sad
seed_value: 3378621575
keyword: sad
seed_value: 3378621578
keyword: adorable
seed_value: 914532471
keyword: adorable
seed_value: 914532472
keyword: adorable
seed_value: 914532473
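For anyone wanting to reproduce a caption, the formula plus the seed handling can be sketched in a few lines. Two assumptions here: that a seed of -1 means "pick a random one" within a 32-bit range, and that images in one batch get consecutive seeds, which is consistent with the run of "adorable" seeds above. The function names are made up for this example.

```python
import random

TEMPLATE = "Masterpiece, best quality, harley_quinn {keyword}"


def resolve_seed(seed: int, rng: random.Random) -> int:
    """Return the seed unchanged, or draw a random one when it is -1."""
    return rng.randrange(2**32) if seed == -1 else seed


def batch_seeds(first_seed: int, batch_size: int) -> list:
    """Assumed batch behavior: each image uses the previous seed plus one."""
    return [first_seed + i for i in range(batch_size)]


prompt = TEMPLATE.format(keyword="adorable")
print(prompt)
print(batch_seeds(914532471, 3))
print(resolve_seed(-1, random.Random()))  # a different value on each run
```

Plugging a caption's keyword and seed_value back into the formula, with the rest of the settings unchanged, should be the starting point for recreating that image.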
So there you have it! What an amazing transformation from artwork captured from a mobile app to a completely original composition filled with expressive emotion. I am blown away by my first adventure in AI generated artwork!