After a bumpy start (see my other thread about it), I'm starting to feel comfortable with SDXL, to the point that I probably won't go back to the 1.5 models. This wizard-hat-wearing cat was generated in A1111 with the prompt:
"a cute kitty cat wearing a wizard hat, candy rays beaming out of the cat ears, (a swirling galaxy of candy pops background:0.7), 1980's style digital art, hyperrealistic, paintbrush, shallow depth of field, bokeh, spotlight on face, cinematic lighting "
Negative prompt (from a standard style I use): "(bad anatomy:1.1), (high contrast:1.3), watermark, text, inscription, signature, canvas frame, (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render"
Generated at 1024x1024 without the refiner.
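In case the bracket syntax above is unfamiliar: in A1111, each pair of round brackets multiplies a term's attention weight by 1.1, and "(term:w)" sets the weight to w directly, so "((disfigured))" ends up at about 1.21 and "(((bad proportions)))" at about 1.331. Here's a minimal sketch that just reproduces that arithmetic for single comma-separated terms. The function name is made up for illustration, and it deliberately ignores square brackets, escaped brackets and everything else A1111's real parser handles.

```python
import re


def effective_weight(term: str) -> tuple[str, float]:
    """Hypothetical helper: approximate A1111 emphasis weight of ONE prompt term.

    Assumptions: only round brackets wrapping the whole term and an optional
    ':weight' inside the innermost pair are handled; square brackets, escaped
    brackets and mixed terms like "(a) b" are not.
    """
    term = term.strip()
    depth = 0
    # Peel one pair of wrapping parentheses per loop iteration.
    while term.startswith("(") and term.endswith(")"):
        term = term[1:-1].strip()
        depth += 1

    explicit = re.fullmatch(r"(.+?):\s*([0-9]*\.?[0-9]+)", term)
    if depth > 0 and explicit:
        # The innermost pair uses the explicit weight instead of the default 1.1;
        # any extra outer pairs still multiply by 1.1 each.
        term = explicit.group(1).strip()
        weight = float(explicit.group(2)) * 1.1 ** (depth - 1)
    else:
        # Every pair of round brackets multiplies attention by 1.1.
        weight = 1.1 ** depth
    return term, round(weight, 3)


if __name__ == "__main__":
    negative = "(bad anatomy:1.1), (high contrast:1.3), watermark, ((disfigured)), (((bad proportions)))"
    for raw in negative.split(","):
        print(effective_weight(raw))
```

Running it prints one (term, weight) pair per entry, which makes it easier to see how strongly each part of a long negative prompt is actually being pushed.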
Here are a few things I found you need to be aware of when working with SDXL in A1111:
- make sure you have upgraded A1111 to version 1.5.1 (run "git pull" in the install directory)
- I needed to add "--medvram" to my command line arguments (COMMANDLINE_ARGS in webui-user.bat or webui-user.sh); otherwise I'd get out-of-memory errors with my 12 GB of VRAM
- make sure your VAE is set to "Automatic" or to the SDXL VAE (which can be downloaded from Hugging Face). Older VAEs won't work
- LoRAs made for the older models don't work, and you will get errors
- there is a noise-offset LoRA for SDXL (sd_xl_offset_example-lora_1.0) which does work, but I don't see much difference in the images; with the LoRA they are a tiny bit crisper. However, this LoRA doesn't work with the refiner model (you will get errors)
And the biggest one for me:
- don't use arbitrary image dimensions; stick to the ones listed here: https://platform.stability.ai/docs/features/api-parameters
This was the biggest mistake I made initially. With other image sizes I'd get super wonky images and very unsatisfying results. Now I stick to the recommended dimensions, and my images are much, much better.
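If you have a size in mind from your 1.5 workflow, one quick way to pick the closest recommended one is to compare aspect ratios. This is only a sketch: the list below is my copy of the SDXL 1.0 sizes from the Stability page linked above (double-check it there, since it may change), the function name is made up, and it simply picks the entry whose aspect ratio is closest on a log scale. Note that all of these sizes come out at roughly one megapixel.

```python
import math

# Assumption: SDXL 1.0 sizes as listed on the Stability API parameters page
# at the time of writing, as (width, height); verify against the page.
SDXL_SIZES = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
    (1536, 640), (640, 1536),
]


def nearest_sdxl_size(width: int, height: int) -> tuple[int, int]:
    """Hypothetical helper: return the listed size closest in aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    # Comparing log ratios treats 2:1 and 1:2 as equally far from 1:1.
    return min(SDXL_SIZES, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


if __name__ == "__main__":
    for requested in [(512, 512), (1920, 1080), (768, 1024)]:
        print(requested, "->", nearest_sdxl_size(*requested))
```

For example, a 16:9 request like 1920x1080 maps to 1344x768, and a 3:4 portrait like 768x1024 maps to 896x1152.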
A word on the refiner model: so far I don't see big quality improvements when I run the refiner in img2img at about 0.1 to 0.25 denoising strength. I'll experiment more with higher denoising strengths and see what I can get out of it.
Anyway, I think SDXL is a huge improvement, and I'm already getting really exciting results.
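One possible reason low denoising changes so little: as far as I understand A1111's default img2img behaviour, it doesn't run the full step count but only roughly steps × denoising strength (there is a setting to make it do exactly the number of steps on the slider instead). The tiny sketch below just does that arithmetic. The function name is made up, and it's an approximation of the default behaviour, not A1111's actual code.

```python
def approx_img2img_steps(steps: int, denoising_strength: float) -> int:
    """Hypothetical helper: rough number of sampling steps img2img actually runs.

    Assumption: mirrors the default A1111 behaviour of only sampling the last
    `denoising_strength` fraction of the schedule (capped just below 1.0).
    """
    strength = min(max(denoising_strength, 0.0), 0.999)
    return int(strength * steps)


if __name__ == "__main__":
    for strength in (0.1, 0.25, 0.5):
        print(f"30 steps at {strength} denoising -> ~{approx_img2img_steps(30, strength)} steps")
```

So at 0.1 to 0.25 with 30 steps, the refiner only gets a handful of steps (roughly 3 to 7) to work with, which would fit with seeing only small changes.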
Cheers :)
The prompt was just an example; my prompts usually get quite a bit longer than that. But with the 1.5 models I eventually manage to get what I want to see. I also find that throwing in qualifiers like "mesmerizing" does do something to the image, although it can be subtle.
However, my point here was that in SDXL my prompting seems to go nowhere, and I feel I'm not able to get the kind of image I have in my head. Sticking with the prompt example: in SD 1.5, using a custom model like Deliberate 2.0, I'm able to end up with an image of a hat-wearing cat surrounded by surreal-looking candy pops (whatever the final prompt for that ends up being). In SDXL my images "break" (i.e. start looking flat, unrefined or even bizarre) long before I can steer them toward my imagined result. None of my usual approaches, like reducing CFG, re-ordering the prompt or using a variety of qualifiers, seem to work the way I'm used to.
And tbh, I think this is to be expected. These are new models, so we need new tools (prompts) to work with them. I just haven't learned how to do it yet, and I'm asking how others do it :)