Hi,
I'm struggling a bit to get good results with SDXL and I'm wondering if I'm doing something wrong ... I tried A1111 and ComfyUI and have been underwhelmed in both cases. I can get bland, boring-looking images out of it, which seem to be OK from a technical point of view (they seem to be correctly generated, without weird artifacts or anything like that). However, whenever I try to get something more elaborate, my prompting leads nowhere. I can ask for "a cat" and it will generate a picture of a cat. But if I try to get "a cat wearing a wizard hat floating in a mesmerizing galaxy of candy pops" - these kinds of prompts seem to quickly break the final image. I'm not talking about tailored models and LoRAs here, but I seem to be able to do much more interesting stuff with the Deliberate 2.0 model than with SDXL.
So, what's your experience so far? Does the community need to catch up first and do work on custom models, LoRAs, and so on to really get things cooking? Or do I need to learn how to work with XL better?
I was actually looking forward to having a "bland" and hopefully rather unbiased model to work with, where not every prompt desperately tries to become a hot anime girl, but I'm struggling to get interesting images for now.
For reference, I updated my A1111 installation with "git pull" (which seems to have worked, as I now have an SDXL tab in my settings) and downloaded the 1.0 model, refiner and VAE from Hugging Face. I can generate txt2img in A1111 with the base model, however I can't seem to get img2img with the refiner model to work ...
On ComfyUI I found a premade workflow that runs the base model first and then the refiner on the resulting latent. It seems to work just fine technically, but it also seems to require a different approach to prompting than I'm used to.
Are you entering your prompts exactly like you mentioned in your post? I've never had success with full sentences like that, even with SD 1.5.
I usually break things down and separate by comma. It seems to give better results.
Like: "a cat, wearing a wizard hat, floating in a galaxy, candy pop planets"
I think words like mesmerizing don't do much without you being more specific about what you want to see. Also, if that still doesn't get me the result I want, I start adding in more similar phrases, like "a cat wearing a hat", etc.
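To make the comma-separated approach concrete, here is a rough sketch of how I'd assemble such a prompt from short phrases. The helper name build_prompt and its reinforce argument are made up for illustration; they are not part of A1111 or ComfyUI, and the output is just a string you would paste into the prompt box.

```python
def build_prompt(phrases, reinforce=()):
    """Join short phrases into one comma-separated prompt.

    Hypothetical helper, not part of any UI: `phrases` are the main
    concepts, `reinforce` are extra similar phrases appended at the end
    when the first attempt does not give the desired result.
    """
    parts = []
    for phrase in list(phrases) + list(reinforce):
        # Trim whitespace and stray commas so the join stays clean.
        cleaned = phrase.strip().strip(",").strip()
        # Skip empty entries and exact duplicates, keeping the original order.
        if cleaned and cleaned not in parts:
            parts.append(cleaned)
    return ", ".join(parts)


base = ["a cat", "wearing a wizard hat", "floating in a galaxy", "candy pop planets"]
print(build_prompt(base))
print(build_prompt(base, reinforce=["a cat wearing a hat"]))
```

The second call shows the "add more similar phrases" step: the reinforcing phrase is simply appended to the end of the same prompt.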
As far as the refiner goes, I haven't found I needed it much. The base model produces nice enough output on its own. But to use it, you would just switch to it in your model selection, send the base image to img2img, use the same prompt, and set a denoising strength of around 0.25, leaving the resolution the same as the original image.
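For a feel of what a denoising strength of 0.25 means in practice, here is a small back-of-the-envelope sketch. It assumes the img2img pass scales the number of sampling steps by the denoising strength, which is how common implementations behave by default as far as I know; your UI may have a setting that changes this. The function name refiner_steps is hypothetical.

```python
def refiner_steps(sampling_steps: int, denoising_strength: float) -> int:
    """Estimate how many steps an img2img pass actually runs.

    Assumption: the pass runs roughly sampling_steps * denoising_strength
    steps (rounded down), so a low strength only lightly reworks the image.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return min(int(sampling_steps * denoising_strength), sampling_steps)


for steps in (20, 30, 40):
    print(steps, "steps at 0.25 strength ->", refiner_steps(steps, 0.25), "refiner steps")
```

Under that assumption the refiner only touches the last quarter or so of the denoising, which is why it polishes details rather than changing the composition.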
The prompt was just an example and usually my prompts get quite a bit longer than that. But with 1.5 models I manage to get what I want to see eventually. I also find that throwing in qualifiers like "mesmerizing" does do something to the image, although it can be subtle.
However, what I wanted to say here was that in SDXL my prompting seems to go nowhere and I feel I'm not able to get out the kind of image I have in my head. Sticking with the example prompt: in SD 1.5, using a custom model like Deliberate 2.0, I'm able to end up with an image of a hat-wearing cat surrounded by surreal-looking candy pops (whatever the final prompt for that ends up being). In SDXL my images "break" (i.e. start looking flat, unrefined or even bizarre) at some point long before I can direct them towards my imagined result. All my usual approaches, like reducing CFG, re-ordering prompts, or using a variety of qualifiers, don't seem to work like I'm used to.
And tbh, I think this is to be expected. These are new models, so we need new tools (prompts) to work with them. I just haven't learned how to do it yet and I'm asking how others do it :)