From ‘Houdini’ to ‘Horrendous’
What happens when you ask an AI to fix one small thing.
I was working on a visual design project the other day and wanted to pair some of my poetry with visual imagery. I can’t draw for shit, so I figured it would be fun to explore what I could achieve with AI tools. I started with what I know best: Claude and ChatGPT. It took me exactly one attempt to identify the better tool for the task at hand:
As I leaned into it, I was impressed with what ChatGPT could conjure from a blank page. That first iteration genuinely feels like magic… until you try to change >1 thing. I burned through my image credits churning out a dozen “not quite right” versions of the same picture - each new attempt occasionally fixing the thing and occasionally breaking something else.
So I went looking for other tools and burned through their free credits just as fast. The outcome was similar - AI is great at conjuring from scratch, terrible at editing an existing image. Interpretation, ideation, atmosphere, mood - genuinely impressive… But the other half - iteration, refinement, fine detail, holding a fix in place while you work on the next one - is maddening.
I decided to pick another visual use case and see if I ran into the same issues. This time though, I would push AI and artistic sensibilities to the limit... This time, we would create... A DOG IN A ROCK BAND.
WHY IT LOOPS
Not all of these tools actually edit - a lot of them just regenerate. When you ask for a change, the model doesn’t reach into your existing image and adjust it. It redraws the whole thing from scratch, guided by your new words, with no memory of the last version. That’s why fixing the drum moved the drummer, and fixing the drummer turned the amps around.
There’s a split between looking and reasoning. These models are phenomenal at look - light, texture, vibe - and weak at logic: counting, geometry, left versus right. “Put the pedal in front of the foot” sounds trivial, but it’s a spatial-reasoning task, and spatial reasoning is exactly what the visual models are worst at. You’re asking a brilliant painter to also be a draughtsman, and those turn out to be different jobs.
The good news: this is being actively fixed. A newer class of tools - native multimodal models, in-context editors - actually condition on the image that’s already there instead of starting over. The shift is from redraw to edit, and it’s the difference between arguing with a slot machine (guilty) and working with an assistant.
CLAUDE TO THE RESCUE
It was at this point I decided to use Claude to help me diagnose where I was going wrong. The sharpest thing Claude pointed out: most drum kits in the training data are photographed from the front - that’s how you shoot a drummer. So when I kept demanding a specific rear three-quarter angle with the logo facing the crowd, I wasn’t fighting my own phrasing. I was fighting the model’s entire sense of what a drum kit is. No sentence was ever going to talk it out of millions of front-facing reference images.
That reframes the whole problem. The question stops being “what’s the magic wording?” and becomes “am I hitting a phrasing wall, or a capability wall?” A phrasing wall you can prompt your way around. A capability wall you can’t - and the longer you bang your head against it, the more credits (and sanity) you burn. So I stopped banging. I decided to go with the grain and shift to a perspective the model had likely seen a million times before. That small compromise made the exercise immediately easier.
create a picture of a dog performing in a rock band.
1. Pain and/or suffering.
2. As above.
A golden retriever fronting a punk band on a small club stage, shot from a three-quarter angle behind and to the right of the band, cheering crowd beyond. Stage left: a scruffy terrier drummer seated behind a full kit, the kick drum’s front head printed “THE BARKS” angled toward the crowd, pedal and the drummer’s foot on the near side. Stage right: a beagle bassist in a denim vest. Overhead banner: “THE BARKS – LIVE!”. Moody concert lighting, haze, cinematic, photographic.
SOME TIPS TO HELP WITH YOUR NEXT VISUAL PROJECT
So, having burned an afternoon and an embarrassing number of credits, here’s what I’d actually do next time:
Lead with one rich prompt, then re-roll clean. Put the effort into the opening description rather than trying to surgically repair a flawed image. Starting over is usually faster than fixing.
Give spatial information spatially. If position or geometry matters, don’t describe it in another paragraph - show it. A rough sketch, a reference image, a mask. Words are a terrible medium for “over there, slightly to the left.”
After about two failed corrections, stop. Two strikes and you’re probably looking at a wall, not a wording problem. Don’t keep rephrasing - step back and diagnose. And use another AI to help you do it: asking Claude why the image tool kept failing was the single most useful move I made all day.
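If you ever drive an image tool from a script rather than a chat window, the re-roll and two-strikes tips above could be sketched roughly like this. It is a minimal sketch, not a real integration: `generate` and `looks_right` are hypothetical stand-ins for whatever image tool you use and for your own judgement of the result, and the limit of two is just my rule of thumb.

```python
from typing import Callable, Optional

# Rule of thumb from the tips above: after two failed corrections, stop.
MAX_FAILED_CORRECTIONS = 2


def iterate_on_image(
    prompt: str,
    corrections: list[str],
    generate: Callable[[str], str],      # hypothetical: prompt in, image (or image ID) out
    looks_right: Callable[[str], bool],  # hypothetical: your own "is this it?" check
) -> tuple[Optional[str], str]:
    """Lead with one rich prompt, re-roll clean, and give up after two strikes."""
    image = generate(prompt)
    if looks_right(image):
        return image, "done"

    failed = 0
    for correction in corrections:
        if failed >= MAX_FAILED_CORRECTIONS:
            break
        # Fold the fix into the full description and regenerate from scratch,
        # rather than asking the tool to patch the previous image.
        prompt = f"{prompt} {correction}"
        image = generate(prompt)
        if looks_right(image):
            return image, "done"
        failed += 1

    # Probably a capability wall, not a wording problem.
    return None, "stop and diagnose"
```

The point of returning "stop and diagnose" instead of looping forever is the same as in the tips: the script hands the problem back to you (or to a second AI) before the credits run out.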


