Diffusion not following your prompt? We fixed that in ComfyUI

Marei
12 min read · Apr 25, 2024

In the world of AI-generated imagery, the barrier between the big boys and us lowly plebs was the ability to make complex hi-res visuals that closely follow natural language prompts and their intricacies. I’m glad to share that that time is OVER, ladies and gentlemen.

It’s Over Ladies & Gentlemen

Thanks to significant strides in open-source technology, tools once accessible only to the big boys like OpenAI and Midjourney are now available to enthusiasts and creators in their own homes. Below I share a few visuals, explain how we got here and how we solved it, and provide a ComfyUI workflow for you (free of charge).

Shoutouts / Credits

Jukka Seppänen (KIJAI): A Trailblazer in AI Image Generation

A special shoutout goes to Jukka Seppänen, also known as Kijai, who has been instrumental in integrating sophisticated AI models to enhance image generation. Kijai’s development of a user-friendly wrapper for ELLA (Efficient Large Language Model Adapter) has significantly simplified the integration of advanced text-to-image capabilities into existing workflows. His contributions have enabled artists and developers to harness the power of complex AI tools more easily, turning intricate textual prompts into stunning visual creations. A few hours ago he added HiDiffusion to his ELLA wrapper, and things changed for me.

https://github.com/kijai

The ELLA Team at Tencent

The team behind ELLA at Tencent, including Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, and Gang Yu, has introduced a groundbreaking approach to improve text alignment in diffusion models. Their work on ELLA has paved the way for more precise interpretations of dense prompts, enabling the generation of images that more closely align with the creators’ visions. By designing the Timestep-Aware Semantic Connector (TSC), they have enhanced the model’s ability to handle detailed attributes, complex relationships, and extended narratives within visual content.

https://huggingface.co/QQGYLab/ELLA

MEGVII Technology’s HiDiffusion

The team at MEGVII Technology, led by Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, and Jiajun Liang, deserves recognition for their development of HiDiffusion. This model enhances the resolution and efficiency of pretrained diffusion models, enabling the creation of higher-resolution images without compromising speed. Their innovation allows creators to produce visuals with stunning detail and clarity, making high-resolution creative endeavors more accessible and enjoyable.

The OSS community in the world of image generation is large, and many more could be credited. I will keep it to these three for this article.

NOW ON TO BUSINESS (Some Examples)

Please note: I did not cherry-pick the results below. These are from the first 10 images I made using this workflow.

1 - “A Medieval Interlude in the Future”

Here’s the Full Prompt

Title: “A Medieval Interlude in the Future”

**Subject Details:**
- **Character**: Mr. Spock from Star Trek
- **Age**: Appears mid-30s
- **Gender**: Male
- **Features**: Mr. Spock’s distinctive features include pointed Vulcan ears, raised arched eyebrows, and a composed, serious facial expression. His hair is neatly styled in his typical short, cropped manner.

**Costume and Accessories**:
- **Attire**: Tailored by the renowned Joe Casely-Hayford, Mr. Spock is dressed in a luxurious, medieval-inspired theater costume blending elements of sci-fi and historical garb. The costume includes a long, flowing robe with intricate embroidery reminiscent of the Middle Ages, combined with subtle, futuristic silver accents that hint at his Starfleet origins. The fabrics are an interplay of velvet and lightweight metallic fibers.
- **Accessories**: Around his neck hangs a small, ornamental 3D-printed pendant, a fusion of his Vulcan heritage and the symbolic icons from Game of Thrones.

**Scene Description**:
- **Setting**: Inside Cologne Cathedral, a grand and magnificent Gothic structure renowned for its awe-inspiring architecture. The scene is set in a dimly lit area near the high

2 - “Whimsical Winter Wonderland in Sequoia National Park meets Future Metropolis”

Here’s the Full Prompt

Title: Whimsical Winter Wonderland in Sequoia National Park meets Future Metropolis

Scene Description:
Imagine a sprawling digital canvas where nature and futurism blend seamlessly. The setting is Sequoia National Park, stylized in a whimsical, Loish-inspired manner with exaggerated, glowing giant sequoias piercing through a gentle snowy landscape. Each tree radiates an ethereal, angelic glow, casting soft lights on the snow below, creating a warm, inviting ambiance amidst the cold.

In the heart of this enchanted forest, a futuristic cityscape emerges, juxtaposing the natural elements with sleek, shimmering skyscrapers. These buildings boast a luminous, almost transparent quality with vibrant neon accents, reflecting advanced, sustainable technologies melding with nature. High above, eco-friendly drones softly hum while they flutter like fireflies, illuminating the scene and interacting playfully with the environment.

Character Design:
The central figure of this artwork is a young woman, about 25 years old, reflecting a blend of the mystique of the natural world and the innovation of futurism. She has long, flowing hair that subtly shifts in color from a deep forest green to a snowy white, mirroring her surroundings. Her eyes sparkle with a vivid, unnatural shade of

3 - “Echoes of Ancestral Bonds”

Here’s the Full Prompt

In the realms of art and imagination, let’s manifest a scene that infuses the modern touch of realism with the vibrant, symbolist essence of Post-Impressionism, inspired by the legendary works of Paul Gauguin. This work, titled “Echoes of Ancestral Bonds,” will dive deeply into the profound connection between heritage and the evolving identity amidst the urban sprawl. We’ll craft a canvas that speaks to the soul, bridging time and tradition through the lens of a bustling cityscape.

**Scene Overview:**
Under the spellbinding sunset of a contemporary metropolis, our canvas comes to life at the intersection of a crowded urban park and the looming silhouette of the city’s skyline. The golden hour bestows a tranquil, yet contrasting warmth on the cold, hard concrete, and steel giants reaching for the heavens.

**The Subjects:**
At the heart of our scene, capturing the spirit of “Celtic (Grandma and Daughter:1.1)”, we find our protagonists — a grandmother and her daughter.

- **Grandmother (Siobhan):** In her early 70s, she exudes a strength carved from years of wisdom. Her silver hair is tied back, showcasing the proud, yet gentle features of a face marked with the map of a life fully embraced. Her eyes, a vibrant green, reflect a soul touched by stories of old. She wears a simple, emerald-green shawl over a traditional black dress, a nod to her Celtic heritage. A silver pendant with intricate Celtic knotwork rests against her heart.

- **Daughter (Aisling):** At 35, Aisling is the embodiment of youthful curiosity and the unbridled joy of discovery. Her auburn hair, wild and untamed, cascades in waves, mirroring her grandmothers’ in color but alight with the promise of tomorrow. She wears a modern, playful dress adorned with a digital print of Gauguin’s Tahitian landscapes, seamlessly weaving the vibrancy of her ancestor’s tales with her own urban canvas.

**The Scene:**
Amidst the roar of the city, they find an oasis of quiet. The park bench they share is a bridge across generations. Before them, an open book of Celtic folklore rests, a physical manifestation of the stories shared. Around them, the world moves — a tapestry of diverse faces, each a story unto itself. In the background, the city’s architecture pays homage to the past’s primitivism and the present’s innovation.

4 - “Dawn of the Cyborg Metropolis”

Here’s the Full Prompt

Title: “Dawn of the Cyborg Metropolis”

Subject: The focus of the painting is a young woman, around 25 years old, standing prominently in the foreground amidst an advanced urban landscape. She is of mixed heritage, with striking amber eyes that seem to reflect the city’s glow, and her hair is a cascade of dark curls, partially tied up to reveal cybernetic enhancements running along her neck and disappearing into her collar. She wears a smart, form-fitting jacket that integrates seamlessly with her cybernetic arm — a marvel of engineering, sleek with an almost organic integration into her flesh.

Scene Description:
The scene is set at the cusp of dawn, with the first hints of sunlight piercing through the towering skyscrapers of a futuristic metropolis. The city is a blend of advanced technology and decaying structure, showcasing a stark contrast between the new world and the remnants of the old.

Foreground: The young woman stands on a balcony of a high-rise building, overlooking the sprawling city below. Her gaze is intense, focused, reflecting a determination and a hint of melancholy. Her cybernetic arm is slightly raised, interfacing with a holographic display that projects from her wrist, showing complex schematics and data streams.

Midground: Below her, the streets are alive with activity. Autonomous vehicles glide seamlessly alongside human-operated machines, while pedestrians, both human and android, navigate the sidewalks in a choreographed dance of coexistence. Street vendors offer exotic, synthesized foods, and neon signs flicker with advertisements for the latest augmentations.

Background: The skyline is a mixture of ultra-modern skyscrapers illuminated by the rising sun and older buildings that bear the history of the city’s past. Gigantic holograms float above some of the structures, advertising various corporations and entertainment venues. In the distance, a massive, partially constructed tower looms over the city, symbolizing the unending push towards progress.

Additional Details: To add depth to your painting, incorporate subtle elements of daily life in the metropolis. A group of children, both human and cybernetically enhanced, playing with an anti-gravity ball, their laughter a reminder of innocence amidst the complex urban backdrop. A lone, older street musician playing a traditional instrument, his eyes telling stories of a world before the cybernetic revolution. Lastly, let the changing sky reflect the transition, with colors moving from the cool pre-dawn blues to the warm palette of sunrise, encapsulating the city in a moment of serene beauty

5 - “Morning Solace in the City”

Here’s the Full Prompt

Title: “Morning Solace in the City”

Scene Description:
The setting is a bustling New York City morning, where the early sun casts a soft, golden hue over the streets, buildings, and the people starting their day. The location is a charming sidewalk café nestled among towering skyscrapers, offering a serene oasis amid the urban rush. The café’s exterior is adorned with greenery and has a vintage, welcoming vibe, contrasting with the modern cityscape.

Subject Description:
The focal point of the scene is a redheaded lady, in her early 30s, sitting alone at a wrought-iron table outside the café. She possesses an air of quiet confidence and contemplation. Her hair, a vivid shade of auburn, cascades over her shoulders and is gently tousled by the morning breeze. She has fair skin, sprinkled with freckles, and her facial features are soft yet striking, with expressive green eyes that reflect a depth of thought.

She is dressed in a chic, yet comfortable style — a tailored emerald green blazer that complements her hair, paired with a simple white blouse and black, high-waisted trousers. A small, stylish pendant necklace and minimal makeup accentuate her natural beauty. Her posture is relaxed, and she holds a porcelain cup of coffee in one hand, the steam swirling up into the cool morning air.

Scene Dynamics:
The lady appears absorbed in her own world, gazing out at the city with a contemplative look, as if she’s both a part of the city and momentarily detached from it. Around her, the city wakes up — pedestrians walk by, some in a hurry, others taking their time, absorbed in their morning routines. A soft murmur of conversations, the clinking of cups, and distant city sounds fill the air, creating a lively yet soothing ambiance.

In the background, a yellow NYC taxi stops at the light, adding a pop of color and a sense of movement to the scene. A street vendor is setting up for the day nearby, and a small group of pigeons pecks at crumbs on the sidewalk, adding a touch of whimsy.

To her right, a small, well-cared-for potted plant sits on the edge of the café’s outdoor setup, its green leaves vibrant against the urban backdrop. To her left, an open book lies on the table, suggesting she’s been reading, but it now rests forgotten as she takes a moment to simply enjoy her coffee and the scenery.

6 - “Reign of Fire: The King’s Covenant”

Here’s the Full Prompt

Title: “Reign of Fire: The King’s Covenant”

Scene Description:
In the grand hall of a magnificent castle, carved from the purest white marble, the air is heavy with anticipation. The vast space is dimly lit, creating an atmosphere of mystery and power. The only sources of light are the numerous candles placed meticulously around the hall, their flames dancing gently, casting long, flickering shadows on the walls and the polished marble floor. These candles are not ordinary; they are encased in elaborate holders made of gold and silver, adorned with precious stones, each telling a story of past victories and honors.

At the center of the hall stands a majestic statue, a masterpiece of craftsmanship. It depicts the first king of the land, his expression stern yet wise, a reminder of the legacy that weighs on the current king’s shoulders. The statue is positioned in such a way that it seems to watch over the proceedings, its shadow a constant presence.

The focus of the scene is a large, ornate fireplace at the far end of the hall. The fire within it roars, its flames reaching high, as if trying to escape their confines. The fire’s glow bathes the room in a warm, orange hue, contrasting with the cool marble and the darkness outside. This fire is a symbol of the king’s power — unyielding and all-consuming.

In front of the fireplace stands the king, a figure of authority and strength. He is in his late 40s, his face marked by the burdens of leadership and the scars of countless battles. His hair, once a vibrant black, is now streaked with silver, but his eyes still burn with the fire of youth. He is dressed in regal attire, a cloak of deep red draped over his shoulders, the fabric rich and heavy. On his head rests a crown, simple yet elegant, a testament to his royal status.

Flanking the king are his most trusted warriors, a diverse crew known across the lands for their valor and loyalty. They are an imposing sight, each bearing the distinctive armor and weapons of their order. Among them are a towering figure clad in armor that seems to absorb the light, a lithe warrior whose presence is almost ethereal, and a seasoned strategist, his eyes scanning the room, always thinking, always planning.

SO WHAT IS GOING ON HERE?

How did we accomplish this level of detail? In simple terms:

1- Using SD 1.5 with HiDiffusion and ELLA allows us to generate 1536 x 1536 visuals in one shot.

2- The details when using ELLA and SD 1.5 are lacking, so we send the result over to SDXL for two rounds of post-processing:

2a- Hi-Res Fix of the Result

2b- Advanced Tiled Upscale Using SDXL
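
To give a feel for what the tiled upscale step involves, here is a minimal Python sketch of how a large image can be split into overlapping tiles, so that each tile fits a size SDXL handles comfortably. It is not the logic of the ComfyUI nodes in the workflow. The function names, the 1024 px tile size, the 128 px overlap and the 2x upscale factor are all assumptions picked for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so tiles cover `length` pixels.

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    pushed flush against the far edge, so it may overlap more than that.
    """
    if tile >= length:
        return [0]  # one tile already covers the whole axis
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128) -> List[Box]:
    """Row-major list of tile boxes covering a width x height image."""
    xs = tile_starts(width, tile, overlap)
    ys = tile_starts(height, tile, overlap)
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in ys
        for x in xs
    ]


if __name__ == "__main__":
    base = 1536          # size of the SD 1.5 + HiDiffusion + ELLA output (from the article)
    factor = 2           # assumed upscale factor, not taken from the workflow
    size = base * factor
    boxes = tile_boxes(size, size)
    print(f"{size} x {size} image -> {len(boxes)} tiles")
```

The overlap matters because each tile is refined independently: blending the shared border pixels is what hides the seams when the tiles are stitched back together.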

SO WHERE IS THE WORKFLOW DAWG?

Prerequisites (I assume you already have ComfyUI know-how)

Here you go:

https://drive.google.com/file/d/10oQ-zDPN8kcFtHATjJIqzvnKMfEPeDMN/view?usp=sharing

Please subscribe, and I will try and get more of these out.

Omar Marei
