NEW IPAdapter, Font Generator, AuraFlow UPDATES | This Week In AI Art 🎨
Cut through the noise, stay informed — new stories every Sunday.
Aannddd we’re changing the color template again. Sorry, not sorry. I initially thought I could make a neon, cyberpunk-esque newsletter work, but it lacked a certain je ne sais quoi; maybe I’ll try again in the future. In related but unrelated news, I finally got around to finishing the landing page for the newsletter. I went with a Shonen Jump theme; if you know, you know. Check it out, and give me a rating out of 5 below ↓ (remember that I have your email :)).
(ノ◕ヮ◕)ノ*:・゚✧
Here are the topics we’ll be reviewing this week:
🎭 AuraFlow goes head-to-head with DALL-E and Stable Diffusion 3
🛠️ reForge, a faster, leaner fork of Stable Diffusion WebUI
🖌️ Kwai-Kolors’ IP-Adapter-Plus lands in ComfyUI
🔠 414design’s LoRA that generates entire fonts
📡 On your radar: ControlNet++, seamless textures, realistic upscaling, UltraPixel, and PPM nodes
So grab your favorite drink, get comfy, and let’s get caught up (and ideally, inspired too).
🎭 AuraFlow: Promising New Text-to-Image Model Rivals DALL-E and Stable Diffusion 3
Remember that text-to-image model released last week that went by the name of AuraFlow? Its headline trait was prompt adherence. Well, a Reddit user by the name of MarcS- decided to put AuraFlow through its paces, comparing it to other leading models like DALL-E and Stable Diffusion 3 (SD3). Their findings shed some interesting light on how this new model stacks up in real-world use.
MarcS- reused a series of prompts they had previously tested with DALL-E and SD3, allowing for a direct comparison. The results? In many cases, AuraFlow showed impressive performance, sometimes even surpassing its more established competitors. Let's dive into some of the key takeaways from this comprehensive test:
Complex Scenes: AuraFlow excelled at creating detailed, complex scenes like the "Garden Dome in a space station orbiting Uranus" prompt.
Prompt: “The breathtaking view of the Garden Dome in a space station orbiting Uranus, with passengers sitting and having coffee.”
Fantasy Elements: It performed well with fantasy-themed prompts, such as the D&D adventurers and the Gothic manor scene.
Prompt: “A trio of typical D&D adventurer are looking through the bushes at a forest clearing in which a gothic manor is standing. In the night sky, three moons can be seen, the large green one, the small red one and the white one.”
However, the comparison also revealed some limitations. AuraFlow struggled with accurately depicting specific historical contexts, such as Soviet-era scenes. It also showed inconsistencies in rendering human anatomy and proportions, a common challenge in AI image generation. Additionally, the model had difficulties accurately representing certain animals and mythical creatures.
Prompt: “A woman wearing a 18th century attire, on all four, facing the viewer, on a table in a pirate tavern.”
It's worth noting that AuraFlow is still in its early stages (version 0.1), suggesting significant potential for growth and refinement. This early performance has generated excitement in the AI art community, with many eager to see how the model develops. As an open-source project, AuraFlow's evolution could be accelerated by community contributions, potentially positioning it as a strong competitor in the text-to-image landscape.
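Since the weights are open, you can already kick the tires yourself. Here’s a minimal sketch using the AuraFlowPipeline from Hugging Face’s diffusers library with the fal/AuraFlow v0.1 checkpoint; the sampler settings are illustrative defaults, not tuned recommendations.

```python
import torch
from diffusers import AuraFlowPipeline

# Load the open AuraFlow v0.1 weights in half precision.
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow", torch_dtype=torch.float16
).to("cuda")

# One of MarcS-'s test prompts from the comparison above.
image = pipe(
    prompt=("The breathtaking view of the Garden Dome in a space station "
            "orbiting Uranus, with passengers sitting and having coffee."),
    width=1024,
    height=1024,
    num_inference_steps=50,  # illustrative; tune to taste
    guidance_scale=3.5,
).images[0]
image.save("garden_dome.png")
```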
🛠️ reForge: Enhancing Stable Diffusion WebUI with Improved Performance and Features
If you're familiar with Stable Diffusion WebUI and looking for an advanced tool that builds on its foundation while offering improved performance and new features, then you'll want to hear about reForge.
reForge is a fork of the original Stable Diffusion WebUI Forge project, which itself was built on top of the popular AUTOMATIC1111 (A1111) Stable Diffusion WebUI. The goal of reForge is to make development easier, optimize resource management, speed up inference, and explore experimental features while maintaining compatibility with the A1111 ecosystem.
Recent updates to reForge include:
Improved Performance: reForge boasts faster inference times compared to both A1111 and the original Forge, especially when working with large LoRA files.
Better VRAM Management: It can run SDXL models with as little as 4GB VRAM and SD1.5 models with just 2GB VRAM, making it more accessible for users with lower-end hardware.
New Features: reForge has implemented several new capabilities, including:
Scheduler selection
DoRA support
Soft inpainting
Multiple checkpoints loaded simultaneously
New samplers and optimizations
Compatibility: reForge maintains compatibility with most A1111 extensions, making it a drop-in replacement for many users.
Open-Source Development: The project is actively encouraging community contributions, with a focus on implementing new models and features.
For developers, reForge introduces a patchable UNet, making it significantly easier to create extensions and implement new features like FreeU, Stable Video Diffusion, and custom ControlNet implementations.
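To make that concrete, here’s a rough sketch of what an extension built on a patchable UNet might look like. The method names follow the ComfyUI-style ModelPatcher API that Forge and reForge build on, and the FreeU-flavored scaling factors are hypothetical; treat this as an illustration of the pattern, not reForge’s actual code.

```python
# Sketch of a FreeU-style patch against a ComfyUI/Forge-style patchable UNet.

def output_block_patch(h, hsp, transformer_options):
    # h: backbone features, hsp: skip-connection features.
    # FreeU-style patches boost the backbone and damp the skips;
    # 1.1 and 0.9 are hypothetical gains for illustration.
    return h * 1.1, hsp * 0.9

def apply_my_patch(unet_patcher):
    # Clone first so the base model stays untouched for other workflows.
    patched = unet_patcher.clone()
    patched.set_model_output_block_patch(output_block_patch)
    return patched
```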
While reForge offers exciting new possibilities, keep in mind that it's still in active development. Some features from the latest A1111 builds may not yet be implemented, and certain complex extensions might require updates to work properly.
🖌️ Kwai-Kolors Introduces IP-Adapter-Plus for Realistic Style Transfer in ComfyUI
A new IP-Adapter! If you’ve been using ComfyUI long enough, then you’ve definitely used an IP-Adapter. For the uninitiated, an IP-Adapter lets you condition generation on a reference image instead of (or alongside) text, acting as the ‘style transfer’ part of the pipeline: think applying a leather texture to an apple, for example (has that been done before?).
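If you’d rather see the idea in code, here’s a minimal sketch using the generic IP-Adapter support in diffusers on an SD 1.5 base. This illustrates the general mechanism rather than the Kolors-specific node, and the image file names are hypothetical.

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

# Load a base pipeline, then attach an IP-Adapter so a reference image
# can steer the style of the generation alongside the text prompt.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.7)  # how strongly the reference image conditions the result

style_image = load_image("leather_texture.png")  # hypothetical reference image
image = pipe(prompt="an apple on a table", ip_adapter_image=style_image).images[0]
image.save("leather_apple.png")
```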
Kwai-Kolors has released their IP-Adapter-Plus, designed specifically for their Kolors base model. This new tool is now available for ComfyUI users, bringing some exciting features:
Easy Integration: The models are available on Hugging Face, and there's a dedicated ComfyUI node with installation instructions and workflow examples on GitHub.
Improved Realism: Early users report highly realistic results without needing additional LoRAs, making it a powerful tool for style transfer and detailed image generation.
Open Source: The project is open-sourced under the Apache-2.0 license, encouraging community involvement.
However, users should be aware of a few considerations:
VRAM Requirements: The model can be demanding on graphics memory, which might be challenging for lower-end GPUs.
Current Limitations: As of now, it doesn't support LoRAs or ControlNet, which some users might miss.
Despite these factors, the Kwai-Kolors IP-Adapter-Plus represents a significant step forward in image generation capabilities, especially for achieving realistic results with less fine-tuning.
🔠 Generate Unique Fonts with 414design's LoRA: From Concept to Usable Font in Minutes
Here is an incredibly useful tool released by user 414design: a LoRA that allows users to generate entire, consistent fonts. It was trained on SD 1.5 weights, so use an SD 1.5 base for optimal results, and prompt the model with typographic terms like "serif," "italic," "pixel," etc., to guide the font generation process. The LoRA's flexibility allows for experimental and creative font designs, especially when combined with img2img techniques.
The creator has developed a post-processing script that enables easy digitization using Calligraphr, streamlining the workflow. With this system, you can go from concept to usable font in under 15 minutes!
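As a rough sketch of the generation half of that workflow, here’s how attaching a font LoRA to an SD 1.5 base might look in diffusers. The LoRA filename and the exact prompt keywords are illustrative; substitute the file you download and the typographic terms the creator recommends.

```python
import torch
from diffusers import StableDiffusionPipeline

# The LoRA was trained on SD 1.5, so start from an SD 1.5 checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/lora", weight_name="font_lora.safetensors")  # hypothetical file

# Steer the design with typographic keywords.
image = pipe(
    prompt="alphabet sheet, serif, italic, black letters on white background",
    num_inference_steps=30,
).images[0]
image.save("font_sheet.png")  # then digitize, e.g. via Calligraphr
```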
📡 Put This On Your Radar…
ControlNet++: All-in-One Image Editing
ControlNet++ is a versatile all-in-one model for AI image generation and editing. It combines 10+ control types into a single model, enabling efficient image manipulation with multiple condition inputs. ControlNet++ produces high-quality, high-resolution images and works with 8GB VRAM.
Seamless Textures with ComfyUI, Deep Bump, and Maya
Create AI-generated seamless textures and associated shader maps using ComfyUI, Deep Bump, and Maya. This method allows rapid generation of tileable textures from text prompts and produces multiple shader passes for realistic 3D rendering. Integrate these AI-generated assets into your professional 3D workflows.
Realistic Upscaling with Tile ControlNet and Tiled Diffusion
Combine Tile ControlNet and Tiled Diffusion for high-quality image upscaling while maintaining realism. This workflow is particularly effective on real photos and realistic AI-generated images. It works well with limited VRAM and reduces hallucinations compared to other methods.
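Here’s a minimal sketch of the Tile ControlNet half of this recipe in diffusers; the Tiled Diffusion half, which splits sampling into tiles to save VRAM, is extension-specific and omitted. The model choice and strength value are assumptions, not the original poster’s exact settings.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# Tile ControlNet keeps the output faithful to the source image,
# which is what suppresses hallucinated detail during upscaling.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("photo.png")  # hypothetical input image
upscaled = source.resize((source.width * 2, source.height * 2))

result = pipe(
    prompt="best quality, detailed photo",
    image=upscaled,          # img2img input at the target resolution
    control_image=upscaled,  # tile ControlNet conditions on the same image
    strength=0.5,            # low strength preserves the original content
).images[0]
result.save("photo_2x.png")
```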
High-Res Image Generation with UltraPixel
UltraPixel is a new tool for generating high-resolution images (2K-4K) using Stable Cascade weights. It produces detailed images without upscaling and retains Stable Cascade's artistic style. The ComfyUI-UltraPixel node is available on GitHub, but it requires significant computational power and time to generate results.
GitHub repository: https://github.com/2kpr/ComfyUI-UltraPixel
PPM: Useful Nodes for ComfyUI
PPM is a set of custom nodes for ComfyUI, created by Pamparamm. These nodes offer features like multi-subject attention coupling, guidance limiting, and negative weighting in positive prompts. Download them from the GitHub repository and add them to your ComfyUI custom nodes folder to incorporate them into your workflows.