
SD 3.5 Medium, AI Fashion Models, Open Source Debate | This Week in AI Art 👗

Cut through the noise, stay informed — new stories every Sunday.

Interesting find of the week. Take a look at this indie game dev using AI to generate his assets.


AI MODELS ENTER FASHION INDUSTRY

Fashion retailer Mango has begun implementing AI-generated models in its advertising campaigns, marking a significant shift in the fashion industry's approach to marketing. The company first featured AI-generated models in July 2023, shortly before reporting its highest revenues in four decades. This move is part of a broader industry trend, with other major brands like Nike, Louis Vuitton, and Levi Strauss & Co. also exploring AI-generated content for their advertising.

"It's about faster content creation," explains Mango CEO Toni Ruiz, highlighting the primary motivation behind this transition: increased efficiency and cost reduction. While human models typically charge around $35 per hour, AI model generation services can cost as little as $29 per month. The technology allows for rapid content creation and enables brands to easily customize model appearances to target specific demographics or markets.

This shift extends beyond replacing models: it affects entire creative teams, including photographers, art directors, retouchers, stylists, and makeup artists. Mango's in-house engineering team uses machine learning to create "cohesive mood boards" and trains the AI on specific outfit images to generate the desired aesthetic. As Jordi Álex Moreno, Mango's chief information technology officer, explains, "It is an excellent example of teamwork between human handcrafted intelligence and digital intelligence."

Despite the move toward AI, Mango has stated plans to double its workforce and expand its physical retail presence, aiming to open 42 stores by the end of 2024 and nearly 70 by 2025. The company frames AI implementation as a complement to human creativity rather than a replacement. As Moreno puts it, "AI should serve as a co-pilot to amplify our employees' capabilities and creativity, speeding up repetitive tasks so teams can spend more time on value-added work."

DEFINE: ‘OPEN-SOURCE’ AI

The Open Source Initiative has sparked debate in the AI community by establishing strict new criteria for what constitutes "open-source" AI. Under these guidelines, AI models must provide complete transparency about their training data, including detailed information about data sources, selection processes, and filtering methodologies.

This decision directly challenges major tech players like Meta, whose Llama model has been widely promoted as open-source. Meta spokesperson Faith Eischen responded to the guidelines, stating that "there is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today's rapidly advancing AI models."

The new definition has divided the community. Some experts argue for more precise terminology, suggesting "open weights" as a more accurate label for current AI models that share their parameters but not their training data. As one developer noted, "Models right now are basically freeware, not open source."

A key concern is the practical impossibility of full compliance. Companies face significant legal risks in disclosing training data, particularly regarding copyright issues. The New York Times reported that Meta had internally acknowledged the presence of copyrighted content in its training data, noting they had "no way of not collecting that."

The situation mirrors historical tensions in the software industry. OSI's executive director Stefano Maffulli drew parallels to Microsoft's resistance to open source in the 1990s, suggesting that tech giants are using similar arguments about complexity and cost to justify keeping their technology partially closed.

This new standard may significantly impact the AI landscape, potentially affecting how models are developed, shared, and labeled in the future. However, without regulatory authority, OSI's ability to enforce these standards remains limited to setting industry expectations and maintaining the integrity of the "open-source" designation.

PUT THIS ON YOUR RADAR

Detail-Daemon: Comfy Enhancement Plugin

A new plugin for ComfyUI brings powerful detail-enhancement capabilities to AI art generation. ComfyUI-Detail-Daemon ports the Detail Daemon extension's functionality from SD WebUI into the ComfyUI workflow environment; a sketch of the underlying sigma trick follows the feature list.

  • Precise detail control through sigma parameter adjustment

  • Compatible with SDXL and SD1.5 models

  • Optimized for Flux model outputs

  • Includes four specialized nodes for different detail enhancement approaches

  • Simplified workflow compared to traditional methods
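
The mechanism behind these nodes is compact: during the middle of the sampling schedule, the sigma passed to the model is scaled down slightly while the sampler keeps stepping along the true schedule, which nudges the model into adding detail. A minimal illustrative sketch of such a multiplier schedule (the function name and bump shape here are my assumptions, not the plugin's actual code):

```python
import numpy as np

def detail_daemon_multipliers(num_steps, amount=0.25, start=0.2, end=0.8):
    """Per-step multipliers that dip below 1.0 mid-schedule. Feeding the
    model a slightly smaller sigma than the sampler actually uses makes it
    'see' a cleaner image and compensate by adding detail."""
    t = np.linspace(0.0, 1.0, num_steps)
    ramp = np.clip((t - start) / max(end - start, 1e-6), 0.0, 1.0)
    bump = np.sin(np.pi * ramp)        # rises 0 -> 1 -> 0 across the window
    return 1.0 - amount * bump

sigmas = np.linspace(10.0, 0.1, 30)    # stand-in for a real sampler schedule
adjusted = sigmas * detail_daemon_multipliers(len(sigmas))
```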

PixelWave: A New Flux Model Fine-tune for AI Image Generation

PixelWave is a community-created fine-tune of the Flux model that offers enhanced aesthetic capabilities and alternative styling options compared to the base model. The fine-tune reportedly took 5 weeks of training on an RTX 4090.

  • Available in GGUF format (6.7GB) for better compatibility with lower VRAM GPUs

  • Works with ComfyUI workflow

  • Noted for less "plastic-looking" results compared to base Flux

  • Limited compatibility with existing Flux LoRAs

  • Requires specific CLIP encoders and VAE files for optimal performance

ComfyUI Image Filters: Enhanced Image Manipulation Tools

A comprehensive collection of filter nodes for ComfyUI that extends image, latent, and matte manipulation capabilities. These filters provide advanced options for image processing and enhancement within the ComfyUI workflow; a standalone guided-filter sketch follows the list.

Guided Filter Alpha Feature

  • Features fast blur operations (100x faster than standard ComfyUI blur)

  • Includes advanced tools like Alpha Matte for edge refinement

  • Offers detail enhancement using guided filters

  • Provides color matching and normalization tools

  • Contains specialized nodes for batch processing and noise handling

  • Includes new BetterFilmGrain node with improved realism and performance
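
For context on the guided-filter bullets: a guided filter smooths an image while respecting edges in a guide image, and the residual between input and smoothed output is a "detail layer" that can be amplified (detail enhancement) or used to refine matte edges. A standalone sketch with OpenCV's contrib module, not the node pack's own code:

```python
# pip install opencv-contrib-python  (guidedFilter lives in the contrib module)
import cv2
import numpy as np

def guided_detail_enhance(img, radius=8, eps=1e-3, strength=1.5):
    """Edge-preserving detail boost: smooth with a guided filter, then
    amplify the residual 'detail layer' and recombine."""
    f = img.astype(np.float32) / 255.0
    base = cv2.ximgproc.guidedFilter(guide=f, src=f, radius=radius, eps=eps)
    detail = f - base                      # high-frequency residual
    out = np.clip(base + strength * detail, 0.0, 1.0)
    return (out * 255.0).astype(np.uint8)

img = cv2.imread("input.png")
cv2.imwrite("enhanced.png", guided_detail_enhance(img))
```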

ComfyUI-MochiEdit: New Video Editing Nodes for Mochi

A new set of custom nodes for ComfyUI that enables video editing with Genmo's Mochi model. The toolkit takes an RF-Inversion-inspired approach: videos are first unsampled into noise, then resampled with a target prompt to apply the edit (a toy version of the loop is sketched after the list).

  • Requires ComfyUI-MochiWrapper for operation

  • Features specialized unsampling and sampling nodes

  • Includes eta controls for balancing original vs. generated content

  • Allows for adding new elements while maintaining video consistency

  • Provides adjustable guidance parameters for fine-tuning results

  • Includes example workflows for getting started
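
The unsample/resample pipeline maps directly onto rectified-flow math: integrate the model's velocity field from the video toward noise under the source prompt, then integrate back under the target prompt, with an eta term pulling each step toward the inversion trajectory so unedited content survives. A toy Euler-integration sketch, where `velocity(x, t, prompt)` stands in for the Mochi transformer (this is not the nodes' actual code, and the real eta control is typically time-windowed rather than constant):

```python
import torch

def unsample(x0, velocity, src_prompt, steps=50):
    """Integrate the rectified-flow ODE from data (t=0) toward noise (t=1),
    recording the whole trajectory; traj[j] is the state at time j/steps."""
    x, dt, traj = x0.clone(), 1.0 / steps, [x0.clone()]
    for i in range(steps):
        x = x + velocity(x, i * dt, src_prompt) * dt    # Euler: data -> noise
        traj.append(x.clone())
    return traj                                          # traj[-1] is 'noise'

def resample(traj, velocity, tgt_prompt, eta=0.3):
    """Integrate back (noise -> data) under the target prompt; eta blends
    each state toward the inversion trajectory to preserve original content."""
    steps = len(traj) - 1
    x, dt = traj[-1].clone(), 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x = x - velocity(x, t, tgt_prompt) * dt          # Euler: noise -> data
        x = (1.0 - eta) * x + eta * traj[steps - i - 1]  # anchor to original
    return x
```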

Oasis: Real-Time AI-Generated Game Demonstration

Oasis is a new AI model that generates real-time interactive gameplay visuals in response to user inputs. The system has been demonstrated with Minecraft-style graphics, showcasing the potential of AI-generated interactive environments; a toy version of the control loop is sketched after the list.

  • Generates game visuals in real-time based on user input

  • Currently has limited object permanence (environment changes when looking away)

  • 500M-parameter model released as open source

  • Trained on gameplay footage to predict and generate appropriate frames

  • Running on cloud infrastructure for demo purposes

  • Includes interactive web demo for public testing
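
Mechanically, systems like Oasis are next-frame predictors conditioned on user input: the model consumes a short window of recent frames plus the current controls and emits the next frame, over and over, fast enough to feel interactive. A toy version of the loop, with `model` and `get_action` as stand-ins rather than Oasis's actual interface:

```python
import torch

@torch.no_grad()
def play(model, first_frame, get_action, horizon=600, context=32):
    """Toy action-conditioned world-model loop: each frame is predicted from
    a sliding window of recent frames plus the latest user input. The short
    window is one reason such models forget scenery once you look away."""
    frames, actions = [first_frame], []
    for _ in range(horizon):
        actions.append(get_action())               # e.g. keyboard/mouse state
        ctx_f = torch.stack(frames[-context:])     # recent frames only
        ctx_a = torch.stack(actions[-context:])
        frames.append(model(ctx_f, ctx_a))         # predict the next frame
    return frames
```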

Blendbox Alpha Launches: Layer-Based AI Image Generation Tool

Blockade Labs introduces Blendbox Alpha, a new AI image generation tool that brings Photoshop-like layer controls to AI art creation, moving away from traditional prompt engineering toward more intuitive visual manipulation.

  • Real-time adjustments of lighting, texture, and composition

  • Layer-based system for precise element control

  • Modular approach allows localized changes without regenerating entire image

  • Supports multiple AI engines including Stable Diffusion

  • Created by the team behind Skybox AI (8K panoramic scene generator)

  • Currently in internal testing with subscriber-only access

  • Focuses on giving artists direct creative control rather than relying on complex prompts

  • Changes save as "steps" allowing version history

Suno Launches "Personas" - AI Voice Style Cloning Feature

Suno has introduced a new "Personas" feature that allows users to capture and replicate specific musical styles and vocal characteristics, enabling consistent style across multiple AI music generations.

  • Creates reusable templates from existing songs' vocal and style elements

  • Allows saving and sharing of style templates

  • Includes social features for sharing Personas publicly

  • Currently limited to premium/professional members

  • First 200 songs are free, then costs 10 points per song

  • Designed to help creators maintain consistent musical identity

  • Templates can include vocal characteristics, musical style, and emotional qualities

New Upscaling Technique for Stable Diffusion 3.5 Models

A new community workflow combines SD 3.5 Large and Medium with Skip Layer Guidance (SLG), offering enhanced upscaling and improved detail retention in generated images; the guidance math is sketched after the list.

  • Combines SD 3.5 Large's 1MP output with SD 3.5 Medium's 1440x1440 capability

  • Introduces Skip Layer Guidance for better control over model attention and detail

  • Includes a custom film LyCORIS trained on Ferrania Solaris film stocks

  • Features improved color handling and lighting compared to previous models

  • Works with ComfyUI workflow system

  • Requires latest ComfyUI version for SD 3.5 LyCORIS compatibility
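
Skip Layer Guidance generalizes classifier-free guidance: alongside the usual conditional and unconditional predictions, a third prediction is computed with selected transformer blocks skipped, and the sampler steers away from that degraded output. A minimal sketch of the combination (the `skip_layers` kwarg is an assumed interface and the layer indices are illustrative; ComfyUI exposes this as a node rather than a function):

```python
import torch

def skip_layer_guidance(model, x, t, cond, uncond,
                        cfg_scale=5.0, slg_scale=2.5, skip_layers=(7, 8, 9)):
    """Classifier-free guidance plus a skip-layer term: run a third forward
    pass with some transformer blocks disabled, then steer away from that
    degraded prediction to recover structure and detail."""
    e_cond = model(x, t, cond)
    e_uncond = model(x, t, uncond)
    e_skip = model(x, t, cond, skip_layers=skip_layers)  # assumed interface
    return (e_uncond
            + cfg_scale * (e_cond - e_uncond)            # ordinary CFG
            + slg_scale * (e_cond - e_skip))             # skip-layer term
```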

ElevenLabs Releases X-to-Voice: Twitter Profile to AI Avatar Converter

ElevenLabs has open-sourced X-to-Voice, a tool that analyzes Twitter profiles to automatically generate matching AI voices and dynamic avatars, creating personalized virtual identities.

  • One-click conversion of Twitter profiles to AI voices and avatars

  • Integrates ElevenLabs' Voice Designer API with Hedra's avatar tools

  • Uses Apify for data collection and Hedra for avatar generation

  • Processes profiles in about one minute

  • Deployable on Vercel platform

  • Fully open-source with complete API documentation

  • Includes direct social media sharing capabilities

BigASP v2: Large-Scale SDXL Fine-tuning Details Released

Developer fpgaminer shares comprehensive technical details of training BigASP v2, a significant fine-tune of Stable Diffusion XL trained on 6.7M images. The post offers valuable insight into large-scale training methodology and its pitfalls; the score-tag idea is sketched after the list.

  • Trained on 6.7M high-resolution images (4M NSFW, 2M SFW)

  • Used custom quality rating system combining human ratings and ML classification

  • Implemented JoyCaption and JoyTag for automated image captioning and tagging

  • Training cost approximately $3,600 on 8xH100 GPUs over 6 days

  • Introduces improved score tag system for quality control

  • Developer openly shares technical challenges and areas for improvement
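
The score-tag bullet deserves unpacking, since it is what makes quality controllable at inference time: each training caption is prefixed with a tag derived from the image's predicted quality, so the model learns quality as just another prompt token. A hedged sketch of one common bucketing scheme (BigASP v2's exact thresholds and tag format may differ):

```python
def score_tag(quality_percentile):
    """Map a 0-100 quality percentile to a score_N caption prefix. Training
    on tagged captions turns quality into a promptable axis: prompting
    'score_9' at inference pulls generations toward the best training data."""
    bucket = min(int(quality_percentile // 10), 9)
    return f"score_{bucket}"

caption = score_tag(93.0) + ", a photo of ..."   # -> "score_9, a photo of ..."
```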

InvokeAI 5.3 Released with New "Select Object" Tool and Flux Support

InvokeAI's latest update introduces an AI-powered object-selection tool built on Meta's Segment Anything Model (SAM), making it easier to edit specific parts of AI-generated images. The update also expands compatibility with Flux models and adds new features for image-editing workflows; a minimal SAM example follows the list.

  • New Select Object tool enables precise selection and layer conversion of image elements

  • Improved integration with Control Canvas for inpainting and controlnet workflows

  • Added support for Flux Controlnets & IP Adapters

  • Introduced pressure sensitivity tablet support

  • SD 3.5 support coming soon

  • Available as both open-source software and cloud-hosted service
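
The Select Object tool is built on SAM, and the same click-to-mask behavior is available standalone in a few lines. A sketch using Meta's segment-anything package (the file paths and click coordinates are placeholders; Invoke's integration is its own code):

```python
# pip install segment-anything; checkpoint from Meta's SAM release
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("render.png").convert("RGB"))
predictor.set_image(image)

# One positive click roughly where the user tapped the object.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[412, 260]]),
    point_labels=np.array([1]),        # 1 = foreground, 0 = background
    multimask_output=True,             # SAM proposes several candidate masks
)
best = masks[int(np.argmax(scores))]   # boolean HxW mask -> editable layer
```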

Stability AI Releases SD 3.5 Medium

Stability AI has released a more compact version of its Stable Diffusion 3.5 model, offering improved performance while requiring less compute. The model is designed for better text understanding and image quality while staying efficient; a minimal usage example follows the list.

  • 2.6B parameter model requiring only 9.9GB VRAM

  • Supports higher native resolutions than SD 3.5 Large (up to 1440x1440)

  • Reportedly 4x faster than SD 3.5 Large

  • Improved prompt adherence and text rendering

  • Already supported in ComfyUI and Forge

  • Released under Stability Community License

  • Features additional attention layers for improved performance
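
Beyond ComfyUI and Forge, the model also runs through Hugging Face diffusers. A minimal sketch (the repo is gated behind the Stability Community License, so an authenticated Hugging Face login is required; the step count and guidance scale follow the model card's suggestions):

```python
# pip install -U diffusers transformers accelerate sentencepiece protobuf
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a red fox in a birch forest at golden hour, 35mm film grain",
    num_inference_steps=40,
    guidance_scale=4.5,
    height=1440, width=1440,   # the Medium model's higher native ceiling
).images[0]
image.save("fox.png")
```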

Two-Character Image Generation with Flux and LoRA

u/Sensitive_Teacher_93 has shared a method for creating consistent AI-generated images featuring two distinct characters using Flux and trained LoRAs, addressing the common challenge of character bleeding in multi-person compositions; a sketch of the two-step idea follows the list.

  • Uses combination of Flux AI and trained LoRA models

  • Two-step generation process: base image creation followed by inpainting

  • Author shares complete training dataset and configuration files

  • Includes technique for maintaining character consistency

  • Key challenge remains with closely positioned characters

  • Full workflow documented in Medium article

  • Demonstrates practical application for personal AI image creation
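
For readers who want to try the two-step pattern, here is a hedged diffusers sketch: generate the base image with one character's LoRA, then inpaint the second character's region with the other LoRA so the two identities never share a denoising pass. All file names and prompts are placeholders, and this illustrates the general technique rather than the author's exact shared workflow:

```python
# pip install -U diffusers  (paths, prompts, and the mask are placeholders)
import torch
from PIL import Image
from diffusers import FluxPipeline, FluxInpaintPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Step 1: generate the base composition with character A's LoRA active.
pipe.load_lora_weights("character_A_lora.safetensors")
base = pipe("two friends talking at a cafe table, photo",
            height=1024, width=1024).images[0]

# Step 2: repaint only the second figure with character B's LoRA; keeping
# the identities in separate passes is what limits character bleeding.
pipe.unload_lora_weights()
inpaint = FluxInpaintPipeline.from_pipe(pipe)        # reuses loaded weights
inpaint.load_lora_weights("character_B_lora.safetensors")
mask = Image.open("mask_right_person.png")           # white = area to repaint
result = inpaint("portrait photo of the second friend, same cafe lighting",
                 image=base, mask_image=mask, strength=0.85).images[0]
result.save("two_characters.png")
```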

Learn AI in 5 Minutes a Day

AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.

Our research team spends hundreds of hours a week summarizing the latest news, and finding you the best opportunities to save time and earn more using AI.
