
SD 3.5 Medium, AI Fashion Models, Open Source Debate | This Week in AI Art 👗

Cut through the noise, stay informed — new stories every Sunday.

Interesting find of the week. Take a look at this indie game dev using AI to generate his assets.


AI MODELS ENTER FASHION INDUSTRY

Fashion retailer Mango has begun implementing AI-generated models in its advertising campaigns, marking a significant shift in the fashion industry's approach to marketing. The company first featured AI-generated models in July 2023, shortly before reporting its highest revenues in four decades. This move is part of a broader industry trend, with other major brands like Nike, Louis Vuitton, and Levi Strauss & Co. also exploring AI-generated content for their advertising.

"It's about faster content creation," explains Mango CEO Toni Ruiz, highlighting the primary motivation behind this transition: increased efficiency and cost reduction. While human models typically charge around $35 per hour, AI model generation services can cost as little as $29 per month. The technology allows for rapid content creation and enables brands to easily customize model appearances to target specific demographics or markets.

This shift extends beyond replacing models: it affects entire creative teams, including photographers, art directors, retouchers, stylists, and makeup artists. Mango's in-house engineering team uses machine learning to create "cohesive mood boards" and trains the AI on specific outfit images to generate the desired aesthetic. As Jordi Álex Moreno, Mango's chief information technology officer, explains, "It is an excellent example of teamwork between human handcrafted intelligence and digital intelligence."

Despite the move toward AI, Mango has stated plans to double its workforce and expand its physical retail presence, aiming to open 42 stores by the end of 2024 and nearly 70 by 2025. The company frames AI implementation as a complement to human creativity rather than a replacement. As Moreno puts it, "AI should serve as a co-pilot to amplify our employees' capabilities and creativity, speeding up repetitive tasks so teams can spend more time on value-added work."

DEFINE: ‘OPEN-SOURCE’ AI

The Open Source Initiative has sparked debate in the AI community by establishing strict new criteria for what constitutes "open-source" AI. Under these guidelines, AI models must provide complete transparency about their training data, including detailed information about data sources, selection processes, and filtering methodologies.

This decision directly challenges major tech players like Meta, whose Llama model has been widely promoted as open-source. Meta spokesperson Faith Eischen responded to the guidelines, stating that "there is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today's rapidly advancing AI models."

The new definition has divided the community. Some experts argue for more precise terminology, suggesting "open weights" as a more accurate label for current AI models that share their parameters but not their training data. As one developer noted, "Models right now are basically freeware, not open source."

A key concern is the practical impossibility of full compliance. Companies face significant legal risks in disclosing training data, particularly regarding copyright issues. The New York Times reported that Meta had internally acknowledged the presence of copyrighted content in its training data, noting they had "no way of not collecting that."

The situation mirrors historical tensions in the software industry. OSI's executive director Stefano Maffulli drew parallels to Microsoft's resistance to open source in the 1990s, suggesting that tech giants are using similar arguments about complexity and cost to justify keeping their technology partially closed.

This new standard may significantly impact the AI landscape, potentially affecting how models are developed, shared, and labeled in the future. However, without regulatory authority, OSI's ability to enforce these standards remains limited to setting industry expectations and maintaining the integrity of the "open-source" designation.

PUT THIS ON YOUR RADAR

Detail-Daemon: Comfy Enhancement Plugin

A new plugin for ComfyUI brings powerful detail-enhancement capabilities to AI art generation. ComfyUI-Detail-Daemon ports the Detail Daemon extension's functionality from SD WebUI into the ComfyUI workflow environment; a sketch of the underlying sigma trick follows the feature list.

  • Precise detail control through sigma parameter adjustment

  • Compatible with SDXL and SD1.5 models

  • Optimized for Flux model outputs

  • Includes four specialized nodes for different detail enhancement approaches

  • Simplified workflow compared to traditional methods
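
The mechanism behind these nodes is compact: during the middle of the sampling schedule, the sigma passed to the model is scaled down slightly while the sampler keeps stepping along the true schedule, which nudges the model into adding detail. A minimal illustrative sketch of such a multiplier schedule (the function name and bump shape here are my assumptions, not the plugin's actual code):

```python
import numpy as np

def detail_daemon_multipliers(num_steps, amount=0.25, start=0.2, end=0.8):
    """Per-step multipliers that dip below 1.0 mid-schedule. Feeding the
    model a slightly smaller sigma than the sampler actually uses makes it
    'see' a cleaner image and compensate by adding detail."""
    t = np.linspace(0.0, 1.0, num_steps)
    ramp = np.clip((t - start) / max(end - start, 1e-6), 0.0, 1.0)
    bump = np.sin(np.pi * ramp)        # rises 0 -> 1 -> 0 across the window
    return 1.0 - amount * bump

sigmas = np.linspace(10.0, 0.1, 30)    # stand-in for a real sampler schedule
adjusted = sigmas * detail_daemon_multipliers(len(sigmas))
```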

PixelWave: A New Flux Model Fine-tune for AI Image Generation

PixelWave is a community-created fine-tune of the Flux model that offers enhanced aesthetic capabilities and alternative styling options compared to the base model. The fine-tune reportedly took 5 weeks of training on an RTX 4090.

  • Available in GGUF format (6.7GB) for better compatibility with lower VRAM GPUs

  • Works with ComfyUI workflow

  • Noted for less "plastic-looking" results compared to base Flux

  • Limited compatibility with existing Flux LoRAs

  • Requires specific CLIP encoders and VAE files for optimal performance

ComfyUI Image Filters: Enhanced Image Manipulation Tools

A comprehensive collection of filter nodes for ComfyUI that extends image, latent, and matte manipulation capabilities. These filters provide advanced options for image processing and enhancement within the ComfyUI workflow; a standalone guided-filter sketch follows the list.

Guided Filter Alpha Feature

  • Features fast blur operations (100x faster than standard ComfyUI blur)

  • Includes advanced tools like Alpha Matte for edge refinement

  • Offers detail enhancement using guided filters

  • Provides color matching and normalization tools

  • Contains specialized nodes for batch processing and noise handling

  • Includes new BetterFilmGrain node with improved realism and performance
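
For context on the guided-filter bullets: a guided filter smooths an image while respecting edges in a guide image, and the residual between input and smoothed output is a "detail layer" that can be amplified (detail enhancement) or used to refine matte edges. A standalone sketch with OpenCV's contrib module, not the node pack's own code:

```python
# pip install opencv-contrib-python  (guidedFilter lives in the contrib module)
import cv2
import numpy as np

def guided_detail_enhance(img, radius=8, eps=1e-3, strength=1.5):
    """Edge-preserving detail boost: smooth with a guided filter, then
    amplify the residual 'detail layer' and recombine."""
    f = img.astype(np.float32) / 255.0
    base = cv2.ximgproc.guidedFilter(guide=f, src=f, radius=radius, eps=eps)
    detail = f - base                      # high-frequency residual
    out = np.clip(base + strength * detail, 0.0, 1.0)
    return (out * 255.0).astype(np.uint8)

img = cv2.imread("input.png")
cv2.imwrite("enhanced.png", guided_detail_enhance(img))
```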

ComfyUI-MochiEdit: New Video Editing Nodes for Mochi

A new set of custom nodes for ComfyUI that enables video editing with Genmo's Mochi model. The toolkit takes an RF-Inversion-inspired approach: videos are first unsampled into noise, then resampled with a target prompt to apply the edit (a toy version of the loop is sketched after the list).

  • Requires ComfyUI-MochiWrapper for operation

  • Features specialized unsampling and sampling nodes

  • Includes eta controls for balancing original vs. generated content

  • Allows for adding new elements while maintaining video consistency

  • Provides adjustable guidance parameters for fine-tuning results

  • Includes example workflows for getting started
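
The unsample/resample pipeline maps directly onto rectified-flow math: integrate the model's velocity field from the video toward noise under the source prompt, then integrate back under the target prompt, with an eta term pulling each step toward the inversion trajectory so unedited content survives. A toy Euler-integration sketch, where `velocity(x, t, prompt)` stands in for the Mochi transformer (this is not the nodes' actual code, and the real eta control is typically time-windowed rather than constant):

```python
import torch

def unsample(x0, velocity, src_prompt, steps=50):
    """Integrate the rectified-flow ODE from data (t=0) toward noise (t=1),
    recording the whole trajectory; traj[j] is the state at time j/steps."""
    x, dt, traj = x0.clone(), 1.0 / steps, [x0.clone()]
    for i in range(steps):
        x = x + velocity(x, i * dt, src_prompt) * dt    # Euler: data -> noise
        traj.append(x.clone())
    return traj                                          # traj[-1] is 'noise'

def resample(traj, velocity, tgt_prompt, eta=0.3):
    """Integrate back (noise -> data) under the target prompt; eta blends
    each state toward the inversion trajectory to preserve original content."""
    steps = len(traj) - 1
    x, dt = traj[-1].clone(), 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x = x - velocity(x, t, tgt_prompt) * dt          # Euler: noise -> data
        x = (1.0 - eta) * x + eta * traj[steps - i - 1]  # anchor to original
    return x
```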

Oasis: Real-Time AI-Generated Game Demonstration

Oasis is a new AI model that generates real-time interactive gameplay visuals in response to user inputs. The system has been demonstrated with Minecraft-style graphics, showcasing the potential of AI-generated interactive environments; a toy version of the control loop is sketched after the list.

  • Generates game visuals in real-time based on user input

  • Currently has limited object permanence (environment changes when looking away)

  • 500M-parameter model released as open source

  • Trained on gameplay footage to predict and generate appropriate frames

  • Running on cloud infrastructure for demo purposes

  • Includes interactive web demo for public testing
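
Mechanically, systems like Oasis are next-frame predictors conditioned on user input: the model consumes a short window of recent frames plus the current controls and emits the next frame, over and over, fast enough to feel interactive. A toy version of the loop, with `model` and `get_action` as stand-ins rather than Oasis's actual interface:

```python
import torch

@torch.no_grad()
def play(model, first_frame, get_action, horizon=600, context=32):
    """Toy action-conditioned world-model loop: each frame is predicted from
    a sliding window of recent frames plus the latest user input. The short
    window is one reason such models forget scenery once you look away."""
    frames, actions = [first_frame], []
    for _ in range(horizon):
        actions.append(get_action())               # e.g. keyboard/mouse state
        ctx_f = torch.stack(frames[-context:])     # recent frames only
        ctx_a = torch.stack(actions[-context:])
        frames.append(model(ctx_f, ctx_a))         # predict the next frame
    return frames
```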

Blendbox Alpha Launches: Layer-Based AI Image Generation Tool

Blockade Labs introduces Blendbox Alpha, a new AI image generation tool that brings Photoshop-like layer controls to AI art creation, moving away from traditional prompt engineering toward more intuitive visual manipulation.

  • Real-time adjustments of lighting, texture, and composition

  • Layer-based system for precise element control

  • Modular approach allows localized changes without regenerating entire image

  • Supports multiple AI engines including Stable Diffusion

  • Created by the team behind Skybox AI (8K panoramic scene generator)

  • Currently in internal testing with subscriber-only access

  • Focuses on giving artists direct creative control rather than relying on complex prompts

  • Changes save as "steps" allowing version history

Suno Launches "Personas" - AI Voice Style Cloning Feature

Suno has introduced a new "Personas" feature that allows users to capture and replicate specific musical styles and vocal characteristics, enabling consistent style across multiple AI music generations.

  • Creates reusable templates from existing songs' vocal and style elements

  • Allows saving and sharing of style templates

  • Includes social features for sharing Personas publicly

  • Currently limited to premium/professional members

  • First 200 songs are free, then costs 10 points per song

  • Designed to help creators maintain consistent musical identity

  • Templates can include vocal characteristics, musical style, and emotional qualities

New Upscaling Technique for Stable Diffusion 3.5 Models

A new community workflow combines SD 3.5 Large and Medium with Skip Layer Guidance (SLG), offering enhanced upscaling and improved detail retention in generated images; the guidance math is sketched after the list.

  • Combines SD 3.5 Large's 1MP output with SD 3.5 Medium's 1440x1440 capability

  • Introduces Skip Layer Guidance for better control over model attention and detail

  • Includes a custom film LyCORIS trained on Ferrania Solaris film stocks

  • Features improved color handling and lighting compared to previous models

  • Works with ComfyUI workflow system

  • Requires latest ComfyUI version for SD 3.5 LyCORIS compatibility
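
Skip Layer Guidance generalizes classifier-free guidance: alongside the usual conditional and unconditional predictions, a third prediction is computed with selected transformer blocks skipped, and the sampler steers away from that degraded output. A minimal sketch of the combination (the `skip_layers` kwarg is an assumed interface and the layer indices are illustrative; ComfyUI exposes this as a node rather than a function):

```python
import torch

def skip_layer_guidance(model, x, t, cond, uncond,
                        cfg_scale=5.0, slg_scale=2.5, skip_layers=(7, 8, 9)):
    """Classifier-free guidance plus a skip-layer term: run a third forward
    pass with some transformer blocks disabled, then steer away from that
    degraded prediction to recover structure and detail."""
    e_cond = model(x, t, cond)
    e_uncond = model(x, t, uncond)
    e_skip = model(x, t, cond, skip_layers=skip_layers)  # assumed interface
    return (e_uncond
            + cfg_scale * (e_cond - e_uncond)            # ordinary CFG
            + slg_scale * (e_cond - e_skip))             # skip-layer term
```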

ElevenLabs Releases X-to-Voice: Twitter Profile to AI Avatar Converter

ElevenLabs has open-sourced X-to-Voice, a tool that analyzes Twitter profiles to automatically generate matching AI voices and dynamic avatars, creating personalized virtual identities.

  • One-click conversion of Twitter profiles to AI voices and avatars

  • Integrates ElevenLabs' Voice Designer API with Hedra's avatar tools

  • Uses Apify for data collection and Hedra for avatar generation

  • Processes profiles in about one minute

  • Deployable on Vercel platform

  • Fully open-source with complete API documentation

  • Includes direct social media sharing capabilities

BigASP v2: Large-Scale SDXL Fine-tuning Details Released

Developer fpgaminer shares comprehensive technical details of training BigASP v2, a significant fine-tune of Stable Diffusion XL trained on 6.7M images. The post offers valuable insight into large-scale training methodology and its pitfalls; the score-tag idea is sketched after the list.

  • Trained on 6.7M high-resolution images (4M NSFW, 2M SFW)

  • Used custom quality rating system combining human ratings and ML classification

  • Implemented JoyCaption and JoyTag for automated image captioning and tagging

  • Training cost approximately $3,600 on 8xH100 GPUs over 6 days

  • Introduces improved score tag system for quality control

  • Developer openly shares technical challenges and areas for improvement
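
The score-tag bullet deserves unpacking, since it is what makes quality controllable at inference time: each training caption is prefixed with a tag derived from the image's predicted quality, so the model learns quality as just another prompt token. A hedged sketch of one common bucketing scheme (BigASP v2's exact thresholds and tag format may differ):

```python
def score_tag(quality_percentile):
    """Map a 0-100 quality percentile to a score_N caption prefix. Training
    on tagged captions turns quality into a promptable axis: prompting
    'score_9' at inference pulls generations toward the best training data."""
    bucket = min(int(quality_percentile // 10), 9)
    return f"score_{bucket}"

caption = score_tag(93.0) + ", a photo of ..."   # -> "score_9, a photo of ..."
```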

InvokeAI 5.3 Released with New "Select Object" Tool and Flux Support

InvokeAI's latest update introduces an AI-powered object-selection tool built on Meta's Segment Anything Model (SAM), making it easier to edit specific parts of AI-generated images. The update also expands compatibility with Flux models and adds new features for image-editing workflows; a minimal SAM example follows the list.

  • New Select Object tool enables precise selection and layer conversion of image elements

  • Improved integration with Control Canvas for inpainting and controlnet workflows

  • Added support for Flux Controlnets & IP Adapters

  • Introduced pressure sensitivity tablet support

  • SD 3.5 support coming soon

  • Available as both open-source software and cloud-hosted service
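
The Select Object tool is built on SAM, and the same click-to-mask behavior is available standalone in a few lines. A sketch using Meta's segment-anything package (the file paths and click coordinates are placeholders; Invoke's integration is its own code):

```python
# pip install segment-anything; checkpoint from Meta's SAM release
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("render.png").convert("RGB"))
predictor.set_image(image)

# One positive click roughly where the user tapped the object.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[412, 260]]),
    point_labels=np.array([1]),        # 1 = foreground, 0 = background
    multimask_output=True,             # SAM proposes several candidate masks
)
best = masks[int(np.argmax(scores))]   # boolean HxW mask -> editable layer
```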

Stability AI Releases SD 3.5 Medium

Stability AI has released a more compact version of its Stable Diffusion 3.5 model, offering improved performance while requiring less compute. The model is designed for better text understanding and image quality while staying efficient; a minimal usage example follows the list.

  • 2.6B parameter model requiring only 9.9GB VRAM

  • Supports higher native resolutions than SD 3.5 Large (up to 1440x1440)

  • Reportedly 4x faster than SD 3.5 Large

  • Improved prompt adherence and text rendering

  • Already supported in ComfyUI and Forge

  • Released under Stability Community License

  • Features additional attention layers for improved performance
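
Beyond ComfyUI and Forge, the model also runs through Hugging Face diffusers. A minimal sketch (the repo is gated behind the Stability Community License, so an authenticated Hugging Face login is required; the step count and guidance scale follow the model card's suggestions):

```python
# pip install -U diffusers transformers accelerate sentencepiece protobuf
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a red fox in a birch forest at golden hour, 35mm film grain",
    num_inference_steps=40,
    guidance_scale=4.5,
    height=1440, width=1440,   # the Medium model's higher native ceiling
).images[0]
image.save("fox.png")
```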

Two-Character Image Generation with Flux and LoRA

u/Sensitive_Teacher_93 has shared a method for creating consistent AI-generated images featuring two distinct characters using Flux and trained LoRAs, addressing the common challenge of character bleeding in multi-person compositions; a sketch of the two-step idea follows the list.

  • Uses combination of Flux AI and trained LoRA models

  • Two-step generation process: base image creation followed by inpainting

  • Author shares complete training dataset and configuration files

  • Includes technique for maintaining character consistency

  • Key challenge remains with closely positioned characters

  • Full workflow documented in Medium article

  • Demonstrates practical application for personal AI image creation
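
For readers who want to try the two-step pattern, here is a hedged diffusers sketch: generate the base image with one character's LoRA, then inpaint the second character's region with the other LoRA so the two identities never share a denoising pass. All file names and prompts are placeholders, and this illustrates the general technique rather than the author's exact shared workflow:

```python
# pip install -U diffusers  (paths, prompts, and the mask are placeholders)
import torch
from PIL import Image
from diffusers import FluxPipeline, FluxInpaintPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Step 1: generate the base composition with character A's LoRA active.
pipe.load_lora_weights("character_A_lora.safetensors")
base = pipe("two friends talking at a cafe table, photo",
            height=1024, width=1024).images[0]

# Step 2: repaint only the second figure with character B's LoRA; keeping
# the identities in separate passes is what limits character bleeding.
pipe.unload_lora_weights()
inpaint = FluxInpaintPipeline.from_pipe(pipe)        # reuses loaded weights
inpaint.load_lora_weights("character_B_lora.safetensors")
mask = Image.open("mask_right_person.png")           # white = area to repaint
result = inpaint("portrait photo of the second friend, same cafe lighting",
                 image=base, mask_image=mask, strength=0.85).images[0]
result.save("two_characters.png")
```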

Learn AI in 5 Minutes a Day

AI Tool Report is one of the fastest-growing and most respected newsletters in the world, with over 550,000 readers from companies like OpenAI, Nvidia, Meta, Microsoft, and more.

Our research team spends hundreds of hours a week summarizing the latest news, and finding you the best opportunities to save time and earn more using AI.
