$1M AI Painting, AI Radio Takeover, Open Source Voice Clone | This Week in AI Art 🎙️
Cut through the noise, stay informed — new stories every Sunday.
Interesting find of the week. If you haven't seen The Line Studio's "Dear Alice" animation, I'd highly recommend it - it's a charming short that reimagines robots as partners in creating a better future, rather than the threats we often see in sci-fi.
Most non-technical people around me are extremely wary of robots, having been conditioned by decades of dystopian sci-fi to see them as threats to people and their humanity. Showing them @thelinestudio's "Dear Alice" animation often helps them see a more hopeful vision.
— Chris Offner (@chrisoffner3d)
9:14 AM • Nov 3, 2024
In this issue:
AI Takes Over Polish State Radio
Machine Muse: Ai-Da's $1M Painting
Put This on Your Radar
AI TAKES OVER POLISH STATE RADIO
Off Radio Kraków, a government-owned Polish radio station, has sparked nationwide controversy by becoming the first Polish station to be run entirely by AI hosts after dismissing its human journalists. The station introduced three AI presenters - Jakub "Kuba" Zieliński, Emilia "Emi" Nowak, and Alex Szulc - each with a specialized content area. While the station maintains the change was driven by declining listenership rather than a deliberate AI replacement strategy, an open letter of protest from former journalist Mateusz Demski quickly gathered 15,000 signatures, underscoring how contentious the decision is.
An AI-generated image of one of Off Radio Kraków’s new AI presenters Jakub “Kuba” Zieliński. Credit: Off Radio Kraków.
AI presenters aren't entirely new to radio - Futuri Media's RadioGPT and India's Radio City, with its AI host SIA, came earlier - but the Polish case stands apart in both timing and context. Off Radio Kraków's decision was particularly controversial because it coincided with the termination of human staff and involved a state-funded broadcaster spending taxpayer money. The situation has drawn attention from government officials, including Deputy Prime Minister Krzysztof Gawkowski, who warned about AI being used "against people rather than for them." It is a stark example of how AI can disrupt employment even in creative fields traditionally thought to require a human touch.
MACHINE MUSE: AI-DA’S $1M PAINTING
A humanoid robot named Ai-Da has made history by creating a portrait of Alan Turing that sold for $1.084 million at Sotheby's auction house, far exceeding its initial estimate of $120,000-$180,000. This marks the first time an artwork created by a humanoid robot has been sold at auction.
Ai-Da Robot’s 'A.I. God' is a large-scale original portrait of Alan Turing
The artwork, titled "A.I. God," depicts Alan Turing, who is considered the father of artificial intelligence and played a crucial role in breaking Nazi codes during World War II. The piece was created through a complex process where:
Ai-Da made 15 individual paintings of different parts of Turing's face
The robot selected three portraits plus a painting of the Bombe machine
These were combined using AI and finished with a 3D textured printer
Ai-Da added final marks and textures to complete the work
Ai-Da Robot is the creation of Aidan Meller, who works with a team of about 30 people. The robot, designed to look like a woman with a bob haircut, is named after Ada Lovelace, the world's first computer programmer. According to Meller, the project aims to stimulate discussion about AI ethics and challenge our understanding of artistry in the age of artificial intelligence.
The piece was purchased by an anonymous American buyer after receiving 27 bids. The proceeds will be reinvested in improving Ai-Da's capabilities, with Meller noting that the robot is already on its third painting arm.
This sale represents a significant moment in the intersection of AI technology and the art market, though it's worth noting that AI-created art has sold at auction before, including a 2018 Christie's sale of an algorithm-generated painting for $432,500.
PUT THIS ON YOUR RADAR
CogVideoX v1.5: Advanced Open-Source Video Generation
Zhipu AI has released CogVideoX v1.5, an enhanced open-source video generation model that pairs with their new CogSound audio model on the Qingying platform. A usage sketch follows the spec list below.
Supports dual resolution modes: 4K/60FPS (10-second videos) and 768P/16FPS
Variable aspect-ratio support for any video size
Multi-channel generation (4 videos per prompt)
Two specialized models: CogVideoX v1.5-5B and CogVideoX v1.5-5B-I2V
Enhanced image-to-video (I2V) conversion
Integrated AI-generated sound effects via CogSound
CogVLM2-caption for accurate content description and prompt understanding
3D VAE technology for improved content coherence
Advanced motion coherence and complex prompt processing
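Since the weights are open, you can also run the model outside the Qingying platform. Below is a minimal sketch using Hugging Face diffusers, which ships a CogVideoXPipeline; the v1.5 checkpoint id and the generation parameters are assumptions, so check the model card for exact values.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Checkpoint id is an assumption -- confirm the exact v1.5 repo on Hugging Face.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use
pipe.vae.enable_tiling()         # tiled VAE decode helps on consumer GPUs

video = pipe(
    prompt="a paper boat drifting down a rain-soaked street, cinematic",
    num_frames=81,               # frame count and fps vary by checkpoint
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "cogvideox_sample.mp4", fps=16)
```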
Krea AI Adds LoRA Training Feature
Krea AI has expanded its platform with a new LoRA (Low-Rank Adaptation) training feature, allowing users to create custom AI models from as few as three reference images, regardless of their computing capabilities.
finally, AI trainings arrived to Krea.
train an AI model and teach it about your own characters, styles, products, and more.
sharing 100 early invites below 👇
— KREA AI (@krea_ai)
1:11 PM • Nov 7, 2024
Training requires minimal setup - just upload images and adjust basic parameters
Supports both style and character training modes
Includes user-friendly interface with simplified parameter controls
Subscription priced at $10/month includes 720 Flux images and 36,000 real-time images
Commercial usage rights included with subscription
Mochi Video Generation Achieves 6.8-Second Video on Consumer GPU
u/jonesaid has demonstrated the Mochi text-to-video model generating a high-quality 163-frame video (6.8 seconds) on a consumer-grade NVIDIA RTX 3060 12GB GPU, with impressive temporal coherence despite the hardware limitations.
Successfully generated 163 frames with good motion continuity using spatial tiling technique
Generation time: ~1 hour 17 minutes for sampling, 22 minutes for VAE decoding
Implements 16x8 tiling in the VAE decode to handle memory constraints (see the sketch below)
Model uses FP8 scaling for efficient memory usage
Currently optimized for 480p output, with 720p model announced
Requires ComfyUI as interface with specific nodes for Mochi integration
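The tiling trick is what makes the 12GB run possible: decoding the full latent video in one pass would exceed VRAM, so the decode is split into spatial tiles. Here is a simplified PyTorch sketch of the idea; `vae` stands for any decoder with a `decode` method, and this is an illustration, not Mochi's actual implementation.

```python
import torch

def tiled_vae_decode(vae, latents, tiles_h=16, tiles_w=8):
    """Decode a (B, C, T, H, W) latent tensor tile by tile to cap peak VRAM.

    Simplified illustration of the 16x8 spatial tiling idea: production
    decoders also overlap adjacent tiles and blend the seams, which is
    omitted here for brevity.
    """
    _, _, _, H, W = latents.shape
    th, tw = H // tiles_h, W // tiles_w
    rows = []
    for i in range(tiles_h):
        cols = []
        for j in range(tiles_w):
            tile = latents[..., i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            with torch.no_grad():
                cols.append(vae.decode(tile))  # only one tile resident at a time
        rows.append(torch.cat(cols, dim=-1))   # stitch tiles back along width
    return torch.cat(rows, dim=-2)             # then along height
```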
Regional Prompting Released for Flux Model
A new open-source tool enables regional prompting for the Flux image generation model, allowing users to specify different prompts for distinct areas within a single image and improving control over composition and multi-character generation. A concept sketch follows the feature list below.
Provides precise control over image composition by assigning prompts to specific regions
Compatible with ControlNet and LoRA models
More flexible than traditional positioning prompts or ControlNet alone
Particularly useful for multi-character scenes and complex compositions
Community implementation for ComfyUI in development
Adjustable parameters to control region blending and boundaries
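Under the hood, regional prompting typically works by masking cross-attention so each text prompt only influences the image tokens inside its region. The toy PyTorch sketch below shows that idea in isolation; it is deliberately simplified (the text embeddings stand in for both keys and values, and the usual projection layers are skipped) and is not the released tool's code.

```python
import torch

def regional_cross_attention(q, prompt_embeds, region_masks):
    """Blend attention results from several prompts by spatial region.

    q:             (B, HW, D) image-token queries.
    prompt_embeds: list of (B, L, D) text embeddings, one per prompt.
    region_masks:  list of (HW,) bool masks marking the image tokens
                   each prompt controls; masks should tile the image.
    """
    out = torch.zeros_like(q)
    scale = q.shape[-1] ** -0.5
    for emb, mask in zip(prompt_embeds, region_masks):
        attn = torch.softmax(q @ emb.transpose(1, 2) * scale, dim=-1)  # (B, HW, L)
        out[:, mask] = (attn @ emb)[:, mask]  # this prompt writes only to its region
    return out
```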
DimensionX LoRA for I2V - 3D Camera Movement from 2D Images
A new LoRA for CogVideoX allows users to create smooth 3D camera orbits from single 2D images, enabling animated rotations while maintaining visual consistency. The implementation works within ComfyUI's workflow system; a diffusers-based sketch follows the links below.
Requires ComfyUI with CogVideo wrapper installed
Compatible with NVIDIA GPUs (specific features require 4090 or newer)
Processing time: 3-5 minutes on NVIDIA 4090
Includes interpolation options via GIMM-VFI for smoother results
Current release focuses on left orbit camera movement, with more camera moves planned
Reddit Thread | DimensionX GitHub Repository | ComfyUI-GIMM-VFI Repository | ComfyUI-CogVideoXWrapper
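If you'd rather experiment outside ComfyUI, loading an orbit LoRA into diffusers' CogVideoX image-to-video pipeline should look roughly like this; the repo id and weight-file name are assumptions, so verify them against the DimensionX GitHub page.

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16
)
# Repo and file names are assumptions -- check the DimensionX release for the real ones.
pipe.load_lora_weights(
    "wenqsun/DimensionX", weight_name="orbit_left_lora_weights.safetensors"
)
pipe.enable_model_cpu_offload()

image = load_image("scene.png")
frames = pipe(
    image=image,
    prompt="camera orbits slowly to the left around the scene",
    num_frames=49,
).frames[0]
export_to_video(frames, "orbit_left.mp4", fps=8)
```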
Google’s ReCapture: One-Click Generation of 'Multi-Camera' Video Blockbusters
Google Research has introduced ReCapture, a technique that lets users re-experience their own videos from entirely new perspectives. It generates a version of the provided video along a custom camera trajectory, letting viewers observe the content from angles not present in the original footage while preserving the original motion of the characters and scenes.
Transforms existing videos with new camera angles without reshooting
Maintains temporal consistency and motion quality of original footage
Works with common video types - no special filming requirements
Combines AI diffusion models with advanced video refinement
Real-time preview and adjustment of camera trajectories
Free FLUX.1-schnell Frontend on Render-OS
A new web interface for the FLUX.1-schnell model that uses Hugging Face's API, allowing users to generate AI images through a simplified frontend. The service now supports user-provided Hugging Face tokens for personal usage limits; a direct-API example follows the list below.
Uses Hugging Face's free API for FLUX.1-schnell model
Requires personal Hugging Face token for best results
Limits: Up to 1,000 images per day and 300 per hour with personal token
Simple web interface for easier access than direct API usage
Planned features include image-to-image, image-to-video, and sketch-to-image
All data stored locally in browser, no external database
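The frontend is essentially a wrapper over Hugging Face's hosted inference, so you can hit the same model directly from Python with the official huggingface_hub client, for example:

```python
from huggingface_hub import InferenceClient

# A personal token raises your rate limits, as the frontend notes.
client = InferenceClient(model="black-forest-labs/FLUX.1-schnell", token="hf_...")

image = client.text_to_image("a lighthouse at dusk, gouache style")  # returns a PIL image
image.save("lighthouse.png")
```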
FLUX 1.1 Pro Adds Ultra and Raw Modes
Black Forest Labs has released new modes for FLUX 1.1 Pro, their advanced text-to-image model. The update adds Ultra and Raw modes, with improved prompt adherence compared to previous versions, particularly at higher CFG values; see the API sketch after the list below.
Available through API services like fal.ai and Replicate
Demonstrates better prompt following than development version
No local installation option available
Accessible through the Synthopic platform
Higher CFG settings possible compared to standard version
Currently focused on API improvements rather than open releases
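With no local release, usage goes through hosted APIs. Here is a minimal sketch with Replicate's Python client; the model slug and the `raw` flag are assumptions based on Replicate's listing, so treat the exact input names as unverified and check the model page.

```python
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN in the environment

# Input names ("prompt", "raw") are assumptions -- verify on the model page.
output = replicate.run(
    "black-forest-labs/flux-1.1-pro-ultra",
    input={"prompt": "candid portrait, overcast daylight", "raw": True},
)
print(output)  # URL (or file-like object, depending on client version) of the image
```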
ComfyUI Custom Nodes: Depth-Aware Particle Simulations Released
A new suite of custom nodes for ComfyUI that enables depth-aware particle simulations for AI-generated visuals. The package includes particle simulators and visualization tools with depth perception capabilities.
Open source custom node suite for ComfyUI
Features particle simulation with depth awareness
Includes multiple visualization setups
Complete with workflow examples and tutorial
Compatible with future text-to-video implementations
Fish Agent V0.1 3B - Open Source Real-Time Voice Cloning Model
Fish Audio has released a new text-to-speech model capable of instant voice cloning and multilingual speech generation. Using a "semantic-free token" architecture and trained on 200 billion voice and text tokens, the model achieves remarkably fast text-to-audio conversion in just 200ms.
Supports 8 languages: English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic
Zero-shot voice cloning capability (no training needed)
Lightweight 3B parameter model makes it developer-friendly
Built on Qwen-2.5-3B-Instruct base model
Open-sourced with demo available on Hugging Face
ComfyAI.run - Convert ComfyUI Workflows into Web Applications
A new cloud service that allows users to convert ComfyUI workflows into hosted web applications, enabling 24/7 accessibility and scalability without requiring local setup.
Cloud-based service, no local ComfyUI instance required
Free tier includes 72-hour file storage
Example applications include professional headshot and superhero photo makers
Built using Next.js framework
Planned features include API access and custom domain support
Premium accounts available with extended file storage options
Looking to elevate your brand with a custom hat?
Don’t make it just any hat. Make it a Branded Bills hat.
Why? They make quality merch (especially hats) that people actually want to wear.
Don’t just take it from us, here is what one of their recent clients has to say:
“I think Branded Bills is memorable because it is more of a lifestyle brand for us. It’s not just employees wearing them to work or technicians wearing them out in the field; they're wearing them to bowling alleys, sporting events, etc, and they’re repping our brand which is exactly what we want.” Rebecca Ferguson, Empire CAT
Want some more? Get $200 off your first order over $1000 with the code NEWSLETTER.
If this is what you want (and we’re guessing you do), hit up Branded Bills.