
$1 M AI Painting, AI Radio Takeover, Open Source Voice Clone | This Week in AI Art 🎙️

Cut through the noise, stay informed — new stories every Sunday.

Interesting find of the week. If you haven't seen The Line Studio's "Dear Alice" animation, I'd highly recommend it - it's a charming short that reimagines robots as partners in creating a better future, rather than the threats we often see in sci-fi.

AI TAKES OVER POLISH STATE RADIO

Off Radio Kraków, a state-owned Polish radio station, has sparked nationwide controversy by becoming the first station in Poland to be run entirely by AI hosts after dismissing its human journalists. The station introduced three AI presenters - Jakub "Kuba" Zieliński, Emilia "Emi" Nowak, and Alex Szulc - each with a specialized content area. The station maintains the change was driven by declining listenership rather than a deliberate AI replacement strategy, but an open letter of protest from former journalist Mateusz Demski quickly gathered 15,000 signatures, underscoring how contentious the decision was.

An AI-generated image of one of Off Radio Kraków’s new AI presenters Jakub “Kuba” Zieliński. Credit: Off Radio Kraków.

While AI presenters aren't entirely new to radio - Futuri Media's RadioGPT launched in the US in 2023, and Radio City in India features an AI host named SIA - what sets the Polish case apart is its timing and context. Off Radio Kraków's decision was particularly controversial because it coincided with the termination of human staff and involved a state-funded broadcaster spending taxpayer money. The situation has drawn attention from government officials, including Deputy Prime Minister Krzysztof Gawkowski, who warned about AI being used "against people rather than for them" - a stark example of how AI can disrupt employment even in creative fields traditionally thought to require a human touch.

MACHINE MUSE: AI-DA’S $1 M PAINTING

A humanoid robot named Ai-Da has made history by creating a portrait of Alan Turing that sold for $1.084 million at Sotheby's auction house, far exceeding its initial estimate of $120,000-$180,000. This marks the first time an artwork created by a humanoid robot has been sold at auction.

Ai-Da Robot’s 'A.I. God' (right) is a large-scale original portrait of Alan Turing.

The artwork, titled "A.I. God," depicts Alan Turing, who is considered the father of artificial intelligence and played a crucial role in breaking Nazi codes during World War II. The piece was created through a multi-step process:

  • Ai-Da made 15 individual paintings of different parts of Turing's face

  • The robot selected three portraits plus a painting of the Bombe machine

  • These were combined using AI and finished with a 3D textured printer

  • Ai-Da added final marks and textures to complete the work

Ai-Da Robot is the creation of Aidan Meller, who works with a team of about 30 people. The robot, designed to look like a woman with a bob haircut, is named after Ada Lovelace, the world's first computer programmer. According to Meller, the project aims to stimulate discussion about AI ethics and challenge our understanding of artistry in the age of artificial intelligence.

The piece was purchased by an anonymous American buyer after receiving 27 bids. The proceeds will be reinvested in improving Ai-Da's capabilities, with Meller noting that the robot is already on its third painting arm.

This sale represents a significant moment in the intersection of AI technology and the art market, though it's worth noting that AI-created art has sold at auction before, including a 2018 Christie's sale of an algorithm-generated painting for $432,500.

PUT THIS ON YOUR RADAR

CogVideoX v1.5: Advanced Open-Source Video Generation

Zhipu AI has released CogVideoX v1.5, an enhanced open-source video generation model that pairs with their new CogSound audio model on the Qingying platform. A minimal usage sketch follows the feature list below.

  • Supports dual resolution modes: 4K/60FPS (10-second videos) and 768P/16FPS

  • Supports arbitrary aspect ratios

  • Multi-channel generation (4 videos per prompt)

  • Two specialized models: CogVideoX v1.5-5B and CogVideoX v1.5-5B-I2V

  • Enhanced image-to-video (I2V) conversion

  • Integrated AI-generated sound effects via CogSound

  • CogVLM2-caption for accurate content description and prompt understanding

  • 3D VAE technology for improved content coherence

  • Advanced motion coherence and complex prompt processing
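
For the open-weights side of the release, generation can be driven through Hugging Face diffusers. A minimal sketch, assuming the "THUDM/CogVideoX1.5-5B" checkpoint and the standard CogVideoXPipeline; exact frame-count and resolution defaults may differ from the hosted Qingying service:

```python
# Minimal text-to-video sketch with the open CogVideoX v1.5 weights via diffusers.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades speed for much lower VRAM use

video = pipe(
    prompt="A paper boat drifting down a rain-soaked street, cinematic lighting",
    num_frames=81,            # assumption: ~5 s at 16 fps
    guidance_scale=6.0,
    num_inference_steps=50,
).frames[0]

export_to_video(video, "boat.mp4", fps=16)
```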

Krea AI Adds LoRA Training Feature

Krea AI has expanded its platform with a new LoRA (Low-Rank Adaptation) training feature, allowing users to create custom AI models from as few as three reference images, regardless of their local computing capabilities. For what LoRA actually does under the hood, see the sketch after the list.

  • Training requires minimal setup - just upload images and adjust basic parameters

  • Supports both style and character training modes

  • Includes user-friendly interface with simplified parameter controls

  • Subscription priced at $10/month includes 720 Flux images and 36,000 real-time images

  • Commercial usage rights included with subscription
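
Krea hasn't published its trainer, but the technique the feature is named after is compact enough to sketch: the pretrained weight matrix stays frozen and only a low-rank update is trained, which is why a handful of reference images can be enough. A conceptual PyTorch sketch (not Krea's implementation):

```python
# Conceptual LoRA layer: freeze the base Linear, train only the low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12,288 trainable parameters vs ~590k in the frozen base layer
```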

Mochi Video Generation Achieves 6.8 Second Video on Consumer GPU

Reddit user u/jonesaid has demonstrated Genmo's Mochi text-to-video model generating a high-quality 163-frame video (6.8 seconds) on a consumer-grade NVIDIA RTX 3060 12GB GPU, showing impressive temporal coherence despite the hardware limitations. The tiling trick that makes this possible is sketched after the list.

  • Successfully generated 163 frames with good motion continuity using spatial tiling technique

  • Generation time: ~1 hour 17 minutes for sampling, 22 minutes for VAE decoding

  • Implements 16x8 tiling in VAE decode to handle memory constraints

  • Model uses FP8 scaling for efficient memory usage

  • Currently optimized for 480p output, with 720p model announced

  • Requires ComfyUI as interface with specific nodes for Mochi integration
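
The memory trick doing the heavy lifting here is the tiled VAE decode: rather than decoding the whole latent video at once, the decoder runs on small spatial tiles and the results are stitched together, so peak VRAM scales with tile size instead of frame size. A conceptual sketch (real implementations, including the ComfyUI nodes, also overlap and blend tile borders to hide seams):

```python
# Conceptual tiled VAE decode; vae.decode stands in for any latent-video decoder.
import torch

def tiled_decode(vae, latent, tiles_h=16, tiles_w=8):
    # latent: (batch, channels, frames, H, W); assumes H, W divisible by tile counts
    H, W = latent.shape[-2:]
    th, tw = H // tiles_h, W // tiles_w
    rows = []
    for i in range(tiles_h):
        cols = []
        for j in range(tiles_w):
            tile = latent[..., i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            with torch.no_grad():
                cols.append(vae.decode(tile))  # each small call fits in VRAM
        rows.append(torch.cat(cols, dim=-1))   # stitch along width
    return torch.cat(rows, dim=-2)             # stitch along height
```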

Regional Prompting Released for Flux Model

A new open-source tool enables regional prompting for the Flux image generation model, letting users assign different prompts to distinct areas within a single image for better composition control and multi-character generation. The basic mechanism is sketched after the list.

  • Provides precise control over image composition by assigning prompts to specific regions

  • Compatible with ControlNet and LoRA models

  • More flexible than traditional positioning prompts or ControlNet alone

  • Particularly useful for multi-character scenes and complex compositions

  • Community implementation for ComfyUI in development

  • Adjustable parameters to control region blending and boundaries
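
The released tool works at the attention level inside Flux's transformer blocks; a simpler variant of the same idea, easier to show in a few lines, blends per-prompt noise predictions with spatial masks at each denoising step. A conceptual sketch, where denoiser stands in for any diffusion denoising model:

```python
# Conceptual regional prompting via masked blending of noise predictions.
import torch

def regional_denoise(denoiser, latent, t, prompt_embeds, masks):
    # prompt_embeds: one text embedding per region
    # masks: matching (1, 1, H, W) tensors that sum to 1 at every pixel
    out = torch.zeros_like(latent)
    for emb, mask in zip(prompt_embeds, masks):
        out = out + mask * denoiser(latent, t, encoder_hidden_states=emb)
    return out  # each image region follows its own prompt
```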

DimensionX LoRA for I2V - 3D Camera Movement from 2D Images

A new LoRA for CogVideoX lets users create smooth 3D camera orbits from single 2D images, enabling animated rotations while maintaining visual consistency. The implementation works within ComfyUI's workflow system.

  • Requires ComfyUI with CogVideo wrapper installed

  • Compatible with NVIDIA GPUs (specific features require 4090 or newer)

  • Processing time: 3-5 minutes on NVIDIA 4090

  • Includes interpolation options via GIMM-VFI for smoother results

  • Current release focuses on left orbit camera movement, with more camera moves planned

Google’s ReCapture: New Camera Trajectories for Existing Videos

Google Research has introduced ReCapture, a technique that regenerates an existing video along a new, user-specified camera trajectory. The result lets viewers watch the content from angles that were never present in the original footage, while preserving the original motion of the characters and scene.

  • Transforms existing videos with new camera angles without reshooting

  • Maintains temporal consistency and motion quality of original footage

  • Works with common video types - no special filming requirements

  • Combines AI diffusion models with advanced video refinement

  • Real-time preview and adjustment of camera trajectories

Free FLUX.1-schnell Frontend on Render-OS

A new web interface for the FLUX.1-schnell model that uses Hugging Face's free Inference API, letting users generate AI images without touching the API directly. The service now supports user-provided Hugging Face tokens for personal usage limits; a minimal example of the underlying API call follows the list.

  • Uses Hugging Face's free API for FLUX.1-schnell model

  • Requires personal Hugging Face token for best results

  • Limits: up to 1,000 images per day and 300 per hour with a personal token

  • Simple web interface for easier access than direct API usage

  • Planned features include image-to-image, image-to-video, and sketch-to-image

  • All data stored locally in browser, no external database
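
The call the frontend wraps is a plain POST to Hugging Face's serverless Inference API, which returns raw image bytes. A minimal sketch (standard HF text-to-image endpoint; supply your own token):

```python
# Generate an image with FLUX.1-schnell via Hugging Face's free Inference API.
import requests

API_URL = "https://api-inference.huggingface.co/models/black-forest-labs/FLUX.1-schnell"
HF_TOKEN = "hf_..."  # your personal Hugging Face token

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": "a lighthouse at dusk, watercolor"},
    timeout=120,
)
resp.raise_for_status()
with open("lighthouse.png", "wb") as f:
    f.write(resp.content)  # the API returns raw image bytes on success
```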

FLUX 1.1 Pro Adds Ultra and Raw Modes

Black Forest Labs has released new modes for FLUX 1.1 Pro, their advanced text-to-image model. The update adds Ultra and Raw modes, with improved prompt adherence over previous versions, particularly at higher CFG values. A sample API call follows the list.

  • Available through API services like fal.ai and Replicate

  • Demonstrates better prompt following than development version

  • No local installation option available

  • Accessible through Synthopic platform

  • Higher CFG settings possible compared to standard version

  • Currently focused on API improvements rather than open releases
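
With no local release, access goes through hosted APIs. A sketch using the Replicate Python client; the model slug and the raw flag match Replicate's listing at the time of writing, but check the model page before relying on them:

```python
# Call FLUX 1.1 Pro Ultra in Raw mode via Replicate (needs REPLICATE_API_TOKEN set).
import replicate

output = replicate.run(
    "black-forest-labs/flux-1.1-pro-ultra",
    input={
        "prompt": "portrait of a glassblower at work, natural light",
        "raw": True,            # Raw mode: a less processed, more photographic look
        "aspect_ratio": "3:4",
    },
)
print(output)  # URL of the generated image
```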

ComfyUI Custom Nodes: Depth-Aware Particle Simulations Released

A new suite of custom nodes for ComfyUI enables depth-aware particle simulations for AI-generated visuals. The package includes particle simulators and visualization tools with depth perception capabilities; the core idea is sketched after the list.

  • Open source custom node suite for ComfyUI

  • Features particle simulation with depth awareness

  • Includes multiple visualization setups

  • Complete with workflow examples and tutorial

  • Compatible with future text-to-video implementations
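
The node pack itself is more involved, but the core "depth-aware" idea is simple: sample a depth map at each particle's position and let it modulate the motion, so foreground particles behave differently from background ones. A conceptual NumPy sketch (not the released nodes' implementation):

```python
# Conceptual depth-aware particle step: nearer particles move faster (parallax).
import numpy as np

def step_particles(pos, vel, depth_map, dt=1.0):
    # pos, vel: (N, 2) arrays of x/y positions and velocities in pixels
    # depth_map: (H, W) array, 0 = far, 1 = near
    h, w = depth_map.shape
    rows = np.clip(pos[:, 1].astype(int), 0, h - 1)
    cols = np.clip(pos[:, 0].astype(int), 0, w - 1)
    speed = 0.2 + 0.8 * depth_map[rows, cols]  # scale speed by local depth
    return pos + vel * speed[:, None] * dt
```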

Fish Agent V0.1 3B - Open Source Real-Time Voice Cloning Model

Fish Audio has released a new text-to-speech model capable of instant voice cloning and multilingual speech generation. Built on a "semantic-free token" architecture and trained on 200 billion voice and text tokens, the model converts text to audio in just 200 ms. A snippet for grabbing the open weights follows the list.

  • Supports eight languages: English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic

  • Zero-shot voice cloning capability (no training needed)

  • Lightweight 3B parameter model makes it developer-friendly

  • Built on Qwen-2.5-3B-Instruct base model

  • Open-sourced with demo available on Hugging Face
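
The weights are openly hosted on Hugging Face; a minimal way to pull them locally with huggingface_hub, assuming the repo id from the announcement (inference itself goes through Fish Audio's own fish-speech tooling, not shown here):

```python
# Download the open Fish Agent weights from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("fishaudio/fish-agent-v0.1-3b")
print(local_dir)  # local path to the downloaded model files
```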

ComfyAI.run - Convert ComfyUI Workflows into Web Applications

A new cloud service that allows users to convert ComfyUI workflows into hosted web applications, enabling 24/7 accessibility and scalability without requiring local setup.

  • Cloud-based service, no local ComfyUI instance required

  • Free tier includes 72-hour file storage

  • Example applications include professional headshot and superhero photo makers

  • Built using Next.js framework

  • Planned features include API access and custom domain support

  • Premium accounts available with extended file storage options

Looking to elevate your brand with a custom hat?

Don’t make it just any hat. Make it a Branded Bills hat.

Why? They make quality merch (especially hats) that people actually want to wear.

Don’t just take it from us, here is what one of their recent clients has to say:

“I think Branded Bills is memorable because it is more of a lifestyle brand for us. It’s not just employees wearing them to work or technicians wearing them out in the field; they're wearing them to bowling alleys, sporting events, etc., and they’re repping our brand, which is exactly what we want.” - Rebecca Ferguson, Empire CAT

Want some more? Get $200 off your first order over $1000 with the code NEWSLETTER.

If this is what you want (and we’re guessing you do), hit up Branded Bills.
