
AI in Filmmaking, X's txt2img (Flux.1 PRO), and FLUX Updates | This Week in AI Art 🎬

Cut through the noise, stay informed — new stories every Sunday.

This week, the major Flux updates get their own section, while the LoRAs and other minor updates land in the ‘Put This On Your Radar’ section. And, well, the rest you can find online.

In this issue:

  • FLUX updates: low-VRAM techniques, GGUF quantization, NF4 v2, a union controlnet, new LoRAs, and more
  • AI tools in filmmaking: panel insights from SIGGRAPH 2024
  • An AI-powered brain implant restores speech for an ALS patient
  • X's Grok ships a largely unrestricted image generator built on Flux.1 PRO
  • Put This On Your Radar: VFusion3D, Imagen 3, LoRA training, and more


FLUX UPDATES

Here come the updates…

Low VRAM Flux: 3-4GB GPU technique

  • A technique that allows users to run Flux on cards with as little as 3-4GB of VRAM (a rough scripted equivalent is sketched after this list).

    • Software: Use SD-Forge WebUI

    • Model: Use NF4 (4-bit quantized) version of Flux

    • NVIDIA settings: Enable "Prefer system fallback" for CUDA

    • Resolution: Start with 512x512 or 512x768

    • Steps: Use fewer steps (15-20) to reduce generation time

    • Drivers: Install latest NVIDIA drivers and CUDA toolkit
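If you'd rather script this than click through SD-Forge, here's a rough diffusers equivalent of the same recipe. This is a minimal sketch, assuming a recent diffusers release with FluxPipeline support; the offload calls trade speed for memory and mirror the tips above rather than anything SD-Forge does internally.

```python
# Minimal low-VRAM Flux sketch with diffusers (assumed: diffusers >= 0.30).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Stream weights through the GPU piece by piece instead of keeping the
# whole ~12B-parameter model resident -- slow, but fits very small cards.
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

image = pipe(
    "a lighthouse on a cliff at dusk",
    height=512, width=512,        # start small, per the tip above
    num_inference_steps=18,       # fewer steps to cut generation time
    guidance_scale=3.5,
).images[0]
image.save("flux_lowvram.png")
```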

GGUF quantization: Effective Flux compression

  • GGUF quantization, commonly used for large language models, has been successfully applied to Flux, a transformer-based image generation model. This technique allows for significant model compression with minimal quality loss. Initial tests suggest that Q8_0 quantization offers quality closer to fp16 than fp8, while Q4_0 outperforms nf4, potentially providing better options for users with limited VRAM. The GGUF quants are available here, and the respective nodes to load them can be found here.
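For a sense of why the quant choice matters, here's a back-of-envelope size estimate - my own arithmetic, assuming Flux's roughly 12B-parameter transformer and the standard GGUF block layouts (Q8_0 at roughly 8.5 bits/weight, Q4_0 at roughly 4.5 bits/weight):

```python
# Rough on-disk size of the Flux transformer at different precisions.
PARAMS = 12e9  # FLUX.1 is a ~12B-parameter model
for name, bits_per_weight in [("fp16", 16), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    gb = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
# fp16: ~24.0 GB, Q8_0: ~12.8 GB, Q4_0: ~6.8 GB
```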

NF4 Flux v2: Improved quantization and efficiency

  • lllyasviel has released a refined version (v2) of NF4 Flux, featuring improved quantization, higher precision, and reduced computational overhead. It is available on CivitAI and Hugging Face.

Union controlnet: Multi-mode FLUX.1 control

  • An alpha version of a union controlnet for the FLUX.1 dev model has been released by InstantX. This union controlnet combines multiple control modes - canny, tile, depth, blur, pose, gray, and low quality - into a single model. At 7.3GB, it may require more powerful GPUs for optimal performance. The controlnet can be accessed via Hugging Face; a ComfyUI wrapper is also available - see this Reddit comment for instructions.

via InstantX
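For diffusers users, loading a union-style Flux controlnet looks roughly like the sketch below. Treat it as a sketch rather than InstantX's documented recipe: the repo ID, the control_mode numbering (taken from the mode order listed above), and the conditioning scale are assumptions to verify against the model card.

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Repo ID is an assumption -- check InstantX's page on Hugging Face.
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Union-alpha", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")

control_image = load_image("canny_edges.png")  # a preprocessed canny map
image = pipe(
    "a futuristic city at night",
    control_image=control_image,
    control_mode=0,                      # assumed: 0 = canny in the union mode list
    controlnet_conditioning_scale=0.5,
    num_inference_steps=24,
    guidance_scale=3.5,
).images[0]
image.save("union_controlnet.png")
```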

X-Labs LoRAs: Six new FLUX.1 style adaptations

  • X-Labs has released six new Low-Rank Adaptation (LoRA) models for the FLUX.1-dev text-to-image model, covering styles such as furry, anime, Disney, scenery, and art. These LoRAs are distributed under a non-commercial license. The models can be accessed via Hugging Face, and the ComfyUI wrapper can be found here.

via X-Labs
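In diffusers, applying one of these LoRAs would look something like the snippet below - a sketch that assumes the weight file names on the Hugging Face repo, and that the weights load (or have been converted) into diffusers' LoRA format:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Weight file name is a guess -- pick the style you want from the repo.
pipe.load_lora_weights(
    "XLabs-AI/flux-lora-collection", weight_name="anime_lora.safetensors"
)
pipe.fuse_lora(lora_scale=0.9)  # bake the LoRA in at ~90% strength

image = pipe(
    "a cozy cabin in a snowy forest, anime style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("xlabs_anime.png")
```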

Civitai: Flux LoRA training now available

  • Civitai, a popular platform for AI image-generation resources, now supports Flux LoRA training directly on the site. Users can choose between two training engines when starting a session: Kohya (the default) and X-Flux. Training a Flux LoRA currently costs 2,000 Buzz, equivalent to approximately $2 USD.

HOW TO: Flux Realism

  • FLUXRealisticV1 is a new checkpoint for the FLUX model, trained on over 7,000 images to create more diverse and realistic people and scenes. It demonstrates improvements in anatomy, facial features, overall realism, and scene composition compared to the base FLUX model. The checkpoint includes pre-baked CLIP and VAE components. The model tends to produce images with flatter lighting and more muted colors, which some users perceive as more realistic while others view as less vibrant compared to the original FLUX model. The checkpoint can be found here.

AI Tools in Filmmaking

Let’s take a breather from Flux.

At SIGGRAPH 2024 in Denver, film and VFX experts gathered to discuss the current limitations and future potential of AI in filmmaking. We explore key insights from Nikola Todorovic (Wonder Dynamics), Freddy Chavez Olmos (Boxel Studio), and Michael Black (Max Planck Institute) on how AI tools are shaping movie production. From debunking misconceptions about "one-click solutions" to envisioning entirely new genres of entertainment, the panel offers a balanced view of AI's role in cinema, its potential to democratize the industry, and the enduring importance of traditional storytelling elements.

Credit: Devin Coldewey

So, I keep hearing about AI in movies. What's really going on with AI in filmmaking right now?

AI tools in filmmaking are still in their infancy, lacking the ability to produce final VFX shots or complete films with a single click. Nikola Todorovic of Wonder Dynamics points out a common "misperception of AI that it's a one-click solution," highlighting that editability remains crucial. Language-based prompts for AI systems struggle to fully capture complex visual concepts, with Michael Black from the Max Planck Institute noting that "humans actually have inside them a generative model of behavior" that's difficult to translate into words.

Looking to the future, Black predicts a potentially radical shift: "The real revolution... is we're going to see an entirely new genre of entertainment." He envisions a blend of film, video games, and interactive storytelling, suggesting that AI could fundamentally alter how we create and consume media.

Interesting. So who actually gets access to these tools? Does this mean only big studios with deep pockets can use AI in filmmaking?

AI may lower barriers to entry in filmmaking, allowing participation from those outside traditional hubs. Freddy Chavez Olmos from Boxel Studio, who left Mexico due to limited opportunities, sees AI as potentially providing "that same opportunity for people who don't need to go overseas to do it."

However, this democratization comes with challenges. Michael Black cautions that while AI tools may be widely accessible, "The number of people making really good films is still going to be small." There's also concern about an initial "uncanny valley" effect with AI-generated films. Chavez Olmos predicts a reaction similar to early CGI films like "Final Fantasy" or "The Polar Express," where "something's going to be not quite there yet, but people are going to start accepting these films."

That's encouraging. But can AI actually replace human creativity in making films? What are the experts saying?

Despite technological advancements, core elements of traditional filmmaking are expected to retain their importance. Michael Black emphasizes the enduring value of storytelling: "It's all about story. It's all about connecting to the characters. It's about heart." He argues that if a movie has heart, audiences will connect regardless of whether the characters are AI-generated or human.

Human actors are also likely to remain significant. Black notes, "There's an excitement to knowing it's real humans like us, but like way better than us, to see a human at the peak of their game, it inspires all of us, and I don't think that's going to go away."

AI-Powered Brain Implant Restores Hope for ALS Patient

Here’s our feel-good story of the week.

Researchers at the University of California, Davis have made a breakthrough in using AI and brain implants to restore speech in ALS patients. Casey Harrell, a 46-year-old climate activist with ALS, received an experimental brain implant that has allowed him to regain the ability to communicate using a computer-generated version of his own voice. The technology has enabled Harrell to tell his 5-year-old daughter he loves her, banter with his wife, work more productively, and even express complex thoughts using a vocabulary of nearly 6,000 unique words.

What are the key technological advancements and results described in this article about using AI and brain implants for speech restoration?

Researchers at the University of California, Davis, implanted four electrode arrays into Casey Harrell's brain, each with 64 spikes to detect neural signals related to speech. This system, combined with AI similar to language models like ChatGPT, achieved high accuracy in interpreting Harrell's attempted speech. The device demonstrated 99.6% accuracy for a 50-word vocabulary three weeks after surgery, then expanded to recognize a 125,000-word vocabulary with 90% accuracy. Over eight months, it sustained 97.5% accuracy across nearly 6,000 unique words. The system also recreated Harrell's pre-ALS voice using old recordings. Dr. Sergey Stavisky highlighted the key innovation as placing more arrays with precise targeting into the most speech-related parts of the brain.

How has this technology impacted the patient's life, and what are its current limitations?

The brain implant has significantly improved Casey Harrell's quality of life, enhancing his ability to communicate with family and friends. He can now express complex thoughts and emotions, tell his daughter he loves her, and work more productively. The technology has allowed him to reconnect with old friends and express himself more fully in social situations. However, limitations exist. It's unclear whether this technology would be as effective for people with more severe paralysis than Harrell. Additionally, there are significant financial barriers to accessing such advanced treatments, highlighting the challenging financial realities faced by ALS patients.

This is a big deal. Beyond helping this one person, what could this mean for healthcare and society in general?

This research demonstrates the potential of brain-computer interfaces to restore lost functions, particularly speech, for individuals with neurological conditions. It highlights rapid advancements in technology that seemed like science fiction just years ago. However, the case also underscores significant societal challenges, including accessibility and equity in healthcare. The financial burdens associated with ALS care raise questions about access to cutting-edge treatments. The research also prompts ethical considerations about the integration of AI and brain-computer interfaces in healthcare, including data privacy, long-term effects of brain implants, and the societal implications of such technologies becoming more widespread.

X Sparks Debate with Unrestricted AI Image Generator

Is anyone really surprised? This is Elon Musk we are talking about.

X's new Grok chatbot feature allowing Premium subscribers to generate and publish AI-created images with minimal restrictions has ignited controversy. While media outlets like The Verge raise concerns about potential misuse, many in the tech community view it as an inevitable advancement in AI technology. This development highlights the growing tension between content moderation, free speech, and the rapid evolution of AI capabilities. As we examine this story, we'll explore the differing perspectives and consider the broader implications for digital media and society.


I heard X launched some new AI image tool. What's the deal with that?

X has launched an image generator feature for its Grok chatbot, available to X Premium subscribers. The tool lets users create images from text prompts and publish them directly to X using the Flux.1 PRO model. Notably, the generator appears to have very few content restrictions, allowing users to create controversial and potentially offensive images, including those depicting violence, nudity, and political figures in compromising situations.

Community reaction shows that many users actually support this lack of restrictions. As one user puts it, "This is exactly what freedom looks like" (u/Olympus____Mons). Many community members express skepticism towards the media's framing of this as a problem, with one stating, "Only clowns want censorship. Let Grok be free" (u/nemoj_biti_budala).

Hmm, that sounds like it could be controversial. Are people worried about this? What are the main concerns?

The primary concern is the apparent lack of content restrictions on Grok's image generator. Unlike other major AI image generators, Grok seems to have few safeguards against creating controversial or potentially offensive content.

The community, however, offers a different perspective on these concerns. Many users argue that unrestricted AI image generation is inevitable and attempts to censor it are futile. As one commenter notes, "Self-hosted AI isn't really something you can stop" (u/Not_a_housing_issue). Several community members also point out that similar capabilities already exist in open-source models that can be run locally.

So what does this mean for the future? How might this change things on social media and beyond?

The community highlights several key implications of Grok's unrestricted image generation capability. Many users believe this will accelerate the need for media literacy, with u/moru0011 suggesting that people will need to become more skeptical of digital content overall. This shift could lead to what u/Specialist-Roof3381 describes as a "zero trust media model" - a potentially uncomfortable but necessary adaptation to the new reality of AI-generated content.

There's also concern among some community members that overreaction to this development could result in excessive regulation of AI technology. This ties into an ongoing debate about balancing free speech with the potential harm from misuse of the technology. As u/Bankbox007 puts it, "This feels like a sex ed vs abstinence thing. People need to learn what this tech is capable of, and trying to stifle it is only going to make people naive."

From a regulatory perspective, X is already under investigation by the European Commission for potential violations of the Digital Services Act. The introduction of this new, unrestricted image-generation feature could intensify that scrutiny.

Put This On Your Radar

VFusion3D: 3D Asset Generation from Single Image

VFusion3D (via Meta) is a new method for building scalable 3D generative models using pre-trained video diffusion models - it can generate 3D assets from a single image in seconds.

Google's Imagen 3: Advanced Text-to-Image AI

Google has released Imagen 3, its advanced text-to-image AI model. Google claims Imagen 3 sets new standards for image quality and detail, outperforming DALL-E 3 and Midjourney V6 in internal evaluations.


Personalized LoRA Model Training with Flux.1-dev

u/appenz trained a personalized LoRA model using the Flux.1-dev base model. He used Replicate's cloud service for training, which cost about $6.25 for 75 minutes on an A100 GPU. A hedged sketch of launching a comparable run via Replicate's Python client follows the parameter list below.

  • Key training parameters:

    • Used 20 training images (fewer images worked better than more)

    • 2,000 training steps

    • Learning rate of 0.0004

    • Images resized to 1024x1024
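As a rough illustration, kicking off a comparable run through Replicate's Python client might look like the snippet below. The trainer slug, version hash, and input keys are all hypothetical, modeled on the parameters above - check the actual trainer's schema on Replicate before running anything.

```python
import replicate  # pip install replicate; set REPLICATE_API_TOKEN

# Trainer slug, version hash, and input keys are all assumptions.
training = replicate.trainings.create(
    version="ostris/flux-dev-lora-trainer:<version-hash>",
    input={
        "input_images": "https://example.com/training_images.zip",  # ~20 photos
        "steps": 2000,
        "learning_rate": 4e-4,
        "resolution": 1024,
    },
    destination="your-username/your-flux-lora",
)
print(training.status)  # poll until the LoRA finishes training
```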

"Manual" App: Open-Source UI for ComfyUI

Yoel Gambera has released version 1.0.0 of their application "Manual" as open-source software. It is described as an advanced UI that uses ComfyUI as a backend for AI image generation.

SimpleTuner v0.9.8.1: Enhanced AI Model Fine-Tuning

A new version (v0.9.8.1) of SimpleTuner, a tool for fine-tuning AI models, has been released. This version produces high-quality results when fine-tuning Flux-dev models, particularly for creating LoRA (Low-Rank Adaptation) models. Highlights include:

  • Better preservation of Flux’s distillation capabilities

  • Ability to train multiple subjects into a single LoRA

  • Improved compatibility with inference platforms like AUTOMATIC1111/stable-diffusion-webui

Flux LoRA: RPG v6
Flux LoRA: Flat Color Anime v3.1
AuraFlow-v0.3 Release
Flux LoRA: Aesthetic LoRA for FLUX
Flux LoRA: Impressionist Landscape

If you're still here, consider sharing this newsletter.

And I'm curious: where do you stand on AI development?
