
AI Video Editing, Zero-Shot Voice Cloning, Custom Node for Emotion Analysis | This Week in AI Art 🫠

Cut through the noise, stay informed — new stories every Sunday.

Interesting find of the week. Check out Kat, an engineer who built a tool to visualize time-based media with gestures.


FLUX UPDATES

u/camenduru shared a demonstration of ControlNet outpainting using FLUX.1 Dev in ComfyUI. The post showcases a video of the process, which extends images beyond their original boundaries. The technique uses FLUX Controlnet Inpainting Alpha, with workflows provided for implementation, and also covers running the setup on RunPod. The method aims to extend images seamlessly, though some users noted visible seams where the original and generated content meet.
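If you are new to outpainting, the core trick behind workflows like this is simple: the source image is pasted onto a larger canvas, and a mask marks the new border region for the inpainting ControlNet to fill. Below is a minimal, framework-agnostic sketch of that canvas-extension step; it is not the shared ComfyUI workflow itself, and the padding value is arbitrary.

```python
from PIL import Image

def prepare_outpaint_canvas(image_path: str, pad: int = 256):
    """Place the source image on a larger canvas and build the mask
    (white = region for the model to generate, black = keep original)."""
    src = Image.open(image_path).convert("RGB")
    w, h = src.size

    canvas = Image.new("RGB", (w + 2 * pad, h + 2 * pad), (127, 127, 127))
    canvas.paste(src, (pad, pad))

    mask = Image.new("L", canvas.size, 255)             # everything is "generate"...
    mask.paste(Image.new("L", (w, h), 0), (pad, pad))   # ...except the original pixels
    return canvas, mask

# canvas and mask are then handed to an inpainting pipeline such as
# FLUX Controlnet Inpainting Alpha, which fills in the white border region.
canvas, mask = prepare_outpaint_canvas("input.png", pad=256)
```

The visible seams some users reported usually come from the hard mask edge; feathering the mask a few pixels at the boundary is a common mitigation.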

u/Current_Wind_2667 reports that Flux fine-tuning can now be performed with 10GB of VRAM, a significant reduction in hardware requirements that potentially brings fine-tuning within reach of mid-range GPUs. The post links to a GitHub pull request containing more technical details about this development. Note that while the VRAM requirement has dropped, the training process may still be time-consuming, potentially taking hours to complete.

u/lazyspock reports that using the Flux-Dev-Q5_1.gguf quantized model significantly improves performance on GPUs with 12GB VRAM, such as the NVIDIA RTX 3060. This version allows for faster image generation, even with multiple LoRAs loaded, and fits entirely in VRAM without requiring model reloads. The post claims there are no noticeable quality differences compared to the original FP16 model while offering substantial speed improvements and reduced system resource usage. The author links to a previous post that provides more detailed comparisons between different quantized versions. Other users in the comments suggest trying Q5_K_S or Q8 versions for potentially better performance, depending on the specific hardware configuration. The post highlights the trade-offs between model size, generation speed, and output quality, encouraging users to experiment with different quantized versions to find the optimal balance for their setup.
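The post is about running the GGUF checkpoint in a UI, but recent versions of diffusers (with the optional gguf dependency installed) can also load GGUF-quantized Flux transformers directly in a script. Here is a minimal sketch; the city96 repo path and Q5_1 filename are assumptions for illustration, not links taken from the post.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load only the transformer from a GGUF file; the other components stay in bf16.
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q5_1.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # helps keep a 12GB card from spilling into system RAM

image = pipe(
    "a lighthouse on a cliff at dusk, volumetric light",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_q5_1.png")
```

Trying a Q5_K_S or Q8 file, as commenters suggest, only changes the checkpoint path above; the trade-off is file size and VRAM use versus how closely the output tracks the FP16 model.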

u/CliffDeNardo shared new ControlNet models for Flux aimed at image enhancement, most notably an upscaler. The post highlights three models: a depth model, an upscaler, and a surface normals model. The upscaler in particular shows promise for improving image quality, especially on noisy images. Users reported success in ComfyUI, a popular node-based diffusion interface, by loading the control image, resizing it (typically 4x), and applying the ControlNet while generating at the upscaled resolution; a script-level sketch of the same idea follows the model links below. The technique appears to work best on noisy inputs and can introduce unnecessary sharpness on already-clean images. Performance varies with hardware, with reported processing times of 77 to 115 seconds for different image sizes on a 4060 Ti.

Surface Model

Upscaler Model

Depth Model
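For script users, the same recipe maps onto diffusers: resize the control image roughly 4x and generate at that resolution with the ControlNet attached. A minimal sketch follows; the model id is assumed to be the Hugging Face upscaler release these links point to, so verify it against the post before running.

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Assumed model id for the Flux upscaler ControlNet; check the post's links.
controlnet = FluxControlNetModel.from_pretrained(
    "jasperai/Flux.1-dev-Controlnet-Upscaler", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

control = load_image("noisy_input.png")
w, h = control.size
control = control.resize((w * 4, h * 4))  # the ~4x resize step described above

image = pipe(
    prompt="",                            # the upscaler leans on the control image, not the prompt
    control_image=control,
    controlnet_conditioning_scale=0.6,
    num_inference_steps=28,
    guidance_scale=3.5,
    height=control.size[1],
    width=control.size[0],
).images[0]
image.save("upscaled.png")
```

If clean inputs come out over-sharpened, lowering controlnet_conditioning_scale is one knob to try.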

u/zer0int1 has developed fine-tuned versions of the CLIP-L and Long-CLIP models that are now fully integrated with the HuggingFace Diffusers pipeline. These models aim to improve text-to-image generation, particularly for longer prompts: the Long-CLIP variant extends the token limit from the standard 77 to 248, allowing longer text inputs to be handled without truncation. The author provides links to the models and an example script for using them in place of the stock OpenAI CLIP in Flux.1. The Diffusers integration improves compatibility across AI platforms and simplifies implementation. The author also notes that for very long prompts it can help to split the text description between the CLIP and T5 encoders.
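Because the fine-tunes ship in standard HuggingFace format, dropping one into a Flux pipeline mostly amounts to overriding the text_encoder component. A minimal sketch, with the repo id assumed from the author's Hugging Face account (the Long-CLIP variant additionally needs its extended 248-token tokenizer, which this sketch does not cover):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import FluxPipeline

# Assumed repo id for the fine-tuned CLIP-L; verify against the post's links.
repo = "zer0int/CLIP-GmP-ViT-L-14"
text_encoder = CLIPTextModel.from_pretrained(repo, torch_dtype=torch.bfloat16)
tokenizer = CLIPTokenizer.from_pretrained(repo)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=text_encoder,   # replaces the stock OpenAI CLIP-L
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe(
    "an intricate clockwork owl perched on a mossy branch, macro photo",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("clip_swap.png")
```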

JAMES CAMERON JOINS FORCES WITH STABILITY AI

James Cameron, the renowned filmmaker behind blockbusters like Avatar and Titanic, has joined Stability AI's Board of Directors. This move brings together Cameron's expertise in merging cutting-edge technology with storytelling and Stability AI's position as a leader in generative AI.

Stability AI CEO Prem Akkaraju stated, "James Cameron lives in the future and waits for the rest of us to catch up." Cameron himself expressed excitement about the potential of AI in filmmaking, saying, "The intersection of generative AI and CGI image creation is the next wave."

This appointment comes at a time when the film industry is increasingly exploring AI collaborations. For instance, Lionsgate recently partnered with Runway to develop an AI model trained on its catalog, while Sony Pictures Entertainment plans to use AI for cost savings.

This partnership could significantly impact the film industry. As one community member noted, "This gives Stability legitimacy in a world that is extremely skeptical about what they are trying to do. And for Cameron, there's no way he doesn't see this as the future of Hollywood and entertainment in general." The potential for AI to drastically reduce production costs is immense, with one commenter speculating, "Imagine creating Avatar using AI and cut cost from 200 millions to 200 thousands dollars."

However, some in the AI community express concern that Stability AI may be moving away from its open-source roots, potentially limiting broader access to future developments. As one commenter put it, "The only good thing might be adoption of new tools by the industry and maybe an awesome movie."

This collaboration between a Hollywood legend and a leading AI company signals a significant step towards integrating AI more deeply into the creative and visual media industries, though its full impact remains to be seen.

PUT THIS ON YOUR RADAR
