Diffusion Digest

FLUX, Deepfake Porn Bill, OpenAI VOICE MODE, Midjourney v6.1 | This Week In AI Art 🎨

Cut through the noise, stay informed — new stories every Sunday.

SO.MUCH.TO.COVER. This email may be clipped by your email provider, so 'read online' or click 'more' at the bottom if needed. There's also a 'listen online' feature now - great for multitasking or testing your tolerance for synthetic voices. That's it for this week, see you guys next Sunday. I'm going to bed (ᴗ˳ᴗ)…zzZ.

🖼️ Flux: New Open-Source AI Image Generator Rivals Industry Giants

Women lounging on lawns, hands that don't look like mutant crabs. FLUX has entered the chat. 

In a significant leap forward for open-source AI, Black Forest Labs has unveiled FLUX.1 - a groundbreaking text-to-image model boasting an impressive 12 billion parameters and built on a hybrid architecture of multimodal and parallel diffusion transformer blocks. This new contender in the AI image generation arena is already turning heads, with many users hailing it as a worthy rival to industry giants like DALL-E 3 and Midjourney.

  • FLUX.1 [pro]: Top-tier model, offering state-of-the-art performance in image generation. Available via API, Replicate, and Fal.ai.

  • FLUX.1 [dev]: Open-weight, guidance-distilled model for non-commercial applications. Available on HuggingFace, Replicate, and Fal.ai.

  • FLUX.1 [schnell]: Fastest model, designed for local development and personal use. Openly available under an Apache 2.0 license.

This level of quality doesn't come cheap in terms of computational resources, though; the model requires about 24GB of VRAM to run locally. However, the community has quickly rallied to make Flux more accessible to users with less powerful hardware. Two primary methodologies have emerged for running Flux locally on lower-end hardware:

  1. [12GB VRAM Approach] This method, shared by Far_Insurance4191, enables running the Flux model on graphics cards with 12GB of VRAM. It uses ComfyUI's built-in low-VRAM mode and compensates for the shortfall by spilling over into system RAM. While more accessible to average users, this approach may result in slower generation times. Implementation involves using the ComfyUI interface with specific settings and file placements.

  2. [16GB VRAM Approach] AmericanPresidentJimmyCarter shared a Python script enabling Flux to run on GPUs with at least 16GB VRAM. This advanced technique employs 8-bit quantization and model freezing using the optimum-quanto library. While this approach potentially offers better performance, it requires more technical expertise to implement compared to standard methods.
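The headline VRAM figures above follow directly from the parameter count. A quick back-of-envelope sketch (weights only - it ignores activations, the VAE, and the text encoders, so real usage runs higher):

```python
# Weights-only VRAM estimate for Flux's 12B-parameter transformer.
# Ignores activations, VAE, and text encoders, so actual usage is higher.
PARAMS = 12_000_000_000  # 12 billion parameters

def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Memory footprint of the weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9

bf16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit weights: 2 bytes each
int8_gb = weight_memory_gb(PARAMS, 1)  # 8-bit quantized: 1 byte each
print(f"bf16 weights: {bf16_gb:.0f} GB, int8 weights: {int8_gb:.0f} GB")
```

The bf16 figure lines up with the roughly 24GB requirement quoted above, and halving the weights to 8 bits is what lets the optimum-quanto approach squeeze onto 16GB cards.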

For those without access to such hardware, the Flux API presents an attractive alternative. At $0.025 per image, it undercuts the pricing of DALL-E 3.
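At volume the gap adds up. A small sketch - the $0.040 DALL-E 3 figure is the published standard-quality API rate at the time of writing, so treat it as an assumption to verify:

```python
# Per-batch cost comparison at flat per-image rates.
# The Flux rate is from the article; the DALL-E 3 rate is an assumed
# standard-quality API price - check OpenAI's current pricing page.
FLUX_RATE = 0.025    # dollars per image
DALLE3_RATE = 0.040  # dollars per image (assumption)

def batch_cost(rate: float, n_images: int) -> float:
    """Total cost in dollars for a batch at a flat per-image rate."""
    return rate * n_images

print(f"1,000 images: Flux ${batch_cost(FLUX_RATE, 1000):.2f} "
      f"vs DALL-E 3 ${batch_cost(DALLE3_RATE, 1000):.2f}")
```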

Looking ahead, the AI community is buzzing with anticipation about Flux's potential for further development. There's particular excitement about the possibility of integrating ControlNet and other enhancements, which could push the boundaries of what's possible with open-source image generation even further. However, the non-commercial license of the dev version may limit community contributions in terms of fine-tuning and creating extensive libraries of LoRAs, as seen with previous models.

🎮 AI in Game Development: 'Echoes of Somewhere' Pushes Boundaries

In the ever-evolving landscape of game development, an intriguing experiment is unfolding. Jussi Kemppainen has embarked on a journey to create a full game using AI-generated assets. His project, aptly named "Echoes of Somewhere," serves as a compelling case study for the potential of AI in indie game creation.

"Echoes of Somewhere" is an experimental 2.5D point-and-click adventure game anthology series.

Kemppainen's adventure began as a hobby project, but he found himself repeatedly stalled by the time-consuming nature of asset creation. The emergence of AI tools like DALL-E 2 and Midjourney in 2021 and 2022 offered a glimmer of hope. Initially viewing these as potential time-savers, Kemppainen soon realized their transformative potential. A Christmas holiday prototype featuring AI-generated characters and backgrounds went viral, cementing his decision to explore AI integration in game development as a public experiment.

Interestingly, the limitations of early AI tools shaped the game's setting. The AI struggled with contemporary scenes but excelled at creating believable sci-fi environments, naturally steering the game towards a futuristic aesthetic. This symbiosis between AI capabilities and creative direction became a recurring theme in Kemppainen's development process.

The journey wasn't without its challenges. There were instances where AI fell short, such as failing to produce a specific photo booth design. In such cases, Kemppainen resorted to manual creation, spending hours in Photoshop to achieve the desired result. This highlights an important aspect of AI-assisted development: it's not about completely replacing human creativity, but rather augmenting and streamlining the creative process.

Looking at the broader implications, Kemppainen believes AI has the potential to level the playing field between small indie developers and large studios. By making high-quality content production more affordable and accessible, AI could democratize game development, allowing small teams to create content that previously required substantial budgets and resources.

As for the future, Kemppainen predicts AI will become an integral part of game development across the industry. While this could lead to an influx of low-quality content, it also promises to uncover hidden gems and enable more creators to bring their visions to life.

🤖 Elon Musk Shares Deepfake Video of VP Harris, Sparking Controversy

The intersection of AI-generated art and politics has taken center stage this week, as tech mogul Elon Musk shared a deepfake video of Vice President Kamala Harris on his social media platform X.

The video in question manipulates Harris's voice and appearance, putting false words in her mouth that could easily mislead viewers. Specifically, the altered voiceover makes Harris say, "I was selected because I am the ultimate diversity hire. I'm both a woman and a person of colour, so if you criticize anything I say, you're both sexist and racist." While the original creator labeled it as satire, Musk's sharing of the video without context has raised alarm bells about the responsible use of AI art tools, especially as we approach the U.S. presidential election. What's more concerning is that Musk didn't just post the video; he pinned it to his page and blocked community notes, further amplifying its reach and potential impact.

Interestingly, this incident could have unforeseen consequences in the political sphere. Some speculate that if Harris were to win a higher office, she might be more motivated to bring regulatory scrutiny to platforms like X because of the dishonest information Musk is disseminating.

This event serves as a stark reminder of how far AI-generated media has come. What was once the realm of specialized artists and technologists is now accessible to a broader audience, capable of creating convincing deepfakes that blur the line between reality and fiction. The rapid advancement of these tools presents both exciting opportunities for creative expression and significant ethical challenges.

⚖️ DEFIANCE Act: US Senate Passes Bill Against AI Deepfake Pornography

The U.S. Senate has unanimously passed the DEFIANCE Act, landmark legislation targeting deepfake pornography created using AI. The bill allows victims to take legal action against those who produce, distribute, or receive such content without consent.

The bill focuses more on content distribution than creation, potentially affecting how AI artists share their work. As AI tools become more accessible and localized, enforcement challenges grow more complex.

The DEFIANCE Act reflects the growing need for ethical guidelines and legal frameworks in AI-generated art. The AI art community must now balance protecting individuals from harmful content with preserving artistic freedom as technology advances.

Sponsored
WallStreetWindow: Become A Better Trader And More Informed Investor

🗣️ OpenAI's ChatGPT Voice Mode: Potential and Limitations

In the rapidly evolving world of generative AI art, recent developments have ignited both excitement and controversy. At the center of this debate is OpenAI's rollout of an advanced Voice Mode to select ChatGPT Plus users, a feature that exemplifies both the remarkable potential and contentious limitations of AI in creative fields.

OpenAI's new voice feature showcases impressive capabilities, such as mimicking an airline pilot's speech patterns with uncanny accuracy. The sophistication of the voice model is undeniable, incorporating natural speech patterns, including stutters and tonal shifts, that point to a future where AI-generated audio could be indistinguishable from human speech.

However, it's the limitations placed on these abilities that have drawn the most attention. Users frequently report instances where the AI abruptly halts mid-sentence, citing guideline restrictions. This heavy-handed approach to content moderation has left many questioning the balance between safety and creative potential.

For artists and creators, these restrictions feel particularly limiting. The inability to fully explore voice modification and sound effects hampers the technology's application in fields like audiobook narration, documentary production, and voice acting. This approach creates a stark contrast between the AI's potential and its current implementation, frustrating those eager to push the boundaries of AI-assisted art creation.

As we move forward, it will be crucial to find a balance that allows for innovation while addressing legitimate safety concerns. Only by doing so can we ensure that the transformative potential of AI in art and media is fully realized, ushering in a new era of creativity that pushes the boundaries of what's possible while remaining responsible and ethical.

💰 Want to turn your writing passion into a lucrative career as a premium ghostwriter?

The fastest way to build a lucrative side hustle?

Ghostwriting.

You don’t need any startup capital.
You don’t need a big social audience.
And you don’t need decades of writing experience.

To become a Premium Ghostwriter you only need 5 simple (but oddly specific) skills.

And this FREE email course will give you everything you need to start ghostwriting today.

📡 Put This On Your Radar

Stable Fast 3D

Stable Fast 3D generates high-quality 3D assets from a single image in just 0.5 seconds. The model has applications for game and virtual reality developers, as well as professionals in retail, architecture, design, and other graphics-intensive fields. It is available on Hugging Face and released under the Stability AI Community License.

Runway Gen-3 Alpha

Runway has released Gen-3 Alpha, an AI-powered image-to-video generation model. It produces 10-second video clips from input images, with users reporting high-quality results that potentially surpass competitors. The service is available through a subscription model, with prices ranging from $15/month for 1 minute of video to $76/month for unlimited generations.
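Taken at face value, the entry tier's numbers imply a steep per-clip price. A sketch using only the figures quoted above (actual Runway billing is credit-based, so this is an approximation):

```python
# Implied cost per 10-second clip on the $15/month tier (~1 minute of video).
# Uses only the quoted figures; Runway's real billing is credit-based.
MONTHLY_PRICE = 15.0    # dollars per month
MINUTES_INCLUDED = 1.0  # minutes of video per month on this tier
CLIP_SECONDS = 10       # Gen-3 Alpha clip length

def cost_per_clip(monthly_price: float, minutes: float, clip_seconds: int) -> float:
    """Dollars per clip if the monthly allowance is spent entirely on clips."""
    clips = minutes * 60 / clip_seconds
    return monthly_price / clips

print(f"~${cost_per_clip(MONTHLY_PRICE, MINUTES_INCLUDED, CLIP_SECONDS):.2f} per clip")
```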

Segment Anything Model 2

SAM 2 is a foundation model for promptable visual segmentation in images and videos, created by Meta AI and FAIR. It extends the original SAM model to work with video.

  • Segments objects in images and videos based on user prompts

  • Supports automatic mask generation for images

  • Tracks multiple objects across video frames

  • Processes video in real-time using streaming memory architecture

SAM 2 is open source and available on GitHub under the Apache 2.0 license. Users can download checkpoints and use provided Python APIs for image and video prediction workflows.

To use SAM 2, clone the GitHub repository, install dependencies, download model checkpoints, and use the appropriate predictor APIs.

Midjourney v6.1

Midjourney released version 6.1 of their image generation AI model, now the default for all users. It's accessible through Discord or their web interface.

  • More coherent images, especially for body parts, plants, and animals

  • Enhanced image quality with reduced artifacts and better textures

  • Improved small details and text accuracy

  • New upscalers for better image/texture quality

  • Approximately 25% faster generation

  • New personalization model with improved nuance

Meta AI Studio

A platform for creating custom AI characters without technical skills. Users can design personalities for various purposes, usable on Meta's messaging platforms and web.

  • Powered by Llama 3.1 model

  • Accessible via website or Instagram app

  • Currently US-only

Easy Comp: AI-based compositing plugin for Adobe After Effects

Easy Comp automatically matches colors and blends foreground elements into their backgrounds.

  • Saves time on manual color correction, especially helpful for color blind users

  • Applied as final effect, combining layers rather than affecting individual ones

  • Free demo available

AUTOMATIC1111 Update

AUTOMATIC1111's Stable Diffusion web UI has been updated to version 1.10.0. This popular interface for running Stable Diffusion image generation models now supports Stable Diffusion 3 models and offers performance improvements. New schedulers and samplers have been added, along with minor UI and usability enhancements.

Existing users can update by running the included update.bat script, while new users can download the project from GitHub.

Flash Face

FlashFace, originally developed by Ali-vilab and implemented for ComfyUI by GitHub user cold-hand, is a tool for personalizing human images while preserving high-fidelity identity. It functions as a custom node within ComfyUI, a graphical interface for Stable Diffusion workflows.

The tool allows for customization of human images and reportedly produces more realistic results compared to similar tools like InstantID. Its effectiveness increases with the use of multiple reference images.

To use FlashFace, install it in ComfyUI's custom_nodes directory and run the setup script. It can then be integrated into ComfyUI workflows.

ControlNet Pose Depot

Pose Depot is a library of high-quality ControlNet poses created by Reddit user Qwernasivob. It offers collections of poses with multiple variants (OpenPose, depth, normal, and canny) for use with ControlNet in Stable Diffusion image generation.

Pose Depot is available on GitHub and Civitai, and can be integrated into Stable Diffusion workflows using ControlNet.

ReForge Updates

reForge is an updated fork of the Stable Diffusion web UI created by GitHub user Panchovix. It combines recent changes from the Automatic1111 (A1111) web UI with the Forge/ComfyUI backend, offering an expanded toolkit for Stable Diffusion users.

Key features include new samplers (ODE samplers and CFG++), the HiDiffusion extension for higher resolution outputs, and updates to PyTorch and related libraries.

Users can download reForge from https://github.com/Panchovix/stable-diffusion-webui-reForge, with two main branches available: "main" (A1111 upstream changes) and "dev_upstream" (A1111 and ComfyUI backend changes).

Note: Some A1111 features are not implemented, and there may be compatibility issues with certain extensions. The developer reports potential instability on Linux/Ubuntu.

Virtual Tabletop

DestinyMaestro has created a prototype virtual tabletop system for online RPGs and board games.

 Enjoy the content? Support me by clicking the sponsor's link (or don't, I see you 👀)

Million dollar AI strategies packed in this free 3 hour AI Masterclass – designed for founders & professionals. Act fast because it’s free only for the first 100.
