Diffusion Digest

Stability AI's SD3 Olive Branch, New CXL Technology for AI and HPC Applications, Runway's Gen-3 Alpha Sparks Debate, AI Voices Cause Ripples (July 7, 2024)

Cut through the noise, stay informed — new stories every Sunday.

Welcome, generative-AI enthusiasts, creators, researchers, and curious souls who found their way here (whether by intention or serendipity). This week, we cover Stability AI's updated licensing for Stable Diffusion 3, Runway's Gen-3 Alpha text-to-video model debut, new technology expanding GPU memory for AI applications, and the impact of AI-generated voices on the voice acting industry. 

As we ease into July, the industry takes a breather after June's whirlwind of announcements, giving us a chance to dive deeper into some intriguing tools that might have otherwise slipped under the radar. So grab a seat, make yourself comfortable, and let's explore the generative AI world this week.

🤝 Stability AI's Olive Branch: Updated Licensing and Improvements for Stable Diffusion 3

Stability AI, the company behind the Stable Diffusion image generation models, has announced an update to the licensing terms for their Stable Diffusion 3 Medium model along with a commitment to release an improved version in the coming weeks.

The SD3 Medium model is now available under a "non-exclusive, worldwide, non-transferable, non-sublicensable, revocable and royalty-free" license for commercial use up to $1 million in revenue. In practice, this means individuals and businesses can create and sell products or services incorporating the model's outputs without paying additional licensing fees until their total revenue crosses the $1 million threshold, at which point a separate commercial licensing agreement with Stability AI is required.

The change was generally well received by the Stable Diffusion community as a step in the right direction, opening the model up to a wider range of applications. However, some expressed concern about the "revocable" nature of the license, which could allow Stability AI to change the terms in the future, and asked for clarity on the exact conditions that might trigger a revocation.
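As a minimal sketch of the tiered rule described above (the $1 million threshold is from Stability AI's announcement; the function name and structure are purely illustrative, not an official Stability AI tool):

```python
# Illustrative encoding of the SD3 Medium license's revenue rule.
REVENUE_THRESHOLD_USD = 1_000_000

def needs_enterprise_license(total_revenue_usd: float) -> bool:
    """True if revenue exceeds the royalty-free commercial tier,
    meaning a separate agreement with Stability AI is required."""
    return total_revenue_usd > REVENUE_THRESHOLD_USD

print(needs_enterprise_license(250_000))    # small studio, free tier
print(needs_enterprise_license(2_500_000))  # beyond the free tier
```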

Stability AI acknowledged the initial SD3 Medium release "didn't meet our community's high expectations" in terms of output quality and capabilities. The company is committing to release an improved version of SD3 Medium in the coming weeks to address these shortcomings.

The Stable Diffusion community appears cautiously optimistic about Stability AI's change in direction, but trust remains strained. Fulfilling their promises on SD3 Medium improvements and addressing the questions around the "revocable" license will be key to repairing the relationship in the coming months.

TL;DR: Stability AI has updated the licensing terms for their Stable Diffusion 3 Medium model, allowing commercial use up to $1 million in revenue without additional fees. The company also acknowledged the initial release's shortcomings and committed to releasing an improved version in the coming weeks to address the community's concerns.

🎥 Runway's Gen-3 Alpha: A Leap Forward in Text-to-Video AI, Sparking Debate and Discussion

Runway, a co-developer of the original Stable Diffusion model, has launched Gen-3 Alpha, a text-to-video AI model now available to paid subscribers.

Gen-3 Alpha boasts improved quality, fidelity, consistency, and motion control compared to competitors like Luma and Kling. Key features include fine-grained temporal control and the ability to generate complex scene changes and transitions. However, the current version lacks image-to-video and video-to-video capabilities, which are planned for future releases.

The announcement has generated mixed reactions from the AI community. Some users have expressed concerns about the limited usefulness of the tool without image-to-video functionality, noting that alternatives like Luma and Kling offer this feature for free. Moreover, these competitors provide more free generation time per month, with Luma offering 150 seconds compared to Runway's 62 seconds for paid users.

Others have reported difficulties in replicating the quality of Runway's demo videos using basic prompts, suggesting that finding the right combination of prompt and seed is crucial for achieving optimal results. Some users who tried the service reported inconsistent results and lower quality than the advertised examples, indicating that the demo videos may have been cherry-picked.

Pricing has also been a topic of discussion, with some users estimating that it could cost up to $150 to produce one minute of quality video clips with Gen-3 Alpha, since many generations must be discarded. Proponents counter that this is still far cheaper than traditional CGI, so the service may be worthwhile for professional use cases such as film production, even if it remains impractical for average consumers at the current price point.
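The $150 figure is easy to reproduce as back-of-the-envelope arithmetic. In the sketch below, the per-second price and the fraction of usable generations are illustrative assumptions, not Runway's published numbers:

```python
def cost_per_usable_minute(price_per_second: float,
                           usable_fraction: float) -> float:
    """Cost of ending up with 60 seconds of keepable footage when only
    a fraction of generated clips are usable (the rest are re-rolled)."""
    seconds_generated = 60 / usable_fraction
    return seconds_generated * price_per_second

# e.g. $0.25 per generated second, with only 1 in 10 clips kept
print(round(cost_per_usable_minute(0.25, 0.10), 2))  # 150.0
```

Under these assumptions the cost is dominated by the re-roll rate, which matches the community's observation that finding a good prompt-and-seed combination matters as much as the sticker price.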

Despite these concerns, the release of Gen-3 Alpha represents a notable advancement in AI video generation. The model produces 720p resolution videos and has the potential to transform content creation across various industries. As Runway continues to develop and refine its offering, addressing user feedback and introducing new features will be paramount, as the landscape of AI-generated video is likely to evolve rapidly in the coming months and years.

TL;DR: Runway launched Gen-3 Alpha, a text-to-video AI model with improved quality, fidelity, consistency, and motion control. The release has generated mixed reactions due to the lack of image-to-video functionality, pricing concerns, and inconsistent results compared to the advertised examples.

💾 Expanding Horizons: CXL Technology Boosts GPU Memory for AI and HPC Applications

In the rapidly evolving field of artificial intelligence (AI) and high-performance computing (HPC), graphics processing units (GPUs) play a crucial role in accelerating memory-intensive workloads. However, the fixed amount of high-bandwidth memory (HBM) built into these GPUs can limit performance as AI datasets continue to grow. Companies are often forced to either invest in new GPUs, scale back their datasets, or fall back on slower CPU memory to work around these limits.

Emerging technology now offers a promising alternative: expanding GPU memory capacity by attaching additional memory, or even solid-state drives (SSDs), over the PCIe bus using the Compute Express Link (CXL) protocol, rather than being constrained by the GPU's built-in memory.

Panmnesia, a company backed by South Korea's KAIST institute, has developed low-latency CXL intellectual property (IP) that allows for the addition of more memory to GPUs via CXL memory expanders. The main challenges faced by Panmnesia were the absence of native CXL support and memory subsystems in GPUs that could recognize the additional capacity. To overcome these obstacles, Panmnesia developed a CXL root complex and host bridge that effectively tricks the GPU into believing it is utilizing system memory.

During Panmnesia's testing, their optimized CXL solution demonstrated impressive performance, achieving round-trip latency in the double-digit nanosecond range, significantly lower than previous prototypes. The technology performed 3.22 times faster than using unified virtual memory (UVM) and 1.65 times faster than an earlier CXL prototype.
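The two reported speedups also imply a third. A quick consistency check (the baseline workload time below is an arbitrary stand-in, not a measured figure) shows the earlier prototype would itself be roughly 1.95 times faster than UVM:

```python
# Reported figures: the optimized CXL solution is 3.22x faster than
# unified virtual memory (UVM) and 1.65x faster than the earlier
# prototype. The 100-unit baseline is an arbitrary assumption.
uvm_time = 100.0
optimized_cxl_time = uvm_time / 3.22            # ~31.1 units
prototype_cxl_time = optimized_cxl_time * 1.65  # ~51.2 units

prototype_speedup_vs_uvm = uvm_time / prototype_cxl_time
print(round(prototype_speedup_vs_uvm, 2))  # ~1.95
```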

The introduction of CXL support could bring substantial benefits to AI/HPC GPUs, enabling them to handle larger datasets and more complex models without the need for expensive hardware upgrades. However, the adoption of this technology remains uncertain, as it is unclear whether GPU vendors such as AMD and Nvidia will embrace CXL, either by licensing IP from companies like Panmnesia or developing their own proprietary solutions.

As the demand for more powerful AI and HPC applications continues to grow, the development of technologies like CXL memory expansion could prove to be a game-changer in the industry. By allowing GPUs to access additional memory resources, researchers and developers can push the boundaries of what is possible with AI and HPC, potentially leading to groundbreaking advancements in various fields.

TL;DR: Panmnesia has developed a low-latency CXL intellectual property (IP) that allows for the expansion of GPU memory capacity, enabling AI and HPC applications to handle larger datasets and more complex models without expensive hardware upgrades. The adoption of this technology remains uncertain, as it is unclear whether GPU vendors will embrace CXL.

💸 Do you find this content valuable? Please consider supporting us by clicking the sponsors link below.

Seeking impartial news? Meet 1440.

Every day, 3.5 million readers turn to 1440 for their factual news. We sift through 100+ sources to bring you a complete summary of politics, global events, business, and culture, all in a brief 5-minute email. Enjoy an impartial news experience.

🎙️ The Synthetic Voices Dilemma: AI's Impact on the Voice Acting Industry

The rise of artificial intelligence-generated voices is causing concern among voice actors, particularly in Australia, where an estimated 5,000 jobs are at risk. As AI voice clones become more affordable and accessible, industries such as audiobooks, corporate work, and radio are beginning to replace human voice talent with synthetic alternatives.

Audiobooks are considered the most vulnerable sector, as companies may be tempted by the perceived cost savings of AI narration, though there are concerns that losing the human emotional connection could reduce audience engagement. While AI voices have struggled with certain accents, such as Australian English, they are improving as datasets grow. Some attribute the slower uptake in Australia to voice actors holding out for ethical AI frameworks, while others see it as a consequence of Australia being a smaller initial market.

Voice actors are calling for laws to govern the use of their voices by AI, including consent, control, and compensation. Some have even proposed banning AI entirely from creative industries to protect jobs. However, not all reactions are negative. Startups like Replica Studios are taking an "ethical AI" approach by licensing real voices and compensating actors for the use of their voice clones.

Supporters of AI voice technology, such as hobby podcasters, argue that it allows small creators to produce higher-quality content that would otherwise be unaffordable. They view the progress of AI as inevitable, even if it means professionals may lose jobs in the process.

Critics, on the other hand, worry that AI will limit opportunities for voice actors and lead to less creative, nuanced performances. They fear that the lack of human input will result in "shallow" and less emotionally engaging content.

As the debate surrounding AI-generated voices continues, it is clear that the industry is at a crossroads. While the technology offers cost savings and accessibility, it also raises questions about the value of human creativity and the importance of ethical frameworks in the age of artificial intelligence.

TL;DR: The rise of AI-generated voices is causing concern among voice actors, particularly in Australia, where an estimated 5,000 jobs are at risk. Voice actors are calling for laws to govern the use of their voices by AI, while supporters argue that the technology allows small creators to produce higher-quality content at an affordable cost.

🆕 Put This On Your Radar: CHIMERA 2, LivePortrait, and ElevenLabs' Voice Isolator

CHIMERA 2 is a new Stable Diffusion XL anime model that merges several popular models, including Pony Diffusion, Animagine, Anime Illust Diffusion, ArtiWaifu, Godiva, and more. It amplifies support for Danbooru-style artist tags without strictly requiring meta-tags, improves anatomy, and enables effective artist style mixing. The model is available for download at https://civitai.com/models/549543.

A new AI-based portrait animation system called LivePortrait enables highly realistic animation of still portrait images. It uses stitching and retargeting techniques to efficiently animate facial expressions and head poses based on driving videos. An open-source Jupyter notebook implementation is available, and the system has also been integrated into the ComfyUI framework. Early examples are very promising, showing portraits that move and emote with lifelike fluidity when driven by expressive videos.

ElevenLabs has released a new AI-powered "Voice Isolator" that extracts clear speech from audio by removing background noise, which is useful for the post-production of films, podcasts, interviews, and more.

What did you think of this week's issue?

Be real. We love hearing from you!
