Stability AI releases Stable Audio 3.0 with support for tracks up to six minutes long and open weights

2026-05-20 22:59 · Jonathan Kemper

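AI summary

Stability AI has officially launched the Stable Audio 3.0 suite of audio generation models. The family comprises four variants, three of which ship with open weights, and can generate coherent music tracks up to six minutes long. The company stresses that all models were trained entirely on licensed music data, which it says keeps generated content compliant. The release marks a technical step forward in AI music generation and gives musicians and developers longer-form, more openly available tools.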


Stability AI launches Stable Audio 3.0 with up to six-minute tracks and open weights

Key Points

Stability AI's Stable Audio 3.0 generates music tracks up to six minutes long, trained entirely on licensed data.

Three of the four model variants are freely available as open-weights models. The largest remains exclusive to API users and enterprise customers.

With licensed training data and legal indemnification for enterprise customers, Stability AI is deliberately distancing itself from competitors currently facing copyright lawsuits.

Stability AI has unveiled Stable Audio 3.0, a new generation of audio models, three of which ship with open weights. The models generate music tracks up to six minutes long and were trained entirely on licensed data, according to the company.

The model family includes four variants. Stable Audio 3.0 Small SFX and Stable Audio 3.0 Small each pack 459 million parameters and produce tracks up to two minutes long in 0.44 seconds of inference time on an H200 GPU. The first focuses on sound effects and is designed for smartphones and consumer laptops. The second targets short music pieces. Stable Audio 3.0 Medium has 1.4 billion parameters and generates tracks up to 6 minutes 20 seconds long in 1.31 seconds. All three are available as open-weights models on Hugging Face.
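To put those figures in perspective, here is a rough back-of-the-envelope sketch of how much faster than real time the reported numbers would be. It assumes the quoted inference times refer to generating a track of the maximum stated length, which the announcement does not spell out. The `Variant` class and `realtime_factor` helper are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    max_seconds: int          # longest track length reported in the article
    inference_seconds: float  # reported generation time on an H200 GPU


# Figures as quoted in the article; 6 minutes 20 seconds is 380 seconds.
VARIANTS = [
    Variant("Stable Audio 3.0 Small", 120, 0.44),
    Variant("Stable Audio 3.0 Medium", 380, 1.31),
]


def realtime_factor(variant: Variant) -> float:
    """Seconds of audio produced per second of compute.

    Assumption: the reported inference time is for a maximum-length track.
    """
    return variant.max_seconds / variant.inference_seconds


for v in VARIANTS:
    print(f"{v.name}: roughly {realtime_factor(v):.0f}x faster than real time")
```

Under that assumption, both models would come out at a few hundred times faster than real time on an H200. Speeds on the smartphones and laptops the Small models target would differ and are not reported here.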

The largest model, Stable Audio 3.0 Large with 2.7 billion parameters, isn't available as open weights. It is accessible only through the Stability AI API or partner fal.ai, or it can be hosted on a company's own infrastructure under an enterprise license. Stability AI says it delivers the highest musicality and is built for music platforms with high generation volume.

New architecture enables longer, more flexible audio output

Stable Audio 3.0 runs on a new architecture with a semantic-acoustic autoencoder that allows longer and more flexible audio output, according to Stability AI. Generation works at variable length with second-level control.

Stable Audio 3.0 Small is the only model that enables full music composition on-device, offline and without short sample limits, the company says. For context, Stable Audio Open Small topped out at eleven seconds, and Stable Audio Open managed 47 seconds. Stability AI is also releasing LoRA training documentation alongside the Stable Audio 3.0 Small and Medium weights, letting users fine-tune the models on their own audio libraries.

Enterprise customers get guided fine-tuning support. The models also include inpainting features: users can edit individual segments of a track, modify multiple sections at once, or extend existing tracks beyond their original endpoint (causal continuation).

Commercial use is free up to a million dollars in revenue

Under the Stability AI Community License, users own the audio files they generate and can use them commercially. Organizations with more than one million dollars in annual revenue need to contact Stability AI for enterprise licensing, which adds commercial coverage and legal indemnification.

Stability AI points out that, to its knowledge, competing open music models either restrict commercial use or carry risks from training on unlicensed data. The company backs up its licensing stance with partnerships with Universal Music Group and Warner Music Group.

From image pioneer to audio specialist

Stability AI once shaped the open image generation space with Stable Diffusion, but has shifted its focus toward audio following founder Emad Mostaque's departure and amid ongoing financial struggles. The first Stable Audio launch in September 2023 relied on a partnership with stock music provider AudioSparx, which contributed about 800,000 songs, audio effects, and instrument snippets.

Stable Audio 2.0 followed in April 2024 and was one of the first commercially viable AI music tools for full-length 44.1 kHz audio up to three minutes. Stable Audio Open arrived in summer 2024 as an open-source variant for shorter samples. In May 2025, Stability AI teamed up with Arm to release Stable Audio Open Small, a compact text-to-audio model that runs on smartphones. Stable Audio 2.5 from September 2025 targeted professional sound production with multi-part compositions featuring intro, development, and outro sections. Stable Audio 3.0 now marks the shift to a unified architecture that Stability AI says will serve as the foundation for its next generation of licensed professional models.

Licensed training data gains weight amid copyright rulings

The company's repeated emphasis on licensed training data carries extra weight given recent court decisions. In November 2025, a Munich court found OpenAI liable for copyright infringement because ChatGPT reproduced protected song lyrics from the GEMA catalog in response to simple prompts. The court agreed that training data remains embedded in model weights and can be retrieved, a phenomenon GEMA calls memorization. OpenAI has appealed, and the case is now before the Munich Higher Regional Court.

Stability AI's promise to work with fully licensed data and to indemnify enterprise customers positions the British company squarely against providers like Suno and Udio, which are facing similar legal battles. A separate GEMA lawsuit against Suno alleges the tool was trained on original recordings from GEMA's catalog and produces near-identical versions. In the US, Suno and Udio face comparable lawsuits from the music industry. With fully licensed training data and legal protection for enterprise customers, Stability AI is deliberately staying clear of that fight.


The Decoder: AI News (RSS)