early Fable 5 leak in new claude-code binary. Claude Fable 5 - Our most powerful, most intelligent model. New tier above...
"Fable 5 requires usage credits. Update Claude Code to the latest version to learn more."
译Fable 5 需要使用积分。请将 Claude Code 更新至最新版本以了解更多。
ANTHROPIC 🔥: Claude Fable 5 (Mythos) is being prepared for the upcoming release on Google Cloud Platform. Really soon 👀
译ANTHROPIC 🔥: Claude Fable 5 (Mythos) 正准备在谷歌云平台发布。很快了👀
Cohere just released North Mini Code, a small 30B parameter (3B active) open weights coding model that scores 27.6 on the Artificial Analysis Intelligence Index Less than a month since @cohere's last model release, Command A+, has launched another open weights model that is optimized for coding, and much smaller at 30B total parameters and 3B active parameters. Key Takeaways: ➤ Achieves 27.6 on the Artificial Analysis Intelligence Index, above gpt-oss-20B (high) at 24.5 and just below Mistral Small 4 (119B parameters, 6.5B active) at 27.8 ➤ Scores competitively on the Artificial Analysis Coding Index (weighted average of Terminal-Bench Hard and SciCode) against open weights models in its size class, scoring 33.4, significantly above GLM-4.7-Flash at 25.9, and below Qwen3.6 35B A3B at 35.2. However, it underperforms on non-coding agentic tasks, scoring 14% on GDPval-AA and 37% on 𝜏²-Bench Telecom ➤ On Cohere’s API, North Mini Code is faster than several comparable open weights models of its intelligence and size class (~199 output tokens per second) ➤ North Mini Code is a text-only 30B total parameter and 3B active parameter model, and is open-sourced under the Apache 2.0 license
译Cohere近日发布North Mini Code,一款30B总参数(3B活跃参数)的开放权重编码模型,采用Apache 2.0开源协议。该模型在Artificial Analysis Intelligence Index上得分27.6,高于gpt-oss-20B (high)的24.5,略低于Mistral Small 4(119B参数,6.5B活跃)的27.8。在Coding Index(Terminal-Bench Hard和SciCode加权平均)上得分33.4,显著高于GLM-4.7-Flash的25.9,低于Qwen3.6 35B A3B的35.2。非编码智能体任务表现较弱:GDPval-AA 14%、τ²-Bench Telecom 37%。在Cohere API上推理速度约199 output tokens/s,快于同类模型。距Cohere上次发布Command A+不到一个月。
Fascinating. Google just released Gemini 3.5 Live Translate. A live speech-to-speech translation model that starts speaking in another language while the original speaker is still talking. Older translation systems often wait for a full sentence, because early words can be misleading until later words reveal tense, intent, or context. Gemini 3.5 instead runs streaming translation, where the model listens, interprets partial meaning, predicts what can safely be translated, and keeps updating as new speech arrives. supports 70+ languages, stays only a few seconds behind the speaker, and can preserve pacing, pitch, and intonation across longer sessions. Rolling out to Gemini Live API, businesses through Google Meet preview, and regular users through Google Translate on Android and iOS.
译Google 推出 Gemini 3.5 Live Translate,一款实时语音转语音翻译模型。它在原说话者尚未说完时即开始翻译,无需等待完整句子。模型采用流式翻译,边听边更新结果,支持 70 多种语言,延迟仅数秒,并能保持语速、音高和语调。该功能通过 Gemini Live API、Google Meet 预览版以及 iOS/Android 版 Google Translate 应用推出。
Anthropic Is dropping a public version of Mythos today: codename "Fable" - per The Information It’s costly, at 2x the price of Opus, but maybe still cheaper than what people expected after seeing the first Mythos pricing at 5x Opus. - It will come with strong safety limits, and it will not be as open on cyber use as the restricted preview given to Project Glasswing partners. - It is expected to be much stronger at long-running, multi-step tasks and agent-style workflows. Context on Mythos: - Anthropic introduced Claude Mythos Preview in April 2026. At launch, it wasit’s most powerful frontier model, especially strong in coding, reasoning, and cybersecurity, including finding and exploiting zero-days. - It was not released publicly at first because of safety issues. Only selected Project Glasswing partners received access for defensive cybersecurity, and they have reportedly found thousands of major vulnerabilities.
译Anthropic 今日发布 Mythos 的公开版本,代号“Fable”。其成本约为 Opus 的两倍,低于此前预览版 5 倍 Opus 的定价。Fable 配备严格安全限制,在网络安全方面比 Project Glasswing 合作伙伴的受限预览版更保守,且在长时间、多步骤任务及智能体式工作流上表现更强。Mythos 预览版于 2026 年 4 月推出,是当时最强前沿模型,尤其擅长编程、推理和网络安全(含发现零日漏洞);因安全问题未公开,仅限 Project Glasswing 合作伙伴用于防御性网络安全,目前已报告发现数千个重大漏洞。
Introducing Gemini 3.5 Flash Live Translate, our real time speech to speech translation model which supports more than 70 languages (both in and out), and is so natural. It is available in the Gemini API, AI Studio, & Google Translate right now + coming soon to Google Meet!!
译Introducing Gemini 3.5 Flash Live Translate,我们的实时语音到语音翻译模型,支持超过 70 种语言(输入和输出),并且非常自然。 现在已在 Gemini API、AI Studio 和 Google 翻译中可用,并即将登陆 Google Meet!
Our latest audio model, Gemini 3.5 Live Translate, takes real-time speech translation to the next level for developers by delivering low-latency translation across 70+ languages. By processing speech as it streams in near real time, the model enables devs to build low-latency audio experiences with: — Multilingual input: Understands multiple languages in a single session without needing to adjust settings. — Auto-detection: Identifies the spoken language and begins translation instantly. — Native audio processing: Generates more natural-sounding speech that preserves speakers' intonation, pacing, and pitch. — Noise robustness: Filters out ambient noise for clearer conversation in loud environments.
译Google AI 推出音频模型 Gemini 3.5 Live Translate,为开发者提供低延迟实时语音翻译,支持 70+ 种语言。模型具备多语言输入(同会话无需切换)、自动语言检测、原生音频处理(保留说话者语调、语速和音高)以及噪声鲁棒性(过滤环境噪音),可直接处理流式语音。
Today, we released Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation. It supports over 70 languages and starts translating as soon as you start talking, streaming translations while listening to what you say next. No awkward pauses or choppy audio, just real connection without language barriers. So, how does it work? 🤔 The model is able to make split-second decisions to juggle speed and translation quality so conversations actually feel fluid, human, and natural. In order to do this, the model must receive and contextualize the input while simultaneously outputting the translated speech. Through this process, Gemini 3.5 Live Translate manages to stay mere seconds behind each speaker and can even maintain pacing, pitch, and intonation across extended sessions. See it in action below, or try it yourself in the Google Translate app on iOS & Android.
译Google AI 推出 Gemini 3.5 Live Translate,一款面向实时语音到语音翻译的音频模型。该模型支持 70 多种语言,可在用户说话的同时开始翻译并流式输出译文,避免尴尬停顿或断续。模型通过毫秒级决策平衡速度与翻译质量,使对话流畅自然。它可边接收输入边输出翻译语音,延迟仅比说话者慢几秒,并能在长对话中维持语速、音高和语调。目前已在 iOS 和 Android 版 Google Translate 应用中上线。
Say hello, hola, 你好 to Gemini 3.5 Live Translate: our latest audio model built for fast, cross-language communication. 🌐
译说 hello, hola, 你好——欢迎 Gemini 3.5 Live Translate:我们最新的音频模型,专为快速跨语言交流而构建。🌐
Confirmed, Claude Mythos will be unveiled in the next few hours
译确认,Claude Mythos 将在接下来几小时内揭晓。 [引用 @steph_palazzolo]: 独家:一个名为 Claude Fable 的精简版 Mythos 今天推出。它价格昂贵——是 Opus 的两倍——但或许不像人们从最初 Mythos 定价(Opus 的 5 倍)所想的那样昂贵。 更多内容及 Apple WWDC 见 AI Agenda: https://www.theinformation.com/newsletters/ai-agenda/anthropics-mythos-coming-today-apple-pursues-modest-goals-siri-revamp
Direction goes in. Cinema comes out. Ray3.2 is here → http://lumalabs.ai/ray3-2
译方向进入,电影出来。 Ray3.2 来了 → http://lumalabs.ai/ray3-2
Apple's new foundation models are genuinely exciting. The standout is AFM 3 Core Advanced, a 20-billion (!) parameter model that runs entirely on-device. Read that again. 20-billion, on-device, iPhone 17 Pro. It pulls this off by keeping the full model in flash memory and loading only a small slice of "experts" into active memory for each prompt, just 1 to 4 billion parameters at a time. That's a clever way to get around the usual DRAM wall, and it's what unlocks things like expressive voices and much sharper dictation right on the device. The whole family of five models was built in collaboration with Google. It spans these on-device models all the way up to server-based ones running on Private Cloud Compute, with the most demanding cloud model running on NVIDIA GPUs. Kudos, Apple!
译Apple 发布全新基础模型家族,亮点是 AFM 3 Core Advanced:200 亿参数,完全运行在 iPhone 17 Pro 设备端。通过将完整模型存于闪存,每次仅加载 1-4B 专家参数到活跃内存,巧妙绕过 DRAM 瓶颈,实现设备端更生动的语音和更精准的听写。共 5 个模型,与 Google 合作打造,覆盖从设备端到 Private Cloud Compute 的云端模型,最高性能云端模型运行在 NVIDIA GPU 上。
小道消息:Anthropic 将于今晚发布其最强 AI 模型Mythos...
译小道消息:Anthropic 将于今晚发布其最强 AI 模型 Mythos...
Claude Mythos will be released today (June 9th), according to leaks everywhere. The interesting question is whether they'll also update Sonnet or Haiku. Not that Mythos isn't enough for me, I'm just curious why the smaller models are currently getting so little attention. Anyway, it will probably be called Claude-5-fable.
译据多方泄露,Claude Mythos 将于今日(6月9日)发布。 有趣的问题是,他们是否也会更新 Sonnet 或 Haiku。 不是说 Mythos 对我而言不够用,我只是好奇为什么目前小模型受到的关注这么少。 总之,它可能被命名为 Claude-5-fable。
Claude Mythos is conning tomorrow!! Prepare yourself friends. It’s happening!!
译据消息,Anthropic 计划明天发布 Mythos 公开版。该版本将配备实质性护栏,权限不如 Project Glasswing 合作伙伴可访问的版本宽松,但在长周期、多轮任务上表现将大幅提升。准备好,朋友们,就要来了!
MiMo推出1000 Token/s超高速模型|体验测评 MiMo 推出了 MiMo V2.5 Pro UltraSpeed 超高速的模型版本,能够实现每秒输出超过 1,000 Token 的速度。 同时,这应该也是全球第一个达到这个速度的万亿(1T)参数模型。 藏师傅提前试了一下,做了三个测试,确实爽。 第一个跑了一个比较复杂的 3D 采矿小游戏测试。在没有素材的情况下,我让它全部用 Three.js 前端代码来生成素材。整体要求比较完整,虽然第一次实践时出了一些小问题,但在跟他沟通修改建议后,非常完美地实现了任务。 这次测试的各项指标如下:思考的 TPS:804 Token/s,峰值速度:810 Token/s,首次响应时间:4.71 秒。 第二个测试给了一个官网,其头部包含一个相对复杂的 3D 动画。 这次的输出速度快了非常多:峰值达到了 1426 Token/s,首次响应只用了 0.83 秒,在 32 秒内输出了 25624 个 Token,总计生成了 1000 行代码。 第三个测试给了一个更复杂的官网。我要求这个官网的 Header 头部包含以下 3D 效果:地球边缘、轨道上的飞船、星际尘埃、航线图、舷窗的 HUD 样式。 这个效果非常好,整体的视觉样式、状态、SVG 动画和驾驶卡片都非常精细,还有滚动的视差效果 这个输出的 TPS 达到了 1136 tokens/s,首次响应是 4.5 秒 官方测试平台下面有个数据展示,会显示相关信息 在流式输出的情况下,当你看着它只用 20 秒就产生一个非常复杂的 3D 游戏时,那种场景还是比较震撼的 之前的这些(比如说 Groq 之类的)超高速推理方案,在模型能力或者是整体水平上都会有所下降,但是 MiMo 这个在测试的时候,我没有看到这种迹象 最近很多公司都开始推出这种超高速的 API 服务,比如之前 OpenAI 和 Anthropic 都有 Fast 模式 在 Agent 场景下,模型输出效率的提升会直接带动每一步 Agent 操作的效率: 如果一个任务预估一分钟完成,你就会盯着它直到结束,然后立刻投入测试。如果需要五分钟才完成,你可能就会去干别的事,然后再回来看,难免会浪费一些时间 这种效率提升在 Sub-Agent 和并发场景下更加明显。因为它可以更快地产出大量结果,想象一下,如果同时启动一两百个 Sub-Agent,在模型能力没有衰减的前提下,速度提高 10 倍,体验是非常爽的 毕竟这本质上是面向那种对效率有极高要求的 To B 客户所推出的 希望后面大家卷起来,优化一下成本,让普通用户也能放开用这种 UltraSpeed 模型
译MiMo推出V2.5 Pro UltraSpeed超高速模型版本,每秒输出超1000 Token,号称全球首个达此速度的万亿参数模型。实测显示:复杂3D小游戏TPS 804 Token/s(峰值810),首次响应4.71秒;官网3D动画峰值1426 Token/s,首次响应0.83秒,32秒输出25624 Token(1000行代码);另一复杂官网3D效果TPS 1136,首次响应4.5秒。相比此前超高速推理方案常见能力下降,MiMo未出现此类迹象。该模型主要面向效率要求极高的ToB客户,在Agent和Sub-Agent并发场景下效率提升明显。
难道说?我感觉他们能做出来强制 kyc 才让用这种操作
译据报道,Anthropic 将于明天发布新 AI 模型“Mythos”。主推文猜测这可能伴随着强制 KYC 措施。
ANTHROPIC 🔥: Claude Mythos is planned to be released as Claude Fable 5 according to checkpoints detected by Dev Mode, Hacker News reports and Sources. Anthropic is also hosting its 3rd developer event in Japan tomorrow. Soon? 👀
译ANTHROPIC 🔥: Claude Mythos 计划作为 Claude Fable 5 发布,根据 Dev Mode 检测到的检查点、Hacker News 报道和消息源。 Anthropic 还将于明天在日本举办第三届开发者活动。 快了?👀
Grok debuts grok-imagine-video-1.5-preview, achieving #2 in Image to Video (With Audio) in the Artificial Analysis Video Arena, behind only ByteDance's Seedance 2.0! grok-imagine-video-1.5-preview is @xAI's latest video generation model, currently supporting only Image to Video with native audio, and durations up to 15s. It ranks #2 in the Image to Video (With Audio) Leaderboard, trailing only ByteDance's Seedance 2.0. In the Without Audio Leaderboard it places #3, behind Seedance 2.0 and xAI's own grok-imagine-video, which it performs very closely to. grok-imagine-video-1.5-preview costs $8.40 per minute of generated video, and is available now via xAI's API, with a broader rollout across the Grok app and X in progress. Congratulations to @xAI and @elonmusk on the release! See below for comparisons between grok-imagine-video-1.5-preview and other leading models in the Artificial Analysis Video Arena 🧵
译xAI推出视频生成模型grok-imagine-video-1.5-preview,目前在Artificial Analysis Video Arena的Image to Video (With Audio)排行榜中排名第二,仅次于字节跳动Seedance 2.0。该模型支持图像转视频并原生生成音频,最长可生成15秒视频。在无音频排行榜中位列第三,紧随Seedance 2.0和自家的grok-imagine-video。模型定价为每分钟视频$8.40,现已通过xAI API提供,并将逐步在Grok app和X上线。
🚀 VoxCPM2 Technical Report is now available on arXiv! VoxCPM2 is the latest speech generation model in the VoxCPM family. Built with 2B parameters and trained on over 2 million hours of multilingual speech data, it supports 30 languages and 9 Chinese dialects, along with natural-language voice design, controllable voice cloning, and high-fidelity continuation-based voice cloning. In this technical report, we provide a comprehensive overview of: 🔹 The VoxCPM2 architecture 🔹 A unified sequence formulation for speech generation and control 🔹 The design of AudioVAE for high-fidelity speech reconstruction 🔹 Large-scale multilingual training and evaluation 🔹 Benchmark results across zero-shot and instruction-following TTS tasks With 16kHz semantic encoding and 48kHz waveform reconstruction, VoxCPM2 delivers high-quality speech generation and achieves SOTA or highly competitive performance on public TTS benchmarks. To support open research and development, we have open-sourced the model weights, fine-tuning code, and inference toolkit under the Apache 2.0 license. 📄 Paper: https://arxiv.org/abs/2606.06928 💻 GitHub: https://github.com/OpenBMB/VoxCPM We hope VoxCPM2 helps advance the open-source multilingual speech ecosystem. Feedback, experiments, and contributions are always welcome! 🔥 #AI #OpenSource #TTS #SpeechAI #VoiceAI #GenerativeAI #MachineLearning
译面壁智能 OpenBMB 发布 VoxCPM2 技术报告。该模型为最新语音生成模型,拥有 2B 参数,基于超 200 万小时多语言语音数据训练,支持 30 种语言和 9 种中文方言。具备自然语言语音设计、可控及高保真延续性语音克隆能力。技术报告涵盖架构设计、统一序列公式、AudioVAE 高保真语音重建、大规模训练评估,以及零样本和指令跟随 TTS 基准结果。采用 16kHz 语义编码 + 48kHz 波形重建,在公开 TTS 基准上达到 SOTA 或极具竞争力。模型权重、微调代码和推理工具以 Apache 2.0 开源。
🚀 1,000+ TOKENS/S ON A 1T MODEL! 🚀 We are thrilled to release Xiaomi MiMo-V2.5-Pro-UltraSpeed in collaboration with @TileRT_AI , breaking the 1,000 tokens/s output speed on a 1 Trillion parameter model for the FIRST TIME! Not wafer-scale integration like Cerebras. Not pure on-chip SRAM chips like Groq. We achieve 1,000 tps on a 1T MoE model using just a SINGLE, STANDARD 8-GPGPU NODE. Read the full technical deep dive:https://mimo.xiaomi.com/blog/mimo-tilert-1000tps Want to experience the future of real-time AI? 👉 Apply for UltraSpeed now: https://platform.xiaomimimo.com/ultraspeed ⏳ Limited-Time Access: Application-based · Jun 8 – Jun 23 (PDT) 💬 Chat Experience: Completely FREE for a limited time — try the blazing-fast web chat now. ⚡ UltraSpeed API: Just 3x the price for a ~10x boost in output experience. 🤝 Enterprise & Large-Scale Needs: business-mimo@xiaomi.com
译小米 MiMo 联合 TileRT_AI 发布 MiMo-V2.5-Pro-UltraSpeed,首次在 1 万亿参数 MoE 模型上实现超过 1,000 tokens/s 输出速度,仅用单台标准 8-GPGPU 节点(非 Cerebras 或 Groq 方案)。提供限时免费聊天体验,UltraSpeed API 价格为 3 倍,输出体验提升约 10 倍。申请时间为 6 月 8 日至 23 日(PDT),企业可邮件联系 business-mimo@xiaomi.com。
🔥 Launch Special for Qwen3.7-Plus: Get 20% OFF now! ✅ Multimodal Interactive Hybrid Agents ✅ Coding & Productivity Assistants ✅ Vision Agents ✅ Cross-Harness Generalization Don't miss the upgrade. 👇 https://int.alibabacloud.com/m/1000414123/ #Qwen #AI #Multimodal #AlibabaCloud #AgenticAI
译🔥 Qwen3.7-Plus 发布特惠:现在享受八折! ✅ 多模态交互式智能体 ✅ 编程与生产力助手 ✅ 视觉智能体 ✅ 跨任务泛化 不要错过升级机会。👇 https://int.alibabacloud.com/m/1000414123/ #Qwen #AI #Multimodal #AlibabaCloud #AgenticAI
我靠,这不直接抢了苹果的活儿啊! 6.6B的小模型直接把Siri和一堆云端巨头干到闭嘴,还只吃7GB内存就跑在Mac本地。 CJ Zafir他们搞的Mac-1,不光参数小到离谱,还一次性接了487个Mac原生工具,能链式调用、自动推理、连发邮件订会议都行,速度65 tok/s,UI还是纯Mac风。 以前大家都觉得agent要靠大模型+云端才能靠谱,结果这个本地小家伙直接把“模型越大越强”的理论快要掀桌子了。 它真正厉害的地方是把应用层做成了Mac原生体验,人用着舒服,Agent后台自己干活。 云端SaaS的agent时代,可能还没真正开始,就已经被本地小模型+原生工具的组合终结了。 感觉苹果没有做成的事儿,被这家公司嘿干了啊! 完了实际测测支持中文方便是否也丝滑~
译CJ Zafir团队发布Mac-1模型(6.6B参数),可在任何Mac本地运行,仅需7GB内存(12GB更佳)。它支持487个MacOS原生工具,能执行多工具链式调用,推理开启,输出速度约65 tok/s。应用层基于Mac原生UI/UX设计。作者认为这种本地小模型+原生工具的组合直接挑战云端SaaS agent,甚至可能抢了苹果Siri的活儿。
Holy, release is so close. It will be named „Claude Mythos 5“, a tier above Opus. I got the feeling coming week will be so huge
译天哪,发布就在眼前了。它将被命名为“Claude Mythos 5”,是比 Opus 更高一级的模型。 我感觉下周会非常重磅。
BREAKING 🔥: A new Claude Mythos 5 model slug has been spotted via Dev Mode. Claude Mythos is planned to be released as its own model class, besides Haiku, Sonnet and Opus model families. Soon? 👀
译BREAKING 🔥: 开发者模式下发现一个新的 Claude Mythos 5 模型 slug。 Claude Mythos 计划作为自己的模型类别发布,与 Haiku、Sonnet 和 Opus 模型系列并列。 很快?👀
Google just made Gemma 4 much easier to run on phones and laptops by releasing QAT (Quantization-Aware Training) checkpoints that shrink the smallest model from 11.4GB to 1.1GB, or 0.84GB for text-only use. Normal PTQ (Post-Training Quantization.) compresses after training and can damage quality because the model never learned to survive that rounding. QAT fixes this by simulating compression during training, so Gemma 4 learns while its weights are being squeezed, making the final compressed model less likely to lose reasoning quality. Google also built a mobile-focused format with static activations, channel-wise quantization, targeted 2-bit quantization, and KV cache optimization, which means the phone does less scaling work, stores some token-generation parts more aggressively, and keeps long chats from eating memory too fast.
译Google 发布 Gemma 4 的 QAT(量化感知训练)检查点,将最小模型从 11.4GB 缩小至 1.1GB(纯文本版 0.84GB),便于手机和笔记本运行。常规 PTQ(训练后量化)因模型未学会应对舍入而损伤质量;QAT 在训练中模拟压缩,让模型在权重被挤压时学习,压缩版不易丢失推理能力。Google 还构建了移动端优化格式,包含静态激活、通道量化、定向 2-bit 量化及 KV 缓存优化,减少手机缩放计算并防止长对话过快消耗内存。
Google DeepMind released new Gemma 4 QAT models that make the model family much more efficient for local, on-device use. Using Quantization-Aware Training, the models are trained with compression in mind, which reduces memory needs while preserving more quality than standard post-training quantization. The release includes support for the popular Q4_0 format and a new mobile-specialized quantization format. Gemma 4 E2B can now run with around 1GB of memory (!), and the text-only version can even require less than 1GB (!). That makes local AI on phones, laptops, edge devices, and consumer GPUs far more practical. Really cool to see.
译Google DeepMind 发布 Gemma 4 QAT 量化感知训练模型,专为本地 / 设备端优化。通过量化感知训练减少内存占用,同时相比标准训练后量化保留更多质量。支持 Q4_0 格式及新的移动专用量化格式。Gemma 4 E2B 版本可运行于约 1GB 内存,纯文本版本甚至低于 1GB,使手机、笔记本、边缘设备和消费级 GPU 上的本地 AI 更实用。
Live on OpenRouter: Riverflow 2.5 from @Sourceful. The first image model with an independent scoring rubric you control to guide its thinking and editing, with controllable reasoning effort to trade speed for quality. Free until Tuesday June 9. Fast and Pro below 🧵
译在OpenRouter上线:来自@Sourceful的Riverflow 2.5。 首个具有独立评分标准的图像模型,你可控制该标准以引导其思维和编辑,并具备可控的推理努力,可在速度与质量之间进行权衡。 免费至6月9日(周二)。Fast和Pro见下方🧵。
New @GoogleGemma 4 QAT (Quantization-Aware Training) checkpoints are here, so you can run models locally on consumer GPUs and mobile devices with minimal quality loss. What’s new: 🔹 GGUF (Q4_0): Checkpoints: Max local performance across all sizes and drafter models 🔹 Custom Mobile Schema: We shrunk Gemma 4 down to less than 1GB for mobile devices by using a custom mixed precision schema designed for edge hardware (featuring targeted 2-bit decoding layers, optimized KV caches, and static activations) By simulating compression during training rather than after (Post-Training Quantization), we've drastically reduced the memory footprint and accelerated decode speeds while preserving reasoning quality. https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/
译谷歌发布 Gemma 4 量化感知训练 (QAT) 检查点,支持在消费级 GPU 和移动设备上本地运行,质量损失极小。新检查点提供 GGUF(Q4_0)格式,覆盖所有尺寸及起草模型,实现最佳本地性能。自定义移动模式采用混合精度方案,将 Gemma 4 压缩至 1GB 以下,包含 2-bit 解码层、优化 KV 缓存和静态激活。通过在训练中模拟压缩(而非训练后量化),大幅降低内存占用并加速解码,同时保持推理质量。
Holy cow. Mythos really is next level
译最近发现的“Oceanus”检查点输出预览曝光,据传闻这是 Anthropic 即将发布的 Mythos 模型的一个版本,计划在“几周内”公开发布。
MYTHOS 🔥: Another early preview of recently spotted "Oceanus" checkpoint output. "Oceanus" is rumored to be a version of the upcoming Mythos model, which is planned for public release within "weeks", according to Anthropic. "Oceanus" prompt 👀
译MYTHOS 🔥: 近期发现的"Oceanus"检查点输出的另一个早期预览。 "Oceanus"被传是即将推出的Mythos模型的一个版本,根据Anthropic,计划在"数周内"公开发布。 "Oceanus"提示词 👀
Claude mythos will be on a completely different level. These outputs are insane
译@Lentils80 分享了两段来自 Claude Mythos 的惊人输出,零样本且几乎无需费力。这是自 2025 年 10 月 Gemini A/B 模型以来,针对该提示词我看到的最佳输出。主推文感叹:Claude Mythos 将进入完全不同的水准,这些输出太疯狂了。
Grok model improvement
译更新后的 Grok-build 模型(仍是 0.5T 那个)比以前好很多。它不那么偷懒、更自主、更准确。我们仍在改进长时任务。请期待并在我们漂亮的 TUI 中使用新的使用限制!🚀
NVIDIA 🔥: Nemotron 3 Ultra has been released on Huggingface with 5x faster inference and 30% lower costs in comparison to other open models. > Nemotron-3-Ultra-550B-A55B-NVFP4 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities.
译NVIDIA 在 Huggingface 上发布 Nemotron 3 Ultra(Nemotron-3-Ultra-550B-A55B-NVFP4),一个 550B 参数的 MoE 前沿智能开源大语言模型,专为长时间运行的 AI 智能体设计。相比其他开源前沿模型,推理速度提升 5 倍,复杂智能体任务成本降低 30%。模型具备强大的智能体、推理和对话能力。
That’s so cool! I love the creativity of those guys. An open model for live music generation only 2.4B parameters. If you are bored on long flights you can now start creating bangers
译那太酷了!我爱这些家伙的创意。 一个仅2.4B参数的开放模型,用于实时音乐生成。 如果你在长途飞行中无聊,现在可以开始创作神曲了。
Play our new open-weights music model, @GoogleMagenta RealTime 2, using a MIDI keyboard, live text prompts, and even hand gestures ✌️ https://x.com/GoogleMagenta/status/2062589313372594538
译Google AI for Developers 宣布推出开放权重的实时音乐模型 Magenta RealTime 2 (MRT2)。该模型可通过 MIDI 键盘、实时文本提示甚至手势进行演奏。MRT2 在 MacBook 上原生运行,延迟低于 200ms,提供开放权重、开源推理引擎以及配套应用和插件套件。
1/ NVIDIA shipped Nemotron 3 Ultra today, a fully open 550B model with 55B active params, with the weights, training data, and complete recipe all released openly. That alone is rare at this scale. The headline however actually is speed. Ultra is a hybrid Mamba-Attention MoE, an architecture built for fast decoding and a light memory footprint over long contexts, and NVIDIA clocks it at roughly 6x (!) the throughput of comparable open models on long-output agent workloads while holding the same accuracy. That's a serious engineering result, and it's aimed exactly where the industry is heading: autonomous agents that run long, multi-turn tasks where throughput per GPU is what actually costs money. It was pre-trained in 4-bit (NVFP4) across 20T tokens, the largest stable run of its kind shown to date. And the post-training introduces MOPD, where ten-plus specialist teacher models distill their skills into the student on its own rollouts, sometimes pushing it past the teachers themselves. The interesting aspect:This is a frontier-class model you can fully reproduce.
译NVIDIA 正式发布 Nemotron 3 Ultra,550B 总参数(55B 活跃)的完全开源 MoE 模型,权重、训练数据和完整配方全部公开。采用混合 Mamba-Attention 架构,专为长上下文快速解码和轻内存占用设计。在长输出智能体工作负载上,吞吐量约为可比开源模型的 6 倍(推理速度提升 5 倍),复杂智能体任务成本降低最多 30%。该模型在 4-bit(NVFP4)精度下预训练 20T tokens,后训练使用 MOPD 技术,由十余个专家教师模型蒸馏技能至学生模型。这是首个达到前沿水平且可完全复现的开源模型。
"𝗦𝗲𝗿𝗶𝗼𝘂𝘀𝗹𝘆 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝘀𝘁𝘂𝗳𝗳". Thanks for the kind words, @gurru_tech — that's 𝗦𝗲𝗻𝘀𝗲𝗡𝗼𝘃𝗮 𝗨𝟭 turning prompts into professional infographics. Unified model that natively understands and generates text and images. Open-sourced. Run it yourself. 🎥Watch the video: https://youtu.be/HKz2e3STUwg 🎛️ SenseNova Studio: https://unify.light-ai.top/ (Try infographics; also join Discord for text-image interleaved gen) 🤗 https://huggingface.co/collections/sensenova/sensenova-u1 🛠️ https://github.com/OpenSenseNova/SenseNova-U1 👾 Discord: https://discord.com/invite/BuTXPHmQub @huggingface @github
译商汤SenseTime发布SenseNova U1,一个原生理解和生成文本与图像的统一模型。该模型已开源,用户可自行运行。被@gurru_tech称赞“令人印象深刻”。提供在线演示平台SenseNova Studio、HuggingFace模型、GitHub代码及Discord社区。
Post-training is having a moment — Nex-N2-Pro from neolab @NexEcosystem proves it. Built on Qwen3.5-397B-A17B, delivers GPT-5.5 and Claude Opus 4.7–level performance. 🎉 T+0 Support on SiliconFlow · Free for First 2 Weeks N2-Pro: 397B MoE / Reasoning Model / 262K context / VLM → Auto-adjusts reasoning depth, 30–50% fewer thinking tokens, no performance trade-off → SOTA performance on Terminal Bench 2.1, GDPVal, SWE-Verified → Excels at agentic coding, deep search, tool use → Plug-and-play with Claude Code, Cursor, OpenClaw, etc. Try it on SiliconFlow ⬇️
译neolab 推出 Nex-N2-Pro,基于 Qwen3.5-397B-A17B,总参数 397B 的 MoE 推理模型,支持 262K 上下文与多模态(VLM),性能达到 GPT-5.5 和 Claude Opus 4.7 级别。模型可自动调节推理深度,减少 30-50% 思考 token 且无性能折损,在 Terminal Bench 2.1、GDPVal、SWE-Verified 上取得 SOTA。擅长智能体编码、深度搜索和工具使用,兼容 Claude Code、Cursor 等工具。硅基流动已提供 T+0 支持,前两周免费使用。
"𝗦𝗲𝗿𝗶𝗼𝘂𝘀𝗹𝘆 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝘀𝘁𝘂𝗳𝗳". Thanks for the kind words, @gurru_tech — that's 𝗦𝗲𝗻𝘀𝗲𝗡𝗼𝘃𝗮 𝗨𝟭 turning prompts into professional infographics. Unified model that natively understands and generates text and images. Open-sourced. Run it yourself. 🎥Watch the video: https://youtu.be/HKz2e3STUwg 🎛️ SenseNova Studio: https://unify.light-ai.top/ (Try infographics; also join Discord for text-image interleaved gen) 🤗 https://huggingface.co/collections/sensenova/sensenova-u1 🛠️ https://github.com/OpenSenseNova/SenseNova-U1 👾 Discord: https://discord.com/invite/BuTXPHmQub
译商汤 SenseTime 推出 SenseNova U1 开源多模态模型,实现原生理解与生成文本和图像,可一键将提示词转化为专业信息图。该模型被开发者 @gurru_tech 评价为“非常令人印象深刻”。项目已开源,提供 SenseNova Studio 在线试用,并公开 HuggingFace 模型集合、GitHub 源码仓库及 Discord 社区入口。
early Fable 5 leak in new claude-code binary. Claude Fable 5 - Our most powerful, most intelligent model. New tier above...
Cohere近日发布North Mini Code,一款30B总参数(3B活跃参数)的开放权重编码模型,采用Apache 2.0开源协议。该模型在Artificial Analysis Intelligence Index上得分27.6,高于gpt-oss-20B (high)的24.5,略低于Mistral Small 4(119B参数,6.5B活跃)的27.8。在Coding Index(Terminal-Bench Hard和SciCode加权平均)上得分33.4,显著高于GLM-4.7-Flash的25.9,低于Qwen3.6 35B A3B的35.2。非编码智能体任务表现较弱:GDPval-AA 14%、τ²-Bench Telecom 37%。在Cohere API上推理速度约199 output tokens/s,快于同类模型。距Cohere上次发布Command A+不到一个月。
Google 推出 Gemini 3.5 Live Translate,一款实时语音转语音翻译模型。它在原说话者尚未说完时即开始翻译,无需等待完整句子。模型采用流式翻译,边听边更新结果,支持 70 多种语言,延迟仅数秒,并能保持语速、音高和语调。该功能通过 Gemini Live API、Google Meet 预览版以及 iOS/Android 版 Google Translate 应用推出。
Today, we released Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation. It supports ...
关联讨论 5 条Ars Technica:AI(RSS)X:Jeff Dean (@JeffDean)IT之家(RSS)The Decoder:AI News(RSS)X:Berry Xia (@berryxia)Anthropic 今日发布 Mythos 的公开版本,代号“Fable”。其成本约为 Opus 的两倍,低于此前预览版 5 倍 Opus 的定价。Fable 配备严格安全限制,在网络安全方面比 Project Glasswing 合作伙伴的受限预览版更保守,且在长时间、多步骤任务及智能体式工作流上表现更强。Mythos 预览版于 2026 年 4 月推出,是当时最强前沿模型,尤其擅长编程、推理和网络安全(含发现零日漏洞);因安全问题未公开,仅限 Project Glasswing 合作伙伴用于防御性网络安全,目前已报告发现数千个重大漏洞。
Google AI 推出音频模型 Gemini 3.5 Live Translate,为开发者提供低延迟实时语音翻译,支持 70+ 种语言。模型具备多语言输入(同会话无需切换)、自动语言检测、原生音频处理(保留说话者语调、语速和音高)以及噪声鲁棒性(过滤环境噪音),可直接处理流式语音。
关联讨论 5 条Ars Technica:AI(RSS)X:Jeff Dean (@JeffDean)IT之家(RSS)The Decoder:AI News(RSS)X:Berry Xia (@berryxia)Google AI 推出 Gemini 3.5 Live Translate,一款面向实时语音到语音翻译的音频模型。该模型支持 70 多种语言,可在用户说话的同时开始翻译并流式输出译文,避免尴尬停顿或断续。模型通过毫秒级决策平衡速度与翻译质量,使对话流畅自然。它可边接收输入边输出翻译语音,延迟仅比说话者慢几秒,并能在长对话中维持语速、音高和语调。目前已在 iOS 和 Android 版 Google Translate 应用中上线。
关联讨论 5 条Ars Technica:AI(RSS)X:Jeff Dean (@JeffDean)IT之家(RSS)The Decoder:AI News(RSS)X:Berry Xia (@berryxia)Scoop: A neutered version of Mythos called Claude Fable is coming today. It's expensive-2x the price of Opus-but perhaps...
Apple 发布全新基础模型家族,亮点是 AFM 3 Core Advanced:200 亿参数,完全运行在 iPhone 17 Pro 设备端。通过将完整模型存于闪存,每次仅加载 1-4B 专家参数到活跃内存,巧妙绕过 DRAM 瓶颈,实现设备端更生动的语音和更精准的听写。共 5 个模型,与 Google 合作打造,覆盖从设备端到 Private Cloud Compute 的云端模型,最高性能云端模型运行在 NVIDIA GPU 上。
New Claude model checkpoints (Possibly Mythos GA) - Claude Fable 5 - Claude Fruitcake EAP The new checkpoints were detec...
Sources: Anthropic is planning to release a public version of Mythos tomorrow - Will have substantial guardrails and not...
MiMo推出V2.5 Pro UltraSpeed超高速模型版本,每秒输出超1000 Token,号称全球首个达此速度的万亿参数模型。实测显示:复杂3D小游戏TPS 804 Token/s(峰值810),首次响应4.71秒;官网3D动画峰值1426 Token/s,首次响应0.83秒,32秒输出25624 Token(1000行代码);另一复杂官网3D效果TPS 1136,首次响应4.5秒。相比此前超高速推理方案常见能力下降,MiMo未出现此类迹象。该模型主要面向效率要求极高的ToB客户,在Agent和Sub-Agent并发场景下效率提升明显。
JUST IN: Anthropic will reportedly release its new AI model "Mythos" tomorrow.
New Claude model checkpoints (Possibly Mythos GA) - Claude Fable 5 - Claude Fruitcake EAP The new checkpoints were detec...
xAI推出视频生成模型grok-imagine-video-1.5-preview,目前在Artificial Analysis Video Arena的Image to Video (With Audio)排行榜中排名第二,仅次于字节跳动Seedance 2.0。该模型支持图像转视频并原生生成音频,最长可生成15秒视频。在无音频排行榜中位列第三,紧随Seedance 2.0和自家的grok-imagine-video。模型定价为每分钟视频$8.40,现已通过xAI API提供,并将逐步在Grok app和X上线。
面壁智能 OpenBMB 发布 VoxCPM2 技术报告。该模型为最新语音生成模型,拥有 2B 参数,基于超 200 万小时多语言语音数据训练,支持 30 种语言和 9 种中文方言。具备自然语言语音设计、可控及高保真延续性语音克隆能力。技术报告涵盖架构设计、统一序列公式、AudioVAE 高保真语音重建、大规模训练评估,以及零样本和指令跟随 TTS 基准结果。采用 16kHz 语义编码 + 48kHz 波形重建,在公开 TTS 基准上达到 SOTA 或极具竞争力。模型权重、微调代码和推理工具以 Apache 2.0 开源。
小米 MiMo 联合 TileRT_AI 发布 MiMo-V2.5-Pro-UltraSpeed,首次在 1 万亿参数 MoE 模型上实现超过 1,000 tokens/s 输出速度,仅用单台标准 8-GPGPU 节点(非 Cerebras 或 Groq 方案)。提供限时免费聊天体验,UltraSpeed API 价格为 3 倍,输出体验提升约 10 倍。申请时间为 6 月 8 日至 23 日(PDT),企业可邮件联系 business-mimo@xiaomi.com。
关联讨论 3 条Hacker News 热门(buzzing.cc 中文翻译)公众号:小米 MiMoIT之家(RSS)CJ Zafir团队发布Mac-1模型(6.6B参数),可在任何Mac本地运行,仅需7GB内存(12GB更佳)。它支持487个MacOS原生工具,能执行多工具链式调用,推理开启,输出速度约65 tok/s。应用层基于Mac原生UI/UX设计。作者认为这种本地小模型+原生工具的组合直接挑战云端SaaS agent,甚至可能抢了苹果Siri的活儿。
Here's a teaser of our Mac-1 model. > 6.6B model > runs locally (on any Mac) > requires 7GB RAM (12GB ideal) > can use 4...
👀 Mythos est apparu quelques secondes chez Anthropic ! Son nom sera Claude Mythos 5 : La meilleure classe de modèle ne ...
Google 发布 Gemma 4 的 QAT(量化感知训练)检查点,将最小模型从 11.4GB 缩小至 1.1GB(纯文本版 0.84GB),便于手机和笔记本运行。常规 PTQ(训练后量化)因模型未学会应对舍入而损伤质量;QAT 在训练中模拟压缩,让模型在权重被挤压时学习,压缩版不易丢失推理能力。Google 还构建了移动端优化格式,包含静态激活、通道量化、定向 2-bit 量化及 KV 缓存优化,减少手机缩放计算并防止长对话过快消耗内存。
Google DeepMind 发布 Gemma 4 QAT 量化感知训练模型,专为本地 / 设备端优化。通过量化感知训练减少内存占用,同时相比标准训练后量化保留更多质量。支持 Q4_0 格式及新的移动专用量化格式。Gemma 4 E2B 版本可运行于约 1GB 内存,纯文本版本甚至低于 1GB,使手机、笔记本、边缘设备和消费级 GPU 上的本地 AI 更实用。
谷歌发布 Gemma 4 量化感知训练 (QAT) 检查点,支持在消费级 GPU 和移动设备上本地运行,质量损失极小。新检查点提供 GGUF(Q4_0)格式,覆盖所有尺寸及起草模型,实现最佳本地性能。自定义移动模式采用混合精度方案,将 Gemma 4 压缩至 1GB 以下,包含 2-bit 解码层、优化 KV 缓存和静态激活。通过在训练中模拟压缩(而非训练后量化),大幅降低内存占用并加速解码,同时保持推理质量。
关联讨论 5 条Google Developers Blog(RSS)The Decoder:AI News(RSS)Hacker News 热门(buzzing.cc 中文翻译)X:Jeff Dean (@JeffDean)X:Google AI for Developers (@googleaidevs)MYTHOS 🔥: Another early preview of recently spotted "Oceanus" checkpoint output. "Oceanus" is rumored to be a version o...
Seeing as Claude Mythos is releasing soon, I have two VERY astonishing outputs to share from it. 👀 ZERO-SHOT and LOW ef...
The updated Grok-build model (still the 0.5T one) is much better than before. It's less lazy, more autonomous, and more ...
Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It del...
Introducing Magenta RealTime 2 🎺 - Open model for live music generation - Just 2.4B parameters, perfect for on-device -...
Introducing Magenta RealTime 2 (MRT2): the live music model you can play as an instrument. MRT2 offers MIDI and prompt c...
关联讨论 1 条IT之家(RSS)NVIDIA 正式发布 Nemotron 3 Ultra,550B 总参数(55B 活跃)的完全开源 MoE 模型,权重、训练数据和完整配方全部公开。采用混合 Mamba-Attention 架构,专为长上下文快速解码和轻内存占用设计。在长输出智能体工作负载上,吞吐量约为可比开源模型的 6 倍(推理速度提升 5 倍),复杂智能体任务成本降低最多 30%。该模型在 4-bit(NVFP4)精度下预训练 20T tokens,后训练使用 MOPD 技术,由十余个专家教师模型蒸馏技能至学生模型。这是首个达到前沿水平且可完全复现的开源模型。
Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It del...
关联讨论 5 条X:Perplexity (@perplexity_ai)IT之家(RSS)X:opencode (@opencode)LMSYS:Blog(Chatbot Arena 团队)X:Artificial Analysis (@ArtificialAnlys)商汤SenseTime发布SenseNova U1,一个原生理解和生成文本与图像的统一模型。该模型已开源,用户可自行运行。被@gurru_tech称赞“令人印象深刻”。提供在线演示平台SenseNova Studio、HuggingFace模型、GitHub代码及Discord社区。
关联讨论 1 条X:商汤 SenseTime (@SenseTime_AI)neolab 推出 Nex-N2-Pro,基于 Qwen3.5-397B-A17B,总参数 397B 的 MoE 推理模型,支持 262K 上下文与多模态(VLM),性能达到 GPT-5.5 和 Claude Opus 4.7 级别。模型可自动调节推理深度,减少 30-50% 思考 token 且无性能折损,在 Terminal Bench 2.1、GDPVal、SWE-Verified 上取得 SOTA。擅长智能体编码、深度搜索和工具使用,兼容 Claude Code、Cursor 等工具。硅基流动已提供 T+0 支持,前两周免费使用。
商汤 SenseTime 推出 SenseNova U1 开源多模态模型,实现原生理解与生成文本和图像,可一键将提示词转化为专业信息图。该模型被开发者 @gurru_tech 评价为“非常令人印象深刻”。项目已开源,提供 SenseNova Studio 在线试用,并公开 HuggingFace 模型集合、GitHub 源码仓库及 Discord 社区入口。
关联讨论 1 条X:商汤 SenseTime (@SenseTime_AI)