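AI is not delivering equal access; it is producing a K-shaped divergence. Power users already take the components of an Agent for granted: docs, rules, memory, loop, MCP, CLI, tool calls, permissions, security sandboxes, context engineering, scheduled tasks, heartbeats, file systems, code execution, and Skills. Ordinary users only know that "Agents can write code." Building good Skills is the only way across the chasm. The author is working with 藏师傅 to help the general public actually cross it through Cola.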
http://x.com/i/article/2065096982310567936
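Chatting with 藏师傅 recently, we both felt a deep resonance. People assume AI brings equal access, but what it actually brings is a K-shaped divergence. Power users already take the components of an Agent for granted: docs, rules, memory, loop, MCP, CLI, tool calls, permissions, security sandboxes, context engineering, scheduled tasks, heartbeats, file systems, code execution, and Skills. Ordinary users only know that "Agents can write code." What can be done? Doing Skills well is the only way across the chasm. We are working with 藏师傅 on something practical: letting Cola help the general public truly cross the chasm.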
Replit is taking over NYC and we can't wait to see you at #Vibecon. 2 days of art, code and the biggest tastemakers in culture. Get the details on http://vibecon.ai @BrandNewSchool
Beautiful paper from Google DeepMind. Explains the pathways from AGI to ASI, and why that jump could happen through several routes. The authors frame the AGI-to-ASI transition around 4 technical pathways: - continued scaling of compute, model size, data, and test-time inference; - algorithmic paradigm shifts beyond today’s transformer-based foundation-model stack; - recursive self-improvement, where AI accelerates AI R&D and improves future systems; and - multi-agent collective intelligence, where large populations of specialized agents coordinate into a superhuman group agent. Scaling may work for a while, but it could hit limits in data, compute, energy, or weaker returns from making systems larger. Recursive improvement is the most uncertain path, because AI could speed up AI research, but that loop may also slow if hard research problems need real-world testing, scarce hardware, or new ideas. Multi-agent collectives may be the most underappreciated path, because a society of competent digital workers could outperform a brilliant individual model through specialization, speed, and coordination. The big point is that ASI may not arrive as 1 sudden event, but as a chain of faster changes as AI helps create better AI and stronger scientific tools. ---- Link – arxiv.org/abs/2606.12683 Title: "From AGI to ASI"
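Summary: A new Google DeepMind paper lays out four pathways from AGI to ASI: continued scaling (compute, model size, data, test-time inference), algorithmic paradigm shifts (beyond the Transformer architecture), recursive self-improvement (AI accelerating its own R&D), and multi-agent collective intelligence (many specialized AI agents coordinating into superhuman intelligence). Scaling may hit data, compute, and energy bottlenecks; recursive improvement is the most uncertain; the multi-agent path is the most underrated, since specialization and coordination could outperform a single strong model. ASI may arrive not as a single leap but as an accelerating chain in which AI helps create better AI.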
Add near real-time voice translation to your apps with Gemini 3.5 Live Translate via the Gemini Live API. 🎙️ Watch how the model handles live broadcast ingestion and translation with continuous speech-to-speech streaming (S2ST) and synced transcripts, letting users tune into global radio broadcasts in their native language.
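I've noticed that ADHD among my friends and colleagues is getting worse. They are easily distracted by small, trivial things while overlooking the big problems. Turning off notifications and immersing yourself alone in one complete, substantial piece of work is becoming less and less possible. Getting into flow is getting harder too. AI's high-speed execution makes the problem worse. A conversation every two or three minutes is a repeating cycle of focusing and losing focus. How do we rescue our prefrontal cortex?
Summary: The author observes that ADHD (attention deficit hyperactivity disorder) among friends and colleagues is worsening: they are easily distracted by trivia, overlook big problems, can no longer turn off notifications and immerse themselves in substantial work, and find flow harder to reach. AI's high-speed execution aggravates this, because a conversation every two or three minutes creates an alternating cycle of focus and distraction. The post closes by asking how we can save our prefrontal cortex.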
day-0 in @vllm_project and it comes with: dedicated MSA prefill/decode kernels, 1M-context serving with prefix caching + chunked prefill, BF16 + MXFP8 on both Hopper and Blackwell 🚀 this is what open-weight done properly looks like. thanks @vllm_project, @NVIDIAAI, @AIatAMD, @inferact
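Summary: MiniMax M3 has been released with frontier coding and agentic capabilities, native image and video input, computer use, and a 1M-token context. Its core is MSA sparse attention: each query scores 128-token KV blocks and attends only to the top-scoring blocks. vLLM supports M3 on day 0, including dedicated MSA prefill/decode kernels, prefix caching and chunked prefill, BF16 and MXFP8 checkpoints, and MoE backends for Hopper and Blackwell, validated on NVIDIA and AMD hardware. It also supports agentic workloads such as native multimodal input, tool calling, reasoning parsing, and thinking-mode control.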
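The block-selection idea in the summary above (score fixed-size KV blocks per query, then attend only to the top blocks) can be sketched in a few lines. This is a minimal illustration, not MiniMax's MSA implementation or a vLLM kernel: the function name `block_sparse_attention`, the mean-key scoring rule, and the `top_blocks` parameter are all assumptions made for the example, and only the 128-token block size comes from the post.

```python
import math

def block_sparse_attention(query, keys, values, block_size=128, top_blocks=2):
    """Attend one query over only the highest-scoring KV blocks (illustrative)."""
    n, dim = len(keys), len(query)
    # Split token positions into contiguous blocks of `block_size` tokens.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def block_score(block):
        # Assumed scoring rule: query dotted with the block's mean key.
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(query, mean_key)

    # Rank blocks by score and keep the top few, restored to positional order.
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_blocks])
    positions = [i for b in selected for i in blocks[b]]

    # Ordinary scaled dot-product softmax, restricted to the selected positions.
    logits = [dot(query, keys[i]) / math.sqrt(dim) for i in positions]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out_dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, positions)) / total
              for d in range(out_dim)]
    return output, selected
```

With `top_blocks` fixed, the softmax cost per query depends on the number of selected tokens rather than on the full context length, which is the usual motivation for block-sparse schemes at long context.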
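With only ~428B params and ~23B activated params, M3 still handles frontier coding + long-horizon agents + native multimodal (text, image, video) at 1M-token context. Few open-weight models do any of this. M3 does all of it. Thanks @baseten 🚀
Summary: MiniMax has open-sourced the M3 model, with about 428B total parameters and 23B active parameters, supporting frontier coding, long-horizon agent tasks, and native multimodality (text, image, video) with a 1M-token context window. The weights are open and can be deployed on Baseten. Among models under 500B parameters, very few handle coding, agentic workloads, and 1M context together; M3 does all three.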
M3 now on @FactoryAI droid
Claude Conway Agent will be released as a Labs project, similar to Claude Design. > Conway is a managed agent for Claude that will run in a remote container. > Users will be able to install different custom UI Tabs and plugins for Conway. And it might be bigger than you think 👀
IMO sth that is a bit overlooked but will become far more important in the future. GPT is 10-20x more token+cost effective for ~similar outcome.
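Summary: Peter Steinberger notes that GPT is 10-20x more efficient than Fable in token usage and cost while reaching similar results. A comparison by @thorstenball supports this: Fable and deep^2 were asked to implement the same feature across a CLI, a web server, and another server. deep^2 cost $20 (it failed on the first attempt but was fixable), while Fable ran for 1 hour 40 minutes and cost $350 (succeeding on the first attempt). After follow-up requests, Fable's total reached $457 against an estimated maximum of $40 for deep^2, with the gap put at roughly 17x.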
10 months later, I gave Claude Code with Fable the same brief, asking it to construct SimRefinery from surviving screenshots and documentation. Fully playable, with a learning mode & all sorts of sophistication. Look at the difference from the old version! https://simrefinery.netlify.app/
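Summary: Ten months later, Ethan Mollick gave Claude Code with Fable the same brief: rebuild the lost Maxis simulation SimRefinery from surviving screenshots and documentation. The new version is fully playable and includes a learning mode and other sophisticated features, in sharp contrast to the playable prototype that ChatGPT Codex quickly built ten months earlier from a single article and a screenshot. Back then he wrote no code himself and only occasionally asked for small changes.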
How Lay Bankz turned a few keyboard notes into a psychedelic rock sample
I had already wondered how Apple manages to perform inference at Google while simultaneously protecting their privacy, essentially their unique selling point. The answer: the heaviest requests run on Blackwell B200s inside Google Cloud, with NVIDIA's Confidential Computing encrypting the data while it's processed, so neither Google nor Apple can see it. "NVIDIA Confidential Computing provides a hardware-based security layer for accelerated AI workloads. The technology protects data while it’s being processed by isolating workloads in trusted execution environments and enabling systems to cryptographically verify that the infrastructure has not been tampered with before any sensitive data is sent to the server."
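Summary: Kim explains how Apple protects privacy while running inference on Google Cloud: the heaviest requests run on Blackwell B200s in Google Cloud, where NVIDIA Confidential Computing provides a hardware-based security layer that isolates workloads in trusted execution environments and keeps data encrypted while it is processed, so that neither Google nor Apple can see it.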
Project Ire examined a timely malware sample and determined its intent through reverse engineering—identifying LOTUSLITE characteristics even as most major EDR tools did not detect it. https://msft.it/6011viy4N
Kimi 2.7 Code now available in Go · text · image · optimized for coding · similar pricing as 2.6
Text-to-SQL might sound like a solved problem. Far from it. Data gets messy and complex really fast in the real world. Strong reasoning models are great, but nothing beats a custom model at this stuff. Gemini-SQL2 looks very strong here. BIRD is a tough benchmark. I suspect there are plenty of opportunities like this in KBs, search, graph databases, etc.
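Summary: Google Research has introduced Gemini-SQL2, built on Gemini 3.1 Pro, which achieves state-of-the-art Text-to-SQL results on the BIRD benchmark and translates natural language into directly executable SQL queries. Elvis Saravia of DAIR.AI points out that real-world data is complex and messy, and that although strong reasoning models do well, custom models such as Gemini-SQL2 do better on this kind of task. He sees similar opportunities in knowledge bases, search, and graph databases, and calls BIRD a very challenging benchmark.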
Looking at the graph, I think Fable 5 will only maintain its lead up to GPT-5.6. And secondly, I think the benchmark will soon be completely saturated.
I'm messing around with an agent flow for combining Hyperframes with Gemini video analysis to make interesting annotated videos.
Even the Mayor knows where the vibes are 👀 http://vibecon.ai
appreciate it @SambaNovaAI 🤝 looking forward to M3 on RDUs
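Summary: SambaNovaAI congratulated MiniMax on releasing the open-source M3 model and said it will support M3 on its RDUs in the future. MiniMax thanked them and said it looks forward to the collaboration.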
Kimi-K2.7-Code is now available on AI/ML API 👀 > Kimi K2.7 Code is the latest agentic coding model from Kimi AI that supports extended reasoning and tool use. > AI/ML API is a single gateway to Chat, Reasoning, Image, Video, Audio, Voice, Search, and World models under one bill. Kimi K2.7 Code can be tested on both Playground and APIs.
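Summary: Kimi-K2.7-Code, Moonshot AI's latest agentic coding model, is now live on the AI/ML API platform, supports extended reasoning and tool use, and can be tested through the Playground and the API. To test its ability to self-correct (rather than generate in one shot), researchers had four Kimi agents run a 2D flight physics simulation whose goal was to go from launch to orbit and land the booster. Across four flights: the first broke up at max-Q; the second got past that point but failed because separation came too early; the third reached orbit but missed the landing ship; the fourth landed successfully after the landing calculation was corrected. The process shows the model learning from failures through iterative closed-loop debugging.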
Codex is how @ndrewpignanelli at @intelligenceco updates multiple parts of a website in parallel, turning a week of work into three days.
means a lot coming from @NVIDIAAI. free GPU-accelerated M3 endpoints are live now, go try it 👇
powerful & cool way to navigate a website, makes it feel so much more interactive and intuitive
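Summary: OpenAI has launched a new docs agent on its developer documentation site that helps find product information and jumps directly to the relevant documentation. Greg Brockman calls it a powerful and cool way to navigate a website that makes the interaction more intuitive.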
So looks like @SpaceX will spend 2.5% of its market cap to buy @cursor_ai at 15x revenue 👀
The shape of the graph is getting very familiar.
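Summary: Claude Fable 5 performs strongly on the FrontierMath benchmark (Tiers 1-4, v2), scoring 87% on Tiers 1-3 and 88% on Tier 4, continuing the trend of rapid improvement in the math ability of Anthropic models. The main post comments: "The shape of the graph is getting very familiar."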
If the world refuses to give you moonlight, light the moon yourself. The Uninvited Sea — PixVerse Originals S1. Built on Canvas. A healing music animation by PixVerse CPP JaneDoeCreates . RT+Follow+Reply = 150 Creds & Full Film + Workflow in DMs (72H ONLY)
Claude Fable 5 scores very well on FrontierMath: Tiers 1–4 (v2), reaching 87% on Tiers 1–3 and 88% on Tier 4. This continues a streak of Anthropic models improving rapidly at math.
Fine-grained 3D motion control in AI video just got a little bit closer
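Summary: @andrew_n_carr announced "Edit motion in videos! Quit prompting and start directing" and showed his "universal video editor" workflow: capture the video with comic 4, modify the action in a motion editor, then re-render with a video-to-video model (such as Runway or Gemini). His example is a fashion clip in which he wants the model to show high-knee energy without a reshoot. In the main post, fofr says fine-grained 3D motion control in AI video has moved a step closer.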
New video is out! You no longer build one thing at a time on Replit. Run parallel agents to ship a website, mobile app, video, and pitch deck from one project, all at once. And you can now add multiple artifacts to projects you already have.
Ask our developer docs. They’ll show you the way. The new docs agent on 🔗http://developers.openai.com helps you find answers about OpenAI products and takes you directly to the relevant documentation.
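M3 is live on @telnyx Inference on day-0. Go build with Telnyx and M3 today.
Summary: MiniMax M3 is now live on the Telnyx inference platform. M3 is the first open-weight model to combine frontier coding with agent capabilities, with a 1M-token context window and native multimodal understanding. With M3's 1M context and Telnyx's own GPU infrastructure, an entire codebase can be handled in a single conversation. The company encourages developers to start using it right away.
day-0 and already on @FireworksAI_HQ with blazing fast inference. Long-horizon agents, full-repo understanding, multimodal coding, all in one model. Try M3 today on Fireworks AI.
Summary: MiniMax M3 is live on Fireworks AI, with the fastest inference endpoint available on day 0. The model is open-weight and ranks first on the Artificial Analysis index. It supports a 512K context window and native image and video input, and uses the MSA sparse attention mechanism for 9x faster prefill and 15x faster decode. Pricing matches M2.7. M3 combines long-horizon agents, full-repo understanding, and multimodal coding in a single model.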
SpenseGPT Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference
Run M3 locally today with @UnslothAI
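Summary: MiniMax-M3 is a new open model with 428B parameters (23B active) and 1M context that performs on par with Gemini 3.1 Pro. The dynamic 2-bit GGUF version runs on 138GB of RAM/VRAM, and the 3-bit version on 165GB. With @UnslothAI, M3 can be run locally today.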
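The memory figures above can be sanity-checked with simple arithmetic. The sketch below compares them with what a uniform 2-bit or 3-bit encoding of 428B parameters would need. It assumes 1 GB = 10^9 bytes and treats the quoted 138GB and 165GB as if they held only weights, which likely overstates the true bits per weight since runtime memory also covers other buffers. Both helper names are hypothetical.

```python
def uniform_size_gb(params_billion: float, bits: float) -> float:
    """Weight storage in GB if every parameter used exactly `bits` bits."""
    return params_billion * 1e9 * bits / 8 / 1e9  # assumes 1 GB = 10**9 bytes

def effective_bits_per_weight(memory_gb: float, params_billion: float) -> float:
    """Average bits per parameter if `memory_gb` held nothing but weights."""
    return memory_gb * 1e9 * 8 / (params_billion * 1e9)

PARAMS_B = 428  # total parameter count from the post, in billions

print(uniform_size_gb(PARAMS_B, 2))                       # 107.0
print(uniform_size_gb(PARAMS_B, 3))                       # 160.5
print(round(effective_bits_per_weight(138, PARAMS_B), 2))  # 2.58
print(round(effective_bits_per_weight(165, PARAMS_B), 2))  # 3.08
```

The gap between 107GB and the quoted 138GB is consistent with a "dynamic" scheme that keeps some tensors at higher precision than the nominal bit width, though the post does not say how the quantization is distributed.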
Most AI agents do not forget because they lack memory; they fail because they remember badly. AGENTCL asks a simple question: does an AI agent really learn from experience, or merely carry clutter forward? Today's agents can spend enormous effort solving one task, then enter the next one almost as if nothing happened. AGENTCL says AI agents need better tests for whether their memory actually helps them learn across tasks. The paper’s main idea is to build task streams where earlier tasks clearly contain pieces that later tasks can reuse, such as a small coding function, evidence for a research question, or a useful workflow. It compares these careful “compositional” streams with normal “naive” streams, where tasks come from the same area but do not have a guaranteed reuse link. Agent memory is easy to overrate when the benchmark is messy. If tasks are not carefully connected, a memory system may look good for the wrong reason, or bad for a reason the test cannot explain. AGENTCL tries to fix that by making the task relationships clear, then measuring whether memory helps on later tasks, stays useful, and transfers to unseen tasks. The key finding is that today’s memory methods can reuse past work when the connection is obvious, but they still struggle to avoid confusion when the next task is different. ---- Link – arxiv.org/abs/2606.02461 Title: "AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents"
Summary: AGENTCL proposes evaluating whether AI agents truly learn from experience rather than merely accumulating information. It builds compositional task streams (where earlier tasks contain code snippets, research evidence, or workflows that later tasks can reuse) and compares them with naive task streams that have no guaranteed reuse link. Key finding: current memory methods can reuse past experience when the connection between tasks is obvious, but still struggle to avoid confusion when tasks differ substantially. The paper aims to give continual learning in agents a clearer evaluation standard.
Claude Managed Agents can operate in a sandbox you control, on your own infrastructure or with any provider you choose. Today we added new guides for @blaxelAI, @e2b, @googlecloud, @namespacelabs, and @superserve_ai, so you can choose the best fit for your use case.
How to effectively run autonomous long-running coding agents? This is one of the most exciting discussions on agents I've ever had. I recorded it and am making it freely available. (bookmark it) The idea of autonomous long-running agents is a real thing. We talk about lots of things like /goal, /loop, and dynamic workflows, and what comes next. One interesting discussion was around how to make the agent run for longer while ensuring it stays on track. Most models today will struggle to coordinate work effectively. They sometimes pause the work early. Lots of mistakes happen, and lots of weird shortcuts (reward hacking). What helps is to be extremely clear about the goals it needs to achieve. To clarify the dos and don'ts clearly. Eliminate any assumptions you think the model would make. Deep expertise matters so much in this. But you can get far through careful planning. My formula currently is to use Opus 4.8 for planning carefully and GPT-5.5 for all executions. For the evaluator (via /goal), I am often using something like Deepseek or the latest models from Qwen, Kimi, and MiniMax, etc. Another insight we discussed to enforce goals is to provide strong visual cues for the agent to compare with. I found that a multimodal goal is a much stronger goal than a plain text one. And use agents to help you set clear goals. Watch here: https://academy.dair.ai/events/cmplo7v3b000e04l1pxprat4d
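Summary: DAIR.AI founder Elvis Saravia shares how to run long-running autonomous coding agents effectively. He notes that most current models struggle to coordinate work: they pause early, make mistakes, or take shortcuts (reward hacking). The key is to state goals clearly and eliminate assumptions so the model does not have to infer them on its own. His current formula: Opus 4.8 for careful planning, GPT-5.5 for all execution, and for the evaluator (via /goal) Deepseek or the latest models from Qwen, Kimi, and MiniMax. Another key insight is to give the agent multimodal visual cues as the goal, which is stronger than a plain-text goal and keeps the agent on track more reliably. The full discussion was recorded and is freely available.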
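The planner / executor / evaluator split described above can be pictured as a small control loop. The sketch below is only an assumed shape for such a loop: the post does not specify how /goal works, so `plan`, `execute`, and `evaluate` are hypothetical callables standing in for whichever three models are chosen, and `run_until_goal` is an illustrative name.

```python
from typing import Callable, List, Tuple

def run_until_goal(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str], str], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Plan once, then alternate execution and independent evaluation."""
    steps = plan(goal)  # planner model: turns the goal into explicit steps
    feedback = ""       # evaluator notes fed back to the executor each round
    for round_no in range(1, max_rounds + 1):
        result = execute(steps, feedback)          # executor model does the work
        passed, feedback = evaluate(goal, result)  # separate model checks the goal
        if passed:
            return result, round_no
    raise RuntimeError(f"goal not met after {max_rounds} rounds: {feedback}")

# Stub callables so the loop can be run without any model behind it.
def demo_plan(goal: str) -> List[str]:
    return [f"implement: {goal}", "run tests"]

def demo_execute(steps: List[str], feedback: str) -> str:
    return "done" if feedback else "draft"  # improves only after feedback

def demo_evaluate(goal: str, result: str) -> Tuple[bool, str]:
    return (result == "done", "" if result == "done" else "tests are failing")

print(run_until_goal("add login page", demo_plan, demo_execute, demo_evaluate))
# ('done', 2)
```

Keeping the evaluator separate from the executor means that an early stop or a shortcut has to get past a check the executor does not control, which matches the concern about premature pauses and reward hacking raised in the post.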
Victorian gothic nightmares, one Canvas workflow. See how @Shanzyin_ai built THE DREAM EATERS on PixVerse Canvas — nodes, shots, and the full project file, open to explore.
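Summary: PixVerse showcases THE DREAM EATERS, a Victorian gothic short by AI filmmaker @Shanzyin_ai made with a Canvas workflow. The project is open to explore, including the full nodes, multiple shots, and the project file. The premise: in an ancient estate, teenagers are forced to devour the nightmares of the powerful, and one defective recruit drags the darkness back into reality. PixVerse is running a limited-time promotion: RT + Follow + Reply "DREAM" within 72 hours for 150 Credits and the workflow.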
🎉 Congrats to @MiniMax_AI on releasing MiniMax M3! Frontier coding and agentic capabilities, native image and video inp...
1 related discussion on X: MiniMax (@MiniMax_AI) Congrats to the MiniMax team on the open-source launch of M3! There are very few <500bn parameter models that can tackle...
Day 3 with Fable. Gave a huge prompt to implement a feature across CLI, web server, and another server to both Fable and...
I gave ChatGPT Codex an article & screenshot from a famous, lost Maxis simulation, SimRefinery, and asked it to create i...
🚀 Introducing Gemini-SQL2, our breakthrough text-to-SQL capability powered by Gemini 3.1 Pro! We've achieved state-of-t...
Claude Fable 5 scores very well on FrontierMath: Tiers 1-4 (v2), reaching 87% on Tiers 1-3 and 88% on Tier 4. This conti...
Congrats to our partners at @MiniMax_AI on the launch of MiniMax M3. Open-weight models continue to push the ecosystem f...
Kimi K2.7-Code is now available on AI/ML API! Moonshot's latest is built for long-horizon agentic coding that self-corre...
Congrats to the @MiniMax_AI team on the release of MiniMax M3, a long-context multimodal model for text, image, and vide...
Ask our developer docs. They'll show you the way The new docs agent on 🔗http://developers.openai.com helps you find ans...
EDIT MOTION IN VIDEOS!!! Quit prompting and start directing I've been shouting for YEARS about 3D as the control layer. ...
@MiniMax_AI M3 is live on Telnyx Inference 🚀 M3 is the first open-weight model combining frontier coding & agent capabi...
MiniMax M3 is live on Fireworks. Day-0, fastest endpoint for the MiniMax series. → Top open-weight model on the Artifici...
MiniMax M3 can now be run locally!🔥 MiniMax-M3 is a new 428B (23B active) open model with 1M context that performs on p...
An ancient estate. Teenagers forced to devour the nightmares of the powerful. One defective recruit who drags the darkne...