Claude Code:智能体进化的一年 · Claude Code: A Year of Agent Evolution
本期金句 · Key Quotes
- "Every single time Claude makes a mistake, I don't tell Claude to do it differently, I tell it to write it to the CLAUDE.md." / "每次 Claude 犯了错,我不直接告诉它怎么做,而是让它把这件事写进 CLAUDE.md。"
- "I don't talk to an agent anymore. I talk to loop or I talk to a routine. And it prompts Claude for me." / "我不再和智能体对话了,我和 loop 或例行程序对话,由它们来替我提示 Claude。"
- "If an automation doesn't work 100% of the time, it's not really an automation." / "一个不能 100% 跑通的自动化,不是真正的自动化。"(引自 Cat Wu)
- "All the roles are merging." / "所有的角色都在融合。"
- "Put Claude at the center of everything." / "把 Claude 放在一切事物的中心。"
— 对话开始 · Start of Conversation —
1. 回忆:Claude Code 刚发布时的样子 · Looking back: Claude Code at launch
Catherine Wu
When we first released Claude Code, it was like a little video I posted to Slack, and there were like two people that gave it a reaction… And like people were excited. I thought it was really cool. Especially for my very easy engineering tasks, it was quite good at it.1
Catherine Wu
当我们首次发布 Claude Code 时,那只是我在 Slack 上发的一个小视频,大概只有两个人给了表情回应……人们当时还挺兴奋的。我觉得这真的很酷。特别是在处理我那些非常简单的工程任务时,它做得相当不错。1
Boris Cherny
That's like a really nice way to say that it wasn't really good. (Laughs)1
Boris Cherny
这真是一种非常委婉的说法,其实就是说它当时并不怎么好。(笑)1
Catherine Wu
I can't believe it's only been a year since we first launched Claude Code.2
Catherine Wu
我真不敢相信,我们首次推出 Claude Code 至今竟然才过了一年。2
Boris Cherny
It's hard to remember what that was like. Like it is, it's so different than what we're doing today. Like now I just have like armies of agents that are doing stuff. Like, I'm prompting one agent, or I have like an agent that's like prompting agents that's prompting agents, and it's like a tree of like thousands of agents.2
Boris Cherny
很难回忆起当时是什么样子了。那和我们今天所做的完全不同。现在我手下有一整支智能体大军在做事。比如,我在提示一个智能体,或者我有一个智能体在提示其他智能体,而这些智能体又去提示另一些智能体,就像一棵由数千个智能体组成的树。2
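2. 最重要的一课:把错误写进 CLAUDE.md,验证比你想象的更重要 · The most important lesson: write mistakes into CLAUDE.md; verification matters more than you think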
Boris Cherny
But I think it's just like the most important idea when working on this stuff is like every single time Claude makes a mistake, I don't tell Claude to do it differently, I tell it to write it to the CLAUDE.md or to like make a skill or something to do it differently. And if you can do this, then Claude can just like run forever. And I think the other thing that we kind of realized is the verification is really important. Like we didn't realize that.2
Boris Cherny
但我认为,做这些工作时最重要的一点是,每次 Claude 犯了错,我不会直接告诉 Claude 换个方式做,而是告诉它把这件事写进 CLAUDE.md 文件,或者创建一个技能之类的方法来改变做法。如果你能做到这一点,Claude 就能永远运行下去。我觉得我们还意识到的另一件事是,"验证(verification)"非常重要。我们以前并没有意识到这一点。2
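The habit Boris describes, turning each correction into a persistent instruction instead of a one-off fix, can be sketched as a small helper. This is a minimal illustration under stated assumptions: lessons are kept as bullet points under a `## Lessons` heading in the project's CLAUDE.md, and both that heading name and the `record_lesson` helper are hypothetical, not a Claude Code convention or API.

```python
from pathlib import Path

LESSONS_HEADING = "## Lessons"  # hypothetical section name; not a Claude Code convention


def record_lesson(claude_md: Path, lesson: str) -> bool:
    """Add `lesson` as a bullet at the end of the LESSONS_HEADING section.

    Returns True if the file changed and False if the same bullet already
    exists, so repeating a correction does not duplicate the instruction.
    """
    bullet = f"- {lesson.strip()}"
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    lines = text.splitlines()
    if bullet in lines:
        return False

    if LESSONS_HEADING not in lines:
        # No section yet: start one at the end of the file.
        if lines and lines[-1].strip():
            lines.append("")
        lines.extend([LESSONS_HEADING, bullet])
    else:
        # The section ends at the next Markdown heading or at end of file.
        start = lines.index(LESSONS_HEADING) + 1
        end = start
        while end < len(lines) and not lines[end].startswith("#"):
            end += 1
        # Step back over blank lines so the bullet joins the existing list.
        while end > start and not lines[end - 1].strip():
            end -= 1
        lines.insert(end, bullet)

    claude_md.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Because the helper is idempotent, an agent can be told to call it after every fix without the file filling up with repeated rules.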
Catherine Wu
I hear this come up a lot with developers and enterprises that we meet with. Um, what are your tips for making a really good, making Claude Code really good at verification?3
Catherine Wu
我在我们接触的开发者和企业中经常听到这个话题。对于如何让 Claude Code 非常擅长验证,你有什么建议吗?3
Boris Cherny
I sort of feel like this is this thing that just like everyone misunderstands. Because whenever we talk about verification, people are thinking like unit tests or they're thinking like lint or like type check. These are the things that are obviously really easy to automate and these are the things that were already automated.
But actually when we talk about verification for agents, it's something slightly different. It's like, can the agent run the thing? It takes a little bit of mental work to figure out how exactly do you do this because it's often not straightforward. And I think that's like, that's one of the challenges.3
Boris Cherny
我觉得这件事几乎每个人都有误解。因为每当我们谈论验证时,人们想到的往往是单元测试,或者是 lint 代码检查,亦或是类型检查。这些显然是非常容易自动化的东西,而且这些东西早就已经自动化了。
但实际上,当我们谈论针对智能体的验证时,情况略有不同。核心在于:智能体能把这个东西运行起来吗?要弄清楚到底该怎么做,需要花一点脑力,因为它通常并不直接明了。我认为这就是挑战之一。3
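The distinction Boris draws, between static checks such as lint or type checks and having the agent actually run the thing, can be illustrated with a tiny harness. This is a sketch under stated assumptions: `run_and_check` and `VerificationResult` are hypothetical names, and the command and expected output stand in for whatever your project uses to launch and exercise a feature.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass
class VerificationResult:
    ok: bool
    reason: str


def run_and_check(cmd: list[str], expected_in_stdout: str, timeout_s: float = 60.0) -> VerificationResult:
    """Run the program itself and check observable behavior, not just static checks."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return VerificationResult(False, f"timed out after {timeout_s}s")
    except FileNotFoundError:
        return VerificationResult(False, f"command not found: {cmd[0]}")
    if proc.returncode != 0:
        # Keep a short slice of stderr so the agent has something concrete to fix.
        return VerificationResult(False, f"exit code {proc.returncode}: {proc.stderr.strip()[:200]}")
    if expected_in_stdout not in proc.stdout:
        return VerificationResult(False, f"expected {expected_in_stdout!r} in output")
    return VerificationResult(True, "program ran and produced the expected output")
```

Feeding `reason` back to the agent is what closes the loop: a passing type check says nothing about whether the feature works when launched.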
Boris Cherny
I remember with Opus 4, Claude tested itself. And we just like hooked it up to Opus 4 and I was like, "Claude, build the feature and then test yourself in like bash." And it opened a little Claude CLI and tested its own feature. And I was just like, whoa, it's crazy.3
Boris Cherny
我记得在 Opus 4 的时候,Claude 测试了它自己。我们直接把它接入 Opus 4,然后我说:"Claude,构建这个功能,然后在 bash 里测试你自己。"它就打开了一个小的 Claude 命令行界面,然后测试了它自己的功能。我当时就觉得,哇,这太疯狂了。3
Boris Cherny
Like now we're so used to it. Like, now you know, now we have these loops going for, you know like the iOS simulator and the Android simulator and like computer use for desktop. Like it's not surprising. But back then that was crazy. How are you doing it?3
Boris Cherny
现在我们对这种事已经习以为常了。如今我们为 iOS 模拟器、Android 模拟器以及桌面端的计算机使用(computer use)都搭建了这样的闭环。这已经不令人惊讶了。但在当时那简直太疯狂了。你现在是怎么做的呢?3
Catherine Wu
So I've been mainly hacking on the desktop app these days, and one of the engineers on the team actually added this desktop development skill that teaches Claude how to run the local desktop app. And I've been having it use it, and it still runs into issues or like bugs with the staging environment sometimes.4
Catherine Wu
这阵子我主要在折腾桌面端应用,团队里有一位工程师实际上添加了这个"桌面开发技能",教 Claude 如何运行本地桌面应用。我一直让它使用这个技能,但它有时在预发环境(staging environment)里还是会遇到问题或 bug。4
Catherine Wu
And so what I have it do is in those cases, I have it read Slack and understand, hey, is the staging down right now, or is there, has someone else already hit this? Um, and then when it debugs the whole issue, I tell it to update the desktop development skill.4
Catherine Wu
所以在这种情况下,我会让它去阅读 Slack,去了解:"嘿,现在预发环境是不是挂了?或者是不是已经有其他人遇到这个问题了?"然后,当它调试完整个问题后,我告诉它去更新这个"桌面开发技能"。4
Catherine Wu
What this skill does is Claude actually spins up a local desktop app and it uses computer use to click around on it. And so when I add a new UX, it clicks around to invoke the new UX. It also tests edge cases, and when there's an issue it fixes it and rechecks.4
Catherine Wu
这个技能的作用是,Claude 会真正启动一个本地桌面应用,并利用"计算机使用"能力在上面四处点击。所以当我添加了一个新的用户界面(UX)时,它会到处点击来调用这个新的 UX。它还会测试边缘情况,当发现问题时,它会修复它并重新检查。4
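The cycle Catherine describes for the desktop development skill (exercise the new UX and its edge cases, fix what breaks, check again) is essentially a bounded retry loop. A minimal sketch, assuming `check` and `fix` are hypothetical callables standing in for the computer-use session and for Claude's repair step; the conversation does not describe the skill's actual internals.

```python
from __future__ import annotations

from typing import Callable, Optional

# A check returns None when the scenario passes, or a description of the failure.
Check = Callable[[], Optional[str]]


def verify_fix_recheck(checks: dict[str, Check], fix: Callable[[str, str], None],
                       max_rounds: int = 3) -> dict[str, Optional[str]]:
    """Run every scenario; on failure ask `fix` to repair it, then recheck.

    Returns the final failure (or None) per scenario after at most
    `max_rounds` fix attempts each.
    """
    results: dict[str, Optional[str]] = {}
    for name, check in checks.items():
        failure = check()
        rounds = 0
        while failure is not None and rounds < max_rounds:
            fix(name, failure)   # hand the failure description back to the agent
            failure = check()    # recheck after every fix; never assume it worked
            rounds += 1
        results[name] = failure
    return results
```

The round limit matters when something outside the code is broken, such as a staging outage; at that point the remaining failure is worth reporting rather than retrying forever.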
3. 角色融合:PM、设计师、工程师的边界正在消失 · Roles are merging: the lines between PMs, designers, and engineers are disappearing
Boris Cherny
This is like honestly one of my favorite things about this team, that everyone codes. I've never been on a team where like my PM would code. It's like crazy, and like your code is like really good. Like…5
Boris Cherny
老实说,这是我最喜欢这个团队的一点,那就是每个人都写代码。我从来没待过一个连产品经理(PM)都会写代码的团队。这简直太不可思议了,而且你写的代码真的非常好。比如……5
Catherine Wu
You're too nice.5
Catherine Wu
你太客气了。5
Boris Cherny
But I also just feel like it's also just becoming easier because it's like essentially Claude writes the code, and so what matters a little more is like what's the idea that you have. And I feel like if you're a person that has like the product context and the business context and you're thinking about the design and the user, you're just gonna come up with better ideas.5
Boris Cherny
但我也觉得这正在变得越来越容易,因为本质上是 Claude 在写代码,所以稍微更重要的反而是你有什么样的想法。而且我觉得,如果你是一个掌握产品上下文和业务上下文的人,并且你在思考设计和用户,你就一定能想出更好的点子。5
Catherine Wu
It's kind of like all the roles are merging.6
Catherine Wu
这就好像所有的角色都在融合。6
Boris Cherny
I remember seeing PRs from Meaghan, our designer, and I was just horrified at the beginning. I was like oh my god, why is Meaghan putting up PRs? And then she was like, yeah yeah I'm just like I'm fixing the button. And I was like okay all right well the code looks good, so maybe it's fine. And I feel like now it's just like it's totally normal.6
Boris Cherny
我记得当初看到我们的设计师 Meaghan 提交的代码拉取请求(PRs)时,我一开始被吓坏了。我当时想:天哪,为什么 Meaghan 在提 PR?然后她说:"对对,我只是在修复那个按钮。"然后我就觉得:"好吧行吧,这代码看起来不错,所以也许没问题。"而现在我觉得这种事完全习以为常了。6
Catherine Wu
Yeah. We see this across all the enterprises we talk with. Like it's the engineers that adopt Claude Code first. And then the adjacent roles look over their shoulder and they're like, whoa this thing is very powerful, let me try it out.6
Catherine Wu
是的。我们在所有沟通过的企业里都看到了这种现象。最开始通常是工程师先采用 Claude Code。然后相邻的职能角色看到后会想:"哇,这东西非常强大,让我也试试。"6
Catherine Wu
And we found it's crazy. We found that like our designers are more productive making prototypes and making changes directly in the app instead of pinging an engineer. PMs are making changes in the app.6
Catherine Wu
然后我们发现这很疯狂。我们发现我们的设计师直接在应用里做原型和修改反而效率更高,而不再需要去 ping 工程师了。PM 也在直接修改应用。6
Catherine Wu
Like our finance team runs in Claude Code, they do their projections there. Um, data science. Uh like if you talk with our data scientists, it's so cool. It's just like everyone just has Claude Code up on their screens. Yeah. Um, I feel like it's, it's remarkably versatile for different roles.6
Catherine Wu
比如我们的财务团队都在 Claude Code 里运转,他们在那里做预测模型。嗯,还有数据科学团队。如果你和我们的数据科学家聊聊,那就太酷了。就是每个人屏幕上都开着 Claude Code。是的,我觉得它对于不同角色来说具有极其惊人的多功能性。6
4. 例行程序(Routines):让另一个 Claude 替你修 Bug · Routines: let another Claude fix the bug for you
Boris Cherny
What do you feel like nowadays are like the use cases that are pushing the limits?7
Boris Cherny
你觉得现在有哪些使用场景是在挑战其极限的?7
Catherine Wu
One that I'm super excited about is routines. There's one engineer on our team who launched voice mode across all of our products. And, um, he has this routine set up that just listens for every ticket that comes, every GitHub issue, every bug report about voice mode, and his Claude just picks it up, proactively puts up a fix, and then pings the PR to him.7
Catherine Wu
有一个让我非常兴奋的场景是"例行程序(Routines)"。我们团队里有一位工程师,他在我们所有的产品中发布了语音模式。他设置了这样一个例行程序:专门监听每一个进来的工单、每一个 GitHub issue,以及每一个关于语音模式的 bug 报告,然后他的 Claude 就会接手,主动提交一个修复补丁,然后把 PR 发送给他。7
Catherine Wu
And when he got that working for voice mode, he thought, okay, we're getting a lot of other feedback that isn't being responded to. So, uh, he also set up a routine to listen for that.7
Catherine Wu
当他把语音模式的这套流程跑通后,他想:"好吧,我们还收到了很多其他没有被回复的反馈。"所以,他也为那些设置了一个监听例行程序。7
Catherine Wu
So I shipped this small feature, and there was like an edge case in it that I didn't see. And so someone filed a bug for it, and I was gonna get to the bug that night. And as my Claude was working, it said, wait a second, another Claude has already fixed this. And I was like, how is this possible? Like I've never talked to him about this feature before.7
Catherine Wu
有一次我发布了一个小功能,里面有一个我没注意到的边缘情况。有人为此提交了一个 bug,我本打算当晚去处理这个 bug。当我的 Claude 正在工作时,它说:"等一下,另一个 Claude 已经修复了这个问题。"我当时想:"这怎么可能?我之前完全没跟他谈过这个功能。"7
Catherine Wu
So I pinged him, and I was like, how did you fix this so quickly? And he said he has another routine that just looks for bug reports that haven't been responded to in 5 hours, and puts up a fix, and he merges the ones that are easy to verify.7
Catherine Wu
于是我去找他,问:"你是怎么这么快修复这个的?"他说他有另一个例行程序,专门寻找那些 5 小时内未被回复的 bug 报告并提交修复,而且他会直接把那些容易验证的 PR 合并进去。7
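Both routines described here (watch one feature area's reports, or pick up any bug report nobody has answered within 5 hours, then put up a fix) reduce to a selection step plus a handoff to an agent. A sketch under stated assumptions: the `BugReport` record, `stale_unanswered`, and `build_fix_prompt` are hypothetical, and fetching reports from GitHub or a ticket tracker, prompting Claude, and merging are all left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=5)  # threshold mentioned in the conversation


@dataclass
class BugReport:
    id: str
    title: str
    opened_at: datetime
    responses: int              # replies so far, from people or bots
    labels: tuple[str, ...] = ()


def stale_unanswered(reports: list[BugReport], now: datetime,
                     label: Optional[str] = None) -> list[BugReport]:
    """Unanswered reports open longer than STALE_AFTER, oldest first.

    If `label` is given (e.g. "voice-mode"), only reports carrying it are kept,
    mirroring a routine that watches a single feature area.
    """
    picked = [
        r for r in reports
        if r.responses == 0
        and now - r.opened_at > STALE_AFTER
        and (label is None or label in r.labels)
    ]
    return sorted(picked, key=lambda r: r.opened_at)


def build_fix_prompt(report: BugReport) -> str:
    """Hypothetical prompt text handed to an agent for one report."""
    return (
        f"Bug {report.id}: {report.title}\n"
        "Reproduce it, put up a minimal fix as a PR, and say how you verified it."
    )
```

Asking the agent to state how it verified the fix supports the last step in the story: a human merges only the PRs that are easy to verify.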
Boris Cherny
Mm. Claude tells me this like all the time now.8
Boris Cherny
嗯。现在 Claude 经常跟我说这种事。8
Catherine Wu
That someone else has already fixed it.8
Catherine Wu
就是别人已经修好了?8
Boris Cherny
There's always like another person's Claude that's working on it. It's like, yeah that's been one of the changes. I feel like we're um, a while ago we were trying to figure out like how to use routines, and I feel like just like the agent SDK was this first idea that we could use Claude Code programmatically. But I feel like at the beginning it just wasn't obvious how do we use it? What do we use it for?8
Boris Cherny
总是会有另一个人的 Claude 已经在处理它了。是的,这就是变化之一。我觉得,不久前我们还在试图弄清楚如何使用例行程序,我觉得像智能体 SDK 就是我们能够通过编程方式使用 Claude Code 的第一个初步想法。但我感觉一开始并不是很清楚我们该怎么用它?用它来做什么?8
Boris Cherny
And I think routines are the first really obvious application. And um I don't know, like it just does like all the code review. It babysits like every PR. You remember back in the day you used to actually have to like respond to code review comments? You used to have to like fix CI? You used to have to rebase? Yeah. Like I haven't done that in a long time.8
Boris Cherny
而我认为例行程序是第一个真正显而易见的应用。而且我不知道,比如它包揽了所有的代码审查。它像保姆一样照看每一个 PR。你还记得以前你实际上需要去回复代码审查的评论吗?以前你得去修复 CI 问题?以前你得去变基(rebase)代码分支?是的。我已经很久没做过这些事了。8
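The chores Boris no longer does (answering review comments, fixing CI, rebasing) map naturally onto a per-PR triage step that a routine could run on a schedule. A sketch with a hypothetical `PullRequest` snapshot and `babysit_tasks` helper; how the status is fetched and how each task is handed to Claude are assumptions left out of the code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PullRequest:
    number: int
    ci_passing: bool
    unresolved_comments: list[str] = field(default_factory=list)
    behind_base_by: int = 0     # commits on the base branch this PR lacks
    has_conflicts: bool = False


def babysit_tasks(pr: PullRequest) -> list[str]:
    """Turn a PR's status into agent tasks, in the order they should run."""
    tasks: list[str] = []
    # Rebase first: CI results on an outdated base may not reflect the merged state.
    if pr.has_conflicts or pr.behind_base_by > 0:
        tasks.append(f"PR #{pr.number}: rebase onto the base branch and resolve conflicts")
    if not pr.ci_passing:
        tasks.append(f"PR #{pr.number}: investigate the failing CI run and push a fix")
    for comment in pr.unresolved_comments:
        tasks.append(f"PR #{pr.number}: address review comment: {comment}")
    return tasks
```

A PR with no tasks needs nothing, so the routine can skip it instead of prompting an agent.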
5. Auto 模式:告别权限提示轰炸,让另一个模型来把关安全 · Auto mode: no more barrage of permission prompts; let another model guard safety
Catherine Wu
Yeah. When you're in the CLI and you're synchronously working with Claude, what are your go-to features?9
Catherine Wu
是的。当你在命令行(CLI)中与 Claude 同步工作时,你最常用的功能是什么?9
Boris Cherny
Okay, what it used to be is plan mode. I don't use that anymore.9
Boris Cherny
好吧,以前是计划模式(plan mode)。我现在不用那个了。9
Catherine Wu
What do you use instead?9
Catherine Wu
那你现在用什么替代?9
Boris Cherny
Auto mode. Auto mode. It's the best.9
Boris Cherny
自动模式(Auto mode)。自动模式是最好的。9
Catherine Wu
Instead of plan mode?9
Catherine Wu
代替了计划模式?9
Boris Cherny
Instead of plan mode. Yeah, because the newer models they don't actually need like a planning step anymore. I think this was really important for like Opus 4 through maybe 4.5. Then I think starting with 4.6 and definitely with 4.7, it just doesn't need that planning step.9
Boris Cherny
代替了计划模式。是的,因为更新的模型实际上不再需要计划步骤了。我认为这对 Opus 4 到大概 4.5 来说非常重要。然后我认为从 4.6 开始,特别是到了 4.7,它就完全不需要那个计划步骤了。9
Boris Cherny
I think some people still use it, they like to have that artifact. I don't use it. And I just do auto mode for everything, because then I start my Claude, it starts to work, and then I just like move on to the next Claude. And I don't have to sit there and watch it.9
Boris Cherny
我想有些人可能还在用,他们喜欢保留那个(计划)产物。但我不用了。我所有的事都用自动模式,因为这样我启动我的 Claude 后它就开始工作,然后我就直接去处理下一个 Claude 了。我不需要坐在那里盯着它看。9
Boris Cherny
But from the very early days we had this like permission prompt model for Claude Code, right? Like it runs a tool and then it asks you like hey are you okay running this tool? And you have to say yes or no.9
Boris Cherny
但在非常早期的阶段,我们为 Claude Code 设置了类似权限提示模型,对吧?就像它要运行一个工具,然后它会问你:"嘿,你可以让我运行这个工具吗?"然后你必须回答是或否。9
Boris Cherny
And at the time that was kind of the best we had a year and a half ago because we didn't have, you know classifiers, the model was not as well aligned as it is today. So auto mode was just such a, it was such a big step up because actually you don't want to read most of these requests. Just routing it to a different model and having it check for security works so much better. Yeah.9
Boris Cherny
在一年半以前,这已经是我们能做到的最好方式了,因为我们当时没有分类器,模型也没有像今天对齐得这么好。所以自动模式简直是巨大的一步飞跃,因为实际上你并不想去读大部分这样的权限请求。直接把请求路由给另一个模型,让它来检查安全性,这种效果要好得多。是的。9
Boris Cherny
And if a thing like is a little sus or you know this isn't the command that you think you want to run or it's not safe, the model will just deny it and then you can go back and you can allow it later. I think this has been one of those like step changes. We just, there's no way we could have done this a year and a half ago.9
Boris Cherny
如果某件事看起来有点可疑,或者你知道这不是你想要运行的命令,或者它不安全,这个模型就会直接拒绝它,然后你可以之后再回去允许它。我认为这是那种阶跃式的变化之一。我们根本不可能在一年半以前做到这一点。9
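The shift from per-call permission prompts to auto mode can be pictured as a gate in front of every tool call: a separate classifier decides allow or deny, and a denial is recorded so the user can approve it later instead of being interrupted. This is only a sketch. In Claude Code the classifier is a model; `toy_classifier` below is a hypothetical keyword rule used purely as a stand-in, and `PermissionGate` is not a real API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Returns True if the proposed tool call looks safe to run without asking.
Classifier = Callable[[str, str], bool]


def toy_classifier(tool: str, command: str) -> bool:
    """Hypothetical stand-in for the safety model; NOT how auto mode decides."""
    risky = ("rm -rf /", "| sh", "--force")
    return not any(pattern in command for pattern in risky)


@dataclass
class PermissionGate:
    classify: Classifier
    user_allowed: set[tuple[str, str]] = field(default_factory=set)
    denied: list[tuple[str, str]] = field(default_factory=list)

    def request(self, tool: str, command: str) -> bool:
        call = (tool, command)
        if call in self.user_allowed or self.classify(tool, command):
            return True           # proceed without interrupting the user
        self.denied.append(call)  # kept for later review, not a blocking prompt
        return False

    def allow_later(self, tool: str, command: str) -> None:
        """The user reviews a denial and explicitly approves that exact call."""
        self.user_allowed.add((tool, command))
```

The design point from the conversation is attention: the user only looks at the short `denied` list rather than approving every call.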
Catherine Wu
It's just human nature when you accept 99% of requests that your eyes just glaze over when you read it. And so actually we feel that auto mode is more safe than reading every single permission prompt because it means that you're only paying attention to the most important thing and not like being spammed with a bunch of things that are just 99% yes.10
Catherine Wu
这就是人的本性,当你同意了 99% 的请求时,你在阅读它们时双眼就会变得呆滞麻木。所以实际上我们觉得自动模式比阅读每一个权限提示要安全得多,因为这意味着你只需要把注意力集中在最重要的事情上,而不是被一大堆 99% 都是"同意"的东西狂轰滥炸。10
6. 安全性:红队测试、威胁模型,以及为何要信任智能体 · Security: red teaming, threat models, and why you can trust the agent
Boris Cherny
I think security is one of these things like you can talk about it and then it's a totally different thing to actually do it correctly because it just doesn't always look the way that you think it's going to look.10
Boris Cherny
我觉得安全性就是这种东西:你嘴上谈论它是一回事,但要真正正确地做到它则是完全不同的另一回事,因为它呈现出来的样子并不总是如你预想的那样。10
Boris Cherny
And it's just all about like always red teaming, always pen testing, always looking you know, always having a threat model and then using that to figure out, you know how is this thing going to get attacked? How are people going to get prompt injected? Exactly.10
Boris Cherny
这完全取决于持续进行红蓝对抗测试(red teaming),持续进行渗透测试(pen testing),持续去观察寻找,总是保有一个威胁模型,然后利用它去弄清楚:"这个东西会被怎样攻击?人们会如何被提示词注入?"一点没错。10
Boris Cherny
And I just feel like the team is just like obsessed with this. And it's so important because as a result I just trust the agent to run. And I can move on and I can just have like a second agent. And if I didn't trust it then I just wouldn't have been able to do that.10
Boris Cherny
我觉得团队就是痴迷于此。这太重要了,正因为如此,我才敢放心地信任智能体让它自己运行。我才可以脱身去安排第二个智能体。如果我不信任它,我就根本没法这么做。10
Catherine Wu
And internally, um to actually get auto mode out to our users, we needed to really trust it first. And so what we did was we collected thousands of transcripts of like an entire agent trajectory and a permission prompt, and had auto mode classify whether or not it was safe.11
Catherine Wu
而在内部,为了真正把自动模式推给我们的用户,我们必须首先自己充分信任它。所以我们当时做的是,收集了数以千计的完整智能体运行轨迹和权限提示的记录,让自动模式去分类判断它是否安全。11
Catherine Wu
And it was extremely good at this, so then we got red teamers, and we asked them to try to prompt inject and try to hack uh the code base, and we used this to create evals and made sure that all of these were denied.11
Catherine Wu
它在这方面做得非常好,于是我们找来了红队成员,要求他们尝试进行提示词注入并试着黑进代码库,我们利用这些构建了评估测试(evals),并确保所有这些恶意操作都被拒绝了。11
Catherine Wu
And then we had our own internal teams try to prompt inject and hack Claude Code's auto mode. And then we improved auto mode to make sure that we caught all of these. So it's not only protecting you against the vulnerabilities that are out there in the wild today, but also against the most intelligent attacks that we can construct.11
Catherine Wu
接着,我们让内部团队也尝试对 Claude Code 的自动模式进行提示词注入和黑客攻击。然后我们改进了自动模式,以确保我们能拦截住所有的这些攻击。因此,它不仅能保护你免受今天广泛存在的漏洞威胁,还能防范我们所能构建出的最聪明的攻击。11
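The evaluation process Catherine outlines has two sides: every red-team attack in the collected set must be denied, and ordinary transcripts should still be let through. A minimal sketch of such an eval, assuming each case is labelled as an attack or not; the `Case` format and `evaluate` function are hypothetical, and the classifier passed in is whatever model or stand-in you are testing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    transcript: str   # agent trajectory plus the permission request being judged
    is_attack: bool   # True for red-team / prompt-injection cases


def evaluate(classify_safe: Callable[[str], bool], cases: list[Case]) -> dict[str, object]:
    """Report attacks that slipped through and benign cases that were blocked."""
    missed = [c.transcript for c in cases if c.is_attack and classify_safe(c.transcript)]
    benign = [c for c in cases if not c.is_attack]
    false_denials = sum(1 for c in benign if not classify_safe(c.transcript))
    return {
        "all_attacks_denied": not missed,  # the bar described: every attack denied
        "missed_attacks": missed,
        "false_denial_rate": false_denials / len(benign) if benign else 0.0,
    }
```

Tracking the false-denial rate alongside the attack bar keeps the classifier from reaching "all attacks denied" simply by denying everything.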
7. 在新事物上构建:扔掉旧的工程直觉,不断重新学习 · Building on something new: throw out old engineering instincts and keep relearning
Boris Cherny
Yeah I mean it's like it's honestly like a weird approach. I feel like there's like all these features the last year where the first time someone pitched it I was like ah no no way, that's not gonna work. And I feel like over time I've just learned like I'm actually wrong like so often now, because like building on the model is so weird. Yeah.12
Boris Cherny
是的,我的意思是说老实讲这是一种很奇怪的方法。我觉得过去一年里有各种各样的功能,当第一次有人提出时,我的反应都是:"啊,不,不可能的,这行不通。"然后随着时间推移,我逐渐认识到我实际上经常是错的,因为在模型上进行构建就是这么奇怪。是的。12
Boris Cherny
It's just like all this like engineering stuff that I've learned over the years, like so much of it I just have to like throw out. And this is just like part of what the job is now. Like we're building on a new thing and we just have to relearn it.12
Boris Cherny
就好像我这么多年学到的所有这些工程知识,很大一部分我只能抛弃掉。这就是现在这项工作的一部分。我们是在一个全新的事物上进行构建,所以我们只能重新学习。12
Boris Cherny
And auto mode was definitely one of these. I was like the first time I heard it I was like, route the prompt to a model? No way, that's not gonna work. And then it actually turns out empirically it works really, really well.12
Boris Cherny
自动模式绝对就是其中之一。当我第一次听到这个想法时我想:"把提示路由给另一个模型?没门,这行不通的。"然而经验结果证明,它实际上运行得极其出色。12
8. Loop 与下一次飞跃:不再和智能体对话,和"例行程序"对话 · Loop and the next leap: talk to a routine, not to an agent
Catherine Wu
I heard you also love loop.13
Catherine Wu
我听说你也喜欢 loop(循环指令模式)。13
Boris Cherny
Um, yeah I love loop. How do you use it? I think for loop there's this transition that we went through like a year and a half ago where we were like all right, there's source code, but actually the thing an engineer should interact with, maybe it's not the source code, maybe it's the agent.13
Boris Cherny
嗯,是的我喜欢 loop。你是怎么用它的?我认为关于 loop,我们在一年半前经历过一次转变:我们觉得,好吧,这里有源代码,但其实工程师该去交互的对象,也许不再是源代码了,也许是智能体。13
Boris Cherny
And so we made this leap of like I don't write the source code, I talk to an agent and the agent writes the source code for me. And I think right now what's happening is we're making the next leap. I don't talk to an agent anymore. I talk to loop or I talk to a routine. And it prompts Claude for me.13
Boris Cherny
于是我们实现了这样一个飞跃:我不直接写源代码,我和一个智能体对话,由智能体来为我写源代码。而我认为现在正在发生的事情是我们在进行下一次飞跃。我不再和智能体对话了。我和 loop 或者例行程序对话。然后它来代替我去提示 Claude。13
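"Talking to a loop instead of an agent" means the loop owns the prompting: it re-issues an instruction on a schedule and decides when to stop. A minimal sketch, assuming a hypothetical `agent` callable that takes a prompt and returns a status string; this is not the implementation of Claude Code's loop feature, whose interval and stopping behaviour are not described in the conversation.

```python
from __future__ import annotations

import time
from typing import Callable


def run_loop(agent: Callable[[str], str], prompt: str, interval_s: float,
             max_iterations: int, done: Callable[[str], bool],
             sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Prompt the agent repeatedly on a fixed interval until `done` says stop.

    The human writes `prompt` once; the loop does the re-prompting.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    replies: list[str] = []
    for i in range(max_iterations):
        reply = agent(prompt)
        replies.append(reply)
        if done(reply):
            break
        if i < max_iterations - 1:  # no pointless wait after the final attempt
            sleep(interval_s)
    return replies
```

For example, `run_loop(agent, "babysit my PRs", 300, 12, lambda r: r == "all green")` would check in every five minutes for up to an hour.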
Boris Cherny
And it's just it's crazy. I mean it's been like a year and a half and this is like two big leaps. If you take like a step back, how are you seeing entire engineering orgs change?13
Boris Cherny
这简直太疯狂了。我的意思是,这也就一年半的时间,这已经是两次巨大的飞跃了。如果你退一步看,你是如何看待整个工程组织正在发生的变化的?13
9. 历史类比:扔掉文件柜,把 Claude 放在一切的中心 · A historical analogy: throw out the filing cabinet and put Claude at the center of everything
Boris Cherny
I'm gonna put on my business cat hat. I have this like favorite case study, this Harvard Business Review article from the 90s. And they were talking about like computers are here, why are we not seeing the productivity benefits? And it's just this like amazing snapshot into like what it actually felt like at the time.14
Boris Cherny
我要戴上我的商业猫咪帽(换个商业视角)了。我有一个最喜欢的案例研究,那是 90 年代《哈佛商业评论》的一篇文章。他们当时在讨论:计算机已经普及了,为什么我们没有看到生产力的提升?那真的是一个了解当时人们真实感受的绝妙缩影。14
Boris Cherny
Cause like you know people used to use mainframes at some point, companies switched to personal computers. It was sort of a new thing and companies were trying to figure out how to use it. The same way they're trying to figure out how to use AI right now.14
Boris Cherny
因为你知道,人们一度使用大型机,后来公司转向了个人电脑。这在当时是个新事物,公司都在试图弄清楚怎么使用它。这就跟他们现在试图弄明白如何使用 AI 是一样的。14
Boris Cherny
And it turned out that to get the productivity benefits from computers, what you had to do isn't like you have your paper filing cabinet and your paper and pen business process and then there's like a computer on the side that does something.
Actually what you have to do is you throw out the filing cabinet, you have to throw out all your paper and all your pens, and you put a computer in the center. And everything has to run through the computer. It has to be at the center of every business process.14
Boris Cherny
结果证明,要从计算机中获得生产力效益,你要做的并不是保留你纸质的文件柜以及纸笔业务流程,然后在旁边放一台计算机做点辅助。
实际上你必须做的是:扔掉文件柜,扔掉所有的纸和笔,把计算机放在最中心。所有事情都必须通过计算机来运转。它必须是每个业务流程的中心。14
Boris Cherny
And I feel like at Anthropic we do this thing where when you onboard you don't ask people questions. Like no one asks me questions when they onboard. They probably have the same thing. They ask Claude. And this is kind of weird. Like this is the first company I've been at like that.14
Boris Cherny
我觉得在 Anthropic,我们有一种做法,当你入职时你不会去问别人问题。比如没有人入职时问过我问题。大家可能都是同样的做法:他们去问 Claude。这有点不可思议。这是我待过的第一家这样的公司。14
Boris Cherny
And I feel like for us Claude is just at the center of everything. Whenever I have a question I ask Claude. Whenever I write code I use Claude. Whenever I need a code review Claude does it. Uh whenever I need a security review Claude does it. Whenever I need to fill out a form or something, Cowork does it.14
Boris Cherny
我觉得对我们来说,Claude 绝对是处于一切事物的中心。每当我有问题,我问 Claude。每当我写代码,我用 Claude。每当我需要代码审查,Claude 来做。每当我需要安全审查,Claude 来做。每当我需要填表或做其他事,Cowork 帮我做。14
Boris Cherny
So it's just like Claude is at the center of everything. And I feel like the companies that are really figuring it out and there's a bunch of them now. They're just putting Claude at the center of it.14
Boris Cherny
所以 Claude 就处于一切的中心。我觉得那些真正弄明白这一点的公司——现在已经有不少了——他们正是把 Claude 放在了核心位置。14
Catherine Wu
And I think for computers the transition took 10 to 15 years, but actually for AI because so much of our work is already digitized, and Claude can use a computer and it can write code and run code, this transition is happening a lot faster.15
Catherine Wu
我认为对于计算机来说,这种转变花了 10 到 15 年的时间,但实际上对于 AI,因为我们有如此大量的工作已经准备好了、被数字化了,而且 Claude 能够使用计算机,能够编写代码并运行代码,所以这种转变发生得要快得多。15
10. 告别繁琐,找回工程的乐趣,以及未来的产品形态 · Leaving the tedium behind, rediscovering the fun of engineering, and the shape of future products
Boris Cherny
I think it's just really exciting. Like I feel like now I don't have to bug people anymore. And when I interact with people it's because it's like fun and I get to collaborate with them on stuff and we get to create something together. It's not that like I need them, I need something from them. Cause like Claude can actually do a lot of that stuff now.15
Boris Cherny
我觉得这真的非常令人兴奋。就像我觉得我现在再也不用去打扰别人了。当我与人互动时,那是因为觉得很有趣,我可以和他们在某些事情上协作,一起创造一些东西。不再是因为我需要他们、我必须从他们那里得到些什么。因为现在 Claude 实际上能干很多这类的事情。15
Boris Cherny
And I also feel like as an engineer I've just never had this much fun doing engineering because the tedious part I don't have to do like I'm just coming up with ideas, I'm talking to customers. And every idea I don't have a to-do list anymore like Claude just builds everything.15
Boris Cherny
此外,作为一名工程师,我感觉我从来没有在做工程时体会过这么多的乐趣,因为繁琐的部分我不需要做了,我只是想出点子,和客户交谈。对于每一个想法,我都不再需要一个待办事项列表了,因为 Claude 会把一切都构建好。15
Boris Cherny
And so my job is to come up with these ideas and it's just so fun. Okay so here's a question. Is the future product or engineering? Like is everyone gonna be a PM or is everyone gonna be an engineer?15
Boris Cherny
所以我的工作就是想出这些绝妙的点子,这太有趣了。好的,那么问题来了:未来是属于产品还是工程?是每个人都将成为产品经理(PM)还是每个人都将成为工程师?15
Catherine Wu
Everyone's gonna be both. I feel pretty strongly that these roles are merging. Like when we look at our team, our product team all writes code, our devrel team all writes code, our design team all writes code.16
Catherine Wu
每个人都会兼具这两种身份。我非常强烈地感觉到这些角色正在融合。比如看看我们的团队:我们的产品团队都写代码,我们的开发者关系团队都写代码,我们的设计团队也都写代码。16
Catherine Wu
And then we look at our engineers and a lot of them ship products end to end. They have an idea for what to build, they build it, they work with legal and marketing to figure out how we communicate this to the world and make sure it's safe and secure too. And a lot of times they just see through this whole process end to end.16
Catherine Wu
然后再看看我们的工程师,他们中的许多人都在端到端地交付产品。他们产生了一个关于要构建什么的想法,他们把它构建出来,他们与法务和营销团队合作,弄清楚我们该如何向世界传达这个产品,并确保它同样是安全的。很多时候他们就是端到端地跟进完整个流程。16
Catherine Wu
So I think right now AI really benefits people who have a lot of curiosity, have a lot of product taste, who love to have this like end-to-end ownership. And now a lot of people are running like hundreds of agents. What are the products that you think people should be adopting as they transition from single to multiple to hundreds?16
Catherine Wu
所以我认为现在 AI 真的能让那些拥有强烈好奇心、拥有极佳产品品味、并且喜欢这种端到端所有权的人受益。而现在有很多人同时运行着数以百计的智能体。随着人们从单个智能体过渡到多个甚至数百个,你认为人们应该采用哪些产品工具?16
11. 手机编程:远程控制让 Boris 把笔记本电脑留在了办公桌上 · Coding from a phone: remote control let Boris leave his laptop on his desk
Boris Cherny
Until recently the way that I wrote code was I had like six terminal tabs with six git checkouts of the same repo and then I would just like tab between them. Now it's pretty different. I have like one tab.17
Boris Cherny
直到最近,我写代码的方式还是开着大概 6 个终端标签页,对同一个代码库检出 6 份 git 代码,然后在它们之间来回切换。现在非常不一样了。我只有一个标签页。17
Boris Cherny
I use the new agent view that we just shipped. It's like so good and I'm so glad that we took a while to iterate on it to make that really good. And I also use the desktop app because I don't have to fiddle with checkouts that way. It just, you know, does the worktree cloning, like it creates the worktrees for me.17
Boris Cherny
我使用了我们刚刚发布的新型智能体视图(agent view)。它简直太棒了,我很高兴我们花了一段时间去迭代它,让它变得非常出色。我也使用桌面应用程序,因为那样我就不用再摆弄什么代码检出了。你知道,它会自动进行工作树克隆,会为我创建工作树(worktrees)。17
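Running several agents on one repository without them editing the same files is what the six checkouts, and later the desktop app's automatic worktrees, provide. A sketch of an equivalent manual setup using plain `git worktree add -b <branch> <path> <base>`, one worktree per agent; the sibling-directory layout, the `agent-N` branch names, and the `worktree_commands` helper are assumptions, and the function only builds the commands so they can be reviewed before running.

```python
from __future__ import annotations

from pathlib import Path


def worktree_commands(repo: Path, n_agents: int, base: str = "main",
                      prefix: str = "agent") -> list[list[str]]:
    """Build one `git worktree add` command per agent.

    Each agent gets its own branch and working directory next to the repo,
    so parallel agents never share a checkout.
    """
    cmds = []
    for i in range(1, n_agents + 1):
        branch = f"{prefix}-{i}"
        path = repo.parent / f"{repo.name}-{branch}"
        cmds.append(["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base])
    return cmds
```

Running the commands from `worktree_commands(Path("/work/myrepo"), 3)` (for instance with `subprocess.run(cmd, check=True)`) would leave `myrepo-agent-1` through `myrepo-agent-3` side by side, each on its own branch; `git worktree remove <path>` cleans one up afterwards.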
Boris Cherny
And the thing that I would not have expected six months ago is probably half my engineering now I do on my phone. So I just have like I have so many agents running that I just start from my phone. I use remote control, which is like amazing now.17
Boris Cherny
另外有一件在半年前我绝对想不到的事是:现在我可能有一半的工程工作都是在手机上完成的。因为我有这么多个智能体在运行,所以我直接从手机上启动。我使用远程控制(remote control),它现在的体验棒极了。17
Boris Cherny
And like I'll start something on my computer and then I'll just remote control in from my phone and I'll just like walk around, I'll like get a coffee and then check in on my agents and maybe I'll start another agent.17
Boris Cherny
我会先在电脑上启动个任务,然后我用手机远程连接,接着我就能四处走动,去买杯咖啡,然后随时检查一下我的智能体,或许再启动另一个智能体。17
Boris Cherny
And sometimes I'm like talking to someone and we come up with a new idea. I'll just start an agent on the spot. I'll like talk to it with voice mode and just have it build something. And I don't even have to go back to my computer anymore.17
Boris Cherny
有时我在和某人聊天,我们想出了一个新点子,我就会当场启动一个智能体。我会用语音模式跟它交流,直接让它去把东西建出来。我甚至都不需要再回到我的电脑前了。17
Catherine Wu
I remember when you started doing this, because you would actually leave work, have your computer on your desk, open, plugged in, screen locked. And I just thought you would like come back to the office at some point to get your computer.18
Catherine Wu
我记得你刚开始这么做的时候,因为你下班时真的把电脑留在桌子上,打开着,连着电源,锁着屏。我还以为你会在晚些时候回办公室来拿电脑。18
Catherine Wu
But then it would be like pretty late and I was like hmm maybe he just like left it here by accident. And then it happened again the next day, and happened again the next day. And I was like wait this is so weird cause you're landing PRs but your computer's right next to me. And I remember you responding and you're like yeah I'm coding from my couch.18
Catherine Wu
但后来天都挺晚了,我就想:嗯,也许他只是不小心把它忘在这儿了。结果第二天又发生了同样的事,第三天还是如此。我当时就想:等等,这太奇怪了,因为你一直在合并 PR,但你的电脑就在我旁边。我记得当时问你,你回答说:"是啊,我正躺在沙发上敲代码呢。"18
Boris Cherny
Yeah. That was the week that remote control got really good.19
Boris Cherny
是啊,没错。那就是远程控制变得非常好用的那一个星期。19
12. 上下文工程:极简主义,给模型一种"拉取上下文的方法" · Context engineering: minimalism, and giving the model a way to pull in context
Catherine Wu
Yeah. So another thing that users are asking about all the time is how do you do context engineering, especially in a large enterprise?19
Catherine Wu
是啊。那么用户经常问到的另一个问题是,你是怎么做上下文工程(context engineering)的?特别是在大型企业里?19
Boris Cherny
This is a thing you know people used to talk about prompt engineering, they used to talk about context engineering. This is sort of matching where the model was at the time. Back in the days of Sonnet 3.5 you had to prompt engineer, back in the days of Opus 4 you had to context engineer.
But with the models of today you don't do any of this. You give it the minimal possible system prompt, the minimal possible tools, and then you let the model figure it out. Like you just have to give the model some way to pull in the context. I think that's the most important thing. How do you think about it?19
Boris Cherny
这是一件有时代印记的事,你知道大家以前谈论提示工程,后来谈论上下文工程。这某种程度上是与模型当时的能力水平相匹配的。在 Sonnet 3.5 的时代,你必须做提示工程;在 Opus 4 的时代,你必须做上下文工程。
但有了今天的这些模型,你完全不需要做这些了。你只需要给它最少可能的系统提示词,给它最少可能的工具,然后让模型自己去弄明白。你只需要给模型一种"拉取上下文的方法"。我认为那才是最重要的事情。你是怎么看这个的?19
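"Give the model some way to pull in the context" can be made concrete: instead of pasting every document into the system prompt, expose one small search tool and let the model call it when it decides it needs something. A sketch under stated assumptions: the in-memory `DOCS` store, the `search_docs` tool, and the tool description dict are all hypothetical, and the dict's shape is illustrative rather than any particular SDK's schema.

```python
from __future__ import annotations

SYSTEM_PROMPT = "You are a coding agent. Use search_docs when you need project context."

# Hypothetical project knowledge that is NOT pasted into the prompt.
DOCS = {
    "deploy.md": "Deploys go through the staging environment before production.",
    "style.md": "Use pnpm, not npm. Prefer small PRs.",
    "desktop.md": "Run the local desktop app with the desktop development skill.",
}

# Illustrative tool description; real SDKs each define their own schema format.
TOOLS = [{
    "name": "search_docs",
    "description": "Return project docs whose text contains all query words.",
    "parameters": {"query": "string"},
}]


def search_docs(query: str, limit: int = 3) -> list[str]:
    """Pull-based context: return only matching snippets, and only when asked."""
    words = query.lower().split()
    hits = [f"{name}: {text}" for name, text in DOCS.items()
            if all(w in text.lower() for w in words)]
    return hits[:limit]
```

The prompt stays small and the model decides what to look up, which is the minimalist approach both speakers describe and also leaves more of the context window for the user's own instructions.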
Catherine Wu
I see things very similarly. I'm a context minimalist. So my general philosophy is tell the model only what it needs to know, and let it figure out the rest of it. Um I think when you give the model too much context it's kind of like you're micromanaging it. And sometimes the model knows a better way to get to the same outcome. And I personally prefer to give the model that freedom to do that.20
Catherine Wu
我对这个的看法非常相似。我是一个"上下文极简主义者(context minimalist)"。所以我总体的哲学是:只告诉模型它需要知道的东西,让它自己去弄明白剩下的部分。我觉得当你给模型塞了太多的上下文时,就好像你在微观管理它。而有时模型其实知道一种能达到相同结果的更好方法。我个人更倾向于给予模型这份去自由发挥的权利。20
Catherine Wu
Um and then in general we're also making our harness more lean so that you have more room for your own prompts, and so that it follows your prompts better.20
Catherine Wu
嗯,而且总体来说,我们也正在让我们的驱动框架(harness)变得更加精简,这样你就有更多的空间来存放自己的提示词。并且这也让它能更好地遵循你的指令。20
13. 展望:未来一年的产品形态,来自团队和社区 · Looking ahead: next year's product form factors will come from the team and the community
Boris Cherny
There's all these different ways to use Claude now, but I feel like in a year it's going to be a totally new set of things. And it's going to be so surprising if it's still these same things. Cause I think like we're seeing these giant trends happening right now. Agents are running for longer, they're more autonomous.21
Boris Cherny
现在有各种不同使用 Claude 的方法,但我觉得一年之后,这将是一套完全不同的新东西。如果到时候还是这些旧东西,那反而太令人惊讶了。因为我想我们现在正在目睹这些巨大的趋势正在发生。智能体运行的时间越来越长,越来越自主。21
Boris Cherny
Very rarely am I running one agent at a time. It's usually like a few agents or dozens or hundreds or thousands. And so like the form factor for that, it's going to be really different than what came before.21
Boris Cherny
我现在很少只一次运行一个智能体了。通常是几个、几十个、几百个甚至几千个智能体同时运行。因此,与其适配的产品形态(form factor)将与以前完全不同。21
Boris Cherny
And I don't know what it's going to be, and I think in large part it's going to be up to the team to figure it out. And this is um this is why I'm like so happy we run the team the way that we do where everyone just comes up with ideas and everyone is able to think about the product, everyone talks to users all the time. Because I don't think these ideas are gonna come from us, it's gonna come from the team.21
Boris Cherny
我不知道它究竟会是什么样子,而且我认为在很大程度上这将取决于团队去弄清楚它。这就是为什么我非常高兴我们以现在的这种方式运营团队:每个人都能提出自己的想法,每个人都能够去思考产品,每个人都时刻在和用户交谈。因为我不认为这些点子会由我们想出来,它们将来自整个团队。21
Catherine Wu
Totally. And from everyone in our community building with us.22
Catherine Wu
完全同意。同时也来自我们社区中和我们一起构建的每一个人。22
— 对话结束 · End of Conversation —