Large Language Models (LLM) + AI Agents: A Beginner's Introduction to the Terminology
Source: https://www.youtube.com/watch?v=I7qvs56u7Zk
Transcript:
Video backup: https://www.youtube.com/watch?v=GkEejlcI6to

Interviewers no longer care whether you can call an API; they open straight with questions about Agent mechanisms and RAG recall rates. Sounds overwhelming? It doesn't have to be. From low-level token processing, to external tool calls, to agent logic, the picture is actually very clear. Today I'll unpack all of these technical terms in one pass and help you build a complete mental map of AI technology.

The video has five parts. First, we start from the most basic concepts, LLM and token, and figure out how AI turns human language into numbers it can understand: the starting point for everything else. Next comes memory and information processing, that is, context and RAG. What do we do when the AI's memory falls short? How do we give it human-like short-term memory, plus the ability to consult an encyclopedia at any moment? The RAG technique is the key. Then I'll break down the prompt: how precise instructions turn an AI from spouting nonsense into doing exactly what you say. Of course, being able to chat isn't enough; it has to get real work done. So in part four I'll focus on how to give AI eyes and hands, and how it calls all kinds of tools through the MCP protocol, currently the bonus skill big companies love most. After that, we'll see how Agents and Skills let AI evolve from a passive question-answering machine into a super assistant that can decompose tasks and plan its own workflow on its own.

OK, let's begin with the two lowest-level concepts. First we need to understand the LLM, the large language model: what is its heart? Look at this complicated flow diagram, the Transformer architecture. We don't need to study the whole architecture in depth; you only need to know that it is the underlying engine of almost every large model on the market today.
The architecture was first proposed by Google in 2017, in the very famous paper "Attention Is All You Need". Interestingly, although Google lit the spark, what really set the world on fire was OpenAI's GPT-3.5.

So how does a large model actually work? Its principle is surprisingly plain. Look at this diagram: it is essentially playing a word-completion game. Say we ask it, "What fruit do you like?" The model receives the question and, after a round of internal computation, does not spit out the whole sentence at once. It first predicts the single most probable next word; say it outputs "I". Here is the key point: after emitting "I" it does not stop. It takes that "I" and appends it to the original input, making "What fruit do you like? I", then uses this new sentence to predict the next word, say "like". It appends "like" and keeps predicting, until it decides the sentence is finished and outputs a special end-of-sequence symbol, at which point the answer is complete. So the streaming output we see, words popping out one by one, happens precisely because the answer really is computed one word at a time.

But there is a problem here. We humans type in text; can the model actually read it? It can't. A large model is essentially one enormous mathematical function running matrix arithmetic inside. It only understands numbers and does not recognize human language at all. So between humans and the model there has to be a translator, which brings us to the concepts of token and tokenizer.
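The word-completion loop described above can be sketched in a few lines. This is a toy illustration, not a real model: `predict_next` just follows a hard-coded lookup table, whereas a real LLM would score every token in its vocabulary and pick the most probable one.

```python
END = "<eos>"  # the special end-of-sequence symbol mentioned above

# Hard-coded "model": maps each input sequence to its next token.
TOY_MODEL = {
    "你喜欢什么水果": "我",
    "你喜欢什么水果我": "喜欢",
    "你喜欢什么水果我喜欢": "西瓜",
    "你喜欢什么水果我喜欢西瓜": END,
}

def predict_next(sequence: str) -> str:
    """Stand-in for the LLM: return the most probable next token."""
    return TOY_MODEL[sequence]

def generate(prompt: str) -> str:
    """Autoregressive generation: predict one token, append it, repeat."""
    sequence, output = prompt, []
    while True:
        token = predict_next(sequence)  # predict exactly ONE token
        if token == END:                # stop at the end symbol
            break
        output.append(token)
        sequence += token               # feed the output back into the input
    return "".join(output)

print(generate("你喜欢什么水果"))  # → 我喜欢西瓜
```

Each pass through the loop is one "frame" of the streaming output you see in a chat window.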
The tokenizer is responsible for two things: encoding and decoding. Encoding turns our text into numbers; decoding turns the numbers the model computes back into text.

Let's break down the workflow. First look at the blue region on the left: this is the encoding process. When we input the sentence 你喜欢什么水果 ("What fruit do you like?"), the tokenizer first performs a step called segmentation. It chops the sentence into pieces, 你 / 喜欢 / 什么 / 水果, four small chunks. Note that each of these chunks is one token. Next comes the mapping step. Since the model only understands numbers, the tokenizer issues each token an ID, like a personal ID number. In the figure, 你 becomes ID 102 and 喜欢 becomes ID 450, so the original sentence turns into a list of numbers in the model's eyes.

With this list of numbers in hand, it can be fed into the green region in the middle, the computational core of the model. Here the model performs complex matrix operations and finally predicts the next most likely number, for example token ID 203. Now the purple region on the right comes into play: decoding. What the model emits is the cold, opaque number 203, which the user cannot read, so the tokenizer steps in again, looks the number up in its table, and translates it back into text a human can understand: 西瓜 (watermelon). Notice the small tip here: no segmentation is needed during decoding, because the model only emits one result at a time, so the tokenizer just maps it one-to-one back into text.

This is why we say the token is the smallest unit in which a large model processes text: it is the fragment on the way in and the building block on the way out. Finally, we need to correct a common misconception.
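A minimal toy tokenizer makes the encode/decode flow concrete. The IDs 102, 450, and 203 come from the figure; the other IDs and the greedy longest-match segmentation are illustrative assumptions (real tokenizers learn a BPE-style vocabulary from data).

```python
# Toy vocabulary: 102, 450, 203 are from the figure; 77 and 88 are made up.
VOCAB = {"你": 102, "喜欢": 450, "什么": 77, "水果": 88, "西瓜": 203}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(text: str) -> list[int]:
    """Segmentation + mapping: chop the text into tokens, then assign IDs."""
    ids, i = [], 0
    while i < len(text):
        for size in (2, 1):  # greedy: try the longer vocabulary entry first
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:
            raise ValueError(f"no vocabulary entry at position {i}")
    return ids

def decode(token_id: int) -> str:
    """Decoding needs no segmentation: one ID maps straight back to one token."""
    return ID_TO_TOKEN[token_id]

print(encode("你喜欢什么水果"))  # → [102, 450, 77, 88]
print(decode(203))               # → 西瓜
```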
Many people think a token is simply a word. It is not. Tokens come from a text-segmentation scheme the model learns for itself, and tokens and words are not in one-to-one correspondence. How do you convert between them? I've written a rough ratio here: roughly speaking, one token is about 0.75 of an English word, or about one and a half to two Chinese characters. So the next time someone tells you a model supports so-and-so many tokens, you'll have a rough feel for how much text that actually holds.

Good. With LLM and token covered, we have the most basic building blocks. But now there is a new problem: the model forgets everything the moment it finishes answering. How do we make it remember the earlier conversation? That brings us to the next term: context. Look at the slide title, "the large model's temporary memory". A large model is, at bottom, still a mathematical function; it has no memory of its own. Then why does it feel like it remembers? The secret is that every time we send a new message, the program behind the scenes automatically digs out the entire previous conversation history and packages it together with the new question to send to the model. So the context is simply the sum of all the information the model receives each time it handles a task.

Look at the figure on the right; quite a lot goes into it: the system prompt configured by the developer in the backend, our previous conversation history, the current user question, and even the tool list and the tokens currently being generated. You can picture the context as the model's temporary workbench. Before each job, it has to spread the relevant materials, the progress so far, and the current task all out on the bench before it can start working. So how much can this workbench hold?
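The "repackage the whole history every turn" trick described above is easy to sketch. This is a minimal illustration in the message format most chat APIs use; the helper name `build_context` and the sample dialogue are assumptions, not any particular vendor's API.

```python
def build_context(system_prompt: str, history: list[dict],
                  user_message: str) -> list[dict]:
    """Assemble everything the model sees for one turn: the 'workbench'.
    A real request would also carry the tool list, generation settings, etc."""
    return (
        [{"role": "system", "content": system_prompt}]  # developer rules
        + history                                       # all previous turns
        + [{"role": "user", "content": user_message}]   # the new question
    )

history = [
    {"role": "user", "content": "My name is Ana."},
    {"role": "assistant", "content": "Nice to meet you, Ana!"},
]
context = build_context("You are a helpful assistant.", history,
                        "What is my name?")
print(len(context))        # → 4  (system + two history turns + new question)
print(context[0]["role"])  # → system
```

Because the earlier turns ride along in every request, a stateless model can still answer "What is my name?" correctly.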
That is where the context window comes in: the context window is the maximum number of tokens the context can hold. Today's flagship GPT and Claude models, for example, advertise windows on the order of a million tokens. That sounds huge, but if we want the model to read a manual thousands of pages long, it still runs into trouble. The figure on the left shows the pain point: if you stuff the whole manual in, it is not only expensive, it can also burst the window.

What then? This is where the RAG technique comes in: retrieval-augmented generation. Look at the solution here. RAG works like a clever librarian. Instead of throwing the entire book at the model, it first finds the handful of passages most relevant to the user's question and sends only those passages to the model. That way it both gets around the window limit and greatly reduces cost.

OK, with context done, let's look at the prompt. A prompt is simply the concrete question or instruction we give the model. Look at this figure. The vague input on the left, "write me a poem", leaves the model guessing at your intent: the result might be a classical poem, a modern poem, or even doggerel. The precise input on the right, "a five-character quatrain, autumn leaves falling, melancholy in style", directly narrows the model's generation range so it can grasp our intent more accurately. This is the core of prompt engineering: say clearly what you want. That said, as models keep getting stronger,
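The librarian analogy can be sketched in a few lines. Real RAG systems rank passages with vector embeddings; as a stand-in, this toy version scores each passage by word overlap with the question. The manual text is invented for illustration.

```python
MANUAL = [  # stand-in for the thousand-page manual, as three passages
    "To reset the router, hold the reset button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Router firmware can be updated from the admin panel.",
]

def words(text: str) -> set[str]:
    """Lowercase and strip simple punctuation before splitting."""
    for ch in ",.?!":
        text = text.replace(ch, " ")
    return set(text.lower().split())

def retrieve(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages sharing the most words with the question."""
    q = words(question)
    ranked = sorted(passages, key=lambda p: len(q & words(p)), reverse=True)
    return ranked[:k]

# Only the top snippets, not the whole manual, go into the prompt:
snippets = retrieve("How do I reset the router?", MANUAL)
prompt = "Answer using only these excerpts:\n" + "\n".join(snippets)
print(snippets[0])  # → To reset the router, hold the reset button for ten seconds.
```

Swapping the overlap score for embedding similarity gives the usual production setup, but the shape of the pipeline (retrieve first, then generate) is exactly this.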
even if your prompt is a little off target, the model can still guess most of your meaning, so this field is not as hot as it used to be.

Next, let's split prompts into two types: the user prompt and the system prompt. The user prompt is the concrete task the user types into the chat box. For example, "What is 3 + 5?" is a user prompt. The system prompt, on the other hand, is configured invisibly by the developer in the backend and is used to set the model's persona and ground rules, for example: "You are a patient teacher. You may not give the answer directly; you must guide the student to think." Combine the two and the model can complete the task while keeping to the rules. Take this example: when asked "What is 3 + 5?", the model, constrained by the system prompt, does not give the answer. Instead it guides the user: "You have three apples and bring back five more. How many are there in total? Count them." The output thus satisfies the rules and the user's need at the same time.

So far we've covered how the model thinks and remembers, but there is still a very practical problem: the model is locked inside a black box and knows nothing about the outside world. I've written here the fatal weakness of large models: it is, as we said, just a word-completion game, and its knowledge is frozen at the training cutoff date. Ask it what the weather is like in Changsha today and it genuinely cannot answer; it doesn't even know today's date. So how do we let it open its eyes to the world? That takes tools. A tool is, in essence, just a function.
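The teacher example above maps directly onto the role-tagged message format common to chat APIs. This is a shape-only sketch (no model name or endpoint is assumed); it just shows how the two prompt types combine into one request.

```python
system_prompt = (  # developer-set persona and rules, invisible to the user
    "You are a patient teacher. Never give the answer directly; "
    "guide the student to think it through."
)
user_prompt = "What is 3 + 5?"  # the user's concrete task

request = {
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
}

# With the system prompt in force, the expected reply is not "8" but
# something like: "You have 3 apples and bring back 5 more. Count them!"
print(request["messages"][0]["role"], "->", request["messages"][1]["role"])
```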
Look at the figure on the right. A tool is like a pipe stuck through the black box this brain is locked inside, reaching the outside: pass in parameters, call an external interface, and bring real data back. With tools, the model can perceive and even affect the physical world.

But there is one especially common misunderstanding here, so make sure you read this complete call-chain diagram carefully. Many people assume that when a tool is called, the model calls it itself. It does not. See the yellow column in the middle labeled "platform"? You can think of it as the messenger, or the hands and feet. The flow goes like this. The user asks, "What's the weather in Changsha today?" The platform passes the question to the large model and at the same time tells it: "You have a weather-lookup tool you can use." The model reasons: I have no real-time data, but I do have a tool; fine, I decide to call it. Now comes the key point: the model does not actually go and call the interface itself. Its only ability is to output text. So it outputs a specific piece of text telling the platform: "I want to use this tool, and the parameter is Changsha." The platform receives that instruction, and the platform is the one that actually does the work: it calls the weather tool and gets the raw data. Finally the platform hands the data back to the model, and the model takes those cold numbers and turns them into human language, say "sunny today, 26°C", which is then shown to the user via the platform. To sum up: the large model moves its mouth and makes the decisions; the platform moves its hands and does the executing.

At this point a new problem appears. There are so many platforms on the market now. OpenAI has OpenAI's standard,
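The call chain above, with the model only ever emitting text and the platform doing the real work, can be sketched end to end. `fake_model` is a stand-in for an LLM and the weather data is invented; the JSON-as-text convention here is an assumption, simplified from the structured tool-call formats real APIs use.

```python
import json

def get_weather(city: str) -> dict:
    """The actual tool: in reality this would hit a weather API."""
    return {"city": city, "condition": "sunny", "temp_c": 26}

def fake_model(message: str) -> str:
    """Stand-in LLM. Its ONLY ability is to output text: first a text
    instruction requesting the tool, later a human-readable summary."""
    if message.startswith("TOOL_RESULT:"):
        data = json.loads(message[len("TOOL_RESULT:"):])
        return f"It's {data['condition']} in {data['city']} today, {data['temp_c']}°C."
    return json.dumps({"tool": "get_weather", "args": {"city": "Changsha"}})

def platform(user_question: str) -> str:
    """The 'hands and feet': parses the model's text and runs the tool."""
    reply = fake_model(user_question)
    call = json.loads(reply)              # the model *asked* for a tool
    result = get_weather(**call["args"])  # the platform actually calls it
    return fake_model("TOOL_RESULT:" + json.dumps(result))

print(platform("What's the weather in Changsha today?"))
# → It's sunny in Changsha today, 26°C.
```

Note that `get_weather` is invoked only inside `platform`, never inside `fake_model`: the model decides, the platform executes.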
Anthropic has Anthropic's rules, and Google has its own set too. This makes developers miserable: to build one weather tool, you have to write the code three times over for these three platforms. It's like the plugs in the figure on the left: none of them match. Maddening.

So MCP was born, full name the Model Context Protocol. The name sounds intimidating, but it really wants to do just one thing: become the USB Type-C port of the AI world. No matter which platform you are, and no matter which tool you are, as long as everyone follows the MCP standard, a developer only needs to write the code once and it plugs into every platform seamlessly. For developers this means finally no more reinventing the wheel; efficiency takes off.

So tools give the model hands and feet and let it connect to the outside world. Going one step further: what if the task is especially complex, and the model has to use its own head and finish it in stages? That brings us to our final protagonist, the Agent, the intelligent agent. Take a look at this loop first. The core of an Agent is that it has the ability to plan and execute autonomously. Say we have a complex request: "Check the weather for me, and if it isn't raining, find me a park nearby." That cannot be solved with a single reply; the Agent has to decompose the task by itself. Its reasoning process might look like this: first, it needs to know where I am, so step one is calling a geolocation tool to get the coordinates; then, with the coordinates in hand, it calls the weather tool to check whether it is raining.
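The "write once, plug in everywhere" idea can be illustrated without reproducing the actual MCP wire format (which this sketch does not attempt): keep one neutral tool description and mechanically adapt it to each platform's schema, instead of hand-writing a version per platform. Both adapter outputs below are deliberately simplified stand-ins for the real vendor schemas.

```python
TOOL = {  # one platform-neutral definition, written once
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {"city": "string"},
}

def to_openai_style(tool: dict) -> dict:
    """Adapter for an OpenAI-style function schema (simplified)."""
    return {"type": "function", "function": tool}

def to_anthropic_style(tool: dict) -> dict:
    """Adapter for an Anthropic-style tool schema (simplified)."""
    return {"name": tool["name"],
            "description": tool["description"],
            "input_schema": tool["parameters"]}

# The same single definition serves both platforms:
print(to_openai_style(TOOL)["function"]["name"])  # → get_weather
print(to_anthropic_style(TOOL)["name"])           # → get_weather
```

MCP standardizes this adaptation away entirely: the tool server speaks one protocol, and every MCP-aware platform can discover and call it.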
It finds that it is not raining, so the next step is calling a map tool to find nearby parks. Finally it pulls all the information together and gives the final answer. This loop of thinking, planning, executing, and observing is the soul of the Agent. It no longer passively answers a single question; like a real assistant, it actively breaks a big task into small ones and completes them one step at a time.

OK, let's keep going. We just said the Agent can work on its own, but how do we make sure the way it works matches our habits? For instance, it should remind us to bring an umbrella when it rains, keep checklists tidy, and stick to an agreed tone. Surely we can't repeat the whole briefing every single time. This is where Agent Skills come in. You can think of a Skill as the Agent's personal operating manual. The mechanism is actually very simple: it is a Markdown document stored locally that tells the Agent in advance how to behave. The document might spell out, for example: a metadata layer that defines the Agent's identity, say a "going-out assistant" responsible for checking the weather and making suggestions; an instruction layer that lays down the concrete steps, say check the weather first, then decide what to bring based on it: an umbrella if it rains, a jacket if it's windy, and sun-protective clothing if the sun is strong; and finally the output format, requiring a one-sentence summary at the end plus a checklist. With this manual, the Agent works according to our habits, and because it is read on demand, it also saves tokens.
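The think → plan → act → observe loop for the park request can be sketched as straight-line code. All three tools are fakes with invented data (the coordinates and park name are placeholders); a real Agent would choose each step by reasoning rather than following a fixed script, but the shape of the loop is the same.

```python
def locate() -> dict:
    """Fake geolocation tool (coordinates are made up)."""
    return {"lat": 28.2, "lon": 112.9}

def weather(lat: float, lon: float) -> dict:
    """Fake weather tool."""
    return {"raining": False}

def find_park(lat: float, lon: float) -> str:
    """Fake map tool (park name is a placeholder)."""
    return "Riverside Park"

def agent(request: str) -> str:
    pos = locate()                        # step 1: where am I?
    wx = weather(pos["lat"], pos["lon"])  # step 2: check the weather there
    if wx["raining"]:                     # step 3: observe, then branch
        return "It's raining. Better stay in."
    park = find_park(pos["lat"], pos["lon"])
    # step 4: synthesise everything into the final answer
    return f"No rain today. Try {park} nearby."

print(agent("If it isn't raining, find me a park."))
# → No rain today. Try Riverside Park nearby.
```

Each tool result feeds the next decision, which is exactly what distinguishes an Agent from a single question-and-answer exchange.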
Finally, let's string together everything we've discussed and read this big picture from the bottom up. At the very bottom is the foundation layer, the core engine: the LLM and the token, the most basic units of data processing. Above it is the cognition layer, which holds the context window, in charge of memory processing; the prompt, in charge of intent control; and the RAG technique, in charge of plugging in external knowledge. Above that is the bridging layer, which is MCP: it handles capability extension and lets the Agent connect to all kinds of tools. At the very top is the orchestration layer, which is the Agent itself, an autonomous brain responsible for thinking, planning, and execution; and Agent Skills are the custom behavior guide for that brain.

So that is the full landscape of underlying AI technology this video set out to cover: from the most basic engine, to memory, to connecting with the outside world, to autonomous planning, layer upon layer building up the agents everyone raves about today. I hope what we covered today helps you string your previously scattered knowledge together, so that the next time you see an AI product or technology, you have a clear framework in mind.