AI engineering has quickly become one of the most valuable skills in tech
The problem is that most beginners have no idea what they should actually learn
Some start with machine learning theory
Some get stuck in endless tutorial-watching
Some jump straight to prompts and agents without understanding APIs, backend fundamentals, or how real products actually get built
The result is usually the same: plenty of confusion, very little practical skill
If your goal is to become an AI engineer, you don't need to master every corner of artificial intelligence
You need to learn how to build useful AI systems in the real world
That means learning how to:
Build end-to-end applications with large language models (LLMs)
Use model APIs such as OpenAI and Anthropic
Design prompts and context properly
Use structured outputs and tool calling
Add retrieval when it's needed
Deploy projects so people can actually use them
This guide gives you a practical 6-month roadmap
The article runs over 10,000 words, so reading it may take a few hours or more
But its real value is that for every skill you need to learn, there are resources and clear instructions on what to do
That way you can reach a working level in AI engineering within six months, and start applying it yourself within the first 1-2 months
Writing this took more than 40 hours; I did it together with my friend @andy_ai0
He's just starting to build his personal brand on X, but he knows AI deeply and helped a lot with this article
I genuinely think he deserves your follow and support as he grows
Now let's get into the article ⬇️
What AI engineers actually do
When people hear "AI engineer", many imagine someone training giant models from scratch
In reality, most modern AI engineers do something far more pragmatic
They build products and systems on top of existing models
That usually includes:
Wiring up LLM APIs
Designing prompts and context flows
Building chat, search, or automation systems
Integrating tools, databases, and external APIs
Handling structured outputs
Improving reliability, cost efficiency, and latency
Shipping AI features into real applications
So in practice, an AI engineer usually sits somewhere between:
Software engineering
Product engineering
Automation
Applied AI
That's why the role is growing so fast
Companies no longer need only researchers
They also need people who can turn models into working products
It's also why this roadmap focuses less on deep theory and more on execution
If you can build real LLM apps, retrieval systems, automations, and production-ready workflows, you're already closer to employable than most beginners
⏩------------------------------------------------------------------------⏪
Month 1: Build a solid-enough foundation in programming and fundamentals
Your goal this month: become a minimally capable, hands-on Python developer
You don't need to be an expert; you just need to stop googling basic syntax and be able to build simple programs with confidence
AI engineering is first and foremost software engineering
Everything in the months that follow assumes you can already write clean Python, use the terminal, call APIs, and manage a codebase. This month is your foundation
What to learn
1. Python
Python is the language of AI engineering. Period. Almost every library, API, and tutorial you'll meet over the next six months will use Python.
How to learn it:
Start with a structured course that forces you to write code, not just watch videos.
The most common beginner mistake is consuming content passively, nodding along while reading but never opening a code editor
Fight that by hand-coding every single example as you go
Resources:
1. Python for Everybody (Coursera, free to audit)
Link: https://www.coursera.org/specializations/python
The best starting point for absolute beginners. Dr. Chuck is one of the most beginner-friendly Python teachers on the internet
2. freeCodeCamp Python course (YouTube, free)
Link: https://www.youtube.com/watch?v=rfscVS0vtbw
A comprehensive 4-hour video covering all the basics
3. CS50P: Introduction to Programming with Python (Harvard, free)
Link: https://cs50.harvard.edu/python/
More rigorous. Includes problem sets and a final project. Great if you want a systematic path
4. The official Python docs (tutorial)
Link: https://docs.python.org/3/tutorial/
Dry but authoritative; use it as a reference
Focus on:
Variables, data types, loops, conditionals, functions
Lists, dicts, sets, tuples
File I/O and JSON handling
Classes and basic OOP (just enough to understand what you're reading)
Error handling with try/except
Virtual environments (venv) and pip
Package management: understanding requirements.txt
Practice project: build a simple CLI tool in Python. For example, a personal expense tracker that reads/writes a JSON file, or a script that calls a public API (like a weather API) and prints nicely formatted results
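To make the project concrete, here is a minimal sketch of what such a CLI expense tracker could look like. The file name expenses.json and the record fields are my own choices, not a spec:

```python
import json
import os

DB_FILE = "expenses.json"  # storage location is an arbitrary choice

def load_expenses(path=DB_FILE):
    """Return the list of expense records, or [] if the file doesn't exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def add_expense(expenses, description, amount):
    """Append a new expense record and return the updated list."""
    expenses.append({"description": description, "amount": float(amount)})
    return expenses

def total(expenses):
    """Sum the amount field across all records."""
    return sum(e["amount"] for e in expenses)

def save_expenses(expenses, path=DB_FILE):
    """Write the records back to disk as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(expenses, f, indent=2)

if __name__ == "__main__":
    items = load_expenses()
    items = add_expense(items, "coffee", 3.50)
    save_expenses(items)
    print(f"{len(items)} expenses, total ${total(items):.2f}")
```

Even a toy like this forces you through the Month 1 essentials: file I/O, JSON, functions, and running a script from the terminal.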
2. Git and GitHub
Git is how professional developers save and share code. You'll use it constantly: versioning projects, collaborating, and showcasing your portfolio projects on GitHub
How to learn it:
Git is confusing at first because its mental model isn't intuitive
Don't try to memorize commands; instead, understand what problem Git is solving
(tracking changes, enabling collaboration, letting you undo mistakes), and the commands will start to make sense
Resources:
1. GitHub Skills (free, interactive)
Link: https://skills.github.com/
GitHub's official interactive courses, built into GitHub itself. Start here
2. Learn Git Branching (free, interactive)
Link: https://learngitbranching.js.org/
The undisputed best visual tool for understanding branching and merging
3. The Pro Git book (free, online)
Link: https://git-scm.com/book/en/v2
A comprehensive reference. Jump to the chapters you need
Focus on:
git init, add, commit, push, pull
Branching and merging
Understanding .gitignore
Creating a repo on GitHub and pushing a local project
Reading and writing basic README files
Practice: from now on, every project you build, even small scripts, goes in a GitHub repo. That builds the habit and builds your portfolio
3. Command line / terminal basics
As an AI engineer you'll run scripts, install packages, manage servers, and navigate files from the command line
Being slow or scared in the terminal becomes a real bottleneck
Resources:
1. The 50 Most Popular Linux & Terminal Commands (full beginner course)
Link: https://www.youtube.com/watch?v=ZtqBQ68cfJc
Great for Linux/Mac beginners starting from zero
2. The Missing Semester of Your CS Education (MIT, free)
Link: https://missing.csail.mit.edu/
Covers shell scripting, terminal tools, and the command-line fluency most CS programs skip
Focus on:
Navigation: cd, ls, pwd, mkdir, rm
Reading files: cat, less, grep
Running Python scripts from the terminal
Environment variables
A basic understanding of PATH
4. JSON, APIs, HTTP, and async basics
From day one of Month 2 you'll be calling LLM APIs
That means you need to understand how web APIs work before you touch the OpenAI or Anthropic SDKs
Resources:
1. HTTP basics, MDN Web Docs (free)
Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview
The clearest explanation of how HTTP requests and responses work
2. REST API Tutorial
Link: https://restfulapi.net/
Short and practical
3. The Python requests library docs
Link: https://requests.readthedocs.io/en/latest/
Learn to call any web API from Python
4. Python async/await (free)
Link: https://realpython.com/async-io-python/
Understanding async is essential for handling streaming LLM responses later
Focus on:
GET and POST requests: what they are and how to make them in Python
Reading and writing JSON
HTTP status codes (200, 400, 401, 404, 500, and what each one means)
What an API key is and basic authentication patterns
What async def and await do and why they exist
Practice project: write a Python script that calls a free public API (try Open-Meteo for weather data; no API key required) and formats the result as clean JSON output
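A sketch of that exercise using only the standard library. The Open-Meteo endpoint and its current_weather parameter come from their public docs; the formatting helper is kept separate from the network call so you can test it offline:

```python
import json
import urllib.parse
import urllib.request

def fetch_json(url, params):
    """GET a URL with query parameters and parse the JSON body."""
    full_url = url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(full_url, timeout=10) as resp:
        return json.load(resp)

def format_weather(payload):
    """Pull the current temperature out of an Open-Meteo-style response."""
    current = payload["current_weather"]
    return f"{current['temperature']}°C, wind {current['windspeed']} km/h"

if __name__ == "__main__":
    # Berlin coordinates, just as an example query.
    data = fetch_json(
        "https://api.open-meteo.com/v1/forecast",
        {"latitude": 52.52, "longitude": 13.41, "current_weather": "true"},
    )
    print(format_weather(data))
```

Separating "fetch" from "format" like this is a habit worth building now; it is exactly how you will later keep LLM calls testable.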
5. Basic SQL and Pandas
You don't need to be a data scientist, but you'll constantly need to inspect, query, and wrangle data
SQL basics and pandas fluency will save you time over and over
Resources:
1. SQLBolt (free, interactive)
Link: https://sqlbolt.com/
The fastest way to learn SQL from zero. 20 short lessons with in-browser exercises
2. The official pandas getting-started guide
Link: https://pandas.pydata.org/docs/getting_started/index.html
Work through the "10 minutes to pandas" tutorial
3. Kaggle Pandas course (free)
Link: https://www.kaggle.com/learn/pandas
Hands-on, practical, short
Focus on:
SQL: SELECT, WHERE, GROUP BY, JOIN, ORDER BY
Pandas: loading CSVs, filtering rows, selecting columns, basic aggregations
6. FastAPI
Resources:
1. The official FastAPI tutorial (free)
Link: https://fastapi.tiangolo.com/tutorial/
Genuinely some of the best-written framework documentation anywhere
Work through it start to finish. Covers path parameters, request bodies, Pydantic validation, and running the dev server
2. Python API Development (19-hour course, freeCodeCamp, YouTube, free)
Link: https://www.youtube.com/watch?v=ZtqBQ68cfJc
Covers API design fundamentals: routing, serialization, schema validation, and SQL database integration. Builds a complete social-media-style API from scratch
Focus on: creating GET and POST endpoints, path and query parameters, request bodies with Pydantic, running uvicorn, and using FastAPI's built-in /docs UI to test your API without writing a client
Month 1 milestone
By the end of this month you should be able to:
Write Python programs that read and write files, call APIs, and handle errors
Version your code with Git and push projects to GitHub
Navigate the terminal without hesitation
Understand what an HTTP request is and make one from Python
Query a SQLite database with basic SQL
Build and run a simple FastAPI app locally
⏩------------------------------------------------------------------------⏪
Month 2: Master LLM application development
Your goal this month: build real AI-powered applications with the OpenAI and Anthropic APIs
By the end of the month you should be comfortable writing reliable prompts, extracting structured data from models, having them call your functions, and handling everything that can go wrong
This is the core of AI engineering. Everything else in the roadmap builds on what you learn here
What to learn
1. Prompt engineering fundamentals
Prompt engineering isn't about asking politely; it's the craft of writing instructions that get consistent, reliable output from a model that is inherently probabilistic
As an AI engineer you'll spend a surprising amount of time here
How to learn it:
Start with Anthropic's interactive tutorial, because it's the most hands-on
Then read OpenAI's official guide. After that, the Prompt Engineering Guide ties everything together
Work through all three in order; each one reinforces the others
Resources:
1. Anthropic's interactive prompt engineering tutorial (free, GitHub)
Link: https://github.com/anthropics/prompt-eng-interactive-tutorial
A step-by-step course in 9 chapters with exercises, giving you plenty of practice writing and debugging prompts yourself
Runs as Jupyter notebooks against the Claude API
2. Anthropic's prompt engineering docs (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
The official reference. Covers everything from basic clarity to XML structuring and agentic systems
3. OpenAI's prompt engineering guide (free)
Link: https://platform.openai.com/docs/guides/prompt-engineering
OpenAI's official guide to prompt formats that work well with their models and produce more useful output
4. PromptingGuide.ai (free)
Link: https://www.promptingguide.ai/
Covers core techniques from basic prompting to advanced strategies, plus function calling, tool integration, and agentic systems
Focus on: the difference between system and user messages, why specificity matters, chain-of-thought prompting ("think step by step"), using examples in prompts (few-shot), and how small wording changes can dramatically change output quality
Exercise: take a real task, say summarizing a document, extracting key information from text, or classifying feedback, and write 5 different prompts for it. Compare the outputs. You'll immediately see how prompt design affects reliability.
2. Structured outputs / JSON mode
In real applications you almost never want raw text from an LLM; you want structured data you can parse, store, and use in code.
Structured outputs solve this by forcing the model to match a schema you define.
Resources:
1. OpenAI structured outputs guide (official docs, free)
Link: https://platform.openai.com/docs/guides/structured-outputs
Explains how the feature guarantees responses that conform to your JSON Schema, so you never have to worry about missing keys or hallucinated values
2. The Instructor library (free, open source)
Link: https://python.useinstructor.com/
The cleanest way to get structured output from any LLM provider using Pydantic models
Works with OpenAI, Anthropic, Google, and 15+ other providers through the same code interface, with automatic retries when validation fails
This is what most AI engineers actually use in production
3. OpenAI Cookbook: Introduction to Structured Outputs (free)
Link: https://developers.openai.com/cookbook/examples/structured_outputs_intro/
Practical examples including chain-of-thought output, structured data extraction, and UI generation, useful for seeing real-world use cases
Focus on: defining Pydantic models for your data, passing the schema to the API, understanding how structured outputs differ from JSON mode, and handling refusals gracefully
Practice project: build an invoice or receipt parser. Give it raw text (e.g. "Invoice #123, $45.99 for 3 widgets, due March 30") and have it return a structured Python object with fields like invoice_number, amount, items, due_date
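The guide recommends Pydantic + Instructor for this; as a dependency-free illustration of the same idea, here is a sketch that parses a model's JSON reply and checks the exercise's fields before trusting it (the required-field table and error messages are my own, not a library API):

```python
import json

# Required fields for the invoice exercise and the type each should have.
REQUIRED_FIELDS = {"invoice_number": str, "amount": float, "items": list, "due_date": str}

def parse_invoice(raw_reply):
    """Parse an LLM reply that is supposed to be invoice JSON.

    Returns (invoice_dict, None) on success, or (None, error_message)
    when the reply isn't valid JSON or a required field is missing or
    has the wrong type. This is the graceful-failure path Pydantic
    validation gives you for free.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc}"
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            return None, f"missing field: {field}"
        if not isinstance(data[field], expected_type):
            return None, f"wrong type for {field}"
    return data, None
```

The error path matters as much as the happy path: in Month 2's failure-handling section you'll use exactly this kind of validation to decide whether to retry a call.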
3. Function / tool calling
Tool calling turns an LLM from a text generator into something that can take actions: search the web, query a database, call your APIs, run code. It's one of the most important skills in this entire guide
How to think about it: the model never actually executes your functions
It analyzes the prompt and, when it decides a tool is needed, returns a structured call containing the function name and arguments
Your code then executes that call and sends the result back
Resources:
1. OpenAI function calling guide (official docs, free)
Link: https://platform.openai.com/docs/guides/function-calling
The definitive reference. Covers tool definitions, the 5-step calling flow, parallel calls, and best practices
2. Anthropic tool use docs (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
Anthropic's guide for Claude. Same concepts, slightly different syntax
3. OpenAI Cookbook: How to call functions with chat models (free, GitHub)
Link: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb
A complete runnable notebook walking through the full tool-calling loop with real examples
Focus on: describing functions clearly in JSON Schema, parsing tool-call responses, executing the function and sending results back, handling the case where no tool call is needed, and what tool_choice: "auto" means
Practice project: build a simple assistant with three tools: get_weather(city), calculate(expression), and search_notes(query) (just searching a hardcoded dict). Wire them all up and watch the model decide which tool to call based on your question
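To see your half of the loop in miniature, here is a sketch where the model's reply is faked as a dict shaped roughly like what providers return (a name plus JSON-encoded arguments; the exact shape varies by provider), and your code does the dispatching:

```python
import json

def get_weather(city):
    # Stand-in implementation; a real tool would call a weather API.
    return f"Sunny in {city}"

def calculate(expression):
    # eval() is unsafe on untrusted input; acceptable only in a local sketch.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"get_weather": get_weather, "calculate": calculate}

def execute_tool_call(tool_call):
    """Run the function named in a model's structured tool call.

    `tool_call` mimics the provider shape: a function name plus a
    JSON string of arguments. The model only *describes* the call;
    this function is where it actually happens.
    """
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return f"error: unknown tool {tool_call['name']}"
    args = json.loads(tool_call["arguments"])
    return fn(**args)

# What the model might hand back after seeing "What's 2 + 3 * 4?"
fake_call = {"name": "calculate", "arguments": '{"expression": "2 + 3 * 4"}'}
```

Note the unknown-tool branch: returning an error string back to the model, instead of crashing, is the standard way to let it recover.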
4. Streaming responses
Streaming means showing output word by word as the model generates it, instead of waiting for the full response before displaying anything. It makes your app feel dramatically faster and more alive
Resources:
1. OpenAI streaming docs (official, free)
Link: https://platform.openai.com/docs/api-reference/streaming
The reference for adding stream=True to a request and iterating over chunks
2. Anthropic streaming docs (official, free)
Link: https://docs.anthropic.com/en/api/messages-streaming
Anthropic's streaming API reference, with Python examples
3. How streaming LLM APIs work, by Simon Willison (free)
Link: https://til.simonwillison.net/llms/streaming-llm-apis
A clear technical breakdown of the Server-Sent Events that OpenAI, Anthropic, and Google use under the hood, great for understanding what actually happens at the HTTP level
Focus on: setting stream=True, iterating over delta chunks, assembling the full response from the pieces, and wiring streamed output into a FastAPI endpoint with StreamingResponse
Tip: for user-facing apps, streaming is almost always the right choice. Nobody wants to stare at a spinner for 10 seconds just to see the whole response appear at once
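The assembly step can be practiced without a live API. The chunk shape below (a dict with a delta field) is a simplification of the real event objects, but the pattern of accumulating deltas while surfacing each one is the same:

```python
def fake_stream():
    """Simulate the chunked deltas a streaming LLM API yields."""
    for piece in ["Hel", "lo ", "wor", "ld!"]:
        yield {"delta": piece}

def consume_stream(stream, on_delta=None):
    """Assemble the full response while optionally surfacing each delta.

    `on_delta` is where a real app would print to the terminal or push
    a server-sent event to the browser.
    """
    parts = []
    for chunk in stream:
        delta = chunk["delta"]
        parts.append(delta)
        if on_delta:
            on_delta(delta)
    return "".join(parts)

if __name__ == "__main__":
    full = consume_stream(fake_stream(), on_delta=lambda d: print(d, end="", flush=True))
    print()
```

With a real SDK, fake_stream() is replaced by the provider's streaming iterator and the delta field lives at a slightly different path, but consume_stream stays recognizably the same.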
5. Conversation state
LLMs are stateless: they have no memory between calls. Conversation history is yours to manage, by sending the full message list with every request. Understanding this is essential
Resources:
1. OpenAI guide to managing conversation state (official, free)
Link: https://platform.openai.com/docs/guides/conversation-state
The definitive explanation of how the messages array works and how to manage multi-turn conversations
2. Anthropic Messages API docs (official, free)
Link: https://docs.anthropic.com/en/api/messages
Anthropic's equivalent. Same concepts; worth reading both to see where they differ
Focus on: the structure of the messages array, why you append both user and assistant messages, context window limits and what happens when you exceed them, and basic truncation strategies (drop the oldest messages, summarize the history)
Practice project: build a simple multi-turn chatbot in the terminal. Append every turn to the messages list. Add a /reset command that clears the history, and print the current token count after each exchange
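The core of that chatbot, managing the messages list and truncating it, can be sketched in plain Python. The max_messages cap here stands in for a real token budget, which you would measure with a tokenizer:

```python
SYSTEM_PROMPT = {"role": "system", "content": "You are a helpful assistant."}

def add_turn(messages, user_text, assistant_text):
    """Append one user/assistant exchange to the running history."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages

def truncate(messages, max_messages=6):
    """Drop the oldest turns, always keeping the system message first."""
    if len(messages) <= max_messages:
        return messages
    # Keep the system prompt plus the most recent (max_messages - 1) messages.
    return [messages[0]] + messages[-(max_messages - 1):]

history = [dict(SYSTEM_PROMPT)]
add_turn(history, "Hi!", "Hello, how can I help?")
add_turn(history, "What's RAG?", "Retrieval-augmented generation.")
add_turn(history, "Thanks", "You're welcome!")
history = truncate(history, max_messages=5)
```

The entire list goes back to the API on every call; that is what "the model remembers" actually means, and why long conversations get expensive.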
6. Cost, latency, and token basics
Shipping an AI app without understanding cost and tokens leads to surprise bills and a sluggish app. Boring, but essential
Resources:
1. OpenAI pricing page (official)
Link: https://openai.com/api/pricing
Learn what input and output tokens cost per model. Bookmark it and check it whenever you choose a model
2. Anthropic pricing page (official)
Link: https://www.anthropic.com/pricing
Same for the Claude models
3. OpenAI tokenizer tool (free, interactive)
Link: https://platform.openai.com/tokenizer
Paste any text and see exactly how many tokens it is. Use it constantly while you learn
4. Tiktoken (Python library, free)
Link: https://github.com/openai/tiktoken
OpenAI's tokenizer library, for counting tokens in code before sending a request
Focus on: what a token is (roughly 4 characters or 3/4 of a word), why input and output tokens are priced differently, how context window size affects what you can do, and the latency trade-off between smaller/faster and bigger/smarter models
Also: don't use GPT-4/Opus for everything. Cheaper models are usually plenty for simple tasks
7. Failure handling
LLM APIs fail. You'll hit rate limits, responses will time out, and models will return malformed JSON. Handling these failures gracefully is what separates a demo from a production app
Resources:
1. OpenAI error codes reference (official, free)
Link: https://platform.openai.com/docs/guides/error-codes
Every error type you'll encounter and what to do about it
2. Anthropic error handling docs (official, free)
Link: https://docs.anthropic.com/en/api/errors
Same for Claude
3. Tenacity (Python library, free)
Link: https://tenacity.readthedocs.io/
A clean library that adds retry logic with exponential backoff to any Python function. One decorator and retries are handled.
Focus on: rate-limit errors (429) and exponential backoff, timeouts with httpx/requests, validating model output before using it, fallback strategies (retry with a different model, return a cached response), and never letting your app crash just because an LLM returned something unexpected.
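Tenacity gives you this with one decorator; to see what it is actually doing, here is a hand-rolled sketch of retry-with-exponential-backoff, exercised against a function that fails twice before succeeding (the way a transient 429 or timeout would):

```python
import functools
import time

def retry(max_attempts=3, base_delay=0.1):
    """Retry a function with exponential backoff: the delay doubles each attempt."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: let the caller handle it
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

calls = {"n": 0}

@retry(max_attempts=3, base_delay=0.01)
def flaky_llm_call():
    """Fails twice, then succeeds, like a transient rate limit."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"
```

In real code you would catch only retryable exceptions (rate limits, timeouts), not everything; Tenacity's retry_if_exception_type exists for exactly that reason.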
8. Prompt injection awareness
Prompt injection is the number one security risk in LLM applications
It happens when untrusted user input gets combined with your system instructions, letting users alter, override, or inject new behavior into the prompt, causing the system to take unintended actions or produce manipulated output
You don't need to be a security expert, but you do need to know this exists before you ship anything
Resources:
1. OWASP Top 10 for LLM Applications, LLM01: Prompt Injection (free)
Link: https://genai.owasp.org/llmrisk/llm01-prompt-injection/
The definitive taxonomy, covering direct injection (jailbreaks), indirect injection via external content like documents or websites, and real-world attack scenarios
2. OWASP Prompt Injection Prevention Cheat Sheet (free)
Link: https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
Practical defensive patterns: input validation, privilege control, and output validation
3. Evidently AI: What is prompt injection (free)
Link: https://www.evidentlyai.com/llm-guide/prompt-injection-llm
A clear developer-oriented explainer covering attack types, risks, and mitigation design patterns
Focus on: direct vs indirect injection, why system prompts aren't actually "secure", least privilege for tool access, and never trusting LLM output to make critical decisions without validation
Month 2 milestone
By the end of this month you should be able to:
Write prompts that produce consistent, reliable output for a given task
Get structured JSON out of any model with Pydantic + Instructor
Set up tool calling so a model can invoke your Python functions
Stream responses in real time through a FastAPI endpoint
Manage multi-turn conversation history correctly
Estimate the token cost of a request before sending it
Handle API errors, timeouts, and malformed output without crashing
Explain what prompt injection is and apply basic defenses
⏩------------------------------------------------------------------------⏪
Month 3: Learn RAG properly
Your goal this month: build systems that let an LLM answer questions from your documents instead of relying only on its training data
By the end of the month you should be able to ingest documents, embed and store them, retrieve relevant chunks at query time, and generate grounded, accurate, citable answers
RAG is one of the most in-demand practical skills in AI engineering right now. Almost every real enterprise AI use case, from support bots to internal knowledge bases to document Q&A, is built on it
Understanding it deeply, rather than copying tutorials, is what separates good engineers from great ones
1. Embeddings
Before you build a RAG system, you need to understand what an embedding actually is, because everything else is built on it
A text embedding is a representation of a piece of text projected into a high-dimensional vector space
The text's position in that space is described by a long list of numbers
The key property: semantically similar texts end up close to each other in that space, which is exactly what makes similarity search possible
Resources:
1. Stack Overflow blog: An intuitive introduction to text embeddings (free)
Link: https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/
The best beginner explanation. Written by a developer with years of NLP product experience, focused on building intuition rather than math.
2. Google ML Crash Course: Embeddings (free)
Link: https://developers.google.com/machine-learning/crash-course/embeddings
Explains why dense vector representations solve problems one-hot encoding can't, especially capturing semantic relationships between items
3. HuggingFace: Getting started with embeddings (free)
Link: https://huggingface.co/blog/getting-started-with-embeddings
A hands-on guide. Shows how to generate embeddings with the sentence-transformers library, host them, and run semantic search over a real FAQ dataset
4. OpenAI embeddings guide (official docs, free)
Link: https://platform.openai.com/docs/guides/embeddings
The code reference for OpenAI's text-embedding-3-small and text-embedding-3-large models
Focus on: what a vector is conceptually, why similar texts produce similar vectors, how cosine similarity works, the differences between embedding models (OpenAI, HuggingFace sentence-transformers), and what embedding dimensions mean in practice
Exercise: take 20 sentences on related topics, embed them with OpenAI or sentence-transformers, then write a simple nearest-neighbor search that returns the 3 sentences most similar to a query. That's effectively a miniature version of the core idea behind RAG
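The nearest-neighbor half of that exercise fits in a few lines. The 3-dimensional "embeddings" below are toys (real ones have hundreds or thousands of dimensions and come from a model), but the cosine-similarity ranking is exactly the real mechanic:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=3):
    """Return the k texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings", hand-placed so the two cat sentences
# point in a similar direction and the finance one points elsewhere.
corpus = [
    ("the cat sat on the mat", [0.9, 0.1, 0.0]),
    ("feline resting on a rug", [0.8, 0.2, 0.1]),
    ("stock prices fell today", [0.0, 0.1, 0.9]),
]
```

Swap the hand-placed vectors for real model embeddings and this brute-force loop is a working retriever; vector databases exist to do the same ranking fast over millions of vectors.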
2. Chunking
Your documents are too big to embed whole. Chunking is the process of splitting them into smaller pieces before embedding
How you chunk directly determines how well the system can find relevant information and give accurate answers; even a perfect retrieval system fails if it's searching over badly prepared data
Resources:
1. Weaviate: Chunking strategies for RAG (free)
Link: https://weaviate.io/blog/chunking-strategies-for-rag
The most practical guide. Covers fixed-size, recursive, and semantic chunking, with clear guidance on when to use each
2. Unstructured: Chunking for RAG best practices (free)
Link: https://unstructured.io/blog/chunking-for-rag-best-practices
A technical deep dive on chunk size, overlap, and how the embedding model's context window imposes hard limits
A good starting point for experimentation is a chunk size around 250 tokens (roughly 1,000 characters) with 10-20% overlap between consecutive chunks, so you don't lose context at the boundaries
3. LangChain text splitters docs (official, free)
Link: https://python.langchain.com/docs/concepts/text_splitters/
The practical reference for using RecursiveCharacterTextSplitter, MarkdownTextSplitter, and semantic splitters in code
Focus on: fixed-size chunks with overlap as the baseline, recursive chunking for structured documents, semantic chunking for better boundary detection, and the core trade-off: chunks too big lose retrieval precision; chunks too small lose context
Beginner tip: start with LangChain's RecursiveCharacterTextSplitter with chunk_size=500 and chunk_overlap=50. It's a sane default for most documents and gives you a baseline to improve on
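The fixed-size-with-overlap mechanic is simple enough to write by hand. This character-based sketch is roughly what RecursiveCharacterTextSplitter does in its most basic form, minus the recursive separator logic that tries to break on paragraphs and sentences first:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks where consecutive chunks share
    `overlap` characters, so a sentence cut at a boundary still appears
    whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window reached the end of the text
    return chunks
```

Run it on a real document and you will immediately see the trade-off the guide describes: larger chunk_size means fewer, vaguer chunks; smaller means precise but context-starved ones.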
3. Vector databases
Once you have embeddings, you need somewhere to store and query them efficiently. That's what vector databases are for
The right choice depends on your situation: Chroma for fast local prototyping, Pinecone for managed all-in-one scaling, Weaviate for open-source flexibility with strong hybrid search, Qdrant for complex filtering and cost-effective self-hosting, and pgvector if you're already on PostgreSQL and want to avoid adding a new system
Resources:
1. Chroma official docs (free)
Link: https://docs.trychroma.com/
Chroma is ideal for solo developers and small teams who value speed and simplicity; it runs in memory or locally with zero infrastructure to manage
2. Pinecone Learning Center (free)
Link: https://www.pinecone.io/learn/
Excellent free tutorials covering vector search concepts, hybrid search, and RAG pipelines. Great vendor-neutral learning material even if you never use Pinecone
3. Qdrant docs (free)
Link: https://qdrant.tech/documentation/
The best open-source option for production with advanced filtering. Fast, flexible, and free to self-host
4. pgvector (open source, free)
Link: https://github.com/pgvector/pgvector
If your project already uses PostgreSQL, pgvector adds vector search directly to your existing database, no new infrastructure needed
Focus on: creating collections, inserting embeddings with metadata, querying by similarity with top_k, and filtering by metadata at query time
You don't need to understand the index algorithms (HNSW, IVF), just how to use them
Practice project: index 50-100 pages of any public documentation (the Python docs or a Wikipedia export, say) into Chroma with metadata (source URL, section title). Write a query function that retrieves the 5 most relevant chunks for any question
4. Metadata filtering
Raw similarity search alone isn't enough for real applications. Metadata filtering lets you restrict retrieval to a relevant subset: by date, source, document type, user, category, or any other attribute you store alongside each chunk
Resources:
1. Pinecone: metadata filtering guide (free)
Link: https://docs.pinecone.io/guides/data/filter-with-metadata
A clear explanation, with code examples, of filtering vectors by metadata fields before or during similarity search
2. LlamaIndex: metadata filtering guide (official docs, free)
Link: https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/node_postprocessors/
Explains how to apply filters at query time in a LlamaIndex pipeline
Focus on: tagging every chunk with relevant metadata at ingestion time (source filename, page number, section, date, category), then using those fields to filter results at query time. This is the difference between a toy demo and a production system where a user can ask "only show results from the Q4 2025 to Q1 2026 reports"
5. Reranking
Reranking is a semantic boost to the search quality of any keyword or vector search system
After first-stage retrieval returns a candidate set, a reranker rescores those results by true contextual relevance to the query, not just vector similarity
The two-stage pattern is: embed and search (fast, approximate) → rerank the top k (slower, more accurate). The result is a significant jump in retrieval quality for a modest latency cost
Resources:
1. Cohere rerank docs (official, free)
Link: https://docs.cohere.com/docs/reranking-with-cohere
The best starting point. Covers the full reranking workflow, including semi-structured data like emails and JSON documents. Integrates into an existing retrieval pipeline with one extra line of code
2. LangChain: Cohere reranker integration (official docs, free)
Link: https://python.langchain.com/docs/integrations/retrievers/cohere-reranker/
Explains how to plug Cohere reranking into LangChain retrievers with ContextualCompressionRetriever
Focus on: the two-stage retrieve-then-rerank pattern, the difference between bi-encoders (first-stage vector retrieval) and cross-encoders (reranking), and the practical latency/quality trade-off of reranking the top 20 vs the top 5 results
6. Retrieval quality problems
Most RAG failures aren't model failures, they're retrieval failures. Understanding the ways retrieval goes wrong is essential for debugging real systems
Common problems to learn:
Semantic drift: the query's embedding doesn't match the relevant chunk's embedding, even though the information is there. Fix: try query rewriting or HyDE (Hypothetical Document Embeddings)
Chunk boundary problems: the relevant information is split across two chunks. Fix: increase overlap or use semantic chunking
Missing metadata context: a chunk is semantically similar to the query but belongs to the wrong document, date, or user. Fix: metadata filtering
Top-k too small: the right chunk exists but isn't in the top 5 retrieved. Fix: raise top_k at retrieval, then trim after reranking
Resources:
1. LangChain: query transformations (free)
Link: https://python.langchain.com/docs/how_to/#query-analysis
Covers query rewriting, step-back prompting, and HyDE
2. Pinecone: improving retrieval quality (free)
Link: https://www.pinecone.io/learn/retrieval-augmented-generation/#retrieval-quality
A practical walkthrough of common failure modes and their fixes
7. Reducing hallucinations
RAG dramatically reduces hallucinations compared to a base model, but it doesn't eliminate them
By giving the model retrieved facts at runtime, RAG grounds answers in real sources instead of training data alone, and the output can even cite those sources, improving transparency and trust
But retrieval failures, bad chunks, and conflicting information can still make the model fabricate
Resources:
1. Zep: Reducing LLM hallucinations, a developer's guide (free)
Link: https://www.getzep.com/ai-agents/reducing-llm-hallucinations/
A practical developer guide covering prompt-grounding strategies, chain-of-thought approaches for factual tasks, and output validation patterns
2. Voiceflow: 5 ways to reduce LLM hallucinations (free)
Link: https://www.voiceflow.com/blog/prevent-llm-hallucinations
A good overview of combined strategies: RAG + chain-of-thought + guardrails together beats any single technique
Focus on: prompting the model to answer only from the provided context (and say "I don't know" when the answer isn't there), setting confidence thresholds before showing a response, and always verifying retrieval quality before blaming the LLM
8. Citations and provenance
A grounded RAG system doesn't just answer, it tells you where the answer came from. That's essential for user trust and for debugging
Resources:
1. Anthropic: Citations with Claude (docs, free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/citations
Explains how to have Claude produce answers with source citations
2. LangChain: RAG with sources (free)
Link: https://python.langchain.com/docs/how_to/qa_sources/
Shows how to return source documents alongside the answer in a LangChain RAG pipeline
Focus on: passing chunk metadata (source filename, page number, URL) into the prompt context, instructing the model to cite its sources in the answer, and surfacing those sources in your UI or API response
9. Your RAG framework: LangChain or LlamaIndex
You don't need to build RAG pipelines from scratch. Two frameworks dominate this space, and both are worth knowing:
LlamaIndex is optimized for search-and-indexing-first use cases; it abstracts ingestion, chunking, embedding, and querying into a few lines of code, so you can have a working prototype in an afternoon
LangChain shines when your app is more of an orchestration engine: multi-agent workflows, tool calling, and conditional chains that query multiple LLMs or external APIs before producing an answer
In Month 3, start with LlamaIndex for RAG. Switch to LangChain when you get to the agent work in Month 4
Resources:
1. LlamaIndex: Introduction to RAG (official docs, free)
Link: https://developers.llamaindex.ai/python/framework/understanding/rag/
Covers the five key stages of RAG (loading, indexing, storing, querying, and evaluation) and how LlamaIndex handles each
2. LlamaIndex starter tutorial (official docs, free)
Link: https://developers.llamaindex.ai/python/framework/getting_started/starter_example/
The official quickstart. A working RAG system in under 30 lines of code
3. LangChain: Build a RAG agent (official docs, free)
Link: https://docs.langchain.com/oss/python/langchain/rag
Shows how to build a Q&A app over unstructured text with a RAG agent, from a 40-line minimal version up to a full retrieval pipeline with reranking.
Practice project: build a "chat with your documents" app. Ingest 10-20 PDFs or text files (your own notes, a textbook chapter, product docs, anything). Build a FastAPI endpoint that takes a question, retrieves and reranks the top 5 most relevant chunks, and returns a cited answer from Claude or OpenAI. This is a real portfolio project.
Month 3 milestone
By the end of this month you should be able to:
Explain what embeddings are and why similar texts produce similar vectors
Chunk any document intelligently with an appropriate strategy
Store and query embeddings in a vector database, with metadata filtering
Add a reranking step to improve retrieval quality
Debug common retrieval failures systematically
Build a complete end-to-end RAG pipeline with LlamaIndex or LangChain that ingests documents, retrieves relevant chunks, and returns grounded, cited answers
⏩------------------------------------------------------------------------⏪
Month 4: Agents, tools, workflows, and evals
Your goal this month: build AI systems that take sequences of actions autonomously, chain multi-step workflows, and critically evaluate whether any of it actually works
By the end you should be able to build a real agent from scratch, understand when not to use one, and measure the performance of anything you build
This is where AI engineering starts getting genuinely hard. The Month 4 skills are what separate a junior AI engineer from someone who can own an AI feature end to end
1. The agent loop
Agents aren't magic; they're a surprisingly simple pattern
Think of an agent as a goal-driven system that loops between observing, reasoning, and acting
That loop is what lets them go beyond simple Q&A into real automation, tool use, and on-the-fly adaptation
The "thinking" happens in the prompt, the "branching" is the agent choosing among available tools, and the "doing" happens when your code calls external functions. Everything else is plumbing
Once you internalize this, even the most complex agent frameworks become readable
Resources:
1. Anthropic: Building Effective Agents (official, free)
Link: https://www.anthropic.com/research/building-effective-agents
The best article on agents in production. Read it before you write a single line of agent code.
2. OpenAI: A Practical Guide to Building Agents (official PDF, free)
Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
OpenAI's companion guide, covering production agent patterns, guardrails, and safety patterns
3. freeCodeCamp: The Open-Source LLM Agent Handbook (free)
Link: https://www.freecodecamp.org/news/the-open-source-llm-agent-handbook/
A comprehensive practical guide covering agent loops, LangGraph, CrewAI, planning, memory, and tool use. Great for getting hands-on fast
4. LangChain Academy: Intro to LangGraph (free course)
Link: https://academy.langchain.com/courses/intro-to-langgraph
The official free course for LangGraph, currently the most widely used agent orchestration framework. Covers state, memory, human-in-the-loop, and more
Focus on: the perceive → plan → act → observe loop, how agent loops terminate, what happens when a tool call fails mid-loop, and why an agent is essentially a while loop with an LLM making the branching decisions
Exercise: build an agent from scratch with no framework, straight against the OpenAI or Anthropic API. Give it 3 tools, a goal, and a loop. It's the single best exercise for understanding what the frameworks abstract away
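Here is one shape that from-scratch exercise can take. The fake_llm below stands in for a real API call and its decision rule is hard-coded, but the loop structure (call model, execute tool, feed the result back, cap the iterations) is the real pattern:

```python
def fake_llm(messages, tools):
    """Stand-in for a real model call. It inspects the last message and
    either requests a tool or produces a final answer, mimicking the
    structured responses real providers return."""
    last = messages[-1]["content"]
    if last.startswith("TOOL RESULT:"):
        return {"type": "final", "content": f"The answer is {last.split(':', 1)[1].strip()}"}
    return {"type": "tool_call", "name": "calculate", "arguments": {"expression": "6 * 7"}}

def run_agent(goal, tools, max_iterations=5):
    """The core agent loop: call model -> execute tool -> feed result back."""
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iterations):  # hard cap prevents infinite loops
        decision = fake_llm(messages, tools)
        if decision["type"] == "final":
            return decision["content"]
        result = tools[decision["name"]](**decision["arguments"])
        messages.append({"role": "tool", "content": f"TOOL RESULT: {result}"})
    return "gave up: hit max iterations"

# One tool; eval() is only acceptable here because the input is hard-coded.
tools = {"calculate": lambda expression: eval(expression, {"__builtins__": {}}, {})}
```

Replace fake_llm with a real chat-completions call that has tool definitions attached, and run_agent barely changes; that is the "while loop with an LLM making the branching decisions" the guide describes.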
2. Tool design
Writing good tools is half the battle. The descriptions of your tools and their parameters are the LLM's user manual. If the manual is vague, the LLM will misuse the tools. Be painfully, uncompromisingly explicit
Poorly described tools get called wrongly, at the wrong time, or not at all. Well-described tools behave predictably and get selected correctly across a wide range of inputs
Resources:
1. OpenAI: function calling best practices (official docs, free)
Link: https://platform.openai.com/docs/guides/function-calling/best-practices
The definitive guide to writing tool descriptions that work reliably, including naming conventions and parameter documentation patterns
2. Anthropic: tool use best practices (official docs, free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/implement-tool-use#best-practices-for-tool-definitions
Anthropic's equivalent. Pay special attention to the guidance on letting the model choose a tool vs forcing a specific one
Focus on: self-explanatory verbs as tool names, descriptions that explain when to call the tool (not just what it does), keeping parameters minimal and explicitly typed, and designing for the LLM as the tool's caller
Beginner tip: test every tool description by asking yourself: "With no other documentation, just this JSON schema, would I fully understand when and how to call this tool?" If not, it needs more work
3. State management
In LangGraph, state is a shared memory object that flows through the graph. It stores everything relevant (messages, variables, intermediate results, decision history) and is managed automatically across the execution
Understanding state is the key to building agents that handle multi-turn tasks, recover from failures, and hand off cleanly between components
Resources:
1. LangGraph official docs: state management (free)
Link: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
The definitive reference. Covers state schemas, reducers, and how state flows between nodes and edges
2. DataCamp: LangGraph agents tutorial (free)
Link: https://www.datacamp.com/tutorial/langgraph-agents
Walks through state, nodes, and edges with hands-on code, building up to a stateful agent with persistent memory across sessions
3. Real Python: LangGraph in Python (free)
Link: https://realpython.com/langgraph-python/
A thorough tutorial building a complete stateful LangGraph agent, with detailed explanations of state graphs and conditional edges
Focus on: defining state schemas with TypedDict, how reducers merge parallel updates, the difference between in-memory state and persistent checkpoints, and enabling human-in-the-loop pauses by inspecting and modifying state mid-execution
4. Retries and failure handling in agents
Agents fail differently from regular LLM calls. One bad tool call inside a loop can corrupt state, trigger infinite loops, or silently produce wrong answers. You need explicit strategies for these cases
Resources:
1. LangGraph: error handling and retries (official docs, free)
Link: https://langchain-ai.github.io/langgraph/how-tos/autofill-tool-errors/
Shows how to add automatic error handling and retry logic at the tool-node level in LangGraph
2. OpenAI's practical agents guide, the guardrails section (free)
Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
Presents guardrails as layered defenses, combining LLM-based checks, rule-based filters (like regex), and moderation APIs to screen inputs and outputs at every stage of the agent loop.
Focus on: max-iteration caps to prevent infinite loops, per-tool retries with exponential backoff, catching and logging exceptions at the tool-execution layer without crashing the agent, and when to surface a failure to the user vs retry silently.
5. When not to use agents
This is one of the most important and most overlooked skills in AI engineering. Agents are exciting, but they're also slow, expensive, unpredictable, and hard to debug. Knowing when a simpler approach will do is a mark of good judgment.
Anthropic's advice is to find the simplest solution possible and only add complexity when needed, which may mean not building an agentic system at all.
Agentic systems trade latency and cost for better task performance, and you should think carefully about when that trade-off makes sense.
The decision framework:
If the task can be done in one prompt given the right context, use a single LLM call
If the steps are fixed and predictable, use a workflow
Use an agent only when the number of steps is genuinely unpredictable and requires dynamic decisions
Resources:
1. Anthropic: Building Effective Agents, "When to use agents" (official, free)
Link: https://www.anthropic.com/research/building-effective-agents
The most authoritative answer to this question, straight from the team building the models
2. Simon Willison: Designing agentic loops (free)
Link: https://simonwillison.net/2025/Sep/30/designing-agentic-loops/
A veteran engineer's practical take on when agentic complexity is justified and how to think about agent loop design
The takeaway to remember: a chain of 3 fixed LLM calls will always be faster, cheaper, and easier to debug than an agent that might make 3 calls. Save agents for genuinely open-ended tasks
6. Multi-step workflows
Between "one prompt" and "full agent" lies a broad, productive middle ground: workflows. Workflows are ideal when a task splits cleanly into fixed subtasks, trading latency for accuracy by making each individual LLM call simpler and more focused.
Common patterns include prompt chaining (one call's output feeds the next), routing (classify the input and send it to a specialized handler), parallelization (run several calls at once and aggregate), and orchestrator-workers (one LLM plans, others execute).
Resources:
1. Anthropic: workflow patterns (official, free)
Link: https://www.anthropic.com/research/building-effective-agents#workflow-patterns
Covers all the major patterns with diagrams and code examples. The parallelization and orchestration sections are especially useful
2. LangGraph: multi-agent networks (official docs, free)
Link: https://langchain-ai.github.io/langgraph/concepts/multi_agent/
Shows how to connect multiple agents into a network, including supervisor and handoff patterns
Practice project: build a three-step content pipeline:
Step 1 – one LLM call extracts the key facts from an article
Step 2 – further LLM calls use those facts to generate a tweet, a LinkedIn post, and a summary in parallel
Step 3 – a final LLM call scores all three for quality and picks the best
No agent needed; pure workflow
7. Evals
Evals are how you find out whether your AI system actually works, not just on the examples you tested by hand, but systematically across hundreds of inputs.
AI agents are powerful but tricky to deploy because their probabilistic, multi-step behavior creates many potential points of failure.
Each component of an agent (the LLM, the tools, the retriever, the workflow) needs its own evaluation approach.
Resources:
1. DeepEval (open source, free)
Link: https://deepeval.com/docs/getting-started
A pytest-inspired open-source LLM eval framework. Write test cases with inputs and expected outputs, run them against 50+ built-in metrics (including hallucination, answer relevancy, and factual consistency), and catch regressions between versions
2. Promptfoo (open source, free)
Link: https://github.com/promptfoo/promptfoo
A CLI tool and library for testing and evaluating LLM apps with automated test suites. Supports side-by-side comparison of multiple prompts across multiple models, CI/CD integration, and red-teaming for security vulnerabilities
3. LangSmith (free tier)
Link: https://smith.langchain.com/
Tracing, debugging, and evaluation for LangChain and LangGraph apps. The free tier is generous, and the trace UI makes debugging agent loops dramatically easier
4. Ragas (open source, free)
Link: https://docs.ragas.io/
An eval framework built specifically for RAG pipelines. Measures faithfulness, answer relevancy, context precision, and context recall. Essential if you're evaluating the RAG systems from Month 3
Focus on: building a golden test set of 20-50 representative inputs with expected outputs or grading criteria, writing eval functions that score outputs deterministically (string match, JSON schema validation) or with an LLM as judge, and running evals automatically whenever you change a prompt or swap a model
The key mindset: evals aren't optional polish. Every time you change a prompt, swap a model, or tweak retrieval without running evals, you're gambling. The engineers who ship reliable AI products run evals constantly
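A golden test set plus a deterministic grader is, at its core, just a loop. This sketch shows the shape, with a toy keyword classifier standing in for a real prompt + model call (the dict layout and names are my own, not DeepEval's API):

```python
def exact_match(predicted, expected):
    """Deterministic grader: normalized string equality."""
    return predicted.strip().lower() == expected.strip().lower()

def run_evals(system, golden_set, grader=exact_match):
    """Run every golden example through `system` and report the pass rate.

    `system` is any callable mapping an input string to an output string,
    e.g. a wrapper around your prompt + model call.
    """
    results = []
    for example in golden_set:
        output = system(example["input"])
        results.append({"input": example["input"],
                        "passed": grader(output, example["expected"])})
    passed = sum(r["passed"] for r in results)
    return {"pass_rate": passed / len(results), "results": results}

# A stand-in "system" so the harness runs without an API key.
def toy_classifier(text):
    return "positive" if "love" in text else "negative"

golden_set = [
    {"input": "I love this product", "expected": "positive"},
    {"input": "Terrible experience", "expected": "negative"},
    {"input": "I love the speed but hate the price", "expected": "negative"},
]
```

The third example fails on purpose: a harness that never fails tells you nothing. Frameworks like DeepEval add richer graders and reporting, but this loop is the skeleton underneath.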
8. Task success metrics
Beyond automated evals, you need metrics that tell you whether the agent achieved its actual goal
Resources:
1. Hamel Husain: Your AI product needs evals (free)
Link: https://hamel.dev/blog/posts/evals/
One of the most practical articles on building eval pipelines for real production AI systems, written by someone who has done it at scale
2. OpenAI Evals framework (open source, free)
Link: https://github.com/openai/evals
OpenAI's own eval framework, with a large library of community-contributed eval patterns you can adapt
Focus on: the difference between process metrics (did the agent call the right tools?) and outcome metrics (did the task succeed?), defining clear success criteria before you build anything, and using LLM-as-judge for outputs that are hard to match exactly (long answers, multi-step reasoning)
Practice project: take your Month 3 RAG pipeline and build a full eval harness around it. Create 30 question-answer pairs from your documents, run them through the pipeline, and score each answer with DeepEval on relevancy, faithfulness, and completeness. Then change one variable (chunk size, model, top-k) and re-run to see whether it improved
Month 4 milestone
By the end of this month you should be able to:
Explain what an agent loop is and implement one from scratch without a framework
Write tool descriptions that get selected correctly and consistently
Manage agent state properly with LangGraph or an equivalent
Handle failures inside an agent loop without crashing
Judge confidently whether a task needs an agent, a workflow, or a single prompt
Build multi-step workflows that chain, route, and parallelize LLM calls
Write automated evals that catch regressions when you change prompts or models
Define and measure task success metrics for any AI system you build
⏩------------------------------------------------------------------------⏪
Month 5: Deployment, product thinking, and reliability
Your goal this month: make everything you've built production-ready
By the end of the month you should be able to deploy an AI app that handles real users, real traffic, and real failures without falling over at 2 a.m.
This is where most AI engineers stall. They can build a great demo but can't ship a product that survives contact with the real world
These are the skills companies actually pay for: reliability, security, cost control, and keeping the system running when things inevitably break
1. FastAPI in production
You've known how to build a FastAPI app since Month 1. Now you need to make it survive real production traffic
The gap between development and production is brutal. A single uvicorn process with --reload is fine in development. The moment real traffic arrives, it becomes the bottleneck
What you actually need: a multi-worker ASGI setup, proper error-handling middleware, health-check endpoints, and a CORS policy
Resources:
1. FastAPI deployment docs (official, free)
Link: https://fastapi.tiangolo.com/deployment/
The official guide covering Uvicorn workers, Gunicorn, and Docker deployment. Start here before anything else
2. FastAPI production deployment guide (CYS docs, free)
Link: https://craftyourstartup.com/cys-docs/fastapi-production-deployment/
Comprehensive production patterns: Gunicorn config, Nginx reverse proxy, health checks, rate limiting. Includes real config files you can copy and adapt
3. FastAPI production best practices (FastLaunchAPI, free)
Link: https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026
Covers async database connection pooling, Redis caching, JWT auth, and background tasks. Production-tested patterns from a real template used by 100+ developers
Focus on: running Gunicorn with Uvicorn workers (not bare Uvicorn), health-check endpoints, CORS middleware, proper async database sessions, and pushing anything that doesn't need to block the response into background tasks
2. Docker
Docker is how you stop saying "it works on my machine" and start shipping consistent deployments
If you're building AI apps, Docker solves dependency conflicts, guarantees consistent environments, and makes scaling straightforward
You don't need to be a Docker expert. You need to be able to containerize your FastAPI + LLM app and deploy it anywhere
Resources:
1. Docker official getting started guide (free)
Link: https://docs.docker.com/get-started/
The definitive starting point, covering images, containers, Dockerfiles, and Docker Compose
2. freeCodeCamp: How to build and deploy a multi-agent AI system with Python and Docker (free)
Link: https://www.freecodecamp.org/news/build-and-deploy-multi-agent-ai-with-python-and-docker/
A practical end-to-end tutorial building a real multi-agent pipeline with Docker Compose. Covers separation of concerns, scheduled jobs, and security considerations.
3. DataCamp: Deploying LLM applications with Docker (free)
Link: https://www.datacamp.com/tutorial/deploy-llm-applications-using-docker
A step-by-step guide specifically for LLM apps with RAG pipelines. Covers Dockerfile creation, environment management, and deployment.
4. Docker containerization for LLM apps (ApXML, free)
Link: https://apxml.com/courses/python-llm-workflows/chapter-10-deployment-operational-practices/containerization-docker-llm-apps
Covers base image selection, dependency management, multi-stage builds, and Docker Compose for multi-service LLM deployments
Focus on: writing a Dockerfile for a Python/FastAPI app, multi-stage builds to keep images small, Docker Compose for multi-service setups (app + database + Redis), environment variables for secrets, and .dockerignore to keep sensitive files out of images
Practice project: containerize your Month 3 RAG app. Create a docker-compose.yml that runs your FastAPI app, a vector database (Chroma or Qdrant), and Redis for caching. Deploy it so that docker compose up brings everything up
3. Background tasks and queues
LLM calls are slow. If a user asks your app to process a document and you make them wait 30 seconds for a response, they'll leave.
Background tasks let you accept the request immediately, process it asynchronously, and notify the user when it's done.
Resources:
1. Celery official getting started guide (free)
Link: https://docs.celeryq.dev/en/stable/getting-started/introduction.html
The standard Python task queue. Covers basic setup, task definitions, and worker management
2. FastAPI background tasks docs (official, free)
Link: https://fastapi.tiangolo.com/tutorial/background-tasks/
The built-in lightweight option for simple cases. Use it for quick fire-and-forget jobs; use Celery for anything heavier
Focus on: when FastAPI's built-in BackgroundTasks is enough vs when you need a task queue like Celery, configuring Redis as the message broker, handling task failures and retries, and reporting task status back to the user
4. Authentication and API key security
If your AI app exposes an API, it needs authentication. Otherwise anyone can hit your endpoints, burn through your LLM credits, and leave you waking up to a $5,000 bill
Resources:
1. FastAPI security docs (official, free)
Link: https://fastapi.tiangolo.com/tutorial/security/
Covers OAuth2, JWT tokens, API keys, and dependency-based auth patterns. The official reference; work through the whole tutorial
2. OWASP API Security Top 10 (free)
Link: https://owasp.org/API-Security/
The definitive list of API security risks. Understand broken authentication, injection, and mass assignment before you ship anything
3. Auth0: API authentication best practices (free)
Link: https://auth0.com/docs/get-started/authentication-and-authorization
A practical guide to implementing authentication and authorization in APIs
Focus on: JWT tokens for user auth, API key management for service-to-service calls, per-user/per-key rate limiting, never storing secrets in code (use environment variables), and the difference between authentication (who you are) and authorization (what you can do)
5. Logging and observability
In production, you can't fix what you can't see
LLM apps have a unique challenge: the model can return a 200 status code and still produce a useless or hallucinated answer. Traditional monitoring won't catch that. You need LLM-specific observability
Resources:
1. Langfuse (open source, free tier)
Link: https://langfuse.com/docs/observability/overview
An open-source LLM observability platform. Traces every request: the prompt sent, the response received, token usage, latency, tool calls. Supports prompt versioning, evals, and LLM-as-judge scoring. Integrates with OpenAI, Anthropic, LangChain, and LlamaIndex
2. LangSmith (free tier)
Link: https://smith.langchain.com/
From the LangChain team. If you're using LangChain/LangGraph, it's one environment variable to set up. Tracing, debugging, monitoring dashboards, and online evals. The free tier is plenty for development and small production workloads
3. Python structlog (free)
Link: https://www.structlog.org/
Structured logging for Python. Produces JSON logs that are actually searchable and parseable. A big step up from print() or basic logging for production apps
Focus on: tracing every LLM call (input prompt, output, tokens, latency, cost), structured logging with JSON output, dashboards showing request volume, error rates, and daily cost, and alerts for failures or cost spikes
6. Prompt versioning and management
In production, your prompts are code. They need version control, testing, and the ability to roll back
Changing a prompt in production without recording what changed is the classic way to break a system and have no idea why
Resources:
1. Langfuse prompt management (free)
Link: https://langfuse.com/docs/prompts
Centralized prompt versioning with a built-in playground for testing. Version prompts separately from application code and ship prompt changes without redeploying the app
2. Anthropic prompt management best practices (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
Best practices for organizing, iterating on, and managing prompts at scale
Focus on: storing prompts outside application code, versioning every prompt change, A/B testing prompt variants in production, and having a rollback strategy for when a new prompt performs worse
7. Cost monitoring and rate limiting
LLM APIs charge per token. Without cost controls, a traffic spike or a bug in a prompt can burn hundreds of dollars in minutes
Resources:
1. OpenAI usage dashboard (official)
Link: https://platform.openai.com/usage
Track spend by model and by day, and set usage limits
2. Anthropic usage dashboard (official)
Link: https://console.anthropic.com/
Same for Claude API usage
3. Helicone (free tier)
Link: https://www.helicone.ai/
Proxy-based observability that captures every LLM call and tracks cost automatically. One-line setup: just change your base URL
4. LiteLLM (open source, free)
Link: https://github.com/BerriAI/litellm
A unified interface to 100+ LLMs. Includes budgets, rate limiting, and spend tracking across providers
Focus on: setting hard daily/monthly spend caps, implementing per-user rate limits in your API, using cheaper models for simple tasks (don't use GPT-4/Opus for everything), caching repeated identical requests with Redis, and monitoring cost per request to catch expensive prompts early
8. Caching
If 20% of your users ask similar questions, you're paying for the same LLM call 20 times over
Caching is the easiest way to cut both cost and latency at once
Resources:
1. Redis official docs (free)
Link: https://redis.io/docs/
The standard in-memory data store. Fast, simple, and perfect for LLM response caching
2. GPTCache (open source, free)
Link: https://github.com/zilliztech/GPTCache
Semantic caching built for LLM apps. Uses embedding similarity to find cached responses for semantically similar (not just identical) queries
Focus on: exact-match caching for identical prompts, semantic caching for similar queries, cache invalidation strategies (TTL-based is simplest), and measuring your cache hit rate to know what you're actually saving
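Exact-match caching is simple enough to sketch by hand before reaching for Redis or GPTCache. This in-process version (class and function names are my own) keys on a hash of (model, prompt) and applies a TTL; a production version would put the same logic in front of Redis:

```python
import hashlib
import time

class LLMCache:
    """Exact-match cache keyed on a hash of (model, prompt), with a TTL."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, model, prompt):
        entry = self._store.get(self._key(model, prompt))
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > self.ttl:  # expired: treat as a miss
            return None
        return value

    def set(self, model, prompt, response):
        self._store[self._key(model, prompt)] = (response, time.time())

def cached_llm_call(cache, model, prompt, call_fn):
    """Check the cache before paying for a real model call."""
    hit = cache.get(model, prompt)
    if hit is not None:
        return hit
    response = call_fn(prompt)
    cache.set(model, prompt, response)
    return response
```

Semantic caching (what GPTCache does) replaces the hash lookup with an embedding similarity search, so "what's the weather in Paris?" can hit a cached answer for "Paris weather today".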
Month 5 milestone
By the end of this month you should be able to:
Deploy a FastAPI + LLM app in Docker with a proper production configuration
Handle long-running jobs with background tasks and queues
Secure your API with authentication, rate limiting, and API key management
Trace and debug LLM calls with Langfuse or LangSmith
Manage prompts with versioning and rollback
Monitor spend in real time and set spending limits
Cache LLM responses to cut latency and cost
⏩------------------------------------------------------------------------⏪
Month 6: Specialize and become employable
The knowledge and skills you've built can be applied in three directions (at least, the three I see)
Pick one and practice it deliberately
Though, again, everything above is best learned through pure practice
Direction 1: AI product engineer
The best choice if you want a startup job fast
This is the most common path. You'll build AI-powered products that real users interact with
You already have most of the skills from Months 1-5. Now go deeper on the product side
Focus on:
LLM apps
RAG
Agents
Deployment
Product UX
What to learn this month:
1. End-to-end product building
Stop doing tutorials. Build things people can actually use
Resources:
1. Vercel AI SDK (free)
Link: https://sdk.vercel.ai/docs
The fastest way to build AI-powered UIs with streaming. React, Next.js, and Vue integrations with built-in streaming UI components
2. Streamlit (free)
Link: https://docs.streamlit.io/
Build data apps and AI demos in pure Python. Great for internal tools and MVPs; not for production-scale UIs
3. Gradio (free)
Link: https://www.gradio.app/docs
Quick ML/AI interfaces with minimal code. Especially good for demoing models and building prototypes
Focus on: finishing and shipping 2-3 complete, demoable projects this month. A "chat with your documents" app, an AI-powered internal tool, or an agent that automates a real workflow. Build them. Put them on GitHub. Deploy them somewhere people can try them
2. Product UX for AI
AI products fail when the UX doesn't account for the model's limitations
Resources:
1. Google: People + AI Guidebook (free)
Link: https://pair.withgoogle.com/guidebook/
The best resource on designing human-AI interactions. Covers setting expectations, handling errors, and building trust.
2. Nielsen Norman Group: AI UX guidelines (free)
Link: https://www.nngroup.com/topic/artificial-intelligence/
Research-based guidelines for AI interfaces
Focus on: handling loading states with streaming, what to show when the model gets it wrong, how to let users give feedback, and designing for the fact that AI output is probabilistic; it will sometimes be wrong
Direction 2: applied ML / LLM engineer
The best choice if you want a deeper technical role
This direction is for engineers who want to go beyond calling APIs and understand what's happening underneath
Focus on:
Fine-tuning
When to fine-tune vs prompt-engineer
Evals
Inference optimization
Open-source models
Training pipelines
What to learn this month:
1. When to fine-tune vs prompt-engineer
The most important decision in applied ML: do you need to change the model itself, or just how you talk to it?
Resources:
1. Google ML Crash Course: fine-tuning, distillation, and prompt engineering (free)
Link: https://developers.google.com/machine-learning/crash-course/llm/tuning
The clearest explanation of the three approaches and where each fits
2. Codecademy: prompt engineering vs fine-tuning (free)
Link: https://www.codecademy.com/article/prompt-engineering-vs-fine-tuning
A practical decision framework with clear use cases for each approach
3. IBM: RAG vs fine-tuning vs prompt engineering (free)
Link: https://www.ibm.com/think/topics/rag-vs-fine-tuning-vs-prompt-engineering
Covers the full decision space, including when to combine approaches
The decision framework to memorize: start with prompt engineering (cheapest, fastest); add RAG if the model needs access to specific data; fine-tune only when prompting + RAG can't reach the quality, consistency, or latency you need
2. Fine-tuning in practice
When you do need to fine-tune, here's how
Resources:
1. OpenAI fine-tuning guide (official, free)
Link: https://platform.openai.com/docs/guides/fine-tuning
The easiest way to start fine-tuning. Upload a JSONL dataset, run the job, get a custom model. Great for learning the workflow even if you move to open-source models later.
2. HuggingFace Transformers fine-tuning tutorial (free)
Link: https://huggingface.co/docs/transformers/training
The standard library for working with open-source models. Covers training, evaluation, and saving models
3. Unsloth (open source, free)
Link: https://github.com/unslothai/unsloth
Fine-tune 2x faster with 80% less memory. LoRA and QLoRA support out of the box. The fastest route to fine-tuning open-source models on consumer hardware
4. LLaMA-Factory (open source, free)
Link: https://github.com/hiyouga/LLaMA-Factory
A unified framework for fine-tuning 100+ LLMs. Includes a no-code web UI. Supports LoRA, QLoRA, full fine-tuning, RLHF, and DPO
Focus on: preparing training datasets (JSONL format), understanding LoRA and QLoRA (parameter-efficient fine-tuning), running fine-tuning jobs on OpenAI or with HuggingFace, evaluating the fine-tuned model against the base model, and recognizing when fine-tuning isn't worth the cost
3. Open-source models
Not everything has to go through OpenAI or Anthropic. Open-source models give you full control, zero API cost, and the option to run locally
Resources:
1. Ollama (free)
Link: https://ollama.ai/
Run open-source LLMs locally with a single command. Supports Llama, Mistral, Gemma, and dozens of others. The fastest way to experience open-source models
2. HuggingFace model hub (free)
Link: https://huggingface.co/models
The largest repository of open-source models. Browse, download, and deploy models for any task
3. vLLM (open source, free)
Link: https://github.com/vllm-project/vllm
A high-throughput LLM inference engine, 2-4x faster than naive HuggingFace serving. The standard for serving open-source models in production.
Focus on: running models locally with Ollama for testing, understanding quantization (GGUF, GPTQ, AWQ) and why it matters for deployment, benchmarking open-source vs API models on your use case, and serving models in production with vLLM.
4. Inference optimization
Making models run faster and cheaper in production
Resources:
1. HuggingFace: optimizing LLM inference (free)
Link: https://huggingface.co/docs/transformers/llm_optims
Covers KV-cache optimization, quantization, and batching strategies
2. NVIDIA TensorRT-LLM (free)
Link: https://github.com/NVIDIA/TensorRT-LLM
Maximum inference performance on NVIDIA GPUs. Used by most large-scale production LLM services
Focus on: batching strategies for throughput, quantization to cut memory and cost, KV-cache optimization to speed up generation, and choosing the right hardware for inference workloads
Direction 3: AI automation engineer
The best choice if you want to build solutions for businesses right away
This direction is about automating real business processes with AI. Less about building products, more about solving operational problems
Focus on:
Workflow orchestration
Business process automation
Multi-tool systems
CRM, document, email, support, and operations use cases
What to learn this month:
1. Workflow orchestration
Real business automation is almost never one LLM call; it's a chain of actions across multiple systems
Resources:
1. n8n (open source, self-hostable, free)
Link: https://docs.n8n.io/
Visual workflow automation with AI nodes. Connects LLMs to 400+ integrations (Slack, Gmail, Notion, CRMs, and more). The best no-code/low-code option for AI automation
2. LangGraph: multi-agent workflows (free)
Link: https://langchain-ai.github.io/langgraph/concepts/multi_agent/
Code-first orchestration for complex multi-agent systems. For when n8n isn't enough and you need full programmatic control
3. Temporal (open source, free)
Link: https://docs.temporal.io/
A durable workflow engine for long-running, fault-tolerant processes. For when your automations need to survive crashes, retries, and timeouts
Focus on: designing workflows that handle failure gracefully, connecting AI to real business tools (email, CRMs, databases, spreadsheets), building in human approval steps, and logging every automated action for auditing
2. Business Process Automation
The value of AI automation comes from solving specific, expensive business problems
Resources:
1. Zapier AI Actions (free tier)
Link: https://zapier.com/ai
Connect AI to 6,000+ apps without writing code. Great for prototyping automations before building custom solutions
2. Make (Integromat) (free tier)
Link: https://www.make.com/
Visual automation platform with advanced logic and AI integrations. More powerful than Zapier for complex workflows
What to focus on: Identifying the highest-ROI automation targets (usually repetitive, time-consuming, rule-based tasks), building automations that augment humans rather than replace them, and measuring the actual time and cost saved
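A quick way to sanity-check "highest ROI" before building anything is simple arithmetic. A sketch (all numbers below are invented for illustration):

```python
def monthly_roi(tasks_per_month: int, minutes_per_task: float,
                hourly_rate: float, automation_cost_per_month: float) -> float:
    """Toy net-value estimate for automating a repetitive task."""
    hours_saved = tasks_per_month * minutes_per_task / 60
    value_saved = hours_saved * hourly_rate
    return value_saved - automation_cost_per_month

# E.g. 600 support emails/month, 4 min each, at a $40/h loaded labor
# rate, against $300/month of API and hosting costs:
net = monthly_roi(600, 4, 40, 300)  # 40h saved -> $1,600 - $300 = $1,300
```

If this number is small or negative, pick a different process to automate; if it's large, you also have the pitch you'll show the client.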
3. CRM, Document, Email, and Support Automation
The most common and most valuable AI automation use cases
Resources:
1. OpenAI Cookbook: AI-Powered Email Processing (free)
Link: https://github.com/openai/openai-cookbook
Patterns for classifying, routing, and replying to email with AI
2. LangChain: Document Processing Pipelines (free)
Link: https://python.langchain.com/docs/how_to/#document-loaders
Ingest and process documents from 80+ sources
What to focus on: Building an AI-powered email classifier and auto-responder, creating a document processing pipeline that extracts structured data, building a support chatbot with RAG, and integrating AI into existing CRM workflows (HubSpot, Salesforce, etc.)
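The email-classifier pattern is mostly prompt + schema + validation + routing. A minimal sketch of that shape, with the model call stubbed out so it runs offline (`call_llm` is a placeholder you would replace with a real OpenAI/Anthropic request, ideally using structured outputs rather than trusting raw JSON):

```python
import json

CATEGORIES = ["billing", "bug_report", "sales", "spam"]

def build_prompt(email_text: str) -> str:
    return (
        f"Classify the email into exactly one of {CATEGORIES}. "
        'Reply with JSON: {"category": "...", "needs_human": true/false}\n\n'
        f"Email:\n{email_text}"
    )

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call. Stubbed with a
    # canned response so this sketch runs without an API key.
    return '{"category": "billing", "needs_human": true}'

def classify_email(email_text: str) -> dict:
    raw = call_llm(build_prompt(email_text))
    result = json.loads(raw)  # validate before trusting model output
    if result.get("category") not in CATEGORIES:
        result = {"category": "unknown", "needs_human": True}
    return result

routed = classify_email("Hi, I was charged twice this month...")
```

The validation step matters: route anything malformed or out-of-schema to a human instead of acting on it, exactly the failure-handling habit from Month 2.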
Practice project for Track 3: Build an end-to-end lead qualification system. It should:
Scrape or import leads from a source (CSV, API, or a form)
Research each lead with an LLM (company info, fit assessment)
Score and rank leads against your ideal customer profile (ICP)
Write personalized outreach messages
Log everything to a spreadsheet or CRM. This is a real, sellable automation that businesses actually pay for
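The scoring step in the project above can start as a plain weights-over-attributes function before any LLM is involved; the ICP fields and weights here are invented for illustration:

```python
# Hypothetical ideal customer profile: weight per boolean attribute
ICP_WEIGHTS = {
    "industry_match": 0.4,  # lead is in a target industry
    "size_match": 0.3,      # e.g. 10-200 employees
    "has_budget": 0.3,      # research notes mention budget / paid tools
}

def score_lead(lead: dict) -> float:
    """Weighted 0-1 score of a lead against the ICP."""
    return sum(w for key, w in ICP_WEIGHTS.items() if lead.get(key))

leads = [
    {"name": "Acme Corp", "industry_match": True, "size_match": True, "has_budget": False},
    {"name": "Globex", "industry_match": True, "size_match": True, "has_budget": True},
]
ranked = sorted(leads, key=score_lead, reverse=True)
```

In the full project, the LLM research step fills in those boolean fields from raw company data, and the ranked list drives which leads get personalized outreach first.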
⏩------------------------------------------------------------------------⏪
Summary
What can you expect after these 6 months???
I'll be honest with you: without a huge pile of money,
this roadmap won't make you a senior AI engineer in 6 months
But it will make you someone who can build, ship, and deploy AI systems that solve real problems
And right now, that is exactly what the market pays for
Demand for AI engineers isn't slowing down. Job postings are up 25% year over year
PwC found that jobs requiring AI skills pay 56% more than comparable roles that don't
Only 1% of companies are considered "AI-mature", which means the other 99% still need help. The US Bureau of Labor Statistics projects 26% growth in related roles by 2034
These aren't inflated numbers. This is real data from analysis (taken from Claude kek)
If you work full-time in the US:
Junior AI engineers start at $90K-$130K
Mid-level (3-5 years of experience) sits around $155K-$200K
Senior roles pay $195K-$350K+
According to Glassdoor (March 2026), the average salary is $184,757
The mid-level band is growing fastest, up 9.2% year over year, because companies desperately need people who can ship production-grade AI without constant supervision
If you'd rather freelance:
AI agent development bills at $175-$300/hour
RAG implementations at $150-$250/hour
LLM integrations at $125-$200/hour
One developer on Reddit made $8,000 in two weeks building a document summarization tool for a law firm. A freelancer billing $150/hour for 25 hours a week earns $195,000 a year
If you go the consulting route (which I covered in a previous post), you can charge:
$300-$5,000 to build an AI agent for a business
$500-$2,000/month for AI content management
$1,000-$4,000 for automated customer support
$500-$2,000 for cold outreach setup
The range of services is actually much wider, but once you have the skills in this roadmap, you're already an in-demand professional in 2026
These are real numbers from real people doing real work
Now, here's what I actually want you to take away from all of this:
Pick one project from each month and build it. Don't read about it. Don't watch tutorials about it. Build it, break it, fix it, deploy it, put it on GitHub. The engineers who get hired are the ones who show what they've built, not what they've studied
Start sharing what you learn. Write about it on X, LinkedIn, anywhere. Teaching is the fastest way to learn, and it builds your reputation at the same time. The best opportunities I've seen came to people who were visible, not to people who sent 500 job applications
And please don't wait until you feel ready. You never will. The gap between "I'm learning" and "I'm building" is where most people stay stuck forever
Once you have working projects, start applying, start freelancing, start offering services. Even if they're not perfect. The market doesn't reward perfection, it rewards people who ship
6 months is enough to change everything if you actually put in the work
And I genuinely believe every single one of you reading this can do it
Never stop building, and never stop learning
Hope this was useful, fam ❤️
AI engineering has quickly become one of the most valuable skill sets in tech The problem is that most beginners have no clear idea what they should actually study Some start with machine learning theory Some get stuck endlessly watching tutorials Others jump straight into prompts and agents without understanding APIs, backend basics, or how real products are actually built The result is usually the same: a lot of confusion and very little practical skill If your goal is to become an AI engineer, you don’t need to master every field of artificial intelligence You need to learn how to build useful AI systems in the real world
That means learning how to: build end-to-end applications with LLMs work with model APIs such as OpenAI and Anthropic properly design prompts and context use structured outputs and tool calling add retrieval when needed deploy projects so people can actually use them This guide was created to give you a practical 6-month roadmap
The article is 10,000+ WORDS, so reading it may take a few hours or even longer But its real value is that for every skill you need to learn, there are resources and clear explanations of what to do That way, within six months you can reach the level of AI engineering, and start using it for yourself already within the first 1-2 months Writing this article took more than 40 HOURS, and I worked on it together with my friend @andy_ai0 He just started building his personal brand on X, but he understands AI very well and helped a lot with this article I definitely think he deserves your follow and support as he grows Now let's start reading the article ⬇️ What an AI Engineer actually does
A lot of people hear the phrase "AI engineer" and imagine someone training giant models from scratch In reality, most modern AI engineers do something much more practical They build products and systems on top of existing models That usually includes: connecting to LLM APIs designing prompts and context flows building chat, search, or automation systems integrating tools, databases, and external APIs
handling structured outputs improving reliability, cost, and latency deploying AI features into real applications So in practice, an AI engineer often sits somewhere between: software engineering product engineering automation applied AI
This is why the role is growing so fast Companies do not only need researchers They need people who can take models and turn them into useful products That is also why this roadmap focuses less on heavy theory and more on practical execution If you can build real LLM apps, retrieval systems, automations, and production-ready workflows, you are already much closer to being employable than most beginners ⏩------------------------------------------------------------------------⏪ Month 1: Get solid enough in coding and the fundamentals Your goal this month: Become a functional Python developer
You don't need to be an expert, you just need to stop Googling basic syntax and be able to build simple programs confidently AI engineering is first and foremost software engineering Everything in the later months assumes you can write clean Python, use the terminal, call APIs, and manage a codebase. This month is your foundation What to learn 1. Python Python is the language of AI engineering. Full stop. Almost every library, API, and tutorial you'll encounter over the next six months is in Python How to learn it: Start with a structured course that forces you to write code, not just watch videos
The most common mistake beginners make is consuming content passively, reading along, nodding, and never opening a code editor Fight this by coding every single example as you go Resources: 1. Python for Everybody (Coursera, free to audit) Link: https://www.coursera.org/specializations/python The best starting point for absolute beginners. Dr. Chuck is one of the most beginner-friendly Python teachers on the internet 2. freeCodeCamp Python Course (YouTube, free) Link: https://www.youtube.com/watch?v=rfscVS0vtbw
A comprehensive 4-hour video covering all the fundamentals 3. CS50P: Introduction to Programming with Python (Harvard, free) Link: https://cs50.harvard.edu/python/ More rigorous. Includes problem sets and a final project. Great if you want structure 4. Official Python docs (the tutorial) Link: https://docs.python.org/3/tutorial/ Dry but authoritative, use as a reference What to focus on:
Variables, data types, loops, conditionals, functions Lists, dictionaries, sets, tuples File I/O and working with JSON Classes and basic OOP (just enough to understand what you're reading) Error handling with try/except Virtual environments (venv) and pip Package management – understanding requirements.txt Practice project: Build a simple CLI tool in Python. Something like a personal expense tracker that reads/writes to a JSON file, or a script that calls a public API (like a weather API) and prints formatted results
2. Git and GitHub Git is how professional developers save and share code. You'll need it constantly, to version your projects, collaborate, and showcase your portfolio work on GitHub How to learn it: Git is confusing at first because the mental model is non-obvious Don't try to memorize commands instead, understand what problem Git is solving (tracking changes, enabling collaboration, letting you undo mistakes) and the commands will make sense Resources: 1. GitHub Skills (free, interactive)
Link: https://skills.github.com/ Official interactive courses built inside GitHub itself. Start here 2 . Learn Git Branching (free, interactive) Link: https://learngitbranching.js.org/ Hands-down the best visual tool for understanding branches and merges 3. Pro Git Book (free online book) Link: https://git-scm.com/book/en/v2 The comprehensive reference. Skip to chapters you need
What to focus on: git init, add, commit, push, pull Branching and merging Understanding .gitignore Creating repos on GitHub and pushing local projects Reading and writing basic README files Practice: From now on, every single project you build, even small scripts, should live in a GitHub repo. This builds the habit and gives you a portfolio 3. CLI / Terminal Basics
As an AI engineer you'll be running scripts, installing packages, managing servers, and navigating files entirely from the command line Being slow or scared in the terminal is a real bottleneck Resources: 1. The 50 most popular Linux & Terminal commands (full course for beginners) Link: https://www.youtube.com/watch?v=ZtqBQ68cfJc Good for absolute beginners on Linux/Mac 2. The Missing Semester of Your CS Education (MIT, free) Link: https://missing.csail.mit.edu/
Covers shell scripting, terminal tools, and the command line fluency that most CS courses skip What to focus on: Navigation: cd, ls, pwd, mkdir, rm Reading files: cat, less, grep Running Python scripts from the terminal Environment variables Basic understanding of PATH 4. JSON, APIs, HTTP, and Async Basics
You'll be calling LLM APIs from day one of Month 2 That means you need to understand how web APIs work before you ever touch OpenAI or Anthropic's SDKs Resources: 1. HTTP basics – MDN Web Docs (free) Link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview The clearest explanation of how HTTP requests and responses work 2. REST API Tutorial Link: https://restfulapi.net/
Short and practical 3. Python requests library docs Link: https://requests.readthedocs.io/en/latest/ Learn how to call any web API in Python 4. Python async/await (free) Link: https://realpython.com/async-io-python/ Understanding async is essential for working with streaming LLM responses later What to focus on:
GET, POST requests – what they are and how to make them in Python Reading and writing JSON HTTP status codes (200, 400, 401, 404, 500 – what each means) What an API key is and basic auth patterns What async def and await do and why they exist Practice project: Write a Python script that calls a free public API (try Open-Meteo for weather data – no API key needed) and formats the result as a clean JSON output 5. Basic SQL and Pandas You won't need to be a data scientist, but you will regularly need to inspect, query, and manipulate data
SQL basics and pandas fluency will save you constantly Resources: 1 . SQLBolt (free, interactive) Link: https://sqlbolt.com/ The fastest way to learn SQL from scratch. 20 short lessons with in-browser exercises 2. Pandas official getting started guide Link: https://pandas.pydata.org/docs/getting_started/index.html Work through the 10 Minutes to Pandas tutorial
3. Kaggle Pandas course (free) Link: https://www.kaggle.com/learn/pandas Hands-on, practical, short What to focus on: SQL: SELECT, WHERE, GROUP BY, JOIN, ORDER BY Pandas: loading CSVs, filtering rows, selecting columns, basic aggregations 6. FastAPI Resources:
1. FastAPI Official Tutorial (free) Link: https://fastapi.tiangolo.com/tutorial/ Genuinely one of the best framework docs ever written Work through it start to finish. Covers path parameters, request bodies, Pydantic validation, and running a dev server 2. Python API Development (19-Hour Course, freeCodeCamp, YouTube, free) Link: https://www.youtube.com/watch?v=ZtqBQ68cfJc Covers API design fundamentals including routes, serialization, schema validation, and SQL database integration. Builds a full social-media-style API from scratch What to focus on: Creating GET and POST endpoints, path and query parameters, request bodies with Pydantic, running uvicorn, and using FastAPI's built-in /docs interface to test your API without writing a client
Month 1 Milestone By the end of this month you should be able to: Write Python programs that read/write files, call APIs, and handle errors Version your code with Git and push projects to GitHub Navigate the terminal without hesitation Understand what an HTTP request is and make one in Python Query a SQLite database with basic SQL Build and run a simple FastAPI app locally
⏩------------------------------------------------------------------------⏪ Month 2: Master LLM App Development Your goal this month: Build real AI-powered applications using the OpenAI and Anthropic APIs By the end you should be comfortable writing prompts that work reliably, getting structured data out of models, making them call your functions, and handling everything that can go wrong This is the core of AI engineering. Everything else in the roadmap builds on what you learn here What to learn 1. Prompting Fundamentals Prompting isn't just asking questions nicely. It's the craft of writing instructions that produce consistent, reliable outputs from models that are fundamentally probabilistic
As an AI engineer you'll spend a surprising amount of time here How to learn it: Start with Anthropic's interactive tutorial because it's the most hands-on Then read OpenAI's official guide. After that, the Prompt Engineering Guide consolidates everything Work through all three in order – each one reinforces the others Resources: 1. Anthropic's Interactive Prompt Engineering Tutorial (free, GitHub) Link: https://github.com/anthropics/prompt-eng-interactive-tutorial
A step-by-step course broken into 9 chapters with exercises, designed to give you many chances to practice writing and troubleshooting prompts yourself Run it as Jupyter notebooks with the Claude API 2. Anthropic Prompt Engineering Docs (free) Link: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview The official reference. Covers everything from basic clarity to XML structuring and agentic systems 3. OpenAI Prompt Engineering Guide (free) Link: https://platform.openai.com/docs/guides/prompt-engineering The official guide from OpenAI, covering prompt formats that work well with their models and lead to more useful outputs
4. PromptingGuide.ai (free) Link: https://www.promptingguide.ai/ Covers essential techniques from basic prompting to advanced strategies, plus function calling, tool integration, and agentic systems What to focus on: The difference between system and user messages, why specificity matters, chain-of-thought prompting (think step by step), using examples in prompts (few-shot), and how small wording changes can dramatically shift output quality Practice: Take a real task – summarize a document, extract key info from text, classify a piece of feedback – and write 5 different prompts for it. Compare outputs. You'll immediately see how much prompt design affects reliability 3. Structured Outputs / JSON Schemas In real applications you almost never want raw text from an LLM, you want structured data you can parse, store, and use in your code Structured outputs solve this by forcing the model to match a schema you define
Resources: 1. OpenAI Structured Outputs Guide (official docs, free) Link: https://platform.openai.com/docs/guides/structured-outputs Covers the feature that ensures models always generate responses adhering to your JSON Schema, so you don't need to worry about missing keys or hallucinated values 2. Instructor library (free, open source) Link: https://python.useinstructor.com/ The cleanest way to get structured outputs from any LLM provider using Pydantic models Works with OpenAI, Anthropic, Google, and 15+ other providers using the same code interface, with automatic retries when validation fails
This is what most production AI engineers actually use 3. OpenAI Cookbook: Structured Outputs Introduction (free) Link: https://developers.openai.com/cookbook/examples/structured_outputs_intro/ Practical examples covering chain-of-thought outputs, structured data extraction, and UI generation, good for understanding real-world use cases What to focus on: Defining Pydantic models for your data, passing schemas to the API, understanding the difference between structured outputs and JSON mode, and handling refusals gracefully Practice project: Build an invoice or receipt parser. Give it raw text (e.g. "Invoice #123, $45.99 for 3 widgets, due March 30") and have it return a structured Python object with fields like invoice_number, amount, items, due_date 4. Function / Tool Calling Tool calling is what transforms an LLM from a text generator into something that can take actions – search the web, query a database, call your API, run code. It's one of the most important skills in this entire guide
How to understand it: The model doesn't actually execute your functions It examines the prompt and returns a structured call with the function name and arguments when it decides a tool should be used Your code then executes the call and sends the result back Resources: 1. OpenAI Function Calling Guide (official docs, free) Link: https://platform.openai.com/docs/guides/function-calling The definitive reference. Covers defining tools, the 5-step calling flow, parallel calls, and best practices 2. Anthropic Tool Use Docs (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/tool-use Anthropic's equivalent guide for Claude. The concepts are the same, the syntax is slightly different 3. OpenAI Cookbook: How to Call Functions with Chat Models (free, GitHub) Link: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb A complete runnable notebook walking through the full tool-calling loop with real examples What to focus on: Describing functions clearly in JSON Schema, parsing tool call responses, executing the function and feeding results back, handling cases where no tool call is needed, and the concept of tool_choice: "auto" Practice project: Build a simple assistant that has three tools: get_weather(city), calculate(expression), and search_notes(query) (just search a hardcoded dict). Wire them all up and watch the model decide which one to call based on what you ask it 5. Streaming Responses
Streaming means showing the model's output as it's being generated – word by word – rather than waiting for the full response. It makes your apps feel dramatically faster and more alive Resources: 1. OpenAI Streaming Docs (official, free) Link: https://platform.openai.com/docs/api-reference/streaming The reference for adding stream=True to requests and iterating over chunks 2. Anthropic Streaming Docs (official, free) Link: https://docs.anthropic.com/en/api/messages-streaming Anthropic's streaming API reference with Python examples
3. How Streaming LLM APIs Work – Simon Willison (free) Link: https://til.simonwillison.net/llms/streaming-llm-apis A clear technical breakdown of how Server-Sent Events work under the hood for OpenAI, Anthropic, and Google, useful for understanding what's actually happening at the HTTP level What to focus on: Setting stream=True, iterating over delta chunks, assembling the full response from parts, and wiring streaming into a FastAPI endpoint using StreamingResponse Tip: Streaming is almost always the right choice for user-facing apps. Nobody wants to stare at a loading spinner for 10 seconds waiting for a full response to appear at once 5. Conversation State LLMs are stateless – they have no memory between calls. Conversation history is something you manage by sending the full message list with every request. Understanding this is fundamental Resources:
1. OpenAI Chat Completions Guide, Managing Conversations (official, free) Link: https://platform.openai.com/docs/guides/conversation-state The canonical explanation of how the messages array works and how to manage multi-turn conversations 2. Anthropic Messages API Docs (official, free) Link: https://docs.anthropic.com/en/api/messages Anthropic's equivalent. Same concept, worth reading both to see how they differ What to focus on: The messages array structure, why you append both user and assistant messages, context window limits and what happens when you exceed them, and basic truncation strategies (drop oldest messages, summarize history) Practice project: Build a simple multi-turn chatbot in the terminal. Each turn appends to the messages list. Add a /reset command to clear history, and print the current token count after each exchange
6. Cost, Latency, and Token Basics Shipping AI apps without understanding costs and tokens is how you end up with surprise bills and slow apps. This is boring but critical Resources: 1. OpenAI Pricing Page (official) Link: https://openai.com/api/pricing Know what input and output tokens cost per model. Bookmark this and check it whenever you pick a model 2. Anthropic Pricing Page (official) Link: https://www.anthropic.com/pricing
Same for Claude models 3. OpenAI Tokenizer Tool (free, interactive) Link: https://platform.openai.com/tokenizer Paste any text and see exactly how many tokens it is. Use this constantly while you're learning 4. Tiktoken (Python library, free) Link: https://github.com/openai/tiktoken OpenAI's tokenizer library for counting tokens in code before sending requests What to focus on: What a token is (roughly 4 characters / 3/4 of a word), how input vs output tokens are priced differently, how context window size affects what you can do, and the latency trade-off between smaller faster models and larger smarter ones
Also: don't use GPT-4/Opus for everything – cheaper models are often good enough for simple tasks 7. Failure Handling LLM APIs fail. Rate limits get hit, responses time out, the model returns malformed JSON. Handling failures gracefully is what separates a demo from a production app Resources: 1. OpenAI Error Codes Reference (official, free) Link: https://platform.openai.com/docs/guides/error-codes Every error type you'll encounter and what to do about it 2. Anthropic Error Handling Docs (official, free)
Link: https://docs.anthropic.com/en/api/errors Same for Claude 3. Tenacity (Python library, free) Link: https://tenacity.readthedocs.io/ A clean library for adding retry logic with exponential backoff to any Python function. One decorator and your retries are handled What to focus on: Rate limit errors (429) and exponential backoff, timeout handling with httpx/requests, validating model output before using it, fallback strategies (retry with a different model, return a cached response), and never crashing your app because the LLM returned unexpected output 8. Prompt Injection Awareness Prompt injection is the #1 security risk in LLM applications
It happens when untrusted user input is combined with system instructions, allowing a user to alter, override, or inject new behavior into the prompt –causing the system to perform unintended actions or generate manipulated outputs You don't need to be a security expert, but you need to know this exists before you ship anything Resources: 1. OWASP Top 10 for LLM Apps – LLM01: Prompt Injection (free) Link: https://genai.owasp.org/llmrisk/llm01-prompt-injection/ The authoritative classification covering direct injections (jailbreaking), indirect injections via external content like documents or websites, and real-world attack scenarios 2. OWASP Prompt Injection Prevention Cheat Sheet (free) Link: https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html
Practical defensive patterns: input validation, privilege control, and output validation 3. Evidently AI: What is Prompt Injection (free) Link: https://www.evidentlyai.com/llm-guide/prompt-injection-llm A clear developer-focused explainer on attack types, risks, and design patterns to mitigate them What to focus on: The difference between direct and indirect injection, why system prompts aren't truly "secure", the principle of least privilege for tool access, and never trusting unvalidated LLM output to make consequential decisions automatically Month 2 Milestone By the end of this month you should be able to: Write prompts that produce consistent, reliable outputs for a given task
Get structured JSON data out of any model using Pydantic + Instructor Wire up tool calling so a model can call your Python functions Stream responses in real time through a FastAPI endpoint Manage multi-turn conversation history properly Estimate the token cost of a request before sending it Handle API errors, timeouts, and bad outputs without crashing Explain what prompt injection is and apply basic defenses ⏩------------------------------------------------------------------------⏪
Month 3: Learn RAG Properly Your goal this month: Build systems that let LLMs answer questions from your documents, not just from their training data By the end you should be able to ingest documents, embed and store them, retrieve the right chunks at query time, and produce answers that are grounded, accurate, and citable RAG is the most in-demand practical skill in AI engineering right now. Almost every real enterprise AI use case – customer support bots, internal knowledge bases, document Q&A – is built on it Understanding it deeply, not just copying a tutorial, is what separates good engineers from great ones 1. Embeddings Before you can build a RAG system, you need to understand what embeddings actually are – because they're the foundation everything else is built on A text embedding is a piece of text projected into a high-dimensional vector space
The position of that text in this space is represented as a long sequence of numbers Critically, text that is semantically similar ends up close together in that space – which is what makes similarity search possible Resources: 1 . Stack Overflow Blog: An Intuitive Introduction to Text Embeddings (free) Link: https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/ The best beginner explanation. Written by a developer who has spent years building NLP products, with a focus on building the right intuition rather than the math 2. Google ML Crash Course: Embeddings (free) Link: https://developers.google.com/machine-learning/crash-course/embeddings
Covers why dense vector representations solve problems that one-hot encoding can't – specifically, capturing semantic relationships between items 3. HuggingFace: Getting Started With Embeddings (free) Link: https://huggingface.co/blog/getting-started-with-embeddings Hands-on guide. Shows how to generate embeddings using the sentence-transformers library, host them, and use them for semantic search over a real FAQ dataset 4. OpenAI Embeddings Guide (official docs, free) Link: https://platform.openai.com/docs/guides/embeddings The reference for using OpenAI's text-embedding-3-small and text-embedding-3-large models in code What to focus on: What a vector is conceptually, why similar text produces similar vectors, how cosine similarity works, the difference between embedding models (OpenAI, HuggingFace sentence-transformers), and what embedding dimension means in practice
Practice: Take 20 sentences on related topics, embed them using OpenAI or sentence-transformers, and write a simple nearest-neighbor search that returns the 3 most similar to a query. This is literally the heart of RAG in miniature 2. Chunking Your documents are too large to embed as a whole. Chunking is the process of breaking them into smaller pieces before embedding How you chunk your documents directly affects your system's ability to find relevant information and give accurate answers, even a perfect retrieval system fails if it searches over poorly prepared data Resources: 1. Weaviate: Chunking Strategies for RAG (free) Link: https://weaviate.io/blog/chunking-strategies-for-rag The most practical guide. Covers fixed-size, recursive, and semantic chunking, with clear guidance on when to use each
2. Unstructured: Chunking for RAG Best Practices (free) Link: https://unstructured.io/blog/chunking-for-rag-best-practices A technical deep-dive on chunk sizes, overlap, and how the embedding model's context window imposes hard limits A good starting point for experimentation is a chunk size of around 250 tokens (approximately 1,000 characters), combined with a 10-20% overlap between consecutive chunks to avoid losing context at boundaries 3. LangChain Text Splitters Docs (official, free) Link: https://python.langchain.com/docs/concepts/text_splitters/ The practical reference for using RecursiveCharacterTextSplitter, MarkdownTextSplitter, and semantic splitters in code What to focus on: Fixed-size chunking with overlap as your baseline, recursive chunking for structured documents, semantic chunking for better boundary detection, and the core trade-off: chunks that are too large lose retrieval precision; chunks that are too small lose context
Beginner tip: Start with RecursiveCharacterTextSplitter from LangChain with chunk_size=500 and chunk_overlap=50. This is the most sensible default for most documents and gives you a working baseline to improve from 3. Vector Databases Once you have embeddings, you need somewhere to store and search them efficiently. This is what vector databases are for The right choice depends on your situation: use Chroma for fast local prototyping, Pinecone for managed turnkey scale, Weaviate for open-source flexibility with strong hybrid search, Qdrant for complex filters and cost-efficient self-hosting, and pgvector if you're already on PostgreSQL and want to avoid adding another system Resources: 1. Chroma Official Docs (free) Link: https://docs.trychroma.com/ Chroma is perfect for individual developers and small teams who prioritize development speed and simplicity, it runs in-memory or locally with no infrastructure to manage
2. Pinecone Learning Center (free) Link: https://www.pinecone.io/learn/ Excellent free tutorials covering vector search concepts, hybrid search, and RAG pipelines. Good provider-agnostic material even if you don't use Pinecone 3. Qdrant Documentation (free) Link: https://qdrant.tech/documentation/ Best open-source option for production with advanced filtering. Very fast, flexible, and free to self-host 4. pgvector (open source, free) Link: https://github.com/pgvector/pgvector
If you're building something that already uses PostgreSQL, pgvector adds vector search directly to your existing database with no new infrastructure
What to focus on: Creating a collection, inserting embeddings with metadata, querying by similarity with top_k, and filtering by metadata at query time
You don't need to understand the indexing algorithms (HNSW, IVF) – just understand how to use them
Practice project: Index 50-100 pages from any public documentation (e.g. the Python docs, or a Wikipedia article dump) into Chroma with metadata (source URL, section title). Write a query function that retrieves the 5 most relevant chunks for any question
4. Metadata Filtering
Raw similarity search alone isn't enough for real applications. Metadata filtering lets you constrain retrieval to a relevant subset – by date, source, document type, user, category, or any other attribute you store alongside each chunk
Resources:
1. Pinecone: Metadata Filtering Guide (free)
Link: https://docs.pinecone.io/guides/data/filter-with-metadata
Clear explanation, with code examples, of filtering vectors by metadata fields before or during similarity search
2. LlamaIndex: Metadata Filters Guide (official docs, free)
Link: https://docs.llamaindex.ai/en/stable/module_guides/querying/node_postprocessors/node_postprocessors/
Explains how to apply filters at query time in LlamaIndex pipelines
What to focus on: Tagging every chunk with relevant metadata at ingestion time (source filename, page number, section, date, category), and using those fields to filter results at query time. This is what makes the difference between a toy demo and a production system where users can ask "only show me results from Q4 2025-Q1 2026 reports"
5. Reranking
Reranking is a technique that adds a semantic boost to the search quality of any keyword or vector search system
After first-stage retrieval returns a candidate set, a reranker re-scores those results based on true contextual relevance to the query – not just vector proximity
The two-stage pattern is: embed and search (fast, approximate) → rerank top-k (slower, more accurate). The result is dramatically better retrieval quality with only a modest latency cost
Resources:
1. Cohere Reranking Docs (official, free)
Link: https://docs.cohere.com/docs/reranking-with-cohere
The best place to start. Covers the full reranking workflow, including semi-structured data like emails and JSON documents. Requires just a single line of code to add to an existing retrieval pipeline
2. LangChain: Cohere Reranker Integration (official docs, free)
Link: https://python.langchain.com/docs/integrations/retrievers/cohere-reranker/
Explains how to wire Cohere reranking into a LangChain retriever using ContextualCompressionRetriever
What to focus on: The two-stage retrieve-then-rerank pattern, the difference between a bi-encoder (used for first-stage embedding search) and a cross-encoder (used for reranking), and the practical latency/quality trade-off of reranking top-20 vs top-5 results
6. Retrieval Quality Issues
Most RAG failures aren't model failures, they're retrieval failures. Understanding the ways retrieval can go wrong is essential for debugging real systems
Common issues to learn:
Semantic drift: The query embedding doesn't match the relevant chunk embedding even though the information is there. Fix: try query rewriting or HyDE (Hypothetical Document Embeddings)
Chunk boundary problems: The relevant information is split across two chunks. Fix: increase overlap or use semantic chunking
Missing metadata context: Chunks are semantically similar to the query but belong to the wrong document, date, or user. Fix: use metadata filtering
Top-k too small: The right chunk exists but isn't in the top 5 retrieved results. Fix: increase top_k at retrieval and reduce after reranking
Resources:
1. LangChain: Query Transformations (free)
Link: https://python.langchain.com/docs/how_to/#query-analysis
Covers query rewriting, step-back prompting, and HyDE
2. Pinecone: Improving Retrieval Quality (free)
Link: https://www.pinecone.io/learn/retrieval-augmented-generation/#retrieval-quality
Practical walkthrough of common failure modes with fixes
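To make one of these fixes concrete, here is a minimal sketch of the HyDE pattern. Both helper functions are stand-ins: a real system would replace generate_hypothetical_answer with an LLM call and embed with a real embedding model.

```python
# Sketch of the HyDE (Hypothetical Document Embeddings) fix for semantic drift.
# generate_hypothetical_answer and embed are stand-ins for a real LLM call
# and a real embedding model.

def generate_hypothetical_answer(query: str) -> str:
    # Real code: ask the LLM "Write a short passage that answers: {query}".
    # This fake version only illustrates the shape of the output.
    return f"A plausible passage that answers the question: {query}"

def embed(text: str) -> list[float]:
    # Stand-in embedding: hashes characters into a tiny fixed-size vector.
    # Real code would call an embedding model's API.
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch) / 1000
    return vec

def hyde_query_vector(query: str) -> list[float]:
    # Core HyDE idea: embed a hypothetical *answer*, not the raw query,
    # because answers live closer to document chunks in embedding space.
    hypothetical = generate_hypothetical_answer(query)
    return embed(hypothetical)

vector = hyde_query_vector("What does pgvector add to PostgreSQL?")
print(len(vector))  # this vector is what you send to your vector DB's query API
```

The only change versus plain retrieval is which text gets embedded – everything downstream (vector search, filtering, reranking) stays the same.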
7. Hallucination Reduction
RAG dramatically reduces hallucinations compared to a vanilla LLM, but it doesn't eliminate them
By supplying the model with retrieved facts at runtime, RAG anchors its responses to real sources rather than relying on training data alone, and the model's output can even cite those sources, increasing transparency and trust
But retrieval failures, bad chunks, and conflicting information can still cause the model to make things up
Resources:
1. Zep: Reducing LLM Hallucinations – A Developer's Guide (free)
Link: https://www.getzep.com/ai-agents/reducing-llm-hallucinations/
Practical developer-focused guide covering prompt grounding strategies, chain-of-thought for factual tasks, and output verification patterns
2. Voiceflow: 5 Ways to Reduce LLM Hallucinations (free)
Link: https://www.voiceflow.com/blog/prevent-llm-hallucinations
Good overview of the combined strategy: RAG + chain-of-thought + guardrails together outperform any single approach
What to focus on: Prompting the model to answer only from provided context (and say "I don't know" when the answer isn't there), adding a confidence threshold before surfacing responses, and always validating retrieval quality before blaming the LLM
8. Citations and Grounding
A grounded RAG system doesn't just answer – it tells you where the answer came from. This is critical for user trust and for debugging
Resources:
1. Anthropic: Giving Claude Sources (docs, free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/citations
Explains how to prompt Claude to produce cited responses with source references
2. LangChain: RAG with Sources (free)
Link: https://python.langchain.com/docs/how_to/qa_sources/
Explains how to return source documents alongside answers in a LangChain RAG pipeline
What to focus on: Passing chunk metadata (source filename, page number, URL) into your prompt context, instructing the model to reference sources in its answer, and surfacing those sources in your UI or API response
9. Your RAG Framework: LangChain or LlamaIndex
You don't need to build a RAG pipeline from scratch. Two frameworks dominate the space and are worth knowing:
LlamaIndex puts search and indexing first: it abstracts ingestion, chunking, embedding, and querying into a few lines of code, letting you build a working prototype in an afternoon
LangChain shines when your application looks more like an orchestration engine – it excels with multi-agent workflows, tool calling, and conditional chains that query multiple LLMs or external APIs before generating an answer
For Month 3, start with LlamaIndex for RAG. Move to LangChain when you hit Month 4's agent work
Resources:
1. LlamaIndex: Introduction to RAG (official docs, free)
Link: https://developers.llamaindex.ai/python/framework/understanding/rag/
Covers the five key stages of RAG: loading, indexing, storing, querying, and evaluating – and how LlamaIndex handles each one
2. LlamaIndex Starter Tutorial (official docs, free)
Link: https://developers.llamaindex.ai/python/framework/getting_started/starter_example/
The official quickstart. Build a working RAG system in under 30 lines
3. LangChain: Build a RAG Agent (official docs, free)
Link: https://docs.langchain.com/oss/python/langchain/rag
Shows how to build a Q&A app over unstructured text using a RAG agent, from a 40-line minimal version up to a full retrieval pipeline with reranking
Practice project: Build a "chat with your docs" app. Ingest 10–20 PDF or text files (your own notes, a textbook chapter, product documentation – anything). Build a FastAPI endpoint that accepts a question, retrieves the top 5 most relevant chunks with reranking, and returns a cited answer from Claude or OpenAI. This is a real portfolio piece
Month 3 Milestone
By the end of this month you should be able to:
Explain what an embedding is and why similar text produces similar vectors
Chunk any document intelligently using appropriate strategies
Store and query embeddings in a vector database with metadata filtering
Add a reranking step to improve retrieval quality
Debug common retrieval failures systematically
Build a complete end-to-end RAG pipeline using LlamaIndex or LangChain that ingests documents, retrieves relevant chunks, and returns grounded, cited answers
⏩------------------------------------------------------------------------⏪
Month 4: Agents, Tools, Workflows, and Evals
Your goal this month: Build AI systems that can take sequences of actions autonomously, wire together multi-step workflows, and critically evaluate whether they're working
By the end you should be able to build a real agent from scratch, understand when agents are the wrong choice, and measure the performance of anything you build
This is where AI engineering gets genuinely complex. The skills from Month 4 are what separate junior AI engineers from people who can own an entire AI feature end to end
1. Agent Loops
An agent is not magic, it's a surprisingly simple pattern
Think of agents as goal-driven systems that constantly cycle through observing, reasoning, and acting
This loop allows them to tackle tasks that go beyond simple questions and answers, moving into real automation, tool usage, and adapting on the fly
The "thinking" happens in the prompt, the "branching" is when the agent chooses between available tools, and the "doing" happens when we call external functions. Everything else is just plumbing
Once you internalize this, even the most complex agent frameworks become readable
Resources:
1. Anthropic: Building Effective Agents (official, free)
Link: https://www.anthropic.com/research/building-effective-agents
The single best piece of writing on agents in production. Read this before writing a single line of agent code
2. OpenAI: A Practical Guide to Building Agents (official PDF, free)
Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
OpenAI's complementary guide covering agent patterns, guardrails, and production safety
3. freeCodeCamp: The Open Source LLM Agent Handbook (free)
Link: https://www.freecodecamp.org/news/the-open-source-llm-agent-handbook/
A comprehensive practical guide covering the agent loop, LangGraph, CrewAI, planning, memory, and tool use. Good for getting hands-on quickly
4. LangChain Academy: Introduction to LangGraph (free course)
Link: https://academy.langchain.com/courses/intro-to-langgraph
The official free course for LangGraph, the most widely used agent orchestration framework. Covers state, memory, human-in-the-loop, and more
What to focus on: The perceive → plan → act → observe cycle, how the agent loop terminates, what happens when a tool call fails inside a loop, and why agents are just while loops with an LLM making the branching decisions
Practice: Build an agent from scratch without any framework – just the OpenAI or Anthropic API directly. Give it 3 tools, a goal, and a loop. This is the most valuable thing you can do to actually understand what frameworks are abstracting
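The from-scratch exercise above can be sketched in a few lines. Here fake_llm and the two tools are stand-ins: a real version would replace fake_llm with an OpenAI or Anthropic chat call that returns either a tool invocation or a final answer.

```python
# Minimal agent loop sketch – no framework, the LLM call is a stand-in.
# Tools are plain functions plus names the "model" can choose between.
TOOLS = {
    "get_time": lambda args: "2025-01-01 12:00",
    "add": lambda args: str(args["a"] + args["b"]),
}

def fake_llm(goal: str, history: list[str]) -> dict:
    # Stand-in for the model: decides the next action from the transcript.
    # Real code would send `history` as messages and parse a tool call.
    if not history:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": f"Done: {history[-1]}"}

def run_agent(goal: str, max_iters: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_iters):           # hard cap prevents infinite loops
        decision = fake_llm(goal, history)
        if "final" in decision:          # loop terminates on a final answer
            return decision["final"]
        tool = TOOLS[decision["tool"]]   # "branching": the model picked a tool
        result = tool(decision["args"])  # "doing": execute the function
        history.append(f"{decision['tool']} -> {result}")  # observe
    return "Gave up: iteration limit reached"

print(run_agent("add two numbers"))  # → Done: add -> 5
```

Notice the loop really is just a while loop in disguise: decide, act, observe, repeat, with an iteration cap as a safety net. Everything a framework adds sits on top of this skeleton.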
2. Tool Selection
Writing good tools is half the job. The descriptions for your tools and their parameters are the user manual for the LLM. If the manual is vague, the LLM will misuse the tool. Be painfully, relentlessly explicit
A poorly described tool will be called incorrectly, called at the wrong time, or ignored entirely. A well-described tool behaves predictably and gets selected correctly across a wide range of inputs
Resources:
1. OpenAI: Function Calling Best Practices (official docs, free)
Link: https://platform.openai.com/docs/guides/function-calling/best-practices
The canonical guide to writing tool descriptions that work reliably, with naming conventions and parameter documentation patterns
2. Anthropic: Tool Use Best Practices (official docs, free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/tool-use/implement-tool-use#best-practices-for-tool-definitions
Anthropic's equivalent. Pay particular attention to the guidance on when to let the model choose vs forcing a specific tool
What to focus on: Writing tool names that are self-explanatory verbs, writing descriptions that explain when to call the tool (not just what it does), keeping parameters minimal and well-typed, and designing tools with the LLM as the caller
Beginner tip: Test every tool description by asking yourself: "If I had no documentation and only this JSON schema, would I know exactly when and how to call this?" If not, it needs more work
3. State Management
In LangGraph, state is a shared memory object that flows through the graph. It stores all the relevant information – messages, variables, intermediate results, and decision history – and is managed automatically throughout execution
Understanding state is the key to building agents that can handle multi-turn tasks, recover from failures, and hand off between components cleanly
Resources:
1. LangGraph Official Docs: State Management (free)
Link: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
The definitive reference. Covers state schemas, reducers, and how state flows through nodes and edges
2. DataCamp: LangGraph Agents Tutorial (free)
Link: https://www.datacamp.com/tutorial/langgraph-agents
Covers the fundamentals of state, nodes, and edges with hands-on code, building up to stateful agents with persistent memory across sessions
3. Real Python: LangGraph in Python (free)
Link: https://realpython.com/langgraph-python/
A thorough tutorial building a complete stateful LangGraph agent, with detailed explanations of the state graph and conditional edges
What to focus on: Defining state schemas with TypedDict, how reducers work for merging parallel updates, the difference between in-memory state and persisted checkpointing, and how human-in-the-loop pauses work by inspecting and modifying state mid-execution
4. Retries and Failure Handling in Agents
Agents fail differently from regular LLM calls. A bad tool call mid-loop can corrupt state, cause infinite loops, or silently produce wrong answers. You need explicit strategies for all of these
Resources:
1. LangGraph: Error Handling and Retries (official docs, free)
Link: https://langchain-ai.github.io/langgraph/how-tos/autofill-tool-errors/
Explains how to add automatic error handling and retry logic at the tool node level in LangGraph
2. OpenAI Practical Agents Guide: Guardrails section (free)
Link: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
Covers guardrails as a layered defense, combining LLM-based checks, rules-based filters like regex, and moderation APIs to vet both inputs and outputs at every stage of the agent loop
What to focus on: Maximum iteration limits to prevent infinite loops, per-tool retry with exponential backoff, catching and logging exceptions at the tool execution layer without crashing the agent, and when to surface a failure to the user vs retry silently
5. When NOT to Use Agents
This is one of the most important and most overlooked skills in AI engineering. Agents are exciting, but they're also slow, expensive, unpredictable, and hard to debug. Knowing when to reach for something simpler is a sign of good judgment
Anthropic recommends finding the simplest solution possible and only increasing complexity when needed – which might mean not building agentic systems at all
Agentic systems trade latency and cost for better task performance, and you should carefully consider when this tradeoff makes sense
The decision framework is:
Use a single LLM call if the task can be solved in one prompt with the right context
Use a workflow if the steps are fixed and predictable
Use an agent only if the number of steps is genuinely unpredictable and requires dynamic decision-making
Resources:
1. Anthropic: Building Effective Agents, when to use agents (official, free)
Link: https://www.anthropic.com/research/building-effective-agents
The most authoritative answer to this question, straight from the team that builds the models
2. Simon Willison: Designing Agentic Loops (free)
Link: https://simonwillison.net/2025/Sep/30/designing-agentic-loops/
A senior engineer's practical take on when agent complexity is justified and how to think about agentic loop design
What to memorize: A chain of 3 fixed LLM calls will always be faster, cheaper, and more debuggable than an agent that could make 3 calls. Reserve agents for genuinely open-ended tasks
6. Multi-Step Workflows
Between "single prompt" and "full agent" there is a vast productive middle ground: workflows. Workflows are ideal when the task can be cleanly decomposed into fixed subtasks – trading latency for higher accuracy by making each individual LLM call an easier, more focused task
Common patterns include prompt chaining (output of one call is input to the next), routing (classify input and send to specialized handlers), parallelization (run multiple calls simultaneously and aggregate), and orchestrator-subagent (one LLM plans, others execute)
Resources:
1. Anthropic: Workflow Patterns (official, free)
Link: https://www.anthropic.com/research/building-effective-agents#workflow-patterns
Covers all the main patterns with diagrams and code examples. The parallelization and orchestration sections are particularly useful
2. LangGraph: Multi-Agent Networks (official docs, free)
Link: https://langchain-ai.github.io/langgraph/concepts/multi_agent/
Explains how to wire multiple agents together as a network, with supervisor and handoff patterns
Practice project: Build a 3-step content pipeline:
Step 1 – an LLM extracts key facts from an article
Step 2 – another LLM call uses those facts to generate a tweet, a LinkedIn post, and a summary in parallel
Step 3 – a final LLM call scores all three for quality and picks the best
No agent required, pure workflow
7. Evaluation Harnesses
Evals are how you know if your AI system is actually working — not just on the examples you tested by hand, but systematically across hundreds of inputs
AI agents are powerful but complex to deploy because their probabilistic, multi-step behavior introduces many points of failure
Different parts of an agent – the LLMs, tools, retrievers, and workflows – each need their own evaluation approach
Resources:
1. DeepEval (open source, free)
Link: https://deepeval.com/docs/getting-started
An open-source LLM evaluation framework inspired by pytest. Write test cases with inputs and expected outputs, run them with 50+ built-in metrics including hallucination, answer relevancy, and factual consistency, and catch regressions between versions
2. Promptfoo (open source, free)
Link: https://github.com/promptfoo/promptfoo
A CLI and library for testing and evaluating LLM apps with automated test suites. Supports side-by-side comparison of multiple prompts across multiple models, CI/CD integration, and red teaming for security vulnerabilities
3. LangSmith (free tier)
Link: https://smith.langchain.com/
Tracing, debugging, and evaluation for LangChain and LangGraph apps. The free tier is generous and the tracing UI makes debugging agent loops dramatically easier
4. Ragas (open source, free)
Link: https://docs.ragas.io/
Specialized evaluation framework for RAG pipelines. Measures faithfulness, answer relevancy, context precision, and context recall. Essential if you're evaluating RAG systems from Month 3
What to focus on: Building a golden test set of 20-50 representative inputs with expected outputs or rubrics, writing eval functions that score outputs deterministically (string match, JSON schema validation) or with LLM-as-judge, and running evals automatically when you change a prompt or swap a model
Critical mindset: Evals are not optional polish. Every prompt change, model swap, or retrieval tweak you make without running evals is a gamble. The engineers who ship reliable AI products run evals constantly
8. Task Success Metrics
Beyond automated evals, you need metrics that tell you whether your agent is accomplishing its actual goal
Resources:
1. Hamel Husain: Your AI Product Needs Evals (free)
Link: https://hamel.dev/blog/posts/evals/
One of the most practical pieces written on building eval pipelines for real production AI systems, by someone who has done it at scale
2. OpenAI Evals Framework (open source, free)
Link: https://github.com/openai/evals
OpenAI's own evaluation framework, with a large library of community-contributed eval patterns you can adapt
What to focus on: The difference between process metrics (did the agent call the right tool?) and outcome metrics (did the task succeed?), defining clear success criteria before building anything, and using LLM-as-judge for evaluation of outputs that resist exact matching (like long-form answers or multi-step reasoning traces)
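A minimal eval harness tying these ideas together – a golden set scored deterministically – might look like the sketch below. The questions, expected substrings, and the pipeline function are all made-up stand-ins for your real system.

```python
# Sketch of a tiny eval harness: a golden test set scored deterministically.
# `pipeline` stands in for your real RAG/agent call; the cases are examples.

GOLDEN_SET = [
    {"question": "What language is this roadmap built on?", "expect": "python"},
    {"question": "Which vector DB extends PostgreSQL?", "expect": "pgvector"},
]

def pipeline(question: str) -> str:
    # Stand-in for your real system (LLM + retrieval). Real code would
    # call your API endpoint or chain here.
    canned = {
        "What language is this roadmap built on?": "Mostly Python.",
        "Which vector DB extends PostgreSQL?": "pgvector adds vector search.",
    }
    return canned.get(question, "I don't know")

def run_evals() -> float:
    # Deterministic scoring: substring match. Swap in JSON-schema checks
    # or an LLM-as-judge call for fuzzier, long-form outputs.
    passed = 0
    for case in GOLDEN_SET:
        answer = pipeline(case["question"]).lower()
        ok = case["expect"] in answer
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {case['question']}")
    return passed / len(GOLDEN_SET)

score = run_evals()
print(f"score: {score:.0%}")  # rerun after every prompt or model change
```

The point is the shape, not the scoring function: a fixed input set, an automated scorer, and a single number you can compare before and after every change.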
Practice project: Take your RAG pipeline from Month 3 and build a proper eval harness around it. Create 30 question-answer pairs from your documents, run them through your pipeline, and score each answer for relevance, faithfulness, and completeness using DeepEval. Then change one thing (chunk size, model, top-k) and re-run to see if it improved
Month 4 Milestone
By the end of this month you should be able to:
Explain what an agent loop is and implement one from scratch without a framework
Write tool descriptions that get selected correctly and reliably
Manage agent state properly using LangGraph or equivalent
Handle failures inside agent loops without crashing
Decide confidently whether a task needs an agent, a workflow, or a single prompt
Build multi-step workflows that chain, route, and parallelize LLM calls
Write automated evals that catch regressions when you change prompts or models
Define and measure task success metrics for any AI system you build
⏩------------------------------------------------------------------------⏪
Month 5: Deployment, Product Thinking, and Reliability
Your goal this month: Take everything you've built and make it production-ready
By the end you should be able to deploy an AI app that handles real users, real traffic, and real failures without falling apart at 2am
This is where most AI engineers stall. They can build a great demo but can't ship a product that survives contact with the real world
The skills here are what companies actually pay for: reliability, security, cost control, and the ability to keep things running when something inevitably breaks
1. FastAPI Production Patterns
You already know how to build a FastAPI app from Month 1. Now you need to make it survive production traffic
The difference between dev and prod is brutal. A single uvicorn process with --reload is fine for building. In production it becomes the bottleneck the moment real traffic arrives
What you actually need: multi-worker ASGI configuration, proper error handling middleware, health check endpoints, and CORS policies
Resources:
1. FastAPI Deployment Docs (official, free)
Link: https://fastapi.tiangolo.com/deployment/
The official guide covering Uvicorn workers, Gunicorn, and Docker deployment. Start here before anything else
2. FastAPI Production Deployment Guide (CYS Docs, free)
Link: https://craftyourstartup.com/cys-docs/fastapi-production-deployment/
Comprehensive production patterns: Gunicorn config, Nginx reverse proxy, health checks, rate limiting. Includes real config files you can adapt
3. FastAPI Best Practices for Production (FastLaunchAPI, free)
Link: https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026
Covers async database pooling, Redis caching, JWT auth, and background tasks. Production-tested patterns from a real template used by 100+ developers
What to focus on: Running Gunicorn with Uvicorn workers (not bare Uvicorn), setting up health check endpoints, adding CORS middleware, implementing proper async database sessions, and using background tasks for anything that doesn't need to block the response
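The Gunicorn-with-Uvicorn-workers setup above fits in a small config file. This is a minimal sketch (assuming gunicorn and uvicorn are installed; the worker count formula is a common rule of thumb, and the timeouts are illustrative values you should tune).

```python
# gunicorn.conf.py – minimal production config sketch.
# Start the app with: gunicorn -c gunicorn.conf.py main:app
import multiprocessing

# Common rule of thumb: 2 workers per CPU core, plus one.
workers = multiprocessing.cpu_count() * 2 + 1

# Uvicorn's worker class lets Gunicorn manage an async FastAPI app.
worker_class = "uvicorn.workers.UvicornWorker"

bind = "0.0.0.0:8000"  # listen address (inside a container, expose this port)
timeout = 120          # LLM calls are slow; don't kill workers too eagerly
graceful_timeout = 30  # time to finish in-flight requests on restart
accesslog = "-"        # log requests to stdout (Docker-friendly)
```

Gunicorn supervises the worker processes (restarting any that die), while each Uvicorn worker serves async requests – which is exactly what a bare `uvicorn main:app --reload` doesn't give you.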
2. Docker
Docker is how you stop saying "it works on my machine" and start shipping consistent deployments
If you're building AI apps, Docker solves dependency conflicts, ensures consistent environments, and makes scaling straightforward
You don't need to become a Docker expert. You need to be able to containerize your FastAPI + LLM app and deploy it anywhere
Resources:
1. Docker Official Getting Started Guide (free)
Link: https://docs.docker.com/get-started/
The canonical starting point. Covers images, containers, Dockerfiles, and Docker Compose
2. freeCodeCamp: How to Build and Deploy a Multi-Agent AI System with Python and Docker (free)
Link: https://www.freecodecamp.org/news/build-and-deploy-multi-agent-ai-with-python-and-docker/
Practical end-to-end tutorial building a real multi-agent pipeline with Docker Compose. Covers separation of concerns, cron scheduling, and security considerations
3. DataCamp: Deploy LLM Applications Using Docker (free)
Link: https://www.datacamp.com/tutorial/deploy-llm-applications-using-docker
Step-by-step guide specifically for LLM apps with RAG pipelines. Covers Dockerfile creation, environment management, and deployment
4. Docker Containerization for LLM Apps (ApXML, free)
Link: https://apxml.com/courses/python-llm-workflows/chapter-10-deployment-operational-practices/containerization-docker-llm-apps
Covers base image selection, dependency management, multi-stage builds, and Docker Compose for multi-service LLM deployments
What to focus on: Writing a Dockerfile for a Python/FastAPI app, using multi-stage builds to keep images small, Docker Compose for multi-service setups (app + database + Redis), environment variables for secrets, and .dockerignore to avoid leaking sensitive files
Practice project: Containerize your RAG app from Month 3. Create a docker-compose.yml that runs your FastAPI app, a vector database (Chroma or Qdrant), and Redis for caching. Deploy it so that docker compose up starts everything
3. Background Jobs and Queues
LLM calls are slow. If a user asks your app to process a document and you make them wait 30 seconds for a response, they'll leave
Background jobs let you accept the request immediately, process it async, and notify the user when it's done
Resources:
1. Celery Official Getting Started Guide (free)
Link: https://docs.celeryq.dev/en/stable/getting-started/introduction.html
The standard Python task queue. Covers basic setup, task definition, and worker management
2. FastAPI Background Tasks Docs (official, free)
Link: https://fastapi.tiangolo.com/tutorial/background-tasks/
Built-in lightweight background tasks for simple use cases. Use this for quick fire-and-forget tasks, Celery for anything heavier
What to focus on: Understanding when to use FastAPI's built-in BackgroundTasks vs a proper task queue like Celery, setting up Redis as a message broker, handling task failures and retries, and returning job status to the user
4. Auth and API Key Security
If your AI app has an API, it needs authentication. Without it, anyone can use your endpoints, burn through your LLM credits, and you'll wake up to a $5,000 bill
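Before the resources, the core check can be sketched framework-free. Everything here is hypothetical for illustration: the key store, the header name, and the limits; a real app would load keys from a database or env vars and track rate limits in Redis rather than process memory.

```python
# Framework-free sketch of API-key auth plus a per-key rate limit.
# VALID_KEYS, the header name, and the limits are made-up examples.
import time

VALID_KEYS = {"sk-demo-123": "alice"}   # key -> user (hypothetical store)
RATE_LIMIT = 5                          # requests per window per key
WINDOW_SECONDS = 60
_requests: dict[str, list[float]] = {}  # key -> recent request timestamps

def authorize(headers: dict) -> str:
    # Authentication: who are you? Reject unknown keys outright.
    key = headers.get("x-api-key", "")
    user = VALID_KEYS.get(key)
    if user is None:
        raise PermissionError("invalid or missing API key")
    # Rate limiting: drop timestamps outside the window, then count.
    now = time.time()
    recent = [t for t in _requests.get(key, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= RATE_LIMIT:
        raise PermissionError("rate limit exceeded")
    recent.append(now)
    _requests[key] = recent
    return user

print(authorize({"x-api-key": "sk-demo-123"}))  # → alice
```

In FastAPI you'd express the same logic as a dependency so every protected route gets the check automatically; the structure (validate key, then enforce the limit) stays the same.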
Resources:
1. FastAPI Security Docs (official, free)
Link: https://fastapi.tiangolo.com/tutorial/security/
Covers OAuth2, JWT tokens, API keys, and dependency-based auth patterns. The official reference – work through the full tutorial
2. OWASP API Security Top 10 (free)
Link: https://owasp.org/API-Security/
The authoritative list of API security risks. Understand broken authentication, injection, and mass assignment before shipping anything
3. Auth0: API Auth Best Practices (free)
Link: https://auth0.com/docs/get-started/authentication-and-authorization
Practical guide to implementing authentication and authorization in APIs
What to focus on: JWT tokens for user auth, API key management for service-to-service communication, rate limiting per user/key, never storing secrets in code (use environment variables), and understanding the difference between authentication (who are you) and authorization (what can you do)
5. Logging and Observability
In production, if you can't see what's happening, you can't fix what's broken
LLM apps have a unique challenge: the model can return a 200 status code and still produce a useless or hallucinated answer. Traditional monitoring doesn't catch this. You need LLM-specific observability
Resources:
1. Langfuse (open source, free tier)
Link: https://langfuse.com/docs/observability/overview
Open-source LLM observability platform. Traces every request: prompt sent, response received, token usage, latency, tool calls. Supports prompt versioning, evaluation, and LLM-as-judge scoring. Integrates with OpenAI, Anthropic, LangChain, LlamaIndex
2. LangSmith (free tier)
Link: https://smith.langchain.com/
From the LangChain team. If you're using LangChain/LangGraph, setup is one environment variable. Tracing, debugging, monitoring dashboards, and online evals. The free tier is generous for development and small-scale production
3. Python Structlog (free)
Link: https://www.structlog.org/
Structured logging for Python. Produces JSON logs that are actually searchable and parseable. Far better than print() or basic logging for production apps
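To show what "structured" buys you, here is a dependency-free approximation of the idea using only the standard library (the field names like model and latency_ms are illustrative; structlog gives you this ergonomically out of the box).

```python
# Dependency-free sketch of structured JSON logging for LLM calls.
# Field names (model, latency_ms, etc.) are illustrative examples.
import json
import logging
import sys
import time

logger = logging.getLogger("llm")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(message)s"))  # message is already JSON
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def log_llm_call(prompt: str, response: str, model: str, latency_ms: float) -> str:
    # One JSON object per call: searchable and parseable, unlike print().
    record = {
        "event": "llm_call",
        "model": model,
        "prompt_chars": len(prompt),
        "response_chars": len(response),
        "latency_ms": round(latency_ms, 1),
        "ts": time.time(),
    }
    line = json.dumps(record)
    logger.info(line)
    return line

log_llm_call("Summarize this doc", "Here is a summary...", "claude-sonnet", 842.3)
```

Because every log line is a JSON object, you can grep, aggregate, and graph latency or cost per model without writing a parser – exactly the property dashboards and alerting depend on.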
What to focus on: Tracing every LLM call (input prompt, output, tokens, latency, cost), structured logging with JSON output, setting up dashboards that show request volume, error rates, and cost per day, and alerting when something breaks or costs spike
6. Prompt and Version Management
In production, your prompts are code. They need version control, testing, and rollback ability
Changing a prompt in production without tracking what you changed is how you break things and can't figure out why
Resources:
1. Langfuse Prompt Management (free)
Link: https://langfuse.com/docs/prompts
Centralized prompt versioning with a built-in playground for testing. Version control your prompts separately from your application code. Deploy prompt changes without redeploying your app
2. Anthropic Prompt Management Best Practices (free)
Link: https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview
Best practices for organizing, iterating on, and managing prompts at scale
What to focus on: Storing prompts outside your application code, versioning every prompt change, A/B testing prompt variants in production, and having a rollback strategy when a new prompt performs worse
7. Cost Monitoring and Rate Limits
LLM APIs charge per token. Without cost controls, a traffic spike or a bug in your prompt can burn through hundreds of dollars in minutes
Resources:
1. OpenAI Usage Dashboard (official)
Link: https://platform.openai.com/usage
Track spending by model, by day, and set usage limits
2. Anthropic Usage Dashboard (official)
Link: https://console.anthropic.com/
Same for Claude API usage
3. Helicone (free tier)
Link: https://www.helicone.ai/
Proxy-based observability that captures every LLM call with automatic cost tracking. One line of code to set up: just change your base URL
4. LiteLLM (open source, free)
Link: https://github.com/BerriAI/litellm
Unified interface for 100+ LLM providers. Includes budget management, rate limiting, and spend tracking across providers
What to focus on: Setting hard spending limits per day/month, implementing per-user rate limits in your API, using cheaper models for simple tasks (don't use GPT-4/Opus for everything), caching repeated identical requests with Redis, and monitoring cost per request to catch expensive prompts early
8. Caching
If your users keep asking similar questions, you're paying for the same LLM call over and over
Caching is the simplest way to reduce costs and latency simultaneously
Resources:
1. Redis Official Docs (free)
Link: https://redis.io/docs/
The standard in-memory data store. Fast, simple, and works perfectly for LLM response caching
2. GPTCache (open source, free)
Link: https://github.com/zilliztech/GPTCache
Semantic caching specifically designed for LLM applications. Uses embedding similarity to find cached responses for semantically similar (not just identical) queries
What to focus on: Exact-match caching for identical prompts, semantic caching for similar queries, cache invalidation strategies (TTL-based is simplest), and measuring cache hit rates to understand real cost savings
Month 5 Milestone
By the end of this month you should be able to:
Deploy a FastAPI + LLM app in Docker with proper production configuration
Handle long-running tasks with background jobs and queues
Secure your API with auth, rate limits, and API key management
Trace and debug LLM calls using Langfuse or LangSmith
Manage prompts with version control and rollback capability
Monitor costs in real time and set spending limits
Cache LLM responses to reduce latency and cost
⏩------------------------------------------------------------------------⏪
Month 6: Specialize and Become Hireable
The knowledge and skills you've gained can be applied in three directions (at least, these are the three I see)
Choose one of them and focus on practice – everything covered above is best learned through practice anyway
Direction 1: AI Product Engineer
Best if you want startup jobs fast
This is the most common path. You build AI-powered products that real users interact with
You already have most of the skills from Months 1-5. Now go deeper on the product side
Focus on:
LLM apps
RAG
agents
deployment
product UX
What to learn this month:
1. End-to-End Product Building
Stop building tutorial projects. Build products people can use
Resources:
1. Vercel AI SDK (free)
Link: https://sdk.vercel.ai/docs
The fastest way to build AI-powered UIs with streaming support. React, Next.js, and Vue integrations with built-in streaming UI components
2. Streamlit (free)
Link: https://docs.streamlit.io/
Build data apps and AI demos in pure Python. Ideal for internal tools and MVPs, not production-scale UIs
3. Gradio (free)
Link: https://www.gradio.app/docs
Quick ML/AI interfaces with minimal code. Especially good for demoing models and building prototypes
What to focus on: Building 2-3 complete projects this month that you can demo. A "chat with your docs" app, an AI-powered internal tool, or an agent that automates a real workflow. Ship them. Put them on GitHub. Deploy them somewhere people can try them
2. Product UX for AI
AI products fail when the UX doesn't account for the model's limitations
Resources:
1. Google: People + AI Guidebook (free)
Link: https://pair.withgoogle.com/guidebook/
The best resource on designing human-AI interaction. Covers setting expectations, handling errors, and building trust 2. Nielsen Norman Group: AI UX Guidelines (free) Link: https://www.nngroup.com/topic/artificial-intelligence/ Research-backed guidelines for AI interfaces What to focus on: How to handle loading states with streaming, what to show when the model is wrong, how to let users give feedback, and designing for the fact that AI output is probabilistic – it will sometimes be wrong Direction 2: Applied ML / LLM Engineer Best if you want deeper technical roles This direction is for engineers who want to go beyond API calls and understand what's happening under the hood
Focus on:
fine-tuning
when to fine-tune vs prompt
evaluation
inference optimization
open-source models
training pipelines
What to learn this month:
1. When to Fine-tune vs Prompt Engineer
The most important decision in applied ML: do you need to change the model, or just change how you talk to it?
Resources:
1. Google ML Crash Course: Fine-tuning, Distillation, and Prompt Engineering (free)
Link: https://developers.google.com/machine-learning/crash-course/llm/tuning
The clearest explanation of the three approaches and when to use each
2. Codecademy: Prompt Engineering vs Fine-Tuning (free)
Link: https://www.codecademy.com/article/prompt-engineering-vs-fine-tuning
Practical decision framework with clear use cases for each approach
3. IBM: RAG vs Fine-Tuning vs Prompt Engineering (free)
Link: https://www.ibm.com/think/topics/rag-vs-fine-tuning-vs-prompt-engineering
Covers the complete decision space including when to combine approaches
Decision framework to memorize:
Start with prompt engineering (cheapest, fastest)
Add RAG if the model needs access to specific data
Fine-tune only when prompting + RAG can't achieve the required quality, consistency, or latency
2. Fine-tuning in Practice
When you do need to fine-tune, here's how
Resources:
1. OpenAI Fine-tuning Guide (official, free)
Link: https://platform.openai.com/docs/guides/fine-tuning
The easiest way to start fine-tuning. Upload a JSONL dataset, run a job, get a custom model. Good for learning the workflow even if you later move to open-source models
2. HuggingFace Transformers Fine-tuning Tutorial (free)
Link: https://huggingface.co/docs/transformers/training
The standard library for working with open-source models. Covers training, evaluation, and model saving
3. Unsloth (open source, free)
Link: https://github.com/unslothai/unsloth
2x faster fine-tuning with 80% less memory. Supports LoRA and QLoRA out of the box. The fastest path to fine-tuning open-source models on consumer hardware
4. LLaMA-Factory (open source, free)
Link: https://github.com/hiyouga/LLaMA-Factory
Unified framework for fine-tuning 100+ LLMs. Includes a web UI for no-code fine-tuning. Supports LoRA, QLoRA, full fine-tuning, RLHF, and DPO
What to focus on: Preparing training datasets (JSONL format), understanding LoRA and QLoRA (parameter-efficient fine-tuning), running a fine-tuning job on OpenAI or with HuggingFace, evaluating the fine-tuned model against the base model, and knowing when fine-tuning isn't worth the cost
3. Open-Source Models
Not everything needs to go through OpenAI or Anthropic. Open-source models give you full control, no API costs, and the ability to run locally
Resources:
1. Ollama (free)
Link: https://ollama.ai/
Run open-source LLMs locally with one command. Supports Llama, Mistral, Gemma, and dozens of others. The fastest way to experiment with open-source models
2. HuggingFace Model Hub (free)
Link: https://huggingface.co/models
The largest repository of open-source models. Browse, download, and deploy models for any task
3. vLLM (open source, free)
Link: https://github.com/vllm-project/vllm
High-throughput LLM inference engine. 2-4x faster than naive HuggingFace serving. The standard for production serving of open-source models
What to focus on: Running models locally with Ollama for testing, understanding quantization (GGUF, GPTQ, AWQ) and why it matters for deployment, benchmarking open-source models against API models for your use case, and serving models in production with vLLM
4. Inference Optimization
Making models run faster and cheaper in production
Resources:
1. HuggingFace: Optimizing LLM Inference (free)
Link: https://huggingface.co/docs/transformers/llm_optims
Covers KV-cache optimization, quantization, and batching strategies
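To build intuition for why batching matters so much for throughput, here is a toy cost model in plain Python. The numbers are invented and real inference is far more complex, but the core idea holds: every forward pass has fixed overhead, and batched requests share it.

```python
# Toy inference cost model (all numbers invented for illustration).
OVERHEAD = 50.0   # ms of fixed cost per forward pass
PER_TOKEN = 2.0   # ms of work per output token

def sequential_time(requests):
    # One forward pass per request: pay the fixed overhead every time.
    return sum(OVERHEAD + PER_TOKEN * tokens for tokens in requests)

def batched_time(requests, batch_size):
    # Requests share a forward pass; a batch runs as long as its longest member.
    total = 0.0
    for i in range(0, len(requests), batch_size):
        batch = requests[i:i + batch_size]
        total += OVERHEAD + PER_TOKEN * max(batch)
    return total

requests = [40, 55, 60, 45, 50, 58, 42, 47]   # output lengths in tokens
print(sequential_time(requests))              # 1194.0
print(batched_time(requests, batch_size=4))   # 336.0
```

In this toy model batching is roughly 3.5x faster; the padding waste from uneven lengths (every request pays for the longest one in its batch) is exactly what techniques like continuous batching in vLLM attack.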
2. NVIDIA TensorRT-LLM (free)
Link: https://github.com/NVIDIA/TensorRT-LLM
Maximum inference performance on NVIDIA GPUs. Used by most production LLM serving at scale
What to focus on: Batching strategies for throughput, quantization for reducing memory and cost, KV-cache optimization for faster generation, and choosing the right hardware for your inference workload
Direction 3: AI Automation Engineer
Best if you want to build for businesses immediately
This direction is about automating real business workflows with AI. Less about building products, more about solving operational problems
Focus on:
workflow orchestration
business process automation
multi-tool systems
CRM, docs, email, support, ops use cases
What to learn this month:
1. Workflow Orchestration
Real business automation is almost never one LLM call. It's chains of actions across multiple systems
Resources:
1. n8n (open source, free to self-host)
Link: https://docs.n8n.io/
Visual workflow automation with AI nodes. Connect LLMs to 400+ integrations (Slack, Gmail, Notion, CRMs, etc.). The best no-code/low-code option for AI automation
2. LangGraph: Multi-Agent Workflows (free)
Link: https://langchain-ai.github.io/langgraph/concepts/multi_agent/
Code-first orchestration for complex multi-agent systems. When n8n isn't enough and you need full programmatic control
3. Temporal (open source, free)
Link: https://docs.temporal.io/
Durable workflow engine for long-running, fault-tolerant processes. When your automation needs to survive crashes, retries, and timeouts
What to focus on: Designing workflows that handle failures gracefully, connecting AI to real business tools (email, CRM, databases, spreadsheets), building human-in-the-loop approval steps, and logging every automated action for audit trails
2. Business Process Automation
The money in AI automation is in solving specific, expensive business problems
Resources:
1. Zapier AI Actions (free tier)
Link: https://zapier.com/ai
Connect AI to 6,000+ apps without code. Good for prototyping automations before building custom solutions
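The failure-handling, human-approval, and audit-trail points mentioned above can be sketched in a few lines of plain Python. All names here are hypothetical, and a real system would lean on n8n or Temporal for this, but the pattern is the same:

```python
import time

def with_retries(step, max_attempts=3, base_delay=0.01):
    # Retry a flaky workflow step with exponential backoff.
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

audit_log = []

def run_workflow(steps, needs_approval=lambda name: False, approve=lambda name: True):
    # Run named steps in order, pausing for human approval where required,
    # and record every action for an audit trail.
    for name, step in steps:
        if needs_approval(name) and not approve(name):
            audit_log.append((name, "rejected"))
            break
        audit_log.append((name, with_retries(step)))

# Demo: a CRM enrichment step that fails twice before succeeding.
calls = {"n": 0}
def flaky_enrich():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("CRM timed out")
    return "lead enriched"

run_workflow([("enrich", flaky_enrich), ("email", lambda: "draft saved")],
             needs_approval=lambda name: name == "email")
print(audit_log)  # [('enrich', 'lead enriched'), ('email', 'draft saved')]
```

Transient failures get retried instead of killing the run, risky steps wait for a human, and every action is logged so you can explain what the automation did and why.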
2. Make (Integromat) (free tier)
Link: https://www.make.com/
Visual automation platform with advanced logic and AI integrations. More powerful than Zapier for complex workflows
What to focus on: Identifying the highest-ROI automation targets (usually tasks that are repetitive, time-consuming, and rules-based), building automations that augment humans rather than replace them, and measuring the actual time and money saved
3. CRM, Docs, Email, Support Automation
The most common and most valuable AI automation use cases
Resources:
1. OpenAI Cookbook: AI-Powered Email Processing (free)
Link: https://github.com/openai/openai-cookbook
Patterns for classifying, routing, and responding to emails with AI
2. LangChain: Document Processing Pipelines (free)
Link: https://python.langchain.com/docs/how_to/#document-loaders
Ingesting and processing documents from 80+ sources
What to focus on: Building an AI-powered email classifier and auto-responder, creating a document processing pipeline that extracts structured data, building a support chatbot that uses RAG over your knowledge base, and integrating AI into existing CRM workflows (HubSpot, Salesforce, etc.)
Practice project for Direction 3: Build an end-to-end lead qualification system. It should:
Scrape or import leads from a source (CSV, API, or form)
Use an LLM to research each lead (company info, fit assessment)
Score and rank leads based on your ICP
Draft personalized outreach messages
Log everything to a spreadsheet or CRM
This is a real, sellable automation that businesses actually pay for
⏩------------------------------------------------------------------------⏪
CONCLUSION
What can you expect after these 6 months?
I'm going to be honest with you, without promising you mountains of money
This roadmap will not make you a senior AI engineer in 6 months
But it will make you someone who can build, ship, and deploy real AI systems that solve real problems
And right now, that is exactly what the market is paying for
The demand for AI engineers is not slowing down. Job postings grew 25% year-over-year
PwC found a 56% wage premium for roles that require AI skills vs the same roles without
Only 1% of companies are considered "AI mature", which means 99% still need help. The US Bureau of Labor Statistics projects 26% job growth through 2034
These are not hype numbers. They're real numbers based on analytics (taken from Claude, kek)
If you go full-time in the US:
Junior AI engineers start at $90,000-$130,000
Mid-level (3-5 years) sits at $155,000-$200,000
Senior roles go $195,000-$350,000+
According to Glassdoor (March 2026), the average is $184,757
The mid-level band is growing the fastest at 9.2% year-over-year because companies desperately need people who can ship production AI without constant supervision
If freelance is more your thing:
AI agent development goes for $175-$300/hour
RAG implementation $150-$250/hour
LLM integration $125-$200/hour
One developer on Reddit built a document summarization tool for a legal firm in two weeks and made $8,000. A freelancer billing 25 hours/week at $150/hour pulls $195,000/year
And if you go the consulting route, which is what I talked about in my earlier post, you can charge:
$300-$5,000 to set up an AI agent for a business
$500-$2,000/month for AI content management
$1,000-$4,000 to automate customer support
$500-$2,000 for cold outreach setup
The service spectrum is even wider, but once you master the skills from this roadmap, you are already an in-demand specialist in 2026
These are real numbers from real people doing real work
Now here is what I actually want you to take away from all of this:
Pick one project from each month and build it. Not read about it. Not watch a tutorial. Build it, break it, fix it, deploy it, put it on GitHub. The engineers who get hired are the ones who show what they've built, not what they've studied
Start sharing what you learn. Write about it on X, LinkedIn, anywhere. Teaching is the fastest way to learn and it builds your reputation at the same time. The best opportunities I've seen come from people who were visible, not from people who applied to 500 job listings
And please don't wait until you feel ready. You will never feel ready. The gap between "I'm learning" and "I'm building" is where most people get stuck forever
Start applying, start freelancing, start offering services the moment you have working projects. Even if they're not perfect. The market doesn't reward perfection. It rewards people who can ship
6 months is enough to change everything if you actually put in the work
And I really believe each of you reading this can do it
Just never stop building and never stop learning
Hope this was useful for you, my fam ❤️