
Good Afternoon!

Hello, fellow humans! Today, we’re exploring the abilities and limitations of AI agents. This is important to understand because right now, AI is a powerful tool for accelerating skilled humans, but it is still some distance from being able to effectively replace humans, especially when it comes to complex, high-skill work.

Personally, I don't think we'll ever reach a point where a human can outsource an entire operation to AI (maybe that's worth an issue all by itself). But it is valuable to map where AI's skills begin and end, so we can recognize our greatest opportunities for leveraging AI and identify our most valuable human skills.

So I'll be talking about a new Wharton study on enterprise AI adoption; Claude Skills, an exciting new tool from Anthropic that can accelerate and automate parts of our workflows; and a new report that measures the current state of fully autonomous work.

Wharton School Analysis of Agentic AI Adoption

This third annual study surveyed 800+ U.S. enterprise decision-makers from companies with 1,000+ employees and $50M+ revenue, tracking generative AI adoption from 2023's exploration phase through 2025's "accountable acceleration." The research examines usage patterns, investment strategies, ROI measurement, and human capital challenges across industries and functions.

Here are some surprising findings:

  • Middle managers are more realistic than executives. Only 28% of managers say AI adoption is "much quicker," versus 56% of VP-and-above respondents, suggesting some wishful thinking in the C-suite.

  • Training investment is declining (down 8 percentage points) even as skill gaps grow, with confidence in training dropping 14 points while 43% warn of skill atrophy.

  • 30% of tech budgets now fund internal R&D, signaling enterprises are building custom AI solutions rather than relying on off-the-shelf tools.

  • Tier 1 enterprises ($2B+ revenue) report "too early" outcomes more often (34% neutral) despite bigger budgets, while smaller firms achieve faster ROI.

Takeaways

  1. ROI measurement is now standard, making it more difficult to run informal pilots. 72% of orgs track structured metrics tied to profitability and productivity.

  2. AI is finding a niche where it performs best. Data analysis, document summarization, and editing dominate because they deliver consistent, measurable wins.

  3. Daily users jumped 17 points to 46%, with 82% using AI weekly. This is no longer experimental but operational.

  4. Budget reallocation is rising. 11% now cut legacy IT and HR programs to fund AI, up 7 points year-over-year.

  5. AI agents remain human-supervised. 58% use agents, but primarily for process automation and coordination, not autonomous decision-making.

5 Actionable Insights:

  1. Prioritize human capital over tools: hire CAIOs (60% of organizations now have them), invest in change management, and close the executive-manager perception gap before it becomes a culture crisis.

  2. Focus on "productivity plus" use cases—document creation, data analysis, and code generation deliver immediate ROI while building toward more ambitious applications.

  3. Retail and Manufacturing must accelerate—these industries risk falling behind despite having numerous high-value use cases in operations, customer experience, and supply chain.

  4. Balance training and hiring strategically—with training confidence down 14 points and advanced AI talent scarce (49% cite this as top challenge), identify peak value AI skills and develop hybrid upskilling approaches to target them.

  5. Prepare for 2026's "performance at scale" phase: with 88% increasing budgets and 87% expecting positive returns within 2-3 years, establish clear benchmarks, proven playbooks, and trusted guardrails now to convert adoption into durable competitive advantage.

Your career will thank you.

Over 4 million professionals start their day with Morning Brew—because business news doesn’t have to be boring.

Each daily email breaks down the biggest stories in business, tech, and finance with clarity, wit, and relevance—so you're not just informed, you're actually interested.

Whether you’re leading meetings or just trying to keep up, Morning Brew helps you talk the talk without digging through social media or jargon-packed articles. And odds are, it’s already sitting in your coworker’s inbox—so you’ll have plenty to chat about.

It’s 100% free and takes less than 15 seconds to sign up, so try it today and see how Morning Brew is transforming business media for the better.

Claude Skills

Claude Skills represent a major evolution in AI customization, allowing users to teach Claude specialized, reusable workflows that activate automatically when relevant. Skills are essentially folders that package expertise, containing instructions, scripts, and resources, making Claude a specialist on what matters most to the user. Rather than repeating the same complex prompt over and over, users define the workflow once in a skill.

This isn't a tutorial on how to use Skills, but I want to give you a rundown of what they are and why you should get up to speed on them.

  • Skills are reusable procedures that Claude automatically invokes based on task relevance. Projects, conversely, are workspaces organized by the user to keep related files and conversations together, and they must be manually selected.

  • Claude Code can invoke Subagents: separate Claude instances that the main agent spins up to delegate specific subtasks, with each Subagent maintaining its own self-managed context.

  • Skills offer a simpler, more accessible alternative to Model Context Protocol (MCP), providing similar functionality with improved efficiency and less complexity. Skills don't replace MCP; MCP remains an integral part of the agent's architecture for tool calling and connecting to external services. Skills can complement MCP by encoding the nuances of how a task should be accomplished.

What Can Skills Do?

  1. Skills Can Install Dependencies. Skills can encapsulate not just the execution code but also the instructions on how to install necessary libraries (like Manim, for visualization) that an agent might need to complete a task.

  2. Skills Require No Coding to Build. Despite their technical power in executing scripts and managing context, building a skill requires no coding; the core definition is simply a workflow described in a markdown file.

  3. Only Loads Into Context What It Needs. Claude can use Skills so efficiently because each Skill has a short summary called "front matter," a quick note that tells the Large Language Model (LLM) what the Skill contains. Only after a Skill is identified as relevant to the current task does the LLM load the full Skill instructions and code resources (see the sketch after this list).

  4. Skills Can Load, Create, and Manipulate Files. A skill can load a slide deck you're working on and automatically apply standard formatting, such as logos, typefaces, and data conventions, throughout the presentation.

  5. Skills Can Have a Folder for Their Resources. Whatever files, images, datasets, or code snippets you want the Skill to use, you can put all those assets in a folder, and the Skill will have consistent access to them.
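To make this concrete, here's a minimal sketch of a skill folder. The front matter fields (name, description) follow Anthropic's documented SKILL.md format, but the folder name, file names, and workflow steps are hypothetical examples I've made up for illustration, not a real Anthropic skill:

```
brand-deck-formatter/
├── SKILL.md            <- workflow instructions + front matter
├── scripts/
│   └── check_fonts.py  <- optional helper script Claude can run
└── resources/
    ├── logo.png        <- assets the skill can always reference
    └── style-guide.md
```

The SKILL.md inside it might look like this. Only the front matter is loaded up front; the rest loads once the skill is judged relevant:

```markdown
---
name: brand-deck-formatter
description: Applies our brand standards (logo, typefaces, chart
  conventions) to slide decks. Use when creating or editing presentations.
---

# Brand Deck Formatter

1. Open the deck the user is working on.
2. Apply the typefaces and colors listed in resources/style-guide.md.
3. Place resources/logo.png on the title slide.
4. Run scripts/check_fonts.py to verify every slide matches the guide.
```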

5 Practical Takeaways

  1. Automation of Repetitive Tasks: Skills enable the automation of any repeatable workflow, eliminating the need to provide the same complex set of instructions, examples, and constraints repeatedly.

  2. Built-in Document Creation: Anthropic provides useful pre-built skills for immediately creating standard documents like Excel spreadsheets, PowerPoint presentations, Word documents, and PDFs.

  3. Cross-Platform Portability: Once created and uploaded, skills are portable and available everywhere, including Claude apps, Claude Code, API, desktop, and mobile.

  4. Expertise Packaging: Skills are effective containers for bundling comprehensive resources, including instructions, full documentation, examples, and code files, making the agent robustly self-sufficient for that specific workflow.

  5. Interactive Skill Building: Users can leverage the built-in "skill-creator" skill within Claude to interactively generate the structure and content needed for new custom skills.

5 Actionable Opportunities

  1. Define a Custom Analysis Framework: Create a custom skill to enforce internal analysis standards or reporting formats, ensuring every output adheres to specific brand guidelines or methodology.

  2. Leverage the Skill Creator: Immediately activate and use the "skill-creator" built-in skill to guide the creation of your first custom workflow (e.g., for YouTube script writing or brand voice adherence).

  3. Bundle Internal Code Libraries: If your team uses proprietary or specialized code scripts for data processing, bundle these scripts and the necessary execution instructions into a skill ZIP file for Claude to execute reliably.

  4. Standardize Context: Document complex, necessary context—such as legal disclaimers, compliance requirements, or extensive corporate history—into a skill so it is invoked only when relevant, saving tokens in general usage.

  5. Enable Built-in Skills: Go to Claude Settings > Capabilities and enable available built-in skills (like the Slack GIF creator or document creators) to familiarize yourself with how they operate.

Remote Labor Index Analysis

Scale AI and CAIS published The Remote Labor Index, a study that evaluates frontier AI agents on real-world freelance projects spanning game development, product design, architecture, data analysis, and video animation. The benchmark includes over 6,000 hours of work valued at $140,000+, with projects costing up to $10,000 and taking over 100 hours to complete.

While the findings are interesting, the bigger story is that this may become an important benchmark for evaluating autonomous AI labor, so it's worth understanding how the study measures performance. Here are the key design choices:

High Quality Standards

The benchmark holds AI agents to the standard of real commissioned work: a deliverable counts as a success only if it would be "accepted by a reasonable client as commissioned work." This binary metric doesn't capture partial completion or near-misses. The Elo scoring reveals models are making "steady progress" even when failing the binary threshold, suggesting many deliverables are approaching, but not quite reaching, acceptable quality.
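For context on that last point: Elo-style scores come from head-to-head comparisons between deliverables rather than pass/fail judgments. As a general illustration (this is the standard Elo expected-score formula, not necessarily the paper's exact scoring variant), the expected score of agent A against agent B is:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}
```

Ratings rise and fall with each comparison won or lost, so an agent whose deliverables keep improving sees its rating climb even while it still fails the binary "accepted as commissioned work" test.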

No Revisions

This is an extremely high bar; human freelancers themselves have revision rates and quality issues. If the evaluation allowed for iterative feedback (as real freelance work does), success rates might improve significantly. The study appears to measure fully autonomous completion without human guidance. Real-world AI deployment typically involves human-AI collaboration, where humans provide clarification, course-correction, and quality checks.

High Complexity

The benchmark includes projects costing $10,000+ and requiring 100+ hours—these represent the most complex tier of freelance work. If the dataset is weighted toward these extremely difficult projects rather than typical freelance tasks, it may underrepresent AI's utility for more common, shorter-duration projects. While the study covers 23 work categories, it deliberately excluded categories where "most projects...can already be solved by AIs" (notably content writing). The median project cost is $200 with completion time of 11.5 hours—representing mid-to-high complexity professional work.

Domain Successes

AI succeeded on "audio editing, mixing and production tasks...image-generation tasks...report writing...code for interactive data visualization." These successes indicate strong domain-specific capability that might generalize if agents could better recognize project types and route to appropriate specialized models/tools—a systems integration problem rather than a fundamental capability limit.

Surprising Insights

  1. Despite AI saturating research benchmarks, the best-performing agent automated only 2.5% of projects—a 97.5% failure rate on economically valuable tasks

  2. Projects span such complexity and breadth that even state-of-the-art systems like GPT-5, Claude Sonnet 4.5, and Gemini 2.5 Pro perform near zero

Practical Takeaways

  1. Current AI agents cannot reliably replace freelancers for multi-step, creative remote work

  2. Organizations should temper expectations about immediate AI-driven workforce displacement

  3. Progress is measurable, enabling stakeholders to track automation trajectory proactively

Opportunities (Most Ambitious → Most Actionable)

  1. Develop hybrid human-AI workflows for complex projects exceeding 100 hours

  2. Create specialized agents for narrow task categories within these project types

  3. Build quality-control systems that identify which subtasks AI can reliably handle

  4. Establish benchmarking protocols in your organization using RLI methodology

  5. Audit current AI tools against actual project requirements before deployment decisions

Radical Candor

"By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it."

Eliezer Yudkowsky

Thank You!
