Skip to content

Research Methodology

This guide is the product of systematic research, not opinions. Here’s how we arrived at every recommendation.

We conducted extensive web research across:

  • Official documentation: Anthropic’s Claude Code docs, Microsoft Copilot Workspace docs, Cursor docs, context engineering guides, and the 2026 Agentic Coding Trends Report
  • Academic papers: TDAD (Test-Driven Agentic Development), TDFlow, ACM studies on AI code quality, ArXiv papers on multi-agent systems
  • Industry reports: McKinsey’s State of AI, CodeScene benchmarks, enterprise case studies (TELUS, Zapier, Rakuten)
  • Community knowledge: HumanLayer’s Advanced Context Engineering guide, Simon Willison’s Agentic Engineering Patterns, GitHub’s Spec-Driven Development toolkit
  • Practitioner blogs: Thoughtworks, InfoQ, The New Stack, Google Developers Blog

Over 20 authoritative sources were deeply analyzed, with key findings extracted and cross-referenced.

We ran controlled experiments using sub-agents to benchmark different approaches:

Experiment 1: Prompting Approaches

  • Compared minimal, context-rich, and spec-driven+TDD prompts on identical tasks
  • Measured completeness, edge case handling, test coverage, and code quality
  • Finding: Spec-driven+TDD outperformed minimal by 2-4x across all dimensions

Experiment 2: Context Management Strategies

  • Compared monolithic agent configuration files, hierarchical context, and progressive disclosure+FIC
  • Measured instruction adherence, context efficiency, and error rates
  • Finding: Progressive disclosure with FIC achieved the best balance of quality and efficiency

Experiment 3: Multi-Agent Orchestration

  • Compared single agent, hierarchical, and pipeline patterns
  • Measured token efficiency, quality, and context purity
  • Finding: Hierarchical is the best default; pipeline for quality-critical work

Research findings were cross-referenced and synthesized into actionable recommendations. Where sources disagreed, we noted the disagreement and provided guidance on when each approach applies.

Recommendations were validated against:

  • Anthropic’s official best practices documentation
  • Real-world case studies with published metrics
  • Academic benchmarks with reproducible results
  • Internal experiments with measurable outcomes
SourceTypeKey Contribution
Anthropic: Effective Context EngineeringOfficial documentationContext engineering principles, compaction strategies
Claude Code Best PracticesOfficial documentation (Claude Code)Agent configuration files, sub-agents, verification patterns
Anthropic: Eight Trends 2026Industry reportMarket data, enterprise case studies
TDAD PaperAcademic researchTDD regression data, prompting paradox discovery
HumanLayer: Advanced Context EngineeringCommunity guideFIC methodology, phase-based workflows
SourceTypeKey Contribution
CodeScene: Agentic AI PatternsIndustry researchSix operational patterns, code health metrics
HumanLayer: Writing a Good CLAUDE.mdCommunity guide (Claude Code-focused)Configuration file length research, progressive disclosure
GitHub: Spec-Driven DevelopmentIndustry guideMarkdown-as-code patterns
Tweag: Agentic TDD HandbookCommunity guideTDD workflow patterns for agents
InfoQ: Prompts to ProductionIndustry articleOrchestration patterns, capability matrices
Will Larson: Context CompactionPractitioner blogVirtual file abstraction, compaction triggering
Microsoft: Agent Orchestration PatternsOfficial documentationSequential, concurrent, hierarchical patterns
Google: Context-Aware Multi-AgentOfficial documentationProduction multi-agent architecture
  • Tool-specific: Many recommendations apply broadly to any agentic coding tool; others were validated specifically against particular tools and may need adaptation. See the Tool Configuration Reference for guidance on applying these practices in your tool.
  • Rapidly evolving: The agentic AI landscape changes monthly. Recommendations valid in March 2026 may need updating.
  • Context-dependent: No single approach works for all projects. Recommendations are guidelines, not rules.
  • Experiment scale: Our comparative experiments are illustrative, not statistically rigorous large-scale studies. They complement (not replace) the academic research cited.
  1. Start with the principles — they change least frequently
  2. Adapt the techniques — to your specific project, team, and tooling
  3. Measure your own results — using the metrics framework
  4. Update your practices — as the field evolves