agentic-engineering

Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing. Use when AI agents perform most implementation work and humans enforce quality and risk controls.


The full skill

---
name: agentic-engineering
description: >
  Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
  Use when AI agents perform most implementation work and humans enforce quality and risk controls.
metadata:
  origin: ECC
---

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define capability eval and regression eval.
2. Run baseline and capture failure signatures.
3. Execute implementation.
4. Re-run evals and compare deltas.

**Example workflow:**

```
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
```

## Task Decomposition

Apply the 15-minute unit rule:

- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition

**Good decomposition:**

```
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
```

**Bad decomposition:**

```
Task: Add user authentication (2 hours, multiple risks)
```

## Model Routing

Choose model tier based on task complexity:

- **Haiku**: Classification, boilerplate transforms, narrow edits
  - Example: Rename variable, add type annotation, format code
- **Sonnet**: Implementation and refactors
  - Example: Implement feature, refactor module, write tests
- **Opus**: Architecture, root-cause analysis, multi-file invariants
  - Example: Design system, debug complex issue, review architecture

**Cost discipline:** Escalate model tier only when lower tier fails with a clear reasoning gap.

## Session Strategy

- **Continue session** for closely-coupled units
  - Example: Implementing related functions in same module
- **Start fresh session** after major phase transitions
  - Example: Moving from implementation to testing
- **Compact after milestone completion**, not during active debugging
  - Example: After feature complete, before starting next feature

## Review Focus for AI-Generated Code

Prioritize:

- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk

Do not waste review cycles on style-only disagreements when automated format/lint already enforce style.

**Review checklist:**

- [ ] Edge cases handled (null, empty, boundary values)
- [ ] Error handling comprehensive
- [ ] Security assumptions validated
- [ ] No hidden coupling between modules
- [ ] Rollout risk assessed (breaking changes, migrations)

## Cost Discipline

Track per task:

- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome

**Example tracking:**

```
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
```

## When to Use This Skill

- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs

## Integration with Other Skills

- **tdd-workflow**: Combine with eval-first loop for test-driven development
- **verification-loop**: Use for continuous validation during implementation
- **search-first**: Apply before implementation to find existing solutions
- **coding-standards**: Reference during code review phase
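
As a rough illustration of the Model Routing and Cost Discipline sections above, the following Python sketch shows one way the tier choice, the escalation rule, and the per-task tracking record might fit together. The task-kind names, the `route` and `escalate` helpers, the `TaskRecord` fields, and the fallback to the middle tier for unknown kinds are all assumptions made for this example; the skill itself does not prescribe any of them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Model tiers named in Model Routing, ordered cheapest first."""
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Hypothetical task kinds mapped to the tier the skill suggests for them.
DEFAULT_TIER = {
    "classification": Tier.HAIKU,
    "boilerplate": Tier.HAIKU,
    "narrow_edit": Tier.HAIKU,
    "implementation": Tier.SONNET,
    "refactor": Tier.SONNET,
    "architecture": Tier.OPUS,
    "root_cause": Tier.OPUS,
    "multi_file_invariant": Tier.OPUS,
}


@dataclass
class TaskRecord:
    """Per-task fields listed under Cost Discipline."""
    task: str
    tier: Tier
    input_tokens: int = 0
    output_tokens: int = 0
    retries: int = 0
    minutes: float = 0.0
    success: bool = False


def route(kind: str) -> Tier:
    """Pick a starting tier; unknown kinds fall back to the middle tier (an assumption)."""
    return DEFAULT_TIER.get(kind, Tier.SONNET)


def escalate(current: Tier, failed: bool, reasoning_gap: bool) -> Tier:
    """Move up one tier only when the lower tier failed with a clear reasoning gap."""
    if failed and reasoning_gap and current < Tier.OPUS:
        return Tier(current + 1)
    return current


# Mirrors the "Example tracking" entry: a Sonnet task with one retry.
record = TaskRecord(task="Implement user login", tier=route("implementation"))
record.input_tokens, record.output_tokens = 5000, 2000
record.retries = 1
record.minutes = 8
record.success = True
```

Keeping escalation as a separate function makes the cost rule explicit: a failure alone does not justify a more expensive tier, only a failure attributed to a reasoning gap does.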