Project Plan: The Contextual Debt Evaluator - AgentBeats Green Phase

Welcome, team! This document is our strategic guide to winning the AgentBeats Green Phase. It covers our vision, our plan for closing the gaps identified in the project, the new architecture, team roles, and a week-by-week timeline to the submission deadline. Let's build something amazing together.

1. Project Vision & Competitive Edge

Our project is a Green Agent: an evaluator that targets a critical problem in AI-generated code, namely the lack of discernible human intent, which we call "Contextual Debt." Many benchmarks measure only whether generated code works. Our agent asks a further question: is the code good? Is it well-architected, clearly reasoned, and thoroughly tested? These three questions correspond to the three analyzer workers (rationale, architecture, testing) in our proposed architecture.

Our competitive edge is the "Agent-as-a-Judge" approach. Rather than a static tool that scores a finished artifact, we are building an agentic system that evaluates both the process and the quality of an AI agent's work. This aligns with the competition's goal of creating "shared public goods" that push the entire field of agentic AI forward. We have a strong foundation, and with a focused effort we can turn this concept into a winning submission.

2. Our Mission: Closing the Gaps

The recent Gap Analysis Report gave us a clear, actionable roadmap. Our mission is to systematically address every "Area for Improvement" identified in that report. Here is our plan:

3. The New Architecture: From Monolith to Multi-Agent

To address the architectural gap, we are moving to a more powerful and scalable multi-agent design. The diagrams below illustrate this evolution.

Current Architecture

This is what our system looks like now: a single agent making a direct call to an LLM.


```mermaid
graph TD
    A[Green Agent Server] --> B{RationaleDebtAnalyzer};
    B --> C[LLM API];
    C --> B;
    B --> A;
```
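To make the baseline concrete, the current flow can be sketched as one analyzer that builds a prompt, calls the LLM, and validates the reply. This is a hypothetical sketch, not the repository's code: the `LlmClient` interface, the prompt text, and the `{ score, rationale }` reply shape are all assumptions. Injecting the client keeps the analyzer testable without network access, and validating the reply guards against malformed LLM output.

```typescript
// Hypothetical minimal interface over whichever LLM SDK the server actually uses.
interface LlmClient {
  complete(prompt: string): Promise<string>;
}

// Assumed result shape: a 0-100 score plus a short justification.
interface RationaleResult {
  score: number;
  rationale: string;
}

// Validate the LLM reply instead of trusting it, and clamp the score to 0-100.
function parseRationale(raw: string): RationaleResult {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("LLM reply is not a JSON object");
  }
  const { score, rationale } = parsed as Record<string, unknown>;
  if (typeof score !== "number" || typeof rationale !== "string") {
    throw new Error("LLM reply lacks a numeric score or a string rationale");
  }
  return { score: Math.min(100, Math.max(0, score)), rationale };
}

// Monolithic flow: server -> RationaleDebtAnalyzer -> LLM API -> back to the server.
class RationaleDebtAnalyzer {
  private readonly llm: LlmClient;

  constructor(llm: LlmClient) {
    this.llm = llm; // injected so tests can substitute a stub for the real API
  }

  async analyze(code: string): Promise<RationaleResult> {
    const prompt =
      "Rate from 0 to 100 how clearly this code expresses its design intent. " +
      'Reply only with JSON: {"score": number, "rationale": string}.\n\n' +
      code;
    return parseRationale(await this.llm.complete(prompt));
  }
}

// Usage with a stubbed client (no network access needed).
const stub: LlmClient = {
  complete: async () => '{"score": 72, "rationale": "Names reflect intent"}',
};
new RationaleDebtAnalyzer(stub)
  .analyze("function add(a, b) { return a + b; }")
  .then((result) => console.log(result.score)); // 72
```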

Proposed Architecture

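This is what we will build: a team of collaborating agents. An Orchestrator Agent (1) decomposes each evaluation task across three specialized workers, (2) collects their reports, and (3) aggregates them into a final report. The Rationale Analyzer queries the LLM, the Architecture Analyzer runs a static analysis tool, and the Testing Analyzer executes code in a test sandbox whose access is governed by Auth0 FGA. An audit logger records the activity of all three workers.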


```mermaid
graph TD
    subgraph Green Agent System
        A[API Endpoint] --> B[Orchestrator Agent];
        B --> |1. Decompose Task| C{Rationale Analyzer Worker};
        B --> |1. Decompose Task| D{Architecture Analyzer Worker};
        B --> |1. Decompose Task| E{Testing Analyzer Worker};

        C --> F[LLM API];
        D --> G[Static Analysis Tool];
        E --> H[Test Execution Sandbox];

        subgraph Security & Audit
            I(Auth0 FGA) -.-> H;
            J[Audit Logger] <-.-> C;
            J <-.-> D;
            J <-.-> E;
        end

        F --> C;
        G --> D;
        H --> E;

        C --> |2. Report Results| B;
        D --> |2. Report Results| B;
        E --> |2. Report Results| B;

        B --> |3. Aggregate & Finalize| K[Final Report];
    end
```
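The orchestrator's three numbered steps (decompose, report, aggregate) can be sketched as a parallel fan-out and fan-in over a shared worker interface. All names here are hypothetical: the `AnalyzerWorker` interface, the `WorkerReport` message shape, the 0-100 score scale, and the `WEIGHTS` values are placeholders for the real worker protocol and scoring rubric.

```typescript
// Hypothetical message protocol between the Orchestrator Agent and its workers.
type Dimension = "rationale" | "architecture" | "testing";

interface EvaluationTask {
  taskId: string;
  code: string;
}

interface WorkerReport {
  dimension: Dimension;
  score: number; // assumed 0-100 scale
  findings: string[];
}

interface AnalyzerWorker {
  dimension: Dimension;
  analyze(task: EvaluationTask): Promise<WorkerReport>;
}

interface FinalReport {
  taskId: string;
  overallScore: number | null; // null when no worker reported
  reports: WorkerReport[];
  failures: { dimension: Dimension; error: string }[];
}

// Placeholder weights; the real values belong to the scoring rubric.
const WEIGHTS: Record<Dimension, number> = { rationale: 0.4, architecture: 0.3, testing: 0.3 };

async function orchestrate(task: EvaluationTask, workers: AnalyzerWorker[]): Promise<FinalReport> {
  // 1. Decompose: send the task to every worker in parallel. Each promise is
  //    settled individually so one failing worker cannot abort the others.
  const outcomes = await Promise.all(
    workers.map((w) =>
      w.analyze(task).then(
        (report) => ({ ok: true as const, report }),
        (err: unknown) => ({ ok: false as const, dimension: w.dimension, error: String(err) }),
      ),
    ),
  );

  // 2. Report results: split successful reports from failures.
  const reports: WorkerReport[] = [];
  const failures: FinalReport["failures"] = [];
  for (const o of outcomes) {
    if (o.ok) reports.push(o.report);
    else failures.push({ dimension: o.dimension, error: o.error });
  }

  // 3. Aggregate: weighted mean over the dimensions that reported, renormalized
  //    so a missing dimension does not pull the overall score toward zero.
  const totalWeight = reports.reduce((sum, r) => sum + WEIGHTS[r.dimension], 0);
  const weightedSum = reports.reduce((sum, r) => sum + WEIGHTS[r.dimension] * r.score, 0);
  const overallScore = totalWeight === 0 ? null : weightedSum / totalWeight;

  return { taskId: task.taskId, overallScore, reports, failures };
}

// Usage: two stub workers succeed and the testing worker fails.
const stubWorker = (dimension: Dimension, score: number): AnalyzerWorker => ({
  dimension,
  analyze: async () => ({ dimension, score, findings: [] }),
});
const failingWorker: AnalyzerWorker = {
  dimension: "testing",
  analyze: async () => {
    throw new Error("sandbox timeout");
  },
};
orchestrate({ taskId: "demo", code: "" }, [
  stubWorker("rationale", 80),
  stubWorker("architecture", 60),
  failingWorker,
]).then((r) => console.log(r.overallScore, r.failures)); // about 71.43, plus one "testing" failure
```

Settling each worker individually is a deliberate design choice. A sandbox timeout in the testing worker still yields a partial report with the failure recorded, instead of discarding the rationale and architecture results.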

4. Our Team: Roles & Responsibilities

This project is a team effort. We have designed eight roles so that every member owns a critical area, and each role doubles as a learning path for developing specific, valuable skills.

| Role | Responsibilities | Key Skills to Develop |
|---|---|---|
| Team Lead | Oversee the entire project, ensuring alignment with the plan; manage the technical integration of all modules; lead weekly check-ins and faculty advisor meetings. | Project Management, Technical Leadership, System Architecture |
| Narrative & Pitch Strategist *(Non-Coding)* | Own the project's story and develop the "Cyber-Sentinel Agent" narrative; write the final submission document and pitch script; research competing benchmarks. | Strategic Communication, Technical Writing, Market Analysis |
| Documentation & Project Coordinator *(Non-Coding)* | Manage and update the official PROJECT_PLAN.md and README.md; document our architecture and API; prepare agendas for weekly meetings. | Project Coordination, Documentation Management, Technical Writing |
| UX & Benchmark Designer *(Non-Coding)* | Define the specific cybersecurity tasks for our benchmark; design the scoring rubric for our analyzers; ensure the evaluation criteria are clear, fair, and robust. | User Experience (UX) Design, System Design, Critical Thinking |
| Auth0 Security Specialist *(Coding)* | Lead the integration of Auth0 for AI Agents; implement the FGA model for securing agent tools; become our team's expert on agent security. | API Integration, Security Engineering, Identity & Access Management |
| Core Logic Developer (Architecture) *(Coding)* | Fully implement the architecturalDebtAnalyzer; integrate a static analysis library (e.g., escomplex); develop the logic for scoring architectural quality. | TypeScript, Data Analysis, Software Quality Metrics |
| Core Logic Developer (Testing) *(Coding)* | Fully implement the testingDebtAnalyzer; build the sandboxed environment for running tests; develop the logic for scoring test coverage and quality. | TypeScript, Sandboxing (isolated-vm), Test Automation |
| Agent & Infrastructure Engineer *(Coding)* | Build the "Orchestrator Agent" and the worker communication protocol; implement the structured audit logger; manage the server infrastructure and API endpoints. | Node.js, System Architecture, API Design, Agentic Patterns |
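The FGA model (which guards the test sandbox in the proposed architecture) and the structured audit logger meet at a single point: a worker's tool call. The sketch below models that with an in-memory set of (user, relation, object) tuples standing in for Auth0 FGA. It does not use any Auth0 SDK, and the `can_execute` relation, the `agent:`/`tool:` naming, and the `AuditEntry` fields are hypothetical. In the real integration, `InMemoryFga.check` would be replaced by a call to the FGA service.

```typescript
// A relationship tuple in the (user, relation, object) form that FGA-style models use.
interface Tuple {
  user: string;
  relation: string;
  object: string;
}

// In-memory stand-in for an FGA store: exact tuple matches only, no inherited relations.
class InMemoryFga {
  private readonly tuples = new Set<string>();

  private static key(t: Tuple): string {
    return JSON.stringify([t.user, t.relation, t.object]); // collision-free composite key
  }

  write(t: Tuple): void {
    this.tuples.add(InMemoryFga.key(t));
  }

  check(t: Tuple): boolean {
    return this.tuples.has(InMemoryFga.key(t));
  }
}

// One structured, JSON-serializable audit record per attempted tool call.
interface AuditEntry {
  timestamp: string;
  agent: string;
  tool: string;
  decision: "allowed" | "denied";
}

class AuditLogger {
  readonly entries: AuditEntry[] = [];

  record(entry: AuditEntry): void {
    this.entries.push(entry);
    console.log(JSON.stringify(entry)); // emit one JSON line per event
  }
}

// Check permission, log the decision either way, and run the tool only if allowed.
async function callTool<T>(
  fga: InMemoryFga,
  audit: AuditLogger,
  agent: string,
  tool: string,
  run: () => Promise<T>,
): Promise<T> {
  const allowed = fga.check({ user: `agent:${agent}`, relation: "can_execute", object: `tool:${tool}` });
  audit.record({
    timestamp: new Date().toISOString(),
    agent,
    tool,
    decision: allowed ? "allowed" : "denied",
  });
  if (!allowed) throw new Error(`agent:${agent} may not execute tool:${tool}`);
  return run();
}

// Usage: only the testing worker is granted access to the test sandbox.
const fga = new InMemoryFga();
const audit = new AuditLogger();
fga.write({ user: "agent:testing-worker", relation: "can_execute", object: "tool:test-sandbox" });

callTool(fga, audit, "testing-worker", "test-sandbox", async () => "tests ran")
  .then(() => callTool(fga, audit, "rationale-worker", "test-sandbox", async () => "tests ran"))
  .catch((err: unknown) => console.log(err instanceof Error ? err.message : err));
```

Logging denied attempts as well as allowed ones is intentional: denials are exactly the events a security review of agent behavior needs to see.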

5. The Blueprint: Modular Work Breakdown

We will tackle the project in focused, modular workstreams.

Implementation status summary

Below is a short summary of where the repository stands relative to the plan (high level):

Timeline notes (quick status)

6. Our Path to Submission: Project Timeline

We have 6 weeks until the submission deadline on December 19, 2025. Here is our week-by-week plan to get us there.

| Week | Dates | Primary Focus | Key Goals & Milestones |
|---|---|---|---|
| Week 1 | Nov 7 - Nov 13 | Foundation & Planning | Team kickoff: review and adopt this project plan; assign all team roles; finalize the "Cyber-Sentinel" narrative and benchmark tasks; weekly faculty advisor check-in. |
| Week 2 | Nov 14 - Nov 20 | Core Logic & Security Prototyping | Begin implementation of the core analyzers; Auth0 Specialist: complete the Auth0 tutorials and create a proof-of-concept FGA integration; weekly faculty advisor check-in. |
| Week 3 | Nov 21 - Nov 27 | Full Implementation | Complete v1 of all three analyzer services; Agent Engineer: build the first version of the Orchestrator agent; weekly faculty advisor check-in. |
| Week 4 | Nov 28 - Dec 4 | Integration & Testing | Integrate the core analyzers with the Orchestrator; integrate the Auth0 FGA security model into the benchmark; begin end-to-end testing; weekly faculty advisor check-in. |
| Week 5 | Dec 5 - Dec 11 | Finalization & Documentation | Freeze new feature development; focus on bug fixing, testing, and refinement; complete all project documentation; draft the submission paper and pitch script; weekly faculty advisor check-in. |
| Week 6 | Dec 12 - Dec 19 | Submission & Pitch Prep | Final review of the submission package; record the project demo video; rehearse the final pitch; SUBMIT PROJECT (Dec 19). |

7. Next Steps

Our immediate next step is for the entire team to read this document thoroughly. In our kickoff meeting, we will discuss this plan, confirm our roles, and officially begin our journey. Let's get ready to build a benchmark that will shape the future of agentic AI.