AI Mini Challenge: Crucial Things We Need to Know Before Creating AI Tools


1. Introduction

Hey everyone! I'm Dante, and I'm part of the Bante team, a group of five enthusiastic developers – Barry, Jett, Andre, Chein, and myself. We are all backend developers.

2. Situation

Before this challenge, most of us were primarily AI users. We'd interact with AI tools and marvel at their capabilities, but the inner workings remained a bit of a black box. This sparked a deep curiosity: we wanted to peel back the layers and understand everything about AI, from Large Language Models (LLMs) and embeddings to indexing, agents, and post-processing. The mini-challenge presented the perfect opportunity to learn by building, pushing us to create a practical AI tool from the ground up.

Our initial assumption was that building AI tools would be a relatively straightforward process of integrating existing APIs and models. We quickly learned that the reality was far more complex and challenging, especially when it came to achieving high accuracy for a specific use case.

3. Task

For the challenge, we chose a practical and highly relevant topic: creating a knowledge base for our GitHub repository. The goal was to build an AI tool that could answer questions about the codebase and review Pull Requests (PRs). We leveraged llama-index as our core framework, specifically for its strong support for code indexing and its powerful agent capabilities.

The problem we aimed to solve was twofold:

  1. Accelerate Newcomer Onboarding: Help new team members understand the project faster.

  2. Streamline Development Workflow: Assist technical and non-technical members with code reviews and documentation.

4. Actions/Crucial Steps

Our development journey was carefully structured around these crucial initial steps:

  1. Enhance Knowledge About AI: We dedicated significant time to learning the core principles of AI. This was a non-negotiable step, as we needed a solid foundation to build, troubleshoot, and improve our solution effectively.

  2. Define the Problem and Solution: We carefully scoped our project to avoid getting sidetracked. We chose to build our solution without relying on overly simplistic, off-the-shelf components, forcing us to learn more about the underlying mechanisms.

  3. Organize and Plan: With five members, meticulous planning was essential. We broke down the project into manageable tasks, assigned responsibilities, and set clear goals for each meeting, ensuring we maintained a steady and organized development pace.

5. Results

Our methodical approach paid off. We found that our team operated smoothly and efficiently, and our knowledge base continuously improved over time. The key lesson here is to take action and start building, even if you don't feel 100% confident. With a collaborative spirit, you can build something substantial.

The reality, however, was a stark contrast to our initial expectations. We discovered a significant accuracy gap between our custom-built tool and commercial AI products. Achieving consistent, high-quality responses for complex code-related queries was a major challenge, and we quickly learned that building a perfect AI tool is a journey of continuous refinement, not a one-time event.

How We Built Our AI Agent

Our architecture is centered around a robust AI agent. Here’s a brief overview of our system, which we meticulously designed to optimize for accuracy and context; rough code sketches of the ingestion pipeline, the agent wiring, and the API layer follow the list below:

  • Data Ingestion and Indexing: This was a core focus of our project. We didn't just dump raw code into an index. Instead, we developed a sophisticated ingestion pipeline:

    • We used PyGithub to pull source code and documentation from our repository.

    • For code files, we leveraged tree-sitter-languages to parse the code's Abstract Syntax Tree (AST). This allowed us to understand the actual structure of the code, rather than treating it as plain text. This semantic understanding was key to generating effective code-related answers.

    • For other documentation, such as Markdown files and PDFs, we prepared them for chunking and transformation into llama-index documents.

    • The processed documents were then used to create a Vector Index, which is essential for semantic search.

    • Additionally, we created a Keyword Index to capture important terms and concepts within the codebase, which was crucial for increasing search accuracy for specific queries. We also indexed the repository's file structure to provide the agent with a better understanding of the project's layout.

  • Core Services: Our system includes several key services orchestrated by our AI agent:

    • Memory Service: Manages conversation memory, storing chat history in a JSON format to ensure context across sessions.

    • Vector Search Service: Handles retrieving relevant code snippets from the vector store based on a user's query.

    • LLM Service: Integrates with an LLM (like Azure OpenAI) to generate responses.

  • Agentic Capabilities: We built a ReActAgent that intelligently orchestrates these services. The agent's tools include the ability to perform both vector and keyword searches, analyze the repository structure, and access a memory store. For example, when asked to "analyze a PR," the agent can use its tools to fetch the PR data, search both the vector and keyword indexes for relevant code and context, and then synthesize a detailed analysis.

  • API and Web UI: We exposed our functionalities through a FastAPI backend and a simple web interface for easy interaction. This included endpoints for general questions, memory-based chat, agent-enhanced chat, and PR analysis.
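
To make the ingestion and indexing step more concrete, here is a minimal sketch, assuming llama-index 0.10+ (whose CodeSplitter relies on tree-sitter under the hood) and PyGithub; the repository name, token variable, and file filters are hypothetical, and the embedding/LLM settings are assumed to be configured elsewhere.

```python
import os

from github import Github
from llama_index.core import Document, SimpleKeywordTableIndex, VectorStoreIndex
from llama_index.core.node_parser import CodeSplitter, SentenceSplitter

# Pull source code and documentation from the repository with PyGithub.
gh = Github(os.environ["GITHUB_TOKEN"])   # hypothetical token variable
repo = gh.get_repo("our-org/our-repo")    # hypothetical repository name

code_docs, text_docs = [], []
stack = list(repo.get_contents(""))
while stack:
    item = stack.pop()
    if item.type == "dir":
        stack.extend(repo.get_contents(item.path))
    elif item.path.endswith(".py"):
        code_docs.append(Document(text=item.decoded_content.decode(), metadata={"path": item.path}))
    elif item.path.endswith((".md", ".rst")):
        text_docs.append(Document(text=item.decoded_content.decode(), metadata={"path": item.path}))

# Split code along AST boundaries (via tree-sitter) and documentation by sentences.
code_nodes = CodeSplitter(language="python", chunk_lines=40).get_nodes_from_documents(code_docs)
text_nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(text_docs)

# Build the two indexes the agent searches: semantic (vector) and keyword.
vector_index = VectorStoreIndex(code_nodes + text_nodes)
keyword_index = SimpleKeywordTableIndex(code_nodes + text_nodes)
```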
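
Wiring those indexes, conversation memory, and the LLM into a ReActAgent can then look roughly like this, assuming the `vector_index` and `keyword_index` objects from the sketch above and an LLM already configured in llama-index (for example Azure OpenAI); the tool names and descriptions are illustrative.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.tools import QueryEngineTool, ToolMetadata

tools = [
    QueryEngineTool(
        query_engine=vector_index.as_query_engine(similarity_top_k=5),
        metadata=ToolMetadata(
            name="vector_search",
            description="Semantic search over indexed code and documentation.",
        ),
    ),
    QueryEngineTool(
        query_engine=keyword_index.as_query_engine(),
        metadata=ToolMetadata(
            name="keyword_search",
            description="Keyword lookup for exact identifiers and terms in the codebase.",
        ),
    ),
]

# Conversation memory keeps context across turns; llama-index can persist it as JSON.
memory = ChatMemoryBuffer.from_defaults(token_limit=4000)

agent = ReActAgent.from_tools(tools, memory=memory, verbose=True)
print(agent.chat("How does the ingestion pipeline handle Markdown files?"))
```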
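
Finally, a minimal sketch of the API layer on top, assuming the `agent` object from the previous sketch; the endpoint path and request schema are illustrative placeholders.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Repository knowledge base")

class ChatRequest(BaseModel):
    question: str

@app.post("/agent-chat")  # illustrative endpoint; the real API also exposes memory chat and PR analysis
def agent_chat(req: ChatRequest):
    # Delegate the question to the ReActAgent, which picks the right tools.
    response = agent.chat(req.question)
    return {"answer": str(response)}
```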

Measuring Progress: Our Evaluation Strategy

One of the most valuable aspects of our project was implementing our own evaluation system. This proved to be an excellent way to track if our AI agent was indeed getting "smarter" over time.

Our evaluation process involved the following steps (a simplified scoring sketch follows the list):

  • Building a Question Bank: We created a comprehensive set of questions relevant to our codebase and project documentation.

  • Defining Expected Answers and Scoring: For each question, we set up sample answers and assigned a point system to them, allowing for nuanced scoring.

  • Automated Testing: We used our query engine core (which is built directly on our sophisticated retriever and indexes) to answer the questions in the bank.

  • LLM-Based Scoring: A simple LLM was then employed to rank and assign points to the answers generated by our agent, comparing them against our predefined sample answers.
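
Here is a simplified sketch of that scoring loop, assuming `query_engine` is the engine core built on our retriever and indexes and `llm` is any llama-index LLM; the question bank entries and prompt wording are illustrative only.

```python
# Hypothetical question bank format: question, expected answer, and max points.
question_bank = [
    {
        "question": "What does the ingestion pipeline do with Markdown files?",
        "expected": "They are chunked and transformed into llama-index documents.",
        "max_points": 5,
    },
    # ... more questions ...
]

total, max_total = 0, 0
for item in question_bank:
    answer = query_engine.query(item["question"])
    prompt = (
        f"Question: {item['question']}\n"
        f"Expected answer: {item['expected']}\n"
        f"Candidate answer: {answer}\n"
        f"Score the candidate from 0 to {item['max_points']} and reply with only the number."
    )
    score = int(llm.complete(prompt).text.strip())  # assumes the judge replies with a bare number
    total += score
    max_total += item["max_points"]

print(f"Average score: {total / max_total:.0%}")
```

Constraining the judge to reply with a single number keeps the scores easy to aggregate, though in practice a parsing fallback helps when the model adds extra text.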

We were incredibly happy to see our engine core's average score increase over time. However, the journey wasn't a straight line: our evaluation logs showed the scores sometimes going up and down.

This fluctuation was incredibly insightful! Sometimes, when our team collaboratively upgraded parts of the code, it inadvertently affected the "IQ" of our engine core. This evaluation system helped us immediately realize these dips in performance and work together to fix them, ensuring our agent continuously improved.

Key Takeaways/Recommendations for Others

For anyone looking to embark on their own AI tool development journey, here are three crucial takeaways:

  • Build Your AI Knowledge Foundation: Before you write a single line of code, invest time in truly understanding the core concepts of AI. This includes learning about LLMs, embeddings, prompt engineering, and agentic design. A strong theoretical background will save you countless headaches down the line.

  • Embrace Iteration and Manage Accuracy Expectations: Don't aim for perfection from day one. Start with a small, functional MVP and iterate continuously. Be prepared for the significant challenge of achieving high accuracy with custom AI tools; it's a journey of continuous refinement, not a one-time build.

  • Prioritize Collaboration, Data Quality, and Continuous Evaluation: When working in a team, clear communication and task organization are vital. Simultaneously, recognize that the quality of your data directly dictates your AI's performance. Investing time in advanced ingestion strategies, like using an AST parser for code and multiple index types, will yield far better results than any amount of clever coding alone. Crucially, implement a robust, automated evaluation system from the start. It's your best tool for tracking progress, identifying regressions caused by new features or changes, and ensuring your agent truly gets smarter over time.
     
