Large language models (LLMs) are incredibly useful for a range of tasks. While developing applications powered by LLMs may seem intimidating at first, you really only need:
- Basic programming skills, ideally in Python or TypeScript.
- A handful of mundane tasks or challenges you’d like to simplify.
Building LLM applications entails running and interacting with LLMs while connecting to various data sources, such as local files, APIs, and databases. Below is a collection of tools and frameworks that can aid you in creating LLM applications:
- Programming Languages: Python, TypeScript
- Frameworks: LangChain, LlamaIndex
- APIs: OpenAI API, Cohere API
- Running LLMs: Ollama, Llamafile
- Vector Databases: ChromaDB, Weaviate, Pinecone, among others
In this guide, we’ll explore seven exciting projects that you can develop using LLMs. As you progress, you’ll gain experience with vector databases, frameworks, and essential APIs, supported by learning resources and project examples to help you get started. Let’s dive in!
1. Retrieval-Based Q&A App for Technical Documentation
Create a Q&A system tailored for developers that utilizes RAG (Retrieval-Augmented Generation) to extract information from various technical documents, Stack Overflow, or internal knowledge bases. Such an application can summarize complex concepts and answer specific queries efficiently.
Key components include:
- The RAG framework for document retrieval.
- Open-source LLMs for interpreting queries and generating responses.
- API integrations for external resources like Stack Overflow and Confluence.
This application aims to give developers immediate, trustworthy answers without having to sift through extensive documentation. That is especially useful for heavily documented frameworks such as Django.
To learn about RAG, refer to LangChain: Chat with Your Data by DeepLearning.AI and Learn RAG From Scratch.
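Here is a bare-bones sketch of the retrieval-then-generation loop using ChromaDB and the OpenAI API (both from the tools list above). It assumes the `chromadb` and `openai` packages are installed and `OPENAI_API_KEY` is set; the model name and the document snippets are placeholders, and a real app would load and chunk actual documentation (LangChain or LlamaIndex can handle that part):

```python
# Minimal RAG sketch: ChromaDB for retrieval, an OpenAI chat model for generation.
import chromadb
from openai import OpenAI

llm = OpenAI()
chroma = chromadb.Client()
collection = chroma.create_collection("docs")

# Placeholder "documentation"; in practice you would chunk real docs here.
collection.add(
    documents=[
        "Django models map Python classes to database tables.",
        "Django views receive HTTP requests and return responses.",
    ],
    ids=["doc1", "doc2"],
)

def answer(question: str) -> str:
    # Retrieve the most relevant chunks, then let the LLM answer from them only.
    hits = collection.query(query_texts=[question], n_results=2)
    context = "\n".join(hits["documents"][0])
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("How do Django models relate to database tables?"))
```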
2. LLM-Powered Workflow Automation Agent
Develop an agent that automates repetitive tasks from natural language instructions. The agent should work through a sequence of steps, either predefined or planned on the fly, to reach the desired outcome.
Possible tasks include:
- Creating project folders.
- Setting up Git repositories.
- Generating project dependency files.
Essential components, aside from the LLM, are:
- API integrations with tools such as Docker, Git, and AWS.
- An engine to run LLM-generated scripts.
Iterating on your initial version can turn it into a genuinely helpful application that takes administrative work off developers' and teams' plates, letting them concentrate on more valuable work.
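As a rough sketch of the "engine to run LLM-generated scripts" piece, the snippet below has the model propose shell commands for a task and only runs them after a human confirms. It assumes the `openai` package and an API key; the model name and the example task are placeholders, and executing model-generated commands without a sandbox is only acceptable while experimenting:

```python
# Hedged sketch: the LLM proposes shell commands, a human approves, subprocess runs them.
import subprocess
from openai import OpenAI

client = OpenAI()

def propose_commands(task: str) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[
            {"role": "system",
             "content": "Return only shell commands, one per line, no commentary."},
            {"role": "user", "content": task},
        ],
    )
    return [line for line in response.choices[0].message.content.splitlines() if line.strip()]

commands = propose_commands(
    "Create a folder 'demo-app', initialise a Git repository in it, "
    "and add an empty requirements.txt"
)
print("Proposed commands:\n" + "\n".join(commands))
if input("Run these? [y/N] ").lower() == "y":
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)  # execute each approved command
```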
3. Text-to-SQL Query Generator
Expressing business questions in natural language is usually straightforward; translating them into SQL often is not. Building a text-to-SQL generator can bridge this gap.
Your app should:
- Convert user input into SQL queries conforming to a given database schema.
- Execute these queries against a connected database to retrieve relevant results.
For a practical approach, follow the End-To-End Text-To-SQL LLM App walkthrough by Krish Naik.
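For a sense of the core loop before diving into the walkthrough, here is a stripped-down sketch: the database schema goes into the prompt, the model returns a query, and the query runs against SQLite. The `openai` package, the model name, and the toy `orders` table are all assumptions, and a production app should validate generated SQL before executing it:

```python
# Text-to-SQL sketch: schema in the prompt, model-generated SQL executed against SQLite.
import sqlite3
from openai import OpenAI

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, created_at TEXT);
INSERT INTO orders VALUES (1, 'Acme', 120.0, '2024-01-05'), (2, 'Globex', 80.5, '2024-02-11');
""")
schema = "orders(id INTEGER, customer TEXT, amount REAL, created_at TEXT)"

client = OpenAI()
question = "What is the total order amount per customer?"
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[
        {"role": "system",
         "content": f"Translate the question into a single SQLite query for this schema: "
                    f"{schema}. Return only SQL, no explanations."},
        {"role": "user", "content": question},
    ],
)
# Naive cleanup; a real app should parse and validate the SQL before running it.
sql = response.choices[0].message.content.strip().strip("`")
print(sql)
print(conn.execute(sql).fetchall())
```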
4. AI-Powered Documentation Generator for Codebases
Construct a tool capable of scanning code repositories to automatically generate detailed documentation, including summaries of functions, module explanations, and architecture overviews. This could be developed as a CLI tool or a GitHub Action.
Requirements involve:
- Integration with repository services for code scanning.
- Features for reviewing and providing feedback on generated documentation.
This tool can significantly streamline the documentation process for development teams, saving countless hours, even if the generated docs will still need a human review pass.
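A hedged sketch of the scan-and-summarize loop for a single Python file is below: `ast` extracts each function's source, the model writes a short summary, and the results are collected into Markdown. The `openai` package, the model name, and the file path are assumptions; a full tool would walk the whole repository and hook into CI or a GitHub Action:

```python
# Sketch: extract functions with ast, summarise each with an LLM, emit Markdown.
import ast
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def document_file(path: str) -> str:
    source = Path(path).read_text()
    tree = ast.parse(source)
    sections = [f"# Documentation for {path}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            snippet = ast.get_source_segment(source, node)  # the function's source code
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder
                messages=[{"role": "user",
                           "content": f"Summarise this function in one paragraph:\n{snippet}"}],
            )
            sections.append(f"## `{node.name}`\n\n{response.choices[0].message.content}")
    return "\n\n".join(sections)

print(document_file("example.py"))  # placeholder path
```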
5. AI Coding Assistant
Design an LLM-powered coding assistant that acts as a real-time pair programmer. It should offer suggestions and code snippets, debug existing code, and explain complex logic on the fly during coding sessions.
Key features to focus on include:
- An LLM that is proficient at code generation.
- Integration with IDEs, such as a VS Code extension.
- Contextual awareness of the current coding environment.
Explore the ADVANCED Python AI Agent Tutorial – Using RAG for a comprehensive guide on building a coding assistant.
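The contextual-awareness piece is the interesting part. The sketch below fakes it by reading the code around a given "cursor" line from a file and sending it along with the developer's request; in a real extension the editor would supply this context. The `openai` package, the model name, and the file path are assumptions:

```python
# Sketch of context-aware suggestions: send the code around the cursor plus the request.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def suggest(file_path: str, request: str, cursor_line: int, window: int = 20) -> str:
    lines = Path(file_path).read_text().splitlines()
    start, end = max(0, cursor_line - window), cursor_line + window
    context = "\n".join(lines[start:end])  # code surrounding the cursor
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder
        messages=[
            {"role": "system", "content": "You are a pair programmer. Suggest concise code."},
            {"role": "user", "content": f"Current code:\n{context}\n\nRequest: {request}"},
        ],
    )
    return response.choices[0].message.content

print(suggest("app.py", "Add error handling to the function under the cursor", cursor_line=42))
```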
6. Text-Based Data Pipeline Builder
Create an LLM application that allows users to define data pipelines using natural language. For instance, a user might say, “Write an ETL script to ingest a CSV file from S3, clean the data, and load it into a PostgreSQL database.” The app then generates the code for a complete ETL pipeline using tools like Apache Airflow or Prefect.
Focus areas should include:
- Support for various data sources and destinations.
- Automation of pipeline creation and scheduling.
This application will let users build and manage complex data pipelines with minimal coding effort, a significant time saving compared to writing each pipeline from scratch.
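A minimal sketch of the generation step might look like this: the model is asked to produce a complete Airflow DAG from the natural-language description, and the result is written to a file for review. The `openai` package, the model name, and the output path are assumptions; scheduling and execution would be handled by Airflow itself once the reviewed file lands in your `dags/` folder:

```python
# Sketch: natural-language pipeline description in, candidate Airflow DAG file out.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
description = ("Write an ETL script to ingest a CSV file from S3, clean the data, "
               "and load it into a PostgreSQL database, running daily.")
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[
        {"role": "system",
         "content": "Write a complete Apache Airflow DAG in Python. Return only code."},
        {"role": "user", "content": description},
    ],
)
Path("generated_dag.py").write_text(response.choices[0].message.content)
print("DAG written to generated_dag.py -- review it before deploying.")
```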
7. LLM-Powered Code Migration Tool
Consider developing a code migration tool that uses LLMs to analyze code written in one programming language and convert it into another. For example, you might migrate Python code to Go or Rust.
Key elements to experiment with include:
- Choosing the right LLMs for language translation.
- Using static analysis tools (and, ideally, tests) to check the translated code after conversion.
- Supporting different programming paradigms and constructs.
This tool can greatly facilitate the migration of legacy codebases to more modern programming languages with less manual effort.
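Here is one way to sketch the translate-then-verify loop: the model converts a small Python function to Go, and `gofmt` serves as a cheap syntax check on the output. This assumes the `openai` package, a local Go toolchain, and a placeholder model name; real verification would require compiling the code and running tests:

```python
# Sketch: translate Python to Go with an LLM, then syntax-check the result with gofmt.
import subprocess
import tempfile
from openai import OpenAI

client = OpenAI()
python_src = '''
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
'''

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[{"role": "user",
               "content": f"Translate this Python function to idiomatic Go. "
                          f"Return only Go code:\n{python_src}"}],
)
go_src = response.choices[0].message.content

with tempfile.NamedTemporaryFile(suffix=".go", mode="w", delete=False) as f:
    f.write(go_src)
    path = f.name

# gofmt parses the file; a non-zero exit code signals a syntax error in the translation.
result = subprocess.run(["gofmt", "-e", path], capture_output=True, text=True)
print("Syntax OK" if result.returncode == 0 else f"gofmt errors:\n{result.stderr}")
```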
Conclusion
That wraps it up! I hope these project ideas spark your interest.
These suggestions are a solid starting point for generating your own ideas. Once you have one functional application, you can explore further avenues such as a financial statement analyzer or a personalized research assistant built with RAG.