Discord LLM Bot


Retrieval-Augmented Generation (RAG) powered Discord bot that runs seamlessly on CPU, built on LanceDB and Llama.cpp. The bot is designed to answer questions from a knowledge base stored in a vector database.

```mermaid
graph LR
    A((User Query)) --> B((Convert to Embedding))
    B --> C((Find Similar Document<br>from Vector Database))
    C --> D((Use Retrieved Document<br>as Context to Answer Question<br>using Mistral 7B LLM))
```
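The retrieval step in the diagram can be sketched in plain Python. This is a toy illustration, not the bot's actual implementation: the embeddings are hand-written 3-dimensional vectors (the real bot would store model-generated embeddings in LanceDB), and `DOCS`, `cosine`, and `retrieve` are hypothetical names used only for this example.

```python
import math

# Toy knowledge base of (text, embedding) pairs. In the real bot the
# embeddings come from an embedding model and live in LanceDB; these
# 3-dimensional vectors exist purely to illustrate similarity search.
DOCS = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("Llama 2 7B is a 7-billion-parameter LLM.", [0.1, 0.8, 0.2]),
    ("LanceDB is an embedded vector database.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_embedding, k=1):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_embedding, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query embedding close to the "Paris" document is retrieved as context,
# then stuffed into the prompt the LLM answers from.
context = retrieve([0.85, 0.15, 0.05])[0]
prompt = f"Context: {context}\nQuestion: what is the capital of France?"
print(prompt)
```

The retrieved document is then prepended to the user's question as context, which is what lets a small local model answer from the knowledge base rather than from its weights alone.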

LLM Inference


Large Language Model (LLM) Inference API and Chatbot 🦙

Build and run an LLM chatbot in under 7 GB of GPU memory with 5 lines of code.

```python
from llm_chain import LitGPTConversationChain, LitGPTLLM
from llm_inference import prepare_weights

# Note: llama2_prompt_template is referenced below but not imported here;
# it must be defined or imported from the library's prompt templates.
path = str(prepare_weights("meta-llama/Llama-2-7b-chat-hf"))
llm = LitGPTLLM(checkpoint_dir=path, quantize="bnb.nf4")  # ~7 GB GPU memory
bot = LitGPTConversationChain.from_llm(llm=llm, prompt=llama2_prompt_template)

print(bot.send("hi, what is the capital of France?"))
```
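As a rough sanity check on the 7 GB figure, here is a hypothetical back-of-envelope calculation, assuming a 7B-parameter model and ~0.5 bytes per weight for 4-bit NF4 quantization (activations and the KV cache add overhead on top):

```python
# Back-of-envelope memory estimate for a 4-bit-quantized 7B model.
# These are assumptions for illustration, not measured numbers.
params = 7e9                    # Llama-2-7B parameter count
bytes_per_weight = 0.5          # 4 bits = 0.5 bytes (NF4 quantization)
weights_gb = params * bytes_per_weight / 1e9
print(f"~{weights_gb:.1f} GB for the quantized weights alone")
```

The weights alone come to roughly 3.5 GB, which leaves headroom within the 7 GB budget for activations, the KV cache, and framework overhead.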



An open-source AutoML Library based on PyTorch

*(image: model training demo)*



A multi-functional library for full-stack deep learning that simplifies model building, API development, and model deployment.