▮ Augmenting LLMs
One of the key questions that everyone has when utilizing LLMs is “How do we best augment LLMs with our own private data?”.
So for this post, I’d like to share the approach we can take and the tools we can utilize to achieve it.
▮ Methods
Fine-Tuning
One way to achieve this is “Fine-Tuning”. “Fine-Tuning” bakes your private data’s knowledge into the weights of the network. Before LLMs, most algorithms were optimized with this method. However, this method has the following setbacks when it comes to LLMs.
- Requires A Lot of effort to data preparation
- Lack of Transparency
In-Context-Learning
Another way to achieve this is “In-context-learning”. Instead of “pre-training” everything, you feed in your own private information via prompt to the LLM. This seems to become more mainstream compared to “fine-tuning” when it comes to LLMs, but this also have several setbacks.
- Input Token inevitably gets too long -> LLM’s input length are limited
- May be hard to find the right context for the prompt
- Hard to deal with unstructured/semi-structures/structured data
To overcome these drawbacks, we can use tools such as LlamaIndex.
▮ LlamaIndex
LlamaIndex provides an interface between your data and LLM. It consists of the following 3 components.
- Data Connectors: Connect your existing data source
- Data Indices: Structure your data
- Query Interface: Feed in an input prompt and obtain a knowledge-augmented output
Data Connectors
You can easily ingest any kind of data with just several lines of codes.
For the example below let’s connect our local data under the /book
directory
from llama_index import SimpleDirectoryReader
# Load documents from a directory
documents = SimpleDirectoryReader('book').load_data()
Data Indices & Query Interface
LlamaIndex provides a feature that ingests your connected data to abstract away common features for learning.
The data ingestion takes the following steps.
- Break your private data into “Chunks”
- Turn each chunk into Nodes(AKA INDEXES) with an associated embeddings information.
After the ingestion is complete, the query interface will..
- Receive Natural Language Query
- Convert Query into embeddings
- Get similarity_top_k nodes
- Synthesize response with the selected nodes
The overview of the whole work flow would look like the image below.
This workflow can be achieved with just 3 lines of code.
# Create an index from the previously ingested documents
index = VectorStoreIndex.from_documents(documents)
# Create a query engine from the index
query_engine = index.as_query_engine()
# Query the engine
response = query_engine.query("What is this text about?")
print(response)