
LLM customizations: Prompt Engineering, RAG & Fine-Tuning

In the rapidly evolving landscape of Large Language Models (LLMs), knowing how to tailor a model to specific objectives is indispensable. These models are great at answering questions and generating text and images. But making them respond in the tone and style you want, or answer questions about facts they were not trained on, requires some tweaking.

We will discuss three techniques that allow you to tame these LLMs.

Prompt Engineering: Guiding the LLM’s response with instructions and sample data in the query, i.e. the prompt.
Retrieval Augmented Generation (RAG): Improving the LLM’s response by dynamically pulling in information from external sources.
Fine-Tuning: Training the model on your own examples to refine and tailor its outputs.

Let’s delve deeper into these techniques.

Prompt Engineering

A prompt can be very simple like:

Prompt: What are the biggest tourist sights in Tanzania?

A prompt can also be advanced like:

Prompt: You are a travel assistant. You explain things in a way that kids can understand. If the question is not about travel don't respond. Respond in a valid JSON object. What are the biggest tourist sights in Tanzania?

Let’s dissect this advanced prompt, which has several components.

Priming the LLM: You are a travel assistant.
Tone instructions: You explain things in a way that kids can understand.
Edge case handling: If the question is not about travel don’t respond.
Output styling: Respond in a valid JSON object.
Actual query: What are the biggest tourist sights in Tanzania?
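The components above can be assembled programmatically. The following is a minimal sketch, not a prescribed method: the helper name `build_prompt` and its parameter names are assumptions made for illustration, and the resulting string would still have to be sent to whichever LLM you use.

```python
def build_prompt(question, role="", tone="", edge_case="", output_format=""):
    """Join the instruction components and the user's question into one prompt.

    Hypothetical helper: the parameter names mirror the components listed above.
    """
    parts = [role, tone, edge_case, output_format, question]
    # Skip empty components so the same helper also builds a simple prompt.
    return " ".join(part.strip() for part in parts if part and part.strip())


prompt = build_prompt(
    question="What are the biggest tourist sights in Tanzania?",
    role="You are a travel assistant.",
    tone="You explain things in a way that kids can understand.",
    edge_case="If the question is not about travel don't respond.",
    output_format="Respond in a valid JSON object.",
)
print(prompt)
```

Keeping each component in its own argument makes it easy to change the tone or the output format without touching the rest of the prompt.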

As you can see above, crafting a thoughtful prompt allows you to furnish LLMs with the essential context needed to generate the best possible responses. There are three prompting techniques:

Zero-Shot Prompting

Zero-shot prompting means asking the model to perform a task without including any examples in the prompt. The model relies only on what it learned during training, yet it can still produce the desired result. This technique is what makes large language models versatile across a wide variety of tasks.

Prompt: Who wrote The Beatles song, "Here comes the sun"?

One-Shot Prompting

In one-shot prompting, you provide a single example in the prompt.

Prompt: Tell me who wrote The Beatles song "Dear Prudence" in the following style: "Here comes the sun" was written by George Harrison.

Few-Shot Prompting

Here you provide several examples in the prompt helping the model adapt and respond.

Prompt: I like The Beatles songs "A Day in the Life", "Eleanor Rigby" and "Hey Bulldog". Which other Beatles songs would you recommend for me?
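The three techniques differ only in how many examples the prompt carries, which a small sketch can make concrete. The helper names `format_example` and `build_shot_prompt` are hypothetical, and the second and third examples are added here for illustration; they do not come from the prompts above.

```python
def format_example(title, writer):
    """Render one example in the answer style we want the model to copy."""
    return f'"{title}" was written by {writer}.'


def build_shot_prompt(examples, query_title):
    """Build a zero-, one-, or few-shot prompt depending on len(examples)."""
    lines = []
    if examples:
        lines.append("Answer in the same style as these examples:")
        lines.extend(format_example(title, writer) for title, writer in examples)
    lines.append(f'Who wrote The Beatles song "{query_title}"?')
    return "\n".join(lines)


# Illustrative examples (only the first appears in the prompts above).
examples = [
    ("Here Comes the Sun", "George Harrison"),
    ("Something", "George Harrison"),
    ("Octopus's Garden", "Ringo Starr"),
]

zero_shot = build_shot_prompt([], "Dear Prudence")
one_shot = build_shot_prompt(examples[:1], "Dear Prudence")
few_shot = build_shot_prompt(examples, "Dear Prudence")
print(few_shot)
```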

Retrieval Augmented Generation (RAG)

RAG links the LLM with external knowledge sources. It combines the model’s generative abilities with the ability to search for and integrate pertinent information from an external knowledge base. It is like letting the model consult a book while answering a question, rather than relying solely on its internal memory.

Here is how RAG works.

  1. Preparation
    • The external knowledge base is split into chunks of information.
    • These chunks are then converted into vectors (embeddings).
    • These embeddings are then saved into a vector database.
  2. Retrieval
    • The user prompt is converted into a vector (embedding).
    • The vector database is searched for this embedding, and a few (3-5) of the most relevant chunks are retrieved.
    • The user prompt and the retrieved chunks are combined into a new prompt and sent to the LLM.
    • The LLM generates a response using the information in the chunks.
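The steps above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions: the bag-of-words `embed` function stands in for a real embedding model, a Python list stands in for a vector database, the three knowledge-base sentences are sample data added here, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (an assumption standing in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1. Preparation: split the knowledge base into chunks, embed and store them.
knowledge_base = (
    "Mount Kilimanjaro is the highest mountain in Africa. "
    "Serengeti National Park is known for its annual wildebeest migration. "
    "Stone Town in Zanzibar is a UNESCO World Heritage Site."
)
chunks = re.split(r"(?<=\.)\s+", knowledge_base.strip())  # one chunk per sentence
vector_db = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in for a vector database


# 2. Retrieval: embed the query, rank the chunks, build the augmented prompt.
def retrieve(query, k=2):
    query_vec = embed(query)
    ranked = sorted(vector_db, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_rag_prompt(query, k=2):
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


query = "What is the highest mountain in Africa?"
print(build_rag_prompt(query))  # this prompt would then be sent to the LLM
```

In a real system the chunks would be larger passages, the embeddings would come from a trained model, and the similarity search would be delegated to the vector database.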


Fine-Tuning

Fine-tuning refines a pre-trained LLM through additional training on a more targeted, task-specific dataset. The goal is to adapt the model to a particular task or to improve its performance on that task. During fine-tuning, the model’s weights are adjusted based on your data, which customizes the model to the requirements of your application.

Fine-tuning can be thought of as an extension of Few-Shot Prompting in which you do not have to provide the examples in every prompt. You fine-tune the model once and then use it with simple prompts.

Here is how fine-tuning works.

  1. Pre-training: Train a language model on a massive dataset, or pick a pre-trained model such as GPT-3.5 (gpt-3.5-turbo-1106).
  2. Prepare your domain-specific dataset. For example, to fine-tune an LLM to detect a movie’s genre from its plot summary, create a dataset of labeled plot summaries. Split the dataset into two parts: one to train the model, the other to evaluate it.
  3. Perform the actual tuning of the LLM with the training dataset.
  4. Iterate and evaluate: Test the fine-tuned LLM with the evaluation dataset. Measure the accuracy, precision and recall. If the results are not satisfactory, perform more iterations. In these iterations you can expand the training dataset, refine the hyper-parameters, and adjust the optimization methods.
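Steps 2 and 4 can be illustrated without touching any model. The sketch below splits a labeled dataset into training and evaluation parts and computes accuracy, precision and recall for one genre. The function names, the 80/20 split, the placeholder dataset and the sample predictions are all assumptions made for illustration; the training step itself depends on the provider or framework you use and is not shown.

```python
import random


def train_eval_split(examples, eval_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction for evaluation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def evaluate(y_true, y_pred, positive):
    """Accuracy over all labels, plus precision and recall for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Placeholder dataset: (plot summary, genre) pairs, hypothetical.
dataset = [(f"plot summary {i}", "comedy" if i % 2 else "horror") for i in range(10)]
train_set, eval_set = train_eval_split(dataset)

# Hypothetical predictions from a fine-tuned model on four evaluation examples.
y_true = ["comedy", "horror", "comedy", "horror"]
y_pred = ["comedy", "comedy", "comedy", "horror"]
metrics = evaluate(y_true, y_pred, positive="comedy")
print(metrics)
```

Holding the evaluation set out before training matters: measuring on examples the model was tuned on would overstate its quality.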

Comparing these techniques

Prompt Engineering: Easy and intuitive. Ideal for quick experiments.
Retrieval Augmented Generation (RAG): Brings in external data sources. The model becomes more dynamic and real time.
Fine-Tuning: Controls the model’s behavior in style, tone and format. Model outputs become more predictable.

The three techniques we’ve explored are not mutually exclusive; you don’t have to choose one over the others. Consider them as three essential tools in your toolkit. Depending on the specific problem you’re tackling, you can select the most fitting tool for the task at hand. Certain challenges might benefit from the use of two tools, while others may necessitate the application of all three.

When you want to control the model’s behavior in style, tone and format you can use Prompt Engineering or Fine-Tuning or both.

When you want to enhance the model’s knowledge with external data sources, you can use Retrieval Augmented Generation or Fine-Tuning or both.

When you want to restrict the model’s output to a given context, use Prompt Engineering or Retrieval Augmented Generation or both.

For many use cases, all three techniques can be combined to produce better outputs.