Phi3-mini model fine-tuned on the python_code_instructions_18k_alpaca code instructions dataset using LoRA or QLoRA with the PEFT and bitsandbytes libraries.
Notebook: phi3-finetune-lora-pycoder.ipynb
Notebook: phi3-finetune-qlora-pycoder.ipynb
Our goal is to fine-tune the pretrained Phi3-mini model, an LLM with 3.8B parameters, using the PEFT method with either LoRA or 4-bit quantized QLoRA to produce a Python coder, and then to evaluate the performance of both fine-tuned models. We fine-tune the model on an NVIDIA A100 GPU for better performance. Alternatively, you can run the fine-tuning on, for example, a T4 in Google Colab by adjusting some parameters (such as the batch size) to reduce memory consumption.
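As a rough sketch of this setup (not the exact configuration used in the notebooks), the snippet below combines a bitsandbytes 4-bit quantization config (the QLoRA case) with a PEFT LoRA adapter on the base model; the hyperparameter values and the target module names are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization (QLoRA); omit quantization_config below for plain LoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Cast norms/embeddings and enable gradient checkpointing for k-bit training
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapter; rank, alpha, dropout and target modules are illustrative values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],  # assumed attention projection names for Phi-3
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()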
For the fine-tuning process, we use this dataset, which contains about 18,000 examples in which the model is asked to write Python code that solves a given task. It is a subset of this other original dataset, from which only the Python-language examples were selected. Each row contains a description of the task to be solved, an example of input data for the task (if applicable), and the generated code fragment that solves the task.
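For illustration only, the snippet below sketches how a dataset with this structure can be loaded and mapped into training prompts; the Hugging Face dataset id and the column names (instruction, input, output) are assumptions based on the description above, not taken from the notebooks.

from datasets import load_dataset

# Assumed dataset id for the 18k Python code instructions subset
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")

def to_prompt(example):
    # Assumed columns: instruction, input, output
    task = example["instruction"]
    if example["input"]:
        task += "\n Input: " + example["input"]
    # Simple instruction/response layout, used only for illustration
    return {"text": f"### Task:\n{task}\n\n### Solution:\n{example['output']}"}

dataset = dataset.map(to_prompt)
print(dataset[0]["text"])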
Phi-3-Mini-4k-Instruct is a 3.8B-parameter, lightweight, state-of-the-art open LLM trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version (3.8B parameters) comes in two variants of supported context length (in tokens): 4K and 128K.
The model underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety. When assessed against benchmarks testing common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Mini-4k-Instruct showed robust, state-of-the-art performance among models with fewer than 13 billion parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_id = "alexrodpas/phi3-mini-4k-qlora-pycode-18k"
device_map = "cuda"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype="auto", device_map=device_map)
# Example instruction in the style of the dataset
user_prompt = "Create a function to calculate the sum of a sequence of integers.\n Input: [1, 2, 3, 4, 5]"
# Create the pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
# Prepare the prompt or input to the model
prompt = pipe.tokenizer.apply_chat_template([{"role": "user", "content": user_prompt}], tokenize=False, add_generation_prompt=True)
# Run the pipe to get the answer
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, num_beams=1, temperature=0.3, top_k=50, top_p=0.95,
               max_time=180)
print(outputs[0]['generated_text'][len(prompt):].strip())