Wednesday, January 29, 2025

How to Fine-Tune Phi-4 Locally?


Unlocking the power of domain-specific Large Language Models like Microsoft Phi-4 requires the ability to fine-tune these models for specialized tasks. Fine-tuning Phi-4 on custom datasets helps tailor the model to perform optimally in specific domains, such as customer support, medical advice, or technical documentation. By leveraging LoRA (Low-Rank Adaptation) adapters, this process becomes more efficient, allowing for faster training and reduced resource consumption. This guide will walk you through the essential steps to fine-tune Phi-4 using LoRA adapters, integrate the model with Hugging Face for easy sharing, and apply the latest techniques to get the most out of your custom LLM.

Learning Objectives

  • Learn how to fine-tune Microsoft Phi-4 using LoRA adapters for domain-specific tasks.
  • Understand the setup process and configuration for loading Phi-4 efficiently with 4-bit quantization.
  • Gain proficiency in preparing datasets and transforming them for fine-tuning with Hugging Face and unsloth.
  • Master training techniques using Hugging Face’s SFTTrainer to optimize model performance.
  • Explore how to monitor GPU usage and save/upload fine-tuned models to Hugging Face for deployment.

This article was published as a part of the Data Science Blogathon.

Prerequisites

Before diving into fine-tuning Phi-4, ensure you have the necessary tools and environment configured. This includes installing Python 3.8+, PyTorch with CUDA support for GPU acceleration, and the unsloth library, along with Hugging Face Transformers and Datasets for seamless dataset handling and model integration. Having these prerequisites in place will ensure a smooth and efficient fine-tuning process.

Ensure you have the following installed:

  • Python 3.8+
  • PyTorch (with CUDA support for GPU acceleration)
  • unsloth
  • Hugging Face Transformers and Datasets

Install the required libraries with:

pip install unsloth
pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
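
Once installed, a quick sanity check can confirm that PyTorch sees a CUDA-capable GPU before loading the model in 4-bit. This is a minimal sketch using only standard PyTorch calls:

import torch

# Verify that a CUDA GPU is available before attempting 4-bit loading
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")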

Fine-Tuning Phi-4: A Step-by-Step Guide

This section covers all the essential steps involved in fine-tuning Microsoft Phi-4, from setting up the environment to pushing the fine-tuned model to Hugging Face. It includes configuring the model, preparing the dataset, training, monitoring GPU usage, generating responses, and saving/uploading the model.

Step 1: Setting Up the Model

Below, we will set up the model by importing the dependencies and loading the model:

Load the Model with LoRA Adapters

LoRA adapters enable parameter-efficient fine-tuning by training only a small subset of model parameters.

Importing Dependencies

from unsloth import FastLanguageModel
import torch
  • FastLanguageModel: A utility class from the unsloth library for working with language models, including loading and fine-tuning.
  • torch: The PyTorch library for deep learning operations, providing GPU acceleration.

Configuration Settings

max_seq_length = 2048
load_in_4bit = True
  • max_seq_length: Specifies the maximum length of input sequences. Models like Phi-4 are designed to handle long sequences efficiently, making this an important setting.
  • load_in_4bit: This setting loads the model with 4-bit quantization, reducing memory usage and improving inference speed.

Loading the Phi-4 Model

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4",
    max_seq_length=max_seq_length,
    load_in_4bit=load_in_4bit,
)
  • model_name: Refers to the pre-trained Phi-4 model hosted by unsloth.
  • from_pretrained: Downloads and initializes the model and tokenizer with the specified configurations.

Applying LoRA Adapters

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
  • get_peft_model: A method to integrate LoRA adapters into the model for parameter-efficient fine-tuning.
  • r=16: Sets the rank of the LoRA layers, controlling the dimensionality of the additional trainable parameters.
  • target_modules: Specifies the model layers where LoRA adapters will be applied. These layers correspond to key components of the model’s transformer architecture.
  • lora_alpha: A scaling factor for the LoRA layers to stabilize training.
  • lora_dropout: Dropout probability for regularization; set to 0 for no dropout.
  • bias="none": Indicates that no additional bias terms are introduced.
  • use_gradient_checkpointing: Activates gradient checkpointing to reduce memory usage during backpropagation.
  • random_state=3407: Ensures reproducibility by fixing the random seed.
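
Because only the LoRA adapter weights are trainable, the trainable parameter count is a small fraction of the full model. A quick check using plain PyTorch (the variable names are illustrative):

# Count trainable vs. total parameters to confirm the parameter-efficiency of LoRA
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")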

Step 2: Preparing the Dataset

We use the FineTome-100k dataset in ShareGPT format. The unsloth library provides utilities to convert this format into Hugging Face’s generic format for multi-turn conversations.

Load the Dataset

from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt, get_chat_template

dataset = load_dataset("mlabonne/FineTome-100k", split="train")

Hugging Face’s datasets library loads the mlabonne/FineTome-100k dataset, and the split="train" argument ensures that only the training split is loaded.

Standardize the Dataset

dataset = standardize_sharegpt(dataset)

The standardize_sharegpt function from the unsloth.chat_templates module standardizes the ShareGPT-style dataset, ensuring it adheres to the expected format for multi-turn conversations.

Apply Phi-4 Chat Template

tokenizer = get_chat_template(tokenizer, chat_template="phi-4")

The get_chat_template function customizes the tokenizer to use the "phi-4" chat template. This ensures the prompts and conversations align with Phi-4’s format.

Format Prompts for Training

def formatting_prompts_func(examples):
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in examples["conversations"]
    ]
    return {"textual content": texts}

The formatting_prompts_func processes each example in the dataset:

  • The examples["conversations"] field contains the conversation data.
  • Each conversation (convo) is passed through tokenizer.apply_chat_template.
  • tokenize=False ensures the output is not tokenized yet.
  • add_generation_prompt=False avoids appending generation-specific tokens to the prompts at this stage.
  • The formatted text is stored under the "text" field.

Map the Function to the Dataset

dataset = dataset.map(formatting_prompts_func, batched=True)

The map function applies formatting_prompts_func to the entire dataset in batches. This efficiently preprocesses the dataset to prepare it for fine-tuning.

Let’s take a look at how the conversations are structured for item 5:

dataset[5]["conversations"]
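
For comparison, the formatted string produced by the chat template can be inspected through the "text" field (a quick check; the slice length is arbitrary):

# Inspect the chat-template-formatted string for the same item
print(dataset[5]["text"][:500])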

Step 3: Fine-Tuning the Model

Fine-tuning the model involves training Phi-4 with Hugging Face’s SFTTrainer, optimizing the process with custom settings and efficient data handling.

Training with SFTTrainer

We use Hugging Face’s SFTTrainer to train the model. Below is a minimal setup for efficient training:

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

  • SFTTrainer: A specialized trainer for supervised fine-tuning of language models.
  • TrainingArguments: Defines training hyperparameters such as batch size, learning rate, and number of steps.
  • DataCollatorForSeq2Seq: Prepares input data for sequence-to-sequence models.
  • is_bfloat16_supported: Checks whether the hardware supports bfloat16, a mixed-precision format.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        output_dir="outputs",
        report_to="none",
    ),
)

Trainer Initialization:

  • model and tokenizer: The language model and its tokenizer are passed in, already pre-configured.
  • train_dataset: The dataset used for training, preprocessed and tokenized earlier.
  • dataset_text_field: Specifies the field in the dataset containing the text.
  • max_seq_length: The maximum sequence length for tokenized inputs.
  • data_collator: Ensures input data is properly batched and padded.
  • dataset_num_proc: Parallelizes dataset processing for efficiency.

Training Arguments:

  • per_device_train_batch_size: Batch size for each device during training (set to 2 here).
  • gradient_accumulation_steps: Simulates a larger batch size by accumulating gradients over multiple steps (here 2 × 4 gives an effective batch size of 8).
  • warmup_steps: Steps for learning rate warmup, helping stabilize training.
  • max_steps: Total number of training steps (30 here, indicating a short training run).
  • learning_rate: Learning rate for the optimizer.
  • fp16 and bf16: Enable mixed precision (FP16 or BF16) based on hardware support for faster, more memory-efficient training.
  • logging_steps: Frequency of logging during training.
  • optim: Optimizer choice; adamw_8bit reduces memory usage.
  • weight_decay: Regularization parameter to prevent overfitting.
  • output_dir: The directory where model checkpoints and logs are saved.
  • report_to: Disables reporting to external monitoring tools (e.g., WandB).

Purpose:

This setup efficiently fine-tunes a large model on a custom dataset, focusing on:

  • Memory optimization (e.g., mixed precision, 8-bit optimizers).
  • Efficient training configuration with a small batch size and gradient accumulation.
  • Short, lightweight training for quick experimentation or domain adaptation.

Masking User Inputs

We can also use Unsloth’s train_on_responses_only utility to train only on the assistant outputs and ignore the loss on the user’s inputs:

from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>person<|im_sep|>",
    response_part="<|im_start|>assistant<|im_sep|>",
)

Let’s verify that the masking was actually applied:

tokenizer.decode(trainer.train_dataset[5]["input_ids"])
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

Step 4: Monitoring GPU Usage

Check GPU memory usage before and after training:

import torch

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
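With the baseline recorded, training is launched with the trainer’s standard train() call, and the same counters can be read again afterwards to see how much memory the run actually used. A minimal sketch (the variable names are illustrative):

trainer_stats = trainer.train()

# Compare peak reserved memory after training against the pre-training baseline
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_for_training = round(used_memory - start_gpu_memory, 3)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_for_training} GB.")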

Step 5: Inference

Generate responses using the fine-tuned model:

Defining the Input Messages:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template="phi-4",
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

The input is structured as a list of message dictionaries. Each dictionary specifies the role (e.g., "user") and the content (e.g., the user’s query).

This approach supports multi-turn conversations, aligning with the model’s chat-based functionality.
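
For illustration, a multi-turn exchange simply appends alternating user and assistant messages to the same list (a hypothetical example; the single-turn messages defined above are what the following steps use):

# Hypothetical multi-turn conversation: earlier assistant replies are passed back in as context
multi_turn_messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
    {"role": "assistant", "content": "13, 21, 34"},
    {"role": "user", "content": "Now explain the pattern in one sentence."},
]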

Preprocessing Inputs with the Tokenizer

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")
  • apply_chat_template: Prepares the input for the Phi-4 model using the tokenizer and ensures compatibility with the chat format.

Parameters:

  • tokenize=True: Converts the text into token IDs.
  • add_generation_prompt=True: Adds a special prompt token to guide the model’s response generation.
  • return_tensors="pt": Converts the processed data into PyTorch tensors for GPU processing.
  • .to("cuda"): Moves the data to the GPU for accelerated computation.

Generating Text:

outputs = model.generate(
    input_ids=inputs, max_new_tokens=64, use_cache=True, temperature=1.5, min_p=0.1
)

Parameters:

  • input_ids=inputs: The tokenized input.
  • max_new_tokens=64: Limits the length of the generated output to 64 tokens.
  • use_cache=True: Speeds up generation by using cached activations.
  • temperature=1.5: Controls randomness in the output (higher values = more creative, less deterministic).
  • min_p=0.1: Sets a minimum probability threshold for token sampling, filtering out unlikely tokens.

Decoding and Displaying the Output:

print(tokenizer.batch_decode(outputs))
  • Decodes the generated token IDs back into human-readable text using the tokenizer.
  • We use batch_decode because the output may contain multiple sequences.
To stream the response token by token instead of waiting for the full output, we can pass a TextStreamer to generate:

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer

# Print tokens as they are generated, skipping the prompt
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids=inputs, streamer=text_streamer, max_new_tokens=128,
    use_cache=True, temperature=1.5, min_p=0.1,
)

Step 6: Saving and Uploading the Model

Save Locally or Push to Hugging Face:

model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

To upload to Hugging Face:

model.push_to_hub_merged("hf/model", tokenizer, save_method="lora", token="<your_hf_token>")

This code pushes the model to the Hugging Face Hub, using the LoRA save method for efficient storage, and also includes the associated tokenizer. You will need a valid Hugging Face authentication token (<your_hf_token>) to complete the upload successfully.
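
To reuse the locally saved adapters later, they can be loaded back in the same way the base model was loaded. A sketch, assuming the "lora_model" directory created by the save step above:

from unsloth import FastLanguageModel

# Reload the saved LoRA adapters (on top of the base model) for inference
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)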

Conclusion

Fine-tuning Microsoft Phi-4 locally and pushing it to Hugging Face allows developers to create highly specialized models efficiently. With tools like Unsloth, LoRA adapters, and Hugging Face, the process becomes accessible and scalable. Try it out with your own dataset today!

Key Takeaways

  • Fine-tuning Microsoft Phi-4 with LoRA adapters optimizes domain-specific performance while saving computational resources.
  • The Unsloth library simplifies the process of integrating LoRA adapters and working with Hugging Face datasets.
  • Efficient dataset transformation and tokenization are critical for preparing data to fine-tune Phi-4 on custom tasks.
  • Training with Hugging Face’s SFTTrainer and advanced settings allows for fast, memory-efficient fine-tuning.
  • Uploading fine-tuned models to Hugging Face enables easy sharing and deployment for specialized applications.

Frequently Asked Questions

Q1. What is Microsoft Phi-4, and why fine-tune it?

A. Microsoft Phi-4 is a large language model (LLM) optimized for language understanding and generation tasks. Fine-tuning it on a custom dataset enables domain-specific performance, tailoring the model to specialized applications such as customer service, technical documentation, or niche industries.

Q2. What are LoRA adapters, and why are they used here?

A. LoRA (Low-Rank Adaptation) adapters allow efficient fine-tuning by training only a subset of model parameters instead of the entire model. This reduces computational requirements and memory usage, making it ideal for large models like Phi-4.

Q3. Which libraries and tools are required for fine-tuning Phi-4?

A. Key requirements include Python 3.8+, PyTorch with CUDA support, the unsloth library for streamlined workflows, and Hugging Face Transformers and Datasets for dataset handling and training.

Q4. How do I handle datasets for fine-tuning?

A. Use a dataset like FineTome-100k in ShareGPT format. Convert and standardize the dataset using unsloth utilities to ensure compatibility with Hugging Face’s multi-turn conversation template.

Q5. How can I push my fine-tuned model to Hugging Face?

A. Save your fine-tuned model and tokenizer locally, then use the .push_to_hub_merged() method from unsloth to upload the model and tokenizer to Hugging Face with your authentication token.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hi there! I’m Himanshu, a Data Scientist at KPMG, and I have a deep passion for data: everything from crunching numbers to discovering patterns that tell a story. For me, data is more than just numbers on a screen; it’s a tool for discovery and insight. I’m always excited by the potential of what data can reveal and how it can solve real-world problems.

But it’s not just data that grabs my attention. I love exploring new things, whether that’s learning a new skill, experimenting with new technologies, or diving into topics outside my comfort zone. Curiosity drives me, and I’m always looking for fresh challenges that push me to think differently and grow. At heart, I believe there’s always more to learn, and I’m on a constant journey to broaden my knowledge and perspective.
