How to Fine-Tune LLMs with Kubeflow

Overview of LLM fine-tuning API in Training Operator

This page describes how to use the train API from the Training Python SDK, which simplifies fine-tuning LLMs with distributed PyTorchJob workers.

If you want to learn more about how the fine-tuning API fits into the Kubeflow ecosystem, head to the explanation guide.

Prerequisites

You need to install the Training Python SDK with fine-tuning support to run this API.
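
For example, if your SDK release is published with a HuggingFace extra that pulls in the fine-tuning dependencies (which is an assumption about your packaging and version), the installation typically looks like this:

pip install -U "kubeflow-training[huggingface]"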

How to use the Fine-Tuning API?

You need to provide the following parameters to use the train API:

  • Pre-trained model parameters.
  • Dataset parameters.
  • Trainer parameters.
  • Number of PyTorch workers and resources per worker.

For example, you can use the train API as follows to fine-tune the BERT model on the Yelp Review dataset from HuggingFace Hub:

import transformers
from peft import LoraConfig

from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HuggingFaceDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    # BERT model URI and type of Transformer to train it.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google-bert/bert-base-cased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Use 3000 samples from Yelp dataset.
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
        split="train[:3000]",
    ),
    # Specify HuggingFace Trainer parameters. In this example, we will skip evaluation and model checkpoints.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            evaluation_strategy="no",
            do_eval=False,
            disable_tqdm=True,
            log_level="info",
        ),
        # Set LoRA config to reduce number of trainable model parameters.
        lora_config=LoraConfig(
            r=8,
            lora_alpha=8,
            lora_dropout=0.1,
            bias="none",
        ),
    ),
    num_workers=4, # nnodes parameter for torchrun command.
    num_procs_per_worker=2, # nproc-per-node parameter for torchrun command.
    resources_per_worker={
        "gpu": 2,
        "cpu": 5,
        "memory": "10G",
    },
)

After you execute train, the Training Operator will orchestrate the appropriate PyTorchJob resources to fine-tune the LLM.
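
You can then check the job status and fetch the training logs with the same TrainingClient. The snippet below is a minimal sketch; it assumes your kubeflow-training release exposes the is_job_succeeded and get_job_logs methods (available in recent SDK versions) and that the job runs in your default namespace.

from kubeflow.training import TrainingClient

client = TrainingClient()

# Check whether the PyTorchJob created by train() has finished successfully.
print(client.is_job_succeeded(name="fine-tune-bert"))

# Fetch the training logs (the exact return shape depends on the SDK version).
logs = client.get_job_logs(name="fine-tune-bert")
print(logs)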

Next Steps
