Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
ModelScope Community Website
Paper   |   English Documentation   |   Chinese Documentation
Swift2.x English Documentation   |   Swift2.x Chinese Documentation
You can contact us and communicate with us by joining our groups:
Discord Group | WeChat Group
🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 450+ large models and 150+ multi-modal large models. These large language models (LLMs) include models such as Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.
🍔 Additionally, ms-swift incorporates the latest training technologies, including lightweight techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, and ORPO. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and it supports model quantization with technologies like GPTQ, AWQ, and BNB. Furthermore, ms-swift offers a Gradio-based Web UI and a wealth of best practices.
Why choose ms-swift?
- The swift sample command is supported; it can also run on LMDeploy by setting --use_lmdeploy true (please check this sample script). Sampling is a very important feature for complex CoT and RFT, and a Reinforced Fine-tuning script is also provided.
- Inference, evaluation, and deployment can be accelerated by setting --infer_backend vllm/lmdeploy.

To install using pip:
pip install ms-swift -U
To install from source:
# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
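To verify the installation, you can import the package from Python (the __version__ attribute is an assumption and may differ by release):
# Check that the `swift` package is importable and print its version
python -c "import swift; print(swift.__version__)"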
Running Environment:
|  | Range | Recommended | Notes |
| --- | --- | --- | --- |
| python | >=3.9 | 3.10 | |
| cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
| torch | >=2.0 | | |
| transformers | >=4.33 | 4.49 | |
| modelscope | >=1.19 | | |
| peft | >=0.11,<0.15 | | |
| trl | >=0.13,<0.17 | 0.15 | RLHF |
| deepspeed | >=0.14 | 0.14.5 | Training |
| vllm | >=0.5.1 | 0.7.3 | Inference/Deployment/Evaluation |
| lmdeploy | >=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
| evalscope | >=0.11 | | Evaluation |
For more optional dependencies, you can refer to here.
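For example, the optional dependency groups can be installed together via a pip extra (the [all] extra name is an assumption based on the ms-swift packaging and may change between releases):
# Install ms-swift with its optional dependency groups (extra name may vary by release)
pip install 'ms-swift[all]' -U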
10 minutes of self-cognition fine-tuning of Qwen2.5-7B-Instruct on a single 3090 GPU:
# 22GB
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--train_type lora \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 5 \
--logging_steps 5 \
--max_length 2048 \
--output_dir output \
--system 'You are a helpful assistant.' \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--model_author swift \
--model_name swift-robot
Tips:
- To train with a custom dataset, specify it with --dataset <dataset_path> (see the JSONL sketch after these tips for one commonly supported format).
- The --model_author and --model_name parameters are only effective when the dataset includes swift/self-cognition.
- To train a different model, simply modify --model <model_id/model_path>.
- By default, ModelScope is used to download models and datasets. To use the HuggingFace Hub instead, specify --use_hf true.
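A local dataset passed via --dataset <dataset_path> can be a JSONL file in the standard messages format; a minimal sketch is shown below (see the custom dataset documentation for the full list of supported formats):
# data.jsonl: one JSON object per line, in the messages format
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}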
After training is complete, use the following command to infer with the trained weights. Here, --adapters should be replaced with the last checkpoint folder generated during training. Since the adapters folder contains the training parameter file args.json, there is no need to specify --model or --system separately; Swift will read these parameters automatically. To disable this behavior, set --load_args false.
# Using an interactive command line for inference.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true \
--temperature 0 \
--max_new_tokens 2048
# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true \
--merge_lora true \
--infer_backend vllm \
--max_model_len 8192 \
--temperature 0 \
--max_new_tokens 2048
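Instead of the interactive CLI, inference can also be run in batch over a dataset; the sketch below assumes the --val_dataset and --max_batch_size parameters (check the command-line parameters documentation for the exact names in your version):
# Batch inference over a dataset (parameter names assumed; see the CLI docs)
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --val_dataset AI-ModelScope/alpaca-gpt4-data-en#100 \
    --max_batch_size 8 \
    --temperature 0 \
    --max_new_tokens 2048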
Finally, use the following command to push the model to ModelScope:
CUDA_VISIBLE_DEVICES=0 \
swift export \
--adapters output/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id '<your-model-id>' \
--hub_token '<your-sdk-token>' \
--use_hf false
The Web UI is a Gradio-based interface that lets you train and deploy models without writing code. For more details, you can check here.
SWIFT_UI_LANG=en swift web-ui
ms-swift also supports training and inference using Python. Below is pseudocode for training and inference. For more details, you can refer to here.
Training:
# Imports shown for reference; exact module paths may vary across ms-swift versions
from swift.llm import get_model_tokenizer, get_template, load_dataset, EncodePreprocessor
from swift.tuners import Swift, LoraConfig
from swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Retrieve the model and template, and add a trainable LoRA module
model, tokenizer = get_model_tokenizer(model_id_or_path, ...)
template = get_template(model.model_meta.template, tokenizer, ...)
model = Swift.prepare_model(model, lora_config)
# Download and load the dataset, and encode the text into tokens
train_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)
train_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)
val_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)
# Train the model
trainer = Seq2SeqTrainer(
model=model,
args=training_args,
data_collator=template.data_collator,
train_dataset=train_dataset,
eval_dataset=val_dataset,
template=template,
)
trainer.train()
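The lora_config and training_args objects referenced above are not defined in the pseudocode; a minimal sketch of how they might be constructed is shown below (argument values are illustrative, and find_all_linears is assumed to live in swift.utils):
# Minimal configuration sketch for the pseudocode above; values are illustrative
from swift.utils import find_all_linears  # assumed helper that lists all linear layers

lora_config = LoraConfig(
    task_type='CAUSAL_LM',
    r=8,
    lora_alpha=32,
    target_modules=find_all_linears(model),
)
training_args = Seq2SeqTrainingArguments(
    output_dir='output',
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    logging_steps=5,
    save_steps=50,
)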
Inference:
# Imports shown for reference; exact module paths may vary across ms-swift versions
from swift.llm import PtEngine, InferRequest, RequestConfig

# Perform inference using the native PyTorch engine
engine = PtEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)
resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')
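Streaming output follows the same pattern; the sketch below assumes the stream flag on RequestConfig and delta-style chunks in the responses:
# Streaming sketch: RequestConfig(stream=True) is assumed to yield incremental chunks
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature, stream=True)
gen_list = engine.infer([infer_request], request_config)
for resp in gen_list[0]:
    if resp is None:
        continue
    print(resp.choices[0].delta.content, end='', flush=True)
print()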
Here is a minimal example covering training through deployment with ms-swift. For more details, you can check the examples.
- To use other models, simply modify --model to specify the corresponding model's ID or path, and modify --dataset to specify the corresponding dataset's ID or path.
- By default, ModelScope is used for downloading models and datasets. To use the HuggingFace Hub instead, specify --use_hf true.
.Useful Links |
---|
🔥Command Line Parameters |
Supported Models and Datasets |
Custom Models, 🔥Custom Datasets |
LLM Tutorial |
Supported Training Methods:
Method | Full-Parameter | LoRA | QLoRA | Deepspeed | Multi-Node | Multi-Modal |
---|---|---|---|---|---|---|
Pre-training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Instruction Supervised Fine-tuning | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
DPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
GRPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Reward Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
PPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
KTO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
CPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
SimPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
ORPO Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Classification Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Embedding Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
Pre-training:
# 8*A100
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
--model Qwen/Qwen2.5-7B \
--dataset swift/chinese-c4 \
--streaming true \
--train_type full \
--deepspeed zero2 \
--output_dir output \
--max_steps 100000 \
...
Fine-tuning:
CUDA_VISIBLE_DEVICES=0 swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset AI-ModelScope/alpaca-gpt4-data-en \
--train_type lora \
--output_dir output \
...
RLHF:
CUDA_VISIBLE_DEVICES=0 swift rlhf \
--rlhf_type dpo \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset hjh0119/shareAI-Llama3-DPO-zh-en-emoji \
--train_type lora \
--output_dir output \
...
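GRPO training uses the same rlhf entry point; the sketch below assumes the --reward_funcs and --num_generations parameters and the AI-MO/NuminaMath-TIR dataset from the GRPO documentation (adjust to your version):
# GRPO sketch: parameter names and dataset are assumptions, see the GRPO docs
CUDA_VISIBLE_DEVICES=0 swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-MO/NuminaMath-TIR#1000 \
    --reward_funcs accuracy \
    --num_generations 4 \
    --train_type lora \
    --output_dir output \
    ...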
Inference:
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend pt \
--max_new_tokens 2048
# LoRA
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--adapters swift/test_lora \
--stream true \
--infer_backend pt \
--temperature 0 \
--max_new_tokens 2048
Interface:
CUDA_VISIBLE_DEVICES=0 swift app \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend pt \
--max_new_tokens 2048
Deployment:
CUDA_VISIBLE_DEVICES=0 swift deploy \
--model Qwen/Qwen2.5-7B-Instruct \
--infer_backend vllm
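The deployed service exposes an OpenAI-compatible API; the sketch below assumes the default host and port (localhost:8000) and uses the model name from the command above:
# Query the OpenAI-compatible endpoint; host, port, and model name are assumptions
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen2.5-7B-Instruct", "messages": [{"role": "user", "content": "What is ms-swift?"}], "max_tokens": 256, "temperature": 0}'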
Sampling:
CUDA_VISIBLE_DEVICES=0 swift sample \
--model LLM-Research/Meta-Llama-3.1-8B-Instruct \
--sampler_engine pt \
--num_return_sequences 5 \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#5
Evaluation:
CUDA_VISIBLE_DEVICES=0 swift eval \
--model Qwen/Qwen2.5-7B-Instruct \
--infer_backend lmdeploy \
--eval_backend OpenCompass \
--eval_dataset ARC_c
Quantization:
CUDA_VISIBLE_DEVICES=0 swift export \
--model Qwen/Qwen2.5-7B-Instruct \
--quant_bits 4 --quant_method awq \
--dataset AI-ModelScope/alpaca-gpt4-data-zh \
--output_dir Qwen2.5-7B-Instruct-AWQ
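The exported quantized model directory can then be used directly for inference (a sketch reusing the infer command shown earlier; the vLLM backend is assumed to be installed):
# Infer with the locally exported AWQ model
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen2.5-7B-Instruct-AWQ \
    --infer_backend vllm \
    --stream true \
    --max_new_tokens 2048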
Push Model:
CUDA_VISIBLE_DEVICES=0 swift export \
--model <model-path> \
--push_to_hub true \
--hub_model_id '<model-id>' \
--hub_token '<sdk-token>'
This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource page and follow the corresponding License.
@misc{zhao2024swiftascalablelightweightinfrastructure,
title={SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning},
author={Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen},
year={2024},
eprint={2408.05517},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.05517},
}