Bijo Benjamin Thomas, 2025. "Efficient Fine-Tuning Techniques for Transformer-Based NLP Models" ESP International Journal of Advancements in Computational Technology (ESP-IJACT) Volume 3, Issue 3: 1-7.
Transformer-based language models have achieved state-of-the-art performance on NLP classification tasks, but full fine-tuning of all model parameters is resource-intensive. This article surveys efficient alternatives to full fine-tuning for smaller transformer models (e.g., BERT-base) in classification settings. We compare parameter-efficient tuning methods (such as adapter modules and LoRA), model pruning (e.g., removing transformer layers), and quantization. We discuss how each technique affects training and inference speed, memory footprint, and model performance. Experiments on representative classification tasks (such as sentiment analysis and topic classification) illustrate that these methods can dramatically reduce computational requirements with minimal loss in prediction performance compared to full fine-tuning. Based on these results, we offer recommendations for deploying transformer models under resource constraints.
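To make the comparison concrete, the sketch below applies one of the surveyed parameter-efficient methods, LoRA, to a BERT-base sequence classifier. It is a minimal illustration assuming the Hugging Face transformers and peft libraries; the model name, target modules, rank, and other hyperparameters are illustrative choices rather than settings reported in this paper.

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Load a small pretrained transformer with a randomly initialized
# classification head (binary classification, e.g. sentiment analysis).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# LoRA freezes the pretrained weights and injects trainable low-rank
# update matrices into the attention projections named below.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update
    lora_alpha=16,                      # scaling factor for the update
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT self-attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

The wrapped model can then be trained with the usual Trainer loop; gradients flow only through the injected low-rank matrices and the classification head, which is where the memory and training-cost savings discussed above come from.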
[1] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. Proceedings of the 36th International Conference on Machine Learning. https://arxiv.org/abs/1902.00751
[2] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685. https://arxiv.org/abs/2106.09685
[3] Pfeiffer, J., & Vulić, I. (2021). GitHub issue: "Why I would use the houlsby adapter instead of the pfeiffer one?" Adapter-Hub/adapter-transformers. https://github.com/Adapter-Hub/adapter-transformers/issues/168
[4] Ding, N., Qin, Y., Yang, G., et al. (2023). Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5, 220-235. https://www.nature.com/articles/s42256-023-00626-4
[5] Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. https://arxiv.org/abs/1910.01108
[6] Sanh, V., Wolf, T., & Rush, A. M. (2020). Movement Pruning: Adaptive Sparsity by Fine-Tuning. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.07683 (code: https://github.com/huggingface/block_movement_pruning)
[7] Kurtić, E., Frantar, E., & Alistarh, D. (2023). ZipLM: Inference-Aware Structured Pruning of Language Models. arXiv preprint arXiv:2302.04089. https://arxiv.org/abs/2302.04089
[8] Michel, P., Levy, O., & Neubig, G. (2019). Are Sixteen Heads Really Better than One? arXiv preprint arXiv:1905.10650. https://arxiv.org/abs/1905.10650
[9] Zafrir, O., Boudoukh, G., Izsak, P., & Wasserblat, M. (2019). Q8BERT: Quantized 8Bit BERT. arXiv preprint arXiv:1910.06188. https://arxiv.org/abs/1910.06188
Keywords: Fine-tuning, Parameter-efficient methods, Transformers, Model pruning, Quantization.