How accurate are open source LLMs for text classification?

Insights from the top 10 papers

Accuracy of Open Source LLMs for Text Classification

Overview of Open Source LLMs

Open source Large Language Models (LLMs) have gained significant attention in the field of natural language processing, offering alternatives to proprietary models. These models vary in size and complexity, typically measured by billions of parameters. (Almeida & Caminha, 2024)

Entry-Level Open Source LLMs

Models with 7 to 14 billion parameters are considered entry-level and are suitable for simpler tasks like document classification and information extraction. (Almeida & Caminha, 2024)
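
As an illustration, the sketch below shows how an entry-level open-source instruct model might be prompted for document classification with the Hugging Face transformers library. The model name, label set, and prompt format are assumptions for the example, not details taken from Almeida & Caminha (2024).

```python
# Minimal sketch: prompting a ~7B open-source instruct model to classify a document.
# The model name and label set are illustrative assumptions, not taken from the cited paper.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any 7-14B instruct model works similarly
    device_map="auto",
)

LABELS = ["invoice", "contract", "report", "letter"]  # hypothetical document classes

def classify(document: str) -> str:
    prompt = (
        f"Classify the following document as one of {LABELS}.\n"
        f"Document:\n{document}\n"
        "Answer with the label only:"
    )
    output = generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
    answer = output[len(prompt):].strip().lower()
    # Fall back to the first label if the model answers with anything unexpected.
    return next((label for label in LABELS if label in answer), LABELS[0])

print(classify("Invoice #1042: total amount due EUR 1,250 by 30 June ..."))
```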

Accuracy in Text Classification Tasks

Traditional Machine Learning Techniques

Several traditional machine learning algorithms have been applied to detecting LLM-generated text, which is itself a binary text classification task (a combined evaluation sketch follows the classifier summaries below):

Random Forest

Effective for capturing complex patterns in text data. (Su & Wu, 2024)

Logistic Regression

Favored for its simplicity and interpretability. Performance metrics:

  • Precision: 0.86
  • Recall: 0.84
  • F1-Score: 0.85 (Su & Wu, 2024)

Gaussian Naive Bayes

Suited for scenarios with Gaussian-distributed features. Performance metrics:

  • Precision: 0.96
  • Recall: 0.81
  • F1-Score: 0.87 (Su & Wu, 2024)

Support Vector Machines (SVM)

Effective for high-dimensional data. Performance metrics:

  • Precision: 0.97
  • Recall: 0.97
  • F1-Score: 0.97 (Su & Wu, 2024)
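
For context, the sketch below shows how such classifiers are commonly trained and evaluated on a labeled corpus with scikit-learn. The TF-IDF features and placeholder data are assumptions for the example rather than the exact setup of Su & Wu (2024).

```python
# Minimal sketch of evaluating the classifiers discussed above on a labeled text corpus.
# The tiny corpus below is a placeholder; a real evaluation needs a substantial dataset.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

texts = ["human-written sample ...", "llm-generated sample ...",
         "another human sample ...", "another llm sample ..."]  # placeholder corpus
labels = [0, 1, 0, 1]                                           # 0 = human, 1 = LLM

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25, random_state=0)

vectorizer = TfidfVectorizer(max_features=5000)
X_train_vec = vectorizer.fit_transform(X_train).toarray()  # dense arrays so GaussianNB also works
X_test_vec = vectorizer.transform(X_test).toarray()

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train_vec, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test_vec), zero_division=0))
```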

LLM-Specific Detection Methods

DetectGPT

A zero-shot machine-generated text detection method based on probability curvature: it applies minor perturbations to the original text and compares the log probability of the original against its perturbed versions, since model-generated text tends to lose more probability under perturbation than human-written text. (Su & Wu, 2024)
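
As a rough illustration, the sketch below computes a curvature-style score under simplifying assumptions: a small causal language model (gpt2) serves as the scoring model, and random word dropout stands in for the mask-filling perturbations (e.g. with T5) used by the original DetectGPT method.

```python
# Simplified DetectGPT-style score: compare the (per-token) log probability of the original
# text with the mean log probability of lightly perturbed versions. Random word dropout is a
# stand-in for mask-filling perturbations; gpt2 is an assumed scoring model.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def log_prob(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()

def perturb(text: str, drop: float = 0.1) -> str:
    words = text.split()
    kept = [w for w in words if random.random() > drop]
    return " ".join(kept) if kept else text

def curvature_score(text: str, n_perturbations: int = 20) -> float:
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return log_prob(text) - sum(perturbed) / len(perturbed)

# Higher scores suggest the text sits at a local probability peak, as LLM-generated text tends to.
print(curvature_score("The quick brown fox jumps over the lazy dog near the river bank."))
```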

Single-revise

A faster approach inspired by DetectGPT, utilizing LLMs for text detection. (Su & Wu, 2024)

Comparative Performance

Among the reported results, SVM shows the strongest performance (precision, recall, and F1 all at 0.97) of the traditional machine learning techniques. However, LLM-specific methods may offer more targeted approaches for detecting LLM-generated content. (Su & Wu, 2024)

Factors Affecting Accuracy

Model Size and Complexity

Larger models with more parameters generally perform better on complex tasks, while smaller models are suitable for simpler classification tasks. (Almeida & Caminha, 2024)

Data Quality and Preprocessing

The effectiveness of text classification can be influenced by data quality, including issues such as OCR errors in digitized documents. (Almeida & Caminha, 2024)

Feature Engineering

Word-embedding techniques such as Word2Vec can significantly affect the performance of classification models. (Su & Wu, 2024)
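
For example, the sketch below builds document features by averaging Word2Vec word vectors with gensim; the toy corpus and hyperparameters are illustrative assumptions, not values from the cited paper.

```python
# Minimal sketch: averaged Word2Vec word vectors as document features for a downstream classifier.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    "the model generated this sentence".split(),
    "a human wrote this sentence instead".split(),
]  # toy corpus; a real setup would use the full training texts

w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=1)

def embed(tokens):
    vectors = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(w2v.vector_size)

features = np.vstack([embed(doc) for doc in corpus])  # one feature row per document
print(features.shape)  # (2, 100)
```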

Recent Advancements

Domain-Specific LLMs

Models like Medical mT5 have been developed for specific domains, potentially improving accuracy in specialized text classification tasks. (Kavi & Anne, 2024)

Ensemble Methods

Treating token generation as a classification task for ensembling has shown promise in improving performance beyond individual LLMs. (Yu et al., 2024)
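
As a rough illustration of the idea, the sketch below averages the next-token distributions of two small models that share a tokenizer, treating the next-token choice as a classification over the vocabulary. It is a simplification for illustration, not the exact method of Yu et al. (2024).

```python
# Minimal sketch: ensemble two causal LMs by averaging their next-token probability
# distributions (i.e. treating token generation as classification over the shared vocabulary).
# gpt2 and distilgpt2 are placeholder models that happen to share a vocabulary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_names = ["gpt2", "distilgpt2"]
tokenizer = AutoTokenizer.from_pretrained(model_names[0])
models = [AutoModelForCausalLM.from_pretrained(name).eval() for name in model_names]

def ensemble_next_token(prompt: str) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        # Each model's logits over the vocabulary act as class scores for the next token.
        probs = [m(ids).logits[0, -1].softmax(dim=-1) for m in models]
    avg = torch.stack(probs).mean(dim=0)
    return tokenizer.decode(avg.argmax().item())

print(ensemble_next_token("Text classification with open source models is"))
```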

Conclusion

Open source LLMs demonstrate varying levels of accuracy on text classification tasks. While traditional machine learning techniques such as SVM achieve strong scores on related detection tasks, LLM-specific methods and ensemble approaches continue to push performance further. The choice of model and technique depends on the specific task, data quality, and available computational resources. Continued research in this field is likely to further improve the accuracy and applicability of open source LLMs for text classification.

Source Papers (10)
Evaluation of Entry-Level Open-Source Large Language Models for Information Extraction from Digitized Documents
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling
Robust Detection of LLM-Generated Text: A Comparative Analysis
Enhanced Discriminative Fine-Tuning of Large Language Models for Chinese Text Classification
MedSyn: LLM-based Synthetic Medical Text Generation Framework
Large language models for extracting histopathologic diagnoses from electronic health records
Improving Medical Abstract Classification Using PEFT-LoRA Fine-Tuned Large and Small Language Models
Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy
A systematic evaluation of large language models for biomedical natural language processing: benchmarks, baselines, and recommendations
A Comparative Analysis of Privacy-Preserving Large Language Models For Automated Echocardiography Report Analysis