The LLM Distillation Tool
WhiteLightning distills massive, state-of-the-art language models into lightweight, hyper-efficient text classifiers. It’s a command-line tool that lets you create specialized models that run anywhere—from the cloud to the edge—using the universal ONNX format for maximum compatibility.
We use large, powerful frontier models as “teachers” to train much smaller, task-specific “student” models. WhiteLightning automates this process for text classification, allowing you to create high-performance classifiers with a fraction of the computational footprint.
WhiteLightning exports every trained model to ONNX (Open Neural Network Exchange). This standard format makes your models instantly portable. Run them natively in Python, JavaScript, C++, Rust, Java, and more, ensuring total flexibility for any project. Learn more at onnx.ai.
WhiteLightning is designed as a “generic” Docker image that works seamlessly across macOS, Linux, and Windows with identical commands:
- No `--user` flags or platform-specific commands needed
- The same `docker run` command works everywhere

Clone the repository:
```shell
git clone https://github.com/Inoxoft/whitelightning.git
cd whitelightning
```
Get an OpenRouter API key at openrouter.ai/settings/keys.
Run the Docker image:
macOS / Linux:
```shell
docker run --rm \
  -v "$(pwd)":/app/models \
  -e OPEN_ROUTER_API_KEY="YOUR_OPEN_ROUTER_KEY_HERE" \
  ghcr.io/inoxoft/whitelightning:latest \
  python -m text_classifier.agent \
  -p "Categorize customer reviews as positive, neutral, or negative"
```
Windows (PowerShell) — note that PowerShell uses backticks, not backslashes, for line continuation:
```powershell
docker run --rm `
  -v "${PWD}:/app/models" `
  -e OPEN_ROUTER_API_KEY="YOUR_OPEN_ROUTER_KEY_HERE" `
  ghcr.io/inoxoft/whitelightning:latest `
  python -m text_classifier.agent `
  -p "Categorize customer reviews as positive, neutral, or negative"
```
That’s it! You’ll see the generation process in your terminal.
When it’s finished, list the files in your directory (`ls -l`). You’ll find all the assets for your new model, ready to go:
🎮 Try your trained model right here: WhiteLightning Playground
NEW! Skip LLM data generation and train directly on your existing datasets. WhiteLightning automatically analyzes your data structure and creates optimized models from real domain data.
```shell
# Create a folder for your data
mkdir own_data
cp your_dataset.csv own_data/

# Train on your data (faster, cheaper, more accurate!)
docker run --rm \
  -v "$(pwd)":/app/models \
  -e OPEN_ROUTER_API_KEY="YOUR_OPEN_ROUTER_KEY_HERE" \
  ghcr.io/inoxoft/whitelightning:latest \
  python -m text_classifier.agent \
  -p "Categorize customer reviews as positive, neutral, or negative" \
  --use-own-dataset="/app/models/own_data/your_dataset.csv"
```
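To see what a dataset file in this shape might look like, here is a minimal sketch. The two-column `text,label` layout is an assumption for illustration; check the Complete Documentation for the exact schema your WhiteLightning version reads.

```python
import csv
import io

# Hypothetical rows for a review-sentiment classifier.
rows = [
    ("Great product, arrived on time!", "positive"),
    ("It works, nothing special.", "neutral"),
    ("Broke after two days. Avoid.", "negative"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["text", "label"])  # assumed header row
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

Save output like this as `own_data/your_dataset.csv` and point `--use-own-dataset` at it.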
Benefits: faster training runs, lower API cost, and models grounded in your real domain data.
```
config.json           # Configuration and analysis
training_data.csv     # Generated training data
edge_case_data.csv    # Challenging test cases
model.onnx            # ONNX model file
model_scaler.json     # StandardScaler parameters
model_vocab.json      # TF-IDF vocabulary
```
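These artifacts fit together at inference time: vectorize the text with the TF-IDF vocabulary, standardize with the scaler parameters, then feed the result to the ONNX model. The sketch below shows that pipeline in Python; the toy vocabulary and the assumed JSON layouts are illustrations only, not WhiteLightning's actual file schema.

```python
import os

def tfidf_vector(text, vocab, idf):
    """Build a TF-IDF feature vector for one document.

    Assumption: `vocab` maps token -> column index and `idf` maps
    token -> IDF weight. Verify against your exported model_vocab.json.
    """
    vec = [0.0] * len(vocab)
    tokens = text.lower().split()
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0       # raw term counts
    for tok, idx in vocab.items():
        if vec[idx] > 0:
            tf = vec[idx] / len(tokens)  # term frequency
            vec[idx] = tf * idf.get(tok, 1.0)
    return vec

def scale(vec, mean, std):
    """Apply StandardScaler parameters (as in model_scaler.json)."""
    return [(v - m) / (s or 1.0) for v, m, s in zip(vec, mean, std)]

# Hypothetical vocabulary and scaler stats for illustration:
vocab = {"great": 0, "bad": 1, "service": 2}
idf = {"great": 1.2, "bad": 1.5, "service": 0.8}

features = tfidf_vector("Great service", vocab, idf)
scaled = scale(features, mean=[0.1, 0.1, 0.1], std=[0.5, 0.5, 0.5])

# Running the exported model (requires `pip install onnxruntime`):
if os.path.exists("model.onnx"):
    import numpy as np
    import onnxruntime as ort
    sess = ort.InferenceSession("model.onnx")
    name = sess.get_inputs()[0].name
    probs = sess.run(None, {name: np.array([scaled], dtype=np.float32)})
    print(probs)
```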
See our Complete Documentation for guides on how to use these files in your language of choice (C++, Rust, iOS, Android, and more).
The power of WhiteLightning is the `-p` (prompt) argument. You can create a classifier for almost anything just by describing it. Here are some ideas to get you started:
- Spam Filter: `-p "Classify emails as 'spam' or 'not_spam'"`
- Topic Classifier: `-p "Determine if a news headline is about 'tech', 'sports', 'world_news', or 'finance'"`
- Toxicity Detector: `-p "Detect whether a user comment is 'toxic' or 'safe'"`
- Urgency Detection: `-p "Categorize a support ticket's urgency as 'high', 'medium', or 'low'"`
- Intent Recognition: `-p "Classify the user's intent as 'book_flight', 'check_status', or 'customer_support'"`
The possibilities are endless. For more inspiration and advanced prompt engineering techniques, check out our Complete Documentation.
Don’t want to manually construct Docker commands? Use our Interactive Command Generator to build your personalized WhiteLightning commands with a user-friendly interface:
Features:
Perfect for:
Want to test your ONNX models across multiple programming languages? Check out our WhiteLightning Test Framework - a comprehensive cross-language testing suite that validates your models in:
Perfect for ensuring your WhiteLightning models work consistently across all target platforms and deployment environments.
Need comprehensive guides and documentation? Check out our WhiteLightning Site - this repository hosts the official website for WhiteLightning at https://whitelightning.ai, a cutting-edge LLM distillation tool with detailed documentation, tutorials, and implementation guides.
Looking for pre-trained models or want to share your own? Visit our WhiteLightning Model Library - a centralized repository for uploading, downloading, and managing trained machine learning models. Perfect for sharing community contributions and accessing ready-to-use classifiers.
Train your models directly in GitHub Actions! This repository includes a pre-configured workflow that lets you:
To use:
2. Go to Actions → “Test Model Training” → “Run workflow”
3. Customize training parameters or use defaults
4. Download generated models from the workflow artifacts
Perfect for teams, CI/CD pipelines, or when you need cloud-based model training!
File Permissions:
WhiteLightning automatically handles all file permission issues across platforms. Generated files will have correct ownership on your host system without any additional configuration.
Windows Path Issues:
Use PowerShell and `${PWD}` instead of `$(pwd)` in your commands.
Container Access Issues:
If you encounter any Docker-related issues, ensure Docker is running and you have proper permissions to run Docker commands.
Want to build from source or customize the Docker image? Check out the Local Setup Guide.
We welcome all contributions! The best way to start is by joining our Discord Server and chatting with the team. We’re happy to help you get started.
This project is licensed under the GPLv3 License - see the LICENSE file for details.
```shell
# Basic usage (automatic activation detection)
python text_classifier/agent.py -p "Classify movie reviews as positive, negative, or neutral"

# Using your own dataset (automatic detection)
python text_classifier/agent.py -p "Emotion classifier" --use-own-dataset=data/emotions.csv

# Override activation function (advanced users)
python text_classifier/agent.py -p "Emotion classifier" --use-own-dataset=data/emotions.csv --activation sigmoid

# Available activation options
--activation auto     # Smart automatic detection (default)
--activation sigmoid  # For multi-label classification
--activation softmax  # For single-label classification
```
- Sigmoid (`--activation sigmoid`): labels are scored independently, so a single sample can match several at once (e.g. `"action,comedy,drama"`)
- Softmax (`--activation softmax`): outputs a probability distribution over classes, so exactly one label wins (e.g. `"positive"` OR `"negative"` OR `"neutral"`)
- Auto (`--activation auto`): detects the right choice from your prompt and data (the default)
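The difference is easy to see numerically. This is a minimal sketch in plain Python, independent of WhiteLightning's internals:

```python
import math

def sigmoid(scores):
    # Each label is scored independently: values need not sum to 1,
    # so several labels can clear a 0.5 threshold at once.
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def softmax(scores):
    # Scores compete: the output is a probability distribution that
    # sums to 1, so one label dominates.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, -1.0]  # raw scores for action / comedy / drama

multi = sigmoid(logits)    # action AND comedy both above 0.5
single = softmax(logits)   # one dominant class, probabilities sum to 1

print([round(p, 3) for p in multi])
print(round(sum(single), 3))  # -> 1.0
```

With the same raw scores, sigmoid reports both "action" and "comedy" as matches, while softmax forces a single winner; that is exactly the multi-label vs single-label distinction above.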