Beimingwu is the first systematic open-source implementation of the learnware dock system, providing a preliminary research platform for learnware studies and enabling effective learnware search and reuse without building machine learning models from scratch.
Learnware was proposed by Professor Zhi-Hua Zhou in 2016 [1, 2]. In the learnware paradigm, developers worldwide can share models with the learnware dock system, which searches for and reuses learnware(s) to help users solve machine learning tasks efficiently without starting from scratch.
Beimingwu is the first systematic open-source implementation of the learnware dock system, providing a preliminary research platform for learnware studies. Developers worldwide can submit their models freely to the learnware dock. With the help of Beimingwu, they can generate specifications for a model without disclosing their raw data; the model and its specification are then assembled into a learnware, which is accommodated in the learnware dock. Future users can solve their tasks by submitting their requirements and reusing the helpful learnwares returned by Beimingwu, also without disclosing their own data. It is anticipated that after Beimingwu accumulates millions of learnwares, an “emergent” behavior may occur: machine learning tasks that have never been specifically tackled may be solved by assembling and reusing some existing learnwares.
A learnware is a well-performing trained model with a specification that describes its capabilities, enabling it to be readily identified and reused in the future based on user requirements. The specification includes a semantic specification in text and a statistical specification sketching the model’s statistical information.
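To make the definition concrete, here is a minimal sketch of a learnware as a model paired with a two-part specification. The class names, field names, and the toy model are all hypothetical and chosen only for illustration; they are not the data structures used by the learnware package.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Specification:
    """Hypothetical container for the two parts of a specification."""
    semantic: Dict[str, Any]      # text description: name, description, tags
    statistical: Dict[str, Any]   # summary of the data distribution, not raw data


@dataclass
class Learnware:
    """Hypothetical pairing of a trained model with its specification."""
    model: Callable[[List[float]], int]
    specification: Specification

    def predict(self, x: List[float]) -> int:
        # Reuse simply delegates to the wrapped model.
        return self.model(x)


def threshold_model(x: List[float]) -> int:
    """Toy stand-in for a trained model: class 1 if the features sum above 1."""
    return int(sum(x) > 1.0)


toy = Learnware(
    model=threshold_model,
    specification=Specification(
        semantic={"name": "toy-threshold", "tags": ["classification", "table"]},
        statistical={"n_features": 2, "feature_means": [0.5, 0.5]},
    ),
)
print(toy.specification.semantic["name"], toy.predict([0.9, 0.4]))
```

The point of the pairing is that a dock can decide whether a model is relevant by looking only at the specification, without running the model or seeing its training data.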
[1] Zhi-Hua Zhou. Learnware: on the future of machine learning. Frontiers of Computer Science, 2016, 10(4): 589–590
[2] Zhi-Hua Zhou. Machine Learning: Development and Future. Communications of CCF, 2017, vol.13, no.1 (2016 CNCC keynote)
As shown in the diagram below, the Beimingwu learnware dock system, serving as a preliminary research platform for learnware, systematically implements the core processes of the learnware paradigm for the first time:
In addition, the Beimingwu system also has the following features:
The learnware Python package supports various data types (tables, images, and text) for efficient local generation of specifications.
The learnware Python package facilitates convenient deployment and reuse of arbitrary learnwares.
Beimingwu’s open-source code includes the learnware Python package and the frontend/backend code. The learnware package is highly extensible, making it easy to integrate new specification designs, learnware system designs, and learnware reuse methods in the future.
As depicted in the figure below, Beimingwu’s architecture consists of four hierarchical layers, from the learnware storage layer to the user interface layer, systematically implementing the learnware paradigm for the first time from the ground up.
The functionalities of the four layers are described as follows:
Learnware Storage Layer: Manages the storage of learnwares in zip packages and provides access to them through the learnware database.
Core Engine Layer: Encompasses all processes within the learnware paradigm, including learnware uploading, searching, reusing, and deployment. It operates independently of the backend and frontend, offering rich algorithmic interfaces for learnware-related tasks and research experiments.
System Backend Layer: Enables industrial-level deployment of Beimingwu, offering stable online deployment and providing extensive backend APIs for frontend and client interactions.
User Interface Layer: Comprises a web-based frontend and a command-line client for user convenience and interaction.
Based on the system architecture, Beimingwu is developed with five sub-projects:
Engine: Encompasses the core components and algorithms of the learnware paradigm and provides a command-line client for user interaction. It has been released as the learnware package.
Frontend: Provides the interface and functionality for user interaction with the learnware dock system, including the main system and the administrator system.
Backend: Handles the dock system’s operation logic and data operations, ensuring system stability and high performance.
Docs: Maintains the system documentation, including user guides and development guides, ensuring system usability.
Deploy: Manages the system deployment configuration, including frontend and backend deployment files.
Welcome to Beimingwu. The following instructions will help you quickly explore the search functionality on the system website and walk through two demo cases, from learnware search to learnware deployment, using the learnware package.
The installation instructions for the learnware package can be found in the Installation Guide.
In Beimingwu, learnwares can be searched using both semantic information and statistical information.
When searching with semantic information, you can fill in the information about your target learnware, and the system will search in the names and descriptions of learnwares. You can also filter by tags.
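As a rough illustration of what searching names and descriptions with a tag filter can look like, here is a small self-contained sketch. The catalog entries, field names, and the matching rule (every query word must appear in the name or description) are assumptions made for this example; Beimingwu’s actual semantic search may work differently.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical catalog entries; a real dock stores far richer metadata.
CATALOG: List[Dict] = [
    {"id": "lw-001", "name": "Iris classifier",
     "description": "Random forest trained on flower measurements",
     "tags": {"classification", "table"}},
    {"id": "lw-002", "name": "Digit recognizer",
     "description": "SVM for handwritten digit images",
     "tags": {"classification", "image"}},
    {"id": "lw-003", "name": "House price regressor",
     "description": "Gradient boosting on tabular housing data",
     "tags": {"regression", "table"}},
]


def semantic_search(catalog: List[Dict], query: str,
                    required_tags: Optional[Iterable[str]] = None) -> List[str]:
    """Return ids of entries whose name/description contain every query word
    and whose tags include all required tags."""
    terms = query.lower().split()
    required = set(required_tags or [])
    hits = []
    for entry in catalog:
        if not required <= entry["tags"]:
            continue  # tag filter failed
        text = f'{entry["name"]} {entry["description"]}'.lower()
        if all(term in text for term in terms):
            hits.append(entry["id"])
    return hits


print(semantic_search(CATALOG, "classifier"))
print(semantic_search(CATALOG, "", required_tags=["table"]))
```

An empty query with a tag filter returns everything carrying those tags, which mirrors browsing by tag alone.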
When searching with statistical information, you need to generate and submit a statistical specification, which captures the data distribution without disclosing your original data. Using the provided API, you can easily generate this statistical specification locally.
from learnware.specification import generate_stat_spec
# test_x is a placeholder for your own feature data (not defined here)
data_type = "table"  # Supported data types: "table", "image", "text"
spec = generate_stat_spec(type=data_type, X=test_x)
spec.save("stat.json")
After you upload the JSON file containing the statistical information, the system matches learnwares with similar statistical information. You can download a learnware zip by clicking the download button in the lower left corner of the learnware card.
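The text above does not spell out how "similar statistical information" is measured. One common way to compare two data distributions from samples is the maximum mean discrepancy (MMD) under a Gaussian kernel, sketched below in plain Python. Treat this as an assumption about the general idea, not a description of Beimingwu’s matching algorithm: the function names, the kernel width `gamma`, and the toy samples are all hypothetical, and a real statistical specification is a compact summary rather than the raw samples used here.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(a: Point, b: Point, gamma: float) -> float:
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def mean_kernel(xs: List[Point], ys: List[Point], gamma: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def squared_mmd(xs: List[Point], ys: List[Point], gamma: float = 0.5) -> float:
    """Biased estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma)
            + mean_kernel(ys, ys, gamma))


# Toy samples: the user's data and the data behind two candidate models.
user_data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
similar_model_data = [[0.1, 0.1], [0.0, 0.2], [0.2, 0.1]]
distant_model_data = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]

print(squared_mmd(user_data, similar_model_data))
print(squared_mmd(user_data, distant_model_data))
```

A smaller value means the two samples look more alike, so under this assumption the candidate with the smaller distance would be ranked first.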
In some cases, assembling multiple helpful learnwares may be more beneficial for your task. The system will accordingly recommend a combination of these learnwares as a package. You can download the package using the “Download All” button in the upper right corner.
Beimingwu offers a complete workflow from learnware search to learnware deployment. Below are two specific examples.
Please note that to execute the following examples, you need to first register in the Beimingwu system and obtain a user email and client token.
The following demo illustrates the complete process of using Beimingwu to search for a single learnware and use it to make predictions on the classic Iris dataset. The process includes statistical specification generation, single learnware search, learnware deployment, and the final calculation of prediction accuracy.
from learnware.market import BaseUserInfo
from learnware.specification import generate_stat_spec
from learnware.client import LearnwareClient
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
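# Prepare the client and data; your_email and your_token are placeholders
# for the credentials obtained when registering in Beimingwu
client = LearnwareClient()
client.login(your_email, your_token)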
data, target = load_iris(return_X_y=True)
# Generate statistical specification
rkme = generate_stat_spec(type="table", X=data)
user_info = BaseUserInfo(stat_info={rkme.type: rkme})
# Search a single learnware
learnware_id = client.search_learnware(user_info)["single"]["learnware_ids"][0]
print(f"Search result: {learnware_id}")
# Load learnware
learnware = client.load_learnware(learnware_id=learnware_id, runnable_option="conda")
# Reuse learnware
y_pred = learnware.predict(data)
print(f"Classification accuracy: {accuracy_score(target, y_pred)}")
The following demo illustrates the complete process of using Beimingwu to search for multiple learnwares and combine them to make predictions on the classic Digits dataset. The process includes statistical specification generation, multiple learnware search, learnware deployment, and the final calculation of prediction accuracy.
from learnware.market import BaseUserInfo
from learnware.specification import generate_stat_spec
from learnware.client import LearnwareClient
from learnware.reuse import AveragingReuser
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
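# Prepare the client and data; your_email and your_token are placeholders
# for the credentials obtained when registering in Beimingwu
client = LearnwareClient()
client.login(your_email, your_token)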
data, target = load_digits(return_X_y=True)
# Generate statistical specification
rkme = generate_stat_spec(type="table", X=data)
user_info = BaseUserInfo(stat_info={rkme.type: rkme})
# Search multiple learnwares
learnware_ids = client.search_learnware(user_info)["multiple"]["learnware_ids"]
print(f"Search result: {learnware_ids}")
# Load learnware
learnware_list = client.load_learnware(learnware_id=learnware_ids, runnable_option="conda")
# Reuse learnware
y_pred = AveragingReuser(learnware_list, mode="vote_by_label").predict(data)
print(f"Classification accuracy: {accuracy_score(target, y_pred)}")
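The mode name "vote_by_label" in the demo above suggests that the predicted labels of the loaded learnwares are combined by voting. The sketch below shows plain majority voting over predicted labels with toy models. It illustrates the general idea only and is not the implementation of AveragingReuser; the function name `majority_vote` and the toy models are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def majority_vote(models: List[Model], samples: List[Sequence[float]]) -> List[int]:
    """For each sample, return the label predicted by the most models.

    Ties are resolved in favor of the label that was seen first.
    """
    predictions = []
    for sample in samples:
        votes = Counter(model(sample) for model in models)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Three toy "learnwares" that disagree on some inputs.
toy_models: List[Model] = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]
toy_samples = [[0.9, 0.8], [0.9, 0.2], [0.1, 0.7], [0.1, 0.1]]

print(majority_vote(toy_models, toy_samples))
```

Using an odd number of binary models avoids ties in this toy setting; with more classes or an even number of models, the tie-breaking rule starts to matter.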
If you use our project in your research or work, we kindly request that you cite the following papers:
@article{zhou2024learnware,
title = {Learnware: Small models do big},
author = {Zhou, Zhi-Hua and Tan, Zhi-Hao},
journal = {Science China Information Sciences},
volume = {67},
number = {1},
pages = {112102},
year = {2024}
}
@article{tan2024beimingwu,
title = {Beimingwu: A learnware dock system},
author = {Tan, Zhi-Hao and Liu, Jian-Dong and Bi, Xiao-Dong and Tan, Peng and Zheng, Qin-Cheng and Liu, Hai-Tian and Xie, Yi and Zou, Xiao-Chuan and Yu, Yang and Zhou, Zhi-Hua},
journal = {arXiv preprint arXiv:2401.14427},
year = {2024}
}
Building the learnware paradigm requires collective efforts from the community. As the first learnware dock system, Beimingwu is still in its early stages and may contain bugs and issues. We sincerely invite the community to upload models, collaborate in system development, and engage in research and enhancements in learnware algorithms. For detailed development guidelines, please consult our Developer Guide. We kindly request that contributors adhere to the provided Development Standards when participating in the project. Your valuable contributions are greatly appreciated.
The Beimingwu repository is developed and maintained by the LAMDA Beimingwu R&D (Research and Development) Team. To learn more about our team, please visit the Team Overview.