🔗 Report | Homepage | BgoFace UI
🤝🤝🤝 Please star ⭐️ this project to support open-source development! For questions or collaboration, contact: Dr. Bin Cao ([email protected])
Bgolearn is a lightweight and extensible Python package for Bayesian global optimization, built for accelerating materials discovery and design. It provides out-of-the-box support for regression and classification tasks, implements various acquisition strategies, and offers a seamless pipeline for virtual screening, active learning, and multi-objective optimization.
📦 Official PyPI: `pip install Bgolearn`
🎥 Code tutorial (BiliBili): Watch here
🚀 Colab Demo: Run it online
# install from PyPI
pip install Bgolearn
# upgrade to the latest release
pip install --upgrade Bgolearn
# check the installed version
pip show Bgolearn
import Bgolearn.BGOsampling as BGOS
import pandas as pd
# Load characterized dataset
data = pd.read_csv('data.csv')
x = data.iloc[:, :-1] # features
y = data.iloc[:, -1] # response
# Load virtual samples
vs = pd.read_csv('virtual_data.csv')
# Instantiate and run model
Bgolearn = BGOS.Bgolearn()
Mymodel = Bgolearn.fit(data_matrix=x, Measured_response=y, virtual_samples=vs)
# Get result using Expected Improvement
Mymodel.EI()
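Per the `:return:` note in the fit docstring at the end of this page, an acquisition call yields both the potential (acquisition value) of each virtual sample and the recommended candidate(s), so the outputs can be captured directly. A minimal sketch; the variable names are illustrative:

```python
# EI() returns (1) the acquisition value of each virtual sample and
# (2) the recommended candidate(s); variable names here are illustrative
potential, candidates = Mymodel.EI()
```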
Install the extension toolkit:
pip install BgoKit
from BgoKit import ToolKit

# vs: the virtual samples; score_1 and score_2: per-sample scores of the
# two objectives (see the sketch below)
Model = ToolKit.MultiOpt(vs, [score_1, score_2])
Model.BiSearch()           # bi-objective Pareto search
Model.plot_distribution()  # visualize the result
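score_1 and score_2 are not defined above. One plausible way to obtain them, sketched here under the assumption of two measured target columns y_1 and y_2, is to fit one Bgolearn model per objective and reuse the per-sample acquisition values as scores:

```python
# Illustrative sketch, not part of the BgoKit API: one Bgolearn model per
# objective; y_1 and y_2 are assumed target columns of the training data
Mymodel_1 = Bgolearn.fit(data_matrix=x, Measured_response=y_1, virtual_samples=vs)
score_1, _ = Mymodel_1.EI()

Mymodel_2 = Bgolearn.fit(data_matrix=x, Measured_response=y_2, virtual_samples=vs)
score_2, _ = Mymodel_2.EI()
```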
📓 See detailed demo: Multi-objective Example

Bgolearn ships the following acquisition strategies (a call sketch follows the list).

For regression:
- Expected Improvement (EI)
- Augmented Expected Improvement (AEI)
- Expected Quantile Improvement (EQI)
- Upper Confidence Bound (UCB)
- Probability of Improvement (PI)
- Predictive Entropy Search (PES)
- Knowledge Gradient (KG)
- Reinterpolation EI (REI)
- Expected Improvement with Plugin

For classification:
- Least Confidence
- Margin Sampling
- Entropy-based approach
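Each strategy is exposed as a method on the fitted model object. A minimal sketch follows; the method names match the Bgolearn documentation, but treat them as assumptions and verify against your installed version:

```python
# Regression acquisition functions (method names per the Bgolearn docs;
# verify against your installed version)
Mymodel.EI()           # Expected Improvement
Mymodel.UCB(alpha=1)   # Upper Confidence Bound with exploration weight alpha
Mymodel.Knowledge_G()  # Knowledge Gradient

# Classification strategies (fit with Mission='Classification' first)
Mymodel.Least_cfd()    # Least Confidence
Mymodel.Margin_S()     # Margin Sampling
Mymodel.Entropy()      # Entropy-based approach
```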
BgoFace, the graphical frontend of Bgolearn, provides no-code access to the package's backend algorithms.
Supports a broad range of acquisition strategies (EI, UCB, KG, PES, etc.) for both single- and multi-objective optimization, and works well with the sparse, high-dimensional datasets common in materials science.
Use BgoKit and MultiBgolearn to run Pareto optimization across multiple target properties (e.g., strength and ductility), enabling parallel evaluation across virtual samples.
Incorporates adaptive sampling in an active learning loop (experiment → prediction → update) to accelerate optimization with fewer experiments; a sketch of the loop follows.
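A minimal sketch of that loop, reusing x, y, and vs from the quick-start example; run_experiment() is a hypothetical placeholder for the actual measurement or simulation step:

```python
import pandas as pd
import Bgolearn.BGOsampling as BGOS

# Illustrative active-learning loop; run_experiment() is a hypothetical
# stand-in for your real measurement or simulation step
for iteration in range(10):
    model = BGOS.Bgolearn().fit(data_matrix=x, Measured_response=y,
                                virtual_samples=vs)
    _, candidate = model.EI()          # recommend the next sample
    new_y = run_experiment(candidate)  # measure it (hypothetical helper)
    # fold the new observation back into the training set and repeat
    x = pd.concat([x, pd.DataFrame([candidate], columns=x.columns)],
                  ignore_index=True)
    y = pd.concat([y, pd.Series([new_y])], ignore_index=True)
```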
- Nano Letters: Self-Driving Laboratory under UHV. Link
- Small: ML-Engineered Nanozyme System for Anti-Tumor Therapy. Link
- Computational Materials Science: Mg-Ca-Zn Alloy Optimization. Link
- Measurement: Foaming Agent Optimization in EPB Shield Construction. Link
- Intelligent Computing: Metasurface Design via Bayesian Learning. Link
- Materials & Design: Lead-Free Solder Alloys via Active Learning. Link
- npj Computational Materials: MLMD Platform with Bgolearn Backend. Link
Released under the MIT License. 💼 Free for academic and commercial use. Please cite relevant publications if used in research.
We welcome community contributions and research collaborations:
- Submit issues for bug reports, ideas, or suggestions
- Submit pull requests for code contributions
- Contact Bin Cao ([email protected]) for collaborations
Signature:
Bgolearn.fit(
    data_matrix,
    Measured_response,
    virtual_samples,
    Mission='Regression',
    Classifier='GaussianProcess',
    noise_std=None,
    Kriging_model=None,
    opt_num=1,
    min_search=True,
    CV_test=False,
    Dynamic_W=False,
    seed=42,
)
================================================================
:param data_matrix: data matrix of the training dataset, X.
:param Measured_response: response of the training dataset, y.
:param virtual_samples: designed virtual samples.
:param Mission: str, default 'Regression'; the mission of optimization, either 'Regression' or 'Classification'.
:param Classifier: used when Mission == 'Classification'.
    If the user does not supply one, Bgolearn calls a pre-set classifier.
    Default: Classifier = 'GaussianProcess', i.e., Gaussian Process Classifier.
    Five classifiers are pre-set in Bgolearn:
        'GaussianProcess'    --> Gaussian Process Classifier (default)
        'LogisticRegression' --> Logistic Regression
        'NaiveBayes'         --> Naive Bayes Classifier
        'SVM'                --> Support Vector Machine Classifier
        'RandomForest'       --> Random Forest Classifier
:param noise_std: float or ndarray of shape (n_samples,), default=None.
    Value added to the diagonal of the kernel matrix during fitting.
    This can prevent a potential numerical issue during fitting by
    ensuring that the calculated values form a positive definite matrix.
    It can also be interpreted as the variance of additional Gaussian
    measurement noise on the training observations.
    If noise_std is None, the noise level is estimated by maximum
    likelihood on the training dataset.
:param Kriging_model: default None.
    str: Kriging_model = 'SVM', 'RF', 'AdaB', or 'MLP'.
        The corresponding machine learning model is applied: Support Vector Machine (SVM),
        Random Forest (RF), AdaBoost (AdaB), or Multi-Layer Perceptron (MLP).
        The estimation uncertainty is determined by bootstrap sampling.
    or: a user-defined callable Kriging model with an attribute <fit_pre>.
    If the user does not supply one, Bgolearn calls a pre-set Kriging model.
    Attribute <fit_pre>:
        input  -> xtrain, ytrain, xtest;
        output -> predicted mean and std of xtest.
    e.g. (using GaussianProcessRegressor from sklearn):
        class Kriging_model(object):
            def fit_pre(self, xtrain, ytrain, xtest):
                # instantiate the model
                kernel = RBF()
                model = GaussianProcessRegressor(kernel=kernel).fit(xtrain, ytrain)
                # define the attribute's outputs
                mean, std = model.predict(xtest, return_std=True)
                return mean, std
    e.g. (multi-model estimation):
        class Kriging_model(object):
            def fit_pre(self, xtrain, ytrain, xtest):
                # instantiate the models; model_1, model_2, model_3 can be
                # changed to any ML models you desire
                pre_1 = SVR(C=10).fit(xtrain, ytrain).predict(xtest)  # model_1
                pre_2 = SVR(C=50).fit(xtrain, ytrain).predict(xtest)  # model_2
                pre_3 = SVR(C=80).fit(xtrain, ytrain).predict(xtest)  # model_3
                # define the attribute's outputs
                stacked_array = np.vstack((pre_1, pre_2, pre_3))
                means = np.mean(stacked_array, axis=0)
                stds = np.std(stacked_array, axis=0)
                return means, stds
:param opt_num: the number of recommended candidates for the next iteration, default 1.
:param min_search: default True -> search for the global minimum;
    False -> search for the global maximum.
:param CV_test: 'LOOCV' or an int, default False (no cross-validation test).
    If CV_test = 'LOOCV', leave-one-out cross-validation is applied;
    if CV_test is an int, e.g., CV_test = 10, 10-fold cross-validation is applied.
:return: 1: array; potential of each candidate. 2: array/float; recommended candidate(s).
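As an illustration of these parameters, a maximization task with cross-validation might be configured as follows (a sketch reusing x, y, and vs from the quick-start example):

```python
# Sketch using the parameters documented above: search for the maximum,
# recommend 3 candidates, and run 10-fold CV on the surrogate model
Mymodel = Bgolearn.fit(
    data_matrix=x,
    Measured_response=y,
    virtual_samples=vs,
    min_search=False,  # False -> search the global maximum
    opt_num=3,         # number of recommended candidates
    CV_test=10,        # 10-fold cross-validation
)
```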
File: Bgolearn/BGOsampling.py
Type: method