Fit a Gaussian Process surrogate model
Here we define a surrogate model using Gaussian Processes (GP). We use the GP implementation from scikit-learn: compared with alternatives such as GPflow, we observed faster fitting and found the code better maintained.
import warnings
import chart_studio
from besos import eppy_funcs as ef, sampling
from besos.evaluator import EvaluatorEP, EvaluatorGeneric
from besos.problem import EPProblem
from chart_studio import plotly as py
from plotly import graph_objs as go
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic
from sklearn.model_selection import GridSearchCV, train_test_split
from parameter_sets import parameter_set
We begin by:
+ getting a predefined list of 7 parameters from parameter_sets.py,
+ making these into a problem with electricity use as the objective,
+ and making an evaluator using the default EnergyPlus building.
parameters = parameter_set(7)
problem = EPProblem(parameters, ["Electricity:Facility"])
building = ef.get_building()
evaluator = EvaluatorEP(problem, building)
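Then we draw Latin hypercube samples across this design space and evaluate them. The run shown here uses only 5 samples to keep it fast; a larger number (such as 50) is needed for a meaningful fit.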
inputs = sampling.dist_sampler(sampling.lhs, problem, 5)
outputs = evaluator.df_apply(inputs)
inputs
| | Conductivity | Thickness | U-Factor | Solar Heat Gain Coefficient | ElectricEquipment | Lights | Window to Wall Ratio |
|---|---|---|---|---|---|---|---|
| 0 | 0.110159 | 0.191814 | 3.365800 | 0.983485 | 11.383518 | 13.646694 | 0.735013 |
| 1 | 0.046016 | 0.127066 | 0.356207 | 0.614671 | 13.463401 | 12.900007 | 0.954627 |
| 2 | 0.186359 | 0.251974 | 1.175726 | 0.368073 | 12.436931 | 10.900630 | 0.338765 |
| 3 | 0.073534 | 0.170320 | 4.906628 | 0.189328 | 10.613030 | 11.837185 | 0.496910 |
| 4 | 0.152748 | 0.265122 | 2.522534 | 0.447570 | 14.663626 | 14.946526 | 0.167042 |
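To make the sampling step concrete, here is a minimal NumPy sketch of Latin hypercube sampling on the unit cube. It illustrates the general technique only and is not the code behind sampling.lhs; the function name latin_hypercube and its arguments are hypothetical. Each dimension is cut into n equal strata, one point is drawn inside each stratum, and the strata are shuffled independently per dimension.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return an (n_samples, n_dims) array of LHS points in [0, 1).

    Hypothetical helper for illustration; not part of BESOS.
    """
    rng = np.random.default_rng(seed)
    # One uniform draw inside each of the n_samples equal-width strata.
    offsets = rng.random((n_samples, n_dims))
    points = (np.arange(n_samples)[:, None] + offsets) / n_samples
    # Shuffle the strata independently in every dimension.
    for d in range(n_dims):
        points[:, d] = rng.permutation(points[:, d])
    return points


samples = latin_hypercube(5, 7, seed=0)
# Stratum index of every point, sorted per column: each column should
# contain every stratum exactly once.
strata = np.sort(np.floor(samples * 5).astype(int), axis=0)
print(strata[:, 0])
```

A column can then be scaled to a parameter's range with `low + (high - low) * samples[:, d]`, where `low` and `high` are that parameter's bounds.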
Train-test split
Next we split the data into a training set (80%) and a testing set (20%).
train_in, test_in, train_out, test_out = train_test_split(
    inputs, outputs, test_size=0.2
)
Hyper-parameters
Before fitting the GP model we define the set of hyperparameters to search over. Here we use 3 folds in the k-fold cross-validation scheme. We select a set of candidate kernel functions, which must suit the characteristics of the problem; details and examples may be found in the Kernel Cookbook. Note that the parameters of each kernel are optimized during every model fit, so the grid search only selects among the kernel candidates.
hyperparameters = {
    "kernel": [
        None,
        1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
        1.0 * RationalQuadratic(length_scale=1.0, alpha=0.5),
        # ConstantKernel(0.1, (0.01, 10.0))*(DotProduct(sigma_0=1.0, sigma_0_bounds=(0.1, 10.0))**2),
        1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
    ]
}
folds = 3
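As a reference for what the kernel candidates encode, the sketch below evaluates the squared-exponential (RBF) kernel, k(x, x') = exp(-||x - x'||^2 / (2 l^2)), with plain NumPy. The helper name rbf_kernel is hypothetical, and the code only illustrates the formula; it is not scikit-learn's implementation. A larger length scale l makes distant points more strongly correlated, and l is the quantity tuned within length_scale_bounds during fitting.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b.

    Hypothetical helper for illustration only.
    """
    a = np.atleast_2d(a).astype(float)
    b = np.atleast_2d(b).astype(float)
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


x = np.array([[0.0], [1.0], [2.0]])
k_short = rbf_kernel(x, x, length_scale=0.5)
k_long = rbf_kernel(x, x, length_scale=5.0)
print(k_short.round(3))  # correlation decays quickly with distance
print(k_long.round(3))   # distant points remain strongly correlated
```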
Model fitting
Here we fit the model using these hyperparameters.
gp = GaussianProcessRegressor(normalize_y=True)
# The `iid` argument was removed from GridSearchCV in scikit-learn 0.24, so it is omitted here.
clf = GridSearchCV(gp, hyperparameters, cv=folds)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    # Fit on the training split only, so the test set stays unseen.
    clf.fit(train_in, train_out)
print(f"Best mean cross-validated $R^2$ score on the training set: {clf.best_score_}")
print(f"Best model parameters: {clf.best_params_}")
print(
    f"Best performing model $R^2$ score on a separate test set: {clf.best_estimator_.score(test_in, test_out)}"
)
Best mean cross-validated $R^2$ score on the training set: nan
Best model parameters: {'kernel': None}
Best performing model $R^2$ score on a separate test set: nan
/home/user/.local/lib/python3.7/site-packages/sklearn/metrics/_regression.py:594: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
warnings.warn(msg, UndefinedMetricWarning)
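The nan scores and the UndefinedMetricWarning follow from the sample size. The sketch below uses only the standard library and hypothetical helper names (kfold_sizes, r2_score_or_nan). It assumes the usual k-fold convention of giving the first n mod k folds one extra sample, and the usual definition R^2 = 1 - SS_res / SS_tot. With 4 or 5 samples and 3 folds, at least one validation fold holds a single sample, for which R^2 is undefined, so the mean over folds is nan. A 20% test split of 5 samples also holds a single sample.

```python
import math


def kfold_sizes(n_samples, n_folds):
    """Validation-fold sizes when the first n % k folds get one extra sample."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def r2_score_or_nan(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, returning nan where it is undefined.

    Returning nan for a constant target is this sketch's own choice.
    """
    if len(y_true) < 2:
        return math.nan  # a single sample has no variance to explain
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0:
        return math.nan
    return 1.0 - ss_res / ss_tot


print(kfold_sizes(5, 3))  # fold sizes for 5 samples
print(kfold_sizes(4, 3))  # fold sizes for 4 samples
print(r2_score_or_nan([2.0e9], [2.1e9]))  # one-sample fold
print(r2_score_or_nan([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
```

Increasing the number of samples so that every fold and the test set contain at least two points removes the undefined scores.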
Surrogate Modelling Evaluator object
We can wrap the fitted model in a BESOS Evaluator.
def evaluation_func(ind):
    # Predict for a single input row and return the objective as a 1-tuple.
    return (clf.predict([ind])[0][0],)
GP_SM = EvaluatorGeneric(evaluation_func, problem)
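The wrapping pattern does not depend on EnergyPlus. As a rough sketch, the same kind of function can be built around any object with a scikit-learn style predict method. MeanModel and make_evaluation_func below are hypothetical stand-ins for the fitted clf and for evaluation_func; nothing here is part of the BESOS API.

```python
class MeanModel:
    """Toy stand-in for a fitted regressor (hypothetical).

    Predicts the mean of each input row as a single output column.
    """

    def predict(self, rows):
        return [[sum(row) / len(row)] for row in rows]


def make_evaluation_func(model):
    """Build a function mapping one input row to a tuple of objectives."""

    def evaluation_func(ind):
        # predict expects a 2D batch, so wrap the single row in a list.
        prediction = model.predict([list(ind)])
        # First row, first output column, returned as a 1-tuple.
        return (prediction[0][0],)

    return evaluation_func


evaluate = make_evaluation_func(MeanModel())
result = evaluate([1.0, 2.0, 3.0])
print(result)
```

Returning a tuple keeps one entry per objective, which matches the single-objective problem defined earlier.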
This exposes the same calling interface as the original EnergyPlus Evaluator object. In the next cells we generate a single input sample and evaluate it using both the surrogate model and EnergyPlus.
sample = sampling.dist_sampler(sampling.lhs, problem, 1)
values = sample.values[0]
print(values)
[ 0.15579269 0.26956427 0.43244217 0.68968071 12.15878908 13.37350072
0.20776428]
GP_SM(values)[0]
2076892603.2093327
evaluator(values)[0]
2096737467.8634224
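A quick way to read the two numbers above is as a relative error. The short calculation below uses the values printed in this run; it describes this single sample only and says nothing about accuracy elsewhere in the design space.

```python
# Values copied from the two cell outputs above.
surrogate = 2076892603.2093327
energyplus = 2096737467.8634224

# Absolute difference as a fraction of the EnergyPlus result.
relative_error = abs(surrogate - energyplus) / energyplus
print(f"Relative error: {relative_error:.2%}")  # prints "Relative error: 0.95%"
```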
Running a large surrogate evaluation
inputs = sampling.dist_sampler(sampling.lhs, problem, 5000)
outputs = GP_SM.df_apply(inputs)
results = inputs.join(outputs)
results.head()
| | Conductivity | Thickness | U-Factor | Solar Heat Gain Coefficient | ElectricEquipment | Lights | Window to Wall Ratio | Electricity:Facility |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.174172 | 0.170220 | 4.092570 | 0.139609 | 12.716906 | 11.659730 | 0.121882 | 2.049882e+09 |
| 1 | 0.153848 | 0.158066 | 0.642023 | 0.544972 | 12.237855 | 11.648225 | 0.294227 | 1.998581e+09 |
| 2 | 0.107891 | 0.187473 | 4.980959 | 0.211675 | 13.198086 | 10.719210 | 0.888578 | 2.062038e+09 |
| 3 | 0.052859 | 0.182125 | 1.620017 | 0.533700 | 11.440617 | 10.386052 | 0.340614 | 2.008998e+09 |
| 4 | 0.171637 | 0.214592 | 1.243917 | 0.685709 | 10.228580 | 14.148959 | 0.352664 | 2.066421e+09 |
Generate an idf/epJSON file from data in a dataframe
Generate an idf/epJSON file from a selected row of the dataframe and save it in the current directory.
# generate_building(dataframe, index, filename)
evaluator.generate_building(results, 2, "output")
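The row index passed above (2) is simply a chosen position. One possible way to pick a row programmatically, sketched here with pandas on the five rows shown in the table (two input columns only, for brevity), is to take the index with the lowest predicted electricity use. The variable names results_head and best_index are illustrative.

```python
import pandas as pd

# The first five surrogate results from the table above (subset of columns).
results_head = pd.DataFrame(
    {
        "Conductivity": [0.174172, 0.153848, 0.107891, 0.052859, 0.171637],
        "Thickness": [0.170220, 0.158066, 0.187473, 0.182125, 0.214592],
        "Electricity:Facility": [
            2.049882e9,
            1.998581e9,
            2.062038e9,
            2.008998e9,
            2.066421e9,
        ],
    }
)

# Index label of the row with the smallest predicted electricity use.
best_index = results_head["Electricity:Facility"].idxmin()
print(best_index)
print(results_head.loc[best_index])
```

The resulting index could then be passed as the second argument to evaluator.generate_building in place of the hard-coded 2.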
Visualization
# Replace the placeholders with your own Chart Studio credentials.
chart_studio.tools.set_credentials_file(
    username="your_username", api_key="your_api_key"
)
df = inputs.round(3)
# Build one parallel-coordinates dimension per input column
dimensions = [dict(label=col, values=df[col]) for col in df.columns]
# Add the objective as the final dimension, rounded to the nearest 100,000
objective = outputs.columns[0]
dimensions.append(dict(label=objective, values=outputs[objective].round(-5)))
data = [
    go.Parcoords(
        line=dict(
            color=outputs[objective],
            colorscale=[[0, "#D7C16B"], [0.5, "#23D8C3"], [1, "#F3F10F"]],
        ),
        dimensions=dimensions,
    )
]
layout = go.Layout(plot_bgcolor="#E5E5E5", paper_bgcolor="#E5E5E5")
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename="parcoords-basic")