Fit a Gaussian Process surrogate model

Here we define a surrogate model using Gaussian Processes (GPs). We use the GP regressor from scikit-learn; compared to alternatives such as GPflow, we found it faster and better maintained for this use case.

import warnings

import chart_studio
from besos import eppy_funcs as ef, sampling
from besos.evaluator import EvaluatorEP, EvaluatorGeneric
from besos.problem import EPProblem
from chart_studio import plotly as py
from plotly import graph_objs as go
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic
from sklearn.model_selection import GridSearchCV, train_test_split

from parameter_sets import parameter_set

We begin by:

+ getting a predefined list of 7 parameters from parameter_sets.py
+ making these into a problem with electricity use as the objective
+ making an evaluator using the default EnergyPlus building.

parameters = parameter_set(7)
problem = EPProblem(parameters, ["Electricity:Facility"])
building = ef.get_building()
evaluator = EvaluatorEP(problem, building)

Then we draw 5 samples across this design space using Latin hypercube sampling and evaluate them with EnergyPlus. (This is deliberately small to keep simulation time short; a real surrogate model would need many more samples.)

inputs = sampling.dist_sampler(sampling.lhs, problem, 5)
outputs = evaluator.df_apply(inputs)
inputs
   Conductivity  Thickness  U-Factor  Solar Heat Gain Coefficient  ElectricEquipment     Lights  Window to Wall Ratio
0      0.110159   0.191814  3.365800                     0.983485          11.383518  13.646694              0.735013
1      0.046016   0.127066  0.356207                     0.614671          13.463401  12.900007              0.954627
2      0.186359   0.251974  1.175726                     0.368073          12.436931  10.900630              0.338765
3      0.073534   0.170320  4.906628                     0.189328          10.613030  11.837185              0.496910
4      0.152748   0.265122  2.522534                     0.447570          14.663626  14.946526              0.167042

Train-test split

Next we split the data into a training set (80%) and a testing set (20%).

train_in, test_in, train_out, test_out = train_test_split(
    inputs, outputs, test_size=0.2
)

Hyper-parameters

Before fitting the GP model we define the set of hyperparameters we want to optimize, using 3 folds in the k-fold cross-validation scheme. We select a set of candidate kernel functions, which should match the characteristics of the problem - details and examples can be found in the Kernel Cookbook. Note that the parameters of the kernel itself are optimized during each model fitting run; we inspect the fitted kernel after the grid search below.

hyperparameters = {
    "kernel": [
        None,
        1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
        1.0 * RationalQuadratic(length_scale=1.0, alpha=0.5),
        # ConstantKernel(0.1, (0.01, 10.0))*(DotProduct(sigma_0=1.0, sigma_0_bounds=(0.1, 10.0))**2),
        1.0 * Matern(length_scale=1.0, length_scale_bounds=(1e-1, 10.0)),
    ]
}

folds = 3

Model fitting

Here we fit the model using these hyperparameters.

gp = GaussianProcessRegressor(normalize_y=True)

# Grid search over the candidate kernels with k-fold cross-validation
# (the `iid` argument is deprecated in newer scikit-learn releases and can be dropped there)
clf = GridSearchCV(gp, hyperparameters, iid=True, cv=folds)

with warnings.catch_warnings():
    # suppress the FutureWarning raised for the deprecated `iid` argument
    warnings.simplefilter("ignore", category=FutureWarning)
    clf.fit(train_in, train_out)  # fit on the training set only; the test set stays unseen

print(f"Best performing model $R^2$ score on training set: {clf.best_score_}")
print(f"Model $R^2$ parameters: {clf.best_params_}")
print(
    f"Best performing model $R^2$ score on a separate test set: {clf.best_estimator_.score(test_in, test_out)}"
)
Best performing model $R^2$ score on training set: nan
Best model parameters: {'kernel': None}
Best performing model $R^2$ score on a separate test set: nan
/home/user/.local/lib/python3.7/site-packages/sklearn/metrics/_regression.py:594: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)

The scores above are nan because, with only 5 samples in total, the held-out test set and some cross-validation folds contain a single point, for which R^2 is not defined - hence the repeated warning. Re-running with a larger sample set gives well-defined scores.
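
As noted above, the kernel's own parameters (such as length scales) are tuned during each model fit. A minimal sketch of how to inspect them after the grid search, using the kernel_ attribute that scikit-learn's GaussianProcessRegressor exposes on fitted models (best_gp is just an illustrative name):

best_gp = clf.best_estimator_
print(best_gp.kernel_)        # kernel chosen by the grid search, with its optimized hyperparameters
print(best_gp.kernel_.theta)  # the same hyperparameters in log-transformed form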

Surrogate Modelling Evaluator object

We can wrap the fitted model in a BESOS Evaluator.

def evaluation_func(ind):
    # predict() returns a (1, 1) array for a single design;
    # take the scalar and return it as a tuple of objective values
    return (clf.predict([ind])[0][0],)


GP_SM = EvaluatorGeneric(evaluation_func, problem)

This has the same interface as the original EnergyPlus Evaluator object, but returns the surrogate's prediction instead of running a simulation. In the next cells we generate a single input sample and evaluate it using both the surrogate model and EnergyPlus.

sample = sampling.dist_sampler(sampling.lhs, problem, 1)
values = sample.values[0]
print(values)
[ 0.15579269  0.26956427  0.43244217  0.68968071 12.15878908 13.37350072
  0.20776428]
GP_SM(values)[0]
2076892603.2093327
evaluator(values)[0]
2096737467.8634224
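
As a quick sanity check we can compare the two predictions directly; a minimal sketch (surrogate_value, energyplus_value and relative_error are illustrative names, not part of BESOS):

surrogate_value = GP_SM(values)[0]
energyplus_value = evaluator(values)[0]
relative_error = abs(surrogate_value - energyplus_value) / energyplus_value
print(f"Relative error: {relative_error:.2%}")

For the sample shown above this works out to roughly 1%.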

Running a large surrogate evaluation

inputs = sampling.dist_sampler(sampling.lhs, problem, 5000)
outputs = GP_SM.df_apply(inputs)
results = inputs.join(outputs)
results.head()
   Conductivity  Thickness  U-Factor  Solar Heat Gain Coefficient  ElectricEquipment     Lights  Window to Wall Ratio  Electricity:Facility
0      0.174172   0.170220  4.092570                     0.139609          12.716906  11.659730              0.121882          2.049882e+09
1      0.153848   0.158066  0.642023                     0.544972          12.237855  11.648225              0.294227          1.998581e+09
2      0.107891   0.187473  4.980959                     0.211675          13.198086  10.719210              0.888578          2.062038e+09
3      0.052859   0.182125  1.620017                     0.533700          11.440617  10.386052              0.340614          2.008998e+09
4      0.171637   0.214592  1.243917                     0.685709          10.228580  14.148959              0.352664          2.066421e+09
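
Because surrogate evaluations are cheap, we can mine this large result set directly, for instance to pull out the designs with the lowest predicted electricity use. A minimal sketch using standard pandas operations (nsmallest is a pandas DataFrame method, not part of BESOS):

# the five designs with the lowest predicted electricity use
results.nsmallest(5, "Electricity:Facility")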

Generate an idf/epJSON file from the dataframe

Here we generate an idf/epJSON file from a selected row of the results dataframe and save it in the current directory.

# generate_building(dataframe, index, filename)
evaluator.generate_building(results, 2, "output")

Visualization

chart_studio.tools.set_credentials_file(
    username="besos", api_key="Kb2G2bjOh5gmwh1Midwq"
)
df = inputs.round(3)

# generate a list of dictionaries, one dimension per input column
dimensions = list()
for col in df.columns:
    dimensions.append(dict(label=col, values=df[col]))

# add the predicted output as the final dimension, rounded to the nearest 100,000
dimensions.append(
    dict(label=outputs.columns[0], values=outputs[outputs.columns[0]].round(-5))
)

data = [
    go.Parcoords(
        line=dict(
            color=outputs["Electricity:Facility"],
            colorscale=[[0, "#D7C16B"], [0.5, "#23D8C3"], [1, "#F3F10F"]],
        ),
        dimensions=dimensions,
    )
]

layout = go.Layout(plot_bgcolor="#E5E5E5", paper_bgcolor="#E5E5E5")

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename="parcoords-basic")
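
Uploading to Chart Studio requires valid credentials. If they are not available, the same figure can be rendered locally instead; a minimal sketch using plotly's built-in renderer (plotly 4+):

fig.show()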