Fit feedforward Neural Network model¶
import warnings
import chart_studio
from besos import eppy_funcs as ef, sampling
from besos.evaluator import EvaluatorEP, EvaluatorGeneric
from besos.problem import EPProblem
from chart_studio import plotly as py
from plotly import graph_objs as go
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from parameter_sets import parameter_set
We begin by:

+ getting a predefined list of 7 parameters from parameter_sets.py
+ making these into a problem with electricity use as the objective
+ making an evaluator using the default EnergyPlus building.
parameters = parameter_set(7)
problem = EPProblem(parameters, ["Electricity:Facility"])
building = ef.get_building()
evaluator = EvaluatorEP(problem, building)
Then we get 20 samples across this design space and evaluate them.
inputs = sampling.dist_sampler(sampling.lhs, problem, 20)
outputs = evaluator.df_apply(inputs)
inputs
 | Conductivity | Thickness | U-Factor | Solar Heat Gain Coefficient | ElectricEquipment | Lights | Window to Wall Ratio
---|---|---|---|---|---|---|---
0 | 0.165451 | 0.150655 | 1.543389 | 0.759667 | 10.846373 | 14.112927 | 0.024352 |
1 | 0.132154 | 0.147326 | 4.990325 | 0.476966 | 12.153939 | 13.535152 | 0.253947 |
2 | 0.080456 | 0.190067 | 2.774125 | 0.715199 | 10.749949 | 11.099823 | 0.929049 |
3 | 0.033565 | 0.287872 | 0.251303 | 0.886388 | 12.717156 | 14.578163 | 0.122789 |
4 | 0.118116 | 0.162183 | 3.689406 | 0.619588 | 10.248964 | 13.919604 | 0.822831 |
5 | 0.104531 | 0.127132 | 4.077076 | 0.528881 | 12.324394 | 14.488989 | 0.670414 |
6 | 0.196692 | 0.139876 | 0.506217 | 0.030411 | 11.374182 | 11.665950 | 0.375758 |
7 | 0.144890 | 0.267075 | 4.432675 | 0.172185 | 13.053027 | 10.926282 | 0.750233 |
8 | 0.085398 | 0.102225 | 1.615485 | 0.298316 | 10.455626 | 12.998602 | 0.455249 |
9 | 0.188407 | 0.252228 | 2.493945 | 0.833956 | 14.747260 | 12.044832 | 0.615270 |
10 | 0.043554 | 0.244529 | 0.894356 | 0.324217 | 12.875479 | 10.282988 | 0.196747 |
11 | 0.176928 | 0.180441 | 2.198896 | 0.669148 | 14.493702 | 13.226416 | 0.343713 |
12 | 0.122820 | 0.110766 | 2.953618 | 0.929649 | 14.949231 | 10.096406 | 0.529761 |
13 | 0.155939 | 0.237080 | 0.744055 | 0.102866 | 13.478119 | 12.425914 | 0.968424 |
14 | 0.066703 | 0.219910 | 1.295623 | 0.554399 | 11.631811 | 14.876004 | 0.588699 |
15 | 0.061973 | 0.179125 | 4.654101 | 0.375086 | 11.081292 | 10.521093 | 0.717606 |
16 | 0.092996 | 0.276388 | 3.934188 | 0.448308 | 13.756615 | 12.745539 | 0.291697 |
17 | 0.154490 | 0.228798 | 3.155262 | 0.209700 | 14.145472 | 13.320453 | 0.867700 |
18 | 0.047826 | 0.295456 | 1.831015 | 0.117507 | 13.649584 | 11.321448 | 0.441672 |
19 | 0.025519 | 0.207379 | 3.353283 | 0.983222 | 11.805350 | 11.781888 | 0.105139 |
Train-test split¶
Next we split the data into a training set (80%) and a testing set (20%).
train_in, test_in, train_out, test_out = train_test_split(
inputs, outputs, test_size=0.2
)
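Note that with only 20 samples the 20% test set holds just 4 points, so test metrics will be noisy. For a reproducible split, one could pin the seed; a minimal variant of the call above (the random_state value is arbitrary):

# same split as above, but deterministic across notebook runs
train_in, test_in, train_out, test_out = train_test_split(
    inputs, outputs, test_size=0.2, random_state=42
)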
Normalization of inputs¶
To ensure an equal weighting of the inputs in the backpropagation algorithm that fits the neural network, we have to normalize the input values: for example, the window-to-wall ratio lies in the range 0 to 1, while the equipment and lighting power densities (\(W/m^2\)) lie in the range 10 to 15. Different options for normalization exist; here we scale all features (input variables) to zero mean and a standard deviation of 1, and do the same for the output. Note that we fit the normalizers on the training data only.
scaler = StandardScaler()
inputs = scaler.fit_transform(X=train_in)  # fit the input scaler on training data only
scaler_out = StandardScaler()
outputs = scaler_out.fit_transform(X=train_out)  # separate scaler for the output
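As a quick sanity check (not part of the original notebook), the transformed training inputs should now have zero mean and unit standard deviation per column; the test set is only ever transformed with these same statistics, as done further below via scaler.transform(test_in).

# each training feature should now have mean ~0 and standard deviation ~1
print(inputs.mean(axis=0).round(6))
print(inputs.std(axis=0).round(6))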
Hyper-parameters¶
Before we start fitting the NN model, we define the set of hyperparameters to explore in cross-validation. Here we vary the number of hidden layers of the network and the regularization parameter alpha. More layers and a weaker regularizer (lower alpha) increase the variance of the network, which can lead to overfitting. The best combination could be found with an optimizer such as Bayesian optimization; in this example we use a simple grid search.
hyperparameters = {
    # one and two hidden layers, sized at 16 neurons per input parameter
    "hidden_layer_sizes": (
        (len(parameters) * 16,),
        (len(parameters) * 16, len(parameters) * 16),
    ),
    # L2 regularization strength: larger values mean stronger regularization
    "alpha": [1, 10, 10 ** 3],
}
neural_net = MLPRegressor(max_iter=1000, early_stopping=False)
folds = 3  # number of cross-validation folds
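This grid spans 2 layer configurations times 3 values of alpha, i.e. 6 candidate models; with 3-fold cross-validation the grid search below therefore fits 18 networks. A quick way to count the combinations, using scikit-learn's ParameterGrid (not part of the original notebook):

from sklearn.model_selection import ParameterGrid

n_candidates = len(ParameterGrid(hyperparameters))  # 2 * 3 = 6
print(f"{n_candidates} candidates x {folds} folds = {n_candidates * folds} fits")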
Model fitting¶
Here, we use the MLPRegressor neural network model from scikit-learn. A different example uses TensorFlow (with and without the Keras wrapper).
# GridSearchCV's `iid` parameter was deprecated and later removed from
# scikit-learn, so it is omitted here
clf = GridSearchCV(neural_net, hyperparameters, cv=folds)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    clf.fit(inputs, outputs.ravel())
print(f"Best performing model $R^2$ score on training set: {clf.best_score_}")
print(f"Model $R^2$ parameters: {clf.best_params_}")
print(
f"Best performing model $R^2$ score on a separate test set: {clf.best_estimator_.score(scaler.transform(test_in), scaler_out.transform(test_out))}"
)
Best performing model $R^2$ score on training set: 0.7459884525525834
Best performing model parameters: {'alpha': 1, 'hidden_layer_sizes': (112, 112)}
Best performing model $R^2$ score on a separate test set: 0.9416874254967937
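The test-set score exceeding the cross-validated training score is largely an artifact of the tiny 4-point test set rather than evidence of an exceptional model. To inspect how all six hyperparameter combinations fared, GridSearchCV stores per-candidate scores in cv_results_; a minimal sketch, assuming pandas is available:

import pandas as pd

# mean cross-validated R^2 for every hyperparameter combination, best first
cv_df = pd.DataFrame(clf.cv_results_)
print(
    cv_df[["param_alpha", "param_hidden_layer_sizes", "mean_test_score"]]
    .sort_values("mean_test_score", ascending=False)
    .to_string(index=False)
)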
Surrogate Modelling Evaluator object¶
We can wrap the fitted model in a BESOS Evaluator. This has identical behaviour to the original EnergyPlus Evaluator object.
def evaluation_func(ind, scaler=scaler):
    # normalize the candidate design with the scaler fitted on the training data,
    # predict in normalized space, then map the prediction back to original units
    ind = scaler.transform(X=[ind])
    pred = clf.predict(ind).reshape(-1, 1)  # recent scikit-learn expects 2D here
    return (scaler_out.inverse_transform(pred)[0][0],)
NN_SM = EvaluatorGeneric(evaluation_func, problem)
In the next cells we generate a single input sample and evaluate it using both the surrogate model and EnergyPlus.
sample = sampling.dist_sampler(sampling.lhs, problem, 1)
values = sample.values[0]
print(values)
[ 0.11669079 0.17961374 4.25848042 0.13397506 10.48584932 10.17177171
0.35134885]
NN_SM(values)[0]
1814729300.020491
evaluator(values)[0]
1751712292.4908545
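The surrogate's estimate is within a few percent of the EnergyPlus result for this sample. A small sketch to quantify the gap (variable names are illustrative):

surrogate_pred = NN_SM(values)[0]
simulated = evaluator(values)[0]
# absolute relative deviation of the surrogate from the simulation (~3.6% above)
print(f"Relative error: {abs(surrogate_pred - simulated) / simulated:.1%}")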
Running a large surrogate evaluation¶
Since a surrogate evaluation is just a network forward pass rather than a full EnergyPlus simulation, we can afford to sample the design space densely, here with 5000 points.
inputs = sampling.dist_sampler(sampling.lhs, problem, 5000)
outputs = NN_SM.df_apply(inputs)
results = inputs.join(outputs)
results.head()
 | Conductivity | Thickness | U-Factor | Solar Heat Gain Coefficient | ElectricEquipment | Lights | Window to Wall Ratio | Electricity:Facility
---|---|---|---|---|---|---|---|---
0 | 0.092928 | 0.238438 | 4.718053 | 0.301978 | 14.540408 | 13.699712 | 0.718998 | 2.198494e+09 |
1 | 0.127584 | 0.228582 | 0.544851 | 0.249709 | 11.915124 | 13.472015 | 0.221293 | 2.061339e+09 |
2 | 0.039634 | 0.158802 | 3.416324 | 0.086649 | 11.756280 | 10.131401 | 0.758322 | 1.823225e+09 |
3 | 0.175974 | 0.152255 | 4.880733 | 0.328792 | 12.571576 | 10.447726 | 0.787714 | 1.922154e+09 |
4 | 0.192895 | 0.221227 | 4.634923 | 0.534569 | 12.375788 | 14.908257 | 0.197574 | 2.200912e+09 |
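With 5000 near-instant surrogate evaluations in hand, the results table can be queried directly; for example, a sketch listing the designs with the lowest predicted electricity use (plain pandas, not part of the original notebook):

# five designs with the lowest predicted facility electricity use
print(results.nsmallest(5, "Electricity:Facility").to_string())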
Visualization¶
We plot the 5000 surrogate evaluations as a parallel-coordinates chart, one axis per input parameter plus the predicted electricity use, with lines coloured by electricity use.
chart_studio.tools.set_credentials_file(
username="besos", api_key="Kb2G2bjOh5gmwh1Midwq"
)
df = inputs.round(3)
# generate a list of dictionaries, one per parallel-coordinates dimension
dims = []
for col in df.columns:
    dims.append(dict(label=col, values=df[col]))
dims.append(dict(label=outputs.columns[0], values=outputs.round(-5)))
data = [
    go.Parcoords(
        line=dict(
            color=outputs["Electricity:Facility"],
            colorscale=[[0, "#D7C16B"], [0.5, "#23D8C3"], [1, "#F3F10F"]],
        ),
        dimensions=dims,
    )
]
layout = go.Layout(plot_bgcolor="#E5E5E5", paper_bgcolor="#E5E5E5")
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename="parcoords-basic")