Fit feedforward Neural Network model with Dask

This notebook takes the “Fit feedforward Neural Network model” notebook and parallelizes its processes using Dask. It skips explanations of code unrelated to Dask. Refer to the “Fit feedforward Neural Network model” notebook for those details.

First, import the packages and initialize the Dask scheduler.

import joblib
from besos import eppy_funcs as ef, sampling
from besos.evaluator import EvaluatorEP, EvaluatorGeneric
from besos.problem import EPProblem
from dask.distributed import Client
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
import warnings
from parameter_sets import parameter_set
# Start a local Dask cluster; by default it sizes workers and threads to this machine
client = Client()
client

Client

Cluster

  • Workers: 4
  • Cores: 16
  • Memory: 68.72 GB

We gather the parameters and the building, then create the problem and evaluator.

parameters = parameter_set(7)
building = ef.get_building()
problem = EPProblem(parameters, ["Electricity:Facility"])
evaluator = EvaluatorEP(problem, building)

When df_apply is called, the rows of the dataframe are evaluated concurrently. The processes parameter sets the number of partitions the dataframe is divided into. If you are running this notebook locally, you can open the Dask dashboard using the link shown by the client object (see the first cell in the notebook, where we initialized Client). The dashboard shows which tasks are running on each worker.
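As a rough illustration of what processes controls, the sketch below computes which rows each partition would receive. It assumes the rows are split into near-equal contiguous chunks in the style of numpy.array_split; partition_bounds is a hypothetical helper, not part of BESOS, and BESOS may split the dataframe differently.

```python
def partition_bounds(n_rows, n_partitions):
    """Return (start, stop) row ranges for near-equal contiguous partitions.

    Assumption: rows are split like numpy.array_split, so the first
    n_rows % n_partitions partitions each receive one extra row.
    """
    base, extra = divmod(n_rows, n_partitions)
    bounds = []
    start = 0
    for i in range(n_partitions):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# The 50-sample EnergyPlus run below, with processes=4
bounds = partition_bounds(50, 4)
print(bounds)                                    # [(0, 13), (13, 26), (26, 38), (38, 50)]
print([stop - start for start, stop in bounds])  # [13, 13, 12, 12]
```

If each partition runs on its own worker, the wall time is roughly set by the slowest partition, so partitions of similar size keep the workers evenly loaded.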

%%time
inputs = sampling.dist_sampler(sampling.lhs, problem, 50)
outputs = evaluator.df_apply(inputs, processes=4)
inputs
CPU times: user 1.47 s, sys: 287 ms, total: 1.75 s
Wall time: 23.3 s
Conductivity Thickness U-Factor Solar Heat Gain Coefficient ElectricEquipment Lights Window to Wall Ratio
0 0.055023 0.289414 0.971157 0.453890 11.879346 12.800983 0.566172
1 0.161703 0.151686 0.248658 0.433111 14.650401 14.239511 0.888224
2 0.146857 0.293566 3.995612 0.905855 12.620180 14.457510 0.120768
3 0.193907 0.242702 0.570291 0.814085 12.934642 13.796069 0.085228
4 0.050311 0.180439 3.084850 0.061345 10.312514 13.480864 0.677188
5 0.132940 0.119451 1.384960 0.863318 12.853962 14.055337 0.160043
6 0.059651 0.106822 3.443967 0.919450 10.687633 10.351821 0.366714
7 0.098363 0.170353 4.623611 0.548621 13.039621 10.172123 0.729395
8 0.166916 0.231849 4.274812 0.371251 12.335803 14.189433 0.395333
9 0.185077 0.298956 2.575012 0.737015 14.540488 13.018182 0.592655
10 0.039453 0.239708 1.237301 0.714868 13.738186 13.961779 0.019614
11 0.155364 0.120459 3.664835 0.128758 13.968844 10.897971 0.348100
12 0.114585 0.261586 1.784999 0.489818 11.676686 11.817922 0.958709
13 0.074132 0.113085 0.168243 0.361055 12.403512 10.030331 0.780304
14 0.125671 0.172620 2.243686 0.386179 14.729687 13.321995 0.977003
15 0.197555 0.145545 4.553300 0.625993 11.044226 10.291775 0.702575
16 0.111633 0.209954 2.939609 0.465666 14.208762 11.182091 0.763889
17 0.099961 0.138505 3.765241 0.030199 12.058338 10.654501 0.038554
18 0.032217 0.235183 4.716094 0.224655 13.895373 12.428741 0.442455
19 0.087545 0.250956 2.493906 0.972490 13.501133 11.628705 0.629117
20 0.079181 0.109904 3.152474 0.801282 13.471743 11.307619 0.925028
21 0.186814 0.212415 1.760755 0.766256 14.987510 14.726486 0.176318
22 0.088964 0.160811 1.490295 0.123413 10.849416 13.278679 0.263240
23 0.122988 0.284068 2.429327 0.592769 10.470285 12.325289 0.405024
24 0.026182 0.204374 4.931655 0.156479 11.135790 12.527733 0.235172
25 0.151733 0.256406 0.770869 0.517453 10.936935 12.189044 0.105411
26 0.129251 0.226350 4.456843 0.238534 11.440466 14.341981 0.146293
27 0.158677 0.279158 1.049345 0.082118 11.503272 13.661504 0.675995
28 0.044414 0.133110 0.489562 0.682752 11.953141 12.231597 0.484533
29 0.047032 0.220143 2.046230 0.402257 13.301214 14.895942 0.830231
30 0.139758 0.201843 4.852871 0.572439 12.562844 14.610093 0.608437
31 0.145235 0.185034 3.405676 0.536250 11.372711 11.752122 0.288744
32 0.022745 0.124947 1.601113 0.655625 10.097336 11.988407 0.871940
33 0.105943 0.158020 3.567968 0.662704 12.759740 14.547573 0.646999
34 0.067095 0.281144 2.657010 0.932130 14.058247 13.580063 0.810015
35 0.095066 0.266337 0.605780 0.248594 13.143416 12.748950 0.318998
36 0.178696 0.142271 3.887490 0.299538 14.449008 10.709081 0.277599
37 0.189534 0.100527 2.833761 0.179466 11.209249 12.931609 0.503187
38 0.136791 0.154748 3.021646 0.276294 14.889010 10.416115 0.432814
39 0.176826 0.270708 4.078908 0.845699 12.276264 11.480877 0.202349
40 0.106603 0.274593 0.875880 0.101857 11.742971 13.101020 0.221728
41 0.059221 0.130004 4.122003 0.606201 10.706087 12.061205 0.892999
42 0.169417 0.164906 1.134970 0.872666 14.375552 11.061248 0.323963
43 0.073724 0.219424 2.096114 0.957559 10.143611 11.256390 0.836476
44 0.035543 0.191936 1.891513 0.013676 10.211157 11.525980 0.531643
45 0.081527 0.198758 3.251619 0.192598 13.253799 14.983523 0.934755
46 0.064496 0.176742 0.329414 0.715988 14.191205 12.682700 0.744238
47 0.028722 0.255675 2.270806 0.791544 12.183257 13.852691 0.059637
48 0.118118 0.195794 4.325877 0.304809 13.637606 10.525400 0.555531
49 0.172703 0.244258 1.308726 0.340048 10.511005 10.924541 0.476248

Set up model parameters

In this cell, we set up the model. More detail can be found in the “Fit feedforward Neural Network model” notebook.

train_in, test_in, train_out, test_out = train_test_split(
    inputs, outputs, test_size=0.2
)

scaler = StandardScaler()
inputs = scaler.fit_transform(X=train_in)

scaler_out = StandardScaler()
outputs = scaler_out.fit_transform(X=train_out)

hyperparameters = {
    "hidden_layer_sizes": (
        (len(parameters) * 16,),
        (len(parameters) * 16, len(parameters) * 16),
    ),
    "alpha": [1, 10, 10 ** 3],
}

neural_net = MLPRegressor(max_iter=1000, early_stopping=False)
folds = 3
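StandardScaler rescales each column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation (ddof=0). Its inverse_transform applies x = z * std + mean, which is how scaled predictions are mapped back to Electricity:Facility units later on. The stdlib-only sketch below applies both steps to a single column; the values are made up for illustration and are not taken from the notebook's data.

```python
from statistics import fmean, pstdev


def fit_column(values):
    """Return (mean, std) as StandardScaler computes them (population std)."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    # z = (x - mean) / std
    return [(v - mean) / std for v in values]


def inverse_transform(scaled, mean, std):
    # x = z * std + mean
    return [z * std + mean for z in scaled]


# Illustrative values only
column = [10.0, 12.0, 14.0]
mean, std = fit_column(column)
scaled = transform(column, mean, std)
restored = inverse_transform(scaled, mean, std)
print(mean, round(std, 4))             # 12.0 1.633
print([round(z, 4) for z in scaled])   # [-1.2247, 0.0, 1.2247]
```

Because the scalers are fitted on train_in and train_out only, the test set is later transformed with the training statistics rather than its own.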

Model fitting with Dask

Here, we use the NN model from scikit-learn. In a different example, we use TensorFlow (with and without the Keras wrapper).

Below, we parallelize the model fit. scikit-learn normally uses joblib to parallelize model fitting. Setting joblib's parallel backend to "dask" makes joblib send that work to the Dask scheduler instead. In this example, Dask may not be any faster, because joblib can already parallelize across the cores of a single machine. This approach pays off when Dask is connected to a distributed cluster with access to more cores than one machine provides.
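To get a feel for how much work is handed to the scheduler, it helps to count the independent fits that GridSearchCV performs: every hyperparameter combination is trained once per cross-validation fold, and the best combination is then refit once on the whole training set. The sketch below enumerates the (combination, fold) pairs with itertools.product. grid_tasks is a hypothetical helper, and n_parameters = 7 mirrors parameter_set(7) in this notebook.

```python
from itertools import product


def grid_tasks(param_grid, n_folds):
    """List (params, fold) pairs, one per cross-validation fit."""
    names = sorted(param_grid)
    combos = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]
    return [(combo, fold) for combo in combos for fold in range(n_folds)]


n_parameters = 7           # design parameters in this notebook's problem
width = n_parameters * 16  # 112 neurons per hidden layer
param_grid = {
    "hidden_layer_sizes": [(width,), (width, width)],
    "alpha": [1, 10, 10 ** 3],
}

tasks = grid_tasks(param_grid, n_folds=3)
print(len(tasks))      # 18 cross-validation fits
print(len(tasks) + 1)  # 19 including the final refit of the best combination
```

With only 18 small fits, scheduling overhead can outweigh the gain from Dask; larger grids or more folds give the scheduler more independent tasks to spread across workers.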

%%time
with joblib.parallel_backend("dask"):
    # The iid argument was removed in scikit-learn 0.24, so it is not passed here
    clf = GridSearchCV(neural_net, hyperparameters, cv=folds)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        clf.fit(inputs, outputs.ravel())

# Score the best model on the held-out test set, scaled with the training statistics
test_score = clf.best_estimator_.score(
    scaler.transform(test_in), scaler_out.transform(test_out).ravel()
)
print(f"Best performing model mean cross-validated R^2 on the training set: {clf.best_score_}")
print(f"Best model parameters: {clf.best_params_}")
print(f"Best performing model R^2 score on a separate test set: {test_score}")
Best performing model mean cross-validated R^2 on the training set: 0.9859359222733858
Best model parameters: {'alpha': 1, 'hidden_layer_sizes': (112,)}
Best performing model R^2 score on a separate test set: 0.9973443498852271
CPU times: user 565 ms, sys: 69.8 ms, total: 635 ms
Wall time: 4.38 s

Surrogate Modelling Evaluator object

We can wrap the fitted model in a BESOS Evaluator. It has the same interface as the original EnergyPlus Evaluator object, but it returns the neural network's predictions instead of running simulations.

The parallelization occurs when calling the df_apply method.

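def evaluation_func(ind, scaler=scaler):
    # Scale the single sample with the input scaler fitted on the training data
    ind = scaler.transform(X=[ind])
    # predict returns a 1-D array; inverse_transform expects a 2-D array
    prediction = clf.predict(ind).reshape(-1, 1)
    # Undo the output scaling and return one value per objective
    return (scaler_out.inverse_transform(prediction)[0, 0],)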


NN_SM = EvaluatorGeneric(evaluation_func, problem)
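The evaluation function chains three steps: scale the candidate's inputs with the training statistics, predict in scaled space, then undo the output scaling so the result is back in Electricity:Facility units, returned as a one-element tuple (one entry per objective). The sketch below reproduces that chain without scikit-learn. make_evaluation_func and the linear predict_scaled model are hypothetical stand-ins for the fitted scalers and MLPRegressor, and all numbers are invented.

```python
def make_evaluation_func(in_means, in_stds, out_mean, out_std, predict_scaled):
    """Build a surrogate evaluation function (hypothetical helper).

    predict_scaled stands in for clf.predict: it maps a list of scaled
    inputs to a single scaled output.
    """
    def evaluation_func(ind):
        # 1. Scale each input with the training statistics
        scaled = [(x - m) / s for x, m, s in zip(ind, in_means, in_stds)]
        # 2. Predict in scaled space
        y_scaled = predict_scaled(scaled)
        # 3. Undo the output scaling; return one value per objective
        return (y_scaled * out_std + out_mean,)

    return evaluation_func


# Toy model: the scaled output is the mean of the scaled inputs
evaluate = make_evaluation_func(
    in_means=[0.25, 12.0],
    in_stds=[0.125, 1.5],
    out_mean=2.0e9,
    out_std=1.0e8,
    predict_scaled=lambda z: sum(z) / len(z),
)
print(evaluate([0.375, 13.5]))  # (2100000000.0,)
```

Returning a tuple matches how the notebook's evaluation_func reports its single objective, so adding a second objective would mean adding a second tuple entry.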

Running a large surrogate evaluation

Here we increase the sample count to 50,000 and partition the data into 4. (If you have more cores available, try increasing processes.)
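Conceptually, the large run maps the cheap surrogate over each partition in parallel and concatenates the results in the original row order, which is what lets inputs.join(outputs) line up afterwards. The sketch below mimics that data flow with concurrent.futures.ThreadPoolExecutor standing in for the Dask workers and a hypothetical toy_surrogate standing in for NN_SM. It is an assumption-based illustration, not how BESOS implements df_apply, and Python threads will not speed up pure-Python work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_surrogate(row):
    """Hypothetical stand-in for evaluating one sample with the surrogate."""
    return sum(row)


def evaluate_partition(rows):
    # Each worker evaluates its whole partition sequentially
    return [toy_surrogate(row) for row in rows]


def parallel_apply(rows, n_partitions):
    size = -(-len(rows) // n_partitions)  # ceiling division
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # map() yields results in submission order, so row order is preserved
        results = list(pool.map(evaluate_partition, partitions))
    return [value for part in results for value in part]


rows = [(i, 2 * i) for i in range(50_000)]
outputs = parallel_apply(rows, n_partitions=4)
print(len(outputs), outputs[:3])  # 50000 [0, 3, 6]
```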

%%time
inputs = sampling.dist_sampler(sampling.lhs, problem, 50000)
outputs = NN_SM.df_apply(inputs, processes=4)
results = inputs.join(outputs)
results.head()
CPU times: user 724 ms, sys: 149 ms, total: 873 ms
Wall time: 9.38 s
Conductivity Thickness U-Factor Solar Heat Gain Coefficient ElectricEquipment Lights Window to Wall Ratio Electricity:Facility
0 0.192527 0.153194 4.481314 0.567995 11.835818 13.052947 0.857530 2.057846e+09
1 0.080337 0.110154 2.589947 0.811769 11.163278 13.104261 0.966630 1.995323e+09
2 0.095729 0.285002 4.567989 0.364190 14.112162 12.881720 0.017776 2.154193e+09
3 0.156267 0.214472 2.800968 0.401643 11.264160 11.086406 0.247729 1.882376e+09
4 0.196672 0.273954 2.064289 0.267733 10.874480 10.699428 0.980151 1.845306e+09