Fit feedforward Neural Network model with Dask

This notebook takes the “Fit feedforward Neural Network model” notebook and parallelizes its processing with Dask. It skips explanations of code unrelated to Dask; refer to the “Fit feedforward Neural Network model” notebook for those details.

First, import the packages and initialize the scheduler.

import joblib
from besos import eppy_funcs as ef, sampling
from besos.evaluator import EvaluatorEP, EvaluatorGeneric
from besos.problem import EPProblem
from dask.distributed import Client
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
import warnings
from parameter_sets import parameter_set

client = Client()
client

Client

Scheduler: tcp://127.0.0.1:36583
Dashboard: /user/peterrwilson99/proxy/8787/status

Cluster

Workers: 4
Cores: 16
Memory: 68.72 GB

We gather the parameters and the building, then create the problem and evaluator.

parameters = parameter_set(7)
building = ef.get_building()
problem = EPProblem(parameters, ['Electricity:Facility'])
evaluator = EvaluatorEP(problem, building)

When df_apply is called, the dataframe is processed concurrently. The processes parameter sets the number of partitions the dataframe is divided into. If you are running this notebook locally, you can open the Dask dashboard through the link provided by the client object (refer to the cell at the start of the notebook where we initialized Client). The dashboard shows which tasks are running.

%%time
inputs = sampling.dist_sampler(sampling.lhs, problem, 50)
outputs = evaluator.df_apply(inputs, processes=4)
inputs

CPU times: user 1.47 s, sys: 287 ms, total: 1.75 s
Wall time: 23.3 s

	Conductivity	Thickness	U-Factor	Solar Heat Gain Coefficient	ElectricEquipment	Lights	Window to Wall Ratio
0	0.055023	0.289414	0.971157	0.453890	11.879346	12.800983	0.566172
1	0.161703	0.151686	0.248658	0.433111	14.650401	14.239511	0.888224
2	0.146857	0.293566	3.995612	0.905855	12.620180	14.457510	0.120768
3	0.193907	0.242702	0.570291	0.814085	12.934642	13.796069	0.085228
4	0.050311	0.180439	3.084850	0.061345	10.312514	13.480864	0.677188
5	0.132940	0.119451	1.384960	0.863318	12.853962	14.055337	0.160043
6	0.059651	0.106822	3.443967	0.919450	10.687633	10.351821	0.366714
7	0.098363	0.170353	4.623611	0.548621	13.039621	10.172123	0.729395
8	0.166916	0.231849	4.274812	0.371251	12.335803	14.189433	0.395333
9	0.185077	0.298956	2.575012	0.737015	14.540488	13.018182	0.592655
10	0.039453	0.239708	1.237301	0.714868	13.738186	13.961779	0.019614
11	0.155364	0.120459	3.664835	0.128758	13.968844	10.897971	0.348100
12	0.114585	0.261586	1.784999	0.489818	11.676686	11.817922	0.958709
13	0.074132	0.113085	0.168243	0.361055	12.403512	10.030331	0.780304
14	0.125671	0.172620	2.243686	0.386179	14.729687	13.321995	0.977003
15	0.197555	0.145545	4.553300	0.625993	11.044226	10.291775	0.702575
16	0.111633	0.209954	2.939609	0.465666	14.208762	11.182091	0.763889
17	0.099961	0.138505	3.765241	0.030199	12.058338	10.654501	0.038554
18	0.032217	0.235183	4.716094	0.224655	13.895373	12.428741	0.442455
19	0.087545	0.250956	2.493906	0.972490	13.501133	11.628705	0.629117
20	0.079181	0.109904	3.152474	0.801282	13.471743	11.307619	0.925028
21	0.186814	0.212415	1.760755	0.766256	14.987510	14.726486	0.176318
22	0.088964	0.160811	1.490295	0.123413	10.849416	13.278679	0.263240
23	0.122988	0.284068	2.429327	0.592769	10.470285	12.325289	0.405024
24	0.026182	0.204374	4.931655	0.156479	11.135790	12.527733	0.235172
25	0.151733	0.256406	0.770869	0.517453	10.936935	12.189044	0.105411
26	0.129251	0.226350	4.456843	0.238534	11.440466	14.341981	0.146293
27	0.158677	0.279158	1.049345	0.082118	11.503272	13.661504	0.675995
28	0.044414	0.133110	0.489562	0.682752	11.953141	12.231597	0.484533
29	0.047032	0.220143	2.046230	0.402257	13.301214	14.895942	0.830231
30	0.139758	0.201843	4.852871	0.572439	12.562844	14.610093	0.608437
31	0.145235	0.185034	3.405676	0.536250	11.372711	11.752122	0.288744
32	0.022745	0.124947	1.601113	0.655625	10.097336	11.988407	0.871940
33	0.105943	0.158020	3.567968	0.662704	12.759740	14.547573	0.646999
34	0.067095	0.281144	2.657010	0.932130	14.058247	13.580063	0.810015
35	0.095066	0.266337	0.605780	0.248594	13.143416	12.748950	0.318998
36	0.178696	0.142271	3.887490	0.299538	14.449008	10.709081	0.277599
37	0.189534	0.100527	2.833761	0.179466	11.209249	12.931609	0.503187
38	0.136791	0.154748	3.021646	0.276294	14.889010	10.416115	0.432814
39	0.176826	0.270708	4.078908	0.845699	12.276264	11.480877	0.202349
40	0.106603	0.274593	0.875880	0.101857	11.742971	13.101020	0.221728
41	0.059221	0.130004	4.122003	0.606201	10.706087	12.061205	0.892999
42	0.169417	0.164906	1.134970	0.872666	14.375552	11.061248	0.323963
43	0.073724	0.219424	2.096114	0.957559	10.143611	11.256390	0.836476
44	0.035543	0.191936	1.891513	0.013676	10.211157	11.525980	0.531643
45	0.081527	0.198758	3.251619	0.192598	13.253799	14.983523	0.934755
46	0.064496	0.176742	0.329414	0.715988	14.191205	12.682700	0.744238
47	0.028722	0.255675	2.270806	0.791544	12.183257	13.852691	0.059637
48	0.118118	0.195794	4.325877	0.304809	13.637606	10.525400	0.555531
49	0.172703	0.244258	1.308726	0.340048	10.511005	10.924541	0.476248
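
The partitioning described for df_apply can be pictured with a minimal standard-library sketch. It is a conceptual stand-in, not the BESOS or Dask implementation: threads replace Dask workers, and split_into_partitions, evaluate_partition, and partitioned_apply are hypothetical names introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_partitions):
    """Split rows into n_partitions contiguous chunks of near-equal size."""
    base, extra = divmod(len(rows), n_partitions)
    partitions, start = [], 0
    for i in range(n_partitions):
        size = base + (1 if i < extra else 0)
        partitions.append(rows[start:start + size])
        start += size
    return partitions


def evaluate_partition(partition):
    """Hypothetical stand-in for an expensive simulation: sum each row."""
    return [sum(row) for row in partition]


def partitioned_apply(rows, n_partitions):
    """Evaluate every partition concurrently and stitch results back in order."""
    partitions = split_into_partitions(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        chunk_results = list(pool.map(evaluate_partition, partitions))
    return [value for chunk in chunk_results for value in chunk]


rows = [(i, 2 * i) for i in range(50)]  # 50 toy samples, matching the sample count above
results = partitioned_apply(rows, n_partitions=4)
partition_sizes = [len(p) for p in split_into_partitions(rows, 4)]
```

With 50 rows and 4 partitions, each worker receives 12 or 13 rows, and the results come back in the original row order because the chunks are concatenated in the order they were created.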

Set up model parameters

In this cell, we set up the model. More detail can be found in the “Fit feedforward Neural Network model” notebook.

train_in, test_in, train_out, test_out = train_test_split(
    inputs, outputs, test_size=0.2
)

scaler = StandardScaler()
inputs = scaler.fit_transform(X=train_in)

scaler_out = StandardScaler()
outputs = scaler_out.fit_transform(X=train_out)

hyperparameters = {
    "hidden_layer_sizes": (
        (len(parameters) * 16,),
        (len(parameters) * 16, len(parameters) * 16),
    ),
    "alpha": [1, 10, 10 ** 3],
}

neural_net = MLPRegressor(max_iter=1000, early_stopping=False)
folds = 3
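
The grid above determines how many independent fits are available to distribute across workers. The following sketch counts them, assuming parameter_set(7) yields seven parameters (consistent with the seven input columns shown earlier); n_parameters and candidates are names introduced here for illustration.

```python
from itertools import product

n_parameters = 7  # assumption: len(parameters) == 7 for parameter_set(7)
hyperparameters = {
    "hidden_layer_sizes": (
        (n_parameters * 16,),
        (n_parameters * 16, n_parameters * 16),
    ),
    "alpha": [1, 10, 10 ** 3],
}
folds = 3

# Every combination of one value per hyperparameter is one candidate model.
candidates = [
    dict(zip(hyperparameters, combo))
    for combo in product(*hyperparameters.values())
]
n_fits = len(candidates) * folds  # each candidate is fitted once per fold

print(f"{len(candidates)} candidates x {folds} folds = {n_fits} cross-validation fits")
```

GridSearchCV fits each candidate once per fold and, with its default refit=True, fits the best candidate once more on the full training set. These cross-validation fits are the tasks that joblib hands to its backend.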

Model fitting with Dask

Here, we use the NN model from scikit-learn. In a different example we use TensorFlow (with and without the Keras wrapper).

Below we parallelize the model fit. Normally, scikit-learn uses joblib to parallelize model fitting. By specifying Dask as the parallel backend, joblib hands its tasks to the Dask scheduler instead. For this example, using Dask may not be any faster, because joblib can already parallelize across local cores. This approach becomes useful when Dask runs on a distributed cluster with access to more cores.

%%time
with joblib.parallel_backend("dask"):
    # Note: `iid` was deprecated in scikit-learn 0.22 and removed in 0.24; drop it on newer versions.
    clf = GridSearchCV(neural_net, hyperparameters, iid=True, cv=folds)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        clf.fit(inputs, outputs.ravel())

print(f"Best performing model $R^2$ score on training set: {clf.best_score_}")
print(f"Model $R^2$ parameters: {clf.best_params_}")
print(
    f"Best performing model $R^2$ score on a separate test set: {clf.best_estimator_.score(scaler.transform(test_in), scaler_out.transform(test_out))}"
)

Best performing model $R^2$ score on training set: 0.9859359222733858
Model $R^2$ parameters: {'alpha': 1, 'hidden_layer_sizes': (112,)}
Best performing model $R^2$ score on a separate test set: 0.9973443498852271
CPU times: user 565 ms, sys: 69.8 ms, total: 635 ms
Wall time: 4.38 s

Surrogate Modelling Evaluator object

We can wrap the fitted model in a BESOS Evaluator. It can then be used in the same way as the original EnergyPlus Evaluator object.

The parallelization occurs when calling the df_apply function.

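def evaluation_func(ind, scaler=scaler):
    # Scale the single input row the same way as the training inputs.
    ind = scaler.transform(X=[ind])
    # Reshape to 2D for inverse_transform, then return the prediction in original units.
    prediction = clf.predict(ind).reshape(-1, 1)
    return (scaler_out.inverse_transform(prediction)[0, 0],)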


NN_SM = EvaluatorGeneric(evaluation_func, problem)
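
The scale, predict, inverse-scale pattern used in evaluation_func can be illustrated without scikit-learn. The sketch below is a toy under stated assumptions: TinyScaler is a hypothetical one-feature stand-in for StandardScaler, scaled_model is a hypothetical surrogate, and the data values are made up for illustration.

```python
from statistics import fmean, pstdev


class TinyScaler:
    """Hypothetical one-feature stand-in for StandardScaler (zero mean, unit variance)."""

    def fit(self, values):
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values)  # population standard deviation
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]

    def inverse_transform(self, values):
        return [v * self.scale_ + self.mean_ for v in values]


# Toy training data (illustrative values only, not from the notebook).
scaler_in = TinyScaler().fit([10.0, 12.0, 14.0])
scaler_out = TinyScaler().fit([100.0, 200.0, 300.0])


def scaled_model(scaled_x):
    """Hypothetical surrogate that only understands scaled inputs and outputs."""
    return scaled_x  # the toy data is exactly linear, so identity suffices


def evaluation_func(x):
    """Scale the input, predict in scaled space, return the result in original units."""
    scaled_x = scaler_in.transform([x])[0]
    scaled_y = scaled_model(scaled_x)
    return (scaler_out.inverse_transform([scaled_y])[0],)


prediction = evaluation_func(14.0)
```

Because the model was trained on scaled inputs and outputs, skipping either transformation would feed it values outside the range it learned, or return predictions in scaled units rather than the original output units.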

Running a large surrogate evaluation

Here we bump up the sample count to 50,000 and divide the data into 4 partitions. If you have more cores available, feel free to try increasing processes.

%%time
inputs = sampling.dist_sampler(sampling.lhs, problem, 50000)
outputs = NN_SM.df_apply(inputs, processes=4)
results = inputs.join(outputs)
results.head()

CPU times: user 724 ms, sys: 149 ms, total: 873 ms
Wall time: 9.38 s

	Conductivity	Thickness	U-Factor	Solar Heat Gain Coefficient	ElectricEquipment	Lights	Window to Wall Ratio	Electricity:Facility
0	0.192527	0.153194	4.481314	0.567995	11.835818	13.052947	0.857530	2.057846e+09
1	0.080337	0.110154	2.589947	0.811769	11.163278	13.104261	0.966630	1.995323e+09
2	0.095729	0.285002	4.567989	0.364190	14.112162	12.881720	0.017776	2.154193e+09
3	0.156267	0.214472	2.800968	0.401643	11.264160	11.086406	0.247729	1.882376e+09
4	0.196672	0.273954	2.064289	0.267733	10.874480	10.699428	0.980151	1.845306e+09
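
As a rough sense of scale, the wall times reported in this notebook can be turned into throughput figures. Both timings include sampling overhead and depend on the machine and cluster used, so the ratio is only indicative.

```python
# Wall times copied from the %%time outputs reported in this notebook.
energyplus_samples, energyplus_seconds = 50, 23.3
surrogate_samples, surrogate_seconds = 50_000, 9.38

energyplus_rate = energyplus_samples / energyplus_seconds  # evaluations per second
surrogate_rate = surrogate_samples / surrogate_seconds
speedup = surrogate_rate / energyplus_rate

print(f"EnergyPlus: {energyplus_rate:.1f} evaluations/s")
print(f"Surrogate:  {surrogate_rate:.0f} evaluations/s")
print(f"Throughput ratio: about {speedup:.0f}x")
```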