Design of Experiments (DOE) with Excel
Download this Excel file DOE_with_Excel.xlsm from OneDrive
Factorial DesignsFull factorial designs are constructed simply counting in binary to obtain all the different combinations of 0 and 1 for all factors. In our designs we just replace 0 with -1 to meet the balance property.
1/2 Fraction DesignWith our 5 factors to run a full factorial set of experiments we would need 2 ^ 5 = 32 experiments. This could involve time and money before we are sure that all factors really affect our process. Therefore we might start with a subset of the full factorial as a detection experiment. A 1/2 fraction will involve 1/2 the number of experiments: 16.
We can build a 1/2 fraction 5 factor design from a 4 factor full factorial:
We do this by replacing the 4 factor interaction (which is very unlikely to happen) with the 5th factor (Flow). We construct therefore the Flow column multiplying the other 4 columns.
Run the Experiments
To run the experiments you need to convert the coded values to uncoded as seen in sheet Run experiments:
In order to randomize the order in which to run the experiments use a macro pressing Ctrl + r and then follow the order indicated in column M:
In this example we are going to use a simulator with the coded values to simulate the running of our experiments in the Experiments simulation sheet:
We have added 2 additional experiments with the values called central points (all zeros) which will correspond to the average value of the high and low levels in each factor.
We do this to make a more robust design without going all the way to the full 32 experiments.
The simulator has provided the process loss outputs for each experiment.
Factor InteractionsWe want to be able to estimate the influence of each individual factor in the result but also the interactions among them so before we analyse the results we will build a column for each interaction among 2 factors. If we wanted to include 3 factor interactions we would also add them to this matrix.
You obtain each interaction column just by multiplying the corresponding factor columns. Just enter the formula in the first line and replicate all the way down.
Regression with Excel Data AnalysisEnter in Excel: Data/ Data Analysis/ Regression
And the result will be:
We notice that R^2 = 0.99 which validates our mathematical model.
The Probability column (p) tells us which factors and interactions are significant: those below 0.05.
Factors Temp and Press have a p value above 0.05 but their interaction is below (0.02) therefore we must keep them.
We run the Regression again keeping only the factors and interactions not removed:
This gives us the formula for the estimation of Process Loss as a function of 3 factors (out of the original 5 we thought relevant):
This equation is with factors in coded values.
Loss = 20.24 + 6.85 pH -1.4 Temp - 0.06 Pres + 4.41 Temp * Pres
This equation is with factors in coded values.
Process OptimizationWe now want to calculate with this mathematical model of our process which are the pH, Temp and Pres values which will minimize Process Loss.
In order to do this we will use Excel Solver. (See Theory of Constraints )
We add cells M17:M21 to house the results we are looking for. M17 is the factor to multiply with the interception so it must be 1. Cell M21 should have de product Temp*Pres so we input the formula (=M19*M20).
So we will just ask Solver to give us the values for the yellow cells.
In M23 we put the formula to calculate loss with the coefficients in column I: (=SUMPRODUCT(I17:I21;M17:M21)
We now call Solver to minimize this value changing pH, Temp and Pres with the restrictions that they must be between -1 and +1.
This gives the result:
The result is that the minimum loss of 7.64 can be obtained with pH = 2 (coded -1), Temp = 150 (coded 1) and Pres =1 (coded -1)
Confirmation ExperimentsWe would now run confirmation experiments to validate these optimal values. With our simulator:
We obtain a confidence interval for the process loss with our optimal factor values.
Graphical RepresentationWe can view a graphical representation of our mathematical model to help us understand the process behavior:
Entering a value for pH we can see the Loss for each Press-Temp combination.
With the color code we clearly identify the minimum value in dark green: 7.6 which corresponds to Press = -1 and Temp = 1 as calculated by Solver.
Which in 3D would be:
In spite of the curvature of this shape the behavior of each factor is linear. The twisting of the shape is due to the interaction between Pressure and Temperature.
Residuals Analysis to Validate the ModelWe want to estimate the error we make by replacing our process behavior with this mathematical model.
We subtract the actual Loss values measured in our process from the estimates of our formula to obtain the residuals.
We see no trends of the residuals along the estimate range so this is OK.
Test for Normality of the ResidualsThe next check is to make sure that the average of the residuals is close to 0 and they follow a normal distribution:
If the P value for this normality check was below 0.05 this would indicate lack of normality. Since it is 0.983 well above, it passes the test.
Test for Stability of the Residuals
We want to check that the process has not significantly changed during the time of our experiments:
See SPC Analysis
No out of control alerts appear in this SPC chart of the residuals so no evidence of significant changes.
Notice that we have a timestamp next to each residual. That would be the timestamp of the moment each experiment was performed. It is essential to collect these timestamps to investigate any anomalies detected during the analysis. Timestamp
- Design of Experiments (DOE) is a powerful methodology for process improvement
- It enables the identification of critical process factors based on data rather than impressions
- We can estimate the optimal values of these critical factors to optimize the process
- Excel provides some useful Data Analysis tools to achieve this
- Some processes outputs don't have a linear relation with the critical factors. In this case we will need a more sophisticated formula such as RSM: Response Surface DOE with Excel
- Another non-linear example: The Catapult exercise