Value Stream Data Analysis with Excel

After we have identified what needs to be improved in our value stream we want to look for the root cause of problems. Due to the existence of variation we will need statistical data analysis. Microsoft Excel provides effective data analysis tools we can effectively use to do this.

The steps for value stream improvement are:

Map it as it is really happening today in Gemba with a Value Stream Map.
Collect process data in the line.
To understand the value stream dynamic behavior we will need statistically significant data to enable simulation
Check for stability with Statistical Process Control
Know to what extent it is able to meet the customer requirements with Process Capability
Perform additional data analysis to get to the root cause

Now we will continue our value stream data analysis both for variables and attribute data.

Hypothesis Testing

A comparison statement is composed of two hypothesis:

Null hypothesis H0
Alternative hypothesis Ha

The null hypothesis is what you are trying to disprove. It is the hypothesis of no difference or status quo

The alternative hypothesis is what you try to prove. It is the hypothesis of the difference.

Together H0 and Ha cover all possibilities: only one of them must be true.

It is easier to disprove something than it is to prove it.

Logical Sequence

Start by assuming that H0 is true
Try to prove Ha

If you find enough evidence to prove Ha then you can reject H0
If you don't find enough evidence you fail to reject it. This doesn't prove that H0 is true.

In practice however, if we don't find enough evidence we will collect more data or accept H0.

Example of a Comparative Statement

H0: "All trees are green"

Ha: "Not all trees are green" =

"There is at least one tree which is not green"

To prove H0 is impossible

Proving Ha is easier: all we need to do is find one tree which is not green

If we find a tree which is not green that will prove Ha and then we can reject H0

If we are unable to find a non-green tree that doesn’t prove H0

Truth Tables

Confidence intervals are another alternative to hypothesis testing

Download Excel file Data Analysis.xlsx from OneDrive to your PC to run the following examples.

Variables Confidence Intervals Analysis

In Sheet Variable we have an example of confidence interval comparisons:

The mean confidence interval depends on:

Sample size
Sample mean
Sample standard deviation
Confidence level

Let us see how each of these affect the confidence interval:

If we compare samples A and B: an increase of the mean just shifts up the interval
Comparing A and C: an increase of the standard deviation increases the size of the interval
Comparing A and D: an increase of the sample size decreases the size of the interval

If we now look at the effect on the standard deviation confidence interval:

In this case we see intervals are not symmetrical as in the case of means
Comparing A and B: Standard deviation in independent of the mean, therefore its interval is not affected
Comparing A and C: Increasing the standard deviation its interval shifts up and increases in size
Comparing A and D: the increase of sample size reduces the interval

Attributes Confidence Intervals Analysis

In sheet Proportion we can calculate the confidence interval for proportions:

In this case:

If we compare A and B: Reducing fails shifts down the interval and reduces its size
Comparing A and C: The average proportion is still 10% but the size of the interval is reduced

Variables and Attributes Tests

We are going to analyse continuous and discrete data.
For continuous data we will analyse the mean and the standard deviation
For discrete data: proportions

On each of these cases we will compare:

Versus the standard
Two samples
More than 2 samples

Test Conditions

All tests have a basic prerequisite which is stability along time. You can check this with SPC.

Some tests require data to follow a normal distribution you can use the Anderson-Darling test and use the Box-Cox transformation if required in Statistical Process Control.

1 Sample Test for the Mean

We are looking to increase our capacity and a supplier has suggested a piece of equipment to do that. The current average capacity of our equipment is 5000 units/ hour with a standard deviation of 500.

We have tested the new equipment taking a random sample of 20 hours of production obtaining an average production of 5150 units/ hour. Can we say that the capacity of the new equipment is higher than our current one? We accept a 5% error in our decision.

We can use the confidence interval as shown in sheet 1t:

The confidence interval includes our current production capacity of 5000 so we conclude that this test doesn't prove that its capacity is higher.

Another alternative to test this is a 1 sample t-test:

We want to prove the alternative hypothesis Mean > 5000 and the p value 0.098 > 0.05 means the error if we accept the alternative is too high.

So we get the same results with both approaches.

2 Sample Test for the Mean

We want to compare a critical dimension in a part produced by two different molds. To do that we run 50 parts with each of the molds and we obtain the results in sheet 2t. Can we conclude that the two molds are equal?

We will use Excel Data Analysis:

Results:

p Value lower than 0.05 is telling us that the two molds are significantly different.

Multiple Means Analysis with Anova

We have collected average customer satisfaction data from 3 types of customers in sheet Anova.

We want to know if there are significant differences among them. We will use Excel Data Analysis:

Results:

We conclude that at least one of the customer types is different. We could now do comparisons of two to see if type 3 is more satisfied than the other 2 or if 1 and 2 are different:

We confirm there is no significant difference between customer Sat 1 and 2

2 Factor Anova

We have collected hourly production data in sheet ANOVA2.

We want to find out if there is a significant difference between the different days of the week and the 3 shifts.

From the time stamp we can obtain the day of the week: =WEEKDAY(A2;2). This is to start counting 1 as Monday.

In order to extract the shift we first extract the hour from the time stamp: =HOUR(A2)

To obtain the shift from the hour: =IF(D2<6;"N";IF(D2<14;"M";IF(D2<22;"A";"N")))

Considering:

Morning from 6:00 to 14:00
Afternoon from 14:00 to 22:00
Night from 22:00 to 6:00

Now we will use a pivot table to format the data:

Results:

Now we use Data Analysis: Anova two factor with replication

It is obvious that production on Saturdays and Sundays is much lower so we will focus our analysis in Monday to Friday.

Results:

Afternoon shift is the most productive and night the least

Thursdays and Fridays are the most productive and Mondays the least.

Paired Comparison

We want to compare the durability of materials used for children shoe soles.

To do that we took a sample of 20 children and measured the duration of their shoe soles with one material and then the same children with the other material keeping track of what child had what results. The results are in sheet Shoes.

We start our analysis with a two sample t-Test:

Results:

Although the average duration of material B is 61 days Vs 54 days for material A the p value of 0.19 > 0.05 tells me this difference is not significant.

The reason may be the fact that there is a huge variation in duration depending on the child.

Since we have collected the data of which child had which duration we are able to do a paired comparison of this data:

This paired comparison eliminates the effect of the huge variation among the children and it confirms that the difference in duration between materials is significant: material B is better.

1 Standard Deviation Test

We are trying to reduce the standard deviation of a critical dimension in order to improve the Cpk of its manufacturing process.

The previous standard deviation was 5 mm and after a process change we took a sample of 20 parts and obtained a standard deviation of 4 mm.

Has the process improved?

In sheet 1 Std Dev we start this analysis looking at the sample standard deviation confidence interval:

The previous standard deviation of 5 falls inside the interval so this means there is no significant improvement.

Let us now do this analysis with the Chi squared distribution:

The p value above 0.05 confirms there is no significant difference between before and after.

F Test for Two Variances

We want to compare two processes to choose the best one to reduce the variance of a critical dimension.

We have taken a sample of 31 parts from the original process and 42 parts of the new process and collected the variances in sheet 2Variances

We will use the F test for variances in Data Analysis:

There is no significant difference in variance from the original to the new process

1 Proportion Test

A soldering process has a 4% defect rate. We have changed some process parameters and we want to check if the process has improved.

To do this we took a sample of 100 products with the new process and found 2 defective.

Can we conclude that the process has improved?

In sheet 1Proportion we have checked with the confidence interval and then with the Z distribution:

The old rate 4% is inside the sample confidence interval so there is no significant improvement.

The p value 0.154 > 0.05 of the Z distribution also confirms this.

2 Proportion Test

We want to compare the quality level of two suppliers so we took a sample of parts from each and compared their defective rates.

In sheet 2Proportion we have analysed with confidence intervals and with the Z distribution:

The confidence intervals have some overlap but the p value 0.015 < 0.05 of the Z distribution confirms that the defective rate of the first supplier is significantly higher than the second.

Chi2 Test for Attributes

In some cases we have attributes which are not binary for instance:

Fail
Marginal
Pass

We want to compare several suppliers using these attributes, then we use the Chi squared distribution as shown in the example in sheet Chi2:

In this example we conclude there is no significant difference between the three suppliers.

Defect Rate Comparisons

We have collected parts produced and defects every hour in sheet Defects

We want to see if defect rates are different on the different days of the week and the different shifts.

From the time stamps we can extract the DOW and Shift as done before:

Now we will use a pivot table to add production and defects for the different DOWs:

From each of these values we calculate the confidence intervals:

Graphically:

The conclusion is that there is no significant differences in the defect rates of the different days of the week.

The last interval corresponding to Sunday is much wider because the production in much lower.

We repeat the same with the shifts:

Graphically:

Here we see a significant difference in the defect rate of the three shifts:

Afternoon 1% average
Morning 2%
Night 3%

Lot Size Calculation

We have seen so far the impact of the lot size in our analysis.

Now we want to find out what lot size do we need to run our analysis.

In our t distribution for analysis of means lot size will depend on:

difference in mean we want to be able to detect δ
process standard deviation σ
α risk
β risk

See an example in sheet Lot size:

In the first case Vs standard we need a lot size of 35 to detect a difference of 5 in the mean.

In the second case of two samples we need 69 of each sample to detect a difference of 20 between their two means.

For proportions:

If we want to detect a 1% proportion when the current standard is 2% we will need to analyse a lot of 848 units.

To detect a difference between two samples with 2% and 4% proportions we will need 1529 units of each.

We notice the great differences in lot sizes required between variables and attributes.

Conclusions

Process data analysis enables decision taking based on significant statistical evidence rather than impressions

The risk of making the wrong decision can be evaluated and it depends on the sample size analysed

The sample size required when we use variables is much lower than using attributes

Confidence intervals are useful graphic aids to interpret the results and make decisions

Excel Data Analysis is a useful analysis tool

Further analysis: Correlation, Regression and Design of Experiments

Search This Blog

Polyhedrika