
Simpson's Paradox Analysis with Excel

Probability and Risk: Norman Fenton

Simpson's Paradox is a classic example of how a regression can be misinterpreted. In this graph, the overall data show a positive correlation between daily exercise and junk food consumption: the more exercise, the more junk food. Within each age subgroup, however, the correlation appears to be negative.

How can a negative correlation in every individual subgroup become positive when the subgroups are combined? This effect, which arises when we pool data from several populations, is known as Simpson's Paradox. We will analyze it with Microsoft Excel. Download the Excel file Simpson from OneDrive to your PC to run this analysis.

We have a process described by Y = f(X), and we know there is a correlation between the input variable X and the output Y, so we try a linear regression between the two in Excel. We select columns X and Y in the table and insert a scatter chart. We enter a linear trend
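For readers who want to see the sign flip outside Excel, here is a minimal Python sketch of the same idea. The numbers are made up for illustration and are not the values in the Simpson workbook; the group names and the `slope` helper are likewise hypothetical. It fits a least-squares slope (the quantity a linear trendline reports) to each subgroup and then to the pooled data.

```python
def slope(xs, ys):
    """Least-squares slope of y on x, as fitted by a linear trendline."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up illustrative data (NOT taken from the Excel file).
# Each subgroup maps to (X values, Y values); within a subgroup Y falls as X rises,
# but subgroups with higher X also sit at higher Y overall.
groups = {
    "group A": ([1, 2, 3], [3, 2, 1]),
    "group B": ([4, 5, 6], [6, 5, 4]),
    "group C": ([7, 8, 9], [9, 8, 7]),
}

# Slope within each subgroup.
group_slopes = {name: slope(xs, ys) for name, (xs, ys) in groups.items()}

# Slope after pooling all subgroups into one data set.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled_slope = slope(all_x, all_y)

for name, s in group_slopes.items():
    print(f"{name}: slope = {s:+.2f}")
print(f"pooled: slope = {pooled_slope:+.2f}")
```

With this toy data every subgroup slope is negative while the pooled slope is positive, because the differences between the subgroup averages dominate the trend inside each subgroup.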