Some considerations in interpreting correlation
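- Correlation measures only the linear relationship between two variables.
- It tells you how strongly two variables are linearly related; a strong non-linear relationship can still give a correlation near zero.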
- Restricted range:
- Correlation can be deceiving if full information about each variable is not available, for example when the data cover only part of a variable's range.
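
The restricted-range point can be illustrated with a small sketch. The data below are made up for illustration (a straight-line trend plus a fixed, repeating scatter term), and `pearson_r` is a helper written here, not something from the notes. The same relationship is measured twice: once over the full range of x and once over a narrow slice of it.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y follows x plus a fixed, repeating scatter term.
xs = list(range(100))
ys = [x + 3 * ((7 * x) % 13 - 6) for x in xs]

# Correlation over the full range of x.
r_full = pearson_r(xs, ys)

# Keep only the cases with 40 <= x < 60 (a restricted range of x).
restricted = [(x, y) for x, y in zip(xs, ys) if 40 <= x < 60]
rx = [x for x, _ in restricted]
ry = [y for _, y in restricted]
r_restricted = pearson_r(rx, ry)

print(f"full range r       = {r_full:.2f}")
print(f"restricted range r = {r_restricted:.2f}")
```

Under these assumptions the restricted-range coefficient comes out clearly smaller, even though both were computed from the same underlying relationship.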
- Outliers:
- An outlier is a strange (unusual) outcome.
- E.g., a normal heart rate lies between 60 and 100 bpm; if a patient has 40 bpm, the doctor treats that reading as an outlier.
- It is a data point that lies an abnormal distance from the other values in a data set; the concept is inexact.
Types of outliers
a) On-line outliers:
- Fall near the regression line.
- Inflate the correlation coefficient.
b) Off-line outliers:
- Fall some distance away from the regression line.
- Deflate the correlation coefficient.
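
A minimal sketch of the two outlier types, using made-up numbers. The ten base points, the two added points, and the helper `pearson_r` are all assumptions chosen for illustration. One extra point is placed far out but close to the trend of the base data (on-line), the other far away from that trend (off-line), and the correlation is recomputed each time.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up base data: an upward trend with some scatter.
base_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
base_y = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]
r_base = pearson_r(base_x, base_y)

# On-line outlier: an extreme point that lies close to the trend.
r_online = pearson_r(base_x + [30], base_y + [30])

# Off-line outlier: a point that lies far away from the trend.
r_offline = pearson_r(base_x + [2], base_y + [20])

print(f"base data        r = {r_base:.2f}")
print(f"on-line outlier  r = {r_online:.2f}")
print(f"off-line outlier r = {r_offline:.2f}")
```

With these numbers the on-line point raises the coefficient above the base value and the off-line point pulls it well below, which matches the inflate/deflate behaviour listed above.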
Reasons for outliers and ways of reducing them
a. Reasons for outliers:
- Incorrect data entry
- Failure to specify missing values (a missing-value code is then treated as a real observation)
- Extreme cases
b. Reducing outliers:
- Ensure proper data entry
- Delete the variable causing the outliers
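
As an illustration of the "failure to specify missing values" reason, the sketch below assumes that missing scores were recorded with a placeholder code of 999. The code value, the data, and the helper names `MISSING` and `pearson_r` are hypothetical. If the placeholder is left in, it behaves like an extreme outlier; declaring it as missing and leaving those cases out of the calculation restores the correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

MISSING = 999  # hypothetical placeholder used at data entry for "no value"

# Made-up data; the seventh y value was never measured.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 5, 4, 6, 7, MISSING, 9]

# Placeholder treated as a real score: it acts as an extreme outlier.
r_raw = pearson_r(xs, ys)

# Placeholder declared as missing: skip any pair that contains it.
clean = [(x, y) for x, y in zip(xs, ys) if x != MISSING and y != MISSING]
clean_x = [x for x, _ in clean]
clean_y = [y for _, y in clean]
r_clean = pearson_r(clean_x, clean_y)

print(f"placeholder kept as data r = {r_raw:.2f}")
print(f"placeholder excluded     r = {r_clean:.2f}")
```

Here the filter drops whole cases rather than a whole variable, which is one possible way to handle the problem; the right choice depends on how much data is affected.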