The skew of a distribution is a measure of its asymmetry. For example, the normal distribution is not skewed, as it is perfectly symmetric. However, many distributions are either positively or negatively skewed.
A negatively skewed distribution has a long left tail, with most of the area under the curve concentrated toward the right-hand side of the distribution.
A positively skewed distribution has a long right tail, with most of the area under the curve concentrated toward the left-hand side of the distribution.
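The sign convention can be checked numerically. The sketch below uses one common measure that this text does not define, the moment coefficient of skewness $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the $k$th central moment of the data (dividing by $n$). It is an illustrative choice, not the only way to quantify skew, and the example data are hypothetical.

```python
from statistics import fmean


def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (illustrative measure)."""
    mean = fmean(data)
    n = len(data)
    # Central moments about the mean, dividing by n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical data: one long right tail, and its mirror image
right_tailed = [1, 2, 2, 3, 3, 3, 10]
left_tailed = [-x for x in right_tailed]

print(moment_skewness(right_tailed) > 0)  # True: long right tail, positive skew
print(moment_skewness(left_tailed) < 0)   # True: long left tail, negative skew
```

A perfectly symmetric data set such as $1, 2, 3$ gives $g_1 = 0$, matching the statement that a symmetric distribution is not skewed.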
Given a set of data, the range is the difference between the largest and smallest observations.
Dr. Lee Fawcett calculates the range of two sets of data.
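A minimal Python sketch of the range is shown below. The function name `data_range` is an illustrative choice (`range` is already a Python built-in). Because the range depends only on the two most extreme observations, one unusual value can change it a great deal, as the example with the data set $5, 2, 6, 3, 37, 4, 7, 4, 1, 8, 0$ shows.

```python
def data_range(data):
    """Range: largest observation minus smallest observation."""
    if not data:
        raise ValueError("the range of an empty data set is undefined")
    return max(data) - min(data)


data = [5, 2, 6, 3, 37, 4, 7, 4, 1, 8, 0]
print(data_range(data))  # 37 - 0 = 37

# Dropping the single largest value changes the range dramatically
print(data_range([x for x in data if x != 37]))  # 8 - 0 = 8
```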
Given a set of data arranged in ascending order, the three quartiles are the points that split the data into four groups, each (as far as possible) of equal size. The first quartile, denoted $Q_1$, is the value that splits off the lowest $25$% of the data from the highest $75$%. For $n$ observations, the position of $Q_1$ is \[\text{position of }Q_1=\frac{n+1}{4}\text{.}\] The second quartile, $Q_2$, cuts the data set in half; it is another name for the median, and its position is $\frac{n+1}{2}$. The third quartile, $Q_3$, is the value that splits off the highest $25$% of the data from the lowest $75$%; its position is \[\text{position of }Q_3=\frac{3(n+1)}{4}\text{.}\] If the position of a quartile is not a whole number, the quartile lies between two data values; add these two values together and divide by $2$ to find the quartile.
This is a video on quartiles produced by Alissa Grant-Walker.
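The position rules above can be sketched in Python as follows. This is a minimal illustration that assumes the conventions of this section: positions $k(n+1)/4$ counted from $1$ in the ordered data, and the mean of the two neighbouring values whenever a position is not a whole number. The function name `quartile` is hypothetical, and statistical software often uses other conventions (for example, linear interpolation), so its quartiles may differ slightly.

```python
def quartile(data, k):
    """Return the k-th quartile (k = 1, 2 or 3) of the data.

    Uses position k(n+1)/4 in the ordered data; if that position is not a
    whole number, the two neighbouring values are averaged.
    """
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    values = sorted(data)
    n = len(values)
    if n < 3:
        # With fewer than 3 values, Q1 or Q3 would fall outside the data
        raise ValueError("need at least 3 observations")
    position = k * (n + 1) / 4  # 1-based position in the ordered data
    whole = int(position)       # the position at or just below it
    if position == whole:
        return values[whole - 1]
    # Position lies between two data values: take their mean
    return (values[whole - 1] + values[whole]) / 2


data = [1, 2, 3, 4, 5, 6, 7, 8]  # n = 8
print([quartile(data, k) for k in (1, 2, 3)])  # [2.5, 4.5, 6.5]
```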
The interquartile range (IQR) is the difference between the third and first quartiles. By the definition of quartiles, the values between $Q_1$ and $Q_3$ make up the middle $50$% of the data, so the interquartile range measures the spread of this middle half. It is found as follows: \[\text{IQR}=Q_3-Q_1\text{.}\]
This is a video on calculating the interquartile range produced by Alissa Grant-Walker.
Dr. Lee Fawcett calculates the interquartile range of two sets of data.
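As a worked illustration, take the hypothetical ordered data set $1, 2, 3, 4, 5, 6, 7, 8$, so $n=8$. Applying the position rules for quartiles, and averaging the two neighbouring values when a position is not a whole number, gives:
\[\text{position of }Q_1=\frac{8+1}{4}=2.25\text{, so }Q_1=\frac{2+3}{2}=2.5\text{,}\]
\[\text{position of }Q_3=\frac{3(8+1)}{4}=6.75\text{, so }Q_3=\frac{6+7}{2}=6.5\text{,}\]
\[\text{IQR}=Q_3-Q_1=6.5-2.5=4\text{.}\]
The four values $3, 4, 5, 6$ lying between $Q_1$ and $Q_3$ are exactly the middle half of the eight observations.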
An outlier is an observation in a data set which is far removed in value from the others in the set. It is an unusually large or an unusually small value compared to the others.
An outlier might be the result of an error in measurement, in which case it will distort the interpretation of the data, for example by having undue influence on the mean. However, an outlier may also be a genuine result, perhaps indicating a peculiarity of the process under study. For this reason, all outliers must be examined carefully and should not be routinely removed without further justification.
There is no rigorous mathematical definition of exactly what is or is not an outlier; however, there are several tests and criteria that can be applied. These include Chauvenet's criterion, Peirce's criterion, Grubbs's test for outliers and Dixon's Q-test.
Commonly used rules for recognising outliers include \[\text{lower outlier(s)}<Q_1-(1.5\times\text{IQR})\] and \[\text{upper outlier(s)}>Q_3+(1.5\times\text{IQR})\text{.}\]
Find any outliers in the data set \[5, 2, 6, 3, 37, 4, 7, 4, 1, 8, 0.\]
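The observation $37$ is much larger than the rest of the data, which suggests that it is an outlier. To confirm this with the rules above, first put the data in order: \[0, 1, 2, 3, 4, 4, 5, 6, 7, 8, 37.\] Here $n=11$, so $Q_1$ is in position $\frac{11+1}{4}=3$, giving $Q_1=2$, and $Q_3$ is in position $\frac{3(11+1)}{4}=9$, giving $Q_3=7$. Hence $\text{IQR}=7-2=5$, so the lower limit is $2-(1.5\times5)=-5.5$ and the upper limit is $7+(1.5\times5)=14.5$. No observation lies below $-5.5$, but $37>14.5$, so $37$ is an outlier.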
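The $1.5\times\text{IQR}$ rule can be sketched in Python as below. This assumes the quartile conventions of this section (positions $k(n+1)/4$, averaging when a position falls between two values); the function names `quartile` and `find_outliers` are hypothetical, and other quartile conventions can move the limits slightly. The sketch only flags values beyond the limits; each flagged value still needs to be examined before deciding what to do with it.

```python
def quartile(data, k):
    """k-th quartile using position k(n+1)/4, averaging neighbours if needed."""
    values = sorted(data)
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    position = k * (n + 1) / 4
    whole = int(position)
    if position == whole:
        return values[whole - 1]
    return (values[whole - 1] + values[whole]) / 2


def find_outliers(data, factor=1.5):
    """Return (lower_limit, upper_limit, outliers) for the factor * IQR rule."""
    q1, q3 = quartile(data, 1), quartile(data, 3)
    iqr = q3 - q1
    lower_limit = q1 - factor * iqr
    upper_limit = q3 + factor * iqr
    # Keep observations strictly outside the limits, in their original order
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers


data = [5, 2, 6, 3, 37, 4, 7, 4, 1, 8, 0]
print(find_outliers(data))  # (-5.5, 14.5, [37])
```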
This is a video on detecting outliers produced by Alissa Grant-Walker.
These workbooks produced by HELM are good revision aids, containing key points for revision and many worked examples.