SOSstat - Descriptive statistics

Descriptive statistics

At the end of an experiment, or of a series of tests or samples, a set of raw data is available, and it is desirable to give it a concise representation. The purpose of descriptive statistics is precisely to provide tools for summarizing this data set. These tools fall into two families: graphical summaries and numerical summaries.

SOSstat offers many descriptive statistics functions in its analysis modules.

Graphical summaries

There are many graphical tools, but some are essential, such as:

  • The histogram, which provides an approximation of the distribution of the data
  • The boxplot, which cuts the sample into 4 parts containing the same number of individuals (delimited by the quartiles). This very concise representation is used to compare several samples graphically at the same time.
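The numbers behind these two plots can be sketched in a few lines of Python. This is only an illustration, not SOSstat's own procedure: the function names, the equal-width binning rule and the sample values are assumptions, and quartile conventions differ between tools (the "inclusive" method of the standard `statistics` module is used here).

```python
import statistics


def histogram_counts(data, n_bins):
    """Count the observations falling in n_bins equal-width bins over [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything goes into the first bin.
        return [len(data)] + [0] * (n_bins - 1)
    counts = [0] * n_bins
    for x in data:
        # The maximum value is assigned to the last bin instead of opening a new one.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


# Made-up measurements, for illustration only.
sample = [2.1, 2.4, 2.4, 2.7, 3.0, 3.1, 3.3, 3.8, 4.0]
print("bin counts:", histogram_counts(sample, 4))
print("min, Q1, median, Q3, max:", five_number_summary(sample))
```

The bin counts always add up to the sample size, and the three quartiles split the ordered sample into four parts of roughly equal size.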

[Figure: density estimation by the kernel method]

Numerical summary

The numerical summary consists of calculating parameters that represent specific characteristics of the population. Three groups of parameters are essentially distinguished: centering, dispersion and shape parameters.

SOSstat can easily calculate the parameters of a large number of samples.

[Figure: numerical summary with SOSstat]

Centering or position parameters

The most widely used centering parameters are:

The arithmetic mean:

It represents the center of gravity of the sample: the value that each member of the sample would have if all members were identical and the sample total remained unchanged.

$$\bar{x}=\frac{1}{n} \sum_{i=1}^{n} x_{i} $$

The median:

The value that splits an ordered numeric series into two parts containing the same number of elements. For a sample with an odd number n of elements, where an index in parentheses denotes the rank in the ordered series, the median is defined by the relation

$$\tilde{x}= x_{ \left( \frac{n+1}{2} \right)} $$

and for a sample with an even number of elements by

$$\tilde{x}= \frac{ x_{\left( \frac{n}{2} \right) } + x_{\left( \frac{n}{2} +1 \right) }}{2} $$
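The two centering parameters can be computed directly from these formulas. The following is a minimal sketch with hypothetical function names; it is not taken from SOSstat. Note that Python lists are indexed from 0, so the 1-based ranks of the formulas are shifted by one.

```python
def arithmetic_mean(xs):
    """Sum of the values divided by their number n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the ordered series (mean of the two middle values if n is even)."""
    s = sorted(xs)          # the formulas apply to the ordered series
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: rank (n+1)/2 in 1-based terms is index n//2 in 0-based terms.
        return s[mid]
    # Even n: ranks n/2 and n/2 + 1 are indices mid-1 and mid.
    return (s[mid - 1] + s[mid]) / 2


print(arithmetic_mean([2, 4, 9]))
print(median([7, 1, 3]))
print(median([4, 1, 3, 2]))
```

Unlike the mean, the median is not affected by the magnitude of extreme values, which is why the two are often reported together.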

Dispersion or scale parameters

The most widely used dispersion parameters are:

Range:

The distance between the minimum and maximum values of a sample.

$$R= \max (x_{1}, \ldots, x_{n}) - \min (x_{1}, \ldots, x_{n}) $$

Variance:

A measure of the dispersion of the data around the mean. The divisor n-1 used below gives the unbiased estimate computed from a sample.

$$\sigma^2= \frac{1}{n-1} \sum_{i=1}^{n} (x_i-\bar{x})^2 $$

Standard deviation:

The square root of the variance, expressed in the same unit as the data.

$$\sigma = \sqrt{ \frac{ \sum_{i=1}^{n} (x_i-\bar{x})^2}{n-1} } $$
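As a sketch of how the three dispersion parameters follow from their formulas, the Python functions below use the n-1 divisor shown above. The function names and the data are illustrative assumptions, not part of SOSstat.

```python
import math


def sample_range(xs):
    """Distance between the largest and smallest value."""
    return max(xs) - min(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two values are needed for the n-1 divisor")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Square root of the variance."""
    return math.sqrt(sample_variance(xs))


# Made-up values, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_range(data), sample_variance(data), sample_std(data))
```

For this data the mean is 5 and the squared deviations sum to 32, so the variance is 32/7.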

Shape parameters

Shape parameters are usually called skewness (for asymmetry) and kurtosis (for flattening), but several different calculation formulas can hide behind these names.
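To illustrate how the same name can cover different calculations, the sketch below compares two common conventions: the plain moment-based coefficients (often written g1 and g2) and their sample-size-adjusted versions (often written G1 and G2). Which convention SOSstat applies is not stated here, so the function names and the choice of these two variants are assumptions.

```python
import math


def _central_moment(xs, k):
    """k-th central moment with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def moment_skewness(xs):
    """g1 = m3 / m2^(3/2)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def moment_excess_kurtosis(xs):
    """g2 = m4 / m2^2 - 3 (0 for a normal distribution)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3


def adjusted_skewness(xs):
    """G1 = g1 * sqrt(n (n-1)) / (n-2); requires n >= 3."""
    n = len(xs)
    return moment_skewness(xs) * math.sqrt(n * (n - 1)) / (n - 2)


def adjusted_excess_kurtosis(xs):
    """G2 = (n-1) / ((n-2)(n-3)) * ((n+1) g2 + 6); requires n >= 4."""
    n = len(xs)
    g2 = moment_excess_kurtosis(xs)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


values = [1, 2, 3, 4, 5]  # made-up symmetric sample
print(moment_skewness(values), adjusted_skewness(values))
print(moment_excess_kurtosis(values), adjusted_excess_kurtosis(values))
```

On a small sample the two kurtosis values differ noticeably, so it is worth checking which formula a given tool uses before comparing results.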
