Statistics

Introduction

Quantitative data is often summarized and analyzed with statistical methods and visualized with plots, graphs, and diagrams. Statistical methods reveal quantitative trends, patterns, and outliers in data, while plots and graphs help to convey them to audiences. Carrying out a suitable statistical analysis, choosing a suitable chart type for your data, identifying their potential pitfalls, and faithfully realizing the analysis or generating the chart with suitable software are all essential to back up experimental conclusions with data and to reach communication goals.

Dimensionality reduction

What is it?

Dimensionality reduction (also called dimension reduction) aims at mapping high-dimensional data onto a lower-dimensional space in order to better reveal trends and patterns. Algorithms performing this task attempt to retain as much information as possible when reducing the dimensionality of the data: this is achieved by assigning importance scores to individual features, removing redundancies, and identifying uninformative (for instance constant) features. Dimensionality reduction is an important step in quantitative analysis as it makes data more manageable and easier to visualize. It is also an important preprocessing step in many downstream analysis algorithms, such as machine learning classifiers.

📏 How do I do it?

The most traditional dimensionality reduction technique is principal component analysis (PCA)50. In a nutshell, PCA recovers a linear transformation of the input data into a new coordinate system (the principal components) that concentrates variation into its first axes. This is achieved with classical linear algebra, by computing an eigendecomposition of the covariance matrix of the data. As a result, the first two or three principal components provide a low-dimensional version of the data distribution that is faithful to the variance originally present. More advanced dimensionality reduction methods that are popular in biology include t-distributed stochastic neighbor embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). In contrast to PCA, these methods are non-linear and can therefore exploit more complex relationships between features when building the lower-dimensional representation. This however comes at a cost: both t-SNE and UMAP are stochastic and sensitive to their hyperparameters, meaning that the results they produce can differ across runs and across parameter choices.
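As a minimal sketch (not a prescribed workflow), the snippet below shows how PCA and t-SNE could be applied to a generic feature table using scikit-learn. The matrix `X` is a synthetic stand-in for real measurements, and the hyperparameter values (number of components, perplexity) are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Toy stand-in for a (n_samples, n_features) measurement table.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))

# PCA: standardize the features, then project onto the first two components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print("variance explained by PC1/PC2:", pca.explained_variance_ratio_)

# t-SNE: non-linear and stochastic; fixing random_state makes a single run
# reproducible, but results still depend on hyperparameters such as perplexity.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_std)
```

Checking `explained_variance_ratio_` before trusting a 2D PCA view is a cheap way to see how much of the original variance the plot actually retains.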

⚠️ Where can things go wrong?

Although reducing dimensionality can be very useful for data exploration and analysis, it may also wipe out information or structure that is relevant to the problem being studied. This is famously illustrated by the Datasaurus dataset, which demonstrates how very different-looking sets of measurements can become indistinguishable when described by a small set of summary statistics. The best way to minimize this risk is to start by visually exploring the data whenever possible, and to carefully check that the underlying assumptions of the dimensionality reduction method hold for the data at hand. Dimensionality reduction may also enhance and reveal patterns that are not biologically relevant, caused by noise or systematic artifacts in the original data (see the Batch correction section below). In addition to normalizing and batch-correcting the data before reducing dimensionality, some dimensionality reduction methods offer so-called regularization strategies to mitigate this. In the end, any pattern identified in dimension-reduced data should be interpreted with the biological context of the data in mind.

📚🤷‍♀️ Where can I learn more?

Batch correction

What is it?

Batch effects are systematic variations across samples that correlate with technical aspects of the experiment (such as different times of day, different days of the week, or different experimental tools) rather than with the biological process of interest. Batch effects must be mitigated prior to making comparisons across several datasets, as they impact the reproducibility and reliability of computational analysis and can dramatically bias conclusions. Algorithms for batch effect correction address this by identifying and quantifying sources of technical variation, and adjusting the data so that these are minimized while the biological signal is preserved. Most batch effect correction methods were originally developed for microarray and sequencing data, but can be adapted to feature vectors extracted from images.

📏 How do I do it?

Two of the most widely used methods for batch effect correction are ComBat, which requires the sources of batch effects to be known a priori, and Surrogate Variable Analysis (SVA), which does not. In a nutshell, ComBat involves three steps: 1) dividing the data into known batches, 2) estimating the batch effect by fitting a linear model that includes the batch as a covariate, and 3) adjusting the data by removing the estimated effect of the batch from each data point. In contrast, SVA aims at identifying “surrogate variables” that capture unknown sources of variability in the data. The surrogate variables can be estimated with linear algebra methods (such as singular value decomposition) or through a Bayesian factor analysis model. SVA has been demonstrated to reduce unobserved sources of variability and is therefore particularly helpful when identifying possible causes of batch effects is challenging, but it comes at a higher computational cost than ComBat.
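The sketch below illustrates the three ComBat-style steps described above in their simplest form, on synthetic data with plain NumPy: it encodes known batches, fits a linear model with batch indicators as covariates, and subtracts the estimated per-batch offsets. This is only a mean-centering illustration; the full ComBat method additionally adjusts scale and applies empirical Bayes shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 60 samples x 5 features, with an additive offset in batch B.
n_per_batch, n_features = 30, 5
batch = np.array(["A"] * n_per_batch + ["B"] * n_per_batch)
X = rng.normal(size=(2 * n_per_batch, n_features))
X[batch == "B"] += 1.5  # simulated batch effect

# Steps 1-2: encode the known batches and fit a linear model with batch
# indicators as covariates (here this amounts to per-batch feature means).
design = np.column_stack([(batch == b).astype(float) for b in np.unique(batch)])
coef, *_ = np.linalg.lstsq(design, X, rcond=None)

# Step 3: remove the estimated batch-specific offsets, keeping the grand mean.
grand_mean = X.mean(axis=0)
X_corrected = X - design @ coef + grand_mean
```

For real analyses, established implementations of ComBat and SVA (for instance in the R/Bioconductor ecosystem) are preferable to hand-rolled adjustments.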

⚠️ Where can things go wrong?

As important as it is for analysis, batch effect correction can go wrong when too much or too little of it is done. Both over- and under-correction can happen when methods are not used properly or when their underlying assumptions are not met. As a result, either biological signal is removed (in the case of over-correction) or irrelevant sources of variation remain (in the case of under-correction), both potentially leading to inaccurate conclusions. Batch effect correction is particularly tricky when the biological variation of interest is suspected to be confounded with the batch. In this case especially (although it is always a good approach), the first line of defense against batch effects should be a thought-through experimental design and careful quality control, as well as visual exploration of the data52. Plotting the data batch by batch before applying any correction can help confirm (or refute) that the observed trends are similar across batches.
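As a hypothetical sketch of such a batch-by-batch check, one can project the data onto its first principal components and color points by batch; well-separated point clouds per batch hint at a batch effect. The feature matrix `X` and `batch` labels below are simulated stand-ins.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Simulated (n_samples, n_features) data with an additive offset in batch B.
rng = np.random.default_rng(1)
batch = np.array(["A"] * 30 + ["B"] * 30)
X = rng.normal(size=(60, 5))
X[batch == "B"] += 1.5

# Project onto the first two principal components and color by batch.
scores = PCA(n_components=2).fit_transform(X)
for b in np.unique(batch):
    mask = batch == b
    plt.scatter(scores[mask, 0], scores[mask, 1], label=f"batch {b}", alpha=0.7)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()
```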

📚🤷‍♀️ Where can I learn more?

Normality testing

What is it?

Normality testing is about assessing whether data follow a Gaussian (or normal) distribution. Because the Gaussian distribution is frequently found in nature and has important mathematical properties, normality is a core assumption of many widely used statistical tests. When this assumption is violated, the conclusions of these tests may not hold. Normality testing is therefore an important step of the data analysis pipeline prior to any sort of statistical testing.

📏 How do I do it?

Normality of a data distribution can be qualitatively assessed through plotting, for instance relying on a histogram. For a more quantitative readout, statistical methods such as the Kolmogorov-Smirnov (KS) and Shapiro-Wilk tests (among many others) report how much the observed data distribution deviates from a Gaussian. These tests usually return a p-value linked to the hypothesis that the data are sampled from a Gaussian distribution. A high p-value indicates that the data are not inconsistent with a normal distribution, but it is not sufficient to prove that they indeed follow a Gaussian. A p-value smaller than a pre-defined significance threshold (usually 0.05) provides evidence that the data are unlikely to be sampled from a normal distribution.
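As a minimal example, both tests are available in `scipy.stats`. The sample below is simulated; note that estimating the Gaussian parameters from the same data makes the plain KS test conservative (the Lilliefors variant addresses this).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=100)  # toy measurements

# Shapiro-Wilk test, well suited to small and moderate sample sizes.
stat_sw, p_sw = stats.shapiro(sample)

# Kolmogorov-Smirnov test against a normal distribution whose mean and
# standard deviation are estimated from the sample itself.
stat_ks, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

print(f"Shapiro-Wilk p = {p_sw:.3f}, KS p = {p_ks:.3f}")
# A p-value below the chosen threshold (e.g. 0.05) is evidence against
# normality; a large p-value does not prove the data are Gaussian.
```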

⚠️ Where can things go wrong?

Although many of the “standard” statistical methods were designed under a normality assumption, alternative approaches (such as non-parametric tests) exist for non-normally-distributed data. Many biological processes result in multimodal “states” (for instance differentiation) that are inherently not Gaussian. Normality testing should therefore not be mistaken for a quality assessment of the data: it merely informs on the types of tools that are appropriate for analyzing them.

📚🤷‍♀️ Where can I learn more?