Monday, Nov 26, 2012

What is the difference between linear, logistic and Poisson regression?

When we first encounter regression we typically think of fitting a straight line to a set of points (see Figure 1). However, we quickly come across the terms linear, logistic, and Poisson regression. In this tutorial we review these three types of regression and explain when to use each of them in a statistical analysis.

Linear regression – This is the first type of regression we typically encounter. We use linear regression when we have a continuous outcome variable (Y) and we want to explore how Y changes as a function of one or more predictors (X). In Figure 1 we consider a weight loss program in which the weight of a patient was measured at various timepoints throughout the program. Weight can be treated as a continuous variable (more precisely, the model assumes that the observed weights are normally distributed around the regression line). In a simple regression model we treat our outcome variable Y as a function of a single predictor X. In more complex models we often include additional predictors; these are commonly called covariates, and they are often included to adjust for confounders. It should also be noted that a linear regression model doesn’t determine whether there is any association between X and Y; it determines whether there is a “linear” association. X and Y may be related in some way other than a straight line, so it is always a good idea to plot your data to check whether the relationship is linear or non-linear.
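As a rough illustration of the simple case (one outcome Y, one predictor X), the sketch below fits a straight line by ordinary least squares using only the Python standard library. The weights and timepoints are made-up numbers chosen to resemble a weight loss program, not data from Figure 1, and the function name `fit_line` is our own.

```python
# Hypothetical data: weeks into the program (X) and patient weight in kg (Y).
weeks = [0, 2, 4, 6, 8, 10]
weight_kg = [95.0, 93.8, 92.1, 91.0, 89.2, 88.1]

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(weeks, weight_kg)
print(f"weight = {intercept:.2f} + ({slope:.2f}) * week")
```

The slope is the estimated change in Y for a one-unit increase in X (here, kilograms per week). A fitted line alone does not show whether a straight line is appropriate, so plotting the points is still worthwhile.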

Logistic regression – In Figure 1 our outcome variable was continuous, but this isn’t the only kind of outcome variable that we can have. In Figure 2 we consider a group of children where each child is either male or female. We might be interested in knowing whether a mother’s diet during pregnancy was an important predictor for the gender of her child. Here the outcome variable (the variable on the Y-axis) is a binary variable (a categorical variable with two states – male and female). Logistic regression models the probability that the outcome is in one of the two states as a function of the predictors.

It should also be noted that the most common form of logistic regression has an outcome variable that exists in two states (e.g. male/female, black/white). More complex versions of logistic regression allow for a categorical outcome variable that exists in more than two states (e.g. blue/orange/red/green). We will explore these extended versions of logistic regression in a later tutorial.
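To make the two-state case concrete, here is a minimal sketch of logistic regression with one predictor, fitted by plain gradient ascent on the log-likelihood using only the standard library. The data are invented for illustration (a generic numeric predictor and a 0/1 outcome), and the names `sigmoid` and `fit_logistic`, the learning rate, and the number of steps are our own choices. In practice a statistics package would normally do the fitting.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Gradient of the mean log-likelihood with respect to b0 and b1.
        g0 = sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)) / n
        g1 = sum((yi - sigmoid(b0 + b1 * xi)) * xi for xi, yi in zip(x, y)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical data: a numeric predictor and a binary outcome coded 0/1
# (for example 1 = male, 0 = female in the setting described above).
predictor = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(predictor, outcome)
p_low = sigmoid(b0 + b1 * 1)   # estimated probability when the predictor is 1
p_high = sigmoid(b0 + b1 * 8)  # estimated probability when the predictor is 8
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"P(outcome = 1) at x = 1: {p_low:.2f}, at x = 8: {p_high:.2f}")
```

The model output is always a probability between 0 and 1, which is why a straight line is not used directly for a binary outcome. The quantity `exp(b1)` is the odds ratio for a one-unit increase in the predictor.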

Poisson regression – The third type of regression that is commonly encountered is Poisson regression. This is used when we have an outcome variable that is a count of things, that is, a non-negative whole number. Common examples of count variables include class size (a number of people), the number of visits a patient makes to a doctor in a year, and the number of accidents at an intersection in a month. Measurements such as weight (in kilograms) or time (in years) are not counts, even though we report them as numbers; they are continuous. When we produce a histogram of a count variable we often observe a distribution that resembles a Poisson distribution (see Figure 3). When the average count is small, this distribution rises steeply and falls off slowly, and it is only defined for non-negative whole numbers (it doesn’t make sense for a class to contain fewer than zero people). In Poisson regression we are interested in exploring questions such as which variables are important for predicting whether one class is larger than another.
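The sketch below shows one way a Poisson regression with a single predictor could be fitted, using the usual log link, log(E[Y]) = b0 + b1 * X, and Newton-Raphson steps on the Poisson log-likelihood. It uses only the standard library. The class sizes and the predictor are made-up numbers, and the names `fit_poisson`, `years_offered`, and `class_size` are hypothetical. A statistics package would normally be used instead.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    # Start from the intercept-only fit: the log of the mean count.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the negated Hessian (the Fisher information matrix).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
    return b0, b1

# Hypothetical data: years a course has been offered (X) and its class size (Y).
years_offered = [0, 1, 2, 3, 4, 5]
class_size = [2, 3, 3, 5, 7, 9]

b0, b1 = fit_poisson(years_offered, class_size)
fitted = [math.exp(b0 + b1 * xi) for xi in years_offered]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, rate ratio per year = {math.exp(b1):.3f}")
```

Because the model works on the log scale, the fitted counts are always positive, and `exp(b1)` is interpreted as the multiplicative change in the expected count for a one-unit increase in the predictor.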

Other types of regression – Linear, logistic, and Poisson regression are the three most commonly encountered types of regression. However, there are other kinds of regression for other kinds of outcome variables.

If you would like to find out more: