Kaplan-Meier estimator


The Kaplan-Meier estimator [1], [2], also known as the product-limit estimator, is a statistic used to estimate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients still alive a given length of time after treatment. It is also used in economics and ecology.


The estimator is named after Edward L. Kaplan and Paul Meier.

A Kaplan-Meier estimate of the survival function is plotted as a series of horizontal steps of decreasing height which, when the sample is large enough, approaches the true survival function of the population. The value of the survival function between successive distinct observed event times is taken to be constant.

An example of a Kaplan-Meier curve for two variables associated with patient survival.

An important advantage of the Kaplan-Meier curve is that the method can take into account certain types of censored data, in particular right-censoring, which occurs when a patient is lost from a study, that is, when the patient's data are no longer available before the expected event (for example, death) is observed. On the plot, small vertical tick marks indicate these censored observations. If no truncation or censoring occurs, the Kaplan-Meier curve coincides with the empirical survival function, that is, the fraction of the sample still alive at each time.
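
As a quick numerical check of the no-censoring case, the sketch below multiplies the stepwise survival fractions (subjects at risk minus deaths, divided by subjects at risk) at each death time and compares the result with the fraction of the sample still alive. The lifetimes are hypothetical and the function names are illustrative, not taken from any library.

```python
from fractions import Fraction

def km_no_censoring(times):
    """Stepwise product of survival fractions when every death is observed."""
    at_risk = len(times)
    surv = Fraction(1)
    curve = {}
    for t in sorted(set(times)):
        deaths = times.count(t)
        surv *= Fraction(at_risk - deaths, at_risk)
        curve[t] = surv          # value of the curve just after time t
        at_risk -= deaths
    return curve

def empirical_survival(times, t):
    """Fraction of the sample whose lifetime exceeds t."""
    return Fraction(sum(1 for x in times if x > t), len(times))

lifetimes = [2, 3, 3, 5, 8]      # hypothetical, fully observed lifetimes
curve = km_no_censoring(lifetimes)
for t, s in curve.items():
    print(t, s, empirical_survival(lifetimes, t))
```

Each line prints the same value twice, because without censoring the successive fractions telescope to (number still alive) / (sample size).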

Let $S(t)$ be the probability that a member of a given population has a lifetime exceeding $t$. For a sample of size $N$ from this population, the observed times until the death of each of the $N$ sample members are:

$$t_1 \le t_2 \le t_3 \le \cdots \le t_N$$

To each $t_i$ corresponds an $n_i$, the number of individuals “at risk” just before time $t_i$, and a $d_i$, the number of deaths at time $t_i$.

Note that the intervals between successive event times are generally not uniform. For example, a small data set might begin with 10 cases. Suppose that subject 1 dies on day 3, subjects 2 and 3 die on day 11, and subject 4 is lost to follow-up (censored) on day 9. The data for the first two event times would then be as follows:

i      1     2
t_i    3     11
d_i    1     2
n_i    10    8

Here $n_2 = 8$ because one death (day 3) and one censored subject (day 9) have left the risk set before day 11.
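
The bookkeeping behind this small table can be sketched in a few lines of Python. Subjects 1 to 4 follow the text; the exit day (20) given to the six remaining subjects is a hypothetical value chosen only so that they are still under observation on day 11, and `risk_table` is an illustrative name.

```python
def risk_table(times, observed):
    """Return one row (t_i, d_i, n_i) per distinct death time.

    times[k]    -- day on which subject k left the study
    observed[k] -- True for a death, False for a censored subject
    """
    rows = []
    death_times = sorted({t for t, e in zip(times, observed) if e})
    for t_i in death_times:
        # At risk just before t_i: everyone who has neither died nor been censored earlier.
        n_i = sum(1 for t in times if t >= t_i)
        d_i = sum(1 for t, e in zip(times, observed) if e and t == t_i)
        rows.append((t_i, d_i, n_i))
    return rows

# Subjects 1-4 as in the text; the other six exit on a hypothetical day 20.
times = [3, 11, 11, 9] + [20] * 6
observed = [True, True, True, False] + [False] * 6
print(risk_table(times, observed))  # [(3, 1, 10), (11, 2, 8)]
```

The censored subject (day 9) never contributes a row of its own; it only lowers the number at risk at later death times.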

The Kaplan-Meier estimator is the nonparametric maximum likelihood estimate of $S(t)$. It is a product of the form:

$$\hat{S}(t) = \prod_{t_i < t} \frac{n_i - d_i}{n_i}$$

When there is no censoring, $n_i$ is simply the number of survivors just before time $t_i$.
When there is censoring, $n_i$ is the number of survivors minus the number of losses (censored cases). Only the surviving cases that are still under observation (not yet censored) are “at risk” of an (observed) death.
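
As an illustration, the product can be evaluated on the two event times of the small example above, with $(t_i, d_i, n_i)$ equal to (3, 1, 10) and (11, 2, 8). The sketch assumes the product runs over event times strictly before $t$ (the left-continuous form, which the text calls the first definition); `kaplan_meier` is an illustrative name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def kaplan_meier(rows, t):
    """Product of (n_i - d_i) / n_i over event times t_i < t."""
    s = Fraction(1)
    for t_i, d_i, n_i in rows:
        if t_i < t:
            s *= Fraction(n_i - d_i, n_i)
    return s

rows = [(3, 1, 10), (11, 2, 8)]   # (t_i, d_i, n_i) from the example
for day in (3, 5, 11, 12):
    print(day, kaplan_meier(rows, day))
# 3 1
# 5 9/10
# 11 9/10
# 12 27/40
```

The estimate after day 11 is 9/10 × 6/8 = 27/40 = 0.675; the subject censored on day 9 enters only through the reduced denominator 8.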

Another definition is sometimes used:

$$\hat{S}(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$

The two definitions differ only at the observed event times. The latter definition is right-continuous, whereas the former is left-continuous.
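
A small sketch makes the difference concrete. It evaluates both conventions (product over $t_i < t$ and product over $t_i \le t$) on the example rows (3, 1, 10) and (11, 2, 8) and lists the integer days on which they disagree; the function name and the `right_continuous` flag are illustrative.

```python
from fractions import Fraction

def km(rows, t, right_continuous):
    """Product of (n_i - d_i) / n_i over t_i <= t (right-continuous) or t_i < t."""
    s = Fraction(1)
    for t_i, d_i, n_i in rows:
        included = t_i <= t if right_continuous else t_i < t
        if included:
            s *= Fraction(n_i - d_i, n_i)
    return s

rows = [(3, 1, 10), (11, 2, 8)]   # (t_i, d_i, n_i) from the example
differ = [t for t in range(15) if km(rows, t, True) != km(rows, t, False)]
print(differ)  # [3, 11]
```

The two versions disagree exactly on days 3 and 11, the observed death times, and nowhere else.
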
Let $T$ be the random variable measuring the failure time and let $F(t)$ be its cumulative distribution function. Note that:

$$S(t) = P[T > t] = 1 - P[T \le t] = 1 - F(t)$$
