REVIEW ARTICLE
Year : 2016  |  Volume : 2  |  Issue : 2  |  Page : 73-79

Survival analysis: A brief note


Department of Community Medicine, North DMC Medical College and Hindu Rao Hospital, New Delhi, India

Date of Submission: 01-Sep-2016
Date of Acceptance: 25-Oct-2016
Date of Web Publication: 13-Jan-2017

Correspondence Address:
Sandeep Sachdeva
Department of Community Medicine, North DMC Medical College and Hindu Rao Hospital, New Delhi - 110 007
India

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/2455-3069.198374

  Abstract 

This manuscript briefly describes the concepts and terminology of survival analysis, including its characteristics, need, data mechanisms, functions, and applications in the health sciences, along with different estimation procedures.

Keywords: Cox proportional hazard, life expectancy, lost to follow-up, mortality, truncation


How to cite this article:
Dwivedi N, Sachdeva S. Survival analysis: A brief note. J Curr Res Sci Med 2016;2:73-9

How to cite this URL:
Dwivedi N, Sachdeva S. Survival analysis: A brief note. J Curr Res Sci Med [serial online] 2016 [cited 2023 May 31];2:73-9. Available from: https://www.jcrsmed.org/text.asp?2016/2/2/73/198374


  Introduction


In medical research, questions of interest often include: “What is the survival pattern of patients suffering from a particular disease before or after treatment?” and “What is the survival pattern of groups treated by different methods?” Researchers may also want to determine the effect of explanatory variables such as age, sex, economic status, and/or other fixed or time-varying variables on survival time, for example, “Does smoking decrease lifespan?” or “Do good eating habits and exercise lengthen the incubation period of a disease?” Some other specific applications of survival analysis are:[1]

  • Remission duration in a clinical trial for acute leukemia
  • “Time” of infection in kidney dialysis patient
  • “Time” of death for breast cancer trial
  • “Time” of infection among burn patients
  • “Time” of death among kidney transplant patients
  • “Time” of death among patients with cancer.


Similar questions arise outside the health sector: Which manufacturing process increases the lifespan of a light bulb? What drives the duration of an individual's unemployment, of a strike, or of a recession? Statistical methods designed to describe, explain, or predict the occurrence of such events are collectively called survival analysis.[2],[3],[4] In this manuscript, we briefly describe the concept of survival analysis and its applications, characteristics, need, functions, and the different procedures for estimating survival time.


  What Is Survival Analysis?


Survival analysis is a method for analyzing data in the form of “times,” measured from a well-defined time origin until the occurrence of some particular event or end-point. Such data are called lifetime, failure time, or survival data.[5] Here “times” refers to durations and has nothing to do with “frequency.” In medical research, the time origin usually corresponds to the recruitment of subjects into an experimental study such as a clinical trial, which, in turn, may coincide with the diagnosis of a particular condition, the commencement of a treatment regimen, or the occurrence of some adverse event. If the end-point is the death of a patient, the resulting data are literally survival times. Survival data include a response variable that measures the duration of time until the occurrence of a specific event (event time, failure time, or survival time) and possibly a set of independent variables thought to be associated with the event time.[6] These independent variables can be either discrete, such as sex or race, or continuous, such as age or temperature.


  Characteristics of Survival Data


The characteristics of survival data include:[7],[8],[9],[10]

  • The time origin should be precisely defined for each individual (e.g., birth, time of entry into the study)
  • The end event (failure) must be clearly defined (e.g., death, relapse, adverse drug reaction, development of new disease, recovery or heart attack, AIDS, etc.)
  • All individuals should be as comparable as possible at their time of origin (e.g., to determine the incidence of death due to breast cancer among breast cancer patients, every patient will be followed from a baseline date such as date of diagnosis or date of surgery until the date of death or termination of study)
  • Survival data can never be negative because they are response times. Time is a positive real-valued variable that may be measured in hours, days, weeks, months, or years from the time origin until the event occurs, for example, time from diagnosis to full-blown disease; time to death; length of stay in a jail/hospital/school; and time to a specified change in HIV viral load.



  Goals of Survival Analysis


  • Estimate and interpret survival and/or hazard functions from survival data such as time until second heart attack for a group of myocardial infarction (MI) patients
  • Compare survival and/or hazard functions such as treated versus placebo MI patients in a randomized controlled trial
  • Assess the relationship of explanatory variables to survival time, for example, do weight, insulin resistance, or cholesterol influence the survival time of MI patients?[11]



  Data Mechanism


In general, survival data are not completely observed. The nature of the incompleteness depends on whether it arises from a restriction imposed by the researcher or from random causes. There are two data mechanisms: censoring and truncation.

Censoring

In some settings, it is difficult to predict when the event will occur, and the time needed to observe the event in every subject of a study population may be impractically long, which prevents full observation. This leads to the concept of “censoring.”[8] Because lifetimes can be indefinitely long, censoring is used to limit the time required for data collection.[11] For example, clinical trials are conducted over a finite period with staggered entry of patients, that is, patients enter the trial over time, so the length of follow-up varies for each individual. Consequently, the time to event may not be ascertained for all patients within the limited study period.

In addition, some participants may be lost to follow-up (e.g., they move away or refuse to continue in the study) before the study ends. For unbiased analysis of survival curves, it is essential that censoring due to loss to follow-up be minimal and truly “noninformative,” that is, participants who drop out should do so for reasons unrelated to the study. Informative censoring occurs when participants are lost to follow-up for reasons related to the study. Several methods have been described to deal with the problem of informative censoring.[12] These include imputation techniques for missing data, sensitivity analyses to mimic best- and worst-case scenarios, and use of the drop-out event as a study end-point.

Censored observations can arise in three ways:[13]

  1. Event of interest does not occur during the study period
  2. Situation arising due to lost-to-follow-up
  3. Subject has died of some cause totally unrelated to disease in question.


Types of censoring

There are three main types of censoring: (a) right-censoring; (b) left-censoring; and (c) interval censoring.[14]

Right-censoring

Not all subjects will experience the event of interest during the observation period. For such a subject, we only know that the event had not occurred by the time observation stopped. This is referred to as right-censoring. Three common situations in which an individual's survival time is right-censored are: (1) the subject is lost to follow-up during the study, (2) the subject does not experience the event before the study ends, or (3) the subject withdraws from the study. Right-censoring includes Type I, Type II, and random censoring schemes.[15]

Type I censoring

In this type of censoring scheme, the observational period is fixed. At the end of the study, any subject who has not yet failed is “censored.” In Type I censoring, if there is no accidental loss, all censored observations equal the length of the study. Type I censoring is also known as “time censoring.” Here, the number of items that fail before the preassigned time t0 is a random variable. If we put n items on test and m items fail before t0, then the (n-m) items that survive beyond t0 are included in the time-censored sample as censored observations.

For example, suppose that six rats (named A, B, C, D, E, and F) have been exposed to carcinogens by injecting tumor cells into their footpads, and the time to develop a tumor of a given size is observed [Figure 1].[6] The investigator decides to terminate the experiment 30 weeks after its initiation, and it is observed that rats A, B, and D develop tumors after 10, 15, and 25 weeks, respectively. Rats C and E do not develop tumors by the end of the study; their tumor-free times are 30-plus weeks. Rat F died accidentally without any tumor after 19 weeks of observation, so its tumor-free time is also censored. The survival data (tumor-free times) for rats A to F are 10, 15, 30+, 25, 30+, and 19+ weeks (the plus sign indicates a censored observation).[6]
Figure 1: Type I censoring (example)
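The rat example can be encoded as (observed time, event indicator) pairs, which is the usual input for survival routines. The following is a minimal sketch under stated assumptions: the dictionary layout and the helper name `to_survival_record` are hypothetical and chosen only for illustration; the week values are those given in the example.

```python
# Type I censoring: the study length is fixed in advance (30 weeks in the example).
STUDY_END = 30  # weeks

# Hypothetical record layout: week of tumor onset (None if no tumor was observed)
# and week of accidental loss (None if the rat stayed under observation).
raw = {
    "A": {"tumor_week": 10, "lost_week": None},
    "B": {"tumor_week": 15, "lost_week": None},
    "C": {"tumor_week": None, "lost_week": None},
    "D": {"tumor_week": 25, "lost_week": None},
    "E": {"tumor_week": None, "lost_week": None},
    "F": {"tumor_week": None, "lost_week": 19},
}


def to_survival_record(tumor_week, lost_week, study_end=STUDY_END):
    """Return (observed_time, event); event=1 means the tumor was observed,
    event=0 means the observation is right-censored."""
    if tumor_week is not None and tumor_week <= study_end:
        return tumor_week, 1
    if lost_week is not None and lost_week < study_end:
        return lost_week, 0      # accidental loss before the study ended
    return study_end, 0          # still tumor-free when the study ended


records = {rat: to_survival_record(**fields) for rat, fields in raw.items()}

for rat, (time, event) in records.items():
    # A trailing "+" marks a censored tumor-free time.
    print(rat, f"{time}{'' if event else '+'}")
```

Keeping the event indicator separate from the time is what lets later estimators use the partial information carried by censored subjects.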



Type II censoring

Here, the total number of failures is fixed in advance. We may put n items on test and terminate the experiment when a preassigned number of items, say r (<n), have failed. The sample obtained through this mechanism is called a “failure-censored” sample. In the above example, the investigator may decide to terminate the study after four of the six rats have developed tumors [Figure 2].[6] A drawback of the Type I and Type II censoring schemes is that they do not allow the researcher to remove an individual at any point other than the terminal point of the experiment.[2]
Figure 2: Type II censoring (example)
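A minimal sketch of the failure-censoring mechanism follows. It assumes, for illustration only, that the true failure times of all items are known (as in a simulation); the function name `type_ii_censor` and the example times are hypothetical and are not data from the article.

```python
def type_ii_censor(failure_times, r):
    """Apply Type II censoring: stop the experiment at the r-th failure.

    failure_times: true failure times of all n items (known only in a simulation).
    Returns a list of (observed_time, event) pairs in the original item order.
    """
    n = len(failure_times)
    if not 0 < r <= n:
        raise ValueError("r must satisfy 0 < r <= n")
    stop_time = sorted(failure_times)[r - 1]  # time of the r-th failure
    observed = []
    for t in failure_times:
        if t <= stop_time:
            observed.append((t, 1))          # failure seen before stopping
        else:
            observed.append((stop_time, 0))  # still running: censored at stop time
    return observed


# Hypothetical latent failure times (weeks) for six items; stop after r = 4 failures.
latent = [10, 15, 40, 25, 35, 20]
sample = type_ii_censor(latent, r=4)
print(sample)
```

Unlike Type I censoring, the stopping time here is random while the number of observed failures is fixed; tied failure times at the stopping time would yield more than r events in this simple sketch.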



Random censoring

Random censoring, also known as Type III censoring, is a generalization of Type I censoring in which the censoring time is itself random. For example, in a medical trial, patients may enter the study in a more or less random fashion, according to their time of diagnosis [Figure 3].[6] If the study is terminated at some prearranged date, then the censoring times, that is, the lengths of time from each individual's entry into the study until its termination, are random.[6]
Figure 3: Random censoring (example)



Left-censoring

Left-censoring occurs when the actual survival time is less than what is observed by the investigator. This can happen when the event has already occurred by the time of the first examination, so that all that is known is that the individual's survival time is less than a certain value. For example, a researcher who wants to study the duration between HIV infection and full-blown manifestation of AIDS faces a practical problem: HIV carriers can enter the sample only after they have tested positive for HIV, but it is difficult to find out when they acquired the infection. The problem of not knowing the exact point in time at which an individual entered the state of interest is referred to as left-censoring. In other words, in left-censoring all that is known is that the individual experienced the event of interest before the start of the study [Figure 4].[16] Note that right-censoring is very common in lifetime data, whereas left-censoring is fairly rare.
Figure 4: Left-censoring (example)



Interval censoring

An observation is said to be interval-censored if we know that the event occurred within a time interval (left, right) but do not know exactly when within this interval [Figure 5].[16] Interval-censored data commonly arise in studies with a nonlethal end-point, such as the recurrence of a disease or condition. A common example of interval-censored survival data occurs in studies that entail periodic follow-up, such as studies of asthma and AIDS.
Figure 5: Interval censoring (example)
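With periodic follow-up, the interval is bracketed by the last visit at which the event had not yet occurred and the first visit at which it had. A minimal sketch, assuming a hypothetical visit log of (visit time, event seen) pairs; the function name `bracket_event` and the visit schedule are illustrative assumptions:

```python
def bracket_event(visits, origin=0.0):
    """Return (left, right) such that the event occurred in (left, right].

    visits: list of (visit_time, event_seen) sorted by visit_time.
    right is None when the event was never seen (right-censored at `left`).
    """
    left = origin
    for visit_time, event_seen in visits:
        if event_seen:
            return left, visit_time
        left = visit_time  # event had not occurred by this visit
    return left, None


# Hypothetical quarterly follow-up (months); recurrence first detected at month 9.
print(bracket_event([(3, False), (6, False), (9, True), (12, True)]))   # (6, 9)

# Event never detected: the subject is right-censored at the last visit.
print(bracket_event([(3, False), (6, False)]))                          # (6, None)

# Event already present at the first visit: only an upper bound is known.
print(bracket_event([(3, True)]))                                       # (0.0, 3)
```

The three calls illustrate how interval, right, and left censoring can all be expressed with the same (left, right) representation.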



Truncation

Truncation is similar to censoring but conceptually different. In truncation, the incompleteness is due to a systematic selection process inherent in the study design; in experimental design, for example, truncation means omitting all data outside a particular boundary. As an illustration, suppose a researcher wants to study the survival pattern of cancer patients aged 60 years and above in a certain area. Only patients aged 60 years or more will be included in the study, and the others are excluded even though they are available for study. Similarly, if the researcher wants to find out the causes of anemia in children below 15 years of age, then only patients below 15 years of age meet the inclusion criteria. Truncation thus describes a sampling constraint in which a failure time variable is observable only if it falls in a certain region, say [YL, YR]; in the above examples, 60 and 15 are the defined boundaries. When the value falls outside the region, the information about the variable is completely lost and the subject is excluded from the data set.[17] The type of truncation depends on which limit of the interval is fixed. If the lower limit of the interval is fixed, the sample is said to be left-truncated; otherwise, it is right-truncated.[18]
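The practical difference from censoring is that a truncated subject never appears in the data set at all, whereas a censored subject stays in it with an event indicator of 0. A minimal sketch, using hypothetical patient records and a hypothetical helper `apply_age_truncation` built around the age-60 boundary from the example:

```python
# Hypothetical records: age at entry, follow-up time in months, event indicator
# (1 = death observed, 0 = right-censored).
patients = [
    {"id": 1, "age": 72, "time_months": 14, "event": 1},
    {"id": 2, "age": 58, "time_months": 30, "event": 1},
    {"id": 3, "age": 65, "time_months": 24, "event": 0},
    {"id": 4, "age": 60, "time_months": 9, "event": 1},
]


def apply_age_truncation(records, min_age):
    """Keep only subjects at or above the lower boundary (left truncation on age)."""
    return [r for r in records if r["age"] >= min_age]


cohort = apply_age_truncation(patients, min_age=60)

# Patient 2 is truncated: no information about this subject enters the analysis.
# Patient 3 is censored: retained, contributing 24 event-free months.
print([r["id"] for r in cohort])
```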


  Functions of Survival Data


The survival data are described or characterized by three functions:[19]

  1. Survival function or survivorship function
  2. Probability density function
  3. Hazard function.


Survival Function

The survival function summarizes information from survival data by giving survival probabilities for different values of time. A survival probability is the probability that a person survives longer than a specified time (say t), that is, the probability that an individual survives from the time origin to some time beyond “t.” With T denoting the survival time, the survival function is denoted by S (t) and is mathematically defined as:[20]

  • S (t) = P (an individual survives longer than t) = P (T > t).


Theoretically, all survival functions have the following characteristics [Figure 6]:
Figure 6: Characteristics of survival function



  • As time t increases, S (t) never increases (it is a nonincreasing function of t)
  • S (0) = 1, since at the beginning of the study no one has experienced the event, and the probability of surviving past time 0 is unity
  • S (∞) = 0, since if the study period were limitless, presumably everyone would eventually experience the event, and the probability of surviving would ultimately fall to zero.
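These properties can be checked on a small data set. The sketch below assumes complete (uncensored) data, in which case S (t) can be estimated by the proportion of subjects whose survival time exceeds t; the function name `empirical_survival` and the sample times are hypothetical.

```python
def empirical_survival(times, t):
    """Proportion of subjects with survival time strictly greater than t.

    Valid only for complete data; censored observations need a method
    such as Kaplan-Meier instead.
    """
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical complete survival times (e.g., in months).
times = [2, 3, 3, 5, 8]

print(empirical_survival(times, 0))   # 1.0: everyone survives past the origin
print(empirical_survival(times, 3))   # 0.4: two of five survive beyond t = 3
print(empirical_survival(times, 8))   # 0.0: no one survives beyond the largest time

# The estimate never increases as t grows.
grid = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
values = [empirical_survival(times, t) for t in grid]
assert all(a >= b for a, b in zip(values, values[1:]))
```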


Probability density function

The probability density function is defined as the limit, as the interval width shrinks to zero, of the probability that an individual fails in a short time interval, expressed per unit of time.

Hazard function

The hazard function is defined as the limit of the probability of failure during a very small time interval, expressed per unit of time, given that the individual has survived to the beginning of the interval. It is a rate rather than a probability and can therefore exceed one. The hazard function is also known as the instantaneous failure rate or age-specific failure rate.
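In symbols, these verbal definitions take the following standard textbook forms. The notation f(t) for the density, h(t) for the hazard, and Δt for the interval width is introduced here as an assumption, with T the survival time and S(t) = P(T > t) as defined earlier:

```latex
\begin{align}
  f(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}, \\
  h(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
        = \frac{f(t)}{S(t)}, \\
  S(t) &= \exp\!\left( -\int_0^t h(u)\, du \right).
\end{align}
```

For a continuous survival time, any one of the three functions therefore determines the other two.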

Life expectancy

The most popular measure of the duration of survival is the expectation of life. The average number of years expected to be lived by individuals in a population is called life expectancy. The ideal method for computing the expectation of life is to observe a large cohort of live births for as long as any individual of the cohort is alive. This may take more than 100 years and is impractical. Hence, as a shortcut, a life table is constructed which assumes that individuals at different ages are exposed to the current risk of mortality. Thus, the current age-specific death rates are applied to a hypothetical cohort. The average length of life so obtained is the number of years a newborn is expected to live at the current level of mortality. This is more useful because it describes the existing situation and can be computed immediately, without waiting 100 years.[15]
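The life-table shortcut can be sketched in a few lines. This is a simplified illustration, not the construction used in any particular reference: it assumes single-year age intervals, that the age-specific death rates have already been converted to probabilities of dying within each year of age, that deaths occur on average halfway through the interval, and that the last age group is closed with a probability of 1. The function name, the radix of 100,000, and the toy schedule are all hypothetical.

```python
def life_expectancy_at_birth(q, radix=100_000):
    """Life expectancy at birth from one-year death probabilities.

    q[x] is the probability of dying between exact ages x and x + 1 for
    someone alive at age x; the final entry should be 1.0 to close the table.
    """
    alive = float(radix)       # survivors at the start of the current age
    person_years = 0.0         # total years lived by the hypothetical cohort
    for qx in q:
        deaths = alive * qx
        # Survivors live the full year; those who die live half a year on average.
        person_years += alive - 0.5 * deaths
        alive -= deaths
    return person_years / radix


# Toy schedule: half the cohort dies in the first year, the rest in the second.
print(life_expectancy_at_birth([0.5, 1.0]))   # 1.0 year
```

With a realistic schedule of roughly 100 age-specific probabilities, the same loop returns the expectation of life at current mortality levels.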


  Estimation of Survival Time


There are three approaches to estimating survival time from observed data: parametric, semi-parametric, and nonparametric.[9],[18],[20] [Table 1] summarizes these approaches.
Table 1: Summary of estimation procedures of survival time



Parametric approach

In parametric survival analysis, all parts of the model are specified, both the hazard function and the effect of any covariates. The strength of this approach is that estimation is easier and the estimated survival curve is smoother, as it draws information from the whole data set. In addition, more sophisticated analyses are possible with parametric models, such as including random effects or using Bayesian methodology to pool sources of information. The main drawback of parametric methods is that they require extra assumptions that may not always be appropriate.[20]

Parametric methods assume that the survival times follow a known probability distribution. The popular choices include the exponential, Weibull, and lognormal distributions. Hence, in the parametric approach the researcher works with an explicit structural model. For example, a parametric approach makes it possible to investigate potential causal pathways for inequalities in cancer survival. The researcher selects the probability distribution according to the nature of the survival data.[20]
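As a minimal illustration of the parametric approach, the sketch below fits the simplest of these distributions, the exponential, to right-censored data. It relies on the standard result that the maximum likelihood estimate of the constant hazard rate is the number of observed events divided by the total observed time. The function names are hypothetical, and the data are the tumor-free times from the rat example (10, 15, 30+, 25, 30+, 19+ weeks), used here purely for illustration.

```python
import math


def exponential_rate_mle(times, events):
    """MLE of the constant hazard under an exponential model with right-censoring:
    (number of observed events) / (total time at risk)."""
    total_time = sum(times)
    if total_time <= 0:
        raise ValueError("total observed time must be positive")
    return sum(events) / total_time


def exponential_survival(t, rate):
    """Fitted survival function S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


times = [10, 15, 30, 25, 30, 19]   # weeks
events = [1, 1, 0, 1, 0, 0]        # 1 = tumor observed, 0 = censored

rate = exponential_rate_mle(times, events)   # 3 events over 129 rat-weeks
print(round(rate, 4))
print(round(exponential_survival(30, rate), 3))
```

The smooth fitted curve comes at the price of assuming a constant hazard, which is exactly the kind of extra assumption the text warns may not always be appropriate.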

Semi-parametric approach

Semi-parametric methods are used for multivariate survival data in which some subjects are related, such as family studies or litter-matched animal studies. They are also used when a subject may experience two or more types of events in sequence, as in infectious diseases such as AIDS, where each subject may first experience HIV infection and later the onset of clinical AIDS.[19] In semi-parametric survival analysis, only some parts of the model for the survival time “T” are specified.[21] One of the most popular semi-parametric approaches to survival analysis is the Cox proportional hazards model.

Cox-proportional hazard model

This model is used to assess the impact of different variables, with a focus on identifying the factors that are crucial in managing a disease. In the medical setting, the thrust is on finding the cause or other characteristics associated with a disease, for example, whether a patient who has suffered a heart attack has high blood pressure or a family history of heart problems. In general, regression analysis is used for this purpose, but ordinary regression techniques cannot be used in the presence of censored data. Cox's proportional hazards model is more appropriate in such situations.

The Cox proportional hazards model is a statistical method that not only determines the cumulative probability of an event but also accounts for the impact of covariates on that probability.[22],[23] If the values of covariates change with time, they are called time-dependent covariates; otherwise, they are time-independent covariates. For example, a patient's performance during the treatment period is time-dependent, whereas sex is a time-independent covariate. Other examples of time-dependent covariates are a patient's cholesterol level that changes during the study and the results of regular examinations of the patient. When time-dependent covariates are involved, the standard Cox proportional hazards model with fixed covariates cannot be used as such; the analysis is instead performed using an extended Cox model (the Cox nonproportional hazards model).
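For reference, the standard textbook form of the model can be written as follows. The notation is an assumption introduced here: h_0(t) is an unspecified baseline hazard, x = (x_1, ..., x_p) is a vector of time-independent covariates, and β the corresponding regression coefficients.

```latex
\begin{align}
  h(t \mid x) &= h_0(t)\, \exp\!\left( \beta_1 x_1 + \dots + \beta_p x_p \right), \\
  \frac{h(t \mid x_a)}{h(t \mid x_b)}
              &= \exp\!\left( \sum_{j=1}^{p} \beta_j \,(x_{aj} - x_{bj}) \right), \\
  h\big(t \mid x(t)\big) &= h_0(t)\, \exp\!\left( \sum_{j=1}^{p} \beta_j \, x_j(t) \right).
\end{align}
```

The second line shows why the model is called proportional: the hazard ratio between two subjects with covariates x_a and x_b does not depend on t. The third line is the extended form with time-dependent covariates x_j(t), for which the ratio can change over time. The baseline hazard is left unspecified in both forms, which is what makes the approach semi-parametric.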

Nonparametric approach

Parametric and semi-parametric survival analyses require some assumptions about the underlying distribution of the observed data, which may not be appropriate in some situations. When there are insufficient grounds for these assumptions, nonparametric models, also known as distribution-free models, can be an appropriate alternative. The most popular nonparametric method is the Kaplan–Meier (KM) method.[24]

Kaplan–Meier analysis

The KM method was derived by Kaplan and Meier in 1958 as a way to analyze censored data by directly generalizing the empirical survival function to censored observations.[25] The KM survival curve is defined as the probability of surviving a given length of time while considering time in many small intervals.

A KM analysis allows estimation of survival over time, even when patients drop out or are studied for different lengths of time. For each interval, the method estimates the probability that those who have survived to the beginning of the interval will also survive to its end, which is a conditional probability; the survival estimate at any time is the product of these conditional probabilities up to that time. It can be used to compare the survival rates of two or more groups of subjects, and the results are expressed as the proportion of patients still alive at specified times after entry or enrollment in the study.[26]

[Figure 7] is an example of a KM curve showing survival for two different groups.[20] The graph plots estimated survival probabilities (or estimated survival percentages) on the Y-axis against time since entry into the study on the X-axis, and it consists of horizontal and vertical line segments. The length of each horizontal segment along the X-axis represents the survival duration for that interval. Each interval is terminated by the occurrence of the event of interest, shown as a vertical drop, and the times of censored observations are indicated by short vertical marks.
Figure 7: Kaplan–Meier curve



The KM method is a statistical treatment of survival times which not only makes proper allowance for censored observations but also uses the information from these subjects up to the time at which they are censored.[27] The time intervals are defined by the observed event times, with each event taken to occur at the beginning of its interval. Hence, in a way, it is a modified form of the “life table” technique.[28],[29]
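The product of conditional survival probabilities can be computed directly. The following is a minimal sketch of the product-limit calculation, written from the standard definition rather than taken from the cited sources; the function name `kaplan_meier` is hypothetical, and the input is the tumor-free times from the rat example (10, 15, 30+, 25, 30+, 19+ weeks).

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observed times (event or censoring time)
    events: 1 if the event was observed, 0 if the observation is censored
    Returns a list of (event_time, estimated_survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # subjects still under observation just before time t
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather every subject whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        at_risk -= n_removed
    return curve


times = [10, 15, 30, 25, 30, 19]   # weeks
events = [1, 1, 0, 1, 0, 0]        # 1 = tumor observed, 0 = censored

for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
# 10 0.8333
# 15 0.6667
# 25 0.4444
```

The rat censored at 19 weeks lowers the number at risk from 4 to 3 before the event at 25 weeks, so it contributes information up to the time of censoring without being counted as a failure.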


  Conclusion


Survival analysis is a tool for analyzing time-to-event data, especially in clinical trials. The primary goals are to estimate, interpret, and compare survival functions and to assess the relationship of explanatory variables with survival time.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

 
  References

1. Klein JP, Moeschberger ML, Gail M, Samet JM, Tsiatis A. Statistics for Biology and Health. New York: Springer; 2003.
2. Balakrishnan N, Rao CR. Advance in survival analysis. New York: Elsevier; 2004.
3. Matter U. A Short Introduction to Survival Analysis. Available from: https://wwz.unibas.ch/fileadmin/wwz/redaktion/wipo/Ulrich_Matter/Matter_Intro_SurvivalAnalysis_20062012.pdf. [Last accessed on 2016 Apr 15].
4. Introduction to Event History Analysis. Available from: https://www.utexas.edu/cola/prc/_files/cs/Fall2012_Brown_Introduction%20to%20Survival%20Analysis%20v3.pdf. [Last accessed on 2016 Mar 10].
5. Lawless JF. Statistical Models and Methods for Lifetime Data. 2nd ed. New York: John Wiley and Sons; 1982. Available from: http://www.samples.sainsburysebooks.co.uk/9781118031254_sample_379319.pdf. [Last accessed on 2016 Mar 25].
6. Collett D. Modelling Survival Data in Medical Research. 2nd ed. London: Chapman and Hall/CRC; 2003. Available from: http://www.evunix.uevora.pt/~pinfante/eb1011/Collett%20David%20-%20Modelling%20survival%20data%20in%20medical%20research%20(Chapman and Hall-CRC,%202004)(ISBN%201584883251).pdf. [Last accessed on 2015 Dec 17].
7. Introduction to Survival Analysis Procedure. Available from: http://www.okstate.edu/sas/v8/saspdf/stat/chap10.pdf. [Last accessed on 2015 Nov 11].
8. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer; 2003.
9. Nelson W. Applied Life Data Analysis. General Electric Company Corporate Research and Development. New York: John Wiley and Sons; 1982.
10. Smith T, Smith B. Survival Analysis and Application of Cox's Proportional Hazards Modelling Using SAS, Statistics, Data Analysis, and Data Mining. Available from: http://www2.sas.com/proceedings/sugi26/p244-26.pdf. [Last accessed on 2015 Dec 05].
11. Introduction to Survival Analysis. Lecture 15: BIOST 515. Available from: http://www.stat.columbia.edu/~madigan/W2025/notes/survival.pdf. [Last accessed on 2015 Nov 21].
12. Shih W. Problems in dealing with missing data and informative censoring in clinical trials. Curr Control Trials Cardiovasc Med 2002;3:4.
13. Zaman Q, Pfeiffer KP. Survival Analysis in Medical Research. Available from: http://www.interstat.statjournals.net/YEAR/2011/articles/1105005.pdf. [Last accessed on 2015 Nov 21].
14. Machin D, Campbell MJ, Walters SJ. Medical Statistics. 4th ed. New York: John Wiley and Sons; 2007.
15. Indrayan A. Medical Biostatistics. 3rd ed. New York: Chapman and Hall/CRC Press; 2013.
16. Lee ET. Statistical Methods for Survival Data Analysis. 2nd ed. New York: Wiley-Inter-Science; 2008. Available from: http://www.mortality.org/INdb/2008/02/12/7/document.pdf. [Last accessed on 2016 Jan 12].
17. Balakrishnan N, Basu AP. The Exponential Distribution Theory, Methods and Applications. Netherland: Gordon and Breach Publishers; 1995.
18. Truncation and Censoring. Available from: http://www.support.sas.com/documentation/cdl/en/etsug/65545/HTML/default/viewer.htm#etsug_severity_gettingstarted02.htm. [Last accessed on 2015 Dec 16].
19. Watanabe H. Applications of statistics to medical science, IV survival analysis. J Nippon Med Sch 2012;79:176-81.
20. Sinha D, Dey DK. Semi parametric Bayesian analysis of survival data. J Am Stat Assoc 1997;92:1195-212.
21. Teele DL. Introduction to Event History Analysis. Dustin Brown, Population Research Center. Available from: http://www.liberalarts.utexas.edu/prc/_files/cs/Fall2012_Brown_Introduction%20to%20Survival%20Analysis%20v3.pdf. [Last accessed on 2016 Apr 16].
22. Huang J, Wellner JA. Interval Censored Survival Data: A Review of Recent Progress. Proceedings of First Seattle Symposium in Biostatistics. Lecture Notes in Statistics: 123. New York: Springer; 1997.
23. Singh R, Mukhopadhyay K. Survival analysis in clinical trials: Basics and must know areas. Perspect Clin Res 2011;2:145-8.
24. Ventre J, Fine L. A Programmer's Introduction to Survival Analysis Using Kaplan Meier Methods. PharmaSUG2011-PaperCC16. Available from: http://www.lexjansen.com/pharmasug/2011/cc/pharmasug-2011-cc16.pdf. [Last accessed on 2016 May 02].
25. Satagopan JM, Ben-Porat L, Berwick M, Robson M, Kutler D, Auerbach AD. A note on competing risks in survival data analysis. Br J Cancer 2004;91:1229-35.
26. Rich JT, Neely JG, Paniello RC, Voelker CC, Nussenbaum B, Wang EW. A practical guide to understanding Kaplan-Meier curves. Otolaryngol Head Neck Surg 2010;143:331-6.
27. Altman DG. Analysis of Survival Times in Practical Statistics for Medical Research. London: Chapman and Hall; 1992.
28. Why Use a Kaplan-Meier Analysis? Available from: http://www.biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/km.lam.pdf. [Last accessed on 2015 Oct 03].
29. Goel MK, Khanna P, Kishore J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res 2010;1:274-8.







 
