Details about the Population Stability Index (PSI)

The Population Stability Index (PSI) measures the difference in distributions between two samples. A larger PSI value indicates a greater change, or more serious drift, in the distribution of a variable between the two samples. Usually, a PSI value of 0.1 or more shows that the two distributions are moderately different, and a PSI value of 0.2 or more shows that they are severely different.

For moderate or severe drift, investigate the reasons for these differences and their effect on the model's performance. Also, consider whether to change the thresholds based on how well they correspond to model performance. For example, PSI tends to increase as sample sizes increase, so the possibility exists that larger thresholds are appropriate for a process with hundreds of observations per production period.
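The thresholds can be kept adjustable rather than fixed. The following is a minimal Python sketch, not part of any product API: the function name `drift_level`, its parameter names, and the label strings are hypothetical, and treating a value exactly at a threshold as reaching that level is an assumption.

```python
def drift_level(psi_value, moderate=0.1, severe=0.2):
    """Label a PSI value against adjustable thresholds (hypothetical helper)."""
    if moderate > severe:
        raise ValueError("moderate threshold must not exceed severe threshold")
    if psi_value >= severe:        # assumption: a value at the threshold counts
        return "severe"
    if psi_value >= moderate:
        return "moderate"
    return "none"


print(drift_level(0.15))                             # moderate
print(drift_level(0.15, moderate=0.2, severe=0.3))   # none
```

Passing the thresholds as arguments makes it easy to raise or lower them after comparing PSI values with observed model performance.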

The calculation of PSI compares the percentages of data in the same categories for two samples of data. Use the following steps to calculate the PSI for a continuous variable.

  1. Sort the baseline data in descending order.
  2. Find the ideal bin size so that each bin contains approximately 10% of the data.

    $n = \lfloor N / 10 \rfloor$

    where

    Term                   Description
    $n$                    the number of observations in the bin
    $N$                    the number of observations in the data set
    $\lfloor x \rfloor$    the floor of $x$
  3. Add observations to the first bin. Tied values are added to the same bin. Thus, bin sizes may not be equal.
  4. Continue for each bin. If the last bin contains fewer than 50% of the ideal bin size, then move those observations to the previous bin.
  5. Calculate the percentage of the production data in the same bins as the baseline data.
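  6. Calculate the PSI statistic with the following equation.

$S = \sum_{i=1}^{j} (p_i - t_i) \ln\left(\frac{p_i}{t_i}\right)$

where

Term     Description
$S$      the Population Stability Index (PSI)
$j$      the number of bins, usually 10
$t_i$    the percent of the baseline data in bin $i$, expressed as a proportion
$p_i$    the percent of the production data in bin $i$, expressed as a proportion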
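
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the ideal bin size is taken as the floor of one tenth of the baseline count, production values are assigned to bins by the lower bound of each baseline bin, the statistic is taken as the sum over bins of (p - t) ln(p / t) with p and t as proportions, and a zero proportion is replaced by a small `eps` to avoid a logarithm of zero. The names `baseline_bins`, `psi`, and `eps` are hypothetical.

```python
import math


def baseline_bins(baseline, n_bins=10):
    """Group baseline values into bins of roughly equal size (steps 1-4)."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    values = sorted(baseline, reverse=True)      # step 1: descending order
    ideal = max(1, len(values) // n_bins)        # step 2: assumed floor(N / 10)
    bins, current = [], []
    for v in values:
        # Step 3: close a full bin, but never split tied values across bins.
        if len(current) >= ideal and v != current[-1]:
            bins.append(current)
            current = []
        current.append(v)
    bins.append(current)
    # Step 4: fold an undersized last bin into the previous bin.
    if len(bins) > 1 and len(bins[-1]) < 0.5 * ideal:
        last = bins.pop()
        bins[-1].extend(last)
    return bins


def psi(baseline, production, n_bins=10, eps=1e-6):
    """PSI of production data against baseline bins (steps 5-6)."""
    if not production:
        raise ValueError("production must not be empty")
    bins = baseline_bins(baseline, n_bins)
    # Assumption: a bin is bounded below by its smallest baseline value;
    # the last bin is open-ended below and the first is open-ended above.
    lower_bounds = [b[-1] for b in bins[:-1]]
    counts = [0] * len(bins)
    for x in production:
        i = next((k for k, lo in enumerate(lower_bounds) if x >= lo),
                 len(bins) - 1)
        counts[i] += 1
    total = 0.0
    for b, c in zip(bins, counts):
        t = max(len(b) / len(baseline), eps)     # baseline proportion
        p = max(c / len(production), eps)        # production proportion
        total += (p - t) * math.log(p / t)
    return total
```

For example, `psi(list(range(100)), list(range(100)))` returns 0 because every bin holds the same proportion in both samples, while shifting the production sample to `list(range(50, 150))` concentrates 60% of it in the top bin and yields a value well above 0.2.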

For a categorical variable, each level is a bin. Otherwise, the calculation is the same.
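
A minimal Python sketch of the categorical case, in which each level is its own bin. It assumes the statistic is the sum over levels of (p - t) ln(p / t) with p and t as proportions. The function name `categorical_psi` and the `eps` substitute for zero proportions are hypothetical, and counting levels that appear only in the production data as extra bins is an assumption.

```python
import math
from collections import Counter


def categorical_psi(baseline, production, eps=1e-6):
    """PSI for a categorical variable: one bin per level."""
    if not baseline or not production:
        raise ValueError("both samples must be non-empty")
    base_counts = Counter(baseline)
    prod_counts = Counter(production)
    # Assumption: levels seen in either sample are all treated as bins.
    levels = set(base_counts) | set(prod_counts)
    total = 0.0
    for level in levels:
        t = max(base_counts[level] / len(baseline), eps)    # baseline proportion
        p = max(prod_counts[level] / len(production), eps)  # production proportion
        total += (p - t) * math.log(p / t)
    return total
```

With a baseline split evenly between two levels and a production sample split 80% to 20%, the result is about 0.416, which exceeds the 0.2 threshold.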