Baseline, production, and stability data requirements

Learn about the data requirements for baseline, production, and stability data in Minitab Model Ops.

Baseline data

Drift reports require the baseline data for your model. The baseline data set should include predictor columns that contain the data used for training the model, and may also include an optional column that contains the actual response.

Training data are the data that were used to fit the model. After you import the model, you typically upload the training data to serve as the baseline data.

For more information on importing baseline data, go to Upload baseline and production data.

Requirements for baseline data

  • Use the same variable names and same variable types that the model uses. The names are case-insensitive.
  • Use the same number of rows of data in all columns.
  • Save baseline data in CSV format. Minitab Model Ops also accepts Minitab worksheet files (MWX), but converts them to CSV format upon upload.
    Note

    Use a comma or a semicolon as the delimiter, depending on your regional settings.

  • The file size of the CSV must be 10 megabytes or less. A large MWX file may fail to upload if it exceeds this limit after conversion to CSV.
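
The requirements above can be checked locally before an upload. The following is a minimal sketch and is not part of Minitab Model Ops: the function name check_baseline_csv is hypothetical, "10 megabytes" is read as 10 × 1024 × 1024 bytes, and "the same number of rows in all columns" is interpreted as every row having one field per header column.

```python
import csv
import io

# Assumption: "10 megabytes" is read as 10 * 1024 * 1024 bytes.
MAX_CSV_BYTES = 10 * 1024 * 1024


def check_baseline_csv(raw: bytes, model_columns: list[str], delimiter: str = ",") -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if len(raw) > MAX_CSV_BYTES:
        problems.append(f"file is {len(raw)} bytes; limit is {MAX_CSV_BYTES}")

    # "utf-8-sig" decodes UTF-8 with or without a byte order mark.
    text = raw.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return problems + ["file is empty"]

    # Variable names are compared case-insensitively.
    header = [name.strip().lower() for name in rows[0]]
    missing = [name for name in model_columns if name.lower() not in header]
    if missing:
        problems.append(f"missing columns: {missing}")

    # Every data row should supply one value per header column.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no} has {len(row)} values, expected {len(header)}")
    return problems
```

Pass delimiter=";" for files saved with a semicolon delimiter. The sketch does not check variable types, which depend on the model.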

Example of baseline data

Table 1. Baseline Data
Age  Sex     Cholesterol  Slope
55   male    233          1
63   male    187          1
27   female  145          3

Table 1 shows sample baseline data. The columns of predictors contain the data for training the model.

Production data

The production data are also used in the drift report. The production data set contains the predictor variables used to make predictions based on the model. The data set is formed by accumulating the predictor variable values from calls to the endpoint. The data set can also include an optional observation ID and an optional date/time. If you do not specify date/time information, Minitab Model Ops uses the current server time.

For more information on importing production data, go to Upload baseline and production data.

Requirements for production data

  • Use the same variable names and same variable types that the model uses. The names are case-insensitive.
  • Use an optional observation ID column, as specified in Settings. If you do not include observation IDs, Minitab Model Ops cannot generate stability reports with these data. For more information, go to Deployment settings.
  • Use an optional date/time column, as specified in Settings. If you do not specify date/time data, Minitab Model Ops uses the date and time of the prediction request in UTC.
  • Use the same number of rows of data in all columns.
Note

When using a file, save the data in CSV format. Use a comma or a semicolon as the delimiter, depending on your regional settings, and use UTF-8, UTF-8-BOM, or UTF-16-LE as the encoding.
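
One way to produce a file that satisfies the delimiter and encoding guidance in this note is sketched below. The function name production_csv_bytes is hypothetical and the code is not part of Minitab Model Ops. It relies on Python's codec names, where "utf-8-sig" corresponds to UTF-8-BOM and "utf-16-le" writes no byte order mark.

```python
import csv
import io

ALLOWED_DELIMITERS = {",", ";"}
# Python codec names; "utf-8-sig" is UTF-8 with a byte order mark (UTF-8-BOM).
ALLOWED_ENCODINGS = {"utf-8", "utf-8-sig", "utf-16-le"}


def production_csv_bytes(rows, fieldnames, delimiter=",", encoding="utf-8"):
    """Serialize a list of dicts to CSV bytes using an accepted delimiter and encoding."""
    if delimiter not in ALLOWED_DELIMITERS:
        raise ValueError(f"delimiter must be a comma or a semicolon, got {delimiter!r}")
    if encoding.lower() not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")

    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # each dict must have exactly the keys in fieldnames
    return buffer.getvalue().encode(encoding)
```

For example, calling the function with delimiter=";" yields a semicolon-delimited file for regional settings that use a comma as the decimal separator.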

Example of production data

Table 2. Production Data
Age  Sex     Cholesterol  Slope  ObservationID
55   male    233          1      100987
63   male    187          1      100988
27   female  145          3      100989

Table 2 shows sample production data. The columns of predictors contain the data for the calculation of the predictions. If you use the production data to monitor drift, the drift report uses the date and time of the prediction request in the results, unless you add an optional date/time column. You can also use an optional observation ID column that you name on the Settings tab. For this example, the observation ID column is ObservationID.

Stability data

The stability data are used to create a stability report. The stability data set must contain the response values and the observation IDs.

For more information on importing stability data, go to Send stability data.

Requirements for stability data

  • Include the name of the response variable. The name is case-insensitive.
  • Include the observation ID that corresponds to the prediction for the actual value. Use the same name as specified in Settings.
  • Use the same number of rows of data in all columns.
Note

When using a file, save the data in CSV format. Use a comma or a semicolon as the delimiter, depending on your regional settings, and use UTF-8, UTF-8-BOM, or UTF-16-LE as the encoding.

Example of stability data

Table 3. Stability Data
Heart Disease  ObservationID
0              100987
1              100988
1              100989

Table 3 shows the stability data. The stability data set has a column for the response variable with the name Heart Disease and a column with the name ObservationID for the unique identifier that links the prediction and stability information into a complete record.
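
To illustrate how the observation ID links a prediction to its actual response, the sketch below joins production rows and stability rows on a shared ID column. It only illustrates the idea and does not describe how Minitab Model Ops performs the match internally. The function name link_records is hypothetical, and the default column name ObservationID comes from the examples on this page.

```python
def link_records(production_rows, stability_rows, id_column="ObservationID"):
    """Join production and stability rows (lists of dicts) on the observation ID."""
    # Index the actual responses by observation ID for direct lookup.
    actuals = {row[id_column]: row for row in stability_rows}

    linked = []
    for row in production_rows:
        match = actuals.get(row[id_column])
        if match is not None:
            # Merge the predictors and the actual response into one record.
            linked.append({**row, **match})
    return linked
```

Production rows without a matching stability row are left out of the result, which mirrors the requirement that each actual value must correspond to a prediction with the same observation ID.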

Reserved variable names

Commas and semicolons cannot be used in file paths. Also, the following names are reserved and cannot be used as variable names.
  • "prediction-score", "prediction_score"
  • "mtb-correlation-id", "mtb_correlation_id"
  • "mtb-timestamp", "mtb_timestamp"
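
A data set can be screened for these names before upload. The following is a minimal sketch with a hypothetical function name, find_reserved. It assumes that reserved names are matched case-insensitively, in line with the case-insensitive variable names described above; the source does not state this explicitly.

```python
RESERVED_NAMES = {
    "prediction-score", "prediction_score",
    "mtb-correlation-id", "mtb_correlation_id",
    "mtb-timestamp", "mtb_timestamp",
}


def find_reserved(column_names):
    """Return the column names that collide with a reserved name."""
    # Assumption: the comparison ignores case and surrounding whitespace.
    return [name for name in column_names if name.strip().lower() in RESERVED_NAMES]
```

An empty result means that none of the column names collides with a reserved name.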