Before analyzing fraud detection trends, the dataset must be cleaned and standardized. All data preparation is completed in the Minitab Data Center.

For more information, go to Manage the dataset schema or Set data source options.

Change claim_number data type from numeric to text.
Prepend the # symbol to all claim numbers.
This prevents numeric interpretation and preserves formatting.
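Remove drivers older than 100 and drivers without a valid income. This removes unrealistic ages and invalid income entries.
Change m to male and f to female. Standardized categories improve readability and reporting.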
This preserves leading zeros and prevents unintended numeric operations.
Sort by fraud, injury claim, and zip code. Sorting helps organize fraud-related records for review.
The Minitab Data Center provides a conversational interface that guides your data preparation while you are in the Cleanup view. For the example above, you can enter the following text into the Minitab AI prompt to get the same results as the individual steps.
Make claim numbers to text. Add the number symbol to claim numbers. Remove drivers that are older than one hundred. Change m to male and f to female. Remove drivers that don’t have a valid income. Change address_change to text. Make 1 to yes and 0 to no for address changes. Sort by fraud, injury claim, and zip code.
For more information on using Minitab AI in the Data Center, go to Using Minitab AI to clean your data.
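To make the logic of these cleanup steps concrete, here is a minimal Python sketch of the same transformations. It is not how the Data Center works internally. The sample rows, the column names other than claim_number and address_change, and the rule that a missing or negative income counts as invalid are all assumptions made for illustration.

```python
# Hypothetical lookup tables for standardizing category codes.
GENDER_LABELS = {"m": "male", "f": "female"}
ADDRESS_CHANGE_LABELS = {1: "yes", 0: "no"}


def clean_claims(rows):
    """Return cleaned copies of the claim rows, sorted for review."""
    cleaned = []
    for row in rows:
        # Remove drivers older than 100.
        if row["age"] > 100:
            continue
        # Assumption: a missing or negative income is treated as invalid.
        if row["income"] is None or row["income"] < 0:
            continue
        new_row = dict(row)
        # Store the claim number as text and prepend the # symbol.
        new_row["claim_number"] = "#" + str(row["claim_number"])
        # Standardize categories; unknown codes pass through unchanged.
        new_row["gender"] = GENDER_LABELS.get(row["gender"], row["gender"])
        new_row["address_change"] = ADDRESS_CHANGE_LABELS.get(
            row["address_change"], str(row["address_change"])
        )
        cleaned.append(new_row)
    # Assumption: ascending sort by fraud, then injury claim, then zip code.
    cleaned.sort(key=lambda r: (r["fraud"], r["injury_claim"], r["zip_code"]))
    return cleaned


# Hypothetical sample data.
sample = [
    {"claim_number": 1003, "age": 34, "gender": "f", "income": 61000,
     "address_change": 0, "fraud": 1, "injury_claim": 500, "zip_code": "43215"},
    {"claim_number": 1001, "age": 120, "gender": "m", "income": 30000,
     "address_change": 0, "fraud": 0, "injury_claim": 700, "zip_code": "60601"},
    {"claim_number": 1002, "age": 51, "gender": "m", "income": None,
     "address_change": 1, "fraud": 0, "injury_claim": 300, "zip_code": "94105"},
    {"claim_number": 1004, "age": 45, "gender": "m", "income": 48000,
     "address_change": 1, "fraud": 0, "injury_claim": 900, "zip_code": "10001"},
]

result = clean_claims(sample)
for row in result:
    print(row)
```

The function copies each row instead of editing it in place, so the original data remains available for comparison.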
In addition to cleaning and standardizing data, you may need to combine or reorganize datasets before analysis.
For more information, go to Join datasets.
For more information, go to Union datasets.
For more information, go to Transpose datasets.
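As a rough illustration of what joining, unioning, and transposing mean in general, the following Python sketch applies each operation to small lists of rows. The function names, the driver_id key, and the sample data are hypothetical, and the Data Center's own options (for example, which join types it offers) may differ from this simplified version.

```python
def join(left, right, key):
    """Inner join: pair rows from both datasets that share a key value."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def union(first, second):
    """Union: stack the rows of two datasets that have the same columns."""
    return list(first) + list(second)


def transpose(rows, columns):
    """Transpose: turn each listed column into a row."""
    return [
        {"column": c, **{f"row_{i}": r[c] for i, r in enumerate(rows)}}
        for c in columns
    ]


# Hypothetical sample data.
claims = [
    {"claim_number": "#1003", "driver_id": 1, "fraud": 1},
    {"claim_number": "#1004", "driver_id": 2, "fraud": 0},
]
drivers = [
    {"driver_id": 1, "gender": "female"},
    {"driver_id": 2, "gender": "male"},
]
more_claims = [{"claim_number": "#1005", "driver_id": 1, "fraud": 0}]

joined = join(claims, drivers, "driver_id")
stacked = union(claims, more_claims)
flipped = transpose(claims, ["claim_number", "fraud"])

print(joined)
print(stacked)
print(flipped)
```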



For example, channel has 3 levels and days open shows a bimodal distribution.


The data summary for channel shows the frequency for each of the 3 levels.

Use the right-click menu to edit the grouping label, exclude the group from the dataset, or show only the rows that contain this value.
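The frequency summary for channel amounts to counting how many rows fall into each level. A minimal Python sketch of that count follows. The level names and values are hypothetical, because this example does not list the actual 3 levels.

```python
from collections import Counter

# Hypothetical channel values with 3 distinct levels.
channel = ["phone", "web", "phone", "agent", "web", "phone"]

# Count the number of rows for each level.
frequency = Counter(channel)

# Print the levels from most to least frequent.
for level, count in frequency.most_common():
    print(f"{level}: {count}")
```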
Because the data for days open indicate two distributions, the insurance company wants to look at this further. Go to Analyze your data.