Before analyzing fraud detection trends, the dataset must be cleaned and standardized. In this section, you will:
The Minitab Data Center uses a data pipeline to prepare your data. A pipeline is a sequence of connected steps that transform raw data into a clean, analysis-ready dataset.
Every Data Center project contains an interactive pipeline diagram that represents the data processing steps. A typical pipeline flow contains the following nodes.
Data Source → Cleanup → Merge/Reshape → Output
Each step appears as a visual node in the pipeline, making it easy to understand and reuse your data preparation process.
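The idea of a pipeline as a sequence of connected steps can be sketched in plain Python. This is only an illustration of the concept, not how the Data Center is implemented. The functions cleanup, output, and run_pipeline, and the sample rows, are hypothetical.

```python
from functools import reduce


def cleanup(rows):
    """Hypothetical cleanup step: drop rows with a missing claim number."""
    return [r for r in rows if r.get("claim_number") is not None]


def output(rows):
    """Hypothetical output step: return the rows ordered by claim number."""
    return sorted(rows, key=lambda r: r["claim_number"])


def run_pipeline(source, steps):
    """Pass the source rows through each step, in order."""
    return reduce(lambda rows, step: step(rows), steps, source)


# Made-up source data standing in for the Data Source node.
data_source = [
    {"claim_number": 1002},
    {"claim_number": None},
    {"claim_number": 1001},
]

result = run_pipeline(data_source, [cleanup, output])
print(result)
```

Because each step takes rows in and returns rows out, steps can be reordered or reused without changing the others, which mirrors the reuse that the pipeline diagram is meant to support.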
Data Source → Cleanup → Output


For more information, go to Manage the dataset schema or Set data source options.
When to use each view:

Change the claim_number data type from numeric to text.
Prepend the # symbol to all claim numbers.
Why this matters: Prevents numeric interpretation and preserves formatting consistency.
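Outside the Data Center, the same two steps could look like the following minimal Python sketch. The helper format_claim_number and the sample values are hypothetical; only the claim_number field name comes from this example.

```python
def format_claim_number(value):
    """Convert a numeric claim number to text and prepend '#'."""
    text = str(value)
    # Skip the prefix if it is already present, so re-running is harmless.
    return text if text.startswith("#") else "#" + text


claim_numbers = [10045, 10046, 20001]  # made-up sample values
formatted = [format_claim_number(n) for n in claim_numbers]
print(formatted)
```

Once the values are text, operations such as summing or averaging no longer apply to them, which is the point of the conversion.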
Why this matters: Removes unrealistic ages and invalid income entries that could skew results.
Why this matters: Standardized categories improve readability, grouping, and reporting.
Why this matters: Preserves leading zeros and prevents unintended numeric operations.
Why this matters: Sorting helps prioritize and review fraud-related records efficiently.
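The remaining cleanup steps (filtering, recoding, and sorting) can be sketched in plain Python as follows. This is a rough equivalent under stated assumptions, not the Data Center's own logic: the column names other than address_change are hypothetical, "valid income" is assumed to mean a non-missing positive number, the age limit of 100 is taken from the example prompt in this section, and the sort direction (fraud and injury claim descending, zip code ascending) is an assumption.

```python
# Made-up sample rows; every column name except address_change is hypothetical.
rows = [
    {"age": 34, "gender": "m", "income": 52000, "address_change": 1,
     "fraud": 0, "injury_claim": 1500, "zip_code": "00612"},
    {"age": 120, "gender": "f", "income": 48000, "address_change": 0,
     "fraud": 1, "injury_claim": 0, "zip_code": "30301"},
    {"age": 45, "gender": "f", "income": None, "address_change": 0,
     "fraud": 1, "injury_claim": 700, "zip_code": "10001"},
    {"age": 29, "gender": "f", "income": 61000, "address_change": 0,
     "fraud": 1, "injury_claim": 2300, "zip_code": "60614"},
    {"age": 51, "gender": "m", "income": 39000, "address_change": 1,
     "fraud": 1, "injury_claim": 2300, "zip_code": "02139"},
]

GENDER_LABELS = {"m": "male", "f": "female"}
ADDRESS_LABELS = {1: "yes", 0: "no"}


def is_valid(row):
    """Keep drivers aged 100 or under with a non-missing, positive income (assumed rule)."""
    income = row["income"]
    return row["age"] <= 100 and income is not None and income > 0


def recode(row):
    """Replace coded values with readable text labels; unknown codes pass through."""
    return {
        **row,
        "gender": GENDER_LABELS.get(row["gender"], row["gender"]),
        "address_change": ADDRESS_LABELS.get(row["address_change"], row["address_change"]),
    }


cleaned = [recode(r) for r in rows if is_valid(r)]

# Assumed order: fraud first, then larger injury claims, then zip code ascending.
cleaned.sort(key=lambda r: (-r["fraud"], -r["injury_claim"], r["zip_code"]))

for r in cleaned:
    print(r["zip_code"], r["gender"], r["address_change"], r["fraud"])
```

Keeping zip_code as text in the sample data preserves leading zeros such as "00612", which a numeric column would drop.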
In addition to cleaning and standardizing data, you may need to combine or reorganize datasets before analysis.
For more information, go to Join datasets.
For more information, go to Union datasets.
For more information, go to Transpose datasets.
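As a general illustration of what these three operations mean, the following Python sketch joins, unions, and transposes small made-up tables. It shows the concepts only and makes no claim about the options or behavior of the Data Center. All data and variable names are hypothetical, and the join shown is an inner join on a shared key.

```python
claims = [
    {"claim_number": "#1001", "fraud": 1},
    {"claim_number": "#1002", "fraud": 0},
]
drivers = [
    {"claim_number": "#1001", "age": 29},
    {"claim_number": "#1002", "age": 51},
]

# Join: match rows from two datasets on a shared key (inner join assumed).
drivers_by_key = {d["claim_number"]: d for d in drivers}
joined = [
    {**c, **drivers_by_key[c["claim_number"]]}
    for c in claims
    if c["claim_number"] in drivers_by_key
]

# Union: stack the rows of two datasets that share the same columns.
more_claims = [{"claim_number": "#1003", "fraud": 0}]
unioned = claims + more_claims

# Transpose: swap rows and columns.
table = [
    ["claim_number", "fraud"],
    ["#1001", 1],
    ["#1002", 0],
]
transposed = [list(column) for column in zip(*table)]

print(joined)
print(unioned)
print(transposed)
```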
The Minitab Data Center provides a conversational interface that guides your data preparation in the Cleanup view.
For the example above, you can enter the following text into the Minitab AI prompt to get the same results as the individual steps.
Make claim numbers to text. Add the number symbol to claim numbers. Remove drivers that are older than one hundred. Change m to male and f to female. Remove drivers that don’t have a valid income. Change address_change to text. Make 1 to yes and 0 to no for address changes. Sort by fraud, injury claim, and zip code.
For more information on using Minitab AI in the Data Center, go to Using Minitab AI to clean your data.



For example, channel has 3 levels and days open shows a bimodal distribution.


The data summary for channel shows the frequency for each of the 3 levels.
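A frequency summary of this kind can be reproduced with a short Python sketch. The three level names and the counts below are made up for illustration; the source states only that channel has 3 levels.

```python
from collections import Counter

# Hypothetical channel values; the real level names are not given here.
channel = ["Online", "Phone", "Broker", "Online", "Phone", "Online"]

counts = Counter(channel)

# List each level with its frequency, most common first.
for level, n in counts.most_common():
    print(f"{level}: {n}")
```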

Use the right-click menu to edit the grouping label, exclude the group from the dataset, or show only the rows that contain this value.
Because the data for days open suggest two distinct distributions, the insurance company wants to investigate further. To continue, go to Analyze your data.