Prep your data

Use data prep tools to organize and prepare your data for more robust analyses.

Open your data source

From the Minitab Solution Center home page, you can access the Minitab Data Center.
  1. From the Solution Center home page, select Data Prep.
  2. Select Add Data. Sign in to an online repository, or add a local data file.
  3. Browse to the file location, then select Open.
A schematic diagram represents the data processing steps.
Cleanup view
You can begin cleaning your data when you are in the Cleanup view.
Data Source view
If you need to change the data set schema or any settings that affect the entire data set, select the data source file icon to open the Options panel.

For more information, go to Edit the data set schema or Set data source options.

Data prep steps

In this example, a compliance team is concerned about fraud detection accuracy in the automotive industry; however, the data need prep before analysis can begin. Follow these steps to prepare insurance_fraud_data.csv for further analysis.
  1. Open Insurance Fraud Data in the Minitab Data Center.
  2. Make sure you are in the Cleanup view.
  3. Select the column and open the Data Prep Options dropdown menu to access the column cleanup options.
  4. For claim_number, change the data type from numeric to text.
  5. For claim_number, prepend # to the column values.
  6. For age_of_driver, filter to only include drivers that are less than or equal to 100 years old.
  7. In gender, change M to male and F to female.
  8. For annual_income, filter to only include drivers with an annual income greater than 1.
  9. For address_change, change the data type from numeric to text.
  10. In address_change, change 1 to yes and 0 to no.
  11. For ZIP code, change the data type from numeric to text.
  12. Use Advanced Sort to sort by fraud, injury claim, and ZIP code.
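For readers who want to see the logic behind these steps outside the Data Center, the following is a minimal sketch in plain Python, not a description of how Minitab implements them. It assumes the data are held as a list of row dictionaries, that the ZIP code and injury claim columns are named zip_code and injury_claim, and that the sort is ascending; none of those details are confirmed by this page. The function name prep_claims and the sample rows are hypothetical.

```python
def prep_claims(rows):
    """Apply the example cleanup steps to a list of row dictionaries.

    Column names zip_code and injury_claim are assumptions for illustration.
    """
    gender_labels = {"M": "male", "F": "female"}
    address_labels = {1: "yes", 0: "no"}
    cleaned = []
    for row in rows:
        # Step 6: keep only drivers who are 100 years old or younger.
        if row["age_of_driver"] > 100:
            continue
        # Step 8: keep only drivers whose annual income is greater than 1.
        if row["annual_income"] <= 1:
            continue
        new_row = dict(row)
        # Steps 4-5: convert claim_number to text and prepend "#".
        new_row["claim_number"] = "#" + str(row["claim_number"])
        # Step 7: recode M and F; leave any other value unchanged.
        new_row["gender"] = gender_labels.get(row["gender"], row["gender"])
        # Steps 9-10: convert address_change to text, recoding 1 and 0.
        new_row["address_change"] = address_labels.get(
            row["address_change"], str(row["address_change"])
        )
        # Step 11: convert the ZIP code to text.
        new_row["zip_code"] = str(row["zip_code"])
        cleaned.append(new_row)
    # Step 12: sort by fraud, then injury claim, then ZIP code (ascending assumed).
    cleaned.sort(key=lambda r: (r["fraud"], r["injury_claim"], r["zip_code"]))
    return cleaned


# Hypothetical sample rows, not taken from insurance_fraud_data.csv.
sample = [
    {"claim_number": 104, "age_of_driver": 45, "gender": "M", "annual_income": 48000,
     "address_change": 0, "zip_code": 80002, "fraud": 1, "injury_claim": 950.0},
    {"claim_number": 102, "age_of_driver": 120, "gender": "F", "annual_income": 61000,
     "address_change": 0, "zip_code": 20111, "fraud": 1, "injury_claim": 0.0},
    {"claim_number": 103, "age_of_driver": 51, "gender": "F", "annual_income": -1,
     "address_change": 0, "zip_code": 15012, "fraud": 0, "injury_claim": 300.0},
    {"claim_number": 101, "age_of_driver": 34, "gender": "M", "annual_income": 52000,
     "address_change": 1, "zip_code": 50001, "fraud": 0, "injury_claim": 1200.0},
]

cleaned = prep_claims(sample)
for row in cleaned:
    print(row["claim_number"], row["gender"], row["address_change"], row["zip_code"])
```

In this sample, claim 102 is dropped by the age filter and claim 103 by the income filter, so two rows remain. The filters run before the recoding here only to avoid work on rows that will be discarded; the result matches applying the steps in the listed order.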

Use Minitab AI to clean your data

The Minitab Data Center provides a conversational interface that guides your data preparation while you are in the Cleanup view. For the example above, you can enter the following text into the Minitab AI prompt to get the same results as the individual steps.

Make claim numbers to text. Add the number symbol to claim numbers. Remove drivers that are older than one hundred. Change m to male and f to female. Remove drivers that don’t have a valid income. Change address_change to text. Make 1 to yes and 0 to no for address changes. Sort by fraud, injury claim, and zip code.

For more information on using Minitab AI in the Data Center, go to Using Minitab AI to clean your data.

Export data prep steps

After you apply all the prep steps, save the steps to use for future data sets with the same columns. To save the steps, export them as a .mdcs file.
  1. In the Steps pane on the left, select Export Steps from the dropdown menu.
  2. The file is saved to your downloads folder or other save location with the same name as your data file. Rename it as needed.

Import data prep steps

To apply the steps to a new data file, import them as a .mdcs file. Select Import Steps from the dropdown menu in the Steps pane.

Explore data summaries

Each column has a summary that shows the shape of the data, the range of the data, and an icon that represents the data type.

A quick look at the column graphical summaries shows that channel has 3 levels and days open has a bimodal distribution.

Open the Data Summary to get more information on the summary statistics for these columns.

The data summary for channel shows the frequency for each of the 3 levels.

Use the right-click menu to edit the grouping label, exclude the group from the data set, or show only the rows that contain this value.
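As a rough illustration of what the frequency summary and the two row-level options amount to, here is a small Python sketch. The channel values shown are hypothetical, since this page does not list the actual levels, and the code only mirrors the idea; it is not how the Data Center computes its summaries.

```python
from collections import Counter

# Hypothetical channel values; the real levels are not listed on this page.
channel = ["Broker", "Online", "Phone", "Online", "Broker", "Online"]

# Frequency of each level, comparable to the counts in the data summary.
frequencies = Counter(channel)

# Exclude one group from the data set.
without_phone = [value for value in channel if value != "Phone"]

# Show only the rows that contain one value.
only_online = [value for value in channel if value == "Online"]

for level, count in frequencies.most_common():
    print(level, count)
```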

What's next

Because the data for days open indicate two distributions, the insurance company wants to look at this further. Go to Analyze your data.