356. Data Preprocessing Steps

The Main 4 Steps

Data preprocessing consists of four main steps:

  1. Data Quality Assessment
  2. Data Cleaning
  3. Data Transformation
  4. Data Reduction

1. Data Quality Assessment

Before jumping into coding, evaluating the overall data quality is essential. Here are several problems to look out for.

  1. Mismatched Data Types
  2. Mixed Data Values
  3. Data Outliers
  4. Missing Data
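A quick profiling pass can surface all four problems before any cleaning begins. The sketch below uses only the Python standard library and a small hypothetical list of records. The column names (age, income, member) and the helpers type_report, distinct_values, missing_count and iqr_outliers are illustrative assumptions, not part of any particular tool. Outliers are flagged with Tukey's 1.5 * IQR fences, which is one common rule among several.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical raw records; in practice these would be loaded from a file or database.
records = [
    {"age": 34, "income": 52000, "member": "yes"},
    {"age": "41", "income": 61000, "member": "Y"},    # age stored as text
    {"age": 29, "income": None, "member": "no"},      # income missing
    {"age": 38, "income": 58000, "member": "No"},
    {"age": 45, "income": 990000, "member": "yes"},   # suspiciously large income
    {"age": None, "income": 57000, "member": "yes"},  # age missing
]


def type_report(rows, column):
    """Mismatched data types: count the Python types seen in a column."""
    return Counter(type(r[column]).__name__ for r in rows if r[column] is not None)


def distinct_values(rows, column):
    """Mixed data values: list distinct raw values (assumes one comparable type)."""
    return sorted({r[column] for r in rows if r[column] is not None})


def missing_count(rows, column):
    """Missing data: count None entries in a column."""
    return sum(1 for r in rows if r[column] is None)


def iqr_outliers(values, k=1.5):
    """Data outliers: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


for col in ("age", "income", "member"):
    print(col, dict(type_report(records, col)), "missing:", missing_count(records, col))
print("member values:", distinct_values(records, "member"))  # ['No', 'Y', 'no', 'yes']
incomes = [r["income"] for r in records if r["income"] is not None]
print("income outliers:", iqr_outliers(incomes))  # [990000]
```

With pandas, DataFrame.dtypes, DataFrame.isna().sum() and Series.value_counts() cover similar checks on a real table.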

2. Data Cleaning

Now that you’ve examined and understood the issues in the current data, the next step is to clean it by fixing the problems found in the previous step.
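A minimal cleaning pass might fix each of the four problems in turn. This is a sketch on a small hypothetical dataset: the label mapping MEMBER_MAP, the clean function, and the income cap of 67000 (the upper Tukey fence for this sample's incomes) are assumptions chosen for illustration, not a general recipe.

```python
from statistics import median

# Hypothetical raw records showing all four problems.
raw = [
    {"age": 34, "income": 52000, "member": "yes"},
    {"age": "41", "income": 61000, "member": "Y"},
    {"age": 29, "income": None, "member": "no"},
    {"age": 38, "income": 58000, "member": "No"},
    {"age": 45, "income": 990000, "member": "yes"},
    {"age": None, "income": 57000, "member": "yes"},
]

# Assumed mapping from inconsistent labels to one canonical boolean.
MEMBER_MAP = {"yes": True, "y": True, "no": False, "n": False}


def clean(rows, income_cap):
    """Return a cleaned copy of rows; income_cap is an assumed upper bound."""
    out = []
    for r in rows:
        age = int(r["age"]) if r["age"] is not None else None  # fix mismatched types
        member = MEMBER_MAP[r["member"].strip().lower()]        # unify mixed values
        income = r["income"]
        if income is not None:
            income = min(income, income_cap)                    # cap outliers
        out.append({"age": age, "income": income, "member": member})
    # Fill missing data with the median of the observed values in each column.
    for col in ("age", "income"):
        fill = median(r[col] for r in out if r[col] is not None)
        for r in out:
            if r[col] is None:
                r[col] = fill
    return out


cleaned = clean(raw, income_cap=67000)
for row in cleaned:
    print(row)
```

Capping (winsorizing) keeps every row, whereas dropping outliers discards data; the median is used for imputation because it is less sensitive to extreme values than the mean. Both are judgment calls that depend on the dataset.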

3. Data Transformation

Once the data is clean, you are finally standing at the starting line. Now we will transform it into formats suitable for analysis and other downstream phases.
Here are some examples.

  1. Aggregation
  2. Normalization
  3. Feature Selection
  4. Discretization
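Three of these transformations can be sketched in a few lines of standard-library Python. The sales records, bin edges and labels below are hypothetical: aggregate rolls daily amounts up to one total per store, min_max applies min-max normalization to the range [0, 1], and discretize maps a continuous value to a labeled bin. Feature selection is omitted from this sketch.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical daily sales records: (store, day, amount).
sales = [
    ("A", 1, 120.0), ("A", 2, 80.0),
    ("B", 1, 200.0), ("B", 2, 40.0),
    ("C", 1, 60.0), ("C", 2, 100.0),
]


def aggregate(rows):
    """Aggregation: roll daily amounts up into one total per store."""
    totals = defaultdict(float)
    for store, _day, amount in rows:
        totals[store] += amount
    return dict(totals)


def min_max(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # every value is the same; avoid dividing by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, edges, labels):
    """Discretization: return the label of the bin containing value.

    edges are ascending bin boundaries and len(labels) == len(edges) + 1,
    so the bins are (-inf, e0], (e0, e1], ..., (e_last, inf).
    """
    return labels[bisect_left(edges, value)]


totals = aggregate(sales)
scaled = min_max(list(totals.values()))
levels = [discretize(t, [180.0, 220.0], ["low", "medium", "high"]) for t in totals.values()]
print(totals)  # {'A': 200.0, 'B': 240.0, 'C': 160.0}
print(scaled)  # [0.5, 1.0, 0.0]
print(levels)  # ['medium', 'high', 'low']
```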

4. Data Reduction

The more data you have, the harder it gets to analyze. Data reduction not only makes the analysis easier but also cuts down on data storage.
Here are some examples.

  1. Attribute Selection
  2. Numerosity Reduction
  3. Dimensionality Reduction
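Each reduction technique can be illustrated on a small hypothetical numeric dataset. In this sketch, attribute selection drops columns whose variance falls below a threshold, numerosity reduction replaces the rows with a seeded random sample, and dimensionality reduction projects the remaining columns onto their first principal component (PCA, computed here by power iteration). The dataset, thresholds and function names are assumptions for illustration; in practice, scikit-learn's VarianceThreshold and PCA classes cover the first and last cases.

```python
import math
import random
from statistics import pvariance

# Hypothetical numeric rows: [height_cm, weight_kg, sensor_flag].
rows = [
    [160.0, 55.0, 1.0],
    [170.0, 65.0, 1.0],
    [180.0, 75.0, 1.0],
    [175.0, 72.0, 1.0],
    [165.0, 60.0, 1.0],
    [185.0, 80.0, 1.0],
]


def select_attributes(rows, threshold=1e-8):
    """Attribute selection: keep only columns whose variance exceeds threshold."""
    keep = [j for j in range(len(rows[0]))
            if pvariance([r[j] for r in rows]) > threshold]
    return keep, [[r[j] for j in keep] for r in rows]


def sample_rows(rows, k, seed=0):
    """Numerosity reduction: replace the data with a random sample of k rows."""
    return random.Random(seed).sample(rows, k)


def project_first_pc(rows, iterations=100):
    """Dimensionality reduction: project rows onto their first principal
    component, found by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed start vector; must not be orthogonal to the top component
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all: every row is identical
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


kept, reduced = select_attributes(rows)
print("kept columns:", kept)  # the constant sensor_flag column is dropped
print("sample:", sample_rows(reduced, k=3))
print("1-D values:", [round(p, 2) for p in project_first_pc(reduced)])
```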

Reference

What Is Data Preprocessing & What Are The Steps Involved?