EDA Process

The six practices of EDA are iterative and non-sequential

Exploratory data analysis (EDA) is not like a cake recipe: there is no fixed, step-by-step sequence to follow. Instead, the six practices of EDA (discovering, structuring, cleaning, joining, validating, and presenting) are iterative and non-sequential.

Because datasets vary, your approach to exploring them will differ each time. You will need to use your logic and experience throughout the EDA process to decide which of the six practices to use, how many times to apply them, and when in the process to apply them.

Visual example

Imagine you are assigned a dataset that has only 200 rows and five columns of data about trees in a coniferous forest in Norway. You know that to complete your full analysis you’ll need more than 1,000 rows and at least two more columns. Even without much more detail than that, your entire EDA process might look something like this:


  1. Discovering: You check out the overall shape, size, and content of the dataset. You find it is short on data.
  2. Joining: You add more data.
  3. Validating: You perform a quick check that the new data doesn’t have mistakes or misspellings.
  4. Structuring: You structure the data in different time periods and segments to understand trends.
  5. Validating: You do another quick check to ensure the new columns you’ve made in structuring are correctly designed.
  6. Cleaning: You check for outliers, missing data, and any needed conversions or transformations.
  7. Validating: After cleaning, you double-check that the changes you made are correct and accurate.
  8. Presenting: You share your dataset with a peer.
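
The eight steps above can be sketched in code. The following is a minimal illustration, not a prescribed workflow: the tree records, the column names (`tree_id`, `species`, `height_m`, `year`), the `validate` helper, and the choice to fill a missing height with the median are all assumptions invented for this example. It uses plain Python so that it runs without third-party libraries; in practice, a data analysis library such as pandas would be a more typical choice.

```python
from statistics import median

# Hypothetical data: every column name and value below is invented for illustration.
trees = [
    {"tree_id": 1, "species": "Norway spruce", "height_m": 21.5, "year": 2019},
    {"tree_id": 2, "species": "Scots pine", "height_m": 18.0, "year": 2020},
]
more_trees = [
    {"tree_id": 3, "species": " norway spruce", "height_m": None, "year": 2020},
    {"tree_id": 4, "species": "Scots pine", "height_m": 19.0, "year": 2021},
]


def validate(rows, required_columns):
    """Return a list of problems; an empty list means the quick check passed."""
    problems = []
    ids = [row["tree_id"] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate tree_id values")
    for row in rows:
        missing = [col for col in required_columns if col not in row]
        if missing:
            problems.append(f"tree {row['tree_id']} lacks {missing}")
    return problems


practice_log = []  # records the order in which the practices were applied
base_columns = ["tree_id", "species", "height_m", "year"]

# 1. Discovering: look at the size and columns of the dataset.
practice_log.append("discovering")
print(f"{len(trees)} rows, columns: {sorted(trees[0])}")

# 2. Joining: add more data (copies keep the original records untouched).
practice_log.append("joining")
combined = [dict(row) for row in trees + more_trees]

# 3. Validating: quick check of the joined data.
practice_log.append("validating")
assert validate(combined, base_columns) == []

# 4. Structuring: segment the rows into time periods.
practice_log.append("structuring")
for row in combined:
    row["period"] = "before 2020" if row["year"] < 2020 else "2020 onward"

# 5. Validating: confirm the new column exists on every row.
practice_log.append("validating")
assert validate(combined, base_columns + ["period"]) == []

# 6. Cleaning: normalize species labels and fill the missing height.
#    Filling with the median is an assumption; other strategies may fit better.
practice_log.append("cleaning")
known_heights = [row["height_m"] for row in combined if row["height_m"] is not None]
fill_value = median(known_heights)
for row in combined:
    row["species"] = row["species"].strip().capitalize()
    if row["height_m"] is None:
        row["height_m"] = fill_value

# 7. Validating: confirm the cleaning did what was intended.
practice_log.append("validating")
assert all(row["height_m"] is not None for row in combined)
assert validate(combined, base_columns + ["period"]) == []

# 8. Presenting: share the result (here, simply print it).
practice_log.append("presenting")
for row in combined:
    print(row)
```

After the script runs, `practice_log` contains "validating" three times, which mirrors the order of the list above rather than a fixed recipe.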

Notice that you performed the “validating” practice iteratively, that is, multiple times, to make sure your changes to the data did not inadvertently introduce errors. Also, because you recognized the need for more data up front, the practice of “joining” was performed immediately following the practice of “discovering.”
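
One way to make repeated validation a habit is to pair every change with a check. The sketch below is only one possible pattern, under stated assumptions: `apply_step`, `heights_plausible`, the `height_cm` and `height_m` columns, and the 0 to 100 metre sanity range are all hypothetical names and values chosen for illustration. Each transformation runs on a copy of the data, and the result is kept only if every check passes.

```python
import copy


def apply_step(rows, transform, checks):
    """Apply transform to a copy of rows; keep the result only if all checks pass."""
    candidate = transform(copy.deepcopy(rows))
    failed = [check.__name__ for check in checks if not check(candidate)]
    if failed:
        raise ValueError(f"step {transform.__name__} failed checks: {failed}")
    return candidate


def heights_plausible(rows):
    # Assumed sanity range for this hypothetical dataset: 0 to 100 metres.
    return all(0 < row["height_m"] < 100 for row in rows)


def cm_to_m(rows):
    # Convert heights from centimetres to metres.
    for row in rows:
        row["height_m"] = row.pop("height_cm") / 100
    return rows


def cm_to_m_buggy(rows):
    # Simulates a mistake: renames the column but forgets to divide by 100.
    for row in rows:
        row["height_m"] = row.pop("height_cm")
    return rows


# Hypothetical sample records.
sample = [{"tree_id": 1, "height_cm": 2150}, {"tree_id": 2, "height_cm": 1800}]

# The correct conversion passes validation and is accepted.
converted = apply_step(sample, cm_to_m, [heights_plausible])

# The buggy conversion is caught by the same check before it can spread.
caught = None
try:
    apply_step(sample, cm_to_m_buggy, [heights_plausible])
except ValueError as err:
    caught = str(err)
    print(caught)
```

Because each step works on a copy, a failed check leaves `sample` unchanged, so you can correct the transformation and try again.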

After you present your cleaned dataset to a peer, there is a good chance you will receive notes or ideas for more exploration and/or cleaning. Because of that, you will see even more iterations.