Am I a Data Scientist? Part 3 of my journey

Week 3. The journey has been smooth. Last week I went deeper into Python and learnt about the libraries commonly used for data science. I focused on NumPy and Pandas, so I’ll tell you about them. Before that, I need to explain what a library is and the terms that go with it. This simple website clarified things for me. In Python, you write code and store it in files. Each of these files is called a module, and it can be imported into other Python code. A collection of related modules is a package. A library, like NumPy or Pandas, is a collection of packages. A framework is a collection of libraries. All of these come together in a Python application.
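
To make the idea of a module concrete, here is a minimal sketch. It writes a tiny file to a temporary folder and then imports it from other code. The names `greetings` and `say_hello` are made up for this illustration; they are not part of any real library.

```python
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder and write a tiny module into it.
# "greetings" and "say_hello" are hypothetical names for this example.
folder = Path(tempfile.mkdtemp())
(folder / "greetings.py").write_text(
    "def say_hello(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Tell Python where to look, then import the file as a module.
sys.path.insert(0, str(folder))
import greetings

message = greetings.say_hello("data scientist")
print(message)
```

A package would simply be a folder holding several files like `greetings.py`, and a library bundles one or more such packages so that other people can install and import them.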

Photo by Franki Chamaki on Unsplash

Another new term I learnt last week was Exploratory Data Analysis (EDA). EDA is the practice of examining a dataset, usually through summaries and charts, to understand what it contains and to decide how to analyze it. I didn’t know about it, or why it matters, until recently. Let’s get into it. Imagine yourself in a company, in any role. Your company has acquired data from its customers through multiple sources. This data is meant to inform business decisions. Before working on the data and drawing conclusions, you need to check that the data is relevant and can actually answer the questions being asked. Otherwise, all your efforts will be futile.

I worked on an EDA team assignment with three other amazing people. One of the very first tasks was to get to know the company whose data we were asked to work on. We took the time to understand its business objectives, ask relevant questions and state our assumptions and hypotheses. During the analysis, we needed to clean the data by dropping rows with missing values that might otherwise skew the results. After finding the correlations between the variables in the data, we presented our findings with visualizations such as bar graphs and pie charts. We also generated a quick summary of the data with the pandas-profiling library. This is known as a Pandas profiling report and, trust me, it makes generating insights from data very easy.
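
The cleaning, summary and correlation steps above can be sketched in a few lines of pandas. This is not our actual assignment code: the table and its column names are invented for illustration, and `describe()` stands in as a much lighter substitute for a full profiling report.

```python
import pandas as pd

# Made-up customer data; None marks a missing value.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 38],
    "monthly_spend": [120.0, 150.0, 90.0, None, 200.0],
    "visits": [4, 6, 3, 8, 7],
})

clean = df.dropna()          # drop rows that contain any missing value
summary = clean.describe()   # count, mean, std, min, quartiles, max per column
correlations = clean.corr()  # pairwise Pearson correlation between columns

print(clean.shape)
print(summary)
print(correlations)
```

Dropping rows is the simplest way to handle missing values, but it throws data away. On a small dataset it is worth checking how many rows are lost before committing to it.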

All praise to the Kaggle website for giving me the information I needed to understand EDA. Before modelling data, please do your Exploratory Data Analysis, or else that company you imagined earlier, yeah that one, you might crash it.

Have a good week!