Eight Weeks and a Hackathon

Otema Yirenkyi
3 min read · Sep 14, 2020

In Week 8, I participated in the team hackathon I introduced in my previous article. Many thanks to my group members for the teamwork they demonstrated. One of my team members carried the presentation so effortlessly while the other worked on creating a machine learning model in Microsoft Azure. By now you’re itching to know what we worked on. Am I right? Okay, I’ll let you in. As unfortunate as it has been, we are still living in a pandemic. It has affected us in many ways. It has rocked economies and taken away our dear ones. My heart goes out to anyone who has lost someone. We are in difficult times.

Our team project was centered on the health industry. We focused on finding out the effects of COVID-19, the novel coronavirus, on society. We zoomed in on how the world perceives the introduction and distribution of coronavirus vaccines, as seen through social media. We researched and found an existing challenge on Zindi, and we decided to take it up. Zindi holds a repository of many problems you can solve with Data Science. There’s so much more to Zindi but I’ll let you find that out when you visit their website. It’s great for learners. We found data showing how tweets from Twitter users expressed people’s sentiments towards coronavirus vaccines. These sentiments were labeled as positive, negative or neutral.

Guys, so remember when we spoke about Exploratory Data Analysis in one of my previous articles? We’ll see the concept in live action here. Before we could present the value of our project, we needed to clean up the data and get rid of missing values, unwanted values and so on. This was my role. Working in a Jupyter notebook with Python, I explored the data using the pandas library and generated a profiling report as well, to get a general sense of the data. I dropped rows with missing values and dropped columns that were not necessary for our modelling. I generated visuals to give a clearer picture of the distribution of the data. To be honest, there’s so much you can do with pandas when you want to understand your data.
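To give a feel for the cleaning steps described above, here is a minimal sketch with pandas. It is not our actual notebook: the tiny table, the column names (tweet_id, text, label) and the clean_tweets helper are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the tweet data; the rows and column names are assumptions.
tweets = pd.DataFrame({
    "tweet_id": ["a1", "a2", "a3", "a4", "a5"],
    "text": [
        "Vaccines save lives",
        None,  # a tweet with missing text
        "Not sure what to think about this vaccine",
        "I will never take it",
        "Got my shot today",
    ],
    "label": ["positive", "neutral", "neutral", "negative", None],
})


def clean_tweets(df):
    """Drop the ID column (not needed for modelling) and rows with missing values."""
    cleaned = df.drop(columns=["tweet_id"])
    cleaned = cleaned.dropna(subset=["text", "label"])
    return cleaned.reset_index(drop=True)


# A quick look at the data before cleaning: missing values per column.
print(tweets.isna().sum())

cleaned = clean_tweets(tweets)

# Distribution of the sentiment labels after cleaning.
print(cleaned["label"].value_counts())
```

From here, something like cleaned["label"].value_counts().plot(kind="bar") would turn the label counts into a simple bar chart, provided matplotlib is installed.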

The point of building a machine learning model for our project was to predict people’s sentiments about vaccines from the content they post on social media. We trained the model on historical tweets that were already labeled, so that it can recognize new tweets based on certain keywords and put each one under the right sentiment label. That way, current sentiments can be monitored as well. This was a classification problem, and we used both RandomForestClassifier and a Naive Bayes classifier to get better accuracy. Accuracy was also the metric we used to evaluate our model. We deployed the model in Azure by exporting our pickle file to the platform.
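The modelling flow described above could look roughly like the sketch below, written with scikit-learn. Please treat it as an illustration only: the toy tweets are invented, and the TF-IDF features, the MultinomialNB variant of Naive Bayes and the train/test split settings are my assumptions here, not necessarily what our team used.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy tweets, four per sentiment label (not the Zindi data).
texts = [
    "Vaccines save lives", "So grateful for my vaccine",
    "Vaccines are a blessing", "Happy to be vaccinated",
    "I do not trust this vaccine", "Vaccines are dangerous",
    "I will never take the vaccine", "This vaccine scares me",
    "The vaccine arrives next week", "Clinic opens at nine for vaccines",
    "Vaccine trial results published", "Vaccine distribution starts Monday",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Hold out a quarter of the tweets for evaluation, keeping the label balance.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Each pipeline turns text into TF-IDF features (an assumption) and then classifies.
models = {
    "random_forest": make_pipeline(
        TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=42)
    ),
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

# Train both models and record their accuracy on the held-out tweets.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)

# Pickle the better-scoring model; in practice you would write these bytes to a file.
best_name = max(scores, key=scores.get)
model_bytes = pickle.dumps(models[best_name])

# Loading the pickle back gives a model that can label a new tweet.
restored = pickle.loads(model_bytes)
print(restored.predict(["Thinking about getting the vaccine"]))
```

With a dataset this tiny the accuracy numbers mean very little. The point is only the shape of the workflow: split, train, score, pickle.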

Photo by National Cancer Institute on Unsplash

Governments can use the insights generated from the data, and from the project in general, to understand how their citizens respond to the manufacturing and distribution of vaccines. Health organizations can plan how they bring vaccines to the market and be transparent about it, so that people are more comfortable with vaccines. At the end of the day, we all want to live a life free of disease. We need to work together to rid ourselves of this virus in whatever way, big or little, we can.

On the side, my friend and I drafted a timetable to study for the upcoming examination. I improved my skills in using Microsoft Azure’s designer to create a machine learning clustering model without coding. More work needs to be done in studying for the exam, and I will cover it in the coming week.

Did you know that you can tell me your thoughts right under my posts? You can share your suggestions and general comments by hitting the response button beneath or on the side of the article. It’s got a bubble icon. So, go ahead. Feel free. It’s a safe space.
