WhatsApp Group Chat Analysis using Python | Data Analysis and Data Visualization

PRANJAL VERMA
5 min read · Dec 14, 2020

Photo by AARN GIRI on Unsplash

Who sends the most messages? Which words are used most often? Which emojis are the most common? When are people most active? If you have ever wondered about these things in your WhatsApp groups with your mates, then this is the article for you. Find out with some (relatively) simple Python!

In late 2011, WhatsApp reached a new milestone: More than one billion messages were being sent using its app in a single day.

Today, it reports far more impressive figures: more than 65 billion messages are sent daily via WhatsApp (source: Connectiva Systems, 2019), which boils down to about 2.7 billion per hour, 45 million per minute, and more than 750,000 per second.

With this treasure house of data right under our noses, this article aims to serve as a step-by-step guide to building your own WhatsApp group chat analyzer. It is divided into the following 3 main topics:

  • Data Collection
  • Data Preparation
  • Data Exploration

A picture is worth a thousand words, so feel free to jump directly to the images.

Data Collection and Cleaning

WhatsApp lets you export any chat (with or without media) as a .txt file. I exported my group chat without media. A sample exported .txt file looks like this:

Sample Image

More info about the data: this is our hostel WhatsApp group, active since 18 July 2019. The group is 18 months old and has around 150 members. Some of them are active all day, while others rarely send a message. Some members favour particular words, while others favour particular emojis. In this project we will try to analyze this data.

So let's move on to our task. Let us consider a single line from the text and see how we can extract the relevant columns from it:

29/07/19, 4:49 pm — Subhashis Das: You need to give an application for the same and get it approved by the Hostel Office/warden.

In our sample line of text, our main objective is to automatically break down the raw message into 4 tokens.

{Date}, {Time} — {Sender}: {Message}

{29/07/19}, {4:49 pm} — {Subhashis Das}: {You need to give an application for the same and get it approved by the Hostel Office/warden}
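The split into four tokens can be sketched with a regular expression. This is a minimal illustration, not the code from the linked repository: the name parse_line and the exact pattern are my own assumptions, and the pattern assumes the date, time and separator layout shown in the sample line. Export formats vary by phone and locale, so it may need adjusting.

```python
import re

# Assumed layout: "dd/mm/yy, h:mm am/pm <dash> Sender: Message".
# The dash may be a plain hyphen or an en/em dash, depending on how the text was copied.
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}\s?[APap][Mm]) "
    r"[-\u2013\u2014] "
    r"(?P<sender>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return (date, time, sender, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("date", "time", "sender", "message")


sample = "29/07/19, 4:49 pm - Subhashis Das: You need to give an application."
print(parse_line(sample))
```

Lines that do not match, such as join notices or the second line of a multi-line message, return None and have to be handled separately.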

After cleaning the data and converting it into a DataFrame, we get:

Looks neat and clean. We can see that it has 17,742 entries, which means the group data consists of 17k+ messages.
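For readers who want to try the cleaning step themselves, here is a rough sketch of one way to do it. It assumes the same line layout as the sample line, a two-digit year, and that any line without a timestamp continues the previous message. The names PREFIX_RE and parse_chat are hypothetical, and the author's actual cleaning code may differ.

```python
import re
from datetime import datetime

# Timestamp prefix of a new entry (assumed layout, two-digit year).
PREFIX_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2})\s?([APap][Mm]) [-\u2013\u2014] (.*)$"
)


def parse_chat(lines):
    """Turn raw export lines into a list of {timestamp, sender, message} dicts."""
    records = []
    current = None  # the message that continuation lines get appended to
    for raw in lines:
        line = raw.rstrip("\n")
        match = PREFIX_RE.match(line)
        if match is None:
            # No timestamp: treat the line as part of the previous message.
            if current is not None:
                current["message"] += "\n" + line
            continue
        date, clock, ampm, rest = match.groups()
        sender, sep, message = rest.partition(": ")
        if not sep:
            # Timestamped line without "Sender: " (e.g. a join notice): skip it.
            current = None
            continue
        timestamp = datetime.strptime(
            f"{date} {clock} {ampm.upper()}", "%d/%m/%y %I:%M %p"
        )
        current = {"timestamp": timestamp, "sender": sender, "message": message}
        records.append(current)
    return records
```

A list of dicts like this can be passed straight to pandas.DataFrame to get one row per message.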

Data Exploration

Photo by Luke Chesser on Unsplash

OK, here comes the most interesting part.

KEEP CALM AND LET THE DATA SPEAK!

When were the most messages sent?

From this heatmap, we can see that the group is more active after 5:00 pm (when classes end). The group is also clearly inactive from roughly 2 am to 10 am. There is one outlier, though, and I know the reason behind it: spamming.
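The numbers behind a heatmap like this are just message counts per time slot. The sketch below assumes the two axes are weekday and hour of day, which is my guess and not something the article states. The function name activity_grid and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def activity_grid(timestamps):
    """Return a 7 x 24 grid where grid[weekday][hour] is the number of messages."""
    counts = Counter((ts.weekday(), ts.hour) for ts in timestamps)
    return [[counts[(day, hour)] for hour in range(24)] for day in range(7)]


# Made-up timestamps, purely for illustration.
timestamps = [
    datetime(2019, 7, 29, 16, 49),
    datetime(2019, 7, 29, 17, 5),
    datetime(2019, 7, 30, 17, 30),
]
grid = activity_grid(timestamps)
for day, row in zip(DAYS, grid):
    print(day, sum(row))
```

A grid in this shape can be handed to a plotting function such as seaborn's heatmap to produce the picture.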

How many messages did each individual send?

Using some simple Python functions, we can display a bar chart like this:

Using top_10_sender_value_counts, we can display the top 10 senders with their respective counts.
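Counting messages per sender needs very little code. Here is a dependency-free sketch using collections.Counter. The function name top_senders and the sample names are hypothetical. With a pandas DataFrame, the rough equivalent would be value_counts() on the sender column followed by head(10).

```python
from collections import Counter


def top_senders(senders, n=10):
    """Return the n most frequent senders as (sender, message_count) pairs."""
    return Counter(senders).most_common(n)


# One entry per message; the names are invented for the example.
senders = ["Asha", "Ravi", "Asha", "Meera", "Asha", "Ravi"]
for name, count in top_senders(senders, n=2):
    print(name, count)
```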

Which emojis are used most in the chat?

I had not worked with emojis in Python before, but the above link made it easy for me. So, without wasting your time, let me show the interesting result:
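If you only want a quick count without extra packages, a crude approximation is to match a few well-known emoji code point ranges. This is not how a dedicated emoji library works and it is not the author's method. The ranges below are a simplification, and the names EMOJI_RE and count_emojis are hypothetical.

```python
import re
from collections import Counter

# Rough approximation: a few common emoji ranges, not the full Unicode emoji set.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicators (the two halves of a flag)
    "]"
)


def count_emojis(messages):
    """Count each matched emoji character across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(EMOJI_RE.findall(message))
    return counts


print(count_emojis(["haha \U0001F602\U0001F602", "ok \U0001F44D"]).most_common(5))
```

Because it works one code point at a time, this sketch counts skin-tone modifiers and the parts of combined emojis (families, flags) separately.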

Some Group Stats

Again, using some Python functions, we can see the total number of messages, media messages, emojis used, and links sent in the group.
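Totals like these come down to a few counters over the message text. The sketch below makes two assumptions: that a chat exported without media marks each media message with the placeholder text "<Media omitted>", and that a link is anything starting with http:// or https://. The function name group_stats is hypothetical, and emoji totals are left out to keep it short.

```python
import re

MEDIA_PLACEHOLDER = "<Media omitted>"  # assumed placeholder in media-free exports
URL_RE = re.compile(r"https?://\S+")   # simple link pattern, not a full URL parser


def group_stats(messages):
    """Return total messages, media messages and links for a list of message texts."""
    media = sum(1 for m in messages if m.strip() == MEDIA_PLACEHOLDER)
    links = sum(len(URL_RE.findall(m)) for m in messages)
    return {"messages": len(messages), "media": media, "links": links}


print(group_stats(["hi", "<Media omitted>", "see https://example.com/notes"]))
```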

Sender-wise Stats

Isn't this cool?

How many words did each individual send in total?
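One simple way to get per-person word totals is to split each message on whitespace and add up the pieces. This is a sketch under that assumption, so links and emojis also count as "words". The function name words_per_sender and the sample data are hypothetical.

```python
from collections import Counter


def words_per_sender(records):
    """Sum whitespace-separated word counts per sender from (sender, message) pairs."""
    totals = Counter()
    for sender, message in records:
        totals[sender] += len(message.split())
    return totals


records = [("Asha", "hello there"), ("Ravi", "ok"), ("Asha", "see you at dinner")]
print(words_per_sender(records).most_common())
```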

Which is the most happening day and time?

Well, I know what happened on 1 September 2019: Techniche, right?
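Finding the busiest date and hour is again a counting exercise. The sketch below assumes each message already has a parsed datetime. The function name busiest_day_and_hour and the sample timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def busiest_day_and_hour(timestamps):
    """Return ((date, count), (hour, count)) for the busiest date and hour of day."""
    timestamps = list(timestamps)  # allow any iterable; we need two passes
    days = Counter(ts.date() for ts in timestamps)
    hours = Counter(ts.hour for ts in timestamps)
    return days.most_common(1)[0], hours.most_common(1)[0]


timestamps = [
    datetime(2019, 9, 1, 21, 0),
    datetime(2019, 9, 1, 21, 30),
    datetime(2019, 9, 2, 10, 0),
]
print(busiest_day_and_hour(timestamps))
```

The function expects at least one timestamp and raises an IndexError on an empty input.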

It’s Word-Cloud Time

WordCloud of most used words by all members
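A word cloud is drawn from word frequencies, so the preparation step can be sketched without any plotting library. The stopword list here is a tiny placeholder, the tokenizer only keeps Latin letters (so words in other scripts are dropped), and the name word_frequencies is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a much longer one.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "it", "for"}


def word_frequencies(messages, stopwords=STOPWORDS):
    """Count lowercase words across messages, skipping stopwords and single letters."""
    counts = Counter()
    for message in messages:
        for word in re.findall(r"[a-z']+", message.lower()):
            if word not in stopwords and len(word) > 1:
                counts[word] += 1
    return counts


freqs = word_frequencies(["The mess food is good", "Food in the mess"])
print(freqs.most_common(3))
```

If you use the wordcloud package, a frequency mapping like this can be passed to WordCloud.generate_from_frequencies.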

Ending Note

Congratulations! You now have more insight into your WhatsApp conversations. In this article, I analyzed our WhatsApp group chat using Python, and I hope you enjoyed reading it. It was a really fun ride. The analysis is not 100% accurate, but it comes close.

I have altered parts of the dataset and tried to protect every group member's privacy. I apologize if I have failed at that.

Future work: 1) Sentiment analysis 2) Web-app deployment

Also, if you are interested in the code, check it out here: https://github.com/PranjalVerma08/Whatsapp-Group-Chat-Anaylsis

Till then, happy exploring!


Written by PRANJAL VERMA

Data Scientist | Kaggle Master | Data Science and AI Enthusiast
