Who Should Face-Off?: Secondary Data Analysis of Marvel Characters
I use SQL to analyze secondary data of Marvel Characters' attributes and popularity. I ran the SQL code using Python in a Jupyter Notebook to show the output of the queries.
Project Summary
Here’s a video of me walking through the entire project in less than 5 minutes:
Project Links
Business Problem & Key Questions
While superhero movies have grossed a lot over the past decade, this company (confidential) has failed to make a successful (i.e., well-received and profitable) superhero movie.
This time around it conducted research to ensure that the characters are popular and will put on a great show. That way everyone goes to see the movies and people buy merchandise.
I have been hired to analyze their data to answer the following question:
Based on the characters’ popularity and their attributes, who should go head-to-head in the next film?
Since this was a broad question, I developed a few questions to help guide the analysis. Below are my questions:
Who are the characters in the 'Genius+ IQ' groups?
Find the characters who are in the 'Genius+' group by also have high durability or great strength
Data Preparation
To obtain the data, I used copied the company’s code for the table that houses their results.
The code can be found here.
Parts of the table were not easily understood. For example, a character’s speed is labeled on a range of 1 - 7, but what does that mean? This is the case for all ability or power-related attributes.
After researching the scales, I created a power grid table based on Marvel Fandom Database
I used SQL to add columns to the company’s table and populate it with the information from the power grid.
I was ready to begin my answering their main question.
Approach
Here’s the approach I used to determine which characters were in the ‘Genius+ IQ’ group:
I created used a “
CASE-WHEN
” function to group and count characters by their alignment.
7 characters were in the “Genius+ IQ” group,
Here’s the approach I used to rank the characters by their intelligence among others in their alignment (i.e., good, bad, or anti-hero):
I used
PARTITION BY alignment
to ensure that the ranking is done separately for characters with different alignments, such as "Good," "Bad," or "Anti-hero."This query ranks provides rankings specific to each alignment category rather than ranking all the characters together.
Here’s the approach I used to find the characters have a genius level IQ or higher, are in the top 25 of popular characters, and also have high durability or great strength:
I used the query above to create a CTE to help shorten the code.
strength >= 5
, means the character can lift over 25-75 tons over their headdurability >= 5
, means the character is bulletproof
Output:
Results & Key Takeaways
Even though anti-hero films are becoming more popular, the classic, "Good vs Evil" will always fare well.
Only two characters who are in the top 25 of popularity, have a bulletproof level of durability, are classified as a genius or higher, can lift over 25-75 tons over their head, and align with good or bad: Rogue and Doctor Doom!
Based on these analyses, Rogue and Doctor Doom should go head-to-head in a cinematic feature film.
Any comic book fan knows that this dataset is not accurate. I purposely choose to hide the rows from the initial table so you all would follow along despite it being off.