🎶 Spotify Music Analysis

Project Overview

This project explores what makes a song popular using a Spotify dataset.

The dataset was cleaned and analyzed with Python (pandas and SQLAlchemy) and PostgreSQL (SQL queries).

Tools & Technologies

Data Cleaning & Preparation

Dataset Overview

Cleaning in SQL

  1. Removed duplicate rows (deduplicated on track_name & artists)
  2. Standardized text columns (applied TRIM())
  3. Handled invalid durations
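
The three steps above could look roughly like the following. This is a minimal sketch, not the project's actual script. It runs the SQL through Python's built-in sqlite3 module against an in-memory table so that it is self-contained. The duration_ms column and the sample rows are hypothetical, and treating non-positive durations as invalid (setting them to NULL) is an assumption, since the exact rule is not stated here. The sketch trims before deduplicating so that rows differing only in whitespace collapse into one.

```python
import sqlite3

# In-memory stand-in for the tracks table (hypothetical columns and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (track_name TEXT, artists TEXT, duration_ms INTEGER);
INSERT INTO tracks VALUES
  ('  Song A ', 'Artist 1', 210000),
  ('Song A',    'Artist 1', 210000),
  ('Song B',    ' Artist 2', 0),
  ('Song C',    'Artist 3', 185000);
""")

# Standardize text columns with TRIM().
conn.execute(
    "UPDATE tracks SET track_name = TRIM(track_name), artists = TRIM(artists)"
)

# Remove duplicates on (track_name, artists), keeping the first row seen.
conn.execute("""
DELETE FROM tracks
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM tracks GROUP BY track_name, artists
)
""")

# Assumed rule: a non-positive duration is invalid and becomes NULL.
conn.execute("UPDATE tracks SET duration_ms = NULL WHERE duration_ms <= 0")

rows = conn.execute(
    "SELECT track_name, artists, duration_ms FROM tracks ORDER BY track_name"
).fetchall()
print(rows)
```

SQLite's implicit rowid is used here only to decide which duplicate to keep; a PostgreSQL version would need a primary key or a similar row identifier for that step.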

Cleaning in Python

  1. Loaded dataset into PostgreSQL using SQLAlchemy
  2. Standardized missing values:
     - Trimmed whitespace in string columns
     - Replaced empty strings with "Unknown" across all text fields
     - Filled missing values in numeric columns with the column median
  3. Saved cleaned data back into PostgreSQL as tracks_cleaned
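
The pandas part of these steps might be sketched as below. The function name clean_tracks and the sample frame are hypothetical. The sketch also assumes that every non-numeric column is a text field and that missing text values (not only empty strings) should become "Unknown"; the project's own script may draw those lines differently.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def clean_tracks(df: pd.DataFrame) -> pd.DataFrame:
    """Trim text, mark empty/missing text as "Unknown", median-fill numbers."""
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            # Fill gaps in numeric columns with that column's median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Strip whitespace from real strings, leave missing values alone.
            out[col] = out[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )
            # Empty strings and missing text both become "Unknown" (assumption).
            out[col] = out[col].replace("", "Unknown").fillna("Unknown")
    return out


# Hypothetical sample rows, only to show the behaviour.
raw = pd.DataFrame(
    {
        "track_name": ["  Song A ", "", None],
        "artists": ["Artist 1", " Artist 2", "   "],
        "popularity": [10.0, None, 30.0],
        "tempo": [120.0, 100.0, None],
    }
)
cleaned = clean_tracks(raw)
print(cleaned)
```

Writing the result back to the database would then be a DataFrame.to_sql call against a SQLAlchemy engine; the connection details are deployment-specific and are left out of this sketch.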

Summary Statistics (after cleaning)

| Metric | Value |
|---|---|
| Total tracks | 81,201 |
| Avg popularity | 34.7 |
| Popularity stdev | 19.3 |
| Avg danceability | 0.56 |
| Avg energy | 0.64 |
| Avg valence | 0.46 |
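
Statistics like these can be computed from the cleaned table with a few pandas calls. The sketch below is illustrative only: the function name summarize, the lower-case column names, and the sample rows are assumptions. Note that pandas computes the sample standard deviation by default, which may or may not match how the figure above was produced.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return the headline metrics for a tracks DataFrame."""
    return {
        "total_tracks": int(len(df)),
        "avg_popularity": round(float(df["popularity"].mean()), 1),
        # Series.std() uses ddof=1 (sample standard deviation) by default.
        "popularity_stdev": round(float(df["popularity"].std()), 1),
        "avg_danceability": round(float(df["danceability"].mean()), 2),
        "avg_energy": round(float(df["energy"].mean()), 2),
        "avg_valence": round(float(df["valence"].mean()), 2),
    }


# Hypothetical sample rows standing in for tracks_cleaned.
sample = pd.DataFrame(
    {
        "popularity": [10, 20, 30, 40],
        "danceability": [0.5, 0.6, 0.7, 0.6],
        "energy": [0.4, 0.8, 0.6, 0.6],
        "valence": [0.2, 0.4, 0.6, 0.4],
    }
)
stats = summarize(sample)
print(stats)
```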

Popularity Analysis

Question: What makes a song popular?

We analyzed audio features such as danceability, energy, valence, acousticness, instrumentalness, and tempo to see how they relate to popularity. Correlation analysis was performed in Python (pandas).
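
A minimal sketch of that correlation step is shown here. It assumes the columns are named after the features listed above, that the target column is called popularity, and that the Pearson coefficient (the pandas default) was used. The helper name and the toy data are hypothetical and are not taken from the project.

```python
import pandas as pd

FEATURES = [
    "danceability",
    "energy",
    "valence",
    "acousticness",
    "instrumentalness",
    "tempo",
]


def popularity_correlations(df, features=FEATURES, target="popularity"):
    """Pearson correlation of each feature with the target, highest first."""
    corr_matrix = df[list(features) + [target]].corr(method="pearson")
    # Keep only the target column and drop its self-correlation of 1.0.
    return corr_matrix[target].drop(target).sort_values(ascending=False)


# Toy data: danceability rises and instrumentalness falls exactly
# in step with popularity, the other columns are arbitrary.
toy = pd.DataFrame(
    {
        "popularity": [10, 20, 30, 40, 50],
        "danceability": [0.1, 0.2, 0.3, 0.4, 0.5],
        "energy": [0.5, 0.9, 0.2, 0.7, 0.4],
        "valence": [0.3, 0.1, 0.6, 0.2, 0.5],
        "acousticness": [0.9, 0.4, 0.5, 0.3, 0.6],
        "instrumentalness": [0.9, 0.8, 0.7, 0.6, 0.5],
        "tempo": [120, 95, 140, 100, 128],
    }
)
corr = popularity_correlations(toy)
print(corr)
```

On real data the coefficients would be far smaller than in this toy frame, which is constructed so that two features are perfectly correlated with popularity.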

Correlation with Popularity

| Feature | Correlation with Popularity | Interpretation |
|---|---|---|
| Danceability | 0.035 | Slight positive association |
| Tempo | 0.013 | Almost no association |
| Energy | 0.001 | No meaningful relationship |
| Acousticness | -0.025 | Slight negative association |
| Valence | -0.041 | Very weak negative association |
| Instrumentalness | -0.095 | Instrumental tracks tend to be less popular |

Key Takeaways

Visualizations

All plots are saved in the visualizations/ folder and displayed below.

Correlation Heatmap

Heatmap of audio features vs popularity

Danceability vs Popularity

Scatter plot of danceability vs popularity

Findings & Summary

Repository Structure

- README.md: Project overview and explanation

- index.html: This live HTML version

- visualizations/: Scatter plots, heatmaps, and other images

- sql/: SQL scripts for cleaning and preparing the dataset

- scripts/: Python scripts for loading the dataset into PostgreSQL, cleaning it, and generating visuals

- data/: Raw dataset used for analysis