DSCI 510 — FALL 2023

Peloton Customer Churn Analysis

Combining Qualtrics survey data with geocoding and U.S. Census demographics to understand why Peloton customers reduce their usage — and what geographic, income, and age patterns predict churn.

Contents

  1. Project Overview
  2. Data Pipeline
  3. Geographic Distribution
  4. Churn Analysis
  5. Demographics & Correlations
  6. Key Findings

Project Overview

Peloton experienced rapid growth during the COVID-19 pandemic, followed by a significant decline in subscriber retention. This project investigates the factors behind customer churn using a dataset of 317 survey respondents collected via Qualtrics, enriched with geographic and demographic data.

317 Survey Respondents
273 After Cleaning
3 Data Sources
19 Features

Research Questions

Data Pipeline

The final dataset was assembled from three sources through a multi-step process, combining survey responses with geocoded location data and U.S. Census demographics.

  1. Survey Data. Qualtrics survey with latitude/longitude coordinates, usage behavior, and churn status.
  2. Geocoding. GeoPy's Nominatim geocoder reverse-geocodes 317 coordinate pairs to ZIP codes.
  3. Census Join. The ZIP codes are joined against a 40,959-row census dataset to add city, state, population, and density.
  4. Feature Merge. Demographics (income, age, gender) and behavior data are merged into the final dataset.
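The ZIP-code join in step 3 could be sketched roughly as follows. The column names (`respondent_id`, `zip`, `city`, `state_id`, `population`, `density`), the helper `normalize_zip`, and every value in the toy tables are assumptions made up for illustration; they are not the project's actual schema or data. The sketch keeps ZIP codes as zero-padded strings so that leading zeros survive the merge.

```python
import pandas as pd

# Hypothetical survey rows after geocoding; one ZIP has lost its leading zero
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "zip": ["90007", 2139, "99999"],
})

# Hypothetical slice of a census-style ZIP table (values are made up)
census = pd.DataFrame({
    "zip": ["90007", "02139"],
    "city": ["Los Angeles", "Cambridge"],
    "state_id": ["CA", "MA"],
    "population": [40000, 36000],
    "density": [5000.0, 7000.0],
})

def normalize_zip(value) -> str:
    """Return a zero-padded ZIP string, dropping any ZIP+4 suffix."""
    return str(value).strip().split("-")[0].zfill(5)

survey["zip"] = survey["zip"].map(normalize_zip)
census["zip"] = census["zip"].map(normalize_zip)

# A left join keeps every respondent; the indicator column flags
# respondents whose ZIP has no match in the census table
merged = survey.merge(census, on="zip", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

print(merged[["respondent_id", "zip", "city", "state_id"]])
print(f"{len(unmatched)} respondent(s) had no census match")
```

Counting unmatched rows in this way is one possible check on how many respondents drop out at the join stage.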

Data Sources

Sample Code: Geocoding with GeoPy

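from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Build the geocoder and rate limiter once, outside the loop, so the
# delay between requests is actually enforced across iterations
geolocator = Nominatim(user_agent="email@usc.edu")
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=5)

# Iterate through coordinates and reverse-geocode each one to a ZIP code
for value in df_coord['CombinedCoordinates']:
    location = reverse(value, language='en', exactly_one=True)
    zipcode = location.raw['address']['postcode']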
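A reverse-geocoding lookup can come back empty, and a returned address does not always include a postcode, so the direct `location.raw['address']['postcode']` access may raise an error on some rows. A minimal sketch of a guard is shown below. The helper name `extract_zip` is hypothetical, and the dictionaries are hand-written stand-ins for `location.raw`, not real API responses.

```python
from typing import Optional

def extract_zip(raw: Optional[dict]) -> Optional[str]:
    """Return a 5-digit ZIP from a Nominatim-style raw result, or None."""
    if not raw:
        return None
    postcode = raw.get("address", {}).get("postcode")
    if not postcode:
        return None
    # Drop any ZIP+4 suffix such as "90007-1234"
    base = str(postcode).split("-")[0].strip()
    return base if len(base) == 5 and base.isdigit() else None

# Hand-written dictionaries standing in for location.raw
print(extract_zip({"address": {"postcode": "90007-1234"}}))
print(extract_zip({"address": {"city": "Los Angeles"}}))
print(extract_zip(None))
```

In the loop above, this would be called as `extract_zip(location.raw if location else None)`, with rows that return `None` set aside for manual review or removal.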

Geographic Distribution

Survey responses were distributed across the United States, with the heaviest concentration in California, New York, and other major coastal states.

Donut chart showing distribution of Peloton customer responses by state

Distribution of Responses by State

A donut chart showing which states had the most survey respondents. California and New York dominate, reflecting Peloton's urban-coastal customer base.

Bar chart of Peloton customer responses in each state

Response Count by State (Descending)

The same distribution as a bar chart for precise comparison. The long tail reveals that Peloton has customers spread across nearly every state.
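The counts behind both state charts can be derived with a single frequency table. The sketch below assumes a `state` column in the merged dataset; the column name and the sample values are made up for illustration.

```python
import pandas as pd

# Made-up state labels standing in for the merged dataset's state column
df = pd.DataFrame({"state": ["CA", "NY", "CA", "TX", "CA", "NY", "WA"]})

# value_counts sorts in descending order by default, matching the bar chart
counts = df["state"].value_counts()

# Percentage share of each state, as a donut chart would display it
shares = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({"responses": counts, "share_pct": shares})
print(summary)
```

The same `summary` table could feed both chart types, which keeps the donut and bar views consistent with each other.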

Histogram of population distribution

Distribution by City Population

Most Peloton customers are located in cities with moderate population sizes, with a right-skewed distribution showing some customers in very large metro areas.

Churn Analysis

Customer churn — defined as a reduction in Peloton usage over the past 6 months — was the central variable of interest. The visualizations below explore churn patterns across geography and age.

Distribution of Peloton customer churn by state

Churn Distribution by State

Stacked bar chart showing churn (orange) vs. retained (blue) customers in each state. California has both the highest total responses and the highest absolute churn count.
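A stacked bar chart of this kind is typically built from a cross-tabulation of state against churn status. The sketch below assumes a `state` column and a binary `churned` column (1 = reduced usage); both names and all values are illustrative assumptions.

```python
import pandas as pd

# Made-up rows; churned = 1 means the respondent reduced usage
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "NY", "NY", "TX"],
    "churned": [1, 0, 1, 0, 1, 0],
})

# Rows are states, columns are retained vs. churned respondent counts
table = pd.crosstab(df["state"], df["churned"]).rename(
    columns={0: "retained", 1: "churned"}
)
table["total"] = table.sum(axis=1)
table["churn_rate"] = table["churned"] / table["total"]

# Sort by total responses so the largest states come first
table = table.sort_values("total", ascending=False)
print(table)
```

Adding a `churn_rate` column alongside the raw counts helps separate states that churn more from states that simply have more respondents.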

Distribution of Peloton customer churn by age

Churn Distribution by Age

Customer churn broken down by age group. Churn appears across all age brackets, but certain age groups show disproportionately higher churn rates.

Histogram of respondent age distribution

Age Distribution of Respondents

Histogram of the overall age distribution, providing context for interpreting the churn-by-age chart above. The majority of respondents fall in the 30-50 age range.
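One way to compare churn across age brackets while accounting for how many respondents fall in each bracket is to bin ages and compute a rate per bin. The bin edges, labels, column names, and data below are assumptions chosen for illustration, not the brackets used in the charts.

```python
import pandas as pd

# Made-up respondents; churned = 1 means the respondent reduced usage
df = pd.DataFrame({
    "age": [24, 29, 33, 38, 41, 47, 52, 58, 63],
    "churned": [1, 0, 0, 1, 0, 0, 1, 1, 1],
})

# Left-closed bins: [18, 30), [30, 40), [40, 50), [50, 60), [60, 100)
bins = [18, 30, 40, 50, 60, 100]
labels = ["18-29", "30-39", "40-49", "50-59", "60+"]
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)

# Respondent count and churn rate (mean of the 0/1 flag) per age group
by_age = df.groupby("age_group", observed=False)["churned"].agg(
    respondents="count", churn_rate="mean"
)
print(by_age)
```

Reading `churn_rate` next to `respondents` makes it easier to tell a genuinely high-churn bracket from one that only looks large because it contains most of the sample.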

Demographics & Correlations

To check for linear relationships among the numerical variables, a pair plot and a correlation matrix were generated for population, density, ZIP code, age, and income.

Pairplot of numerical variables

Pair Plot — Numerical Variables

Scatterplot matrix exploring relationships between population, density, ZIP code, age, and income. No strong linear relationships are apparent between these variables.

Correlation matrix heatmap

Correlation Matrix

A moderate correlation exists between population and density (0.52), along with a minor correlation between age and income. The matrix does not indicate a strong relationship among ZIP code, income, and age.
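A Pearson correlation matrix like the one described can be computed directly from the numeric columns. The sketch below uses made-up values and assumed column names, so its coefficients will not match the 0.52 reported here. ZIP code is left out of the sketch on the assumption that it acts as a label rather than a quantity.

```python
import numpy as np
import pandas as pd

# Made-up numeric features; income is in thousands of dollars
df = pd.DataFrame({
    "population": [1000, 5000, 20000, 50000, 100000, 250000],
    "density": [200, 900, 1500, 1200, 4000, 3500],
    "age": [25, 62, 34, 48, 29, 55],
    "income": [40, 85, 60, 120, 95, 70],
})

# Pairwise Pearson correlation coefficients
corr = df.corr(method="pearson")

# Keep only the upper triangle so each pair appears once, then rank
# the pairs by the absolute strength of their correlation
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=abs, ascending=False)

print(corr.round(2))
print(pairs.round(2))
```

Ranking the pairs by absolute value gives a quick list of which relationships deserve a closer look in the pair plot.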

Pivot table heatmap of age, income, and churn

Churn Heatmap — Age vs. Income

This pivot table heatmap reveals two distinct churn patterns: (1) strong churn at lower income and higher age, and (2) a milder churn pattern at higher income and younger age (~29.5 years).
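One plausible way to build such a table is to bin age and income and take the mean of a binary churn flag in each cell. Whether the original heatmap used churn rate or another aggregate is not stated, so the aggregation, bin edges, column names, and data below are all assumptions.

```python
import pandas as pd

# Made-up respondents; income is in thousands of dollars
df = pd.DataFrame({
    "age": [29, 30, 31, 62, 65, 68, 45, 47],
    "income": [120, 130, 110, 35, 40, 30, 80, 75],
    "churned": [1, 0, 1, 1, 1, 1, 0, 0],
})

# Right-closed bins, e.g. ages in (18, 35] fall in "18-35"
df["age_bin"] = pd.cut(
    df["age"], bins=[18, 35, 55, 100], labels=["18-35", "36-55", "56+"]
)
df["income_bin"] = pd.cut(
    df["income"], bins=[0, 50, 100, 200], labels=["<50k", "50-100k", "100k+"]
)

# Each cell holds the share of respondents in that bin pair who churned;
# cells with no respondents are NaN
heat = df.pivot_table(
    index="age_bin",
    columns="income_bin",
    values="churned",
    aggfunc="mean",
    observed=False,
)
print(heat.round(2))
```

The resulting table can be passed to a plotting function such as seaborn's `heatmap` to produce a chart of the kind described above.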

Key Findings

Two distinct churn profiles emerged from the analysis — suggesting that Peloton faces different retention challenges across customer segments.

Primary Findings

  1. Lower income + older age = higher churn. Customers with lower household income and higher age showed the strongest churn signal, potentially driven by price sensitivity or changing fitness priorities.
  2. Higher income + younger age (~29.5) = moderate churn. A secondary pattern appeared among younger, higher-income customers — possibly reflecting lifestyle changes or competition from alternative fitness options.
  3. No strong geographic predictor. While responses concentrated in coastal urban areas, churn was distributed across states without a clear geographic pattern beyond population density.
  4. Weak linear relationships. Traditional correlation analysis between numerical features (population, density, ZIP, age, income) did not reveal strong predictive signals, suggesting churn is influenced by behavioral factors not fully captured in this dataset.

Limitations

Tools & Technologies

Python 3.11, Pandas, NumPy, Matplotlib, Seaborn, GeoPy, Google Colab, Qualtrics