Optimizing Online Sports Retail Revenue

Vtantravahi
12 min readDec 16, 2023
Nike Vs Adidas Revenue (Source)

Problem statement: To optimize revenue by extracting insights and recommendations.

Objective:

The online sports retail industry is in the midst of a remarkable surge in growth. As e-commerce continues to transform the retail landscape,
businesses specializing in sports clothing have seized the opportunity to thrive in this dynamic environment. In this project,
we delve into the world of an online sports retail company, embarking on a journey to unravel the intricate data fabric that underlies its operations.

Data in hand:

Data type — Retail Data belonging to Product characteristics and Transactional data of Brands (Adidas & Nike)

Domain: Sports

There are total of 5 tables

Tables: Brands, Finance, Product, Reviews, Traffic on Products

Time period of data — 2018 to 2020.

Currency in Dollars.

Checking for data quality issues and Initial Data exploration results:

In all the tables of dataset, every row of product id is unique that means there are no duplicates.

Relation Between Tables

Brands table:

-- Count of Product IDs without a Brand Name
SELECT COUNT(*) as MissingBrandCount
FROM brands_v2
WHERE brand IS NULL OR brand = '';

-- Count of Products by Brand
SELECT brand, COUNT(DISTINCT product_id) as ProductCount
FROM brands_v2
WHERE brand IN ('Adidas', 'Nike')
GROUP BY brand;

-- Count of Products Unclassified
SELECT COUNT(DISTINCT product_id) as UnclassifiedProductCount
FROM brands_v2
WHERE brand NOT IN ('Adidas', 'Nike') OR brand IS NULL OR brand = '';
  1. 59 out of 3179 product ID’s don’t have a brand name corresponding to them, brand name is missing for these rows
  2. Data types are accurate.
  3. We have product data of Adidas and Nike brands.
  4. Adidas are selling 2575 different products and Nike is selling 545 different products.
  5. Rest goes unclassified.

Finance table:

-- Identifying Products with Missing Financial Details
SELECT product_id
FROM finance
WHERE listing_price IS NULL AND sale_price IS NULL AND discount IS NULL AND revenue IS NULL;

-- Checking for Redundant Entries
SELECT product_id, COUNT(*)
FROM finance
GROUP BY product_id
HAVING COUNT(*) > 1;
  1. For those same 59 out of 3178 products -> list pricing, sales price, discount, revenue details are also missing.
  2. Data types are accurate.
  3. Redundancy existed and handled accordingly.
import matplotlib.pyplot as plt
import seaborn as sns

# Load the finance data
finance_df = pd.read_csv('./finance.csv')

# Function to plot a Box plot for each numerical column
def plot_boxplots(df):
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns
for col in numeric_cols:
plt.figure(figsize=(10, 4))
sns.boxplot(x=df[col])
plt.title(f'Box plot of {col}')
plt.show()

# Plotting the box plots
plot_boxplots(finance_df)

4. Outliers exits in numerical columns and handled them accordingly with the help of Box plot.

Product Table:

  1. Data types are accurate.
  2. And for the same 59 products even the product name and description as well is missing.
  3. Product table is a big mess as the product name and description does not really help in giving a clear picture about the product a particular brand is selling.
  4. Sliced and diced the data, made transformations by creating the new columns such as brand type, model, purpose of product, gender categorization through a single column.

Reviews table:

  1. As usual for the same 59 products, the ratings and reviews are also not available.
  2. Reviews and ratings are organized in a random manner, categorized them as per our analysis.
  3. Data types are accurate.

Traffic on last visit data Table:

-- Count of Products Without Last Visited Date
SELECT COUNT(*) AS MissingLastVisitedCount
FROM traffic_v3
WHERE last_visited IS NULL;

-- Percentage of Products Missing Last Visited Date
SELECT (COUNT(*) / 3180.0) * 100 AS MissingLastVisitedPercentage
FROM traffic_v3
WHERE last_visited IS NULL;

-- Identify Product IDs with Missing Last Visited Date
SELECT product_id
FROM traffic_v3
WHERE last_visited IS NULL;

-- Cross-Referencing with Other Tables
SELECT t.product_id
FROM traffic_v3 t
LEFT JOIN brands_v2 b ON t.product_id = b.product_id
WHERE t.last_visited IS NULL AND b.product_id IS NULL;
  1. Out of 3180 products, 251 products doesn’t have their last visited date making 7.92% of data is missing.
  2. In the traffic table there is mix of missing data for last visited date, Irrespective of whether the data related to product ID’s exists or not (in previous tables) but for few of them last visited date is missing.
  3. From the observations found out that these are all same product ID’s(This is an important factor to check so that we can impute or filter out those rows from the data)
  4. All the 59 products which doesn’t have any data in the data set are same in all the tables, except in traffic table Because out of 59, 52 have last visited data but for 7 of them complete data is missing.

(At a high level, from the above observations, 2.2% of product data was missing in all the above tables)

Since the problem statement we are dealing with is revenue optimization can’t afford to take any wrong directions in imputing or filtering the data.

-- Create a Temporary Table with Product IDs to Delete
CREATE TEMPORARY TABLE temp_product_ids (product_id VARCHAR(255));

-- Delete from the brands table
DELETE FROM brands_v2
WHERE product_id IN (SELECT product_id FROM temp_product_ids);;

-- Delete from the finance table
DELETE FROM finance
WHERE product_id IN (SELECT product_id FROM temp_product_ids);;

-- Delete from the product information table
DELETE FROM info_v2
WHERE product_id IN (SELECT product_id FROM temp_product_ids);;

-- Delete from the reviews table
DELETE FROM reviews_v2
WHERE product_id IN (SELECT product_id FROM temp_product_ids);;

-- Delete from the traffic table
DELETE FROM traffic_v3
WHERE product_id IN (SELECT product_id FROM temp_product_ids);;
  1. But since data coming from these particular 59 Products Does not really add any value on to the Analysis, can confidently filter them out.
  2. The reason behind it is, if a certain product id does not hold any characteristics and no transactional data without any classification, keeping them alive in the date can cause a lot of mishaps and can lead us giving wrong results.
  3. So in order to avoid all those, we have filtered those 59 product ID’s from the analysis.

Insights and findings

Insights and Findings (Source)

Identifying gaps in our data is the first step to ensuring data quality and accuracy.

  1. Explored data completely and filtered out the unnecessary data from data set.
  2. Outliers are identified and categorized accordingly.
  3. Text transformations are done, date columns are transformed and organized, transformed few integer columns to make them accurate and binned the columns in a channelized way.
  4. Nulls and missing data are handled.

Amongst the products for which the entire data set absent apart from last visited date:
i. 26 of them belong to 2018 as their last visited date.

ii. 19 products belong to 2019 as their last visited data

iii. 7 belongs to 2020 as their last visited date
iv. And for 7 of them complete data is empty, so we can remove these from the data.

sing a set of subqueries filtered out the product ID’s which actually doesn’t add any value to the data.

Questions Asked:

  1. Nike vs. Adidas Pricing:
-- Aggregate Revenue by Brand
SELECT b.brand, SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
WHERE b.brand IN ('Adidas', 'Nike')
GROUP BY b.brand;

-- Calculate Revenue Share
WITH BrandRevenue AS (
SELECT b.brand, SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
WHERE b.brand IN ('Adidas', 'Nike')
GROUP BY b.brand
),
TotalRevenue AS (
SELECT SUM(TotalRevenue) AS Total
FROM BrandRevenue
)
SELECT br.brand, br.TotalRevenue, (br.TotalRevenue / tr.Total) * 100 AS RevenueShare
FROM BrandRevenue br, TotalRevenue tr;

Exploring how the pricing strategies of two major brands, Nike and Adidas, compare and influence revenue.

From the perspective of Revenue market share:

  1. Adidas holds — 89% of revenue share
  2. Nike holds — 11% of revenue share

This is because of a simple fact that Adidas sells 2575 products and Nike sells only 545 products, making Adidas the market leader while comparing both the brands both in terms of revenue and quantity sold so far.

2. Pricing Strategies:

-- Adidas with 0% Discount and Its Revenue
SELECT SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
WHERE b.brand = 'Adidas' AND f.discount = 0;

-- Adidas with 0.5% Discount, Its Sales and Revenue
SELECT SUM(f.revenue) AS TotalRevenue, COUNT(*) AS TotalSales
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
WHERE b.brand = 'Adidas' AND f.discount = 0.5
GROUP BY f.discount;

-- Nike’s Revenue by Its Single Discount Strategy
SELECT f.discount, SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
WHERE b.brand = 'Nike'
GROUP BY f.discount;

i. Adidas with 0 percent discount achieved the highest revenue, with average selling of its goods
ii. But Adidas with 0.5 percent discount has achieved a highest sales, with second highest revenue
iii. Nike introduced only single discount and the revenue so far generated is by the same discount strategy.

Labeling Price Ranges: Categorizing products into price ranges for a better understanding of customer preferences.

SELECT 
brands_v2.brand,
CASE
WHEN brands_v2.brand = 'Adidas' AND sale_price BETWEEN 21 AND 40 THEN 'Adidas: 21$-40$'
WHEN brands_v2.brand = 'Adidas' AND sale_price BETWEEN 41 AND 60 THEN 'Adidas: 41$-60$'
WHEN brands_v2.brand = 'Adidas' AND sale_price BETWEEN 61 AND 80 THEN 'Adidas: 61$-80$'
WHEN brands_v2.brand = 'Nike' AND sale_price BETWEEN 101 AND 149 THEN 'Nike: 101$-149$'
WHEN brands_v2.brand = 'Nike' AND sale_price > 149 THEN 'Nike: 149$+'
ELSE 'Other'
END AS PriceRange,
SUM(revenue) AS TotalRevenue,
COUNT(*) AS TotalSales
FROM finance
JOIN brands_v2 ON finance.product_id = brands_v2.product_id
WHERE brands_v2.brand IN ('Adidas', 'Nike')
GROUP BY brands_v2.brand, PriceRange;

3. Pricing Categorization:

1. Adidas with products that fit into the category of 21$-40$ as the sale price generated the highest revenue with highest goods sold as well.
2. And Adidas Products that fit into category of 41$-60$ and 61$-80$ have a liner difference in terms of revenue and quantity sold.
3. While coming to Nike, the products that fit into category of 101$-149$ generates more revenue as well as highest quantity sold.
4. And products with 149$ + as their sale price generates second most highest in terms of revenue and quantity sold.

With this there is a point vital that, Adidas is making the money on the products with Average Sale price, while Nike starting sale price is way much higher than Adidas.

-- Count and Revenue of Products Priced Above $149 (Outliers) for Each Brand
SELECT
brands_v2.brand,
COUNT(*) AS OutlierProductCount,
SUM(finance.revenue) AS OutlierRevenue
FROM finance
JOIN brands_v2 ON finance.product_id = brands_v2.product_id
WHERE sale_price > 149
GROUP BY brands_v2.brand;

-- Total Revenue and Market Share of Outliers
WITH TotalRevenue AS (
SELECT
brands_v2.brand,
SUM(finance.revenue) AS BrandRevenue
FROM finance
JOIN brands_v2 ON finance.product_id = brands_v2.product_id
GROUP BY brands_v2.brand
), OutlierRevenue AS (
SELECT
brands_v2.brand,
SUM(finance.revenue) AS OutlierRevenue
FROM finance
JOIN brands_v2 ON finance.product_id = brands_v2.product_id
WHERE sale_price > 149
GROUP BY brands_v2.brand
)
SELECT
tr.brand,
orr.OutlierRevenue,
tr.BrandRevenue,
(orr.OutlierRevenue / tr.BrandRevenue) * 100 AS OutlierRevenueShare
FROM TotalRevenue tr
JOIN OutlierRevenue orr ON tr.brand = orr.brand;
  1. The sale price greater than 149 are considered as outliers
  2. Initially I had the thought of removing those outliers, but after checking on the correlation of sale price with revenue and count of products it holds, resisted the thought.
  3. Because numerically there are 124 products that are considered as outliers and significantly they holds 10–15% of the market revenue share in each of the brands.

4. Average Discount by Brand:

Analyzing how discounts vary across different brands and their impact on sales.

SELECT 
brands_v2.brand,
AVG(finance.discount) * 100 AS AverageDiscountPercentage
FROM finance
JOIN brands_v2 ON finance.product_id = brands_v2.product_id
WHERE brands_v2.brand IN ('Adidas', 'Nike')
GROUP BY brands_v2.brand;
  1. Adidas -> 0.3%
  2. Nike -> 0%

These Values impact on Sales and Revenue was already discussed earlier.

Correlation between Revenue and Reviews: Investigating the relationship between customer reviews and revenue.

Rating category (General Perspective & Brand Specific):

  1. Rating value fitting the criteria “2–4” generated 6591090 revenue with 128204 quantity sold by the market share of 53.47
  2. Rating value fitting the criteria “4+” generated 4448937 revenue with 83232 quantity sold by the market share of 36.09
  3. Rating value fitting the criteria 0–2 generated 1287354 revenue with 27849 quantity sold by the market share of 10.44
Rating Revenue Results
Aggregated Brand Rating Revenue

Reviews category (General Perspective):

  1. Review fitting the criteria “51–80” mean which has got 51–80 reviews as a product generated 4600147 revenue with the market share of 37.31
  2. Review fitting the criteria “81–100” generated 3986071 revenue with the market share of 32.33
  3. Review fitting the criteria “21–50” generated 2814547 revenue with the market share of 22.83
Review Category Revenue
Aggregated Review Category Results

5. Reviews by month / year vs Brand:

Unveiling trends in reviews over time, coupled with brand-specific insights.

6. Clothing and Product Performance:

Evaluating the performance of clothing and Footwear products.

Nike Product Performance:

-- Nike Apparel Products Performance
SELECT
COUNT(*) AS TotalProductsSold,
SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
JOIN info_v2 i ON f.product_id = i.product_id
WHERE b.brand = 'Nike' AND i.category = 'Apparel';

-- Nike Clothing Products Performance
SELECT
COUNT(*) AS TotalProductsSold,
SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
JOIN info_v2 i ON f.product_id = i.product_id
WHERE b.brand = 'Nike' AND i.category = 'Clothing';

After evaluating the performance of Clothing and Footwear products for Nike as well the results concluded that.

Nike with apparel products generated a revenue of 438997 dollars with 456 products being sold.

Nike with clothing products generated a revenue of 1023 dollars by selling close to 4 products.

And In Specific the product / product categories that generated a good amount revenue for Nike and Adidas follows as:

  1. Nike sells most of its products under Men section and drilling down through Men section on the specific products Air Jordan Retro, Air Jordan Rester, Air Max and Nike Air Force version generated good amount revenue. While this stays for Men section there are very few options for women and among them Nike Day break, Nike Benassi, Nike Flessi generated good revenue.
  2. And now coming on to the Adidas products, Adidas has equal products to both Men and Women and under Men section and while drilling down on it the categories of Outdoor apparel, Originals, Indoor, Training titled category products generated a healthy revenue.

Below are the Products / Product Categories that went unsold and generated zero revenue for Nike as a product.

These Products have not contributed to Nike’s Revenue in any way and since they have variants in the product size and model, product name has been repeated multiple times.

SELECT 
i.product_name,
SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
JOIN info_v2 i ON f.product_id = i.product_id
WHERE b.brand = 'Nike'
GROUP BY i.product_name
HAVING SUM(f.revenue) = 0
ORDER BY i.product_name;
output 

Air Jordan 1 Jester XX Low Laced SE, Air Jordan 1 Mid, Air Jordan 11 Low,....,
NikeCourt Air Max Vapor Wing Premium, NikeCourt Blanc, NikeCourt Lite 2, NikeCourt Royale SL, PG 4 EP, Zoom LeBron 3 QS

len = 187 products

Similarly, Products that generated zero revenue and went unsold for Adidas as product:

SELECT 
i.product_name,
SUM(f.revenue) AS TotalRevenue
FROM finance f
JOIN brands_v2 b ON f.product_id = b.product_id
JOIN info_v2 i ON f.product_id = i.product_id
WHERE b.brand = 'Adidas'
GROUP BY i.product_name
HAVING SUM(f.revenue) = 0
ORDER BY i.product_name;
Output

Men’s adidas Running Puaro Shoes, MEN’S ADIDAS RUNNING PUARO SHOES,..........,
Men’s adidas Originals Pusha T Ozweego Shoes.

len = 25 products

Based on our in-depth data analysis, we present key recommendations for Adidas and Nike to facilitate a smooth transition into the e-commerce sector. These insights are designed to enhance their digital strategy, improve customer engagement, and boost online revenue.

Revenue Optimization Recommendations for Adidas and Nike

For Adidas:

  1. Focus on High-Revenue Product Categories: Adidas should capitalize on its high-revenue generating product categories, investing in new versions to sustain and grow its market share.
  2. Product Rationalization: It’s advisable for Adidas to reduce or halt the production of underperforming products. This strategic move will prevent resource wastage in warehousing, production, distribution, and supply chain management.
  3. Optimize Discount Strategies: With a current operating discount of 0.5%, Adidas could slightly increase discounts on the right product categories to boost sales volumes, thereby enhancing brand perception and market influence.

For Nike:

  1. Expand Product Accessibility: Despite healthy revenues, Nike faces low sales volumes in certain categories. To address this, Nike should consider reducing prices and introducing a starting discount of around 0.3%. This strategy could make their products more accessible and improve brand reach.
  2. Leverage Seasonal Trends: Both Nike and Adidas experience sales spikes during September to November. Capitalizing on this trend with additional discounts and boosted production during these months can significantly enhance revenue.

General Strategies for Both Brands:

  1. Emphasize Well-Rated Products: Products with ratings between “2–4” and “4+” should be prioritized in production and marketing efforts as they are likely to sell more.
  2. Focus on Highly Reviewed Products: Products that garner reviews in the “51–80” and “80–100” range tend to have a higher revenue share. Focusing on these products could result in increased sales and revenue.

If you enjoyed the journey follow me on medium for more like this.

want to collaborate? connect me through Linkedin.

--

--

Vtantravahi

👋Greetings, I am Venkatesh Tantravahi, your friendly tech wizard. By day, I am a grad student in CIS at SUNY, by night a data nerd turning ☕️🧑‍💻 and 😴📝