Delivery Performance & Customer Satisfaction in E-commerce

Dataset Context

This project uses a public dataset from a Brazilian e-commerce platform, made available through Kaggle, containing detailed information on over 100,000 orders placed between 2016 and 2018. The dataset includes:

  • Customer data (location, zip codes)

  • Order lifecycle (timestamps for purchase, approval, shipping, delivery)

  • Product and seller details (category, price, freight, dimensions)

  • Payments (value, type, installments)

  • Customer reviews (rating scores and comments)

The data is stored across 9 interrelated tables, making it ideal for practicing advanced SQL, performing end-to-end data analysis, and visualizing insights across multiple business dimensions.

Project Details

This project explores how delivery delays affect customer satisfaction, using real-world e-commerce data. I built a data pipeline in MySQL 8.0, applying CTEs, window functions, and index optimization to join and transform data from multiple tables, including orders, reviews, products, and sellers.
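As a rough illustration of the pipeline idea, the sketch below computes a per-order delay in a CTE, joins it to reviews, and compares average review scores for late and on-time orders. It is not the project's actual code. The table and column names (`orders`, `reviews`, `estimated_delivery`, `delivered_at`, `review_score`) and the sample rows are assumptions made up for the example, and Python's built-in `sqlite3` stands in for MySQL 8.0 so the snippet runs on its own.

```python
import sqlite3

# Hypothetical, simplified schema: all table and column names are assumptions,
# and SQLite stands in for MySQL 8.0 so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    estimated_delivery TEXT,   -- ISO date promised to the customer
    delivered_at TEXT          -- ISO date the order actually arrived
);
CREATE TABLE reviews (
    order_id TEXT,
    review_score INTEGER       -- 1 (worst) to 5 (best)
);
-- Index on the join key, the kind of optimization the text mentions.
CREATE INDEX idx_reviews_order_id ON reviews (order_id);

INSERT INTO orders VALUES
    ('o1', '2018-01-10', '2018-01-08'),
    ('o2', '2018-01-10', '2018-01-15'),
    ('o3', '2018-02-01', '2018-02-01'),
    ('o4', '2018-02-05', '2018-02-12');
INSERT INTO reviews VALUES ('o1', 5), ('o2', 2), ('o3', 4), ('o4', 1);
""")

query = """
WITH order_delays AS (
    SELECT
        order_id,
        -- Days past the estimate (negative means early). In MySQL this
        -- would be DATEDIFF(delivered_at, estimated_delivery).
        CAST(julianday(delivered_at) - julianday(estimated_delivery) AS INTEGER)
            AS delay_days
    FROM orders
    WHERE delivered_at IS NOT NULL
)
SELECT
    CASE WHEN d.delay_days > 0 THEN 'late' ELSE 'on_time' END AS delivery_status,
    COUNT(*) AS n_orders,
    AVG(r.review_score) AS avg_score
FROM order_delays AS d
JOIN reviews AS r ON r.order_id = d.order_id
GROUP BY delivery_status
ORDER BY delivery_status;
"""

score_by_status = conn.execute(query).fetchall()
for status, n_orders, avg_score in score_by_status:
    print(status, n_orders, avg_score)
```

On the four made-up orders, the two late deliveries average a score of 1.5 and the two on-time deliveries average 4.5. Keeping the delay calculation in its own CTE means the same intermediate result can be reused when grouping by seller, category, or region.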

The project focuses on:

  • Measuring delivery delays by seller, product category, and region

  • Analyzing the effect of delays on review scores and customer sentiment

  • Highlighting underperforming sellers and logistical bottlenecks
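One way to surface underperforming sellers is to aggregate delays per seller and rank the results with a window function. The sketch below is illustrative only. The `order_items` and `order_delays` tables, their columns, and the sample rows are hypothetical, and `sqlite3` (SQLite 3.25 or later, for window function support) is used in place of MySQL 8.0 to keep the example runnable.

```python
import sqlite3

# Hypothetical tables: order_items maps orders to sellers, and order_delays
# holds a precomputed delay in days per order (positive means late).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (order_id TEXT, seller_id TEXT);
CREATE TABLE order_delays (order_id TEXT, delay_days INTEGER);

INSERT INTO order_items VALUES
    ('o1', 's1'), ('o2', 's1'), ('o3', 's2'), ('o4', 's2'), ('o5', 's3');
INSERT INTO order_delays VALUES
    ('o1', 0), ('o2', 4), ('o3', 6), ('o4', 2), ('o5', -1);
""")

query = """
WITH seller_stats AS (
    SELECT
        i.seller_id,
        COUNT(*) AS n_orders,
        -- Share of orders delivered after the estimate.
        AVG(CASE WHEN d.delay_days > 0 THEN 1.0 ELSE 0.0 END) AS late_rate,
        AVG(d.delay_days) AS avg_delay_days
    FROM order_items AS i
    JOIN order_delays AS d ON d.order_id = i.order_id
    GROUP BY i.seller_id
)
SELECT
    seller_id,
    n_orders,
    late_rate,
    avg_delay_days,
    -- Rank 1 is the seller with the highest late rate; ties are broken
    -- by the larger average delay.
    RANK() OVER (ORDER BY late_rate DESC, avg_delay_days DESC) AS worst_rank
FROM seller_stats
ORDER BY worst_rank;
"""

seller_ranking = conn.execute(query).fetchall()
for row in seller_ranking:
    print(row)
```

With the sample rows, seller `s2` ranks worst (every order late), followed by `s1` and then `s3`. The same pattern applies to product category or region by swapping the grouping column, and in practice a minimum order count per seller would help avoid ranking sellers on one or two orders.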

Key tools used:

  • MySQL 8.0 (CTEs, window functions, index optimization)

This project shows how data can drive actionable decisions in logistics and customer experience. The full project is available on GitHub.