A comprehensive analysis and interactive dashboard for e-commerce data, featuring both Jupyter notebook exploration and a professional Streamlit dashboard.
This project provides tools for analyzing e-commerce performance metrics including revenue trends, customer behavior, product performance, and geographic distribution of sales. The analysis examines data from 2022-2023 with the ability to filter by custom date ranges.
Project files:
- EDA_Refactored.ipynb - Comprehensive Jupyter notebook with exploratory data analysis
- dashboard.py - Interactive Streamlit dashboard with real-time filtering
- data_loader.py - Utility functions for loading and preparing datasets
- business_metrics.py - Functions for calculating key business metrics
- ecommerce_data/ - Raw CSV datasets directory
The Streamlit dashboard provides a professional, interactive interface with:
- Application title
- Global date range filter (applies to all visualizations)
- Total Revenue - Total sales with year-over-year growth trend
- Monthly Growth - Month-over-month revenue change percentage
- Average Order Value (AOV) - Mean transaction value with trend indicator
- Total Orders - Order volume with trend comparison
Each KPI card displays:
- Large, readable metric value
- Trend indicator (↑ for growth, ↓ for decline)
- Percentage change with color coding (green for positive, red for negative)
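The trend display described above can be sketched as a small pure-Python helper. The function name `format_trend` and the hex color values are illustrative assumptions, not taken from dashboard.py, and the neutral arrow for a zero change is also an assumption.

```python
def format_trend(pct_change: float) -> dict:
    """Return the arrow, color, and label for a KPI trend indicator.

    Hypothetical helper: the name and the color values are assumptions,
    not the ones used in dashboard.py.
    """
    if pct_change > 0:
        arrow, color = "↑", "#2e7d32"  # green for growth
    elif pct_change < 0:
        arrow, color = "↓", "#c62828"  # red for decline
    else:
        arrow, color = "→", "#616161"  # assumption: neutral gray when flat
    label = f"{arrow} {abs(pct_change):.2f}%"
    return {"arrow": arrow, "color": color, "label": label}


print(format_trend(-2.46)["label"])  # ↓ 2.46%
```

The returned color could then be interpolated into the custom HTML used for the KPI cards.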
- Revenue Trend Line Chart
  - Solid line for current period
  - Dashed line for previous period (comparison)
  - Grid lines for easier reading
  - Y-axis formatted as $300K, $2M, etc.
- Top 10 Categories Bar Chart
  - Horizontal bar chart sorted in descending order
  - Blue gradient coloring based on revenue amount
  - Values formatted as $300K, $2M
- Revenue by State (US Choropleth Map)
  - Color-coded by revenue amount
  - Blue gradient color scale
  - Interactive hover details
  - Albers USA projection for accurate representation
- Customer Satisfaction vs Delivery Time
  - Bar chart showing average review scores
  - Delivery time buckets (1-3 days, 4-7 days, etc.)
  - Demonstrates correlation between faster delivery and higher satisfaction
- Average Delivery Time
  - Large display of average days to delivery
  - Trend indicator showing improvement/decline vs previous period
  - Color-coded trend arrow
- Review Score
  - Large average review score (out of 5.0)
  - Star rating display
  - Subtitle: "Average Review Score"
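The compact dollar labels mentioned for the chart axes and bar values ($300K, $2M) could be produced by a formatter along these lines. This is a minimal sketch; the name `format_currency_short` is hypothetical, and dashboard.py may format its labels differently (for example through Plotly's own tick formatting).

```python
def format_currency_short(value: float) -> str:
    """Format a dollar amount compactly, e.g. 300000 -> "$300K"."""
    sign = "-" if value < 0 else ""
    amount = abs(value)
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if amount >= threshold:
            scaled = amount / threshold
            # Keep one decimal only when it is informative: 2.5M, but 2M rather than 2.0M
            text = f"{scaled:.1f}".rstrip("0").rstrip(".")
            return f"{sign}${text}{suffix}"
    # Values under 1,000 are shown as whole dollars
    return f"{sign}${amount:,.0f}"


print(format_currency_short(300_000))    # $300K
print(format_currency_short(2_000_000))  # $2M
```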
orders_dataset.csv
- order_id - Unique identifier for each order
- customer_id - Identifier linking to the customer
- order_status - Current status (canceled, delivered, pending, processing, shipped, returned)
- order_purchase_timestamp - Date and time order was placed
- order_delivered_customer_date - Date order was delivered

order_items_dataset.csv
- order_id - Links to orders table
- product_id - Links to products table
- price - Price of individual item
- freight_value - Shipping cost for item

products_dataset.csv
- product_id - Unique product identifier
- product_category_name - Product category
- product_description_length - Description length

customers_dataset.csv
- customer_id - Unique customer identifier
- customer_state - State location (US states)
- customer_city - City location

order_reviews_dataset.csv
- review_id - Unique review identifier
- order_id - Links to orders table
- review_score - Rating from 1-5
- review_creation_date - When review was posted
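The key columns listed above (order_id, product_id, customer_id) suggest how the tables join. The sketch below shows one way to load and combine them with pandas. The function names `load_datasets` and `build_sales_table` are hypothetical and are not the actual API of data_loader.py; only the file and column names come from the schema above.

```python
from pathlib import Path

import pandas as pd

# File names come from the schema above; the short keys are illustrative.
DATASET_FILES = {
    "orders": "orders_dataset.csv",
    "order_items": "order_items_dataset.csv",
    "products": "products_dataset.csv",
    "customers": "customers_dataset.csv",
    "reviews": "order_reviews_dataset.csv",
}


def load_datasets(data_dir: str = "ecommerce_data") -> dict:
    """Read every expected CSV that exists in data_dir, keyed by short name."""
    frames = {}
    for name, filename in DATASET_FILES.items():
        path = Path(data_dir) / filename
        if path.exists():
            frames[name] = pd.read_csv(path)
    return frames


def build_sales_table(frames: dict) -> pd.DataFrame:
    """Join order items to orders, products, and customers on their shared keys."""
    sales = frames["order_items"].merge(frames["orders"], on="order_id", how="left")
    sales = sales.merge(frames["products"], on="product_id", how="left")
    sales = sales.merge(frames["customers"], on="customer_id", how="left")
    # Parse the purchase timestamp so rows can be filtered and grouped by date
    sales["order_purchase_timestamp"] = pd.to_datetime(sales["order_purchase_timestamp"])
    return sales
```

Each row of the resulting table is one order item, so summing price over it gives item-level revenue. Reviews are left out of this join because they attach to orders rather than to individual items.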
- Clone or download the repository
- Install dependencies:
  pip install -r requirements.txt
- Ensure data files are in place:
  - Place CSV files in the ecommerce_data/ directory
Run the dashboard:
streamlit run dashboard.py
The dashboard will open in your browser at http://localhost:8501. Use the date range filter in the header to dynamically update all visualizations.
Run the notebook:
jupyter notebook EDA_Refactored.ipynb
The notebook provides detailed exploratory analysis, with configuration parameters at the top for adjusting the analysis period.
The dashboard calculates and displays:
- Total Revenue - Sum of all order values
- Revenue Growth - Year-over-year percentage change
- Average Order Value (AOV) - Mean revenue per order
- Total Orders - Count of delivered orders
- Average Delivery Speed - Days from order to delivery
- Average Review Score - Customer satisfaction rating (1-5 scale)
- Revenue by Category - Sales distribution across product categories
- Revenue by State - Geographic sales distribution
- Delivery Time vs Satisfaction - Correlation analysis
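As an illustration of the delivery time vs satisfaction metric, the sketch below derives delivery days from the two order timestamps, groups them into buckets, and averages the review score per bucket. The function name and the exact bucket edges are assumptions; the README only names "1-3 days, 4-7 days, etc.", and business_metrics.py may compute this differently.

```python
import pandas as pd


def review_score_by_delivery_bucket(orders: pd.DataFrame, reviews: pd.DataFrame) -> pd.Series:
    """Average review score per delivery-time bucket (hypothetical helper)."""
    df = orders.merge(reviews[["order_id", "review_score"]], on="order_id", how="inner")
    purchased = pd.to_datetime(df["order_purchase_timestamp"])
    delivered = pd.to_datetime(df["order_delivered_customer_date"])
    # Whole days elapsed between purchase and delivery
    df["delivery_days"] = (delivered - purchased).dt.days
    # Bucket edges are an assumption. Bins are right-inclusive, so (0, 3] covers
    # 1-3 days; undelivered orders and same-day deliveries fall outside and are dropped.
    df["bucket"] = pd.cut(
        df["delivery_days"],
        bins=[0, 3, 7, float("inf")],
        labels=["1-3 days", "4-7 days", "8+ days"],
    )
    return df.groupby("bucket", observed=True)["review_score"].mean()
```

The resulting Series can be passed to a bar chart directly, with the bucket labels on one axis and the mean score on the other.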
Built with:
- Streamlit - Interactive dashboard framework
- Plotly - Interactive charting library
- Pandas - Data manipulation and analysis
- NumPy - Numerical computing
- Matplotlib - Static visualization (EDA notebook)
Dashboard design:
- Professional card-based layout
- Consistent color scheme (blue primary, red/green for trends)
- Responsive design that adapts to different screen sizes
- Custom HTML/CSS for KPI cards
- Plotly's built-in interactivity for charts
Based on 2022-2023 data analysis:
- Revenue Performance - Slight decline of 2.46% year-over-year
- Order Volume - 2.40% decrease in order count
- AOV Stability - Average order value remained stable at ~$725
- Customer Satisfaction - Consistent 4.1/5.0 average review score
- Delivery Performance - Maintained ~8 days average delivery
- Geographic Distribution - Sales across all US states with California leading
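The percentage figures above are period-over-period changes. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the revenue numbers are illustrative values chosen to reproduce a 2.46% decline, not figures from the dataset.

```python
def percent_change(current: float, previous: float) -> float:
    """Percentage change from previous to current, e.g. -2.46 for a 2.46% decline."""
    if previous == 0:
        raise ValueError("previous period value must be non-zero")
    return (current - previous) / previous * 100


def average_order_value(total_revenue: float, total_orders: int) -> float:
    """Mean revenue per order; 0.0 when there are no orders."""
    return total_revenue / total_orders if total_orders else 0.0


# Illustrative numbers only, not taken from the dataset
revenue_2022, revenue_2023 = 1_000_000.0, 975_400.0
print(f"{percent_change(revenue_2023, revenue_2022):.2f}%")  # -2.46%
```

Because AOV is revenue divided by order count, a 2.46% revenue decline alongside a 2.40% drop in orders implies an AOV change of roughly -0.06% (0.9754 / 0.9760), which is consistent with the stable AOV noted above.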
Use the date filters in the dashboard header to analyze any period within your dataset.
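For reference, a date range filter over the order data can be expressed as a boolean mask in pandas. This is a sketch under assumptions: the helper name `filter_by_date_range` is hypothetical, and the dashboard's own filtering may differ, for instance in how it treats the end date.

```python
import pandas as pd


def filter_by_date_range(df, start, end, column="order_purchase_timestamp"):
    """Return rows whose timestamp falls between start and end, including the whole end day."""
    timestamps = pd.to_datetime(df[column])
    start_ts = pd.Timestamp(start)
    # Use an exclusive upper bound at the midnight after `end` so the end day is fully included
    end_ts = pd.Timestamp(end).normalize() + pd.Timedelta(days=1)
    mask = (timestamps >= start_ts) & (timestamps < end_ts)
    return df.loc[mask]
```

For example, `filter_by_date_range(orders, "2023-01-01", "2023-01-31")` keeps an order placed at 18:30 on January 31 but drops one placed at midnight on February 1.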
Edit the color definitions in dashboard.py:
- COLOR_PRIMARY - Primary chart color
- Plotly colorscales for gradients
Use the calculate_* functions in business_metrics.py to add new KPIs or charts.
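As an example of what a new metric might look like, here is a hypothetical `calculate_cancellation_rate` that follows the calculate_* naming pattern. The signatures of the existing functions in business_metrics.py are not shown in this README, so the argument and return type here are assumptions; only the order_status column and its "canceled" value come from the dataset description.

```python
import pandas as pd


def calculate_cancellation_rate(orders: pd.DataFrame) -> float:
    """Percentage of orders whose order_status is 'canceled' (hypothetical new KPI)."""
    if orders.empty:
        return 0.0
    canceled = (orders["order_status"] == "canceled").sum()
    return float(canceled) / len(orders) * 100
```

Applying the function to the date-filtered orders table would keep the new KPI consistent with the global date range filter.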
Issue: Data not loading
- Ensure CSV files are in the ecommerce_data/ directory
- Check that filenames match exactly in data_loader.py
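A quick way to run the first check is to list which of the expected files are missing. This is a small standalone sketch: the file names come from the dataset description in this README, while the helper name `find_missing_files` is hypothetical and not part of data_loader.py.

```python
from pathlib import Path

# File names as listed in the dataset description
EXPECTED_FILES = [
    "orders_dataset.csv",
    "order_items_dataset.csv",
    "products_dataset.csv",
    "customers_dataset.csv",
    "order_reviews_dataset.csv",
]


def find_missing_files(data_dir: str = "ecommerce_data") -> list:
    """Return the expected CSV file names that are not present in data_dir."""
    directory = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = find_missing_files()
    print("All data files found" if not missing else f"Missing: {missing}")
```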
Issue: Charts not displaying
- Verify Plotly is installed: pip install --upgrade plotly
- Clear Streamlit cache: streamlit cache clear
Issue: Date filter not affecting charts
- Check that the date range is within your dataset's date boundaries
- Verify the data preparation filters in the prepare_data() function
Possible future enhancements:
- Customer segmentation analysis
- Predictive revenue forecasting
- Product performance drill-down
- Customer lifetime value calculations
- Inventory optimization insights
- Marketing attribution analysis
This project is provided as-is for educational and analytical purposes.
For questions or issues, refer to the inline documentation in:
- business_metrics.py - Metric calculation functions
- data_loader.py - Data loading and preparation
- dashboard.py - Dashboard structure and components