Colorado Solar Sales Booster
Shane Rodgers
Graduate Project, 3.11.25

Overview

This project uses Python and simple machine learning to identify high-potential ZIP codes for solar panel sales in Colorado. It demonstrates how data-driven targeting can improve marketing efficiency.

Project Description

The Colorado Solar Sales Booster project:
• Analyzes solar irradiance, home ownership, and property values across multiple regions
• Uses a decision tree model to predict solar potential
• Creates visualizations to guide marketing strategy
• Identifies specific ZIP codes and regions for targeted campaigns

Running the Jupyter Notebook

To run this project:
1. Install required packages:
   pip install pandas numpy matplotlib seaborn scikit-learn
2. Create a new Jupyter notebook or open the provided Colorado_Solar_Sales_Booster.ipynb
3. Run the cells sequentially to generate the analysis and visualizations

Project Results

Running this notebook will produce:
1. A data analysis of Colorado ZIP codes and their solar potential
2. Feature importance analysis showing which factors most affect solar potential
3. Decision tree visualization explaining the prediction process
4. Regional comparison across key metrics
5. Identification of top ZIP codes for targeted marketing
6. Saved PNG files of all visualizations for use in presentations

Code Sample

Here's a glimpse of the code used in this project:

import random

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import tree

# Plot defaults used by all visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)


def create_sample_data(n_samples=30):
    """Create a sample data file with Colorado ZIP codes for demonstration"""
    # Set random seed for reproducibility
    random.seed(42)
    np.random.seed(42)

    regions = ["Denver Metro", "Front Range", "Western Slope"]
    data = []

    # Generate sample data, cycling through the three regions
    for i in range(n_samples):
        region = regions[i % 3]
        if region == "Denver Metro":
            solar = round(random.uniform(5.3, 5.7), 1)
            owner = round(random.uniform(45, 68), 1)
            home_value = random.randint(400000, 900000)
        elif region == "Front Range":
            solar = round(random.uniform(5.5, 5.9), 1)
            owner = round(random.uniform(60, 80), 1)
            home_value = random.randint(350000, 750000)
        else:  # Western Slope
            solar = round(random.uniform(5.8, 6.2), 1)
            owner = round(random.uniform(65, 85), 1)
            home_value = random.randint(300000, 650000)

        zip_code = f"80{random.randint(100, 999)}"

        # Percentages and dollar values are stored as formatted strings
        data.append({
            "Zip_Code": zip_code,
            "Region": region,
            "Solar_Irradiance": solar,
            "Owner_Occupied_%": f"{owner}%",
            "Median_Home_Value": f"${home_value:,}"
        })

    # Create dataframe
    df = pd.DataFrame(data)
    return df
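
Because create_sample_data() stores the owner-occupied share and home value as formatted strings ("62.5%", "$450,000"), those columns have to be parsed into numbers before a decision tree can use them. The snippet below is a minimal sketch of that step followed by a feature importance check, not the notebook's actual code. The helper clean_sample_data, the hand-written rows, and the Solar_Potential target are all assumptions made for illustration; this README does not say how solar potential is defined.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

FEATURES = ["Solar_Irradiance", "Owner_Occupied_%", "Median_Home_Value"]


def clean_sample_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the formatted string columns parsed into floats."""
    out = df.copy()
    # "62.5%" -> 62.5
    out["Owner_Occupied_%"] = out["Owner_Occupied_%"].str.rstrip("%").astype(float)
    # "$450,000" -> 450000.0
    out["Median_Home_Value"] = (
        out["Median_Home_Value"].str.replace(r"[$,]", "", regex=True).astype(float)
    )
    return out


# Hand-written rows in the same format as create_sample_data() output.
# The values are placeholders made up for illustration, not real Colorado data.
raw = pd.DataFrame({
    "Zip_Code": ["80101", "80202", "80303", "80404", "80505", "80606"],
    "Region": ["Denver Metro", "Front Range", "Western Slope"] * 2,
    "Solar_Irradiance": [5.4, 5.7, 6.0, 5.6, 5.8, 6.1],
    "Owner_Occupied_%": ["52.0%", "71.5%", "80.2%", "60.3%", "66.0%", "74.8%"],
    "Median_Home_Value": ["$650,000", "$480,000", "$390,000",
                          "$720,000", "$510,000", "$420,000"],
})
clean = clean_sample_data(raw)

# ASSUMPTION: hypothetical target. The README does not define "solar potential",
# so this sketch uses irradiance scaled by the owner-occupied share.
clean["Solar_Potential"] = (
    clean["Solar_Irradiance"] * clean["Owner_Occupied_%"] / 100
)

# Shallow tree with a fixed seed so the fit is reproducible
model = DecisionTreeRegressor(max_depth=3, random_state=42)
model.fit(clean[FEATURES], clean["Solar_Potential"])

# Share of the tree's error reduction attributed to each feature
importances = pd.Series(model.feature_importances_, index=FEATURES)
print(importances.sort_values(ascending=False))
```

Since the placeholder target is computed from two of the features, the importances here only reflect that formula. With real data, the target column would be swapped for a measured outcome and the model evaluated on a held-out split, for example with train_test_split and r2_score from the imports above.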

Notes for Presentation

This project is designed for a PowerPoint presentation that focuses on:
• How data-driven targeting can improve marketing efficiency
• Which factors matter most for solar potential
• Regional differences across Colorado
• Specific ZIP codes to target for marketing campaigns

The visualizations generated by this notebook should be incorporated into your presentation for maximum impact.

Requirements
• Python 3.7+
• pandas
• numpy
• matplotlib
• seaborn
• scikit-learn
• Jupyter Notebook

Author

Shane Rodgers

License

This project is licensed under the MIT License - see the LICENSE file for details.