We develop a framework to conduct experiments for estimating direct and spillover effects when units are grouped into mutually exclusive clusters. Crucially, our framework accounts for heterogeneous treatment effects across clusters and heterogeneous cluster sizes, which are pervasive in empirical settings but typically ignored in experimental design. We show that failing to account for cluster heterogeneity in experimental design can lead researchers to severely overestimate power and underestimate minimum detectable effects. We study the large-sample behavior of OLS estimators for direct and spillover effects with heterogeneous clusters and use our results to derive simple formulas for calculating power, minimum detectable effects, and optimal cluster assignment probabilities. We also set up a potential outcomes framework that justifies interpreting OLS estimands as causal effects. We apply our methods to design a large-scale experiment to estimate the spillover effects of a communication campaign on property tax compliance. We find an increase in tax compliance among individuals directly targeted with our mailing, as well as compliance spillovers on untreated individuals in street blocks with a high proportion of treated taxpayers.