San Francisco housing construction history and associated data
This repository contains historical SF housing data and R scripts to graph that data. The data here was used to generate the graphs and analysis in the blog post "Employment, construction, and the cost of San Francisco apartments", and was recently used in a paper by Stanford researchers, "The Effects of Rent Control Expansion on Tenants, Landlords, and Inequality: Evidence from San Francisco".
Data for each year lives in a file named after the year. Files for later years may be named "craigslist-X", where X is the year.
You can extract the rents from a data file by running, for example:
./extract-craigslist craigslist-2016
Note that the data is not perfect. Here are some sample listings from the 2016 Craigslist data:
799000 Apr 29 Exceptional Pacific Heights TIC $799000 / 2br - (Pacific Heights) pic
800 Apr 29 Awesome 5 Bedroom Available $800 / 5br - 3895ft2 - (2483 N Smiderle, San Bernardino, CA) pic
99 Apr 29 Jr. 1 BD. Washer & Dryer in unit! $99 deposit $3425 / 1br - 550ft2 - (nob hill) pic map
(It's not clear if these prices have been stripped before generating the averages in the
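One way to screen out listings like the samples above is a simple range check on the price. The sketch below assumes the price is the first whitespace-separated field of each line, as in the samples. The function name and the default bounds (500 and 20000) are illustrative assumptions, not something this repository's scripts are known to use.

```shell
# Hypothetical helper: keep only lines whose first field is a plausible
# monthly rent. The default bounds are illustrative assumptions.
filter_plausible_rents() {
  min="${1:-500}"
  max="${2:-20000}"
  # Keep a line only if field 1 is all digits and lies within [min, max].
  awk -v min="$min" -v max="$max" \
    '$1 ~ /^[0-9]+$/ && $1 + 0 >= min && $1 + 0 <= max'
}
```

For example, `filter_plausible_rents < craigslist-2016` would drop the 799000 and 99 samples. It would still keep the 800 San Bernardino listing, so a range check alone does not catch out-of-area listings.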
You can combine the data sources by running the "combine" script:
./combine
This generates the "combined" file in this repository.
The charts in the blog post are generated by running the "model" script in this repository, on the
The "calc-medians" script computes the medians for each year in the file. It prints the median, 95th percentile, and 5th percentile for each year in the dataset. These values are stored in the "medians" file in this repository.
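To illustrate the kind of summary described above, here is a minimal sketch that computes the 5th percentile, median, and 95th percentile of a list of rents, one number per line on standard input. It uses the nearest-rank method, which is an assumption: the actual "calc-medians" script may define percentiles differently. The function name and output format are hypothetical.

```shell
# Hypothetical helper: print the 5th percentile, median, and 95th percentile
# of the numbers on stdin (one per line), using the nearest-rank method.
rent_percentiles() {
  sort -n | awk '
    # Nearest-rank index for percentile p (an integer percent): ceil(p * n / 100).
    function rank(p) { return int((p * NR + 99) / 100) }
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1   # no input, nothing to summarize
      print "p5=" v[rank(5)], "median=" v[rank(50)], "p95=" v[rank(95)]
    }'
}
```

With an even number of values, nearest-rank reports the lower of the two middle values as the median rather than their average.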
To get the Craigslist data, open the SF rentals page, select all and copy/paste the page's contents into a text file. Keep copying every page into the same text file until done. Save this file as craigslist-YYYY-MM.
All Craigslist files for a given year should be combined into one file, e.g.:
cat craigslist-2019-* > craigslist-2019
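If several years of monthly files have accumulated, the same step can be repeated for each year with a loop. The sketch below assumes monthly files are named craigslist-YYYY-MM with zero-padded months; the function name is hypothetical and not part of this repository.

```shell
# Hypothetical helper: rebuild each craigslist-YYYY file from the
# craigslist-YYYY-MM files in a directory (default: current directory).
combine_monthly_files() {
  dir="${1:-.}"
  prev=""
  for monthly in "$dir"/craigslist-[0-9][0-9][0-9][0-9]-[0-9][0-9]; do
    [ -e "$monthly" ] || continue        # glob matched nothing
    yearly="${monthly%-[0-9][0-9]}"      # strip the trailing -MM
    # Glob results are sorted, so files for one year are adjacent.
    # Truncate the yearly file the first time each year is seen.
    if [ "$yearly" != "$prev" ]; then
      : > "$yearly"
      prev="$yearly"
    fi
    cat "$monthly" >> "$yearly"
  done
}
```

Because each yearly file is truncated before it is rebuilt, running the function again does not duplicate data.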
After pulling in new data, recalculate the medians:
./calc-medians > medians