Many consumer real-estate products present a single forecast value. We instead estimate a distribution over future parcel values, because local housing markets are noisy, heterogeneous, and exposed to shocks that point forecasts compress into one number. Homecastr's pipeline produces probabilistic forecasts for over 150 million parcels.
1. Data and MLOps
At national scale, manual ETL does not hold up. Our pipeline ingests county assessment rolls, parcel geometries, and macro features from multiple public sources, including:
- Harris County Appraisal District (HCAD): Parcel-level appraisal records for the Houston metro area.
- Florida Dept. of Revenue (DOR): Statewide NAL/NAP/SDF property records across all 67 counties.
- NYC DOF RPAD: Assessment roll data for all five boroughs, joined with MapPLUTO geometries.
- TxGIO: Texas statewide property data from the Comptroller's office.
- American Community Survey (ACS): Census-tract-level demographic and housing features used for nationwide coverage.
- FRED Macroeconomic Series: Eight time series including 30-year mortgage rates, federal funds rate, CPI, 10-year treasury yield, oil prices, unemployment, VIX, and global economic policy uncertainty.
Transformed features are stored in PostGIS, Supabase, and Redis to support both batch training and low-latency serving.
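Before any of those sources land in PostGIS, each county's columns have to be mapped onto one canonical parcel schema. The sketch below shows that normalization step in miniature; the column mappings and field names are illustrative, not Homecastr's actual schema.

```python
# Illustrative schema-normalization step that precedes loading into PostGIS.
# Canonical fields and per-source column mappings are hypothetical examples.

CANONICAL_FIELDS = ["parcel_id", "living_area_sqft", "year_built", "latitude", "longitude"]

# Each source publishes the same concepts under different column names.
SOURCE_COLUMN_MAPS = {
    "hcad": {"acct": "parcel_id", "bld_ar": "living_area_sqft", "yr_impr": "year_built"},
    "fl_dor": {"PARCEL_ID": "parcel_id", "TOT_LVG_AR": "living_area_sqft", "ACT_YR_BLT": "year_built"},
}

def normalize_record(source: str, raw: dict) -> dict:
    """Map a raw county record onto the canonical schema, leaving gaps as None."""
    mapping = SOURCE_COLUMN_MAPS[source]
    out = {field: None for field in CANONICAL_FIELDS}
    for src_col, canonical in mapping.items():
        if src_col in raw:
            out[canonical] = raw[src_col]
    return out

row = normalize_record("hcad", {"acct": "0660640130020", "bld_ar": 2150, "yr_impr": 1978})
```

Keeping the mapping as data rather than code makes adding a new county a configuration change instead of a pipeline rewrite.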
2. Model Architecture
In our experiments, gradient-boosted baselines produced competitive point forecasts but did not capture multi-horizon uncertainty as well as a generative approach. Our current model combines an FT-Transformer tabular encoder (a self-attention model over heterogeneous features) with learned spatial tokens that summarize nearby parcel context, and a Schrödinger Bridge diffusion decoder for multi-horizon uncertainty.
The feature set spans several categories: structural attributes (living area, land area, year built, bedrooms, bathrooms, stories), locational coordinates (latitude, longitude), and macroeconomic indicators (mortgage rates, federal funds rate, CPI, oil prices, unemployment, VIX, 10-year treasury, global economic policy uncertainty). We use DDIM (Denoising Diffusion Implicit Models) sampling to generate percentile paths (P10, P50, P90) directly, with loss normalized independently at each forecast horizon, instead of fitting uncertainty after the point forecast is produced.
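The quantile bands themselves come from aggregating many sampled trajectories per horizon. Here is a minimal sketch of that aggregation, with a toy random walk standing in for the Schrödinger Bridge decoder's DDIM sampler; the drift and volatility numbers are made up for illustration.

```python
# Sketch: turn sampled forecast trajectories into P10/P50/P90 bands per horizon.
# `sample_path` is a toy stand-in for the diffusion decoder's DDIM sampler.
import random
import statistics

HORIZON_YEARS = 4

def sample_path(rng: random.Random, start_value: float) -> list[float]:
    """Toy stand-in: random-walk a parcel value forward one year at a time."""
    path, value = [], start_value
    for _ in range(HORIZON_YEARS):
        value *= 1.0 + rng.gauss(0.03, 0.08)  # illustrative drift and volatility
        path.append(value)
    return path

def percentile_bands(start_value: float, n_samples: int = 500, seed: int = 0) -> dict:
    rng = random.Random(seed)
    paths = [sample_path(rng, start_value) for _ in range(n_samples)]
    bands = {"p10": [], "p50": [], "p90": []}
    for h in range(HORIZON_YEARS):
        cuts = statistics.quantiles([p[h] for p in paths], n=10)  # 9 decile cut points
        bands["p10"].append(cuts[0])
        bands["p50"].append(cuts[4])
        bands["p90"].append(cuts[8])
    return bands

bands = percentile_bands(500_000.0)
```

Computing the band at each horizon independently is what lets the loss (and the calibration checks below) be evaluated per horizon rather than only at the end of the path.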
3. Evaluation
Skipping offline evaluation raises the risk of unnoticed performance drift. We evaluate each model candidate on held-out years across two axes: point-forecast accuracy against an industry baseline, and probabilistic calibration.
- 1-Year Error (vs. Zillow Baseline): Lower 1-year held-out error than the Zillow baseline on our test set.
- Long-Horizon Stability: Median absolute error remained roughly stable over a 4-year forecast horizon.
Beyond standard error metrics, all model candidates pass through a calibration suite before promotion to production: PIT (Probability Integral Transform) histograms to verify that forecast quantiles are uniformly distributed, empirical interval coverage checks (does the 80% band actually contain ~80% of outcomes?), and tail calibration tests to ensure the P10 and P90 bands are not systematically too narrow or too wide.
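The two cheapest of those checks fit in a few lines. The sketch below computes PIT values and empirical interval coverage on synthetic data, where the actuals are drawn from the same distribution as the forecast ensemble, so a well-behaved implementation should show near-uniform PIT and roughly 80% coverage for the P10–P90 band.

```python
# Sketch of PIT and interval-coverage checks on synthetic forecasts.
import random
import statistics

def pit_value(samples: list[float], actual: float) -> float:
    """Empirical CDF of the forecast ensemble evaluated at the realized value."""
    return sum(s <= actual for s in samples) / len(samples)

def interval_coverage(lows, highs, actuals) -> float:
    """Fraction of outcomes that land inside their forecast band."""
    hits = sum(lo <= a <= hi for lo, hi, a in zip(lows, highs, actuals))
    return hits / len(actuals)

rng = random.Random(1)
pits, lows, highs, actuals = [], [], [], []
for _ in range(2000):
    samples = sorted(rng.gauss(0, 1) for _ in range(200))  # forecast ensemble
    actual = rng.gauss(0, 1)                               # same distribution
    pits.append(pit_value(samples, actual))
    cuts = statistics.quantiles(samples, n=10)
    lows.append(cuts[0])    # P10
    highs.append(cuts[8])   # P90
    actuals.append(actual)

coverage = interval_coverage(lows, highs, actuals)
```

In the real suite the ensembles come from the model and the actuals from held-out years, but the pass/fail logic is the same: coverage near the nominal level and PIT values spread uniformly over [0, 1].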
Backtest Coverage: NYC Upper West Side (ZCTA 10025)
In the backtest chart, each colored band shows the model's P10–P90 prediction from a given origin year, and the solid line shows the realized values. Well-calibrated forecasts should consistently cover the actuals across vintages.
4. Explainable Forecasts
Accuracy alone is not enough if users cannot understand why a forecast changed. We run a post-hoc attribution step we call Surrogate SHAP: a gradient-based method that extracts approximate local feature attributions from the FT-Transformer layers and maps them to readable variables (e.g., interest rates, local zoning changes, demographic shifts).
These attributions feed the "Explainable Forecasts" UI on the platform, giving users per-forecast breakdowns of which inputs contributed most to the predicted trajectory.
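The core idea behind the attribution step, gradient times input, can be shown in a few lines. The sketch below uses a toy linear model and finite-difference gradients in place of the FT-Transformer and its backward pass; the feature names and coefficients are invented for illustration.

```python
# Sketch of gradient-x-input attribution. `predict` is a toy stand-in
# for the FT-Transformer; its coefficients are illustrative only.
def predict(features: dict) -> float:
    # Toy model: price responds positively to area, negatively to mortgage rates.
    return 300_000 + 120 * features["living_area_sqft"] - 25_000 * features["mortgage_rate"]

def grad_x_input(features: dict, eps: float = 1e-4) -> dict:
    """Finite-difference gradient of the prediction, scaled by each input value."""
    base = predict(features)
    attributions = {}
    for name, value in features.items():
        bumped = dict(features, **{name: value + eps})
        grad = (predict(bumped) - base) / eps
        attributions[name] = grad * value
    return attributions

attr = grad_x_input({"living_area_sqft": 2000.0, "mortgage_rate": 6.5})
```

The sign and magnitude of each attribution are what the UI surfaces: here the toy model attributes upward pressure to living area and downward pressure to mortgage rates.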
5. Serving and Cost
A model is less useful if inference cost or latency is too high for production. Once the offline pipeline completes, we split the workload into deterministic shards and run them in parallel to produce the full set of probabilistic arrays.
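Deterministic sharding can be sketched as hashing each parcel ID into a fixed shard slot, so the same parcel always lands in the same shard and partial retries are reproducible. The shard count below is illustrative.

```python
# Sketch of deterministic workload sharding: a stable hash of the parcel ID
# picks the shard, so reruns and retries touch the same partitions.
import hashlib

N_SHARDS = 64  # illustrative; the real count would be tuned to the fleet

def shard_for(parcel_id: str, n_shards: int = N_SHARDS) -> int:
    digest = hashlib.sha256(parcel_id.encode()).hexdigest()
    return int(digest, 16) % n_shards

def split_workload(parcel_ids, n_shards: int = N_SHARDS) -> list[list[str]]:
    shards = [[] for _ in range(n_shards)]
    for pid in parcel_ids:
        shards[shard_for(pid, n_shards)].append(pid)
    return shards

shards = split_workload([f"parcel-{i}" for i in range(10_000)])
```

Using a content hash rather than, say, insertion order means adding or removing parcels does not reshuffle unrelated shards.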
On the frontend, we use a three-tier server-side cache (Supabase, GCS, and Google APIs) with localized LRU caches for Street View images. This reduces our effective image cost to $0.007 per image and keeps typical render times below one second on tested mobile devices.
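The innermost tier of that stack, the per-process LRU in front of the remote image fetch, looks roughly like the sketch below. The capacity and keys are illustrative, not our production values.

```python
# Minimal in-process LRU cache, the innermost tier before a remote fetch.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("img-a", b"...")
cache.put("img-b", b"...")
cache.get("img-a")          # touch img-a, so img-b becomes least recently used
cache.put("img-c", b"...")  # exceeds capacity: img-b is evicted
```

Every hit at this tier avoids both a GCS round trip and a billable Google API call, which is where the per-image cost savings come from.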
Interested in the engineering challenges behind Homecastr? Feel free to connect on LinkedIn.
