Overview

Housing price prediction is critical for buyers, sellers, lenders, and urban planners.
In this project, we developed multiple predictive models to estimate housing price per unit area in Taipei City, Taiwan.

The goal was not only to maximize prediction accuracy, but also to interpret which urban features most strongly influence real estate valuation.


Dataset

We used a publicly available dataset collected by I-Cheng Yeh (2018), containing:


Exploratory Data Analysis (EDA)

The report includes distribution plots for all major predictors (house age, MRT distance, store count, latitude/longitude) and for the response variable, price per unit area.

[Figure: EDA feature distributions]

Modeling Approaches

We implemented and compared four regression models:

1. Baseline Linear Regression

A multiple linear regression explained ~62.5% of price variance.
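As a rough illustration of how such a baseline is fit and scored, here is a minimal sketch using ordinary least squares in NumPy. The data is synthetic and the column names (`house_age`, `mrt_distance`, `n_stores`) are assumptions for illustration only; this is not the project's actual code or data, and the numbers it prints say nothing about the real results.

```python
import numpy as np

# Synthetic stand-in data (NOT the real dataset); column meanings are hypothetical.
rng = np.random.default_rng(0)
n = 400
house_age = rng.uniform(0, 40, n)
mrt_distance = rng.lognormal(mean=6.5, sigma=1.0, size=n)
n_stores = rng.integers(0, 11, n).astype(float)
X = np.column_stack([house_age, mrt_distance, n_stores])
price = (80.0 - 0.3 * house_age - 5.0 * np.log(mrt_distance)
         + 1.0 * n_stores + rng.normal(scale=5.0, size=n))

# Hold out the last 25% of rows for evaluation.
split = int(0.75 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = price[:split], price[split:]

def fit_ols(X, y):
    """Least-squares coefficients; element 0 is the intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

coef = fit_ols(X_train, y_train)
y_pred = predict_ols(coef, X_test)
test_rmse = rmse(y_test, y_pred)
test_r2 = r_squared(y_test, y_pred)
print(f"test RMSE = {test_rmse:.2f}, test R^2 = {test_r2:.3f}")
```

On any single evaluation set the two metrics are linked by R² = 1 − RMSE² / Var(y), so a report that quotes both is describing the same fit in two ways.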


2. Relaxed LASSO Regression

LASSO selected four key predictors:

Performance remained similar to baseline:
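The select-then-refit idea behind a relaxed LASSO can be sketched as follows. This shows only the simplest variant, where the LASSO picks the predictors and an unpenalized least-squares model is then refit on that subset; whether the project used this exact variant is an assumption. The data is synthetic, with three informative features built in, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (NOT the real dataset): features 0, 1 and 4 carry signal.
rng = np.random.default_rng(0)
n, p = 300, 6
X = rng.normal(size=(n, p))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=n)

# Step 1: standardize, then let a cross-validated LASSO choose the support.
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
selected = np.flatnonzero(lasso.coef_ != 0)

# Step 2: refit ordinary least squares on the selected columns only,
# which removes the shrinkage the L1 penalty applied to their coefficients.
refit = LinearRegression().fit(X_std[:, selected], y)
refit_r2 = refit.score(X_std[:, selected], y)

print("selected feature indices:", selected.tolist())
print("refit coefficients:", np.round(refit.coef_, 2).tolist())
```

Because the refit step drops the penalty, the model is as interpretable as the baseline regression but uses fewer predictors, which is consistent with accuracy staying close to the baseline.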


3. Neural Network (Nonlinear Model)

To capture nonlinear interactions, we trained a feedforward neural network:

This improved prediction accuracy substantially:

[Figure: NN training vs. validation loss]
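Training and validation curves of this kind come from recording the loss on both splits after every epoch. The sketch below does that with scikit-learn's `MLPRegressor` and `partial_fit`. The framework, layer sizes, learning rate, and epoch count are all assumptions, and the data is a synthetic nonlinear function rather than the housing data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic nonlinear target (NOT the real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(600, 3))
y = np.sin(X[:, 0]) * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=600)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training split only to avoid leaking validation statistics.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_val_s = scaler.transform(X_tr), scaler.transform(X_val)

# Hypothetical architecture: two hidden layers with ReLU activations (the default).
net = MLPRegressor(hidden_layer_sizes=(32, 16), learning_rate_init=0.01, random_state=0)

train_loss, val_loss = [], []
for epoch in range(200):
    net.partial_fit(X_tr_s, y_tr)  # one pass over the training data
    train_loss.append(mean_squared_error(y_tr, net.predict(X_tr_s)))
    val_loss.append(mean_squared_error(y_val, net.predict(X_val_s)))

best_epoch = int(np.argmin(val_loss))
print(f"lowest validation MSE {val_loss[best_epoch]:.4f} at epoch {best_epoch}")
```

A validation curve that flattens or rises while the training curve keeps falling is the usual sign of overfitting, and the epoch with the lowest validation loss is a natural stopping point.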


4. Random Forest (Best Model)

A tuned random forest achieved the strongest predictive performance:

[Figure: Random forest feature importance]
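One common way to tune a random forest and read off its feature importances is a cross-validated grid search, sketched below with scikit-learn. The hyperparameter grid and feature names are hypothetical, and the synthetic data is constructed so that `mrt_distance` dominates, so the ranking printed here only illustrates the mechanics and is not evidence about the real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data (NOT the real dataset); MRT distance dominates by construction.
rng = np.random.default_rng(0)
n = 300
feature_names = ["house_age", "mrt_distance", "n_stores"]  # hypothetical names
house_age = rng.uniform(0, 40, n)
mrt_distance = rng.lognormal(mean=6.5, sigma=1.0, size=n)
n_stores = rng.integers(0, 11, n).astype(float)
X = np.column_stack([house_age, mrt_distance, n_stores])
y = (80.0 - 0.2 * house_age - 8.0 * np.log(mrt_distance)
     + 0.5 * n_stores + rng.normal(scale=2.0, size=n))

# Small illustrative grid; a real search would likely cover more values.
search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid={"max_depth": [4, None], "min_samples_leaf": [1, 5]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y)

best_forest = search.best_estimator_
cv_rmse = -search.best_score_  # scorer returns negative RMSE
importances = dict(zip(feature_names, best_forest.feature_importances_))
top_feature = max(importances, key=importances.get)

print("best params:", search.best_params_)
print(f"cross-validated RMSE: {cv_rmse:.2f}")
print("importances:", {k: round(float(v), 3) for k, v in importances.items()})
```

The impurity-based importances returned by `feature_importances_` sum to 1, so they are best read as relative shares rather than effect sizes.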


Key Findings

Across both nonlinear models, the most important driver of housing price was:

Distance to MRT station (dominant factor)

Both neural network permutation importance and random forest feature rankings consistently placed MRT proximity as the strongest predictor.

Other influential features included:

[Figure: NN permutation importance]
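Permutation importance is model-agnostic: shuffle one feature column, re-score the model, and measure how much the error grows. The sketch below implements that idea directly in NumPy. The `predict` function is a hypothetical stand-in for a trained network, and the data is synthetic.

```python
import numpy as np

def permutation_importance_mse(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        scores[j] = np.mean(increases)
    return scores

# Hypothetical "trained model": leans on column 0, slightly on column 1,
# and ignores column 2 entirely.
def predict(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = predict(X) + rng.normal(scale=0.3, size=500)

scores = permutation_importance_mse(predict, X, y)
print("MSE increase per feature:", np.round(scores, 3).tolist())
```

A feature the model never uses scores exactly zero here, because shuffling it leaves every prediction unchanged. scikit-learn offers the same technique as `sklearn.inspection.permutation_importance`.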


Discussion

This project indicates that housing prices in Taipei are shaped primarily by transport accessibility and spatial structure, not by property-level attributes alone.

Why Nonlinear Models Outperformed Linear Regression

The baseline linear regression achieved an RMSE of ~7.75 and explained about 62.5% of price variance. Both the neural network and the random forest improved substantially on this:

This suggests nonlinear interactions play an important role in real-world housing valuation.

MRT Proximity as the Dominant Driver

Across both permutation importance (neural network) and feature importance rankings (random forest), the most influential factor was consistently:

Distance to the nearest MRT station

This aligns with Taipei’s metro-centered urban structure, where transit accessibility strongly drives neighborhood desirability.

Practical Implications

The results suggest housing valuation models should emphasize:

Such predictive frameworks can support:


Conclusion

We successfully built a comparative regression pipeline for Taipei housing price prediction, evaluating:

The tuned random forest achieved the strongest performance:

Most importantly, we identified MRT accessibility as the key structural driver of real estate valuation in Taipei.

[Table: Model comparison]


Limitations & Future Work

Despite strong predictive results, limitations remain:

Potential extensions include:

Cover Image Credit: me, bewildered about where to go next on the snow trail