Overview

ReviewLens is a full-stack analytics and discovery platform designed to transform large-scale Amazon review data into trustworthy, interpretable product intelligence.

Modern e-commerce platforms contain millions of products and reviews, yet users still struggle to answer key reliability questions:

ReviewLens integrates product metadata, review history, and reviewer behavioral analytics into a unified exploration interface that supports both search-driven and discovery-driven workflows.

The system supports interactive product analytics, reviewer credibility modeling, and graph-based recommendation over millions of records while maintaining real-time query performance.


Dataset & Data Engineering

Dataset Scale

ReviewLens integrates two public Amazon 2018 datasets:

Review Dataset

Product Metadata Dataset

Together, the system processes 13M+ structured and semi-structured rows across multiple relational entities.


Data Preparation Pipeline

Raw Amazon data is distributed as nested JSON.GZ files requiring substantial transformation before relational ingestion.

1. Incremental Data Loading

2. Type Casting and Validation

3. Text Cleaning and Normalization

4. Nested Structure Flattening

Amazon metadata contains multiple nested arrays that were exploded into relational tables:

5. Storage Optimization


Database & Schema Design

The relational schema models the Amazon review ecosystem as a hybrid graph-relational system.

Core Tables

Product

Stores static metadata including:

Reviewer

Stores reviewer identity and behavior tracking.

Review

Composite primary key:

(asin, reviewer_id, unix_time)

Stores:

This structure ensures each review instance is uniquely traceable.

Category Dimension

Allows multi-category mapping for products through a bridge table.

Relationship Graph Tables

These tables support graph traversal queries used for recommendation ranking.


Design Advantages

The schema supports:


Platform Features

1. Product Search & Exploration

The Home interface serves as the global discovery entry.

Users can:

Each product result exposes key summary metrics:


2. Product Detail Analytics

The product detail modal integrates multiple analytical modules:

Rating Stability Visualization

Displays monthly rating trends allowing users to detect:

Helpful Review Ranking

Ranks reviews based on helpful vote count, highlighting high-signal user feedback.

Graph-Based Recommendation

Surfaces “customers also bought” relationships that allow navigation through product co-purchase networks.


3. Category-Level Product Intelligence

Category dashboards implement multiple ranking algorithms.

Top Products Ranking

Ranks products using a composite score combining:

Hidden-Gem Discovery (“Surprise Me”)

Identifies long-tail products defined as:

This feature intentionally balances popularity bias and promotes product diversity.

Always-Positive Products

Identifies products with:

This provides high-confidence reliability signals.


4. Reviewer Behavior Analytics

ReviewLens introduces reviewer-centric trust modeling.

Top Reviewer Leaderboards

Two ranking dimensions:

These metrics identify influential and active reviewers.


Reviewer Rating Style Classification

Reviewers are classified based on statistical deviation from global rating distributions:

This allows users to contextualize subjective reviewer bias.


Reviewer Profile Exploration

For each reviewer, the platform provides:


Recommendation Engine

The recommendation module leverages Amazon co-purchase graphs.

Weighted Scoring Model

Each candidate recommendation receives a composite score:

Score = w₁(popularity) + w₂(avg_rating) + w₃(review_count)

This hybrid scoring balances reliability and popularity while reducing noise from sparse products.


Query Optimization & Performance Engineering

Handling multi-million row datasets required extensive SQL optimization.

Optimization Techniques

Indexing

Materialized Aggregates

Precomputed:

Query Rewriting


Performance Improvements

Query Task Original Runtime Optimized Runtime
Graph-based recommendation >10 min 412 ms
Category ranking by rating quantiles 4m 16s 1.26 s
Reviewer classification 1m 35s 1.58 s
Hidden gem discovery 39 s 528 ms

These optimizations enabled interactive dashboard latency.


System Architecture

Backend


Frontend

Built using:

The UI supports cross-navigation between products, reviewers, and categories.


Technical Challenges

Large-Scale Relational Joins

Multi-table joins across reviews, metadata, and graph relationships required careful indexing and query decomposition.


Semi-Structured Data Normalization

Amazon metadata contained nested arrays and inconsistent formatting requiring:


Balancing Discovery vs Popularity Bias

Designing ranking metrics required blending statistical reliability and exploration heuristics.


Business Value

ReviewLens enables users to make informed purchasing decisions by providing:


Conclusion

ReviewLens demonstrates how large-scale review ecosystems can be transformed into trustworthy recommendation intelligence through:

The system delivers real-time exploration over millions of records while preserving analytical interpretability.


Future Work


Cover Image Credit: me