FVN.li Documentation Help

ratings:backfill

Backfills missing ratings by scanning all events to recover historical rating data.

Overview

This command scans through historical events to identify and import ratings that may have been missed during normal import processes. It's designed to fill gaps in rating data and ensure comprehensive coverage of all available ratings.

Key Features: Historical scanning, gap detection, batch processing, data validation.

Usage

php artisan ratings:backfill [options]

Options

--batch-size=SIZE

Number of events to process in each batch (default: 1000).

Backfill Process

The command runs the following steps in order:

  1. Scans historical events from the beginning of recorded data

  2. Identifies events that may contain ratings not yet imported

  3. Extracts rating data from event payloads

  4. Validates rating information for completeness and accuracy

  5. Checks for existing ratings to avoid duplicates

  6. Imports missing ratings into the database

  7. Updates aggregate statistics to reflect new data

  8. Reports backfill statistics for monitoring
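
The steps above can be sketched as a single loop. This is a minimal illustration in Python, not the command's actual implementation (which is a Laravel artisan command whose source is not shown here). The event shape, the `rating` payload key, the `(user_id, game_id)` duplicate key, and the names `BackfillStats`, `batched`, and `backfill` are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple


@dataclass
class BackfillStats:
    """Counters reported at the end of a run (hypothetical names)."""
    events_scanned: int = 0
    ratings_found: int = 0
    ratings_imported: int = 0
    duplicates_skipped: int = 0
    invalid_skipped: int = 0


def batched(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most batch_size events."""
    it = iter(events)
    while batch := list(islice(it, batch_size)):
        yield batch


def backfill(events: Iterable[dict],
             existing: Set[Tuple[int, int]],
             batch_size: int = 1000) -> BackfillStats:
    """Scan events and record ratings that are not yet in `existing`.

    `existing` holds (user_id, game_id) pairs already stored and is
    updated in place; a real import would write to the database instead.
    """
    stats = BackfillStats()
    for batch in batched(events, batch_size):        # step 1: scan in batches
        for event in batch:
            stats.events_scanned += 1
            rating = event.get("rating")             # steps 2-3: detect, extract
            if rating is None:
                continue
            stats.ratings_found += 1
            score = rating.get("score")
            if not isinstance(score, int) or not 1 <= score <= 5:  # step 4
                stats.invalid_skipped += 1
                continue
            key = (rating.get("user_id"), rating.get("game_id"))
            if key in existing:                      # step 5: duplicate check
                stats.duplicates_skipped += 1
                continue
            existing.add(key)                        # step 6: import
            stats.ratings_imported += 1
    return stats                                     # step 8: report
```

Step 7 (aggregate updates) is omitted because the document does not say which aggregates are maintained.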

Examples

php artisan ratings:backfill

Backfills missing ratings using default batch size.

php artisan ratings:backfill --batch-size=5000

Processes larger batches for faster completion (uses more memory).

php artisan ratings:backfill --batch-size=500

Uses smaller batches for systems with limited resources.

php artisan ratings:backfill -v

Shows detailed progress and statistics during processing.

When to Use

Recommended Usage Scenarios

  1. After system downtime that may have missed rating imports

  2. When discovering gaps in historical rating data

  3. During initial database setup or migration

  4. After improving rating detection algorithms

  5. For comprehensive data quality assurance

Event Scanning

The backfill process examines various types of events:

Rating Events

  • Direct Rating Events: Explicit rating submissions

  • Review Events: Reviews that include ratings

  • Update Events: Rating changes or modifications

Game Events

  • Publication Events: May include initial ratings

  • Update Events: Could contain rating information

  • Metadata Events: Sometimes include rating data

User Events

  • Profile Updates: May reference rating activity

  • Collection Changes: Could indicate rating preferences

  • Activity Events: General user activity including ratings

Data Validation

The backfill process includes comprehensive validation:

Event Validation

  • Event Integrity: Ensures events are complete and valid

  • Timestamp Validation: Verifies event timing is reasonable

  • Source Verification: Confirms events are from legitimate sources

Rating Validation

  • Score Ranges: Ensures ratings are within valid ranges (1-5 stars)

  • User Validation: Verifies rating users exist and are valid

  • Game Validation: Confirms rated games exist in the database

  • Duplicate Detection: Prevents importing duplicate ratings
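
The rating checks listed above could look roughly like the following Python sketch. The field names (`score`, `user_id`, `game_id`, `created_at`), the function name `validate_rating`, and the rule that a timestamp in the future is unreasonable are assumptions for illustration; only the 1-5 score range comes from this page.

```python
from datetime import datetime, timezone
from typing import List, Optional, Set


def validate_rating(rating: dict,
                    known_users: Set[int],
                    known_games: Set[int],
                    now: Optional[datetime] = None) -> List[str]:
    """Return a list of problems; an empty list means the rating is valid.

    Timestamps are assumed to be timezone-aware datetime objects.
    """
    now = now or datetime.now(timezone.utc)
    problems = []

    score = rating.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        problems.append("score out of range")

    if rating.get("user_id") not in known_users:
        problems.append("unknown user")

    if rating.get("game_id") not in known_games:
        problems.append("unknown game")

    created_at = rating.get("created_at")
    if not isinstance(created_at, datetime) or created_at > now:
        problems.append("timestamp missing or in the future")

    return problems
```

Returning every problem rather than stopping at the first makes it easier to log why an event was rejected.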

Performance Considerations

Factor              Impact               Optimization
------------------  -------------------  -----------------
Event Volume        Processing time      Batch size tuning
Memory Usage        System resources     Batch processing
Database Load       Query performance    Efficient queries
Duplicate Checking  Processing overhead  Indexed lookups
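
As an illustration of indexed duplicate checking, the Python sketch below uses SQLite with a unique index so the database rejects duplicates itself instead of the importer issuing one lookup query per rating. The table layout, the `(user_id, game_id)` uniqueness rule, and the function names are assumptions; the real schema is not described on this page.

```python
import sqlite3
from typing import Iterable, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical ratings table with a unique index."""
    conn.execute(
        "CREATE TABLE ratings (user_id INTEGER, game_id INTEGER, score INTEGER)"
    )
    # The unique index makes the duplicate check an index lookup.
    conn.execute(
        "CREATE UNIQUE INDEX ratings_user_game ON ratings (user_id, game_id)"
    )


def import_new_ratings(conn: sqlite3.Connection,
                       ratings: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Insert ratings, ignoring existing ones; return (imported, skipped)."""
    imported = skipped = 0
    for row in ratings:
        cur = conn.execute(
            "INSERT OR IGNORE INTO ratings (user_id, game_id, score) "
            "VALUES (?, ?, ?)",
            row,
        )
        if cur.rowcount == 1:   # row was inserted
            imported += 1
        else:                   # unique index rejected it
            skipped += 1
    conn.commit()
    return imported, skipped
```

`INSERT OR IGNORE` is SQLite syntax; other databases offer equivalents with different spelling.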

Batch Processing

The command processes events in batches; the batch size trades processing speed against memory use:

Batch Size Selection

  • Small Batches (100-500): Lower memory usage, slower processing

  • Medium Batches (1000-2000): Balanced performance and resource usage

  • Large Batches (5000+): Faster processing, higher memory requirements

Progress Tracking

  • Event Position: Tracks current position in event stream

  • Completion Percentage: Shows overall progress

  • Processing Rate: Events processed per minute

  • ETA Calculation: Estimated time to completion
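
The progress figures above can be derived from three numbers: events processed, total events, and elapsed time. The Python sketch below shows one way to do that, assuming the processing rate stays roughly constant; the function name and the returned keys are hypothetical, and the command's real output format is not documented here.

```python
from typing import Dict, Optional


def progress_report(processed: int, total: int,
                    elapsed_seconds: float) -> Dict[str, Optional[float]]:
    """Compute completion percentage, rate per minute, and ETA in seconds."""
    if total <= 0 or processed < 0 or elapsed_seconds < 0:
        raise ValueError("total must be positive; other inputs non-negative")

    percent = 100.0 * processed / total
    # Events per minute; undefined until some time has elapsed.
    rate = processed * 60 / elapsed_seconds if elapsed_seconds > 0 else 0.0
    # ETA extrapolates the average time per event to the remaining events.
    remaining = total - processed
    eta = remaining * elapsed_seconds / processed if processed > 0 else None

    return {"percent": percent, "rate_per_minute": rate, "eta_seconds": eta}
```

For example, 2,500 of 10,000 events in 300 seconds gives 25% complete, 500 events per minute, and about 900 seconds remaining.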

Gap Detection

The backfill process identifies various types of gaps:

Temporal Gaps

  • Missing Time Periods: Periods with no rating imports

  • Sparse Coverage: Periods with unusually low rating activity

  • Event Sequence Gaps: Missing events in chronological sequence
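
One simple way to find missing time periods is to compare the days on which ratings were imported against the full date range. The Python sketch below does this at daily granularity; the granularity and the function name `find_missing_days` are assumptions, since this page does not say how the command defines a gap.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def find_missing_days(import_dates: Iterable[date],
                      start: date, end: date) -> List[Tuple[date, date]]:
    """Return inclusive (first, last) day ranges in [start, end] with no imports."""
    seen = set(import_dates)
    one_day = timedelta(days=1)
    gaps = []
    gap_start = None
    day = start
    while day <= end:
        if day not in seen:
            if gap_start is None:
                gap_start = day                      # a new gap opens
        elif gap_start is not None:
            gaps.append((gap_start, day - one_day))  # gap closed yesterday
            gap_start = None
        day += one_day
    if gap_start is not None:                        # gap runs to the end
        gaps.append((gap_start, end))
    return gaps
```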

Content Gaps

  • Game Coverage: Games with missing or incomplete ratings

  • User Coverage: Users whose ratings may be incomplete

  • Category Gaps: Specific game categories with missing data

Error Handling

The command handles the following error scenarios:

Event Processing Errors

  • Malformed Events: Skips events with invalid format

  • Missing Data: Handles events with incomplete information

  • Processing Failures: Continues with remaining events on errors

Database Errors

  • Constraint Violations: Handles database constraint issues

  • Connection Problems: Manages database connectivity issues

  • Transaction Failures: Rolls back failed batch operations
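
The combination of skipping malformed events and rolling back failed batches can be sketched as follows in Python with SQLite. The `ratings` table, its columns, and the function name `import_batch` are assumptions for illustration. In this sketch a constraint violation rolls back the whole batch; whether the real command instead retries the remaining rows is not stated on this page.

```python
import sqlite3
from typing import Iterable


def import_batch(conn: sqlite3.Connection, batch: Iterable[dict]) -> int:
    """Insert one batch atomically.

    Returns the number of rows inserted, or 0 if the batch was rolled back.
    """
    rows = []
    for event in batch:
        try:
            rows.append((event["user_id"], event["game_id"], event["score"]))
        except (KeyError, TypeError):
            continue  # malformed event: skip it and keep going

    try:
        # The connection context manager commits on success and
        # rolls back if the block raises.
        with conn:
            conn.executemany(
                "INSERT INTO ratings (user_id, game_id, score) VALUES (?, ?, ?)",
                rows,
            )
    except sqlite3.Error:
        return 0  # batch rolled back; the caller can log it and continue
    return len(rows)
```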

Monitoring and Reporting

The command provides detailed progress reporting:

Processing Statistics

  • Events Scanned: Total number of events examined

  • Ratings Found: Number of rating events discovered

  • Ratings Imported: Successfully imported ratings

  • Duplicates Skipped: Existing ratings not reimported

Quality Metrics

  • Success Rate: Percentage of successful imports

  • Error Rate: Failed operations requiring attention

  • Data Coverage: Improvement in rating data completeness

Recovery and Resumption

The backfill process supports recovery from interruptions:

  • Position Tracking: Remembers last processed event

  • Resume Capability: Can continue from interruption point

  • State Preservation: Maintains progress across restarts

  • Checkpoint System: Regular progress checkpoints
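
A checkpoint mechanism of this kind can be as simple as periodically writing the last processed event ID to a file and reading it back on startup. The Python sketch below illustrates the idea; the JSON file format, the `last_event_id` key, and the function names are hypothetical, and this page does not say where the command actually stores its position.

```python
import json
import os
from pathlib import Path
from typing import List


def save_checkpoint(path: Path, last_event_id: int) -> None:
    """Write the checkpoint via a temp file so a crash cannot corrupt it."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"last_event_id": last_event_id}))
    os.replace(tmp, path)  # atomic rename on the same filesystem


def load_checkpoint(path: Path) -> int:
    """Return the last processed event ID, or 0 if no usable checkpoint exists."""
    try:
        return int(json.loads(path.read_text())["last_event_id"])
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return 0


def run_from_checkpoint(events: List[dict], path: Path,
                        checkpoint_every: int = 1000) -> List[int]:
    """Process events with increasing integer 'id', resuming after the checkpoint."""
    last_id = load_checkpoint(path)
    processed = []
    for event in events:
        if event["id"] <= last_id:
            continue                       # already handled in an earlier run
        processed.append(event["id"])      # stand-in for real processing
        if len(processed) % checkpoint_every == 0:
            save_checkpoint(path, event["id"])
    if processed:
        save_checkpoint(path, processed[-1])  # final checkpoint
    return processed
```

If a run is interrupted between checkpoints, the events since the last checkpoint are processed again on resume, which is harmless as long as duplicate ratings are skipped.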

Last modified: 01 June 2025