ratings:backfill
Backfills missing ratings by scanning all events to recover historical rating data.
Overview
This command scans through historical events to identify and import ratings that may have been missed during normal import processes. It's designed to fill gaps in rating data and ensure comprehensive coverage of all available ratings.
Key Features: Historical scanning, gap detection, batch processing, data validation.
Usage
Options
- --batch-size=SIZE
Number of events to process in each batch (default: 1000).
Backfill Process
The command follows this comprehensive workflow:
Scans historical events from the beginning of recorded data
Identifies rating events that may contain missing ratings
Extracts rating data from event payloads
Validates rating information for completeness and accuracy
Checks for existing ratings to avoid duplicates
Imports missing ratings into the database
Updates aggregate statistics to reflect new data
Reports backfill statistics for monitoring
Examples
Backfills missing ratings using default batch size.
Processes larger batches for faster completion (uses more memory).
Uses smaller batches for systems with limited resources.
Shows detailed progress and statistics during processing.
When to Use
Recommended Usage Scenarios
After system downtime that may have missed rating imports
When discovering gaps in historical rating data
During initial database setup or migration
After improving rating detection algorithms
For comprehensive data quality assurance
Event Scanning
The backfill process examines various types of events:
Rating Events
Direct Rating Events: Explicit rating submissions
Review Events: Reviews that include ratings
Update Events: Rating changes or modifications
Game Events
Publication Events: May include initial ratings
Update Events: Could contain rating information
Metadata Events: Sometimes include rating data
User Events
Profile Updates: May reference rating activity
Collection Changes: Could indicate rating preferences
Activity Events: General user activity including ratings
Data Validation
The backfill process includes comprehensive validation:
Event Validation
Event Integrity: Ensures events are complete and valid
Timestamp Validation: Verifies event timing is reasonable
Source Verification: Confirms events are from legitimate sources
Rating Validation
Score Ranges: Ensures ratings are within valid ranges (1-5 stars)
User Validation: Verifies rating users exist and are valid
Game Validation: Confirms rated games exist in the database
Duplicate Detection: Prevents importing duplicate ratings
Performance Considerations
Factor | Impact | Optimization |
|---|---|---|
Event Volume | Processing time | Batch size tuning |
Memory Usage | System resources | Batch processing |
Database Load | Query performance | Efficient queries |
Duplicate Checking | Processing overhead | Indexed lookups |
Batch Processing
The command uses intelligent batch processing:
Batch Size Selection
Small Batches (100-500): Lower memory usage, slower processing
Medium Batches (1000-2000): Balanced performance and resource usage
Large Batches (5000+): Faster processing, higher memory requirements
Progress Tracking
Event Position: Tracks current position in event stream
Completion Percentage: Shows overall progress
Processing Rate: Events processed per minute
ETA Calculation: Estimated time to completion
Gap Detection
The backfill process identifies various types of gaps:
Temporal Gaps
Missing Time Periods: Periods with no rating imports
Sparse Coverage: Periods with unusually low rating activity
Event Sequence Gaps: Missing events in chronological sequence
Content Gaps
Game Coverage: Games with missing or incomplete ratings
User Coverage: Users whose ratings may be incomplete
Category Gaps: Specific game categories with missing data
Error Handling
Comprehensive error handling manages various scenarios:
Event Processing Errors
Malformed Events: Skips events with invalid format
Missing Data: Handles events with incomplete information
Processing Failures: Continues with remaining events on errors
Database Errors
Constraint Violations: Handles database constraint issues
Connection Problems: Manages database connectivity issues
Transaction Failures: Rolls back failed batch operations
Monitoring and Reporting
The command provides detailed progress reporting:
Processing Statistics
Events Scanned: Total number of events examined
Ratings Found: Number of rating events discovered
Ratings Imported: Successfully imported ratings
Duplicates Skipped: Existing ratings not reimported
Quality Metrics
Success Rate: Percentage of successful imports
Error Rate: Failed operations requiring attention
Data Coverage: Improvement in rating data completeness
Recovery and Resumption
The backfill process supports recovery from interruptions:
Position Tracking: Remembers last processed event
Resume Capability: Can continue from interruption point
State Preservation: Maintains progress across restarts
Checkpoint System: Regular progress checkpoints
Related Commands
ratings:import - Import current ratings from itch.io
games:refresh - Refresh games that may have new ratings