Image Recognition of Vehicle Odometer Readings
Student Capstone Project
To qualify for a distance-based discount on auto insurance, drivers insured by the Insurance Corporation of British Columbia (ICBC) must declare their odometer reading when they renew their insurance. Drivers e-mail images of their odometer, which a team at ICBC reviews manually before confirming or denying the low-kilometre discount.
To reduce the amount of manual review, ICBC tasked a group of UBC Master of Data Science (MDS) Vancouver students with developing a pipeline capable of accurately reading odometers from dashboard photographs.
ICBC provided the team with over 20,000 vehicle odometer images, which did not contain any personal or customer information. The students' main challenge was the lack of labelled data required for training image detection models. They also faced significant image variability, including differences in image quality and various types of odometers, such as mechanical and LCD screens.
The students needed a pipeline robust enough to handle all of these different types of dashboards and to read each of them reliably. Their initial attempt used an off-the-shelf Optical Character Recognition (OCR) model, but they realized it wasn't going to work because the area around the odometer was too cluttered.
The team then split the task into two steps. In step one, the raw image was pre-processed and passed through a first model that detected the odometer. Once the odometer had been located, the image was cropped and passed through a second model that detected the digits within it. The training set that the students used for digit detection had 6,000 images, which included house numbers, water meters and labels on container ships.
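The hand-off between the two steps can be illustrated with a minimal Python sketch. It is not the team's code: the `Box` type, the function names and the 0.5 confidence cut-off are all hypothetical, and the detections are assumed to arrive as labelled bounding boxes with a confidence score. The sketch picks the most confident odometer box to crop, then orders the detected digits from left to right to form a reading.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical detection: corner coordinates, class label, confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    conf: float


def best_odometer(boxes: List[Box], min_conf: float = 0.5) -> Optional[Box]:
    """Step one: keep the most confident odometer box, or None if none qualifies."""
    candidates = [b for b in boxes if b.label == "odometer" and b.conf >= min_conf]
    return max(candidates, key=lambda b: b.conf) if candidates else None


def assemble_reading(digits: List[Box], min_conf: float = 0.5) -> Optional[str]:
    """Step two: join digit detections left to right.

    Returns None (meaning "send to manual review" in this sketch) when no
    digits were found, a label is not a digit, or any digit is low-confidence.
    """
    if not digits:
        return None
    if any(b.conf < min_conf or not b.label.isdigit() for b in digits):
        return None
    # Sort by the horizontal centre of each box so digits read left to right.
    ordered = sorted(digits, key=lambda b: (b.x1 + b.x2) / 2)
    return "".join(b.label for b in ordered)
```

In this sketch, returning `None` instead of a partial reading is a deliberate choice: an odometer value with a missing digit is worse than no value, so uncertain images fall back to a human reviewer.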
For each step, the team used a YOLO (You Only Look Once) v8 model. YOLO provides object detection, image classification and instance segmentation. The students liked YOLO for its high accuracy, its low deployment cost and its well-structured Python package.
Evaluation of the pipeline on a test dataset of over 6,000 images showed a 53% reduction in manual workload while maintaining an accuracy level of 95%. On another dataset of 500 images passed through the two models, the team found a 46% reduction in manual review at 90% precision.
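One way to read figures like these is as a trade-off between how many images are accepted automatically and how often the accepted readings are correct. The sketch below is only an illustration under that assumption: the article does not say how the team computed its numbers, and the `review_metrics` function, the confidence threshold and the sample data are all hypothetical.

```python
from typing import List, Optional, Tuple

# Each record: (predicted reading, true reading, model confidence).
Prediction = Tuple[str, str, float]


def review_metrics(
    predictions: List[Prediction], threshold: float
) -> Tuple[float, Optional[float]]:
    """Return (share of images handled automatically, accuracy on that share).

    Images whose confidence falls below `threshold` are assumed to go to
    manual review, so they do not count toward the automated accuracy.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    auto = [(pred, true) for pred, true, conf in predictions if conf >= threshold]
    reduction = len(auto) / len(predictions)
    if not auto:
        return reduction, None  # nothing automated, accuracy undefined
    accuracy = sum(pred == true for pred, true in auto) / len(auto)
    return reduction, accuracy


# Made-up sample data, for illustration only.
sample = [
    ("123456", "123456", 0.90),
    ("045210", "045210", 0.80),
    ("078900", "078000", 0.70),
    ("011111", "011111", 0.30),
]
reduction, accuracy = review_metrics(sample, threshold=0.75)
```

Raising the threshold in this sketch sends more images to manual review but tends to raise accuracy on the automated share, which is the kind of balance the reported figures describe.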
The team demonstrated the pipeline through a web application capable of processing raw images and predicting odometer readings. They proposed a strategy for continuous improvement to ICBC: encouraging drivers to annotate their own images, thus generating a larger annotated dataset for model training. The team also suggested that ICBC eventually develop a web app or mobile app enabling quicker verification of odometer images and real-time image feedback.
As for future improvements to their pipeline, the team would like to increase the amount of labelled training data so that it covers different car models and model years. With more data, the team believes they can improve their model and achieve even greater reductions in ICBC's manual workload.