At the second China International Consumer Goods Fair held in Shanghai, many companies held new product launches, and iFLYTEK's smart voice recorder was one of the most eye-catching products. It supports high-quality recording and efficient speech-to-text transcription, and it can recognize 10 languages and 12 dialects. Its online transcription accuracy for Chinese reaches 98%, and offline transcription is also supported. Even more surprising, the recorder is equipped with an OCR (optical character recognition) camera for text recognition. After photographing the desired content, the user can freely crop the image to improve recognition accuracy. By adding text recognition, this voice recorder brings itself even closer to consumers. In recent years, text recognition has become a ubiquitous everyday convenience.
Application Scenarios of OCR Text Recognition
ID document recognition: It mainly extracts information from identity documents and supports more than 20 document types, such as ID cards, passports, and driver's licenses. It is currently used in document readers, passport readers, access control and attendance machines, personal ID scanners, and mobile apps.
Bank card recognition: It mainly recognizes bank card numbers, letting users bind a card for mobile payment and improving the app user experience. It supports credit and debit cards issued by various domestic banks.
License plate recognition: It mainly recognizes vehicle information such as the license plate number, plate color, vehicle type, car logo, and body color. It is used in mobile policing, roadside parking, parking lot management, car insurance, and other fields.
Business card recognition: It mainly recognizes the content of business cards and is used in mobile exhibition apps, CRM (customer relationship management) systems, and other fields.
Business license recognition: It mainly extracts business license information and is used wherever manual entry of business license details needs to be replaced.
Vehicle VIN recognition: It mainly recognizes the vehicle identification number (VIN, also called the frame number) and is used in vehicle management, car services, used-car transactions, car rental, and other fields.
Invoice recognition: It mainly recognizes the contents of invoices and bills in different formats, such as VAT invoices, and is used in financial management, the automotive industry, banking, finance, and other fields.
Document text recognition: It mainly recognizes the text of books, newspapers, magazines, and similar documents, and is used by libraries, newspaper publishers, and other fields that need to digitize paper documents.
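Several of the scenarios above involve strings that carry their own check digit. This gives a cheap way to catch recognition errors before a result is accepted. The sketch below is only an illustration and is not part of any product described here. `luhn_valid` applies the Luhn (mod 10) check that most payment card numbers carry. `vin_check_digit_ok` applies the North American VIN check-digit rule (position 9), which not every market mandates. Both function names are hypothetical.

```python
DIGITS = "0123456789"


def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check on a card number; spaces are ignored."""
    s = number.replace(" ", "")
    if len(s) < 2 or not all(c in DIGITS for c in s):
        return False
    total = 0
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Transliteration values for the North American VIN check digit (I, O, Q are not allowed)
VIN_VALUES = {**{d: int(d) for d in DIGITS},
              "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
              "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
              "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9}
VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_check_digit_ok(vin: str) -> bool:
    """Return True if position 9 of a 17-character VIN matches its computed check digit."""
    vin = vin.strip().upper()
    if len(vin) != 17 or any(c not in VIN_VALUES for c in vin):
        return False  # wrong length, or contains I, O, Q or another invalid character
    total = sum(VIN_VALUES[c] * w for c, w in zip(vin, VIN_WEIGHTS))
    expected = "X" if total % 11 == 10 else str(total % 11)
    return vin[8] == expected  # position 9 (index 8) holds the check digit


print(luhn_valid("4111 1111 1111 1111"))        # True
print(vin_check_digit_ok("1M8GDM9AXKP042788"))  # True
```

A failed check does not reveal which character is wrong. A typical response is to ask the user to re-capture the card or plate, or to route the result to manual review. VINs also exclude the letters I, O, and Q, which removes some classic OCR confusions with 1 and 0.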
Application Difficulties of OCR Text Recognition
Generally speaking, ID document recognition is the easiest. Text recognition in general documents and natural scenes comes next, and general table recognition is the hardest.
Relatively speaking, ID document images come with more constraints; in other words, the problem space is smaller. For example, the "gender" field on an ID card has only two possible values: "male" or "female". As for layout, the second-generation resident ID card is now the dominant format. It has a single fixed layout and a standard font, so text recognition accuracy is higher.
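The constraints described above can be used directly in post-processing. The sketch below assumes the 18-digit citizen ID number printed on a second-generation ID card (GB 11643-1999). In that number, the last character is an ISO 7064 MOD 11-2 check character, and the 17th digit is odd for men and even for women. The sketch uses those rules to validate the number and to repair or cross-check a garbled gender field. The helper names and the English field values are hypothetical.

```python
from typing import Optional

# The gender field admits only two values, which shrinks the problem space
VALID_GENDERS = {"male", "female"}

# Weights and check characters for the 18-digit citizen ID number (ISO 7064 MOD 11-2)
ID_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
ID_CHECK_CHARS = "10X98765432"


def id_checksum_ok(id_number: str) -> bool:
    """Return True if the 18th character matches the check character of the first 17 digits."""
    if len(id_number) != 18 or not all(c in "0123456789" for c in id_number[:17]):
        return False
    total = sum(int(d) * w for d, w in zip(id_number[:17], ID_WEIGHTS))
    return id_number[17].upper() == ID_CHECK_CHARS[total % 11]


def resolve_gender(ocr_gender: str, id_number: str) -> Optional[str]:
    """Snap the OCR'd gender field to a valid value, using the ID number as a cross-check."""
    from_id = None
    if id_checksum_ok(id_number):
        # The 17th digit encodes sex: odd for male, even for female
        from_id = "male" if int(id_number[16]) % 2 == 1 else "female"
    text = ocr_gender.strip().lower()
    if text in VALID_GENDERS:
        # A valid reading that contradicts the ID number is flagged (None) for review
        return text if from_id in (None, text) else None
    # Garbled gender field: fall back to the value implied by the ID number, if any
    return from_id


print(resolve_gender("fernale", "11010519491231002X"))  # female
```

The same idea extends to other fields with a closed or rule-bound set of values, such as dates of birth that must also agree with digits 7 to 14 of the ID number.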
Even so, ID documents present their own difficulties. For example, training a model to recognize personal names and place names requires data containing them, and the biggest risks there are user privacy and data compliance. Data synthesis is therefore required, which raises the question of how to synthesize data that is actually effective for the model. Poorly synthesized data, such as incorrectly rendered characters, lowers the recognition rate.
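One way to approach the synthesis described above is sketched here, under stated assumptions. The sketch generates name strings from pools of common surname and given-name characters, so no real person's record is used. It then checks how often each character in the inventory appears, since a character the model never sees during training cannot be learned. The character pools, `synth_names`, and `char_coverage` are hypothetical and far smaller than a real inventory. Rendering the strings into images with fonts, backgrounds, and noise is not shown; that stage is where mistakes such as incorrectly rendered characters creep in.

```python
import random
from typing import Dict, List

# Hypothetical, deliberately tiny character pools; a real inventory would be far larger
SURNAMES = ["王", "李", "张", "刘", "陈"]
GIVEN_CHARS = ["伟", "芳", "娜", "敏", "静", "磊", "洋", "艳"]


def synth_names(n: int, seed: int = 0) -> List[str]:
    """Generate n synthetic names: one surname plus one or two given-name characters."""
    rng = random.Random(seed)  # seeded so a dataset can be regenerated exactly
    names = []
    for _ in range(n):
        given = "".join(rng.choice(GIVEN_CHARS) for _ in range(rng.randint(1, 2)))
        names.append(rng.choice(SURNAMES) + given)
    return names


def char_coverage(names: List[str]) -> Dict[str, int]:
    """Count occurrences of every inventory character; zero means the model never sees it."""
    counts = {c: 0 for c in SURNAMES + GIVEN_CHARS}
    for name in names:
        for c in name:
            counts[c] += 1
    return counts


names = synth_names(200, seed=42)
print(char_coverage(names))
```

One option, given these counts, is to oversample names containing under-represented characters until every character in the target set appears often enough.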
For general documents, the difficulty is how to structure the output well. Take resume recognition: resumes come in all kinds of layouts, although the key-value pairs (name, phone number, education, and so on) are more or less enumerable. Even given a plain-text resume, using NLP to structure its many different styles is not easy, let alone a resume that is not available as text.
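The enumerable keys mentioned above are what make the plain-text case tractable at all. The minimal sketch below assumes a resume that has already been converted to plain text with explicit "Label: value" lines. It maps the many label spellings onto a fixed set of keys. The alias table and `extract_fields` are hypothetical. Resumes without explicit labels, multi-column layouts, or image-only resumes would need layout analysis and NLP beyond this.

```python
import re
from typing import Dict

# Hypothetical alias table: the set of keys is roughly fixed, but their labels vary by resume
FIELD_ALIASES = {
    "name": ["name", "full name", "姓名"],
    "phone": ["phone", "mobile", "tel"],
    "email": ["email", "e-mail"],
    "education": ["education", "degree"],
}

# "Label: value" with either an ASCII or a full-width colon
LINE_PATTERN = re.compile(r"\s*([^:：]+?)\s*[:：]\s*(.+)")


def extract_fields(text: str) -> Dict[str, str]:
    """Map labeled lines of a plain-text resume onto a fixed set of keys."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # unlabeled line: would need NLP beyond this sketch
        label, value = match.group(1).strip().lower(), match.group(2).strip()
        for key, aliases in FIELD_ALIASES.items():
            if label in aliases and key not in fields:  # keep the first occurrence
                fields[key] = value
                break
    return fields


resume = "Full Name: Li Lei\nMobile: 138-0000-0000\nE-mail: lilei@example.com\nHobbies: chess"
print(extract_fields(resume))
# {'name': 'Li Lei', 'phone': '138-0000-0000', 'email': 'lilei@example.com'}
```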
The difficulties of natural scenes lie in complex and diverse backgrounds, varied fonts, occlusion, lighting, and multiple scales, as well as in how to train quickly on large batches of data. Another characteristic of natural scenes is noise near the target text region (for example, advertisements next to a building's sign). This makes analyzing and structuring the target information another pain point.
Table recognition is the hardest. Tables tend to look very similar in style, and cell inference is extremely error-prone: in a dense multi-row table, a single row assigned incorrectly essentially ruins the whole table. Borderless tables are harder still to infer and recognize.
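To make the row-assignment problem concrete, here is a minimal sketch of one simple heuristic for borderless tables. It groups recognized text boxes into rows by the vertical centers of the boxes, then orders each row left to right. The `(x0, y0, x1, y1, text)` box format, the `group_into_rows` name, and the pixel tolerance `tol` are assumptions for illustration. This is not a complete table-structure recognizer; it ignores merged cells and multi-line cells.

```python
from typing import List, Tuple

# Hypothetical box format produced by an upstream OCR step: (x0, y0, x1, y1, text)
Box = Tuple[float, float, float, float, str]


def group_into_rows(boxes: List[Box], tol: float) -> List[List[str]]:
    """Group text boxes into table rows by vertical center, then sort each row by x."""
    rows: List[dict] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center_y = (box[1] + box[3]) / 2
        if rows and abs(center_y - rows[-1]["center_y"]) <= tol:
            # Close enough to the current row: add it and update the row's mean center
            row = rows[-1]
            row["cells"].append(box)
            row["center_y"] = sum((b[1] + b[3]) / 2 for b in row["cells"]) / len(row["cells"])
        else:
            # Too far from the current row: start a new one
            rows.append({"center_y": center_y, "cells": [box]})
    # Read each row left to right by its left edge
    return [[b[4] for b in sorted(row["cells"], key=lambda b: b[0])] for row in rows]


boxes = [
    (100, 10, 150, 30, "Qty"), (10, 12, 80, 28, "Item"),
    (10, 42, 80, 58, "Pen"), (100, 40, 150, 60, "2"),
]
print(group_into_rows(boxes, tol=8))  # [['Item', 'Qty'], ['Pen', '2']]
```

The choice of `tol` shows why a single error spreads. If it is too small, one row splits in two; if it is too large, separate rows merge. In both cases the row count and the index of every subsequent row shift.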
OCR Text Recognition Data Resources Are Scarce
As the saying goes, "human material needs are the driving force behind the development of productivity," and the application scenarios above have driven the rapid development of OCR technology.
Deep learning algorithms have now become the state of the art (SOTA) in OCR. Current deep learning OCR systems generally adopt a two-stage approach: text detection followed by text recognition. These algorithms need large amounts of training data. Data is always a prerequisite for deep learning algorithms to achieve good results.
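The two-stage approach described above can be sketched as follows. This is a toy illustration only. The `Detector` and `Recognizer` interfaces, the `Box` type, and `run_ocr` are hypothetical names. The "image" is modeled as a list of text rows so the example runs without any model. In a real system, the detector would be a trained network (for example an EAST- or DB-style model) that outputs text-region boxes. The recognizer would be a sequence model (for example a CRNN) that reads each cropped region.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Box:
    """A detected text region; here a span of one row of the toy 'image'."""
    row: int
    start: int
    end: int  # exclusive


class Detector(Protocol):
    def detect(self, image: List[str]) -> List[Box]: ...


class Recognizer(Protocol):
    def recognize(self, crop: str) -> str: ...


def run_ocr(image: List[str], detector: Detector, recognizer: Recognizer) -> List[str]:
    """Stage 1 finds text regions; stage 2 reads each cropped region independently."""
    texts = []
    for box in detector.detect(image):
        crop = image[box.row][box.start:box.end]  # crop the region found by the detector
        texts.append(recognizer.recognize(crop))
    return texts


class ToyDetector:
    """Stand-in detector: treats each run of non-space characters as a text region."""
    def detect(self, image: List[str]) -> List[Box]:
        boxes = []
        for r, line in enumerate(image):
            c = 0
            while c < len(line):
                if line[c] == " ":
                    c += 1
                    continue
                start = c
                while c < len(line) and line[c] != " ":
                    c += 1
                boxes.append(Box(r, start, c))
        return boxes


class ToyRecognizer:
    """Stand-in recognizer: a real one would decode pixels into characters."""
    def recognize(self, crop: str) -> str:
        return crop.upper()


print(run_ocr(["  exit   A3 ", "gate 7"], ToyDetector(), ToyRecognizer()))
# ['EXIT', 'A3', 'GATE', '7']
```

Keeping the two stages behind separate interfaces means either model can be retrained or swapped independently. It also means each stage needs its own labeled data: boxes for the detector, transcriptions for the recognizer.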
However, scene data is hard to obtain. Much real-life scene data is private, and labeling is difficult, requiring professional teams to clean and annotate it. This scarcity of data hinders the deployment and development of OCR technology. Engineers therefore turn to professional data companies to obtain more accurate OCR image data, which speeds up research and helps bring the technology into real-world use.
With global resources in more than 100 countries, Magic Data supports the AI industry with efficient, effective AI data collection and annotation, and it is trusted by over 200 top AI companies around the world.
Magic Data is ISO/IEC 27001 & ISO/IEC 27701:2019 accredited and GDPR compliant.