Tessnet2 a .NET 2.0 Open Source OCR assembly using Tesseract engine
Published: 2019-06-26


http://www.pixel-technology.com/freeware/tessnet2/

Keywords: Open source, OCR, Tesseract, .NET, DOTNET, C#, VB.NET, C++/CLI

Current version: 2.04.0, 02SEP09

The big picture

Tesseract is a C++ open source OCR engine. Tessnet2 is a .NET assembly that exposes very simple methods to do OCR.
Tessnet2 is multithreaded. It uses the engine the same way Tesseract.exe does. Tessdll uses another method (no thresholding).

License

Tessnet2 is under the Apache 2 license (like Tesseract), meaning you can use it however you want, including in commercial products. You can read the full license info in the source files.

Quick Tessnet2 usage

  1. Download the Tessnet2 binary, then add a reference to the assembly Tessnet2.dll in your .NET project.

  2. Download the language data definition file and put it in a tessdata directory. The tessdata directory and your exe must be in the same directory.

  3. Look at the Program.cs sample.

Note: Tessnet2.dll needs the Visual C++ 2008 Runtime. When deploying your application, be sure to install the C++ runtime.
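The directory layout from step 2 can be checked before any OCR call. The following is only a minimal sketch: TessdataCheck and its two methods are hypothetical helpers, not part of Tessnet2, and the check only verifies that the folder exists and contains at least one file.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Tessnet2.
static class TessdataCheck
{
    // Builds the path of the tessdata directory expected next to the executable.
    public static string ExpectedTessdataPath(string exeDirectory)
    {
        return Path.Combine(exeDirectory, "tessdata");
    }

    // True when the directory exists and holds at least one file.
    // It does not validate that the files are real language data.
    public static bool HasLanguageData(string tessdataDirectory)
    {
        return Directory.Exists(tessdataDirectory)
            && Directory.GetFiles(tessdataDirectory).Length > 0;
    }
}
```

A caller could pass AppDomain.CurrentDomain.BaseDirectory to ExpectedTessdataPath and report a clear error when HasLanguageData returns false.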

Tessnet2 usage

// Namespaces this snippet relies on
using System;
using System.Collections.Generic;
using System.Drawing;

Bitmap image = new Bitmap("eurotext.tif");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(@"c:\temp", "fra", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);

Tessnet2 source code and recompiling

  1. Download the Tesseract source code and expand it in a directory

  2. Download the Tessnet2 source code and expand it in the Tesseract source code root directory (it should create a dotnet subdirectory)

  3. Open the project solution tessnet2.sln. It's a Visual Studio 2008 C++/CLI project

Memory leak

The Tesseract C++ source code is full of memory leaks. Using the tessnet2 assembly several times will cause memory overflow. This is not a tessnet2 leak but a Tesseract leak; I spent two days in the Tesseract source code trying to improve this, with no success.
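One way to observe this growth from the calling application is to compare the process's private bytes before and after a unit of work. This is a rough sketch, not a fix: MemoryWatch and its Work delegate are hypothetical names, and the measured delta also includes unrelated allocations made by the runtime.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Tessnet2.
static class MemoryWatch
{
    // The unit of work to measure (for example, one OCR pass).
    public delegate void Work();

    // Private bytes of the current process; this figure includes native allocations.
    public static long PrivateBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.PrivateMemorySize64;
        }
    }

    // Runs the work once and returns the change in private bytes.
    public static long Measure(Work work)
    {
        long before = PrivateBytes();
        work();
        long after = PrivateBytes();
        return after - before;
    }
}
```

Wrapping repeated DoOCR calls in MemoryWatch.Measure and logging the result would show whether the process keeps growing between calls.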

Tessnet2 demo

In the Tessnet2 source code you have two C# demo projects. TesseractOCR is a multi-threaded WinForm demo with a progress bar. TesseractConsole is a console demo.

The confidence score is shown between brackets. A value below 160 means not bad.
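The 160 threshold can be applied in code to keep only the words worth trusting. The sketch below is an illustration under assumptions: ScoredWord is a hypothetical stand-in for tessnet2.Word (so the example runs without the assembly), and treating exactly 160 as unreliable is a choice made here, not something the original states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for tessnet2.Word, holding only the two fields used here.
struct ScoredWord
{
    public string Text;
    public double Confidence; // lower is better

    public ScoredWord(string text, double confidence)
    {
        Text = text;
        Confidence = confidence;
    }
}

// Hypothetical helper, not part of Tessnet2.
static class ConfidenceFilter
{
    // Scores below this value are treated as "not bad".
    public const double Threshold = 160.0;

    // Returns the words whose confidence is strictly below the threshold.
    public static List<ScoredWord> KeepReliable(IEnumerable<ScoredWord> words)
    {
        List<ScoredWord> kept = new List<ScoredWord>();
        foreach (ScoredWord word in words)
        {
            if (word.Confidence < Threshold)
                kept.Add(word);
        }
        return kept;
    }
}
```

With real results, the same loop could read word.Confidence and word.Text directly from the list returned by DoOCR.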

Version History

07JUN08: First release on Tesseract 2.03

10JUN08: Version 2.03.1. Changed Confidence behavior: it is now calculated from every letter of each word and not from the first letter only. Its type changed from byte to double. 0 = perfect, 100 = reject

13JUN08 : Version 2.03.2

After 3 days in the Tesseract code (urgh), here is Tessnet2 version 2.03.2

The corrections deal with the following problems:
* Confidence was not very useful; the value was strange. This has been corrected by setting the variable tessedit_write_ratings=true. After many tests I found this mode is the best for confidence accuracy. Values range from 0 (perfect) to 255 (reject). When the value goes over 160, the OCR was really bad.
* Calling DoOCR twice was not giving the same result. It was, as expected, a problem with global variables. The problem is almost fixed; sometimes it still doesn't work, but right now I can't find what is not correctly reinitialized.
 

Reposted from: http://qxsfl.baihongyu.com/
