Installing Tesseract OCR 4.0 on CentOS 6

xkcd: Compiling - https://xkcd.com/303/

A Tesseract OCR package is available for CentOS 6 via the EPEL yum repository, but unfortunately, at the time of writing, the latest Tesseract version available in EPEL is 3.0.4.

Installing Tesseract 4.0 from source is possible, but it takes some extra effort: CentOS 6 ships neither Leptonica 1.77, which Tesseract 4.0 requires, nor the autoconf-archive package (which was orphaned in EPEL), nor a GCC that supports C++11.

So far, things don't look promising, but rest assured, it's not the end of the world :)

First, install the development tools and a couple of prerequisites for Tesseract:

yum -y groupinstall "development tools"
yum -y install libpng-devel libtiff-devel libjpeg-devel

Next, install the CentOS Software Collections (SCL) yum repository and a newer version of GCC. Don't worry, GCC from the SCL repository won't interfere with GCC from the base repository.

yum -y install centos-release-scl
yum -y install devtoolset-7-gcc-c++

To use the newly installed GCC, simply source the devtoolset-7 enable script. It only affects the current shell session, so repeat it if you open a new one:

source /opt/rh/devtoolset-7/enable
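
Before going further, it may be worth confirming that the shell really picks up the newer compiler. The following is a minimal sketch, not part of the original procedure: version_ge is a hypothetical helper name, it relies on the -V (version sort) option of GNU sort, and the 4.8.1 threshold is an assumption based on that being the GCC release usually cited as feature-complete for C++11.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="4.8.1"   # assumed minimum GCC release for full C++11 support

if command -v g++ >/dev/null 2>&1; then
    found="$(g++ -dumpversion)"
    if version_ge "$found" "$required"; then
        echo "g++ $found is new enough"
    else
        echo "g++ $found is too old; did you source /opt/rh/devtoolset-7/enable?"
    fi
else
    echo "g++ not found in PATH"
fi
```

If the check reports the old compiler, the enable script was most likely sourced in a different shell session.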

Next up for installation is autoconf-archive, a collection of re-usable Autoconf macros:

cd /usr/src/
wget http://ftpmirror.gnu.org/autoconf-archive/autoconf-archive-2019.01.06.tar.xz
tar xvvfJ autoconf-archive-2019.01.06.tar.xz
cd autoconf-archive-2019.01.06/
./configure --prefix=/usr
make
make install

Now we can move on to installing Leptonica. Tesseract 4.0 requires Leptonica 1.77 or newer.

cd /usr/src/
wget http://leptonica.org/source/leptonica-1.77.0.tar.gz
tar xvvfz leptonica-1.77.0.tar.gz
cd leptonica-1.77.0/
./configure --prefix=/usr/local/
make
make install
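
Tesseract's configure script locates Leptonica through pkg-config, so a quick sanity check at this point can save a failed build later. This is only a sketch under two assumptions: that Leptonica went into /usr/local as above, and that its pkg-config module is named lept.

```shell
#!/bin/bash
# Assumes Leptonica was installed with --prefix=/usr/local/ and that its
# pkg-config module is called "lept".
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists lept; then
    echo "Leptonica $(pkg-config --modversion lept) is visible to pkg-config"
else
    echo "Leptonica is not visible to pkg-config; check PKG_CONFIG_PATH"
fi
```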

At this point, all system requirements are satisfied. We can finally install Tesseract OCR:

cd /usr/src/
wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0.tar.gz -O tesseract-4.0.0.tar.gz
tar xvvfz tesseract-4.0.0.tar.gz
cd tesseract-4.0.0
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
./autogen.sh
./configure --prefix=/usr/local/ --with-extra-libraries=/usr/local/lib/
make install

There you go. You now have Tesseract 4.0 on CentOS 6.

[root@localhost ~]# tesseract --version
tesseract 4.0.0
leptonica-1.77.0
libjpeg 6b (libjpeg-turbo 1.2.1) : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
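
If an older Tesseract from EPEL is also installed, it is easy to end up running the wrong binary. The sketch below extracts the version from the first line of the output shown above and complains if it is not a 4.x release. The tesseract_version helper is a hypothetical name, and the parsing assumes the first line has the form "tesseract X.Y.Z".

```shell
#!/bin/bash
# Hypothetical helper: read `tesseract --version` output on stdin and print
# the version number found on the first line, e.g. "4.0.0".
tesseract_version() {
    sed -n '1s/^tesseract \([0-9][0-9.]*\).*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # Older releases print the version on stderr, so merge both streams.
    version="$(tesseract --version 2>&1 | tesseract_version)"
    case "$version" in
        4.*) echo "Tesseract $version at $(command -v tesseract)" ;;
        *)   echo "Unexpected Tesseract version: ${version:-unknown}" ;;
    esac
else
    echo "tesseract not found in PATH"
fi
```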

If you're getting a "-bash: tesseract: command not found" error, you most probably don't have /usr/local/bin in your $PATH. Fix that by adding the following line to your ~/.bash_profile (or appending it to the existing configuration):

export PATH="$PATH:/usr/local/bin"
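
One small caveat with the line above: every time the profile is sourced again, /usr/local/bin gets appended once more. If that bothers you, a guarded variant along these lines avoids the duplicates. It is a sketch only, and path_append is a hypothetical helper name introduced for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the PATH-like string $1 with directory $2
# appended, unless $2 is already one of its entries.
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

PATH="$(path_append "$PATH" /usr/local/bin)"
export PATH
```

Wrapping the current value in colons before matching means /usr/local/bin is recognized whether it sits at the start, middle, or end of $PATH.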

2 comments

10
Feb

Very good job, bro. One small fix is needed: replace "tar xvvfz tesseract-ocr-4.0.0.tar.gz" with "tar xvvfz tesseract-4.0.0.tar.gz", then run "cd tesseract-4.0.0". Everything else is the same. Extra info: I tested this on CentOS 7 and everything works.
10
Feb

Hi Faruk. Good catch. I've updated the article. Cheers mate!
