The Tesseract OCR package is available for CentOS 6 via the EPEL yum repository, but unfortunately, at the time of writing, the latest Tesseract version in EPEL is 3.04.
Installing Tesseract 4.0 from source is possible, but it takes some extra effort: CentOS 6 ships neither Leptonica 1.77, which Tesseract 4.0 requires, nor the autoconf-archive package (which was orphaned in EPEL), nor a GCC that supports C++11.
So far, things don't look promising, but rest assured, it's not the end of the world.
Firstly, install development tools and a couple of prerequisites for Tesseract.
yum -y groupinstall "development tools"
yum -y install libpng-devel libtiff-devel libjpeg-devel
Next, install the CentOS Software Collections (SCL) yum repository and a newer version of GCC. Don't worry, GCC from the SCL repository won't interfere with GCC from the base repository.
yum -y install centos-release-scl
yum -y install devtoolset-7-gcc-c++
To use the newly installed GCC in your current shell, simply source the devtoolset-7 enable script:
source /opt/rh/devtoolset-7/enable
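Before moving on, it can be worth confirming that the shell really picked up the newer compiler. The sketch below is only an illustration: version_ge is a hypothetical helper (not part of devtoolset or GCC), and the 4.8.1 threshold is an assumption based on that being the first GCC release with complete C++11 support.

```shell
# version_ge A B: succeed when dotted numeric version A is >= B.
# Hypothetical helper; missing fields count as 0, so "7" compares like "7.0.0".
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Report on whichever g++ is first in PATH (skipped if none is installed).
if command -v g++ >/dev/null 2>&1; then
    gxx_version=$(g++ -dumpversion)
    if version_ge "$gxx_version" 4.8.1; then
        echo "g++ $gxx_version: new enough for C++11"
    else
        echo "g++ $gxx_version: too old, source /opt/rh/devtoolset-7/enable first"
    fi
fi
```

If this reports the old base compiler, the enable script was most likely sourced in a different shell session than the one you are building in.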
Next up for installation is autoconf-archive, a collection of re-usable Autoconf macros:
cd /usr/src/
wget http://ftpmirror.gnu.org/autoconf-archive/autoconf-archive-2019.01.06.tar.xz
tar xvvfJ autoconf-archive-2019.01.06.tar.xz
cd autoconf-archive-2019.01.06/
./configure --prefix=/usr
make
make install
Now we can move to Leptonica installation. Tesseract 4.0 requires Leptonica 1.77 or newer.
cd /usr/src/
wget http://leptonica.org/source/leptonica-1.77.0.tar.gz
tar xvvfz leptonica-1.77.0.tar.gz
cd leptonica-1.77.0/
./configure --prefix=/usr/local/
make
make install
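Since the Tesseract build locates Leptonica through pkg-config, a quick look at the installed lept.pc file can save a failed configure run later. This is a minimal sketch: pc_version is a hypothetical helper that simply reads the Version: field, and the path assumes the --prefix=/usr/local/ used above.

```shell
# pc_version FILE: print the Version: field of a pkg-config .pc file.
# Hypothetical helper for illustration only.
pc_version() {
    sed -n 's/^Version:[[:space:]]*//p' "$1" | head -n 1
}

# Assumed location for a build configured with --prefix=/usr/local/
pc_file=/usr/local/lib/pkgconfig/lept.pc
if [ -r "$pc_file" ]; then
    echo "Leptonica $(pc_version "$pc_file") is visible to pkg-config"
else
    echo "lept.pc not found at $pc_file" >&2
fi
```

Running pkg-config --modversion lept with PKG_CONFIG_PATH set to /usr/local/lib/pkgconfig should report the same version.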
At this point, all system requirements are satisfied. We can finally install Tesseract OCR:
cd /usr/src/
wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0.tar.gz -O tesseract-4.0.0.tar.gz
tar xvvfz tesseract-4.0.0.tar.gz
cd tesseract-4.0.0
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
./autogen.sh
./configure --prefix=/usr/local/ --with-extra-libraries=/usr/local/lib/
make install
There you go. You now have Tesseract 4.0 on CentOS 6.
[root@localhost ~]# tesseract --version
tesseract 4.0.0
 leptonica-1.77.0
  libjpeg 6b (libjpeg-turbo 1.2.1) : libpng 1.2.49 : libtiff 3.9.4 : zlib 1.2.3
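If you want to run the same check from a provisioning script, the version can be pulled out of that output. A rough sketch, assuming the first line always has the form "tesseract X.Y.Z" as shown above; parse_tesseract_version is a hypothetical helper name.

```shell
# parse_tesseract_version: read `tesseract --version` output on stdin
# and print the version number from the first line.
parse_tesseract_version() {
    awk 'NR == 1 && $1 == "tesseract" { print $2 }'
}

# Only query the binary if it is actually reachable through PATH.
if command -v tesseract >/dev/null 2>&1; then
    # 2>&1 because some Tesseract releases print the version on stderr
    installed=$(tesseract --version 2>&1 | parse_tesseract_version)
    echo "Found Tesseract ${installed:-unknown}"
fi
```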
If you get a "-bash: tesseract: command not found" error, /usr/local/bin is most likely missing from your $PATH. Fix that by adding the following to your ~/.bash_profile (or appending it to the existing configuration):
export PATH="$PATH:/usr/local/bin"
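If your ~/.bash_profile gets sourced more than once, the plain export above will keep appending the same directory. One way around that, sketched here with a hypothetical path_with helper, is to append only when the entry is missing:

```shell
# path_with VALUE DIR: print VALUE with DIR appended,
# unless DIR is already one of its colon-separated entries.
# Hypothetical helper for illustration only.
path_with() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

PATH=$(path_with "$PATH" /usr/local/bin)
export PATH
```

Wrapping the value in colons before matching means /usr/local/bin is only treated as present when it is a whole entry, not a prefix of some longer path.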
2 comments
very good job bro, need small
Submitted by faruk yetkin (not verified) on 10. February 2019 - 18:38
Re: very good job bro, need small
Submitted by Sasa Tekovic on 10. February 2019 - 19:12
Hi Faruk. Good catch. I've updated the article. Cheers mate!