Virtualpaper can be run manually or as a system daemon.
Refer to Dockerfile and docker-compose.yml for sample configurations.
Virtualpaper needs a running PostgreSQL database and a Meilisearch server. Make sure the database uses ‘utf8’ encoding. A new database can be created with the command:
psql > CREATE DATABASE virtualpaper WITH ENCODING='utf8' TEMPLATE template0;
The Meilisearch server can be shared with other processes, but Virtualpaper uses one index per user, named ‘virtualpaper-{userid}’. Meilisearch has a hard limit of 200 indices, so Virtualpaper can handle at most 200 users.
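The naming scheme and the index limit above can be illustrated with a minimal Python sketch. It is not taken from Virtualpaper's code: the function names, the integer user id, and the sample index list are assumptions made for illustration. It shows why indices owned by other applications on a shared server reduce the number of users that fit.

```python
# Hard limit on Meilisearch indices, as stated in the text above.
MEILISEARCH_MAX_INDICES = 200


def index_name(user_id: int) -> str:
    """Build the per-user index name following 'virtualpaper-{userid}'.

    Treating the user id as an integer is an assumption of this sketch.
    """
    return f"virtualpaper-{user_id}"


def remaining_user_slots(existing_indices: list[str]) -> int:
    """Return how many more per-user indices fit on the server.

    Every index counts toward the limit, including those that belong
    to other applications sharing the same Meilisearch server.
    """
    return max(0, MEILISEARCH_MAX_INDICES - len(existing_indices))


# Hypothetical server state: two Virtualpaper users and one unrelated index.
indices = ["virtualpaper-1", "virtualpaper-2", "other-app-products"]

print(index_name(3))                  # virtualpaper-3
print(remaining_user_slots(indices))  # 197
```

On a shared server the practical user limit is therefore 200 minus the number of indices used by other applications.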
Virtualpaper uses libraries and other binaries to process documents. List of required and recommended binaries:
Depending on the distribution, ImageMagick likely requires modifying its policies in /etc/ImageMagick-7/policy.xml.
The Virtualpaper repository contains a reference policy at docker/imagemagick-7-policy.xml.
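To see which policy entries affect a given format, the policy file can be inspected with a short script. The following Python sketch is an illustration only: the helper name `policies_for` and the sample XML are hypothetical, and the sample is not the reference policy from the repository. It relies on the `domain`, `rights` and `pattern` attributes that ImageMagick `<policy>` elements use.

```python
import xml.etree.ElementTree as ET


def policies_for(policy_xml: str, pattern: str) -> list[tuple[str, str]]:
    """Return (domain, rights) for each <policy> whose pattern mentions `pattern`.

    The match is a simple case-insensitive substring test, so a short
    pattern such as "PS" would also match "EPS".
    """
    root = ET.fromstring(policy_xml)
    matches = []
    for policy in root.iter("policy"):
        patterns = policy.get("pattern", "")
        if pattern.upper() in patterns.upper():
            matches.append((policy.get("domain", ""), policy.get("rights", "")))
    return matches


# Hypothetical policy snippet; some distributions ship a similar rule
# that blocks PDF processing by default.
sample = """<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="coder" rights="none" pattern="PDF"/>
</policymap>"""

print(policies_for(sample, "PDF"))  # [('coder', 'none')]
```

A result with rights set to `none` for the PDF coder would suggest that the policy needs adjusting before PDF files can be processed.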
Tesseract is the program used for OCR of PDF and image files. For optimal results it needs a language pack for each language the documents are primarily written in. Many distributions ship these as packages such as ‘tesseract-ocr-fin’, but they are also available at https://github.com/tesseract-ocr/tessdata.
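Whether the needed language packs are installed can be checked with `tesseract --list-langs`. The Python sketch below is a hedged example and not part of Virtualpaper: the helper names are made up, and the set of wanted languages (`eng`, `fin`) is only an illustration. The parser assumes the usual output format of a header line followed by one language code per line.

```python
import shutil
import subprocess


def parse_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. "List of available languages (2):".
    return {ln for ln in lines if not ln.startswith("List of available languages")}


def missing_langs(wanted: set[str], installed: set[str]) -> set[str]:
    """Return the wanted language codes that are not installed."""
    return wanted - installed


def installed_langs() -> set[str]:
    """Ask the local tesseract binary for its languages (empty set if absent)."""
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Some versions print the list to stderr, so both streams are parsed.
    return parse_langs(result.stdout + result.stderr)


# Demonstration with canned output, so the example runs without tesseract.
sample = "List of available languages (2):\neng\nosd\n"
print(sorted(missing_langs({"eng", "fin"}, parse_langs(sample))))  # ['fin']
```

On a real host, `missing_langs({"eng", "fin"}, installed_langs())` would list the packs that still need to be installed.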