Internals

Data structure

Virtualpaper stores all files in a single directory, which is configured with the setting processing.output_dir.

Original documents are stored in a three-level hierarchical tree where the path to each document is derived from its id. Virtualpaper uses UUIDv4 as document identifiers. The first character of the id names the first directory, the second character names the second directory, and the rest of the id is the file name. For instance, the document with id d669c348-463d-4025-a290-49bfe65c5287 is located at:

# original document
<data-dir>/documents/d/6/69c348-463d-4025-a290-49bfe65c5287

# thumbnail
<data-dir>/previews/d/6/69c348-463d-4025-a290-49bfe65c5287.png
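The path layout described above can be sketched in a few lines of Python. This is only an illustration of the splitting rule, not code taken from Virtualpaper; the function names `relative_document_path` and `original_document_path` are hypothetical, and the data directory is passed in by the caller.

```python
from pathlib import PurePosixPath


def relative_document_path(doc_id: str) -> PurePosixPath:
    """Split a document id into <1st char>/<2nd char>/<rest of id>."""
    if len(doc_id) < 3:
        raise ValueError("document id is too short to split into a path")
    return PurePosixPath(doc_id[0], doc_id[1], doc_id[2:])


def original_document_path(data_dir: str, doc_id: str) -> PurePosixPath:
    """Build the path of the original document below <data-dir>/documents."""
    return PurePosixPath(data_dir) / "documents" / relative_document_path(doc_id)
```

For the example id, `relative_document_path("d669c348-463d-4025-a290-49bfe65c5287")` yields `d/6/69c348-463d-4025-a290-49bfe65c5287`, the same relative path that appears in the original document location shown above.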

Aside from original documents and previews, the rest of the data is stored in a PostgreSQL database. Meilisearch acts as secondary storage that contains the search index. Backing up the Meilisearch data is optional: it can make restoring from a backup faster, but Virtualpaper can always re-index the document data into a fresh Meilisearch instance from the Admin UI.

Extracting document ids and paths from database

SQL script for getting all files and their paths:

SELECT 
    id, 
    name, 
    substring(id,1,1)||'/'||substring(id,2,1)||'/'||substring(id,3,100) AS path,
    mimetype
FROM documents;

To show the files of a single user only, use:

SELECT 
    id, 
    name, 
    substring(id,1,1)||'/'||substring(id,2,1)||'/'||substring(id,3,100) AS path,
    mimetype
FROM documents
WHERE user_id=<numerical-user-id>;
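
The query output can be compared against the files on disk, for example to sanity-check a backup. The following is a minimal sketch and not part of Virtualpaper: it assumes the query result has been exported as CSV text with a header row containing at least the `id` and `path` columns, and the function name `find_missing_originals` is hypothetical.

```python
import csv
import io
from pathlib import Path


def find_missing_originals(data_dir, query_csv):
    """Return the ids whose original file is absent below <data_dir>/documents.

    query_csv is assumed to be the query result exported as CSV text,
    including a header row with the columns `id` and `path`.
    """
    documents_dir = Path(data_dir) / "documents"
    missing = []
    for row in csv.DictReader(io.StringIO(query_csv)):
        # `path` is the relative path computed by the SQL query.
        if not (documents_dir / row["path"]).is_file():
            missing.append(row["id"])
    return missing
```

An empty list means that every row returned by the query has a matching file in the documents directory.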