DjVu-Migration


See also: DjVu-Viewer_Integration

1. Check Servers

Two test files are used to probe each server/image store via djvu_migrate --test:

File                                          Pages  Size    Hash
AB1938_Kreis-Beckum_Inhaltsverz.djvu          3      50 KB   c/c7
Auenheim-Frauweiler_Dokument-1693-03-09.djvu  10     3.9 MB  b/b8

djvu_migrate --test --write updates ~/.djvuviewer/server_config.yaml (under RCS).

Findings:

Server       Image store  FS     total  migrated  target  Stat (ms)  Read (ms)  Speed (MB/s)  djvudump (ms)
qn           genwiki      CIFS   ~4000  0         no      11.64      1.32       37.9          24
fixit        gruff        SAMBA  ~4000  0         no      3.26       0.15       0.7           17
fur          gruff        SSD    ~4000  0         no      2.60       0.007      14.7          1.6
fur          yuyu         SSD    ~4000  0         no      -          0.025      4.1           1.4
fur          luxio        HD     ~4000  ~4000     no      2.60       3.88       12.8          1.7
fur          entei        HD     ~4000  0         no      2.60       0.010      10.1          1.7
genwiki39    genwiki      CIFS   ~4000  0         no      66.79      0.34       148.4         9
hostsharing  genwiki      SSD    -      -         yes     0.004      2.0        1500          6
  • fur/gruff is an SSD copy of the original unbundled files — use it to count total/migrated for all slow disks
  • fur/luxio is the bundled source — counts come from djvu_migrate --info (DjVu SQLite)
  • Only hostsharing is target: true — the only server we write to
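
The Stat, Read and Speed columns can be approximated with a few lines of Python. This is a minimal sketch, not the actual djvu_migrate --test implementation: the function name probe_file and the returned keys are assumptions for illustration, and how djvu_migrate defines each column is not stated here.

```python
import os
import time


def probe_file(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Time one stat call and one full sequential read of a file."""
    t0 = time.perf_counter()
    size = os.stat(path).st_size
    stat_ms = (time.perf_counter() - t0) * 1000.0

    t0 = time.perf_counter()
    read_bytes = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            read_bytes += len(chunk)
    read_s = time.perf_counter() - t0

    # Guard against a zero elapsed time on very small, cached files.
    speed = (read_bytes / 1e6) / read_s if read_s > 0 else float("inf")
    return {
        "size": size,
        "stat_ms": stat_ms,
        "read_ms": read_s * 1000.0,
        "speed_mb_s": speed,
    }
```

A second run on the same file mostly measures the operating system's page cache, so timings from repeated probes on one host are not directly comparable with a first, cold read.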

2. Decide Migration Path

  • Source: fur/luxio — fully bundled, fast HD, timestamps corrected to original
  • Target: hostsharing/genwiki — SSD 1500 MB/s, target: true
  • Transfer speed fur→hostsharing: 2.4 MB/s (measured)

DB state from valid_bundled_stats query:

total  valid_bundled  bundled_no_size (tainted)  not_bundled  valid_gb  est. transfer time
4288   2178           48                         2062         10.46 GB  ~73 min
  • Strategy: agile — migrate valid_bundled ordered small→large, learn from early results
  • The 48 tainted (bundled, no filesize) need separate investigation
  • The 2062 not_bundled are future work
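
The estimated transfer time follows from the measured 2.4 MB/s and the 10.46 GB of valid bundled files. The sketch below reproduces that arithmetic and the small-to-large ordering; both function names are illustrative, and 1 GB = 1000 MB is an assumption about how valid_gb is computed.

```python
def estimate_transfer_minutes(total_gb: float, speed_mb_s: float) -> float:
    """Estimated transfer time in minutes, assuming 1 GB = 1000 MB."""
    return total_gb * 1000.0 / speed_mb_s / 60.0


def order_small_to_large(files: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sort (path, filesize) pairs by size so small files are migrated first."""
    return sorted(files, key=lambda item: item[1])


minutes = estimate_transfer_minutes(10.46, 2.4)
print(f"{minutes:.0f} min")  # prints "73 min"
```

Migrating small files first means problems with paths, permissions or links show up after seconds rather than after a multi-megabyte transfer.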

3. Query Three Sources

All three sources are queried via named, parameterized queries defined in YAML. Run them with djvu_migrate --info.

Source               Table      Content
DjVu SQLite          djvu       Conversion status, page count, bundled yes/no
MediaWiki API cache  mw_images  Images known to the wiki
MariaDB genwiki39    wiki       Wiki pages and DjVu file list
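
A named, parameterized query might look roughly like the sketch below. A plain dict stands in for the parsed YAML file. The query name bundled_count, the SQL text and the column names (path, bundled, filesize) are assumptions for illustration, not the real schema or the real valid_bundled_stats query.

```python
import sqlite3

# Stand-in for the parsed YAML file: query name -> SQL with named parameters.
# The SQL and the column names are hypothetical.
QUERIES = {
    "bundled_count": """
        SELECT COUNT(*) AS n
        FROM djvu
        WHERE bundled = :bundled AND filesize IS NOT NULL
    """,
}


def run_named_query(conn, name, params):
    """Look up a query by name and execute it with bound parameters."""
    return conn.execute(QUERIES[name], params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE djvu (path TEXT, bundled INTEGER, filesize INTEGER)")
conn.executemany(
    "INSERT INTO djvu VALUES (?, ?, ?)",
    [("a.djvu", 1, 51200), ("b.djvu", 1, None), ("c.djvu", 0, None)],
)
rows = run_named_query(conn, "bundled_count", {"bundled": 1})
print(rows)  # prints [(1,)]
```

Binding values through :name placeholders keeps the SQL text fixed, so the same named query can be reused with different parameters without string formatting.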

4. Verify Expected State

Compare what the three sources report against each other:

  • files in djvu but not in mw_images → not known to wiki
  • files in mw_images but not in djvu → not yet converted/indexed
  • files in wiki MariaDB → cross-check with both above
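
Once each source has been reduced to a set of file names, the comparison above is plain set arithmetic. A minimal sketch follows; the function name, the result keys and the sample file names are made up for illustration.

```python
def cross_check(djvu: set[str], mw_images: set[str], wiki: set[str]) -> dict[str, set[str]]:
    """Compare the file names reported by the three sources."""
    return {
        "not_known_to_wiki": djvu - mw_images,    # in djvu, missing from mw_images
        "not_yet_converted": mw_images - djvu,    # in mw_images, missing from djvu
        "wiki_only": wiki - (djvu | mw_images),   # in MariaDB, in neither of the others
        "in_all_three": djvu & mw_images & wiki,  # consistent everywhere
    }


result = cross_check(
    djvu={"a.djvu", "b.djvu"},
    mw_images={"b.djvu", "c.djvu"},
    wiki={"b.djvu", "d.djvu"},
)
print(sorted(result["not_known_to_wiki"]))  # prints ['a.djvu']
```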

5. Copy Files

Copy with rsync -a from fur/luxio to hostsharing/genwiki; archive mode preserves timestamps. Copy only bundled files, and write only to the target server.
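
One way to restrict the copy to bundled files is to hand rsync an explicit file list. The sketch below only builds the command. The list file bundled.txt and both paths are placeholders, not the real locations, and whether djvu_migrate calls rsync this way is an assumption.

```python
import shlex


def build_rsync_command(file_list: str, source_root: str, target: str) -> list[str]:
    """Build an rsync call that copies only the files named in file_list."""
    return [
        "rsync",
        "-a",                         # archive mode, preserves modification times
        f"--files-from={file_list}",  # one path per line, relative to source_root
        source_root.rstrip("/") + "/",
        target,
    ]


# Placeholder paths for illustration only.
cmd = build_rsync_command("bundled.txt", "/path/to/luxio", "user@target:/path/to/genwiki/")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises an exception on a non-zero rsync exit code.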

6. Create Hard-Links

Hard-links from djvu-wiki → genwiki39e on hostsharing — no extra storage needed.
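
A hard link is a second directory entry for the same file data, which is why no extra storage is needed. It only works when both paths are on the same filesystem. A minimal sketch with a hypothetical helper name:

```python
import os


def link_if_missing(source: str, link_path: str) -> bool:
    """Create a hard link at link_path for source; return True if one was created."""
    if os.path.exists(link_path):
        return False
    os.makedirs(os.path.dirname(link_path) or ".", exist_ok=True)
    os.link(source, link_path)
    return True
```

After linking, os.stat(source).st_nlink reports the number of directory entries that share the file, which is a quick way to verify the result.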

7. Create Wiki Pages

Create wiki pages in djvu-wiki using Vorlage:DjVuViewer and Vorlage:GOV.

8. Statistics

Output per batch: success / errors / skipped — with causes.
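
The per-batch output could be collected with two counters, one for outcomes and one for causes. This is a sketch only; the class name, the method names and the example causes are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BatchStats:
    counts: Counter = field(default_factory=Counter)
    causes: Counter = field(default_factory=Counter)

    def record(self, outcome: str, cause: str = "") -> None:
        """Count one file; outcome is 'success', 'errors' or 'skipped'."""
        self.counts[outcome] += 1
        if cause:
            self.causes[(outcome, cause)] += 1

    def summary(self) -> str:
        """One headline with the three totals, then one line per cause."""
        head = " / ".join(f"{k}: {self.counts[k]}" for k in ("success", "errors", "skipped"))
        lines = [head] + [f"  {o}: {c} ({n})" for (o, c), n in sorted(self.causes.items())]
        return "\n".join(lines)


stats = BatchStats()
stats.record("success")
stats.record("skipped", "already on target")
stats.record("errors", "missing filesize")
print(stats.summary())
```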