Thank you for your interest in preserving our common literary heritage. You are acting out of love and are a hero, no matter what the publishing industry might say.
The process of digitizing a book has five steps:
- Procuring a copy of the book (or a trustworthy file for ebooks)
- Scanning it with a camera or flatbed scanner (or, for ebooks, removing any DRM and storing the file in a durable format)
- Post-processing the scans, that is:
  - Splitting the images if two pages were scanned together
  - Renaming the individual pages
  - Cropping them
  - Dewarping them and correcting their exposure (some people might prefer to have unedited views of the pages)
  - Merging the pages into a DjVu or PDF file
- Publishing the file in a resilient manner
- Transcribing the page images into searchable text with OCR, proofreading the result, and possibly producing a new digital edition with LaTeX
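The splitting and renaming steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gutter is assumed to sit exactly in the middle of each scan, pages are assumed to read left to right, and the function names and the `page_NNN` naming scheme are hypothetical, not something our workflow prescribes. The boxes use the `(left, upper, right, lower)` convention, which is the form accepted by Pillow's `Image.crop`, but no image library is needed to run the sketch itself.

```python
from pathlib import Path


def split_boxes(width: int, height: int) -> list[tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) boxes for the two halves of a spread.

    Assumption: the gutter is exactly in the middle of the scan.
    """
    middle = width // 2
    return [(0, 0, middle, height), (middle, 0, width, height)]


def page_name(index: int, total: int, suffix: str = ".png") -> str:
    """Build a zero-padded file name so that pages sort in reading order."""
    digits = max(3, len(str(total)))
    return f"page_{index:0{digits}d}{suffix}"


def plan_renames(scans: list[Path], two_pages_per_scan: bool = True) -> list[tuple[Path, str]]:
    """Pair each scan (taken in sorted order) with the page names it yields.

    Nothing is renamed on disk; the function only returns the plan.
    """
    ordered = sorted(scans)
    per_scan = 2 if two_pages_per_scan else 1
    total = len(ordered) * per_scan
    plan = []
    for i, scan in enumerate(ordered):
        for half in range(per_scan):
            number = i * per_scan + half + 1
            plan.append((scan, page_name(number, total, scan.suffix)))
    return plan
```

Returning a plan instead of renaming files directly makes it easy to review the mapping before touching the original scans.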
If you have a smartphone, the CamScanner app is recommended. It automatically crops and color-corrects the pages, and the results are decent. There are ways to get an unlimited free trial, and the watermarks in the demo version are not really an issue.
Some parts of this process are of debatable value, as they may alter the layout of the original book, making the digital edition hard to use for referencing. Even programs like ScanTailor modify the original images quite a lot and do not produce an “archival quality” file.
The approach taken by archive.org is minimal post-processing and an image-only view. Wikisource and Project Gutenberg, on the other hand, often don’t link to the edition their ebooks are based on, and they alter the original layout considerably.
Our current editorial choice is:
- An ebook is not enough (no pagination or proof of the original content), but given the number of books that aren’t even available as ebooks, books that already exist as ebooks are a lower priority.
- A scan is weak proof of the original content but is enough for our purposes. It is not very convenient to use, however, especially if it was made with a cell phone.
- A digital edition isn’t enough in itself (as the original layout and pagination aren’t preserved) but is very convenient and easy to print. Our current choice of layout prioritizes a lot of text on few pages, for archival purposes. One also needs to take into account the time needed to proofread (up to 10 minutes per page).
- We ought to keep original copies to guard against future corruption, lest we end up with a society like the one described in 1984, where people throw away every piece of paper after reading it so that no reliable record of the past remains.
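To get a feel for what the proofreading figure above means for a whole book, here is a rough estimate in Python. The 10 minutes per page is the upper bound quoted above; the 300-page book is a made-up example, and the function name is hypothetical.

```python
def proofreading_hours(pages: int, minutes_per_page: float = 10.0) -> float:
    """Upper-bound proofreading time in hours for a book of `pages` pages."""
    if pages < 0 or minutes_per_page < 0:
        raise ValueError("pages and minutes_per_page must be non-negative")
    return pages * minutes_per_page / 60


# A hypothetical 300-page book at the 10 minutes/page upper bound:
print(proofreading_hours(300))  # 50.0
```

At that rate a single book can take more than a full working week, which is why a scan alone is often where a project has to stop.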
The source files for our editions will soon be available for people who want to draw inspiration from them. You can download our typical back cover here.