converting PDFs to eBooks

euxalot
converting PDFs to eBooks

Here is some useful but hard-to-find info I learned by trial and error about converting PDFs to eBooks that are readable on your tablet/Kindle.

1. Many tablets/Kindles will display PDFs, but I prefer to read in an eBook format since it re-flows text to fit my device screen.

2. Converting PDFs directly in Calibre or other tools often produces many errors.

3. For PDFs that are "born digital" (i.e., not compiled from scanned images), follow these steps to assemble a functioning eBook that renders well on your Kindle/tablet: PDF --> Doc --> ePub --> azw3/mobi, etc.

3.1 ePub files are built from HTML files, so the first step is to convert the PDF to editable text. While Calibre will do this, Adobe Acrobat does it better, with much less cleanup required.

3.2 You can quickly check the Doc version to see if the table of contents is clickable/rendered as headings; if it is, great! You will have a functioning table of contents on your tablet/Kindle. If not, add heading styles directly to the Word doc.

3.3 Once the Doc is ready, add it to Calibre and convert that to ePub.

3.4 Open the ePub directly in Calibre to check the conversion quality. Problem is, Calibre often omits spaces between text, like this: Calibreoftenomitsspacesbetweentext,likethis.

3.5 Now you need to perform a "replace all" text edit on the underlying html files.

3.5.1 The first step is to change the epub file extension to zip

3.5.2 Uncompress the zip file

3.5.3 Find a tool that can batch edit html files. On a Mac, BBEdit works really well for this ("it doesn't suck").

3.5.4 Replace all "[/span]" with " [/span]" (that is: insert one space before the left open bracket) [edit: had to change angle bracket to square bracket since concen will think I want to write html here and hides the terms]

3.5.5 After the /span has been changed, reverse your steps: compress the folder back to a zip; change the zip file extension to epub; and open that epub on your tablet, or send it back to Calibre for conversion to kindle format

It works!

4. If you have a PDF made from scanned images, the odds are low that OCR will extract clean enough text, so you will need to do a LOT more cleanup in the Doc step. How much depends on the quality of the scan. For the same reason, the ePub versions offered by the Internet Archive are inherently problematic; it is best to just download their PDFs. The first step is the same either way: convert the PDF to a doc and take a look at what you are dealing with. Just expect it to take MUCH longer to clean up the text for this type of PDF.
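
Steps 3.5.1 to 3.5.5 can also be scripted instead of done by hand. The following is a minimal sketch in Python (standard library only), not a tested recipe for every book. It treats the ePub as the zip archive it is, inserts a space before each closing `</span>` tag in the HTML files, and writes a new ePub. The function and file names are placeholders made up for illustration. Two assumptions to flag: the content files are UTF-8, and a closing tag that already follows whitespace is left alone so that a second run does not double the spaces. The sketch also writes the `mimetype` entry first and uncompressed, which the EPUB container format asks for and which a plain "compress folder" command may not do.

```python
import re
import zipfile
from pathlib import Path

# Assumption: only files with these extensions hold the book text.
HTML_SUFFIXES = {".html", ".xhtml", ".htm"}


def add_space_before_span_close(markup: str) -> str:
    """Insert one space before each </span> not already preceded by whitespace."""
    return re.sub(r"(?<!\s)</span>", " </span>", markup)


def fix_epub_spacing(src: Path, dst: Path) -> None:
    """Copy the ePub at src to dst (a different path), patching HTML files."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        names = zin.namelist()
        # The "mimetype" entry goes first and is stored without compression.
        if "mimetype" in names:
            zout.writestr("mimetype", zin.read("mimetype"),
                          compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name == "mimetype":
                continue
            data = zin.read(name)
            if Path(name).suffix.lower() in HTML_SUFFIXES:
                # Assumption: content files are UTF-8 encoded.
                text = data.decode("utf-8")
                data = add_space_before_span_close(text).encode("utf-8")
            zout.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like `fix_epub_spacing(Path("book.epub"), Path("book-fixed.epub"))`, with both file names hypothetical. Writing to a new file keeps the original intact in case the replacement adds spaces where they are not wanted.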

euxalot

two other suggestions that add value:

in a world where the dominant commercial actors actively want to divide our attention, the kindle team actually did something right by storing all my highlights and notes from all books in a single file on the kindle (mount it on your computer and navigate to "my comments.txt").

this is a far simpler way to keep track of your reading habits and what is important to *you*, since most ereaders store such notes in the individual book files themselves (i.e., there is no single place to search them all)

and you can go one step further and use a service combo like Notion --> Readwise to serve your own highlights back to you via email

this means you won't forget the highlights you make, since Readwise will send you a few of them each day or week, depending on your preferences

needless to say, if you are a student, tools like these are helpful for organizing your class notes

tutorial here:

https://www.youtube.com/watch?v=jl7LD0K25A8&t=92s

zoopenhoff
Useful

Thanks for posting this!

There are some private e-book trackers that I'm always happy to search for things, just PM me if there's something you're interested in. That is often a lot easier than converting, which has never really worked out well for me. I got used to reading on my laptop instead. f.lux is an app that makes reading on the screen more bearable.

laneigile
Hm

1. You can re-flow PDF text if you jailbreak your Kindle and install KOReader. Similarly, on older devices you can install the Chinese Duokan OS; it's a beautiful system with a superior PDF reader.

2. I agree.

3. You have an unnecessary step: ePub. The correct order would be PDF --> Doc --> azw3/mobi.

3.1 Try ABBYY FineReader or Nuance OmniPage.

3.2 I agree.

3.3 Use FineReader and edit the mistakes.

3.4 I agree.

3.5 I agree.

3.5.1 I agree

3.5.2 I agree

3.5.3 On Windows you can use Sigil.

3.5.4 I agree

3.5.5 I agree

4. Use ABBYY FineReader to create the PDF from scanned images; the better the images, the less text cleanup is needed.

euxalot
laneigile wrote:

Hi, I'm not quite ready to jailbreak my kindle, but that is interesting info to know.

Can you share some screen grabs of the PDF reflow to show us what it looks like? Or suggest a site that faithfully captures the functionality from your perspective?

I will compare ABBYY FineReader with Acrobat and report back. I compared them years ago, and Acrobat was better at that time. Perhaps FineReader has improved since then.

laneigile wrote:

3. You have an unnecessary step: ePub. The correct order would be PDF --> Doc --> azw3/mobi.

I gave the steps for fixing spacing errors, not the steps for creating a kindle file; I need to fix the ePub file since it is made of html files.

So, technically we are both correct:

To create a kindle file: PDF --> Doc --> azw3/mobi

But to fix underlying spacing issues, the flow is the one I gave: PDF --> Doc --> ePub --> unzip and batch replace in html files --> ePub and/or azw3/mobi
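
For the Calibre legs of either flow (Doc --> ePub, or ePub --> azw3/mobi), the conversions can be queued from a script rather than clicked through one by one. A minimal sketch, assuming Calibre's command-line tool `ebook-convert` is installed and on the PATH, and that the Word files are saved as .docx; the helper names and the folder layout are made up for illustration. By default it only builds the commands (a dry run) so they can be checked before anything is converted.

```python
import subprocess
from pathlib import Path


def build_convert_command(src: Path, out_format: str = "epub") -> list[str]:
    """Return an ebook-convert command writing next to src in out_format."""
    dst = src.with_suffix("." + out_format.lstrip("."))
    return ["ebook-convert", str(src), str(dst)]


def convert_folder(folder: Path, out_format: str = "epub",
                   dry_run: bool = True) -> list[list[str]]:
    """Build, and optionally run, one command per .docx file in folder."""
    commands = [build_convert_command(path, out_format)
                for path in sorted(folder.glob("*.docx"))]
    if not dry_run:
        for command in commands:
            # Requires Calibre to be installed; raises if a conversion fails.
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_folder(Path("to_convert"))` with a hypothetical folder name returns the list of commands without running them; passing `dry_run=False` runs them in order.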

laneigile
Hm

All private torrent e-book trackers (except Bibliotik) collect books from public trackers. They are convenient because they keep books much longer than public trackers do. And you can't find everything, that's for sure. For example, you may not find one particular title, but you can find something similar to replace it. On Windows you can use Night Light instead of f.lux. Every Android device has a reading mode that is pretty easy on the eyes.

zoopenhoff
Not all.

Some of teams ripping from amazon all-you-can-kindle and audible.

euxalot
zoopenhoff wrote:

Some of teams ripping from amazon all-you-can-kindle and audible.

Good to know ;>

zoopenhoff
euxalot wrote:

zoopenhoff wrote:
Some of teams ripping from amazon all-you-can-kindle and audible.

Good to know ;>

omg now you've quoted me I can't fix my morning grammar.

Yes do let me know if there's anything you want. Sometimes it takes a few weeks but has always come through so far.

laneigile
(No subject)


euxalot

looks good, many thanks for sharing. Some Android PDF readers have a PDF reflow, but I never saw anything this good. That is really neat software that appears way ahead of anything on iOS or Android.

Where are the instructions to use this jailbreak trick? I am on a Mac and last time I looked, it was mostly a Windows thing. Thanks for any advice

I did some comparison between ABBYY FineReader and Acrobat yesterday. There are pros and cons.

Pro: ABBYY FineReader produced a higher-quality ePub without the spacing issues I encountered with Acrobat. This is good to know!

Con: It takes *significantly* longer to get there than with Acrobat, even counting the time I would spend batch-editing the underlying html files of the Acrobat version (though, as stated, that final product would not be as high quality).

The time factor is not a small thing since I have so many PDFs that need conversion.

Abbyy FineReader costs $129. yikes

Does anyone still use Serial Box, SerialSeeker or iSerial? I cannot find it any more...

laneigile

The best PDF reflow I have seen on Android is the mobile view in WPS Office.
You can find many interesting topics about Kindle jailbreaking here: https://www.mobileread.com/forums/forumdisplay.php?s=47bb639fc556dad25ff...
Recently developers started upgrading their software protection, so there is almost no software which can be activated with serials.
Of course there are cracks from two reverse engineering groups, but I'm not eager to use them; serials are the clean and preferred way for me.
You can find serials here: https://www.macserialjunkie.com/
serialbox: https://www.torrentmac.net/serial-box-09-2021/
Finereader with crack (latest):https://www.torrentmac.net/abbyy-finereader-pdf-15-2-2/
Finereader with serial (I'm using this one):https://www.torrentmac.net/abbyy-finereader-ocr-pro-12-1-14/

euxalot

Thanks for recommending I try Abbyy FineReader again. I found a working copy for Mac, so I will work with it for a while.

I may reach back to you for advice about jailbreaking my kindle. I have two, so perhaps I will try it on my older one first.

If you highlight notes in the reflow, does it store the notes in the "My comments.txt" file?

laneigile

That file is called "My Clippings" on my old Kindle. When I highlight a note in KOReader, which is a third-party application, it goes into KOReader's bookmarks, not into the Kindle's clippings. Duokan is a separate OS; it has nothing in common with the Kindle's own system.

euxalot

cheers, many thanks for your help, Laneigile. Greatly appreciated.

yes, "my clippings" not "my comments", you are right.

In that case, my preferred method will still be to convert PDFs to reflowable ebooks, since I can capture all my highlights in one file. But it has been helpful having you share your knowledge of the other tools.
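
Since all the highlights end up in one text file, searching them can be scripted. A minimal sketch, assuming the layout commonly seen in the Kindle clippings file: each entry is a title line, a metadata line, a blank line, and the highlighted text, with entries separated by a line of ten equals signs. The class and function names are made up for illustration, and the layout may differ between firmware versions.

```python
from dataclasses import dataclass
from pathlib import Path

SEPARATOR = "=========="  # assumed entry delimiter in the clippings file


@dataclass
class Clipping:
    book: str   # title line, usually "Title (Author)"
    meta: str   # the "- Your Highlight on ..." line
    text: str   # the highlighted passage or note


def parse_clippings(raw: str) -> list[Clipping]:
    """Split the raw file contents into Clipping records."""
    clippings = []
    for block in raw.split(SEPARATOR):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 2:
            continue  # skip empty trailing blocks
        book = lines[0].lstrip("\ufeff")  # drop a byte-order mark if present
        text = "\n".join(line for line in lines[2:] if line)
        clippings.append(Clipping(book, lines[1], text))
    return clippings


def load_clippings(path: Path) -> list[Clipping]:
    """Read and parse a clippings file from disk."""
    return parse_clippings(path.read_text(encoding="utf-8-sig"))


def search(clippings: list[Clipping], term: str) -> list[Clipping]:
    """Return clippings whose book title or text contains term."""
    term = term.lower()
    return [c for c in clippings
            if term in c.text.lower() or term in c.book.lower()]
```

With the Kindle mounted, something like `search(load_clippings(Path("My Clippings.txt")), "reflow")` would list every highlight mentioning that word; the path here is a placeholder for wherever the file sits on the mounted device.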
