Toolchain for converting LaTeX Book documents to ebook formats
A toolchain using free/open source tools to generate both PDF and ebooks
(.mobi and soon .epub) from a single set of LaTeX sources.
We used this toolchain, along with additional design elements and
macros that give our book its unique appearance, to create the hardcopy
and Kindle editions of our textbook Engineering Long-Lasting Software.
This brief guide assumes you are very comfortable with LaTeX and the
Unix development environment (using Makefiles to manage complex builds,
etc.) and will teach you nothing about those things. If you haven’t
used LaTeX before, be warned: writing in LaTeX is more like programming
than authoring. If that scares you, this bundle’s not for you. If you’re
not comfortable running Makefile-based builds, good luck.
There is no support. Seriously. None. Pull requests are welcome since
many improvements are needed, but sadly I just don’t have the time to
teach people to use this. It has lots of moving parts and things that
can break. When better ebook authoring tools come around I’ll be happy
to switch!
The two major ebook file formats today are Mobipocket (.prc) and ePub
(.epub). Amazon bought Mobipocket, added its own DRM to the Mobipocket
format, and rebranded it as “Kindle format” (.azw files); ePub is an
open standard that allows for but doesn’t require DRM. Amazon has since
extended its format to add more support for HTML positioning, wrapping,
etc.; this is the new KF8 (Kindle Format 8), which is essentially a
proprietary extension to Mobipocket that takes advantage of the
renderers on the Kindle Fire and the Kindle reading apps. Currently, this
toolchain doesn’t support the new KF8 features.
Both formats are based on HTML markup for text and embedded assets
(images, etc.). tex4ht was designed to output HTML from LaTeX documents.
However, because of differences between “plain old” HTML 5 and the
individual formats, limitations/quirks of tex4ht, and limitations/quirks
of the rendering software on ebook readers, substantial surgery is
needed on the output of tex4ht, and some care is needed in your LaTeX
authoring. The ‘mobi_postprocess.rb’ and ‘html_postprocess.rb’ scripts
perform this surgery using the powerful Nokogiri XML library as a base.
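To give a flavor of this kind of surgery, here is a minimal sketch. It is not the code in mobi_postprocess.rb or html_postprocess.rb, and the rewrite rules it applies (turning a hypothetical `<div class="sidebar">` into a `<blockquote>` and dropping inline `style` attributes) are invented purely for illustration. The real scripts use Nokogiri; to stay dependency-free, this sketch swaps in Ruby's REXML library, which only accepts well-formed XML, so it assumes tex4ht has been configured to emit XHTML. A real XHTML file that declares a default namespace may need the XPath expressions adjusted.

```ruby
require 'rexml/document'

# Hypothetical rewrite rules, for illustration only:
#   1. <div class="sidebar"> becomes <blockquote> (exact class match only)
#   2. inline style="..." attributes are dropped from every element
def postprocess(xhtml)
  doc = REXML::Document.new(xhtml)

  # XPath.match returns an array, so renaming elements while looping is safe.
  REXML::XPath.match(doc, '//div[@class="sidebar"]').each do |div|
    div.name = 'blockquote'
    div.delete_attribute('class')
  end

  # Strip inline styles that an ebook renderer might mishandle.
  REXML::XPath.match(doc, '//*[@style]').each do |el|
    el.delete_attribute('style')
  end

  # Serialize the modified tree back to a string.
  doc.to_s
end

if __FILE__ == $PROGRAM_NAME
  puts postprocess('<html><body><div class="sidebar" style="color:red">' \
                   '<p>Note</p></div></body></html>')
end
```

Each real fix-up in the toolchain follows the same pattern: select the nodes produced by one kind of LaTeX element, then rename, restructure, or strip them.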
The toolchain works exclusively with the LaTeX “book” document class.
There are some extensions and some limitations on what you can do.
In general, every logical type of document element (chapter header,
figure, sidebar, code file listing, etc.) must be wrapped in its own
LaTeX macro, because the output instructions for ebook generation may
differ substantially from those for PDF output.
If you stick to the book elements described below, you should be able to
use everything as-is. If you want to customize/add behaviors, see
Customizing at the end of this README.
A Unix-like system (Mac OS X is fine, but you need to have Unix-fu) and
full installs of the following:
IMPORTANT: The .mobi ebook file WILL NOT BUILD unless you at least have
dummy values/files for all of the above.
If you add your book content according to the following structure, you
won’t need any Makefile changes. If you follow your own structure,
you’ll need to make substantial changes to the Makefile and to
common.tex. Unless you want to burn a lot of time on this, do it my
way.
Add your book chapters, each in its own subdirectory, organized as
follows for a chapter called mychap:
ch_mychap/
ch_mychap/mychap.tex - toplevel file for that chapter
ch_mychap/figs/ - figures (.pdf files ONLY; see below)
ch_mychap/tables/ - tables (usually just .tex files)
ch_mychap/code/ - code files for listings
IMPORTANT: you should include ALL the subdirs figs, tables, code for
each chapter, even if empty, or some Makefile rules will break!
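A small helper like the following can create a new chapter in this layout and flag existing chapters that are missing a required subdirectory. It is a hypothetical convenience script, not part of the toolchain; the function names are made up, and the placeholder comment written into mychap.tex is an assumption that you would replace with your real chapter source.

```ruby
require 'fileutils'

# Subdirectories that every ch_<name>/ directory must contain.
CHAPTER_SUBDIRS = %w[figs tables code].freeze

# Create ch_<name>/ under root with figs/, tables/, code/ and a
# placeholder <name>.tex; existing files are left untouched.
def make_chapter(root, name)
  chdir = File.join(root, "ch_#{name}")
  CHAPTER_SUBDIRS.each { |sub| FileUtils.mkdir_p(File.join(chdir, sub)) }
  tex = File.join(chdir, "#{name}.tex")
  File.write(tex, "% toplevel file for chapter #{name}\n") unless File.exist?(tex)
  chdir
end

# List "ch_<name>/<subdir>" entries that are missing under root.
def missing_subdirs(root)
  chapters = Dir.glob(File.join(root, 'ch_*')).select { |d| File.directory?(d) }
  chapters.flat_map do |d|
    CHAPTER_SUBDIRS.reject { |sub| File.directory?(File.join(d, sub)) }
                   .map { |sub| File.join(File.basename(d), sub) }
  end
end
```

For example, `make_chapter('.', 'mychap')` creates ch_mychap/mychap.tex plus empty figs/, tables/ and code/ subdirectories, and an empty result from `missing_subdirs('.')` means every chapter has the layout the Makefile expects.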