I need to borrow some library books, which haven't been borrowed for many, many years, to make sure the library doesn't throw them out!
Due to personal stupidity, I only backed up every month. Thus I lost a month of work on my book.
Except I had build 'artifacts' (Markdown, HTML and metadata files), only a few days old, from which I could theoretically reverse-engineer my book's source code (RMarkdown).
And so began what amounted to about 20 hours' work - maybe 3 hours writing and perfecting and testing a script to extract the data, and 17 hours manually going through the data to add it to my source code.
The main problem is that my source code doesn't follow one strict standard. It is compiled through R, so my diff script needs to transform the source code in an R-like way, producing output with a standard order and standard function calls, and discarding metadata.
For example, newspaper_citation("New York Times", pages=list(3, 4)) needs to become citation(author="New York Times", pages="3, 4") - discarding information so that it is standardised in the same way the 'reverse-engineered' output is structured.
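Roughly, the transformation I'm describing looks like this (a simplified Python sketch - the regex and function names are illustrative, not my actual script):

```python
import re

def normalise_citation(line: str) -> str:
    """Rewrite a shorthand citation call into the canonical form, so that the
    real source and the 'reverse-engineered' output diff cleanly."""
    # newspaper_citation("New York Times", pages=list(3, 4))
    #   -> citation(author="New York Times", pages="3, 4")
    match = re.match(
        r'newspaper_citation\("(?P<author>[^"]+)",\s*pages=list\((?P<pages>[^)]*)\)\)',
        line.strip(),
    )
    if not match:
        return line  # leave anything unrecognised untouched
    pages = ", ".join(p.strip() for p in match.group("pages").split(","))
    return f'citation(author="{match.group("author")}", pages="{pages}")'

print(normalise_citation('newspaper_citation("New York Times", pages=list(3, 4))'))
# citation(author="New York Times", pages="3, 4")
```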
My source code, of course, is split over dozens of different files, in over a dozen folders, like a tree. Getting the script to recognise changes-of-file - especially when the changes often involved moving paragraphs from one file to another - was a huge pain.
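One way to attack that (a hypothetical Python sketch, not the script I actually used) is to index every normalised paragraph in the source tree and match each recovered paragraph against it by similarity, so a paragraph that moved to another file still finds its counterpart:

```python
from difflib import SequenceMatcher
from pathlib import Path

def paragraphs(root: str) -> dict[tuple[str, int], str]:
    """Collect every paragraph in the tree, keyed by (file, index),
    with whitespace normalised. Assumes .Rmd files split on blank lines."""
    out = {}
    for path in Path(root).rglob("*.Rmd"):
        for i, para in enumerate(path.read_text(encoding="utf-8").split("\n\n")):
            if para.strip():
                out[(str(path), i)] = " ".join(para.split())
    return out

def best_match(para: str, candidates: dict[tuple[str, int], str], cutoff: float = 0.8):
    """Return the (file, index) of the most similar paragraph, if any clears the cutoff."""
    scored = (
        (SequenceMatcher(None, para, text).ratio(), key)
        for key, text in candidates.items()
    )
    ratio, key = max(scored, default=(0.0, None))
    return key if ratio >= cutoff else None
```

Anything that clears the cutoff in an unexpected file is a 'paragraph moved' case; anything that matches nothing is text I had genuinely lost.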
I'm very aware that publishing content is swimming in the same ocean as billions of AI-created or ghostwritten slop.
If newspapers can't convince people to pay them instead of consuming fake news, how can an unknown author convince people to pay him to read a boring old book?
While learning professional typesetting, and testing how to structure my book, it occurred to me that I would have to split the book up into multiple books, so that each could be more focused.
I had purposefully designed my web content to maximise the chance that it would be shared - meaning there was no visible overarching narrative, so that readers would read almost whatever narrative they wanted into the text. All the content was categorised and sorted into over 100 categories and different orders - designed to appeal to contradictory opinions by confirming readers' 'priors'.
A book requires direction. Readers read it linearly - it has to build a case while maintaining interest throughout. It can't simply be a list of things - forgettable - because although this works fine on the web (low barrier to sharing - people share articles like this on social media based on quick fleeting emotions) it doesn't work at all for books. Book readers think too highly of themselves for this - a book has to make them feel like they can show off how smart they are to other people, which means it has to have a 'clever idea' that it 'proves'.
I don't expect anyone to pay anything to read anything I've written. I'm obsessive, not delusional.
Thus the main intended publication is a freely-viewable, downloadable, public HTML file. Specifically, a single-page interactive web app (PWA) that is much more intuitive to read than a book.
I want to make it as easy as possible to view my sources, to have different perspectives on the same data (e.g. events on a map, or on a timeline or around key people).
Ink-on-paper is an extremely limited medium, suited for linear reading, but the TikTok generation probably lacks the attention span to actually read through a whole book like that. That is genuinely the main reason for including all these additional features - pop-up excerpts, map highlights, people relations, etc. - it's all there to break up the text and create visual elements that keep the reader engaged in multiple ways.
I think bullet-point lists are like crack cocaine for people. People go head-over-heels for 'news' articles about the '10 best X' or '10 craziest Y' or whatever - because it gives them power to decide how much attention to pay to which parts.
The most important points should stand out. Unimportant parts should be easy for the reader's eyes to skip, if they choose to. Readers should not be forced to parse all the text of every paragraph in a book.
Previously, I considered using a lighter colour font to mark the 'less important' words in the book - but since my paragraphs are quite short I don't think I'd need to do that. The 'important' text already stands out, as it is usually within 'action buttons' (for links to sources, maps and people).
Much of my recent effort has been to trim down my work - especially excerpts that I have included from newspapers. The rule of thumb is not to include more than 5% of a newspaper article when citing it.
I'm extremely aggressive in paraphrasing articles to trim down the size of the book, which has the side-effect of making the quotes extremely short and less of a copyright concern.
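A quick way to sanity-check that rule of thumb (a hypothetical Python helper, not part of my actual build):

```python
def within_quote_budget(excerpt: str, article: str, limit: float = 0.05) -> bool:
    """True if the excerpt uses no more than `limit` (default 5%) of the article's word count."""
    return len(excerpt.split()) <= limit * len(article.split())
```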
(TODO: Show some before-and-after git diffs)
I don't expect anyone to pay to read my books. But I do want several physical copies, just so that I can refer to them in person if my book comes up in conversation.
Really, the physical book should only exist to drive traffic to the Web App 'book'. The physical format is too limited - it loses a lot of its usefulness if the reader is no longer able to 'zap around' to each reference or between each related section.
It's such a simple thing, but it's difficult to find information about what margins you should use.
For example, the default settings in LaTeX, and the defaults in most paperbacks that I've seen, use larger outer margins than inner margins, and on StackExchange people have mentioned 'common knowledge' guidelines such as 'the outer margin should be roughly 150% of the inner margin'.
But from my own experience reading paperback books, the outer margin should be thinner than the inner margin.
This question and its answers finally answered a lot of my questions:
There are a few other things I've seen around the internet, too - like publishers increasing the inner margins towards the end of the book, to account for more of the inner margin being 'eaten up' by the spine (or something like that).
Here's a resource explaining why typography is important.
Since I have everything already transpiled into Markdown, rewriting it in Word would be simply unacceptable. Maybe there is a tool that can convert Markdown or PDF or HTML into Word format - but there's no way my computer wouldn't crash opening an 800-page (or however long it ends up) heavily-formatted file in Word.
Typesetting software:
// Typst: branch on whether the current page is odd (recto) or even (verso)
if calc.odd(here().page()) { ... }
The crazy thing is, 'normal' people are so far removed from LaTeX-type software that even book-specialist websites don't mention any of these. Look at this 'science of typesetting' guide by a subsidiary of selfpublishing.com - the software it recommends for 'DIY typesetting' has none of the real features of LaTeX:
Just as some people can sniff out AI-generated content, perhaps it is possible to sniff out 'non-engineer' writers: these kinds of phrases are so common in their writing, yet nobody with an 'engineer-brain' writes like this:
This link compares LaTeX, Typst and ConTeXt. Typst is the clear winner for people who don't need to publish in journals that might require LaTeX source code.
Typesetting is not something most authors will deal with - most publishers will use InDesign for the final print, but before that, most editors will expect to use Word's versioning system to coordinate edits with authors.
This versioning system makes it difficult to avoid using Word:
Build: makefiles and some homebrew perl scripts FTW. Type “make”, check out latest draft and generate HTML, RTF and (eventually, via an external toolchain) Word files I can ship to my editors.

Back in those days, copy edits showed up as a bunch of paper print-outs with red ink on them and you mailed them back to production after you added your own chicken scratchings. (If you were smart you scanned/photocopied the pile first, for insurance: this saved my ass on two occasions when CEMs went missing in the post ...). Page proofs ditto. ...
Then the publishers began moving to Microsoft Word tracked changes for processing copy-edits, and annotated PDFs for the page proofs, and it was all over ... Word tracked changes suck, but trying to check the changes on a large document in a third party word processor like LibreOffice sucks even harder ([due to bugs] that nobody triggered before ...), forcing me into Word for the post-writing workflow. And then a Better Way came along for writing books in the shape of Scrivener, for which there is nothing remotely like an open source equivalent ...

One suggested workaround:
if you have an editor who's willing to not use Word, you can use a tool like Authorea to track changes, and then incorporate those back into the markdown source.
With AI images, you'd expect this to be easy: 1600 x 2560 pixels, which is around 300 DPI for an A5-sized page.

The problem is that upscaling is only good for certain types of images. It is great for faces, landscapes, stock photos, pretty much anything that it has a lot of data for. It is not so good at upscaling completely novel things.
For example, here's what I want to upscale, from Dalle3's native 1024 x 1024 resolution:
That requires upscaling by a factor of 2.5. I upscaled it using several free tools, such as ImageUpscalerAI.com, and the results were very impressive - but the bokeh blur looked completely unrealistic when upscaled.
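For reference, the arithmetic behind those numbers (a quick Python check, assuming the standard 148 x 210 mm A5 trim size):

```python
# Pixels needed to print an A5 page at 300 DPI, and the upscale
# factor needed from a 1024 x 1024 source image.
MM_PER_INCH = 25.4
DPI = 300

a5_w_mm, a5_h_mm = 148, 210
print(round(a5_w_mm / MM_PER_INCH * DPI))  # 1748 - in the ballpark of the 1600 target width
print(round(a5_h_mm / MM_PER_INCH * DPI))  # 2480 - in the ballpark of the 2560 target height
print(2560 / 1024)                         # 2.5 - the upscale factor on the long edge
```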
The blur is maybe not too noticeable, but I might instead just recreate the scene in Blender and render it myself. That would probably take a few hours (finding the right textures, surface distortion maps, camera angles, and other settings). Maybe it's not worth doing properly.
Should I self-publish on Amazon? Or look for an indie publisher (who might have low reputation)? Or try for a big-name publisher (who probably won't even bother skimming through my book)? Or truly self-publish?
Other people's experiences:
Amazon's print-on-demand books seem to be generally low quality. Is this because self-published authors make mistakes (bad fonts, small font size, incorrect margins)?
Blurry covers are presumably caused by self-published authors' ignorance (not preserving aspect ratios, stretching images, using low-resolution images).
Amazon's KDP costs £8.85 per book for 800 pages. The shipping is probably around £3 for individual copies, or £30 in bulk.
So, I can spend £12 to deliver a single prototype book. That's amazing. That's cheap enough for iterative improvements (prototype -> give to someone -> collect feedback -> edit -> repeat). Cheap enough to also do some A/B testing of different designs/layouts/etc.
An 800-page book would be quite insane, I think. I should probably split it into multiple smaller books, either covering different time periods or continents. Or both - because who really cares if multiple books have overlapping content?
I could hire an editor. £20/hr for a student to read and offer feedback? I don't know how to identify a good editor though.
Bible paper (called 'onion skin' paper) is thin but durable - much higher quality than you'd get from print-on-demand services.
If I really, really care about print quality, I'd probably have to ask book designers/publishers for recommendations on a local offset printer company to print a batch of books off.
If you self-publish on Amazon through KDP, you get an 'Amazon ISBN'. But what is that?
I don't know much about ISBNs, but looking at their prices (1 for ~$125, 1,000 for ~$1,250 - i.e. $1.25 each in bulk, a hundred times cheaper per unit), I assume they have the same problem that IPv4 has: we optimistically handed out large blocks to the original 'big players', who now monopolise them while we switch to a larger ISBN format.
YCombinator discussions:
Leanpub: Probably good for self-publishing, if I make it “pay what you want” instead of requiring payment. It also seems to allow readers to download permanent copies of books they own.
Gumroad: ?