I need to borrow some library books, which haven't been borrowed for many, many years, to make sure the library doesn't throw them out!
Due to personal stupidity, I only backed up every month. Thus I lost a month of work on my book.
Except I had build 'artifacts' (Markdown, HTML and metadata files), only a few days old, from which I could theoretically reverse-engineer my book's source code (RMarkdown).
And so began what amounted to about 20 hours' work - maybe 3 hours writing and perfecting and testing a script to extract the data, and 17 hours manually going through the data to add it to my source code.
The main problem is that my source code doesn't follow one strict standard. It is compiled through R, so my diff script needs to transform the source code in an R-like way, producing output with a standard order and standard function calls, and discarding metadata.
For example, newspaper_citation("New York Times", pages=list(3, 4)) needs to become citation(author="New York Times", pages="3, 4") - discarding information so that it is standardised in the same way the 'reverse-engineered' output is structured.
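Roughly, the transformation I'm describing looks like this (a simplified Python sketch - the regex and function names are illustrative, not my actual script):

```python
import re

def normalise_citation(line: str) -> str:
    """Rewrite a shorthand citation call into the canonical form, so that the
    real source and the 'reverse-engineered' output diff cleanly."""
    # newspaper_citation("New York Times", pages=list(3, 4))
    #   -> citation(author="New York Times", pages="3, 4")
    match = re.match(
        r'newspaper_citation\("(?P<author>[^"]+)",\s*pages=list\((?P<pages>[^)]*)\)\)',
        line.strip(),
    )
    if not match:
        return line  # leave anything unrecognised untouched
    pages = ", ".join(p.strip() for p in match.group("pages").split(","))
    return f'citation(author="{match.group("author")}", pages="{pages}")'

print(normalise_citation('newspaper_citation("New York Times", pages=list(3, 4))'))
# citation(author="New York Times", pages="3, 4")
```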
My source code, of course, is split over dozens of different files, in over a dozen folders, like a tree. Getting the script to recognise changes-of-file - especially when the changes often involved moving paragraphs from one file to another - was a huge pain.
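One way to attack that (a hypothetical Python sketch, not the script I actually used) is to index every normalised paragraph in the source tree and match each recovered paragraph against it by similarity, so a paragraph that moved to another file still finds its counterpart:

```python
from difflib import SequenceMatcher
from pathlib import Path

def paragraphs(root: str) -> dict[tuple[str, int], str]:
    """Collect every paragraph in the tree, keyed by (file, index),
    with whitespace normalised. Assumes .Rmd files split on blank lines."""
    out = {}
    for path in Path(root).rglob("*.Rmd"):
        for i, para in enumerate(path.read_text(encoding="utf-8").split("\n\n")):
            if para.strip():
                out[(str(path), i)] = " ".join(para.split())
    return out

def best_match(para: str, candidates: dict[tuple[str, int], str], cutoff: float = 0.8):
    """Return the (file, index) of the most similar paragraph, if any clears the cutoff."""
    scored = (
        (SequenceMatcher(None, para, text).ratio(), key)
        for key, text in candidates.items()
    )
    ratio, key = max(scored, default=(0.0, None))
    return key if ratio >= cutoff else None
```

Anything that clears the cutoff in an unexpected file is a 'paragraph moved' case; anything that matches nothing is text I had genuinely lost.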
I'm very aware that publishing content is swimming in the same ocean as billions of AI-created or ghostwritten slop.
If newspapers can't convince people to pay them instead of consuming fake news, how can an unknown author convince people to pay him to read a boring old book?
While learning professional typesetting, and testing how to structure my book, it occurred to me that I would have to split the book up into multiple books, so that each could be more focused.
I had purposefully designed my web content to maximise the chance that it would be shared - meaning there was no visible overarching narrative, so that readers would read almost whatever narrative they wanted into the text. All the content was categorised and sorted into over 100 categories and different orders - designed to appeal to contradictory opinions by confirming readers' 'priors'.
A book requires direction. Readers read it linearly - it has to build a case while maintaining interest throughout. It can't simply be a list of things - forgettable - because although this works fine on the web (low barrier to sharing - people share articles like this on social media based on quick fleeting emotions) it doesn't work at all for books. Book readers think too highly of themselves for this - a book has to make them feel like they can show off how smart they are to other people, which means it has to have a 'clever idea' that it 'proves'.
I don't expect anyone to pay anything to read anything I've written. I'm obsessive, not delusional.
Thus the main intended publication is a freely-viewable, downloadable, public HTML file. Specifically, a single-page interactive web app (PWA) that is much more intuitive to read than a book.
I want to make it as easy as possible to view my sources, to have different perspectives on the same data (e.g. events on a map, or on a timeline or around key people).
Ink-on-paper is an extremely limited medium, suited for linear reading, but the TikTok generation probably lacks the attention span to actually read through a whole book like that. That is genuinely the main reason for including all these additional features - pop-up excerpts, map highlights, people relations, etc. - it's all there to break up the text and create visual elements that keep the reader engaged in multiple ways.
I think bullet-point lists are like crack cocaine for people. People go head-over-heels for 'news' articles about the '10 best X' or '10 craziest Y' or whatever - because it gives them power to decide how much attention to pay to which parts.
The most important points should stand out. Unimportant parts should be easy for the reader's eyes to skip, if they choose to. Readers should not be forced to parse all the text of every paragraph in a book.
Previously, I considered using a lighter colour font to mark the 'less important' words in the book - but since my paragraphs are quite short I don't think I'd need to do that. The 'important' text already stands out, as it is usually within 'action buttons' (for links to sources, maps and people).
Much of my recent effort has been to trim down my work - especially excerpts that I have included from newspapers. The rule of thumb is not to include more than 5% of a newspaper article when citing it.
I'm extremely aggressive in paraphrasing articles to trim down the size of the book, which has the side-effect of making the quotes extremely short and less of a copyright concern.
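A quick way to sanity-check that rule of thumb (a hypothetical Python helper, not part of my actual build):

```python
def within_quote_budget(excerpt: str, article: str, limit: float = 0.05) -> bool:
    """True if the excerpt uses no more than `limit` (default 5%) of the article's word count."""
    return len(excerpt.split()) <= limit * len(article.split())
```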
(TODO: Show some before-and-after git diffs)
I don't expect anyone to pay to read my books. But I do want several physical copies, just so that I can refer to them in person if my book comes up in conversation.
Really, the physical book should only exist to drive traffic to the Web App 'book'. The physical format is too limited - it loses a lot of its usefulness if the reader is no longer able to 'zap around' to each reference or between each related section.
It's such a simple thing, but it's difficult to find information about what margins you should use.
For example, the default settings in LaTeX, and the defaults in most paperbacks that I've seen, use larger outer margins than inner margins, and on StackExchange people have mentioned 'common knowledge' guidelines such as 'the outer margin should be roughly 150% of the inner margin'.
But from my own experience reading paperback books, the outer margin should be thinner than the inner margin.
This question and its answers finally answered a lot of my questions:
There are a few other things I've seen around the internet, too - like publishers increasing the inner margins towards the end of the book, to account for more of the inner margin being 'eaten up' by the spine (or something like that).
Here's a resource explaining why typography is important.
Since I have everything already transpiled into Markdown, rewriting it in Word would be simply unacceptable. Maybe there is a tool that can convert Markdown or PDF or HTML into Word format - but there's no way my computer wouldn't crash opening an 800-page (or however long it ends up) heavily-formatted file in Word.
Typesetting software:
// Typst: branch on whether the current page is odd (recto) or even (verso)
if calc.odd(here().page()) { ... }
The crazy thing is, 'normal' people are so far removed from LaTeX-type software that even book-specialist websites don't mention any of these. Look at this 'science of typesetting' guide by a subsidiary of selfpublishing.com - the software it recommends for 'DIY typesetting' has none of the real features of LaTeX:
Just as some people can sniff out AI-generated content, perhaps it is possible to sniff out 'non-engineer' writers: these kinds of phrases are so common in their writing, yet nobody with an 'engineer-brain' writes like this:
This link compares LaTeX, Typst and ConTeXt. Typst is the clear winner for people who don't need to publish in journals that might require LaTeX source code.
Typesetting is not something most authors will deal with - most publishers will use InDesign for the final print, but before that, most editors will expect to use Word's versioning system to coordinate edits with authors.
This versioning system makes it difficult to avoid using Word:
Build: makefiles and some homebrew perl scripts FTW. Type “make”, check out latest draft and generate HTML, RTF and (eventually, via an external toolchain) Word files I can ship to my editors.

Back in those days, copy edits showed up as a bunch of paper print-outs with red ink on them and you mailed them back to production after you added your own chicken scratchings. (If you were smart you scanned/photocopied the pile first, for insurance: this saved my ass on two occasions when CEMs went missing in the post ...). Page proofs ditto. ...
Then the publishers began moving to Microsoft Word tracked changes for processing copy-edits, and annotated PDFs for the page proofs, and it was all over ... Word tracked changes suck, but trying to check the changes on a large document in a third party word processor like LibreOffice sucks even harder ([due to bugs] that nobody triggered before ...), forcing me into Word for the post-writing workflow. And then a Better Way came along for writing books in the shape of Scrivener, for which there is nothing remotely like an open source equivalent ...

One suggested workaround:
if you have an editor who's willing to not use Word, you can use a tool like Authorea to track changes, and then incorporate those back into the markdown source.
With AI images, you'd expect this to be easy: 1600 x 2560 pixels, which is around 300 DPI for an A5-sized page.

The problem is that upscaling is only good for certain types of images. It is great for faces, landscapes, stock photos, pretty much anything that it has a lot of data for. It is not so good at upscaling completely novel things.
For example, here's what I want to upscale, from Dalle3's native 1024 x 1024 resolution:
That requires upscaling by a factor of 2.5. I upscaled it using several free tools, such as ImageUpscalerAI.com, and the results were very impressive - but the bokeh blur looked completely unrealistic when upscaled.
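For reference, the arithmetic behind those numbers (a quick Python check, assuming the standard 148 x 210 mm A5 trim size):

```python
# Pixels needed to print an A5 page at 300 DPI, and the upscale
# factor needed from a 1024 x 1024 source image.
MM_PER_INCH = 25.4
DPI = 300

a5_w_mm, a5_h_mm = 148, 210
print(round(a5_w_mm / MM_PER_INCH * DPI))  # 1748 - in the ballpark of the 1600 target width
print(round(a5_h_mm / MM_PER_INCH * DPI))  # 2480 - in the ballpark of the 2560 target height
print(2560 / 1024)                         # 2.5 - the upscale factor on the long edge
```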
The blur is maybe not too noticeable, but I might instead just recreate the scene in Blender and render it myself. That would probably take a few hours (finding the right textures, surface distortion maps, camera angles, and other settings). Maybe it's not worth doing properly.
Should I self-publish on Amazon? Or look for an indie publisher (who might have low reputation)? Or try for a big-name publisher (who probably won't even bother skimming through my book)? Or truly self-publish?
Other people's experiences:
Amazon's print-on-demand books seem to be generally low quality. Is this because self-published authors make mistakes (bad fonts, small font size, incorrect margins)?
Blurry covers are presumably caused by self-published authors' ignorance (not preserving aspect ratios, stretching images, using low-resolution images).
Amazon's KDP costs £8.85 per book for 800 pages. The shipping is probably around £3 for individual copies, or £30 in bulk.
So, I can spend £12 to deliver a single prototype book. That's amazing. That's cheap enough for iterative improvements (prototype -> give to someone -> collect feedback -> edit -> repeat). Cheap enough to also do some A/B testing of different designs/layouts/etc.
An 800-page book would be quite insane, I think. I should probably split it into multiple smaller books, either covering different time periods or continents. Or both - because who really cares if multiple books have overlapping content?
I could hire an editor. £20/hr for a student to read and offer feedback? I don't know how to identify a good editor though.
Bible paper (called 'onion skin' paper) is thin but durable - much higher quality than you'd get from print-on-demand services.
If I really, really care about print quality, I'd probably have to ask book designers/publishers for recommendations on a local offset printer company to print a batch of books off.
If you self-publish on Amazon through KDP, you get an 'Amazon ISBN'. But what is that?
I don't know much about ISBNs, but looking at their prices (1 for ~$125, 1,000 for ~$1,250 - i.e. $1.25 each in bulk, a hundred times cheaper per unit), I assume they have the same problem that IPv4 has: we optimistically handed out large blocks to the original 'big players', who now monopolise them while we switch to a larger ISBN format.
YCombinator discussions:
Leanpub: Probably good for self-publishing, if I make it “pay what you want” instead of requiring payment. It also seems to allow readers to download permanent copies of books they own.
Gumroad: ?