Everything, thus far, is pure CSS and HTML. I wanted to avoid using JavaScript as much as possible.
I found a 'grumpy man's blog' that complains about the bad user interfaces he and his followers encounter, and I've been using it to help me think about how to avoid unintuitive layouts and behaviours.
The sidebar (on desktop) becomes a header (on mobile). Change your browser's window size to see the problem it creates - a table of contents is more natural to display as a sidebar than as a header. To mitigate this, I grouped sub-headers into containers that automatically re-orient depending on whether the screen is in landscape or portrait mode.
Previously, the sidebar/header of every blog page displayed a list of all blog pages. But the sidebar became valuable real estate as pages' tables of contents grew, so I removed that list and replaced it with a single link to the blog overview page.
In the blog overview page, I wanted to display previews of all blog posts. The lazy solution to that was just to copy the raw HTML contents, use CSS to make it unclickable, and add a “click here” link to the full blog article. The problem with that is that it isn't immediately clear to the viewer that links in the preview are not clickable - so I muted the colour contrasts, which is sort of the universal way of signalling that a UI area is 'disabled'. In particular, I ensured that links in the previews have a muted blue colour - I could have achieved the same effect by simply reducing the preview element's opacity, but that can have unintended side-effects (including slower page rendering).
I found myself often clicking my circular avatar as a way to go back to the main website. This is surprising, because my avatar wasn't clickable. This behaviour was instinctively ingrained in me, presumably because it is so common on other websites that peoples' avatars are clickable. Presumably this instinct is even more ingrained in other people, so I moved my avatar into the link to make it clickable too.
This blog has a different style to the rest of my website. A light-themed blog on a broadly dark-themed website - it doesn't sound like a good idea.
The reason for this is that, to me, a dark-themed blog looks too 'l33t' - too edgy, too new-style.
Dark themes are almost the default for programming - GitHub, VSCode etc - seemingly as a way to distinguish themselves from the professional looks of other industries, just like they do in other ways: wearing hoodies to work, using black gaming laptops with RGB lighting, using monospace fonts.
Light themes are the default elsewhere - Word, Excel, Exchange, white printers, white desktops, shirts - because that is the world of paper. It makes sense to me that the simple, low-tech portion of my website that consists only of words - the blog - should fit this theme of being paper on a screen.
The sidebar table-of-contents should highlight the currently-in-focus section, and it should show only the highest-level headers plus ancestors of the currently-in-focus section (see inspiration).
Look into Clay, a C-based layout engine that can target HTML and WebGL.
Only 3% of requests are to my secondary domain (agray.uber.space); cat *.txt | cut -f1 -d':' | sort | uniq -c. This is probably because most bots find websites by trawling common TLDs such as .org for DNS or SSL certificate records, rather than trawling .space's records.
There's no real trend in traffic versus hour of day; cat *.txt | grep -v 80[.]5[.]0[.]0 | cut -f2 -d'[' | cut -f2 -d':' | sort | uniq -c | sed -E 's/^ +([0-9]+) +([0-9]+)$/\2:00'$'\t''\1/g' > hours.csv. Tabs are human-readable and recognised as cell separators in Excel.
These are the bots that identify themselves in their User-Agent headers, and probably mostly obey robots.txt. Most can be found with cat ~/logs/webserver/access_log* | grep robots.txt | sed -E 's/^.*"(.*)"$/\1/g' | sort | uniq -c
- One requested /robots.txt then downloaded my front-page images - without downloading the HTML. So it must be using a cached version of the HTML somewhere; perhaps it crawled my site in the first couple of days, before I had logs.
- One describes itself as "used by Internet marketers from all over the world".
- Another "assists internet marketers to get information on the link structure of sites and their interlinking on the web".
- One requested /robots.txt (as expected).
- One requested /ogMn03 or /info/q32Q6f (which don't exist).
- One requested /new or /photography/gallery/koishikawa/ (which don't exist).
- One requested /photography/gallery/kyoto/travel/crw_2858_2.htm (which don't exist).
- One requested /ads.txt. From the name, I assume it is an advertising broker.
- One used HTTP/2.0 instead of HTTP/1.1.
Multiple Chinese IP addresses seem to scrape HTML pages on my website with User-Agent headers claiming to be an iPhone (CPU iPhone OS 13_2_3 like Mac OS X). One repeatedly looked at /blog/literature.html at 15-minute, then 3-second, intervals, clearly waiting to see how frequently I updated that page. That felt a bit creepy.
They aren't the only ones. Someone looked at my front page from Warsaw in Poland, then switched to a Czech Republic IP address and repeated it.
A bot claiming to be Bingbot looked at / and /wp-admin/setup-config.php?step=1&language=en_GB without ever looking at /robots.txt. A similar IP address simultaneously claimed to be Googlebot and looked at /wp-admin/install.php?step=1&language=en_GB. It was from an IP address owned by the Swedish organisation Aleksander Studzinski Trading AS Biruang IT Kb (biruang.se).
An American IP address (owned by GiGstreem) looked for /.env, /blog/.env, /blog/.env, /api/.env, /laravel/.env, /docs/.env, /_profiler/phpinfo, /config.json, and it POSTed to /. This was a unique attack, looking to see if this website is deployed with Laravel.
There are some bots which I suspect are from LLM training companies such as OpenAI. They make up a plausible-sounding URL, I believe to test whether or not the website contents are automatically-generated (to avoid screwing up their datasets), e.g. on one day:
"GET /smart-faucet-no-batteries-required/ HTTP/1.1" 404 196 "-" [BRAVE_USER_AGENT]
"GET /decorating-guest-roomoffice-easter/ HTTP/1.1" 404 196 "-" [CHROME_USER_AGENT]
"GET /like-see-government-help-people-get-fitter/ HTTP/1.1" 404 196 "-" [DIFFERENT_CHROME_USER_AGENT]
"GET /cant-stop-feeling-sleepy-find-solution-right-now/ HTTP/1.1" 404 196 "-" [ANOTHER_DIFFERENT_CHROME_USER_AGENT]
Interestingly, the first three came from Brazilian residential IP addresses, and the fourth from an Argentine residential IP. Presumably they are part of a botnet, or perhaps just VPN nodes (not that the two are much different - VPN companies sometimes hijack their own customers' residential IP addresses to sell to less ethical businesses).
Here's the first 24 hours of access logs, in order. The logs were disabled by default for the first 3 days, unfortunately, so I couldn't see what might have been the most interesting period.
By far the most common behaviours were:
- Requesting wlwmanifest.xml under 15 paths and then sending a completely blank request. Almost always these came from Digital Ocean VPSs, and they sent their requests in the same order.

All bots used fake User-Agent headers unless otherwise noted.

- One requested /wp-admin/setup-config.php (User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64)), and looked at the front page HTML only.
- Googlebot requested /robots.txt - as well-behaved bots are meant to - then, using a multitude of different User-Agents, downloaded the front page, its CSS file and the favicon: agray.org:443 66.249.0.0 - - "GET /robots.txt HTTP/1.1" 404 196 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
- One requested /.env and /wp-content/ from my secondary domain name (agray.uber.space) - the first to access that domain name.
- One requested /worldmap.html, which is a bit strange.
- One requested /.git/index simultaneously on both my domain names, with two different User-Agent headers.
- One requested /repos/ and /ads.txt, with his User-Agent header Bidtellect/0.0.0.0.
- One requested /wp-admin/setup-config.php and /4/mindmap1.png, both using the curl/7.58.0 User-Agent header.
- One requested /ads.txt (User-Agents Photon; U; QNX x86pc and Dalvik/2.1.0 (Linux; U; Android 9.0; ZTE BA520 Build/MRA58K)).
- One requested /robots.txt then looked at the front page HTML only.
- One - claiming the User-Agent Mozilla/5.0 (Linux; Android 10; LIO-AN00 Build/HUAWEILIO-AN00; wv) MicroMessenger Weixin QQ AppleWebKit/ ... Chrome/... XWEB/2692 MMWEBSDK/200901 Mobile Safari/... - arrived with a ?fbclid= parameter, but then immediately switched to the 'facebookexternalhit' User-Agent.

Here's some of the last 17 hours of this website's access logs, from 7am to midnight. As expected, all traffic thus far has been either myself, malicious bots, or polite scraper bots.
First, here's a summary, created by cat ~/logs/webserver/access_log | cut -f2 -d' ' | sort | uniq -c | sort -n. My VPS provider zeros out the last 16 bits of each IP address, so similar IP addresses are grouped together:
Number of requests | Entity or ISP, based on reverse IP lookup | Notes |
---|---|---|
140 | Me! | |
105 | Digital Ocean VPSs | They have DNS records. They exclusively looked for exploits. |
21 | Google (Cloud VPS?) | It was looking for exploits, exactly the same as those on Digital Ocean VPSs, even the same User-Agent header |
21 | Digital Ocean VPSs | They did not have DNS records. They exclusively looked for exploits. |
16 | A Saudi Arabian ISP (Shabakah) | It was looking for exploits, exactly the same as the Digital Ocean VPSs |
14 | Akamai or Linode | Olin/AT&T scraped my front page (2 requests), then immediately afterwards, Akamai/Linode scraped my HTML pages |
13 | Planet Telecom Colombia | It was looking for leaked credentials |
12 | CenturyLink or City of Wheat Ridge | It looked like a real human visiting the home page, but days previously it was clearly a bot |
9 | Cloudflare | I summed multiple of its IP ranges together. It only looked for a WordPress setup-config.php |
6 | Tencent | It only scraped my HTML pages |
I've redacted User-Agent headers to reduce identifiable information, and I've cut out several other unnecessary fields.
This one looks human, because it goes back on itself a couple of times, it is clearly running JavaScript (based on requesting one music file from rand.html), and it paused for 2 minutes reading /repos/.
[19:20:20] "GET / HTTP/2.0" 200 592 "-"
[19:20:21] "GET /styl.css HTTP/2.0" 200 2025 "https://agray.org/"
[19:20:21] "GET /4/wiki.webp HTTP/2.0" 200 3500 "https://agray.org/"
[19:20:21] "GET /4/mindmap1.png HTTP/2.0" 200 115828 "https://agray.org/"
[19:20:21] "GET /4/literature1.jpeg HTTP/2.0" 200 199327 "https://agray.org/"
[19:20:21] "GET /4/mail1.jpeg HTTP/2.0" 200 213173 "https://agray.org/"
[19:20:21] "GET /4/perfumery8.jpeg HTTP/2.0" 200 121760 "https://agray.org/"
[19:20:21] "GET /4/earth7.jpeg HTTP/2.0" 200 245653 "https://agray.org/"
[19:20:21] "GET /4/newspapers.jpg HTTP/2.0" 200 177353 "https://agray.org/"
[19:20:21] "GET /4/diary1.jpg HTTP/2.0" 200 274343 "https://agray.org/"
[19:20:21] "GET /4/music3.jpeg HTTP/2.0" 200 189532 "https://agray.org/"
[19:20:21] "GET /4/tech.jpeg HTTP/2.0" 200 280563 "https://agray.org/"
[19:20:21] "GET /favicon.ico HTTP/2.0" 200 5694 "https://agray.org/"
[19:20:23] "GET /contact.html HTTP/2.0" 200 884 "https://agray.org/"
[19:20:23] "GET /gh.png HTTP/2.0" 200 719 "https://agray.org/contact.html"
[19:20:23] "GET /hn.png HTTP/2.0" 200 397 "https://agray.org/contact.html"
[19:20:23] "GET /spotify.ico HTTP/2.0" 200 15086 "https://agray.org/contact.html"
[19:20:23] "GET /snoo.png HTTP/2.0" 200 839 "https://agray.org/contact.html"
[19:20:43] "GET /blog/hobbies.html HTTP/2.0" 200 9920 "https://agray.org/"
[19:20:43] "GET /blog/styl.css HTTP/2.0" 200 922 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/img/me1.jpg HTTP/2.0" 200 67035 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/m.jpg HTTP/2.0" 200 1052 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1149.JPG HTTP/2.0" 200 36445 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/mugwort_excel_preview.png HTTP/2.0" 200 32510 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/template_thumbnail.png HTTP/2.0" 200 12666 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1150.JPG HTTP/2.0" 200 28350 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/MagnusII_green2.smol.jpg HTTP/2.0" 200 52714 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/german.png HTTP/2.0" 200 109210 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/aeryn_snapS.jpg HTTP/2.0" 200 47975 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/ludwig_spectro.jpg HTTP/2.0" 200 90914 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/aeryn_smaller.jpg HTTP/2.0" 200 80430 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1155.JPG HTTP/2.0" 200 29060 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1156.JPG HTTP/2.0" 200 33495 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/wwii.jpg HTTP/2.0" 200 56747 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1162.JPG HTTP/2.0" 200 24810 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1165.JPG HTTP/2.0" 200 27911 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/usa.jpg HTTP/2.0" 200 153689 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/duel.jpg HTTP/2.0" 200 41330 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/wwi.jpg HTTP/2.0" 200 41343 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/castle.jpg HTTP/2.0" 200 40850 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/dw/DSCF1163.JPG HTTP/2.0" 200 30536 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/prize.jpg HTTP/2.0" 200 67577 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/KotR.jpg HTTP/2.0" 200 52847 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/cat.JPG HTTP/2.0" 200 45973 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/gramps.jpg HTTP/2.0" 200 14942 "https://agray.org/blog/hobbies.html"
[19:20:43] "GET /blog/pupSqueaks.jpg HTTP/2.0" 200 721494 "https://agray.org/blog/hobbies.html"
[19:20:51] "GET /repos/ HTTP/2.0" 200 1839 "https://agray.org/"
[19:20:52] "GET /repos/profile.png HTTP/2.0" 200 267082 "https://agray.org/repos/"
[19:20:52] "GET /repos/rscraper1.png HTTP/2.0" 200 674 "https://agray.org/repos/"
[19:20:52] "GET /repos/tagem1.png HTTP/2.0" 200 5954 "https://agray.org/repos/"
[19:20:52] "GET /repos/bpcs-icon.png HTTP/2.0" 200 2407 "https://agray.org/repos/"
[19:20:52] "GET /repos/ResuMaker-icon.jpg HTTP/2.0" 200 205712 "https://agray.org/repos/"
[19:20:52] "GET /repos/stackless-server-icon.jpg HTTP/2.0" 200 253622 "https://agray.org/repos/"
[19:20:52] "GET /repos/webcache-icon.png HTTP/2.0" 200 59137 "https://agray.org/repos/"
[19:20:52] "GET /repos/adhole3.jpeg HTTP/2.0" 200 249990 "https://agray.org/repos/"
[19:20:52] "GET /repos/egix1.png HTTP/2.0" 200 160158 "https://agray.org/repos/"
[19:20:52] "GET /repos/highlighter-icon.png HTTP/2.0" 200 422058 "https://agray.org/repos/"
[19:20:52] "GET /repos/ld-icon.webp HTTP/2.0" 200 369034 "https://agray.org/repos/"
[19:22:42] "GET /contact.html HTTP/2.0" 200 888 "https://agray.org/"
[19:22:47] "GET /mindmap.html HTTP/2.0" 200 1031 "https://agray.org/"
[19:22:48] "GET /mindmap.css HTTP/2.0" 200 608 "https://agray.org/mindmap.html"
[19:22:48] "GET /mindmap.js HTTP/2.0" 200 16222 "https://agray.org/mindmap.html"
[19:22:48] "GET /hobbies.json HTTP/2.0" 200 8040 "https://agray.org/mindmap.html"
[19:23:29] "GET /wiki.html HTTP/2.0" 200 40291 "https://agray.org/"
[19:23:43] "GET /worldmap.html HTTP/2.0" 200 280848 "https://agray.org/"
[19:23:45] "GET /blog/literature.html HTTP/2.0" 200 16155 "https://agray.org/"
[19:23:50] "GET /news_articles.html HTTP/2.0" 200 714 "https://agray.org/"
[19:24:01] "GET /rand.html HTTP/2.0" 200 5086 "https://agray.org/"
[19:24:01] "GET /all_files.json?v=32 HTTP/2.0" 200 1640545 "https://agray.org/rand.html"
[19:24:02] "GET /static/0222?v=32 HTTP/2.0" 206 3770522 "https://agray.org/rand.html"
[19:24:05] "GET /contact.html HTTP/2.0" 304 0 "https://agray.org/"
They left for 2.5 hours, then returned:
[21:57:54] "GET / HTTP/2.0" 200 592 "-"
[21:57:54] "GET /styl.css HTTP/2.0" 200 2025 "https://agray.org/"
[21:57:54] "GET /4/tech.jpeg HTTP/2.0" 200 76804 "https://agray.org/"
[21:57:54] "GET /4/mail1.jpeg HTTP/2.0" 200 213173 "https://agray.org/"
[21:57:54] "GET /4/mindmap1.png HTTP/2.0" 200 115828 "https://agray.org/"
[21:57:54] "GET /4/newspapers.jpg HTTP/2.0" 200 177353 "https://agray.org/"
[21:57:54] "GET /4/literature1.jpeg HTTP/2.0" 200 199327 "https://agray.org/"
[21:57:54] "GET /4/wiki.webp HTTP/2.0" 200 3500 "https://agray.org/"
[21:57:54] "GET /4/earth7.jpeg HTTP/2.0" 200 245653 "https://agray.org/"
[21:57:54] "GET /4/music3.jpeg HTTP/2.0" 200 189532 "https://agray.org/"
[21:57:54] "GET /4/perfumery8.jpeg HTTP/2.0" 200 121760 "https://agray.org/"
[21:57:54] "GET /4/diary1.jpg HTTP/2.0" 200 274343 "https://agray.org/"
[21:57:54] "GET /favicon.ico HTTP/2.0" 200 5694 "https://agray.org/"
[21:58:11] "GET /repos/ HTTP/2.0" 200 1818 "https://agray.org/"
[21:58:11] "GET /repos/tagem1.png HTTP/2.0" 200 5954 "https://agray.org/repos/"
[21:58:11] "GET /repos/profile.png HTTP/2.0" 200 267082 "https://agray.org/repos/"
[21:58:11] "GET /repos/bpcs-icon.png HTTP/2.0" 200 2407 "https://agray.org/repos/"
[21:58:11] "GET /repos/tech.jpeg HTTP/2.0" 200 180313 "https://agray.org/repos/"
[21:58:11] "GET /repos/adhole3.jpeg HTTP/2.0" 200 249990 "https://agray.org/repos/"
[21:58:11] "GET /repos/highlighter-icon.png HTTP/2.0" 200 422058 "https://agray.org/repos/"
[21:58:11] "GET /repos/stackless-server-icon.jpg HTTP/2.0" 200 253622 "https://agray.org/repos/"
[21:58:11] "GET /repos/egix1.png HTTP/2.0" 200 160158 "https://agray.org/repos/"
[21:58:11] "GET /repos/rscraper1.png HTTP/2.0" 200 674 "https://agray.org/repos/"
[21:58:11] "GET /repos/ResuMaker-icon.jpg HTTP/2.0" 200 205712 "https://agray.org/repos/"
[21:58:11] "GET /repos/ld-icon.webp HTTP/2.0" 200 369034 "https://agray.org/repos/"
[21:58:11] "GET /repos/webcache-icon.png HTTP/2.0" 200 59137 "https://agray.org/repos/"
[21:58:13] "GET / HTTP/2.0" 200 592 "https://agray.org/repos/"
[21:58:14] "GET /favicon.ico HTTP/2.0" 200 5694 "https://agray.org/"
[21:58:25] "GET /repos/ HTTP/2.0" 200 1818 "https://agray.org/"
[21:58:25] "GET /repos/tech.jpeg HTTP/2.0" 200 180313 "https://agray.org/repos/"
[22:00:15] "GET /repos/ HTTP/2.0" 304 0 "https://agray.org/"
[22:00:15] "GET /repos/tech.jpeg HTTP/2.0" 304 0 "https://agray.org/repos/"
[22:00:51] "GET /repos/ HTTP/2.0" 200 1818 "https://agray.org/"
[22:00:51] "GET /styl.css HTTP/2.0" 200 2025 "https://agray.org/repos/"
[22:00:51] "GET /repos/tagem1.png HTTP/2.0" 200 5954 "https://agray.org/repos/"
[22:00:51] "GET /repos/rscraper1.png HTTP/2.0" 200 674 "https://agray.org/repos/"
[22:00:51] "GET /4/wiki.webp HTTP/2.0" 200 3500 "https://agray.org/repos/"
[22:00:51] "GET /4/mindmap1.png HTTP/2.0" 200 115828 "https://agray.org/repos/"
[22:00:51] "GET /repos/webcache-icon.png HTTP/2.0" 200 59137 "https://agray.org/repos/"
[22:00:51] "GET /repos/bpcs-icon.png HTTP/2.0" 200 2407 "https://agray.org/repos/"
[22:00:51] "GET /repos/adhole3.jpeg HTTP/2.0" 200 249990 "https://agray.org/repos/"
[22:00:51] "GET /repos/profile.png HTTP/2.0" 200 267082 "https://agray.org/repos/"
[22:00:51] "GET /repos/tech.jpeg HTTP/2.0" 200 180313 "https://agray.org/repos/"
[22:00:51] "GET /repos/stackless-server-icon.jpg HTTP/2.0" 200 253622 "https://agray.org/repos/"
[22:00:51] "GET /repos/egix1.png HTTP/2.0" 200 160158 "https://agray.org/repos/"
[22:00:51] "GET /repos/highlighter-icon.png HTTP/2.0" 200 422058 "https://agray.org/repos/"
[22:00:51] "GET /repos/ld-icon.webp HTTP/2.0" 200 369034 "https://agray.org/repos/"
[22:00:51] "GET /repos/ResuMaker-icon.jpg HTTP/2.0" 200 205712 "https://agray.org/repos/"
[22:00:52] "GET /favicon.ico HTTP/2.0" 200 6354 "https://agray.org/repos/"
[22:01:23] "GET /4/tech.jpeg HTTP/2.0" 200 76804 "https://agray.org/"
[22:09:03] "GET / HTTP/2.0" 200 592 "https://agray.org/repos/"
[22:09:03] "GET /4/tech.jpeg HTTP/2.0" 200 62552 "https://agray.org/"
[22:09:05] "GET /repos/ HTTP/2.0" 304 0 "https://agray.org/"
[22:09:05] "GET /repos/tech.jpeg HTTP/2.0" 304 0 "https://agray.org/repos/"
[22:09:11] "GET /4/tech.jpeg HTTP/2.0" 304 0 "https://agray.org/"
Oh wait. That one was me!
One bot that only accessed the home HTML and favicon, but did so through 2 IP addresses (the first from Poland's HyperNET, the second from Turkey's Radore):
[07:12:09] "GET / HTTP/1.1" 200 609 "-"
[07:12:10] "GET /favicon.ico HTTP/1.1" 200 5694 "-"
[07:12:10] "GET /favicon.ico HTTP/1.1" 200 5694 "-"
Tencent's bot - claiming to be an iPhone - downloaded the front page, then came back hours later - with a slightly different IP address - and scraped all HTML pages:
[16:57:23] "GET / HTTP/1.1" 200 609 "http://agray.org"
[19:06:37] "GET / HTTP/1.1" 200 609 "http://agray.org"
[19:11:09] "GET /blog/ HTTP/1.1" 200 10750 "-"
[19:11:09] "GET /wiki.html HTTP/1.1" 200 40507 "-"
[19:11:45] "GET /news_articles.html HTTP/1.1" 200 731 "-"
[19:11:47] "GET /repos/ HTTP/1.1" 200 1856 "-"
[19:11:48] "GET /blog/literature.html HTTP/1.1" 200 16553 "-"
Olin/AT&T scraped my front page (2 requests), then immediately afterwards, Akamai/Linode scraped all assets from the home HTML. They all had the same User-Agent, claiming to be Chrome on Linux.
[07:37:13] "HEAD / HTTP/1.1" 200 0 "http://agray.org"
[07:37:13] "GET / HTTP/1.1" 200 609 "http://agray.org"
[07:37:12] "GET / HTTP/1.1" 200 609 "http://agray.org"
[07:37:13] "GET /cdn-cgi/trace HTTP/1.1" 404 196 "-"
[07:37:13] "HEAD / HTTP/1.1" 200 0 "-"
[07:37:13] "GET /4/diary1.jpg HTTP/1.1" 200 274343 "-"
[07:37:13] "GET /4/tech.jpeg HTTP/1.1" 200 280563 "-"
[07:37:13] "GET /4/mail1.jpeg HTTP/1.1" 200 213173 "-"
[07:37:13] "GET /4/music3.jpeg HTTP/1.1" 200 189532 "-"
[07:37:13] "GET /4/mindmap1.png HTTP/1.1" 200 115828 "-"
[07:37:13] "GET /4/earth7.jpeg HTTP/1.1" 200 245653 "-"
[07:37:13] "GET /4/wiki.webp HTTP/1.1" 200 3500 "-"
[07:37:13] "GET /4/literature1.jpeg HTTP/1.1" 200 199327 "-"
[07:37:13] "GET /4/perfumery8.jpeg HTTP/1.1" 200 121760 "-"
[07:37:13] "GET /4/newspapers.jpg HTTP/1.1" 200 177353 "-"
[07:37:13] "GET /styl.css HTTP/1.1" 200 2006 "-"
That bot's requests all look normal, except for /cdn-cgi/trace. This is a Cloudflare-specific URL, which it probably requested purely to test whether this website is behind Cloudflare.
Two different Amazon IP addresses - AWS? - hosting a bot which somehow already knows that my website is served under two domain names, and tests to see if they are different:
agray.org:443 - - [12:14:40] "GET / HTTP/1.1" 200 609 "-" [user agent claimed to be Windows, Chrome version 58]
agray.uber.space:443 - - [12:14:47] "GET / HTTP/1.1" 200 609 "-" [user agent claimed to be Windows, Chrome version 58]
agray.org:443 - - [12:42:00] "GET / HTTP/1.1" 200 609 "-" [user agent claimed to be Linux, Chrome version 88]
agray.uber.space:443 - - [12:42:00] "GET / HTTP/1.1" 200 609 "-" [user agent claimed to be Linux, Chrome version 80]
Some basic automated requests:
from Microsoft IP [08:33:53] "GET / HTTP/1.1" 200 609 "-" "curl/8.6.0"
from L3 IP [08:52:13] "GET / HTTP/1.0" 200 1499 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50728)"
from Amazon IP [12:11:16] "GET / HTTP/1.1" 200 609 "-" [claimed to be Windows with Chrome version 66]
from Amazon IP [12:11:18] "GET / HTTP/1.1" 200 609 "-" [claimed to be Windows with Chrome version 66]
from Wecom IP [15:38:36] "GET / HTTP/1.1" 200 609 "-" [claimed to be Windows with Firefox version 120]
from OVH IP [15:40:53] "GET / HTTP/1.1" 200 609 "-" [claimed to be Windows with Chrome version 91]
from JCom IP [15:47:19] "GET / HTTP/1.1" 200 609 "-" [claimed to be Windows with Chrome version 79]
from Google IP [16:16:14] "GET / HTTP/1.1" 200 609 "-" [claimed to be Windows with Chrome version 114]
from Servers.com IP [18:53:03] "GET / HTTP/1.1" 200 609 "-" [claimed to be Linux with Chrome version 83]
from China Mobile IP [22:37:58] "GET / HTTP/1.1" 200 609 "http://agray.org/" [claimed to be Windows with Chrome version 125]
Notice how this request - from an OVH IP address - sends a Referer header claiming that it found my website through Google search, which can't be true, as Google hadn't yet put my site in search results for my own domain name:
[16:42:06] "GET / HTTP/1.1" 200 609 "https://www.google.com/"
Sometimes they'll ask for the favicon too:
[20:54:25] "GET / HTTP/1.1" 200 609 "-"
[20:54:28] "GET /favicon.ico HTTP/1.1" 200 5694 "https://agray.org/"
Some script kiddie - the only IPv6 address in the logs - forgot to change the User-Agent header in his code:
[23:28:16] "GET /contact HTTP/2.0" 404 196 "-" "Symfony BrowserKit"
[23:28:16] "GET / HTTP/2.0" 200 592 "-" "Symfony BrowserKit"
The first human-looking requests that aren't from me. Notice that it requests the CSS first, like a browser would, because the CSS is in the head and is required for rendering the page. Its User-Agent header claimed Chrome 117, which is a more recent version than most of the bots claim to use.
[08:37:19] "GET / HTTP/2.0" 200 592 "-"
[08:37:21] "GET /styl.css HTTP/2.0" 200 1989 "https://agray.org/"
[08:37:38] "GET /4/mindmap1.png HTTP/2.0" 200 115828 "https://agray.org/"
[08:37:41] "GET /4/tech.jpeg HTTP/2.0" 200 280563 "https://agray.org/"
[08:37:46] "GET /4/literature1.jpeg HTTP/2.0" 200 199327 "https://agray.org/"
[08:37:49] "GET /4/newspapers.jpg HTTP/2.0" 200 177353 "https://agray.org/"
[08:37:50] "GET /4/mail1.jpeg HTTP/2.0" 200 213173 "https://agray.org/"
[08:37:50] "GET /4/wiki.webp HTTP/2.0" 200 3500 "https://agray.org/"
[08:37:52] "GET /4/diary1.jpg HTTP/2.0" 200 152708 "https://agray.org/"
[08:37:52] "GET /4/music3.jpeg HTTP/2.0" 200 189532 "https://agray.org/"
[08:37:52] "GET /4/earth7.jpeg HTTP/2.0" 200 245653 "https://agray.org/"
[08:37:52] "GET /4/perfumery8.jpeg HTTP/2.0" 200 24576 "https://agray.org/"
There were many clearly malicious bots, testing to see if my server is vulnerable to known exploits. Most were hosted on Digital Ocean IP addresses, but one set was from a Google IP address, and one set was from a Saudi Arabian ISP's IP address.
They tested exactly the same URLs in exactly the same order, they almost all came from the same IP range (138.197.N.N), and they all had the exact same User-Agent header - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" - except for several with a slightly different Chrome version.
For example:
[08:57:52] "GET / HTTP/1.1" 200 1499 "-"
[08:57:52] "GET //wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //xmlrpc.php?rsd HTTP/1.1" 404 196 "-"
[08:57:53] "GET / HTTP/1.1" 200 1499 "-"
[08:57:53] "GET //blog/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //web/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //wordpress/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //website/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //wp/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //news/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //2018/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //2019/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //shop/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:53] "GET //wp1/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "GET //test/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "GET //media/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "GET //wp2/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "GET //site/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "GET //cms/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "GET //sito/wp-includes/wlwmanifest.xml HTTP/1.1" 404 196 "-"
[08:57:54] "" 400 0 "-" "-"
A bot from a Colombian IP address, trying to see if I've accidentally published my PHP or AWS credentials in dotfiles. This time it is accessing my server through its secondary domain name, agray.uber.space:
[11:12:54] "GET /.env HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /.env.exemple HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /config.json HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /sendgrid.env HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /.aws/credentials HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /phpinfo.php HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /phpinfo HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:54] "GET /info HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:55] "GET /php_info HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:55] "GET /php_info.php HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:55] "GET /info.php HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:55] "GET /_profiler/phpinfo.php HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
[11:12:55] "GET /_profiler/phpinfo HTTP/1.1" 404 196 "-" "Go-http-client/1.1"
A Cloudflare bot using 3 IP addresses, then returning 7 hours later with 2 other IP addresses:
[14:47:34] "GET /wordpress/wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[14:49:24] "GET /wordpress/wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[14:49:39] "GET /wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[14:49:39] "GET /wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[21:43:36] "GET /wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[21:44:16] "GET /wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[21:45:10] "GET /wordpress/wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
[21:45:34] "GET /wordpress/wp-admin/setup-config.php HTTP/2.0" 404 196 "-"
User-Agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/NNN.NN (KHTML, like Gecko) Chrome/NNN.0.0.0 Safari/NNN.NN"
I just spotted my first definitely-real human: someone who landed on an URL I had published. Hello! Actually, they just loaded the page but didn't click on anything after that, so they didn't read my blog or anything :(. They used the Edge browser, during the day, so they are probably behind a corporate network.
I have heard that Google bots sometimes cause problems by checking URLs out before their users click on them. But I don't believe this is what happened:
Not only was it a British IP address, but none of the other 'landing page' URLs I've emailed have been touched. Google - having already scraped my website - has probably not put me on some kind of 'naughty list', or perhaps it only performs these checks periodically.
As this website is hosted almost entirely as static web pages, I can't control a lot of things - e.g. I can't geoblock IPs, I can't get the server to display different content for different User-Agents.
But I can control the landing pages I send out. If I send a unique URL to each person, with an invisible HTTP redirect to the 'final' landing page, I could identify in the logs who exactly has viewed my website.
It needs to:
I'm going to place a few honeypots around, just for fun.
In my robots.txt file, I tell all bots not to access this random directory. I thought it might lead to some bots actively looking at it, but none have so far.
User-agent: *
Disallow: /672acbaeb318ca495a6f311485ba9ce5dac567a4/
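To see whether anything has taken the bait, a quick count over the logs is enough (a sketch, using the same log location as the earlier commands):

# Sketch: count hits on the honeypot directory, per log file.
grep -c 672acbaeb318ca495a6f311485ba9ce5dac567a4 ~/logs/webserver/access_log*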
I tested by accessing my website from behind a corporate firewall. It sent the requests for HTML and CSS via a different IPv4 address than the requests for JPEG images, and used HTTP/1.1 (whereas modern browsers use HTTP/2.0), so it is probably being MITM'd by the firewall, with images going through a separate proxy or something.
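Incidentally, curl can report which HTTP version was actually negotiated, which is a quick sanity check when comparing a direct connection with one going through such a proxy (a sketch; it assumes curl was built with HTTP/2 support):

# Sketch: print the HTTP version negotiated for a request to my front page.
curl -so /dev/null -w '%{http_version}\n' --http2 https://agray.org/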
'Amazon Data Services NoVa' in Ashburn, Virginia was the first bot to behave exactly like a browser - using HTTP/2 (like a modern browser), loading HTML then CSS then images then the favicon, and sending a Referer header on all page resources. Most likely it was using a headless Chrome agent:
“GET / HTTP/2.0” 200 695 “-” [CHROME_ON_LINUX_USER_AGENT]
“GET /styl.css HTTP/2.0” 200 2172 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/tech.jpeg HTTP/2.0” 200 62552 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/map3.jpeg HTTP/2.0” 200 17782 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/wiki.webp HTTP/2.0” 200 3500 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/mindmap1.jpg HTTP/2.0” 200 26788 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/literature.jpg HTTP/2.0” 200 63861 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/perfumery8.jpeg HTTP/2.0” 200 121760 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/newspapers.jpg HTTP/2.0” 200 47914 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/mail1.jpeg HTTP/2.0” 200 84634 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/music3.jpeg HTTP/2.0” 200 53177 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /4/diaryB4.jpeg HTTP/2.0” 200 120617 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
“GET /favicon.ico HTTP/2.0” 200 6354 “https://agray.org/” [CHROME_ON_LINUX_USER_AGENT]
The same User-Agent header was used by IP addresses on Digital Ocean VPSs and other obvious bots.
49.51.0.0 04:35:36 “GET /wiki.html HTTP/1.1” 200 40423 “-” [SAME_USER_AGENT]
49.51.0.0 04:43:54 “GET /wiki.html HTTP/1.1” 200 40605 “-” [SAME_USER_AGENT]
43.130.0.0 04:52:28 “GET /wiki.html HTTP/1.1” 200 40423 “-” [SAME_USER_AGENT]
43.153.0.0 05:06:36 “GET /wiki.html HTTP/1.1” 200 40423 “-” [SAME_USER_AGENT]
43.130.0.0 05:11:42 “GET /wiki.html HTTP/1.1” 200 40592 “-” [SAME_USER_AGENT]
43.159.0.0 05:20:07 “GET /wiki.html HTTP/1.1” 200 40592 “-” [SAME_USER_AGENT]
49.51.0.0 05:35:36 “GET /wiki.html HTTP/1.1” 200 40423 “-” [SAME_USER_AGENT]
43.156.0.0 05:46:19 “GET /wiki.html HTTP/1.1” 200 40605 “-” [SAME_USER_AGENT]
43.130.0.0 06:06:58 “GET /wiki.html HTTP/1.1” 200 40605 “-” [SAME_USER_AGENT]
43.135.0.0 06:09:53 “GET /wiki.html HTTP/1.1” 200 40423 “-” [SAME_USER_AGENT]
49.51.0.0 06:19:53 “GET /wiki.html HTTP/1.1” 200 40423 “-” [SAME_USER_AGENT]
170.106.0.0 06:33:20 “GET /wiki.html HTTP/1.1” 200 40592 “-” [SAME_USER_AGENT]
43.153.0.0 06:48:15 “GET /repos/mediawiki.html HTTP/1.1” 200 1056 “-” [SAME_USER_AGENT]
Why are numerous different IP addresses accessing the same page, minutes apart, with the same User-Agent header, asking for different compression encodings (that's why the response is 40423 or 40605 or 40592 bytes)? If it were a human, it would send POST requests (due to the nature of this web page) and would probably be using HTTP/2 (most browsers do).
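For reference, the size differences can be reproduced by requesting the page with different compression encodings - a rough sketch (the exact sizes will drift as the page changes):

# Sketch: download /wiki.html with different Accept-Encoding headers and
# print how many (possibly compressed) bytes came back in each case.
for enc in identity gzip br; do
    curl -so /dev/null -H "Accept-Encoding: $enc" \
         -w "$enc: %{size_download} bytes\n" https://agray.org/wiki.html
done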
The IP addresses belonged to Tencent Cloud: Virginia (170.106.0.0), California (43.153.0.0, 43.130.0.0 and 49.51.0.0), Hong Kong (43.159.0.0 and 43.135.0.0), and Singapore (43.156.0.0).
Looking into my logs for this exact User-Agent, I see exactly the same behaviour for /blog/hobbies.html (accessed dozens of times in a row, as though checking for updates; three times it tried to access /blog/Hobbies.html).
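That kind of check is just a grep over the logs - a rough sketch, with a placeholder substring standing in for whichever User-Agent header is being investigated:

# Sketch: count which paths were requested with a particular User-Agent.
# 'SOME-UA-SUBSTRING' is a placeholder for the exact header of interest.
grep -F 'SOME-UA-SUBSTRING' ~/logs/webserver/access_log* \
    | awk -F'"' '{print $2}' | sort | uniq -c | sort -n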
Why do big websites use multiple different domains? Not just sub-domains - completely separate domains. For example, github.com uses githubassets.com and githubusercontent.com.
It turns out that web cookies were badly specified, and now browsers scope them by guessing what the top-most ownership domain name is, using a compiled-in list, the Public Suffix List (hence why .co.uk is treated like a TLD such as .com, instead of as the cookie ownership domain of example.co.uk). To prevent user-uploaded files from somehow stealing cookies from the main website, a completely separate domain name is the easiest solution - the alternative is to apply to have the domain added to this compiled-in list (but even then it relies on GitHub not making other subtle configuration mistakes).
I've heard that marketing departments at big companies often get their own domain names for exactly this reason - they like making flashy websites, they change/iterate assets more frequently, and have less understanding of security, so they might use insecure toolkits or unstable APIs that leak user credentials. Little harm can be done if this domain is kept completely separate from the company's main domain.
For 'legal-y' reasons, content that users upload also needs to be on a completely-separate domain. For example, if a user uploads an illegal image as their avatar to a certain website, it might trip an automated ban of the entire domain name that the image is hosted on - and then everything associated with that website is temporarily down. Far better to mitigate the risk by partitioning user-uploaded content onto its own separate domain.
If you use a script-injection browser addon, such as GreaseMonkey or TamperMonkey, then you have probably noticed that it is broken on my website.
My host - UberSpace - allows me to set or override HTTP headers for any page.
CSP (Content Security Policy) headers are one of the browser's main defences against cross-site scripting (XSS) attacks. Essentially, websites can tell browsers to refuse to load any script, stylesheet, image or video - or anything else - that was not served from the website itself or from another explicitly-specified trusted source.
If you're reading this, you probably already know what an XSS attack is, but if not, here's an explanation.
For my single-page apps, such as my 'world essays', I like to keep everything in the one page, and restrict the CSP to the exact SHA hashes of the script and stylesheet. That way, I can keep them inline without enabling execution of all inline code (inline code is the main danger - if you allow inline code execution, you may as well not have CSP headers), and a user can still download the app and run it locally, without needing to download the JS and CSS separately.
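The hash that goes into such a policy is just the base64-encoded SHA-256 of the inline block's exact bytes - a minimal sketch, assuming the inline script body has been saved to a file called inline.js:

# Sketch: compute the value for a CSP 'sha256-...' source entry from the
# exact bytes of an inline <script> body (saved here as inline.js).
HASH=$(openssl dgst -sha256 -binary inline.js | openssl base64 -A)
echo "script-src 'sha256-$HASH'"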
There are several points of annoyance, though. For example, connect-src applies both to fetch calls (i.e. downloading JSON) and to ws:// (WebSocket) connections. I think there should be finer control over protocols and ports, although maybe the justification is that you can emulate ws:// through fetch calls anyway (periodic polling with POST and GET basically gets you a very primitive WebSocket).
HTML allows you to set 'HTTP-equivalent headers' in meta tags. For example, with <meta http-equiv="Refresh" content="0; url='https://example.com'"/> I could basically run an URL shortener for any links that I choose, even just as static HTML. This is just a workaround, because at the moment I can only serve static HTML. The fastest approach would be to serve the usual 302 Found or 301 Moved Permanently HTTP responses, which avoids the overhead of sending and parsing HTML.
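As a sketch of how the per-person landing-page idea above could work as plain static files - one throwaway token directory per recipient, each holding nothing but a redirect (the tokens and target URL here are invented):

#!/bin/sh
# Sketch: generate one static redirect page per recipient token.
# The tokens and the target URL are invented examples.
TARGET='https://agray.org/blog/hobbies.html'
for token in alice-7f3a bob-91c2; do
    mkdir -p "html/$token"
    cat > "html/$token/index.html" <<EOF
<!DOCTYPE html>
<meta http-equiv="Refresh" content="0; url='$TARGET'">
EOF
done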
The only legitimate reason to use URL shorteners is to replace long URLs.
URL shorteners are often used for phishing attempts and such too, of course, which is why I think it's a terrible policy to allow or even encourage using them.
The only times I encounter long URLs are when the URL is part of the authentication - i.e. it is 'security through obscurity' - for example, an important document I've been emailed that contains semi-confidential information. Using an URL shortener for these would allow an attacker to iterate over the short URLs to discover the interesting long URLs.
Thus it's not worth implementing.
Once you start using a domain name for emails and website, you are locked in to that name for a long time. If someone else bought my domain name, they would gain access to all my future emails.
Considering the email address they sent this to - `pgp@agray.org` - a human did not select the address; the sender must have scraped the first email address it saw on my website.
A WHOIS lookup on `agray.net` suggests it is not registered by anyone. Their website offers their own 'payment processor' to handle payments, which suggests to me that it might be a scam (more of a scam than the basic domain-squatting rent-seeking behaviour). That in turn suggests the emails are sent automatically to thousands of website owners before the senders have even purchased the domains. By clicking on the link or replying to the email, I would signal my interest and might trigger them to actually purchase `agray.net`.
I chose Gandi, a French company. Although they were bought out by a VC around 2021, they still seem solid enough.
The alternatives:
I'm actually planning to transfer away from Gandi, because their renewal fees are twice those of cheaper alternatives, and because I've heard that their email service is unreliable.
NOTE: Don't use websites to search for domain availability - use the whois command line utility. Some domain registrars will buy the domains you searched for and resell them at a premium, before you get a chance to purchase them yourself.
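For example (a sketch - example.org is a stand-in for whatever name is being checked, and the 'not registered' wording differs between TLD registries):

# Sketch: check registration status from the command line rather than
# a registrar's search box. Output wording varies between registries.
whois example.org | grep -iE 'no match|not found|domain status'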
Also note: reading registrars' terms and conditions, I noticed that some reserve the right to 'halt services' for extremely vague reasons. I'm not a lawyer, so I just hope not to annoy anyone enough to be targeted.
I wanted a .uk address, but decided against it because of:
- X.uk is less common than X.org.uk or X.co.uk, so people might automatically add the 'missing' .org or .co.
- X.org.uk might be mistaken for X.org (www. prefix, .co.uk suffix).
- gray.uk - the plain HTTP response redirects to the true domain.
- A www. subdomain - to avoid cookies spilling into other subdomains - but modern cookie instructions make this unnecessary.

After making this decision, I had a closer look at .uk. It is owned by Nominet, which has had some controversies, particularly around domain-squatting and censorship of its critics.
Here are some other traditional TLDs:
Country TLDs completely depend on the legal system of the jurisdiction - some countries allow outright theft of domain names.
These should be avoided, because an address like www.mcdonalds looks like it is missing .com.
Examples include:
It used to be standard practice to use .local as a test/internal fake TLD for Active Directory, but it is not reserved, so one day maybe it will also become a vanity TLD and all these badly-configured networks will send their traffic to it. This is what happened with .dev - many developers used it as a test TLD, then Google bought it and caused a headache. Semi-reserved TLDs include .corp and .mail, so I'm confused why they didn't also reserve .dev instead of giving it to Google.
My biggest fear was buying a domain for a year and then getting hit with an extortionately-high renewal fee.
But, AFAIK, there are strict rules that the owners of the traditional TLDs (.com, .org, .net) must abide by, preventing this rent-seeking behaviour. The .org owner apparently attempted to change those rules years ago, but the California anti-trust regulator threatened to investigate them if it went ahead.
TLDs are allowed to mark a domain name as a 'premium' and thus charge huge fees for it - but it must be premium at the time of purchase. So some registrars try to trick you by offering a huge discount for the first year of a premium name, and reverting to the full price for renewals.
I chose to use a VPS, instead of running on my own hardware.
At the moment, it is hosted on uberspace.de, a German company known to me as hosts of youtube-dl.
The alternatives:
It's all static, generated either:
The WebSocket messaging feature, although implemented by my hand-written server, will not be deployed on the VPS until I've satisfied all the security concerns I have with it.
Hand-written JavaScript and CSS.
Images are not optimised - every client sees the same images.
Obviously all stock images are AI-generated, because legal precedent suggests that copyright doesn't apply to AI-generated work.
I'd prefer to use images of real human paintings, but the problem is that even if the paintings themselves are old enough to be public domain, the images are taken by photographers who have copyright over the photographs. So I would have to photograph each painting myself, which I'm not going to do.
For email addresses, I use name-generator.org.uk to generate random names for account signups.
See main article.
Instead of handing out my firstname.lastname123@gmail.com to everyone, I can hand out recipientname@agray.org - a unique email address for each recipient.
So now I can tell who leaks my email address - who sells it to spammers or whose databases were breached.
It's also just funny to email someone from theirname@agray.org.
Sometimes lawyers will contact you and demand to know why you are using their company's trademarked name - you have to take the time to explain that you have a unique email address for each company you contact, and it's easier to say that it's for email routing/filtering than to give the real answer.
Obviously you don't need a website just to have a domain name - but if you have a domain name, why not make the website too?
But here and here are some anecdotes against custom email domains:
I own firstname@lastname.com and my spouse also uses firstname@lastname.com for most emails. ...
probably 1 in 5 times ... [people do] not believe that it's a real email address ... The recipients assume that my spouse is so epically clueless that they're giving a mistaken email address which goes nowhere. ...
Many people are simply incredulous that there isn't a “gmail” or “yahoo” etc in my email address.
Outside of my small bubble of tech savvy folk, not a single person has looked at my [<my name>.com] email address and thought it wasn't a typo or lame joke.
If I was a company, <name>@<company>.com is totally fine, but as soon as <company> becomes a <name> all bets are off.
In my experience anything that's not Gmail is considered suspicious, .com or not.
I think more (especially in business world) that gmail.com is very unprofessional. Makes me think they are amateurs and cannot even setup a corporate email address.
I had a website break, because they were filtering emails that contain their domain name ... I had no way to apply for the [licence] I needed, [customer support] suggested I use the website.
I've had a few occasions that employees thought I worked for the company based on my email. ... [For example, a Hilton] employee thought I worked for corporate due to my email, hilton@domain.tld.
[Explaining that] I have separate emails for every company due to spam reasons ... caused confusion so now [it's just easier to] go along with what they think or hint that I'm some 'mystery shopper'.
And some notes on email literacy:
A couple of years back, a friend was doing some recruiting. One of the applicants had the email address givesdamngreathead@hotmail.com.
there are a lot of people out there who think you can just type someone's last name into webmail and it's just going to magically go to the person they intended. ... type the name and it autocompletes. ... autocomplete failures that go unnoticed by the sender.
My student email was @tcd.ie - I had plenty of websites I wasn't able to sign up to, offers I couldn't redeem and mail that just never arrived [because many companies have bad email address validation filters].
I have my 6 letter surname at gmail as well, and [I receive] Everything from wedding invitations to banking to schools etc. [probably because people put a space when they type 'Firstname Lastname@gmail.com' and it gets split into two email addresses, only one of which is valid] ... some kid in Arizona who shares my last name has signed up for everything from golf to Epic Games. I found him on facebook ... and politely asked him if he could try to not use my email for things. He told me it was his, called me a creep and blocked me.
[I own [lastname].com and] there’s apparently a ton of people with my last name who sign up for services using their first name @[lastname].com, even though they obviously don’t have access ... I see plenty of bank accounts and other important accounts ... I also get plenty of personal email as well, intended for people with that first name and last name. [When I] reply to tell them they have the wrong email address, [they sometimes reply] back again asking me if I could pass the message along to that person as if I’m supposed to know them.
[My wife's email is] firstnamelastname.gmail ... [but lots of users have the same username but with numbers at the end, and she receives so many emails intended for other people, including] PII ... real estate documents, job offers, legal communications, x-rays, etc.
I own a four char gmail account that is a common word ... I get _everything_. ... I have bank info, hundreds of AT&T cell phone contracts, pro baseball player contracts, mortgages, taxes, paypal, investment accounts.
I own a domain that is very similar to a ballet company for children in Florida and once, a mid-sized CEO emailed me PDF images with credit card details!
Isn't agray.org better than agray.github.io? (That's not my GitHub account name, by the way.) Instead of having a bunch of identities tied to 3rd-party services - like facebook.com/agray123, tiktok.com/@agray456 or reddit.com/user/agray789 (not my real social media pages) - I can have one central location for my identity. Anyone I meet, if I want to communicate with them, only needs to remember agray.org, instead of remembering an email address or social media URL I might have abandoned.
Instead of relying on Google for my email identity, I can subscribe to any email provider I want, set the MX DNS records, and the email will go to them. So my identity is no longer tied to Google. The weak point is now my phone number, which some services force me to use for SMS 2FA, even though a software-based OTP is more secure and has the benefit of being transferrable.
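The coupling to any particular provider is just a pair of DNS records; checking where mail for the domain currently goes is a one-liner (a sketch, using my own domain as the example):

# Sketch: show which mail servers currently receive mail for the domain.
dig +short MX agray.org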
It also looks more professional. At the moment I'm not sure how to separate professional and personal sides - I'll do it eventually. But realistically, no boss or client is going to bother reading my blog, I'm not that interesting.
And to be honest, I dislike how the web is turning into 'walled gardens'. Everything is centralising into the big web apps, with people using apps instead of web browsers. It would be silly for me to complain about this without at least trying to see how difficult running a website is (not difficult at all).
Domain names need to be renewed, but the maximum renewal allowed is 10 years. So if I'm in a 10-year-long coma, I could wake up to my domain being owned by a squatter, who would now have access to all my emails.
Compare this to a GMail account, which is deleted after 2 years of inactivity, but whose username is not reassigned to new accounts. Fastmail does allow new users to take old usernames (for *@fastmail.com accounts), so it has the same downside as using a custom domain name.
Even the domain name of a WHOIS server that was hardcoded into popular net tools was forgotten about and sold for $20.
But even if you properly renew the domain purchase every year, you can lose the domain name to a social engineering attack on your registrar. Or your domain registrar may prevent you from renewing your domain.
I've also heard about DNS hijacking. It seems to be possible to somehow slowly insert yourself as a DNS server into the DNS network, gradually becoming an authoritative server, then go unnoticed as you serve false IP addresses for a small number of domain queries - which would allow the attacker to harvest email but not encrypted communications (HTTPS website, SSH if you don't accept a different fingerprint, etc).
I considered opting out of WHOIS privacy, because I thought it might make it easier to prevent social engineering attacks on my domain registrar. But it might just as easily make attacks easier.
Regardless of what you do, among a thousand people who read it, someone will hate you for it - how you look, how you smile, how you write, how your words sound too happy or too snarky or too posh.
coldpie on HN: you really have to say something extremely boneheaded for anyone to care about your opinions enough to do something about it. Stick to topics you're knowledgeable about (this is how you avoid saying boneheaded things), and generally stay positive and constructive, and you won't have any trouble.
As a child, on a tech-related forum (overclockers.co.uk or something?), I remember disagreeing with someone's opinion about NVidia vs AMD GPUs - and from then on, he harassed me around that forum (making dozens of accounts to report all my posts, sending me DMs proving that he's archived my posts and threatening to dox me, finding one of my accounts on another forum and asking 'Is this you?'). I eventually found his Reddit account (by searching for accounts that posted the same links that he posted), and it turned out he was a meth addict all across the ocean in America.
So, by posting things under your real name online, you open yourself up to harassment by any meth-head who sees it. It's not possible to avoid harassment just by being polite or knowledgeable (not that my child-self was either of those things), because there's no objective measure of politeness or knowledge - what is considered 'polite' varies by culture, and what is considered 'knowledge' depends entirely on the other person's understanding of the world!
tl;dr: Security.
This was my original plan, but the cost of upgrading my consumer-grade broadband to allow for enterprise-level upload speeds and connections would be higher than the cost of renting a VPS.
Many ISPs don't want you to host your own websites from their consumer-grade services; even though I could do it without upgrading, I'd worry they'd cut off my internet for breaking their terms-of-service.
I self-host several web services - including a functioning HTTPS website - on my LAN. Although my self-written server code has not had a segmentation fault for a year now, it occasionally exhibits 'peculiar' behaviour that would either cause high bandwidth or CPU usage, or cause problems for viewers.
My server software is battle-tested against browsers, curl, wget and siege (a load-testing tool) - but I'm paranoid enough to worry about what happens if I deploy it and someone finds a vulnerability and uploads illegal content to my site. So I'm not doing it, for now.
There's a very good reason not to self-host your own email: almost all services rely on IP address blacklists which block anything with 'not-good reputation', to avoid spam.
Imagine if you couldn't contact anyone - from friends to companies to government services - by physical mail because they all rely on a 3rd-party filter, and that 3rd-party declares your home address to be prone to spammers.
Well, that's exactly how email is - you have to either:
You'd also first need to contact your ISP and get ports unblocked to allow inbound mail - which I've heard is more difficult today than it used to be, because ISPs are more strict and self-hosting email is rarer today.
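As an aside, checking whether an address sits on one of these blacklists is itself just a DNS query - a sketch, assuming the Spamhaus 'zen' list and its documented always-listed test address 127.0.0.2:

# Sketch: query a DNSBL by reversing the IP's octets and appending the
# list's zone. 127.0.0.2 is the standard always-listed test address.
dig +short 2.0.0.127.zen.spamhaus.org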
My VPS host serves content directly only if it is static; everything else goes through a reverse proxy. Thus cache optimisations are much more reliable for static content. I haven't even fully implemented ETags and other caching headers on my self-written server - but even if I had, it is likely that they often wouldn't be honoured, because responses would be going through Apache/nginx rather than directly to the client.
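Checking what actually reaches the client is easy enough with curl - a quick sketch against one of my static assets:

# Sketch: show whichever caching-related headers the front-end web server
# actually sends for a static asset.
curl -sI https://agray.org/styl.css | grep -iE 'etag|cache-control|last-modified|expires'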