The Complexity of Downloading Favicons, told in 15+ edge cases

tl;dr: Downloading that little icon you see in your browser tabs should be a simple exercise. It turned out to be a lot more complicated than I thought. Be vigilant that you are not shaving a yak.

Today's lesson in WTF: I suck as a developer

Sometimes you think, "I'll just quickly code up this thing to do the thing." Twenty hours later you look up and see a commit log filled with WTF moments and profanity, plus nested if statements handling edge cases you never thought possible.

This can be a real kick to the ego. In my case, it taught me to find the balance between "getting it done" and trying to be "complete" in my execution.

Should you want to see the level of bullshit involved in trying to get to a complete solution, read on. I also have a git repo where you can see my shame in all its over-complicated glory.

curabase/favicon-getter
Simple flask app to retrieve and display a website’s favicon. It is more complex than you think!

hah. I said "simple"

What is a Favicon?

See all those little icons in your browser's tabs (if you're on desktop)? Those are favicons. The icons on your phone's home-screen shortcuts are similar but come from a different spot in the HTML.

The complexity of a single HTML <link> tag

The favicon is declared in the <head> of the HTML page. For example, the markup for this website looks like this:

<link rel="shortcut icon" href="/favicon.png" type="image/png" />

Suffice it to say, this is not the only way to declare it. And, this being the shit-show that is the web, you can get it wrong and the browser STILL manages to find the icon.
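To make the "not the only way" part concrete, here is a minimal Python sketch (not the code from the favicon-getter repo) that pulls icon-like <link> tags out of a page using only the standard library. The set of accepted rel values (ICON_RELS) and the find_icon_links helper are my own assumptions for illustration. It treats rel as a space-separated, case-insensitive list of tokens, so "shortcut icon", "icon" and "ICON" all match, and it resolves relative hrefs against the page URL. It deliberately ignores <base href>, web app manifests, and <meta> tags, which a real downloader would also have to deal with.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# rel tokens treated as icons in this sketch (an assumption, not an exhaustive list).
ICON_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkParser(HTMLParser):
    """Collect (rel, absolute href, sizes) for every icon-like <link> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.icons = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel is a space-separated, case-insensitive token list, so
        # "shortcut icon" and "ICON" both contain the "icon" token.
        tokens = set(attr.get("rel", "").lower().split())
        matched = tokens & ICON_RELS
        href = attr.get("href", "").strip()
        if matched and href:
            self.icons.append((
                " ".join(sorted(matched)),
                urljoin(self.base_url, href),  # resolve relative paths
                attr.get("sizes", ""),
            ))


def find_icon_links(html, base_url):
    """Return icon links in document order; <base href> is not honored here."""
    parser = IconLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.icons
```

Fed the tag from this site with a hypothetical base URL of https://example.com/, it returns [("icon", "https://example.com/favicon.png", "")]. Self-closing tags like that one work because HTMLParser routes <link ... /> through handle_starttag.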

Edge cases everywhere

In short, there are many options, and you never really know what you will get when trying to download this little piece of a web page. After doing this little project I have a new appreciation for web browser developers and the seemingly endless edge cases they have to handle.

Here is a list of all the different things I encountered along the way to building a robust and aggressive favicon downloader.

  1. What happens when there is a 404 on the linked icon? Do we fall back to the home page and try again?
  2. What if the domain of the URL does not resolve in DNS?
  3. Some websites work on www but not on the bare domain. WTF?
  4. Sites have broken SSL or invalid certs; do we ignore the errors and continue anyway?
  5. You get redirected off the domain; do you follow?
  6. Change your request headers to look like a real browser, because some sites block anything that doesn't.
  7. Don't get rate-limited when aggressively trying different endpoints looking for that icon.
  8. Sometimes you get lucky and can simply hit http://example.com/favicon.ico, as per convention.
  9. Sometimes the file is actually in a different format than its extension suggests (e.g. a .ico that is actually a .png).
  10. Sometimes requesting the .ico without a Referer header gets you into a 301 redirect loop.
  11. Requesting the icon from a CDN or load balancer multiple times can yield different favicons.
  12. Sometimes, for whatever reason, the downloaded file is corrupt. What then?
  13. Sometimes the icon is CMYK and you need to convert it to RGB.
  14. If it is an SVG, you have to render it out to a raster format.
  15. The linked files can be any of .ico, .gif, .png, .jpg, .svg, .tiff, or a base64-encoded data URI of any of those formats.
  16. Sometimes it is an animated GIF (WTF?)
  17. Sometimes there are multiple sizes, as the spec permits (https://html.spec.whatwg.org/multipage/links.html#rel-icon).
  18. Sometimes favicons are not square.
  19. Sometimes favicons have transparent pixels (GIF) or an alpha channel (PNG).
  20. Sometimes the favicon is dynamically generated by the page's JS and is actually a playable arcade game: http://www.p01.org/defender_of_the_favicon/
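Item #9 (and the data URIs from #15) is one of the few problems on that list you can handle without any network code: ignore the file extension and the Content-Type, and look at the first few bytes instead. Here is a rough Python sketch of that idea. The signature table covers only a handful of formats, the SVG check is a crude heuristic that assumes the markup shows up in the first kilobyte, and sniff_format and decode_data_uri are hypothetical helper names, not functions from the repo.

```python
import base64
from typing import Optional
from urllib.parse import unquote_to_bytes

# Leading "magic" bytes for common favicon formats (not an exhaustive list).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x00\x00\x01\x00", "ico"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Guess an image format from its content, ignoring extension and headers."""
    for magic, fmt in SIGNATURES:
        if data.startswith(magic):
            return fmt
    # SVG is text, so fall back to a crude search near the start of the file.
    if b"<svg" in data[:1024].lower():
        return "svg"
    return None


def decode_data_uri(uri: str) -> bytes:
    """Return the raw bytes of a data: URI (base64 or percent-encoded payload)."""
    header, sep, payload = uri.partition(",")
    if not header.lower().startswith("data:") or not sep:
        raise ValueError("not a data: URI")
    if header.lower().endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)
```

A file served as favicon.ico whose first bytes are \x89PNG comes back as "png", which tells you which decoder to hand it to. A corrupt download (#12) may show up here as None, although a truncated file with an intact header will still pass this check and only fail later in the decoder.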

Or, failing all of those, there are the app icons introduced by the Apple, Android, and now-defunct Windows Mobile ecosystems. That is several more HTML tags to check.
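Stringing the fallbacks together, one possible order of places to look can be sketched in Python as below. The ordering is my own assumption for illustration: icons declared in the page first, then /favicon.ico at the site root (#8), then app icons, then /favicon.ico on the www or bare-domain twin (#3). candidate_urls is a hypothetical helper that only builds the list; fetching, redirects, SSL errors, and rate limiting are all left out.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(page_url, declared_icons=(), app_icons=()):
    """Return icon URLs to try, most preferred first, with duplicates removed.

    declared_icons and app_icons are assumed to be absolute URLs that were
    already extracted from the page's tags.
    """
    parts = urlsplit(page_url)
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""

    def root_favicon(h):
        # Conventional location: /favicon.ico at the root of the host.
        return urlunsplit((parts.scheme, h + port, "/favicon.ico", "", ""))

    # Some sites only answer on www, others only on the bare domain.
    twin = host[len("www."):] if host.startswith("www.") else "www." + host

    ordered = [*declared_icons, root_favicon(host), *app_icons, root_favicon(twin)]
    # dict.fromkeys keeps the first occurrence of each URL, in order.
    return list(dict.fromkeys(ordered))
```

Keeping the list builder separate from the fetching code makes it easy to stop at the first candidate that actually decodes, and to add a delay between attempts so you don't trip #7.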

Eventually you reach the end of your rope and just generate an image based on some criteria.
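The post doesn't pin down which criteria to use, so the following is just one hypothetical option, sketched in Python with the standard library: take the first letter of the host name, derive a background color from a hash of the host so the same site always gets the same tile, and emit a small SVG. placeholder_svg and its color and contrast rules are my own assumptions, and you would still need to rasterize the result (#14) if whatever consumes it expects a PNG or ICO.

```python
import hashlib
from html import escape
from urllib.parse import urlsplit


def placeholder_svg(page_url: str, size: int = 32) -> str:
    """Build a square SVG tile showing the first letter of the site's host."""
    host = urlsplit(page_url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]  # treat www.example.com and example.com alike
    letter = host[:1].upper() or "?"

    # Hash the host so the same site always gets the same background color.
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    r, g, b = (int(digest[i:i + 2], 16) for i in (0, 2, 4))
    background = f"#{r:02x}{g:02x}{b:02x}"
    # Pick black or white text depending on the background's perceived brightness.
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    foreground = "#000" if brightness > 150 else "#fff"

    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">'
        f'<rect width="{size}" height="{size}" fill="{background}"/>'
        f'<text x="50%" y="50%" dominant-baseline="central" text-anchor="middle" '
        f'font-family="sans-serif" font-size="{size * 0.6:g}" fill="{foreground}">'
        f"{escape(letter)}</text></svg>"
    )
```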

Update: A Redditor (/u/AyrA_ch) was kind enough to answer the 16 questions; read /u/AyrA_ch's comment on Reddit.

Update: #17, #18, and #19 are courtesy of @warpech over at Hacker News.

Update: #20 thanks to @xg15 and @cstuder over at Hacker News.

What to do? The moral of the story

Well, after spending so much time building this little tool (the classic yak shave), I decided to put it out there in case someone can learn from it.

The moral here is to be vigilant. Once you notice that a seemingly benign little task is rife with complexity, raise the flag.

Come up for air early and often. Get feedback early and often.

Or, maybe you join me in the mission and we write the most complete and robust favicon-getter ever. LOL

Bonus! Follow the conversations out there on HN and Reddit.

Feb 2020: