It’s great to be back! I haven’t posted anything here for quite a while, so thanks for reading!
If you’re reading this in the future, Facebook experienced a major outage on October 4, 2021 that lasted for several hours. This post was written with information available on the evening of October 4th. Any new information released will not be added to this post.
In case you hadn’t noticed, Facebook was down for about six hours today. Facebook has had small outages before that stopped images from loading or blocked new account signups or new posts. But this outage was different: Facebook was completely down. With Facebook went Instagram, WhatsApp, and Oculus VR.
Will this make a dent in Facebook’s dominance? Probably not. However, there are a handful of things to understand about this incident.
1. How did this happen?
To put it simply: the Facebook-owned servers that tell the rest of the internet how to reach Facebook’s networks stopped doing so. That included the routes to Facebook’s own DNS servers, the “phonebooks of the internet” that translate a name like facebook.com into an address. With no way to reach those phonebooks, any device that asked where facebook.com lived effectively got the answer: “facebook.com? Never heard of it.” A configuration error caused the internet to forget how to find Facebook’s servers.
Because Facebook engineers relied on the facebook.com domain to connect to the affected servers remotely, they had to travel to the data center to change the configuration in person. Reports indicate that soon after the fix was made, the servers weren’t ready to handle the influx of every device calling out to them at once.
In more technical terms: a buggy configuration was pushed to the BGP (Border Gateway Protocol) routing service for Facebook’s networks. BGP is how networks announce to one another which IP addresses they can deliver traffic to. The bad configuration caused Facebook to stop announcing its routing information and to withdraw the routes it had already announced, telling routers across the internet to delete them. Cloudflare and Fastly have reported seeing Facebook’s network withdraw these routes. As a result, DNS resolvers could no longer reach Facebook’s authoritative DNS servers, so they could not find an IP address for facebook.com. Because remote access tools were accessed through facebook.com, network engineers had to physically travel to the data center to make the configuration change.
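To make the BGP-then-DNS chain of events a little more concrete, here is a minimal toy sketch, assuming a single routing table stands in for the whole internet. Everything in it is hypothetical: the class and function names are made up, and the addresses come from the documentation-only ranges in RFC 5737, not Facebook’s real ones. It shows that once the route to a domain’s nameserver is withdrawn, the name stops resolving even though the web server behind it was never touched.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class ToyRoutingTable:
    """A toy router table: prefix -> the network that announced it."""

    def __init__(self) -> None:
        self.routes: Dict[Network, str] = {}

    def announce(self, prefix: str, origin: str) -> None:
        # Announcement: "send traffic for this block of addresses to me."
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix: str) -> None:
        # Withdrawal: "forget the route I gave you for this block."
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        # Longest-prefix match: the most specific covering prefix wins.
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical setup using documentation-only addresses (RFC 5737): the
# domain's authoritative nameserver lives at NAMESERVER_IP, and its zone
# maps the name to a web server address.
NAMESERVER_IP = "192.0.2.53"
ZONE = {"social.example": "198.51.100.10"}


def resolve(name: str, table: ToyRoutingTable) -> Optional[str]:
    # A resolver must reach the authoritative nameserver before it can get
    # an answer. With no route to it, the lookup simply fails.
    if table.lookup(NAMESERVER_IP) is None:
        return None
    return ZONE.get(name)


if __name__ == "__main__":
    table = ToyRoutingTable()
    table.announce("192.0.2.0/24", "AS-EXAMPLE")
    print(resolve("social.example", table))  # 198.51.100.10
    table.withdraw("192.0.2.0/24")
    print(resolve("social.example", table))  # None
```

Real resolvers retry, cache, and eventually return an error such as SERVFAIL rather than `None`, but the dependency is the same: no route to the nameserver, no answer.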
For everyone: THIS WAS NOT A HACK. At press time, there is no indication that the outage was the result of a network intrusion. According to an employee, a new configuration was pushed to the routing server that caused the issue. Do not believe posts indicating otherwise. At this time, there is no evidence that any customer data was lost or that there was any unauthorized access to Facebook’s network.
2. We can all survive without Facebook
The biggest advantage of Facebook’s outage: we all got some much-needed time away from the toxicity of social media. Facebook (and other networks) are home to many nasty and bitter arguments, and the time away was no doubt good for our mental health. I’m not saying that Facebook and social media are cesspools of hate and nastiness, only that they do host this content occasionally. I personally use Facebook to see what’s happening in my community and in the lives of my friends and family, and to keep up with topics I care about through Groups.
Today, I found myself looking at other places for mindless entertainment. I actually checked my email after lunch instead of letting it pile up until evening. I also spent some time outdoors admiring nature. Time away from social media gave me a chance to think about its impact on our lives.
2b. … so we all go to Twitter (Facebook’s effect on the wider internet)
While many enjoyed the time away from Facebook, others turned to different means of communication to satisfy their need for connection. Many went to Twitter, where #FacebookDown tweets were abundant. Twitter even had brief difficulties with replying to tweets, although the platform remained operational.
Twitter wasn’t the only service to see issues: DownDetector, Discord, Reddit, Gmail, and Snapchat also had problems. DNS provider Cloudflare also reported increased traffic from DNS requests for facebook.com.
This just goes to show what impact Facebook has on the wider internet. Every day, millions of people use these web services. But today, the traffic of one of the largest social networks leaked and poured onto fellow websites. In my opinion, it is amazing that these services were able to handle the increased traffic without crumbling themselves. Modern websites should be built to handle sudden increases in traffic without missing a beat. Today, most of the Web proved this capability.
3. The need for more decentralized communication
Facebook isn’t just a social network for posting cat photos and arguing about the latest news; its messaging products also connect people across the world every day. Personally, I use Facebook Messenger to contact family and friends every single day. It had never fully occurred to me what a Facebook outage would mean for this communication: that it would come to a screeching halt. Thankfully, I was able to reach my family via SMS.
Unfortunately, that isn’t always an option. I have a handful of contacts I can only reach through Facebook or one of its products. The same is true in other countries where WhatsApp is the de facto messaging app. As The Verge highlights, inquiries and payments for businesses across the globe went dark today because of this outage. Facebook carries a huge burden: connecting a sizable portion of the planet with millions of messages every day. This outage was not a capacity problem; Facebook seems to handle that fine. But think how many messages were missed today: messages from potential customers, messages between family members miles apart, or messages between friends asking when to meet up for coffee. It’s amazing how much the world relies on one company to talk to each other. I suspect that many users found alternative methods of communication (as The Verge highlights in its article, linked below), but I suspect many did not.
There have been calls for more decentralized internet resources for years now, but I don’t think many people realize the drawbacks that kind of system would bring. SMS is a good example of a decentralized system: it is not controlled by any single carrier; rather, it is a standard that every carrier uses.
SMS is an effective communication tool, but it lacks key features of web-enabled systems. It was not designed for the smartphone era, so it doesn’t (fully) support features such as read receipts, rich presence (typing indicators, online/offline status), high-quality media, multi-device support, and reliable delivery. Personally, I’ve had SMS messages delivered out of order. This could be a problem with my smartphone, but it highlights a fundamental issue with SMS. Say your phone is having trouble locking on to a network connection: you may never receive a message, or you may receive it out of order, which is particularly problematic in group conversations.
Systems that use the internet have a system of record to fall back on. When your device regains a reliable connection to the Internet, it can download the messages it missed in a batch, completely and in the correct order. Or, if your phone is having issues but your laptop is unaffected, you can log in to the desktop version of the app to see what you’ve missed. (This is true for only a few of the most popular messaging platforms.)
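The “system of record” idea can be sketched in a few lines. This is a simplified illustration under assumed names (`ServerLog`, `Device`, and the `seq` field are all hypothetical), not how SMS or any particular messenger actually works: the server numbers every message as it stores it, so a device that notices a gap, or a second device starting fresh, can ask for everything after its last known number and get the full conversation in order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    seq: int  # position in the server's log, assigned by the server
    sender: str
    text: str


class ServerLog:
    """Toy system of record: the server stores every message in order."""

    def __init__(self) -> None:
        self._log: List[Message] = []

    def post(self, sender: str, text: str) -> Message:
        msg = Message(seq=len(self._log) + 1, sender=sender, text=text)
        self._log.append(msg)
        return msg

    def since(self, last_seen: int) -> List[Message]:
        # Everything after the device's last known position, already ordered.
        return [m for m in self._log if m.seq > last_seen]


class Device:
    """One of a user's devices (phone, laptop) keeping a local copy."""

    def __init__(self) -> None:
        self.messages: List[Message] = []
        self.last_seen = 0

    def sync(self, server: ServerLog) -> None:
        # Catch up from the log instead of trusting whatever happened to arrive.
        for msg in server.since(self.last_seen):
            self.messages.append(msg)
            self.last_seen = msg.seq

    def receive_push(self, msg: Message, server: ServerLog) -> None:
        if msg.seq <= self.last_seen:
            return  # duplicate or late arrival we already have
        if msg.seq == self.last_seen + 1:
            self.messages.append(msg)
            self.last_seen = msg.seq
        else:
            # Gap: an earlier message never arrived, so ask the log.
            self.sync(server)


if __name__ == "__main__":
    server = ServerLog()
    server.post("Mom", "Dinner at 6?")
    second = server.post("Dad", "I'll bring dessert")
    third = server.post("Mom", "See you then")

    phone = Device()
    phone.receive_push(third, server)   # arrives first: gap, so the phone syncs
    phone.receive_push(second, server)  # arrives late: already have it
    print([m.text for m in phone.messages])

    laptop = Device()
    laptop.sync(server)  # a second device gets the same ordered history
    print(laptop.messages == phone.messages)  # True
```

The design choice worth noticing is that ordering comes from the server’s numbering, not from the order in which messages happen to reach a device, which is exactly what a carrier-to-carrier SMS handoff lacks.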
My point is that we have mostly moved on from unreliable SMS, but are the web-based services just as unreliable?
4. The fragility of technology – Have a backup
Continuing from the last point, technology is incredibly fragile. It is stunning how one small mistake can cause problems for millions of people across the globe. According to a Facebook employee, the outage was caused by a small configuration change made to a server that makes the facebook.com domain available to other sites (more on that above).
The point of this section: even the biggest services are fragile. We all experience small technology glitches every day, yet we tend to assume that large internet companies are immune to them. They are not. Web services experience issues all the time, but most of them are covered by redundancy. A special server called a load balancer can dynamically route traffic away from servers that are down or having issues. The largest companies run multiple data centers serving traffic 24/7, 365 days a year. Because of this level of redundancy, it appears that they have no issues. Sometimes an issue gets out of hand and causes a big problem, but most of the time we don’t notice at all.
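Here is a minimal sketch of the load-balancer behavior described above, assuming simple round-robin selection and a health flag per server. The server names are hypothetical, and real load balancers decide health with periodic checks and use more sophisticated strategies; this only shows the core idea of steering traffic around broken servers.

```python
from typing import Dict, List, Optional


class RoundRobinBalancer:
    """Toy load balancer: hands out servers in turn, skipping unhealthy ones."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.healthy: Dict[str, bool] = {name: True for name in servers}
        self._next = 0  # index of the server to try next

    def mark(self, server: str, healthy: bool) -> None:
        # Real balancers update this from periodic health checks.
        self.healthy[server] = healthy

    def pick(self) -> Optional[str]:
        # Try each server at most once, continuing from where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if self.healthy[server]:
                return server
        return None  # nothing healthy left: redundancy is exhausted


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    lb.mark("web-2", False)
    print([lb.pick() for _ in range(4)])  # ['web-1', 'web-3', 'web-1', 'web-3']
    for name in lb.servers:
        lb.mark(name, False)
    print(lb.pick())  # None
```

The last line is the interesting case: redundancy only helps while something healthy remains to send traffic to.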
In the case of Facebook, it looks like the part of their network that caused this issue was not redundant, or the problem got out of hand and overwhelmed any redundancy. But the bigger problem for Facebook was that their internal teams lost access to the very resources they needed to fix the issue. According to employees, their internal tools were all down because they were hosted on the facebook.com domain. Employees were unable to log in to communication tools such as Zoom to coordinate a fix. They were able to email each other, but could not receive emails from outside the company. Engineers also lost remote access to their servers.
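The lockout described above is a dependency problem: the tools needed for the fix depended, directly or indirectly, on the thing that broke. The sketch below walks a dependency map to list everything a single failure takes down. The map and every component name in it are hypothetical, made up for illustration, not Facebook’s actual internal systems.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map of internal tools to what they need in order to work.
# The names are illustrative only, not any company's real systems.
DEPENDS_ON: Dict[str, List[str]] = {
    "dns": [],
    "login": ["dns"],
    "chat": ["login"],
    "video-calls": ["login"],
    "remote-access": ["login", "dns"],
    "mail-server": [],
    "internal-email": ["mail-server"],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component that directly or indirectly needs `failed`."""
    # Invert the edges: component -> components that depend on it.
    dependents: Dict[str, List[str]] = {}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)
    # Breadth-first search outward from the failure.
    down: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in down:
                down.add(user)
                queue.append(user)
    return down


if __name__ == "__main__":
    down = affected_by("dns", DEPENDS_ON)
    print(sorted(down))  # ['chat', 'login', 'remote-access', 'video-calls']
    still_up = set(DEPENDS_ON) - down - {"dns"}
    print(sorted(still_up))  # ['internal-email', 'mail-server']
```

Running a check like this ahead of time is one way to spot that the recovery tools sit downstream of the component they would be used to repair, and to keep at least one path (here, the hypothetical internal email) that does not.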
My point here is that it is important to have a plan for when things go wrong, and to plan for the worst. I’m sure someone at Facebook had thought about this exact incident happening, yet the company didn’t seem to be ready. It’s impossible to plan for every situation, but when a large percentage of the world relies on your services, it’s important to be ready for the worst case, even one you never think will happen.
5. What’s next after social media?
The time away from social media gave me an opportunity to think about the future of social media. Have social media networks peaked? What comes next?
To answer the question quickly, I don’t think anyone knows what comes next. I can’t see Facebook or any social media platform going away any time soon.
In a previous post on this blog, I remarked that email newsletters were making a comeback, and I attempted to write one of my own. At the time, I didn’t understand why they were popular again. I have since learned that one theory behind their rise is algorithms. The content you see on any social media platform is governed by an algorithm that learns what content you like or don’t like, then serves you relevant content based on your history. Algorithms are a controversial topic at the center of the broader debate about social media, and I won’t enter that debate here (ask me about it in person).
I will say this: social media without an algorithm would be incredibly, mind-numbingly boring. Imagine social media as a plain list of content. Every time you visit the social network, you see the same content. No discovery. No “Suggested for you” posts. However creepily accurate these suggestions may be, they certainly enhance the experience. I’ve discovered several things through them, including news websites I have learned a lot from.
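To illustrate the difference between a plain list and an algorithmic feed, here is a deliberately tiny sketch. It is not how any real platform ranks content; the `Post` fields and the scoring rule (count how often the user liked a topic before, break ties by recency) are assumptions chosen only to show how a user’s history can reorder the same set of posts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Post:
    id: int
    topic: str
    timestamp: int  # larger means newer


def chronological(posts: List[Post]) -> List[Post]:
    # No algorithm: newest first, the same order for every user.
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)


def ranked(posts: List[Post], liked_topics: List[str]) -> List[Post]:
    # Toy "algorithm": score each post by how often this user liked its
    # topic before, breaking ties by recency.
    affinity = Counter(liked_topics)
    return sorted(posts, key=lambda p: (affinity[p.topic], p.timestamp), reverse=True)


if __name__ == "__main__":
    posts = [Post(1, "sports", 1), Post(2, "news", 2), Post(3, "cats", 3)]
    history = ["news", "news", "cats"]
    print([p.id for p in chronological(posts)])    # [3, 2, 1]
    print([p.id for p in ranked(posts, history)])  # [2, 3, 1]
```

With no history, `ranked` falls back to the chronological order, which matches the point above: without the learned signal, there is nothing to personalize.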
Back to email newsletters (sorry for the aside…): they don’t use algorithms. To receive an email newsletter, you must deliberately sign up for it. Newsletters aren’t suggested to you unless you find out about them through social media platforms. This means you control what content you see.
Personally, I don’t see email newsletters taking social media’s throne, but I do see them complementing it quite nicely.
NOTE: I am not discussing the recent controversy surrounding Facebook, including the 60 Minutes segment. I aim to focus only on the technical repercussions of the situation.
Resources / Further Reading:
- Locked out and totally down: Facebook’s scramble to fix a massive outage – The Verge
- What is BGP, and how might it have helped kick Facebook off the internet? – The Verge
- Twitter, Is It Down, and tons of other sites were struggling due to the Facebook outage – The Verge
- Losing Facebook is bad, but losing WhatsApp is worse – The Verge
- Facebook is coming back after a six-hour outage – The Verge
- Twitter is up, Facebook is down, and Jack Dorsey is laughing – The Verge
- What took Facebook down – ZDNet
- Facebook Down – Google News search (further reading)
Thanks for reading this far!
I appreciate you reading this far into the article! If you liked it, let me know by leaving a comment below.
If you want more posts from me, consider signing up for my email list to get an email when I have something new.
If you’d like to get in contact with me, please do so by visiting https://contact.themattdgreen.com/