Without going into too much detail, a Websocket is a method of network communication used for real-time applications. We use Websockets for real-time communication and interaction, and these are the modules that use the Websocket service:
We have two Websocket providers: Google Firebase (Realtime Database) and our own implementation (Native Websockets). Today a large number of users connected at the same time, which caused the Websocket servers to halt. Google Firebase couldn’t scale fast enough, and the Native Websockets couldn’t handle the load either. As a result, users saw the “Connecting” popup appear and never disappear.
We also had a major outage of our caching server (Redis) that took the entire platform and backend offline. The Redis server became overloaded and couldn’t keep up with the server scaling and load, which resulted in an overall failure of the platform. The landing page and login page remained operational.
For Native Websockets, we have implemented manual scaling for now, and we will work on an autoscaling mechanism to support large loads in the future. If the connection fails, you will still be able to use the Virtual Lobby normally, with limited interaction.
For Google Firebase, we couldn’t implement a fix. We will try sharding the entire operation into multiple microservices for different modules (Chat, Q&A, etc.), but since Firebase doesn’t support replicas, it will be hard to scale for large events. If your event has more than 5,000 users, it’s better to use Native Websockets. If the connection fails, you will still be able to use the Virtual Lobby normally, with limited interaction.
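As a rough illustration of the guidance above, the sketch below picks a provider from the expected attendance and derives a separate shard name per module. Only the 5,000-user threshold comes from this report; the function names, the shard naming scheme, and the module list are hypothetical and do not describe our actual implementation.

```python
# Illustrative sketch only: names and the shard naming scheme are assumptions.
# The 5,000-user threshold is the one value taken from the report.

NATIVE_THRESHOLD = 5_000

# Only the modules named in the report; other modules would be added here.
MODULES = ("chat", "qa")


def choose_provider(expected_users: int) -> str:
    """Suggest a realtime provider based on expected attendance."""
    # "More than 5,000 users" means strictly greater than the threshold.
    return "native" if expected_users > NATIVE_THRESHOLD else "firebase"


def shard_for_module(event_id: str, module: str) -> str:
    """Return a per-module shard name so each module uses its own instance."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    return f"{event_id}-{module}"
```

With this split, heavy chat traffic would land on a different instance than Q&A traffic, which is the intent of sharding by module.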
For the caching server (Redis), we are still implementing a fix, but we have deployed a temporary workload that should replace the caching server for now and keep the platform and the backend stable. This is an internal fix and shouldn’t affect the user experience.
The platform backend and all its modules should be operational. If Native Websockets or Google Firebase fails, you will still be able to access the platform and the Virtual Lobby, but users will have a limited experience without real-time interactions: chat, Q&A and the other modules listed above will not be operational.
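The fallback described above can be sketched as a bounded retry: try to connect a fixed number of times with increasing delays, then switch to a limited mode instead of waiting indefinitely. This is a minimal sketch under assumptions; `connect_or_degrade`, the `connect` callable, the attempt count, and the delays are all hypothetical and not taken from our client code.

```python
import time
from typing import Callable


def connect_or_degrade(
    connect: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try to open the realtime connection; fall back to limited mode.

    `connect` is a hypothetical callable that returns True on success and
    may raise OSError (which includes ConnectionError) on failure.
    """
    for attempt in range(max_attempts):
        try:
            if connect():
                return "realtime"
        except OSError:
            pass  # treat a refused or dropped connection as a failed attempt
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    # Give up on realtime features but keep the rest of the platform usable.
    return "limited"
```

A client that receives "limited" would hide the connection popup and disable chat and Q&A while leaving the Virtual Lobby usable.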
We are constantly working on improvements, and we will announce when both real-time Websocket providers are fully functional for large events. Meanwhile, we can guarantee that the backend and the Virtual Lobby will stay online, even if the real-time experience is limited.