We want to bring to your attention recent latency issues affecting labeler productivity on the Labelbox platform. This postmortem explains what happened, the solutions in development, and rough timelines for their rollout.
Over the past week, we observed periods in which the queueing system experienced degraded performance, affecting labeling users’ ability to reserve and label assets. The queueing system that serves asset reservations is a distributed system that uses data row priorities and consensus settings to build queues for projects on demand. The process used to incrementally generate the queue is generally fast, typically on the order of milliseconds, but can slow down when projects are large or configurations are complex.
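To make the on-demand model concrete, here is a minimal sketch of how a page of the labeling queue might be generated from priorities and a consensus setting. The names (`DataRow`, `build_queue_page`) and the consensus rule are illustrative assumptions, not Labelbox's actual implementation:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class DataRow:
    priority: int                                  # lower value = labeled sooner
    row_id: str = field(compare=False)
    labels_done: int = field(compare=False, default=0)

def build_queue_page(rows, consensus_target, page_size):
    """Incrementally generate one page of the labeling queue.

    Rows that already have enough labels (consensus reached) are skipped;
    the remainder are returned in priority order. Note this scans all rows
    on every call, which is why large projects slow the queue down.
    """
    eligible = [r for r in rows if r.labels_done < consensus_target]
    return heapq.nsmallest(page_size, eligible)
```

The key point is that the work per page grows with the size of the project, since every request filters and ranks the project's rows from scratch.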
The Labelbox team actively monitored the situation and made operational adjustments to minimize impact to customers.
While the situation has improved over the week, we recognize that this has been frustrating for our customers. We are therefore taking immediate action to rectify the queueing performance issue while implementing a sound architectural solution in the near term.
We are implementing the following short-term actions to alleviate labeling queue latency. These actions will be completed within two weeks.
While the steps mentioned above address immediate latency issues, they do not help the platform truly scale to support projects that are orders of magnitude larger.
Over the next month or so, the engineering team will be implementing a data processing pipeline to pre-compute an intermediate representation of the project’s data rows. This materialized view will be structured and maintained to support computationally efficient queries that generate pages of data rows for populating the labeling queue.
The ETL (extract, transform, and load) process depicted in the future architecture diagram continuously updates the materialized view as changes to Projects, Data Rows, Labels, and other entities are processed. Optimized indexes on this view will allow the queue generator to quickly fetch data rows to label whenever needed. This architecture effectively decouples the amount of processing required to build a labeling queue from the size of a project, giving the platform the ability to support labeling projects at an even greater scale. You can think of it as a pre-computed “query cache”.
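The decoupling idea can be sketched with a small example. Below, SQLite stands in for the real datastore, and the table, index, and function names are illustrative assumptions rather than Labelbox's actual schema. The ETL step keeps the view current as label events arrive, and the queue generator reads pages straight off an index:

```python
import sqlite3

# In-memory stand-in for the materialized view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE queue_view (
    project_id       TEXT,
    data_row_id      TEXT,
    priority         INTEGER,
    labels_remaining INTEGER
);
-- Composite index so fetching the next page is an index scan whose cost
-- depends on the page size, not on the total number of data rows.
CREATE INDEX idx_queue ON queue_view (project_id, labels_remaining, priority);
""")

def on_label_created(project_id, data_row_id):
    """ETL step: incrementally update the view as label events are processed."""
    conn.execute(
        "UPDATE queue_view SET labels_remaining = labels_remaining - 1 "
        "WHERE project_id = ? AND data_row_id = ?",
        (project_id, data_row_id),
    )

def next_page(project_id, page_size):
    """Queue generator: fetch the next page directly from the view."""
    return conn.execute(
        "SELECT data_row_id FROM queue_view "
        "WHERE project_id = ? AND labels_remaining > 0 "
        "ORDER BY priority LIMIT ?",
        (project_id, page_size),
    ).fetchall()
```

Because the expensive filtering and ranking is paid incrementally at write time, the read path (queue generation) stays fast regardless of project size.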
Labelbox is continuously improving platform scalability across the entire product suite, and this architectural improvement specifically addresses the current queueing latency issues. Thank you for your patience as we continue to scale the platform. We’re committed to making Labelbox the most reliable and responsive platform it can be, and we’re happy to support you in any way we can.