The evolution of technology is like a slow dance in which most steps are in place, but a few move forward. Motivated by demands for greater scale and efficiency, applications push the limits of technology and thereby contribute to its advance. This paper presents an example of such an application.
Shasta is a Google system for interactive reporting on critical business data. Operating on diverse, large-scale, distributed data, it was developed to satisfy requirements for:
- complex computations that translate large, complex queries into operations on data store schemas,
- low-latency queries that capture recent data store updates, and
- efficient system management of query views.
To satisfy these requirements, Shasta combines new language and system techniques in a four-level architecture stack:
- (1) a Relational View Language (RVL) compiler that translates parameterized user query views into SQL and automatically aggregates query results;
- (2) the F1 engine (F1 is Google's relational database management system [RDBMS]), which generates an execution plan for the compiled SQL;
- (3) F1 servers and user-defined function (UDF) servers that execute the plan on a central server or on distributed servers; and
- (4) distributed, diverse data stores that use a novel caching scheme to balance read optimization against write optimization.
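To make the first level of the stack more concrete, the following Python sketch shows one way a parameterized view could be compiled to SQL with automatic aggregation. Everything in it is hypothetical: `ViewTemplate`, `compile_view_query`, the `ad_stats` table, and the split of view columns into group-by keys and aggregated values are illustrative assumptions, not RVL syntax and not Shasta's compiler. The idea is that a user only picks columns and binds view parameters; the compiler adds the filters, the `GROUP BY`, and the aggregate functions.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ViewTemplate:
    """Hypothetical parameterized view over one source table (not RVL syntax)."""
    source_table: str
    keys: list[str]                 # columns that results can be grouped on
    values: dict[str, str]          # value column -> SQL aggregate, e.g. "SUM"
    params: list[str] = field(default_factory=list)  # keys that must be bound per query


def compile_view_query(view: ViewTemplate, columns: list[str],
                       bindings: dict[str, object]) -> tuple[str, list[object]]:
    """Compile a projection over `view` into SQL, aggregating value columns implicitly.

    Selected key columns become GROUP BY columns; selected value columns are wrapped
    in their declared aggregate. Keys are emitted before values in the SELECT list.
    """
    if not columns:
        raise ValueError("select at least one column")
    missing = [p for p in view.params if p not in bindings]
    if missing:
        raise ValueError(f"unbound view parameters: {missing}")
    unknown = [c for c in columns if c not in view.keys and c not in view.values]
    if unknown:
        raise ValueError(f"unknown view columns: {unknown}")

    group_keys = [c for c in columns if c in view.keys]
    aggregates = [f"{view.values[c]}({c}) AS {c}" for c in columns if c in view.values]
    sql = f"SELECT {', '.join(group_keys + aggregates)} FROM {view.source_table}"

    # Parameters become equality filters with placeholders, never interpolated values.
    args = [bindings[p] for p in view.params]
    if view.params:
        sql += " WHERE " + " AND ".join(f"{p} = ?" for p in view.params)
    if group_keys:
        sql += " GROUP BY " + ", ".join(group_keys)
    return sql, args


if __name__ == "__main__":
    ad_stats = ViewTemplate(
        source_table="ad_stats",
        keys=["customer_id", "campaign_id", "day"],
        values={"clicks": "SUM", "cost": "SUM"},
        params=["customer_id"],
    )
    sql, args = compile_view_query(ad_stats, ["campaign_id", "clicks", "cost"],
                                   {"customer_id": 42})
    print(sql)

    # Tiny in-memory table with made-up rows, only to show that the SQL runs.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ad_stats (customer_id INTEGER, campaign_id INTEGER, "
                 "day INTEGER, clicks INTEGER, cost REAL)")
    conn.executemany("INSERT INTO ad_stats VALUES (?, ?, ?, ?, ?)", [
        (42, 1, 1, 10, 2.5),
        (42, 1, 2, 5, 1.0),
        (42, 2, 1, 7, 3.0),
        (7, 1, 1, 99, 9.9),   # another customer: removed by the parameter filter
    ])
    print(conn.execute(sql + " ORDER BY campaign_id", args).fetchall())
```

Run as a script, it prints `SELECT campaign_id, SUM(clicks) AS clicks, SUM(cost) AS cost FROM ad_stats WHERE customer_id = ? GROUP BY campaign_id` followed by `[(1, 15, 3.5), (2, 7, 3.0)]`. Dropping `day` from the projection is what triggers aggregation across days. A production view compiler would also have to handle joins across data stores, UDF calls, and non-decomposable aggregates, none of which this sketch attempts.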
Shasta provides several benefits over the legacy C++ system it replaced. Views expressed in RVL are more understandable to business users than the underlying schemas and, through view templates, easier to query. Furthermore, because view definitions are encapsulated in RVL and separated from query processing, Shasta is significantly easier to develop and maintain than the legacy system. With better support for query planning and for distributed execution of query plans, Shasta improves performance by a factor of two to seven for medium and large queries. As for scalability, query latency grows sublinearly with input data size, owing to distributed query processing and to the data characteristics of Shasta applications: query complexity is largely constant across input sizes, and query input size “tends to be determined by view parameters”.
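Distributed query processing is named above as one source of the sublinear latency growth. A common way to distribute an aggregation is scatter-gather: each worker computes a partial aggregate over its own data partition, and a coordinator merges the partial results. The Python sketch below illustrates that generic technique; it is not Shasta's execution layer. Threads stand in for the (hypothetical) worker servers, and `partial_aggregate` and `distributed_sum` are illustrative names. Merging partial results this way is only correct for decomposable aggregates such as `SUM` and `COUNT`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

Row = tuple[str, int]  # (group key, value)


def partial_aggregate(partition: list[Row]) -> Counter:
    """Work done by one hypothetical worker: SUM of values per key over its partition."""
    acc: Counter = Counter()
    for key, value in partition:
        acc[key] += value
    return acc


def distributed_sum(partitions: list[list[Row]], max_workers: int = 4) -> dict[str, int]:
    """Scatter partial aggregation over workers, then merge the partials on a coordinator."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(merged)


if __name__ == "__main__":
    partitions = [[("a", 1), ("b", 2)], [("a", 3)], [("c", 5), ("b", 1)]]
    print(distributed_sum(partitions))
    # A single central server scanning all rows computes the same totals.
    print(dict(partial_aggregate([row for part in partitions for row in part])))
```

For the sample partitions, both lines print `{'a': 4, 'b': 3, 'c': 5}`. Because each worker scans only its own partition, adding workers as the data grows keeps per-worker work roughly bounded, which is one reason distributed execution can make latency grow more slowly than input size.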
The audience for this paper includes readers interested in applying integrated language and system technologies to improve the usability, performance, and scalability of data-rich, Internet-distributed interactive applications. Shasta is an example of an application that pushes the limits of technology and moves its evolutionary dance forward.