Computing Reviews
Shasta: interactive reporting at scale
Manoharan G., Ellner S., Schnaitter K., Chegu S., Estrella-Balderrama A., Gudmundson S., Gupta A., Handy B., Samwel B., Whipkey C., Aharkava L., Apte H., Gangahar N., Xu J., Venkataraman S., Agrawal D., Ullman J. SIGMOD 2016 (Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, Jun 26-Jul 1, 2016), 1393-1404. 2016. Type: Proceedings
Date Reviewed: Nov 30 2016

The evolution of technology is like a slow dance in which most steps are in place, but a few move forward. Motivated by increases of scale and efficiency, applications push the limits of technology and contribute to its advance. This paper presents an example of such an application.

Shasta is a system for interactive reporting of critical business data at Google. Using diverse, large-scale, distributed data, it was developed to satisfy requirements for:

  • complex computations to transform large, complex queries to data store schemas,
  • low-latency queries that capture recent data store updates, and
  • efficient system management of query views.

To satisfy these requirements, Shasta combines new language and system techniques in a four-level architecture stack:

(1) Relational view language (RVL) compiler to translate parameterized user query views to SQL and to automatically aggregate query results;

(2) F1 engine (F1 is Google's relational database management system, or RDBMS) that produces an execution plan for the generated SQL;

(3) F1 servers and user-defined function (UDF) servers to execute the plan on a central server or distributed servers; and

(4) Distributed, diverse data stores that balance read versus write optimization using a novel caching scheme.
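
The first level of this stack can be illustrated with a minimal sketch. It is not RVL and does not reflect the paper's actual syntax or compiler; it only shows the general idea of expanding a parameterized view into SQL and adding the aggregation automatically. The names ViewTemplate, compile_view, ad_stats, and customer_id are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ViewTemplate:
    """Hypothetical stand-in for a parameterized view (not real RVL syntax)."""
    table: str
    dimensions: Tuple[str, ...]   # columns a query may group by
    measures: Dict[str, str]      # measure column -> aggregate function
    parameters: Tuple[str, ...]   # columns filtered by a bound parameter


def compile_view(view: ViewTemplate, group_by: List[str], measures: List[str]) -> str:
    """Expand the template into one SQL string, adding aggregation implicitly."""
    unknown = [c for c in group_by if c not in view.dimensions]
    unknown += [m for m in measures if m not in view.measures]
    if unknown:
        raise ValueError(f"not defined in view: {unknown}")

    # Requested dimensions are selected as-is; each requested measure is
    # wrapped in the aggregate function the view declares for it.
    select = list(group_by)
    select += [f"{view.measures[m]}({m}) AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {view.table}"
    if view.parameters:
        sql += " WHERE " + " AND ".join(f"{p} = :{p}" for p in view.parameters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


# Assumed example view; the table and column names are invented.
AD_STATS = ViewTemplate(
    table="ad_stats",
    dimensions=("campaign", "day"),
    measures={"clicks": "SUM", "cost": "SUM"},
    parameters=("customer_id",),
)

example_sql = compile_view(AD_STATS, ["campaign"], ["clicks"])
print(example_sql)
```

In this sketch the caller names only the dimensions and measures of interest; the aggregate functions and the GROUP BY clause come from the view definition, which is the sense in which aggregation is automatic. In the system the review describes, the resulting SQL would then be handed to the F1 engine for planning.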

Shasta provides several benefits over the legacy C++ system it replaced. Views expressed in RVL are more understandable to business users and, using view templates, easier to query than the underlying schemas. Furthermore, by encapsulating view definition in RVL and separating it from query processing, software engineering management of Shasta is significantly improved over that of the legacy system. By providing more support for query planning and distributed execution of query plans, Shasta increases performance two to seven times for medium and large queries. With respect to scalability, query latency grows sublinearly as input data increases. This results from distributed query processing and from the data characteristics of Shasta applications: query complexity is largely constant across input sizes, and query input size “tends to be determined by view parameters.”

The audience for this paper includes those interested in the application of integrated language and system technologies to improve the usability, performance, and scalability of data-rich Internet-distributed interactive applications. Shasta is an example of an application that pushes the limits of technology and contributes to its evolutionary dance forward.

Reviewer: J. M. Perry
Review #: CR144953 (1702-0151)
Query Processing (H.2.4 ... )
Marketing (J.1 ... )
Relational Databases (H.2.4 ... )
Software Architectures (D.2.11 )
Database Management (H.2 )