Facebook Combines OSquery Tool With RocksDB
Facebook reported progress this week toward integrating its open source osquery tool with RocksDB, the open source embedded database it uses to store data and speed up access to it.
Facebook open-sourced osquery late last year. The social media giant describes the project as an operating system instrumentation framework for OS X and Linux. A key component of the osquery tools is a daemon called osqueryd, which runs on the host operating system and links to low-level APIs to extract metrics on OS settings and performance. The results end up in a key-value store so they can be analyzed as a time series.
Osquery tools also are intended to allow system administrators to use normal SQL statements to query tables built by osquery, providing a query language that spans multiple and incompatible operating systems. Facebook engineers explained that osquery tools "expose an operating system as a high-performance relational database" in order to explore operating system data.
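As a rough illustration of what querying operating system data with ordinary SQL looks like, the sketch below uses Python's built-in sqlite3 module as a stand-in. It does not call osquery itself: the "processes" table, its columns and its rows are hypothetical sample data, chosen only to show the style of query an administrator might write.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table that stands in for an OS-backed table.

    The table name, columns, and rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processes (pid INTEGER, name TEXT, resident_size INTEGER)"
    )
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?, ?)",
        [(1, "init", 4096), (212, "sshd", 8192), (977, "osqueryd", 16384)],
    )
    return conn


def top_processes_by_memory(conn, limit):
    """Return (pid, name) for the processes using the most memory."""
    cur = conn.execute(
        "SELECT pid, name FROM processes ORDER BY resident_size DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(top_processes_by_memory(build_demo_db(), 2))
```

The point of the sketch is that the same SELECT statement would work unchanged regardless of which operating system populated the table.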
RocksDB, a variant of Google's LevelDB key-value store, is described in a Facebook blog post published on Thursday (April 30) as an "embeddable persistent key-value store for local storage." Facebook engineers said RocksDB is "the foundation for a more traditional, client-server database, but its primary focus is its use as an embeddable storage system."
Among its attributes is the ability to leverage fast storage, Facebook said. Hence, osquery uses RocksDB in its operations as an embedded datastore.
In the blog post, Facebook engineers provided a use-case example focusing on how the osquery daemon is used to schedule queries executed across hyper-scale infrastructure. The daemon aggregates query results over time, they explained. It then generates logs on infrastructure state changes. If a resulting dataset appears worth pursuing, a query can be "scheduled" for, say, every 60 seconds.
When the daemon executes a query, it checks to see if previous results of that query are already stored in RocksDB. "If there is no data, osqueryd will store the results and emit all the rows as having been 'added'," Facebook said. "If previous results already exist in the database, osqueryd will compare the two datasets and emit differential results."
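The compare-and-emit behavior described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not osqueryd's actual implementation: a plain dictionary stands in for RocksDB, rows are modeled as tuples, and the function name and the "added"/"removed" result layout are hypothetical.

```python
def run_scheduled_query(store, query_name, rows):
    """Compare new rows against stored results and return the difference.

    `store` is a dict standing in for the embedded key-value store
    (an assumption for illustration; the real daemon uses RocksDB).
    """
    current = set(rows)
    previous = store.get(query_name)

    if previous is None:
        # No earlier results: every row is reported as added.
        diff = {"added": sorted(current), "removed": []}
    else:
        # Earlier results exist: report only what changed.
        diff = {
            "added": sorted(current - previous),
            "removed": sorted(previous - current),
        }

    # Persist the latest results for the next scheduled run.
    store[query_name] = current
    return diff


if __name__ == "__main__":
    store = {}
    print(run_scheduled_query(store, "usb", [("keyboard",), ("mouse",)]))
    print(run_scheduled_query(store, "usb", [("keyboard",), ("webcam",)]))
```

On the first run every row is reported as added; on later runs only the rows that appeared or disappeared since the previous run are reported.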
In some cases, the engineers noted, no query results will be generated. Facebook said its solution to that problem is an "event-based monitoring" system, noting that there are at least two reasons why administrators need to react to operating system events: "First, if data changes between our query schedule and then changes back, we won't be able to monitor that change effectively. Secondly, repeatedly polling the same resource, which doesn't change often, is inefficient," they said. "We can do better."
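The first of those two reasons can be shown with a toy timeline. In the sketch below, which is an illustration only and not osquery code, a hypothetical device is present for a single tick between two polls. The function names and the "absent"/"present" states are assumptions made for the example.

```python
def poll(states, interval):
    """Sample the state every `interval` ticks and report changes
    seen between consecutive samples."""
    changes = []
    last = states[0]
    for t in range(interval, len(states), interval):
        if states[t] != last:
            changes.append((t, last, states[t]))
            last = states[t]
    return changes


def events(states):
    """Report every tick-to-tick transition, as an event source would."""
    return [
        (t, states[t - 1], states[t])
        for t in range(1, len(states))
        if states[t] != states[t - 1]
    ]


if __name__ == "__main__":
    # The device appears at tick 2 and is gone again by tick 3.
    timeline = ["absent", "absent", "present", "absent",
                "absent", "absent", "absent"]
    print(poll(timeline, 3))   # samples at ticks 3 and 6 see no change
    print(events(timeline))    # both transitions are captured
```

Polling every three ticks reports nothing because the state has already changed back by the time it is sampled, while the event-based view records both the appearance and the removal.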
Ultimately, the daemon can be used to schedule a query on a "hardware_events" table in RocksDB. The results, the Facebook team said, are accurate logs indicating the addition, removal or change of hardware on host infrastructure.
Facebook released its most recent version of osquery, version 1.4.4, in March.
George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).