SQL Specification
Query Structure
In general, a Burroughs query takes the following form:
[with <with_list>]
select <field_list>
from <from_clause>
[where <condition>]
group by <field_list>
[having <condition>]
[limit <integer>]
The with_list consists of an arbitrary number of comma-separated named subqueries.
A subquery is a second query (with out a with list) that is enclosed in parentheses and can be named with the as
keyword.
The field_list is a comma-separated list of fields. Fields can be renamed using the as
keword, wrapped in an arbitrary number of function calls, and identified using the table_alias.field_name
syntax. In the case of the group by
clause, the field list can contain integers which reference the position (starting with position 1) of an entry in the select list.
The from_clause can be either a topic name, common table expression name, or an arbitrary number of joins, where each join operates on any combination of topics, subqueries, or common table expressions. The allowed joins are inner
, left outer
which can be abbreviated to left
, and full outer
which can be abbreviated as full
. The join condition must be an equality comparison. Any object referenced in the from clause can be aliased using the as
keyword.
Limitations
Group By
First of all, you may have noticed that the group by clause is not optional: each outer-level query must have a group by
. Subqueries on the other hand, may not have a group by clause, although common table expressions can. When a group by is included within a common table expression, the select list may not include the aggregate functions min
or max
and the group by can only contain a single field.
Order By
Because Burroughs is a stream processing application and sorting is inherently a blocking operation, Burroughs does not support any kind of order by
. This also means that any limit clause applied to Burroughs will be non-deterministic.
Data Types
Since Burroughs requires all of its data items to be serialized using Avro, the range of data types supported is much smaller than the typical relational database. The full list is as follows:
- INTEGER (32-bit integer)
- BIGINT (64-bit integer)
- BOOLEAN
- DOUBLE
- STRING
KsqlDB also supports the compound data structures array, struct, and map, but Burroughs does not support these because they don't have obvious equivalents in standard SQL.
Dates
More likely than not, you are going to want to work with dates in Burroughs. A date can be encoded as a BIGINT (days since the epoch) and Burroughs provides convenient ways to work with these integer dates and display them. See the examples section for the details on how to do this
Supported Functions
Aggregate Functions
Burroughs supports the following SQL aggregate functions:
count
sum
avg
min
max
It also supports the following KsqlDB aggregate functions
earliest_by_offset
- returns the value in the column with the smallest offsetlatest_by_offset
- return the value in the column with the largest offest
Finally, Burroughs supports one additional aggregate function:
group_concat([distinct] field [, seperator])
which will return all of the values or distinct values of a given field concatenated together with the given separator.
Scalar Functions
In general, Burroughs supports all of the standard KsqlDB scalar functions which don't operate on an unsupported type. Burroughs also extends the cast
function to work with dates.