Worksheets Technical specs¶
Scope:
- This document lays out challenges and potential solutions to worksheets from a technical perspective.
- This document does not concern itself with any UX issues.
Status:
- This document is in early draft state. No decisions have been made yet!
Book List Example¶
We use the following example in a number of places throughout the document:
-
Query:
This is the SQL that is generated by the query section of the worksheet. Exactly how we got this SQL doesn’t really matter here. It could have come from a UI-based query builder or could have been typed out manually by the user.
with genres as ( select book_id, array_agg(genres.id) as genres from books_genres join genres on books_genres.genre_id = genres.id group by book_id ), copies as ( select book_id, count(*) as copies from items group by book_id ) select books.id as book_id, books.title, books.author_id as author, authors.url as author_url, genres.genres, coalesce(copies.copies, 0) as copies from books left join authors on authors.id = book.author_id left join genres on books.id = genres.book_id left join copies on books.id = copies.book_id;
Side note
In Querydown this would be:
With a the worksheets having a modular approach to queries and displays, my hope is that a Mathesar extension could eventually provide a Querydown-based query editor. This would be a really fast way to build powerful worksheets!
-
Raw query results:
This is the raw data returned by Postgres.
book_id title author author_url genres copies 921
'Foo'
5
NULL
{101,102}
10
32
'Bar'
6
'http://example.com'
{}
0
1007
'Apple'
6
'http://example.com'
{103}
2
-
Sheet display:
Here’s (roughly) how the sheet would appear to the user.
book_id title author author_url genres copies 921 Foo Alice Davis
NULL
Adventure
Fantasy
10 32 Bar Bob Scott
http://example.com 0 1007 Apple Bob Scott
http://example.com Non-fiction
2
OIDs vs names¶
🤔 Challenges:
- Do worksheets reference PostgreSQL tables and columns using names or OIDs?
- If we use OIDs, how do we allow the user to edit the SQL using names?
TODO
Query safeguards¶
🤔 Challenges:
- Do we want to prevent users from manually executing DML and DDL in their queries?
- If so, how?
- Are temporary tables okay?
TODO
Record summaries¶
🤔 Challenges:
- Where is the record summary template stored?
- How does the worksheet fetch the record summary data? (We need to handle direct FK columns as well as array aggregates.)
Display settings¶
-
Two types of columns can be configured to display record summaries:
- Scalar FKs: columns with source data having a single-column FK.
- Aggregate PKs: array aggregates of single-column PK values.
-
The record summary can be turned off in the display settings too. When it’s off, the raw PK values will display.
-
By default: record summaries are enabled wherever possible.
In the Book List Example, this is how we get record summaries for
author
andgenres
by default, without getting record summaries forid
. -
Limitations: Record summaries cannot be configured for
- Multi-column FKs (because they point to more than one PK column).
- FK columns that point to non-PK columns.
Record summary templates¶
-
The record summary template is stored in the worksheet’s display settings, and is associated with a column in the worksheet.
ⓘ Note:
Before worksheets, Mathesar associated record summaries with each table so that all FKs which referenced the same table would automatically use the same record summary. With worksheets, it is not (yet?) possible to configure a default per-table record summary template to always be used for references to that table. This behavior simplifies some things, but also has the following consequences:
- In some cases, there might be some more tedium associated with creating new worksheets because you can’t easily reuse a record summary template that you created elsewhere.
- The Record Page can’t show a record summary for the record.
I think these are acceptable tradeoffs though. The benefit of the worksheet approach is that different worksheets can have different record summary templates to refer to the same table.
-
Each record summary template has one and only one “source table” — the table which contains the PK column on which the record summary is operating.
-
The sheet can display record summaries even when no template is explicitly configured by the user. In this case it will be smart enough to auto-generate a template that works well in most cases.
- The auto-generated template is not persisted — it is computed on the fly. This allows us to update the smart logic used to auto-generate these templates.
-
The smart logic works as follows:
TODO
-
In the Book List Example:
-
By default the “author” column shows a record summary and uses smart logic to formulate the following template on the fly:
(The source table for this record summary is
authors
, so column references pertain to columns in that table.)The worksheet is then customized to store this improved template, which is persisted within the worksheet:
-
The “genres” column also shows record summaries by default. With no template being explicitly configured, the sheet is smart enough to use the following template:
The source table for this record summary is
books_genres
which does not have aname
column. But the template is able to use dot notation to refer to a field in the directly relatedgenres
table.
-
-
The precise syntax of the record summary template still needs a bit more specification with regard to quoting and escaping.
Fetching record summary data¶
Here’s how we fetch the record summary data for the Book List Example.
-
When the search query runs, the service layer stores the results in a temporary table.
-
Before closing the transaction, the service layer issues additional queries using this temporary table. For the sake of this example, we’ll call this temporary table
results
. -
The additional queries look like this:
-
This query fetches the data necessary to render all record summaries within the “authors” column.
select authors.id, authors.first_name, authors.last_name from results join authors on authors.id = results.author group by authors.id
The
first_name
andlast_name
columns are included because they are present in the record summary template. -
This query fetches the data necessary to render all the record summaries within the “genres” column
with ids as (select distinct unnest(genres) as ids from results) select books_genres.id, genres.name from ids join books_genres on books_genres.id = ids.id left join genres on genres.id = books_genres.genre_id
The
genres.name
column is included because it’s present in the record summary template, and we left-join thegenres
table into the query to access that data.
-
DML¶
🤔 Challenges:
- How do we do DML from within a worksheet?
Metadata¶
-
Most of the time, Mathesar is able to use static analysis of SQL to determine metadata about the columns in a query’s result set. This metadata gives us a clear picture of where the columns came from and what their types are. For some queries, Mathesar won’t be smart enough to trace the origin of every column.
-
In the Book List Example, we’d get this metadata:
{ "book_id": { "classification": "pk_cell", "table": "books", "column": "id" }, "title": { "classification": "data_cell", "table": "books", "column": "title", "pk_lookup_reference": [ "book_id" ] }, "author": { "classification": "data_cell", "table": "books", "column": "author_id", "pk_lookup_reference": [ "book_id" ] }, "author_url": { "classification": "data_cell", "table": "author", "column": "url", "pk_lookup_reference": [ "author" ] }, "genres": { "classification": "pk_array", "table": "books_genres", "conditions": [ { "column": "book_id", "equals": { "query_column": "book_id" } } ] }, "copies": { "classification": "aggregate_count", "table": "items", "conditions": [ { "column": "book_id", "equals": { "query_column": "book_id" } } ] } }
-
The column metadata gives the worksheet enough information to formulate
INSERT
,UPDATE
, andDELETE
queries in most cases. -
The primary key column(s) from the origin table must be present in the result set for the worksheet to be able to perform cell updates.
Update examples¶
In the Book List Example, we want to make some changes to the first row…
-
In the title column, we can change “Foo” to “FOO”. This is an
UPDATE
query on theauthors
table. -
In the author and author_url, we can make the same kinds of edits.
-
In the genres column, we can remove “Fantasy”. This is a
DELETE
query on thebooks_genres
table. -
In the genres column, we can add “Sci-fi”. This is a
INSERT
query on thebooks_genres
table. It uses thegenres
PK value from the “Sci-fi” genre the user selected via the record selector. -
In the genres column, we can change “Non-fiction” to “Nonfiction”. This is an
UPDATE
query on thegenres
table.There’s still a lot of UX to figure out here. Here’s one UX idea:
- The user will be given the option to edit any fields referenced within the record summary. In this case the record summary only references one field. But theoretically it could reference multiple. And the record summary could also include static text in the template too. When the user edits, they don’t edit the text exactly as it’s shown in the record summary — they edit the text from a specific field, one field at a time. So then the worksheet gets a new value for
genres.name
. The worksheet submits a request to updategenres.name
. The worksheet doesn’t know thegenres
PK value, but it does know thebooks_genres
PK value and that should be enough to tell the backend to update a specificgenres
record, changing thename
value.
- The user will be given the option to edit any fields referenced within the record summary. In this case the record summary only references one field. But theoretically it could reference multiple. And the record summary could also include static text in the template too. When the user edits, they don’t edit the text exactly as it’s shown in the record summary — they edit the text from a specific field, one field at a time. So then the worksheet gets a new value for
-
From the copies column, we might later implement some UI for editing too! Ideas:
- Hyperlink to a new worksheet (to be opened in a separate tab) which shows all those counted items (with the right filters applied).
- Do something similar in a modal
These approaches would allow users to quickly add/delete records that would in turn affect the total count.
Reactivity¶
🤔 Challenges:
- How do dependent cells refresh when DML changes are made?
Short term solution¶
- Worksheets must be refreshed manually in order to refresh dependent cells.
Long term solution¶
- We use the column metadata to build a primitive reactivity graph that allows the front end to update dependent UI without full refresh in most cases.
Pagination¶
🤔 Challenges:
- How does pagination work if the SQL query defines its own LIMIT/OFFSET?
- How do we apply LIMIT/OFFSET to user-written queries?
- How do we communicate that LIMIT will be automatically applied to user-written queries?
- How do we show the total number of records in the sheet?
Display-centric pagination¶
-
Control over pagination is delegated to the display portion of the worksheet — not the query portion. Each display type must implement its own pagination controls as it sees fit.
-
User-written SQL queries are not permitted to contain LIMIT or OFFSET clauses. If either of these clauses is present at the top level of the query, Mathesar will display an error.
-
The worksheet combines the SQL from the query section with the appropriate LIMIT and OFFSET as specified from the pagination controls.
-
We don’t worry about communicating this auto-applied LIMIT/OFFSET to users up front. If they try to add their own, LIMIT and OFFSET clauses, then they’ll see an appropriate error message and learn how it works at that point.
Record totals¶
- A supplemental query is run by pipelining the user-written worksheet query into
count(*)
. That count data is passed to the front end in the same API response as the query results.
Outline view¶
🤔 Challenges:
- How do we show the total number of items in an outline group without necessarily having all the items present in the result set (due to pagination)?
Example¶
-
Query
-
Sheet display
author Alice Davis (1)
book_id title author 1 921 Foo Alice Davis
author Bob Scott (2)
book_id title author 2 32 Bar Bob Scott
3 1007 Apple Bob Scott
Nested pagination¶
- Each outline group has its own pagination control.
-
The service layer issues one query per outline group with its own limit and offset.
-
These queries can still be batched at the API layer. This way if the user refreshes the worksheet, only one API request is needed to reload all outline groups.
-
These queries could potentially even be batched to make one TCP request to the Postgres server as well. Since they’ll all have the same columns, they could be combined with
UNION ALL
and then separated out afterwards.
-
-
All outline groups are collapsed by default. Some UI exists to collapse and expand all groups together.
-
When the worksheet first loads, only the group headers are fetched and shown. To fetch all those group headers and counts, the worksheet query, as-written, is transformed so as to produce the outline group data. First any specified limit and offset are removed. Then the query is pipelined via CTE to give the group header data.
In the example above, the raw outline group data is fetched via:
-
For super large data sets, the groups themselves are also paginated — hence the “nested” pagination. There are multiple levels of pagination.
-
Per-group pagination controls are placed in the group header with a condensed UI.
-
Sheet-wide pagination of outline group headers is placed in the lower status bar and indicates that it operates on outline groups instead of data rows.
Default worksheets¶
🤔 Challenges:
- How do we allow default worksheets to be set for tables without messing things up? For example, we want to avoid a default which has hidden columns.
Behavior¶
-
A table’s “Standard Worksheet” is the worksheet that Mathesar auto-generates for that table. It cannot be modified by the user.
-
A table’s “Default Worksheet” is the worksheet that opens when a user clicks to browse data in a table. It can be modified
-
A worksheet is eligible to be set as the default only if it has the following deviations from the standard worksheet:
- Per-column display options, including column width
- Column re-ordering
No other modifications are permitted to be done on the default worksheet. The rationale for this rule is that we want the default worksheet to provide an accurate picture of the table’s underlying structure within PostgreSQL.
-
Functionality is also available to revert a default worksheet to the standard worksheet.
Column control¶
🤔 Challenges:
- Should column reordering modify query or modify sheet display options?
- How are new columns added (or not added) to existing worksheets after being added to a table?
- Can a PK column be added to the query (to support DML) but hidden from the sheet (to simplify the view)?
- For more complex queries, is there a way we can auto-generate column aliases in order to have sensible defaults?
TODO