Skip to content

Move DDL Operations to SQL Functions

  • Name: Move DDL operations to SQL functions
  • Status: Complete
  • Review status: Approved
  • Theme: Code quality, maintainability, performance, Removing SQLAlchemy

Team

Role Assignee Notes
Owner Brent
Approver (project plan) Kriti Needs to approve project plan
Approver (backend) Brent Needs to approve back end spec
Contributor (requirements) Brent Creates product spec, requirements, GitHub issues
Contributor (requirements) Dom Creates product spec, requirements, GitHub issues
Contributor Anish Coding and reviewing
Contributor Dom Coding and reviewing
Contributor Mukesh Coding and reviewing

Problem

Data Definition Language (DDL) operations are those that manipulate the actual data model on the database. Some relevant SQL words are CREATE, ALTER, and DROP. These operations require knowledge of the database to do their work. E.g., a function must know the name of a table to ALTER it. Our current architecture requires reflecting the state of the database into memory in Python, then manipulating that state’s representation in Python, then stamping that representation back down onto the database.

Our current setup for this is:

  • Inefficient (reflection is slow)
  • Complicated (hard to maintain)
  • Prone to bugs (managing state in Python memory is constantly tripping us up)

All of these problems are related to the fact that we’re building the SQL queries to run DDL operations in Python.

Solution

Create DDL functions in database

Create a function for each desired DDL operation on the database using SQL or PL/pgSQL.

  • Each such function should be overloaded to have the signature needed for calling from Python with minimal fuss.
  • Each such function should have a main implementation which uses the most reasonable signature for the task at hand.

Replace Python DDL functions with wrappers of DB functions

Replace the current Python functions performing DDL operations with thin wrappers for these functions.

  • Be mindful of looking out for functions which may be deleted, rather than replaced, once this is done.
  • Map the original Python function signatures to an appropriate function call of the database functions.
  • It’s completely fine to create scaffolding functions at this point to avoid letting changes sprawl.
  • After this phase, no SQLAlchemy imports should be used in any module whose functions are modified in this way, i.e., DDL operation modules.

Refactor and clean up results

Refactor to remove SQLAlchemy objects from calls using Python DDL functions:

  • Remove any SQLAlchemy objects from DDL function signatures (This may require modifying callers slightly)
  • Remove SQLAlchemy from the entire call stack calling a given function, all the way up to the API (within reason).
  • Modify affected function signatures to avoid using schema_name, (schema_name, table_name), or (schema, table_name, column_name) identifiers. Instead, prefer schema_oid, table_oid or (table_oid, attnum) identifiers (may require modifying callers slightly, or scaffolding).
  • Delete any unneeded functions.

Risks

  • This is a major overhaul of the codebase. There’s always a possibility of unforseen problems
  • This will probably make the codebase less approachable for outsiders.
  • The testing may be trickier (though current prototyping didn’t require much changing of tests at all).

Resources

Meta-issue tracking this project

Timeline

Note: Parts of this timeline are delayed due to Brent’s parental leave.

Date Outcome
2023-03-20 Work starts
2023-03-24 Implementation spec and prototyping complete
2023-03-31 Implementation spec approved
2023-07-24 All needed DDL SQL Functions written
2023-07-26 All thin python wrappers written
2023-08-02 Refactor and clean up complete