February 2022 meeting notes¶

Asynchronous discussions¶

2022-02-23¶

Discouraging Money type and directing user towards Number type with Currency display option¶

Added by: Pavish
Summary:
- The design spec: Update number type to support custom currency format introduces a Currency display option for Number. We also have a separate type for Money, which is restricted to one currency based on server locale.
- Having two different ways to represent monetary values might be confusing for users. Based on the current UX, users are bound to first use the Money type, which we’d like to discourage.
- Relevant discussions and potential solutions are captured on the PR for the design spec
- We’d like this discussion to yield a solution we’d want to follow for the alpha
Participants: Kriti, Ghislaine, Brent, Pavish

Notes¶

Create DB type for Money that’s just an alias to Numeric
- Move currency formatting options to it
The only time we’ll use PostgreSQL money type is if custom type is not installed
We will show the PostgreSQL money type if there’s existing column
- If people want to switch from an existing PG money type to our type, they have to switch to number and back to money
- We need a design for a warning for existing money type columns
Which display options to use will be discussed as part of spec review
previous custom money type will be renamed to multicurrency

Next steps¶

Brent will make backend ticket and work on it
Kriti will update design ticket
Ghislaine will do a second round of the spec and re-request review

Query Builder & Formulas¶

Added by: Kriti
Summary: I’d like to go through the query builder flow and how formulas will work in Views. I’m still figuring out how to represent formulas and I’d like to talk through the following questions:
- Should we have a textual representation of formulas or should they be GUI form only?
- For GUI form based formulas, how do we allow users to do math or combine AND/OR/NOT logic for booleans?
- For textual formulas, How do we represent links between tables in formulas?
  - e.g. if there are multiple ways to get to a table from another table, which way do we use?
- Read beforehand:
  - The Query Builder
  - Formulas
- Reference for complicated relationships: Example Schema
  - e.g.
Participants: Kriti, Ghislaine, Brent, Pavish, Mukesh, Dominykas, Sean

Notes¶

Questions from Brent:

Will we present foreign keys as directional?
- Yes, it’s been discussed
filters: WHERE, HAVING, or both?
- All current filter mentions refer to WHERE.
- It might be confusing to users to have HAVING filters, filtering output
- Maybe different words for filtering input and output
- Maybe show output filters within the summarization section so they know those filters are part of the summarization
Aggregation: Are we not planning to give users the ability to look at proper joins? I.e,. will they always only be able to get 1 row per reference table row?
- We’ll need to support this for “putting back a de-normalized spreadsheet” use case
- Product requirements should be updated to include “single row per unique pair” as an aggregation option
Reference Table: When starting from a formula, what if the function takes columns from different tables?
- We’ll need to pick a table that’s used in the formula
- For now, we’ll always be working from the perspective of a single table.
Generally: How will we handle / present order and its effect on output?
- Trying to make it flexible for the user, not depend on order too much.
- Implementation should encapsulate ordering problems (for the UI API)

Potential UX issues:

What happens if the user deletes the first column? It’ll change the entire query.

Backend implementation:

Trouble picturing how it’ll come together
Backend brainstorming call after design is finalized
We can talk about whether we can put stuff off after alpha for now

Editing View Data:

Editing lists
Editing direct representation

Editing View Definition:

We’ll figure out how much we can do here once we’ve figured out backend implementation
- Ideal outcome is pre-filled query builder
People can edit it once we have a SQL editor, but that’s not for the alpha.
API will let frontend know if view definition is editable

General observations:

Might have been easier to start with the query builder before filter/sort/grouping
It’s hard to think through complex use cases at the beginning
The most difficult thing should be parsing

Formula feedback:

Formula currently covers everything that’s computed, each of which may need different UX
List shouldn’t be only a formula, it should be the default aggregation of multiple records. Should it be encapsulated as a formula or should we have a separate name for it?

Formulas - GUI vs. Textual:

We shouldn’t do textual formulas, it’s just dumbing down SQL
We should stick to GUI
Maybe a specialized UI for math alone?
- We can allow people to enter mathematical operations as text
  - Need to parse column references
  - maybe domain specific parsing for columns (e.g. @ or $)
- possibility: interface to import columns you’re interested in and name them
- Other option is formula pipeline and make addition etc. formulas
  - People would probably hate that
Multiple formulas - avoid for the alpha since order matters

Long term query builder features:

Could we represent the query builder as a graph in the long term for a more powerful experience?
We shouldn’t try to make the GUI query builder work for all use cases

Next Steps¶

Kriti will refine formulas further and open a GitHub discussion

Which column identifier should we use?¶

Added by: Mukesh
Summary: Based on Issue #839, we have planned to make changes to way column’s are referenced/accessed at the db layer. The options are:
Attnum - Stable Identifier which needs to be reflected and converted into column name when doing an operation on a column
Column Name - Last piece of information used by most sql operations but Prone to changes, as any operation to column name will invalidate the reference. Suggestions
- Brent suggests using attnums. Since we would have lots of DDL operations, by using a stable identifier we don’t have to worry about functions that alter a column name.
- Mukesh suggests using column names. As the column names are derived from attnum and number of functions that alter a column name are very few, we can be careful when dealing with such functions and use conventions to get around it. This would make the column access same as sqlalchemy and reduce any unwanted overhead
Participants: Mukesh, Brent, Kriti, Dominykas

Notes¶

Concerns with using attnums:

Not the way SQLAlchemy does things
- SQLAlchemy is not created for DDL, it’s optimized for DML
- SQLAlchemy uses attnum under the hood
Large codebase change for only a small number of column alteration operations
- More queries
- Might make things harder for new contributors to know about attnums

Using attnums will probably actually make things easier for contributors since it’ll be a single context-less pattern across the entire codebase and doesn’t need the contributor to think about whether the column name is being altered

Another option could be passing column objects to functions. Column name is fetched from the attnum internally.

Performance hit is still there, but we’re not sure if the performance hit is significant. In the long run, we could optimize this using the database or use a long term cache.

We also noted that users directly using the db module will need to use attnums, which might be harder for some users. But we’re not prioritizing db module usability, if we release it as its own module, we’ll consider it then.

Safety is

What about usage of tables.columns[column_name]?

If we adopt the convention of only calling one SQLAlchemy function per Mathesar function, it’s fine to use this.

Refactor priorities: 1. Make sure index is gone (replace with attnum if possible) 2. Make sure top-level (API-facing) functions take attnums as args 3. Make sure public functions take attnums as args 4. Reduce functions to one SQLAlchemy operation per function (makes taking attnum args easier)

We also need to do this for tables and oids.

Next steps¶

Use attnums, either directly or through column objects.
Refactor the entire codebase for priorities 1 and 2 right now.
Make issues for 3 and 4
- We’ll reevaluate priority of 3 and 4 later.

2022-02-16¶

UX for failure to save a cell¶

Added by: Sean
Summary: To implement #775, I need some clarification on our desired UX for the situation where a user tries to create a blank record within a table that does not specify a default value for a NOT NULL column. We’ll likely want to reuse this UX pattern in situations where the user edits an existing cell and enters invalid data (which is arguably simpler to consider, though not a pressing open issue). I’d like to take a quick stab a clarifying this UX synchronously, or figure out the next steps towards doing more async design work if needed.
Participants: Sean, Ghislaine, Pavish, Kriti, Dominykas

Notes¶

We need cell-level errors for editing
- Proposal: remove highlighting entire row, only highlight error cell
- This is fine, but we should have row highlighting in some way, the row number instead of the entire row
- Errors take precedence over save state in terms of colors
We need different errors for adding new records
- We can use the same pattern for displaying individual errors
- We’ll still have the whole row red since nothing would be saved at this point

Outcome¶

Sean will implement front end changes to match the following specs:

(A) After editing an existing cell results in a validation error (either client-side or server-side):
- The cell background will be light red.
- A darker-red triangle will appear in the upper right corner of the cell. Hovering or clicking on the triangle will display an error message popover.
- The row header (which displays the row index and deletion checkbox) will also have a light red background to better surface the error and help guide users towards thinking of their data in terms of records.
  
  Note: for the time being (and due to task prioritization), the row header will not display the triangle or display any error messages. We may choose to add more functionality here in the future to better surface error messages.
- Other cells which have contributed to the same error will also display the same error style and message (e.g. when a multi-column unique constraint fails.)
(B) After editing an existing cell succeeds:
- The cell background will turn green. No other cells will be re-styled.
(C) After inserting a new record results in a validation error (either client-side or server-side):
- All the behavior from (A) will apply.
- Additionally, all cells in the row will have the light red background to communicate that no data has been saved.
(D) After inserting a new record succeeds:
- All cells will have a green background.

Visual design and improvisation¶

Added by: Sean
Summary: I’d like to clarify the intended specificity of our visual designs. Rough like a wire frame? Pixel perfect like a mockup? Somewhere in the middle? This has been confusing to me thus far, and I’ve gotten mixed messages.
Originating discussion
Participants: Sean, Ghislaine, Kriti, Pavish, Mukesh, Dominykas

Notes¶

There’s a tradeoff between being overly specific and not following the design at all
- It’s taken us a while to get the product and design process collaboration right
- We ended up with designs in the previous specs that were out-of-scope for the design issue, so improvisation was needed from both design and frontend to implement it.
- First step here is to improve the design creation process so that more specific designs can be produced, this is happening already.
  - Using Balsamiq for low fidelity wireframes
Three phases of design:
- Product design: user of goals
- UX design: fields and interactions
- Visual design: colors, fonts
Sean is most confused about visual design
- Ghislaine is not focused on visual design at the moment.
- We’re focused on UX design and visual design - interactions.
Should we treat output from Ghislaine as wireframes?
- No, because there are some visual aspects to the design and we should follow them as closely as possible
- As long as we can follow them closely without compromising future work
- Improvisation should only be done at the visual level
We should eventually consider having UI guidelines and error states guidance for Mathesar clients that outside developers could build
- Document error states associated with wireframes

E2E testing strategies¶

Added by: Sean
Summary: We’ve had some disagreement regarding E2E testing.
- Here Mukesh wanted individual E2E tests to perform more actions and suggested Page Object Model to reduce repetition between tests – whereas Sean wanted E2E tests to be focused on narrow features, using Django to setup the necessary prerequisites.
- Here Mukesh expressed concern about creating too many E2E tests and suggested we prioritize E2E tests for the most common user scenarios – whereas Sean wanted to add E2E tests for some less-common user scenarios (e.g. attempting to add a unique constraint to a column with duplicate entries) with an eye towards prioritizing scenarios that touch a lot of different code.
Prompts for discussion:
- Terminology – We seem to be using the terms “E2E” and “Integration” synonymously. Is that a problem? Do we want to make a distinction?
- E2E test case setup – For one Playwright test which asserts that the user can delete a table row, what process do we want to use seed Mathesar with the data for the user to delete? Use the UI? Or do it through Django?
- E2E test quota – In Write integration tests for some important user scenarios, we’ve agreed on a set of 15 E2E tests to write. After closing that ticket, how should we decide when to write E2E tests and how rigorous we want to be with them?
- Page Object Model – Mukesh has brought this testing pattern up several times with me. Playwright supports it. Is it something we’d like to invest in soon? Or maybe later?
Participants: Sean, Mukesh, Pavish, Brent, Dominykas

Notes¶

Naming Convention
According to Pavish E2E and integration test would be written in the same fashion on the frontend with the only difference being integration test would mock the backend API, so with regards to the terminology, we should be calling our current testing strategy with playwright as End-to-End testing. And Sean seems to be on the same page
Brent did not want to have a dogmatic approach to the naming convention, but he wanted to have a distinction.
Mukesh wanted to have distinction between integration test and E2E, where playwright should be used for writing E2E test and integration test should be based on jsdom and api mocks
Mukesh expressed concerns that the E2E test is flaky as it has to deal with unpredictable things like http calls, cache, async queues. Moreover as the app adds in additional layer like async queue or a caching layer, the set-up and teardown would become complex and increases both the development/maintenence time as well as the time to run the tests(which won’t be much of a concern, as we run only specific test related to the feature we are working on during development). So we should be writing E2E tests that should test high level features like deleting a row and more specific tests like deleting multiple rows should be done with the integration layer.
- For time being, Mukesh agreed to have integration test written with UI automation using playwright.
Fixture
Mukesh did not want to completely rely on fixtures, rather wanted to use fixtures sparsely. Reason being that fixtures interact with database directly, so if we introduce a intermediatery layer, like say a async queue, our current testing method would not accurately reflect what the user would be doing. For example if we introduce async task, there could be a delay with when the data is actually written to the database, but since the test cases bypasses these intermediate layer, the test could end up passing but the actual UX flow would be failing from user standpoint
According to Pavish, we would be testing the complex layer(cache, async queues) too and other tests would be based on that. So it shouldn’t be a problem.
Mukesh feels like those are the test we should be calling as E2E and the other test should be called as integration test and should not be written using playwright.
Mukesh on the whole agrees with Pavish and Sean that we can use fixtures and write test cases as are writing them now, since the app is fairly simple, provided there is a backlog to revisit and discuss over the strategy once we get the alpha out.
Brent understands the concern Mukesh brought up

Decisions¶

We will use “End to end” or “E2E” (and not “integration”) when referring to the tests that we run via Playwright.
It’s okay to use pytest fixtures which call Django functionality to set up prerequisites for E2E tests. (Mukesh still feels this is not “proper” E2E testing, but accepts this pattern for the time being.)
We didn’t get time to discuss “E2E tests quota” in depth.
We will talk about Page Object Model another time. It’s up to Mukesh to bring this topic up by making specific suggestions for how we would use it.

2022-02-10¶

Topic: Weekly sync meeting

We’re trying out weekly synchronous meetings to resolve things faster. Anyone can add agenda items ahead of time and we’ll cancel the meeting if no agenda items are added 12 hours before the meeting.

Backend plan for Initial Data Types milestone¶

Added by: Kriti
Summary: We have a lot of backend work to complete for the Initial Data Types milestone, and the frontend is catching up to the backend quickly. We should coordinate on how we can split up the remaining items in the most efficient way.
Participants: Brent, Dominykas, Kriti, Mukesh

Notes:

Backend priorities for this week
We removed https://github.com/centerofci/mathesar/issues/648 from the milestone, will replace with a new issue for ensure default values work well with types
Action items:
- Brent: File bug with SQLAlchemy for interval.
- Kriti:
  - update custom money type issue to use display options instead
    - create related design issue
  - create a separate issue for making sure all types have tests for default values
    - See test_alter_column_casts_data_gen in test_cast.py for the current tests on that
Plan:
- Dominykas:
  - All filtering issues
  - All data type API update issues
- Brent:
  - INTERVAL
  - Date & time parsing spec and related updates
  - All grouping issues
- Mukesh:
  - Display options for money
  - Replace column name usage with ID issue
  - Display options for duration
  - Default values testing and fixes
Whoever is done first will start working on the foreign key APIs.

Google Summer of Code project idea brainstorming¶

Added by: Kriti
Summary: I’d like to spend some time brainstorming project ideas for Google Summer of Code. We can also review the existing ideas if time permits.
Participants: Brent, Dominykas, Kriti, Mukesh, Pavish

Notes:

Current Project Ideas
Dominykas likes automation hint reflection and SQL query builder
SQL query builder difficulty is medium
File and image types should be JS-heavy
- Most of the work is the frontend
- Potentially use Django storage for the backend or a more generic Python library offering storage
  - Could only support a couple of storage options in the beginning if no suitable library can be found.
- Can also specify external URLs
- Frontend shouldn’t deal with complicated vendor-specific stuff like AWS tokens yet, use URLs instead
- Difficulty should be High
PostGIS idea could be hard, involves a fair bit of research
- Integrate with different mapping services maybe?
Idea: Exporting data from Mathesar
Idea: Async infrastructure for Mathesar
- Better done by core team, will not add
- We also need caching infrastructure, separately