跳到主要内容

Database design principles

1. Avoid redundancy

  • can lead to inconsistencies: data is updated in one place, but not updated in the other place, sync
  • impact the performance: increase the storage space
  • exception: dimensional schemas used for analytical processing and data warehousing

2. Primary keys and unique identifiers

  • every table must have a primary key
  • it guarantees the uniqueness of each row within the table
  • it is used to establish relationships with other tables

3. Null values

  • only allow the column to support null values if you're certain that at some point the value may be unknown, missing, or not applicable
  • it's important to differentiate null values from empty values, such as zero or empty string
  • primary keys cannot store null values
  • a null value can be applied to columns of any data type (as long as the column supports null values)
  • null values are ignored in unique and foreign key constraints

4. Referential integrity

  • establishing relationships between tables using foreign keys and primary keys

5. Atomicity

  • don't use a single column for complex or compound data, instead use separate columns to represent the atomic data, for example: divide name into firstName and lastName, divide address into state, city, postcode.

6. Normalization

  • normalizing a model is a matter of bringing it to the 3rd normal form (3NF)
  • it helps maintain data integrity, reduces redundancy and optimizes the storage space

7. Data typing

  • use appropriate data types

8. Indexing

  • create index for primary keys and foreign keys (automatically by RDBMS)
  • create index based on use cases
  • create index for fields that are frequently used for querying or filtering data, but avoid creating unnecessary or duplicate indexes
  • consider columns' cardinality before you index. If you create an index for a column with low cardinality, it will rarely speed up queries
  • prioritize narrow columns for index since indexes take up storage space
  • index creation if often part of database maintenance rather than design

Design databases for maximum usability, reliability and maintainability

1. Understand the requirements

  • understand the requirements of the users and business context
  • identify data sources, types, and relationships
  • functional and non functional requirements: the expected queries, transactions, operations, constraints and quality attributes
  • common technique: data dictionary

2.Choose a data model

  • choose a data model that suites the requirements and the the nature of the data
  • a relational model: based on tables, columns, and rows, and is widely used for structural and consistent data

3. Design a schema

  • ERD (entity-relationship diagrams): graphical representations of the entities, attributes, and relationships in the database
  • Normalization: the process of organizing data into tables that minimize redundancy and anomalies
  • Denormalization: the process of combining or duplicating data to improve performance and readability

4. Ensure security and privacy

  • Encryption: the process of transforming the data into a unreadable form, using a key or a algorithm
  • Authentication: the process of verifying the identity of the users using username/password or other credentials
  • Authorization: the process of granting or denying access the data based on the roles and privileges of the users
  • Auditing: the process of recording and monitoring the activities and events in the database, such as queries, transactions and errors
  • Masking: the process of hiding or replacing sensitive data such as personal or financial information with dummy or random values

5. Plan for backup and recovery

  • backup and recovery are essential for ensuring the reliability and maintainability of the database as they prevent or minimize the loss or corruption of data due to hardware failures, software errors, human errors or natural disasters
  • methods: full, incremental, differential backups, restore points, snapshots, replication, mirroring