Comparison of PostgreSQL and MySQL: An In-Depth Look at Architecture and Performance

Choosing a database management system (DBMS) is one of the key decisions when developing applications. PostgreSQL and MySQL are two of the most popular open-source relational DBMSs, each with its own architectural features, strengths, and weaknesses. Let’s take a closer look at their main differences.

1. Architecture: Processes vs. Threads

The fundamental difference between PostgreSQL and MySQL lies in how they handle client connections.

  • PostgreSQL: Uses a "process per connection" architecture. When PostgreSQL starts, it launches a main server process, which then spawns several background worker processes (background writer, checkpointer, autovacuum, WAL writer, statistics collector, log writer, archiver) that interact through shared memory. When a client connects to the database, a *separate* dedicated process (backend process) is created for it. This means each connection is served by its own operating system process.
  # Show the process tree of the main PostgreSQL server process and its backends
  pstree -p 43545
  • MySQL: Sometimes assumed to follow the same multi-process model, but the modern mysqld operates as a single-process server using multiple threads. On a typical Linux installation there is a wrapper script, mysqld_safe (which starts and supervises the server), and the main server process, mysqld. To handle each new client connection, mysqld creates not a new process, but a new *thread* within its own address space.
  -- Show active connections in mysql  
  show processlist \G
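
For comparison, the SQL-level counterpart of SHOW PROCESSLIST in PostgreSQL is the pg_stat_activity system view, where each row corresponds to a dedicated backend process rather than a thread (a minimal sketch):

  -- Each row describes one backend OS process serving one client connection
  SELECT pid, usename, state, query FROM pg_stat_activity;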

Impact:

  • Resources: Threads are generally "lighter" than processes because they share the same address space, which speeds up context switching. Creating a new process (like in PostgreSQL) is more resource-intensive.
  • Memory Management: Memory management in a multi-threaded environment (MySQL) can be simpler than coordinating memory across multiple independent processes (PostgreSQL).
  • Connection Pooling: The overhead of creating processes/threads makes using a connection pool crucial for both DBMSs. PostgreSQL often relies on external tools like PgBouncer. MySQL, thanks to its thread-based architecture, has built-in thread caching mechanisms, allowing it to reuse resources more efficiently after a connection is closed.
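
To see whether MySQL is actually reusing threads from its cache, the standard thread status counters and the thread_cache_size variable can be checked (a minimal sketch):

  -- Threads_created growing much faster than Threads_cached suggests the cache is too small
  SHOW GLOBAL STATUS LIKE 'Threads%';
  SHOW GLOBAL VARIABLES LIKE 'thread_cache_size';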

2. Query Planner and Optimizer

SQL query processing in both databases goes through similar stages: Parser, Analyzer, Rewriter, Planner, and Executor. However, their optimization approaches differ.

  • MySQL: Historically, MySQL has supported various storage engines, such as InnoDB and MyISAM. To ensure compatibility with different engines, the MySQL planner interacts with the executor via a standardized API. This means the planner doesn't always have full insight into the internal workings and specifics of a particular storage engine.
  • PostgreSQL: PostgreSQL doesn't use the concept of swappable storage engines at the same level of abstraction as MySQL. Its optimizer has direct access to all internal information about data storage structures, indexes, and statistics, without intermediate APIs.
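
Both planners expose their decisions through EXPLAIN, which is the usual way to compare how each optimizer treats the same query (a sketch; orders and customer_id are placeholder names):

  -- PostgreSQL: actual execution with timing, row counts, and buffer usage
  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
  -- MySQL 8.0: tree-formatted plan from its cost-based optimizer
  EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 42;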

Statistics: A key difference lies in the volume and granularity of the statistics collected by each DBMS to build query plans.

MySQL collects statistics on:

  • Clustered index size
  • Number of rows
  • Data distribution, including null fraction

PostgreSQL collects statistics on:

  • Table size
  • Number of rows and number of pages in memory
  • Null fraction
  • Average column width in bytes
  • Number of distinct values in a column
  • Statistical correlation between physical and logical row order
  • Data distribution (histograms)
  • Most common values (MCVs) and their frequencies

Impact: Thanks to more detailed and diverse statistics, the PostgreSQL optimizer is often able to build more efficient plans for complex queries involving JOINs, aggregations, and analytical functions. This is one reason why PostgreSQL is often considered preferable for analytical workloads (OLAP).
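
For example, the per-column statistics that PostgreSQL gathers with ANALYZE can be inspected through the pg_stats view (a sketch; orders is a placeholder table name):

  -- Refresh planner statistics for the table
  ANALYZE orders;
  -- Null fraction, average width, distinct values, and physical/logical correlation per column
  SELECT attname, null_frac, avg_width, n_distinct, correlation
  FROM pg_stats
  WHERE tablename = 'orders';

MySQL's rough counterpart is ANALYZE TABLE orders; followed by the Cardinality column of SHOW INDEX FROM orders;, which illustrates how much narrower its statistics are.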

3. Data Storage and Transactionality (MVCC)

Both DBMSs use Multi-Version Concurrency Control (MVCC) to manage concurrent access, but they implement it differently.

  • PostgreSQL: Implements MVCC by creating a new version of a row for each UPDATE. The old version of the row is not immediately deleted but is marked as inactive using the system columns `xmin` (the ID of the transaction that created the row version) and `xmax` (the ID of the transaction that deleted or updated it); these columns can be queried directly, as shown in the sketch after this list. Physically, old ("dead") rows remain in the table until they are removed by the VACUUM process.
    Pros: An UPDATE never modifies data in place; it is conceptually an INSERT of the new row version plus marking the old one, so rolling back a transaction is cheap because no undo has to be applied.
    Cons: Tables become "bloated" due to dead rows, requiring regular cleanup (VACUUM). Queries scanning the entire table might process inactive row versions, reducing performance until cleanup occurs.
  • MySQL (InnoDB): Implements MVCC differently. On UPDATE, the record is modified in place (in the main data structure). Information needed for transaction rollback or providing an old row version to other transactions is written to a separate segment—the undo log. Only the current data versions are always present on disk in the main storage.
    Pros: Table size on disk is more predictable and reflects the actual data volume. No need for a VACUUM equivalent to remove old row versions from the main storage.
    Cons: INSERT operations might be slightly slower than in PostgreSQL, as writing to the undo log is required in addition to inserting the record itself. UPDATE operations also involve writing to the undo log.
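
PostgreSQL's row versioning can be observed directly through its hidden system columns (a sketch; accounts is a placeholder table):

  -- xmin: transaction that created this row version; xmax: transaction that
  -- deleted or updated it (0 for a live, current version); ctid: physical location
  SELECT xmin, xmax, ctid, * FROM accounts;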

4. Garbage Collection (VACUUM in PostgreSQL)

As mentioned above, due to PostgreSQL's MVCC implementation, "dead" rows accumulate. The VACUUM process performs three main tasks:

  1. Reclaiming space: Marks the space occupied by dead rows as available for reuse by new data.
  2. Updating statistics: Collects current statistics for the query planner.
  3. Preventing Transaction ID Wraparound: Freezes sufficiently old row versions so that the 32-bit transaction ID counter can be reused safely, an important background task for maintaining database integrity.

There is an automatic version—autovacuum—which triggers cleanup as needed based on the number of changes in tables. Effective autovacuum tuning is critical for maintaining PostgreSQL performance. Running it too infrequently leads to table bloat and slower queries; running it too often causes excessive server load.
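
In practice, cleanup can be triggered manually, and autovacuum can be tuned per table rather than only globally (a sketch; orders is a placeholder name):

  -- Remove dead row versions and refresh planner statistics in one pass
  VACUUM (ANALYZE) orders;
  -- Make autovacuum fire after roughly 5% of the table's rows have changed
  ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);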

5. Working with Indexes

Indexes are necessary to speed up data retrieval.

  • PostgreSQL: Offers a rich variety of index types: B-tree (standard), Hash, GiST, SP-GiST, GIN (used for full-text search, JSONB, etc.), BRIN. This allows optimizing data access for various data types and queries. However, PostgreSQL does not support clustered indexes in the way they are implemented in InnoDB (MySQL). The CLUSTER command in PostgreSQL allows physically reordering table rows according to an index once, but the DBMS does not automatically maintain this order during subsequent inserts and updates.
  • MySQL (InnoDB): In InnoDB, tables are always organized as a clustered index (usually by the primary key). This means the physical order of rows on disk corresponds to the logical order of the primary key. This can significantly speed up queries over a primary key range, as related data is located close together on disk. All secondary indexes in InnoDB store the primary key value of the row rather than its physical location, so a secondary-index lookup is followed by a lookup in the clustered index.

Impact: The absence of true clustered indexes in PostgreSQL can affect the performance of range reads by key compared to InnoDB, but the flexibility in choosing index types gives PostgreSQL an advantage in other scenarios.
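
For illustration, the same PostgreSQL table can combine several index types, while the one-off CLUSTER command mentioned above only reorders rows at the moment it runs (a sketch; table and column names are placeholders, and payload is assumed to be a JSONB column):

  -- Default B-tree index for range scans on a timestamp column
  CREATE INDEX idx_orders_created_at ON orders (created_at);
  -- GIN index for containment queries on the JSONB column payload
  CREATE INDEX idx_orders_payload ON orders USING gin (payload);
  -- One-time physical reordering by the B-tree index; not maintained automatically
  CLUSTER orders USING idx_orders_created_at;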

Conclusion

Both MySQL and PostgreSQL are powerful and mature DBMSs, but their architectural differences lead to different performance profiles and operational characteristics:

  • MySQL: Often considered easier for initial setup and management. Its thread-per-connection architecture and InnoDB's clustered indexes can be advantageous for OLTP workloads (frequent, short transactions) and primary key range queries.
  • PostgreSQL: Often preferred for its strict adherence to SQL standards, extensibility, rich set of data types and indexes, and more advanced query optimizer, making it a strong candidate for complex queries, analytics (OLAP), and geospatial data. Its MVCC implementation requires careful attention to VACUUM tuning.

The choice between MySQL and PostgreSQL should be based on the specific requirements of your application, the expected workload, query complexity, and the experience of your development and administration team.
