10 common PostgreSQL mistakes and how to avoid them

So much can go wrong with a PostgreSQL installation. Worse, many problems can lurk undetected as the issue builds over a period of time, then suddenly strike with a major impact that brings it to the forefront of everyone's attention. Whether it's a glaring drop in performance or a dramatic rise in resource consumption and billing costs, it's important to identify such problems as early as possible, or better yet, avoid them by configuring your implementation to suit the desired workload.
Drawing on Percona's experience helping countless PostgreSQL shops over the years, we've compiled a list of the most common mistakes. Even if you think you've configured your PostgreSQL installation the right way, you may still find this list useful in validating your setup.
Mistake #1: Running the default configuration
PostgreSQL works right out of the box, but it is not well configured for your needs. The default configuration is very basic and not tuned for any specific workload. This excessively conservative configuration allows PostgreSQL to run in any environment, with the expectation that users will configure it for their needs.
The pgtune tool offers a subset of configurations based on hardware resources and the type of workload. That's a good starting point for configuring your PostgreSQL cluster based on what your workload needs. Additionally, you may have to configure the autovacuum, log, checkpoint, and WAL (write-ahead log) retention variables.
It's really important that your server is optimally configured for any immediate future needs so that you can avoid unnecessary restarts. So take a look at all GUCs with the "postmaster" context in the pg_settings catalog view.
SELECT name, setting, boot_val
FROM pg_settings
WHERE context = 'postmaster';
This is especially critical when setting up a high availability (HA) cluster, because any downtime for the primary server will degrade the cluster and cause the promotion of a standby server to the primary role.
Mistake #2: Unoptimized database design and architecture
This point can't be emphasized enough. I've personally seen organizations pay more than five times the cost they needed to, simply because of unoptimized database design and architecture.
One of the best tips here is to look at what your workload needs right now, and in the near future, rather than what might be required in six months to a year's time. Looking too far ahead means that your tables are designed for future needs that may never be realized. And that's just one aspect of it.
Alongside this, overreliance on object-relational mapping (ORM) is also a major cause of poor performance. ORMs are used to connect applications to databases using object-oriented programming languages, and they should simplify life for your developers over time. However, it's critical that you understand what an ORM provides and what kind of performance impact it introduces. Under the hood, an ORM may be executing multiple queries, whether that's to combine multiple relations, to perform aggregations, or even to split up query data. Overall, you'll experience higher latency and lower throughput on your transactions when using an ORM.
Beyond ORMs, improving your database architecture is about structuring data so that your read and write operations are optimal for indexes as well as for relations. One approach that can help is to denormalize the database, as this reduces SQL query complexity and the associated joins so that you can fetch data from fewer relations.
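As a minimal sketch of that idea (the tables and columns here are hypothetical, not from any particular schema), copying a frequently read attribute onto the referencing relation lets the hot read path avoid a join, at the cost of redundancy that the application must keep consistent:
-- Normalized: reading an order's customer name requires a join.
-- SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id;

-- Denormalized: the customer name is copied onto the order row at write time,
-- so the common read path touches a single relation.
CREATE TABLE orders_denormalized (
    id            bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id   bigint NOT NULL,
    customer_name text NOT NULL,
    total         numeric NOT NULL,
    created_at    timestamptz NOT NULL DEFAULT now()
);

SELECT id, customer_name, total
FROM orders_denormalized
WHERE customer_id = 42;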
In the end, performance is driven by a simple three-step process of "definition, measurement, and optimization" in your environment, for your application and workload.
Mistake #3: Not tuning the database for the workload
Tuning for a workload requires insights into the amount of data you intend to store, the nature of the application, and the type of queries to be executed. You can always tune and benchmark your setup until you are happy with the resource consumption under a severe load.
For example, can your entire database fit into your machine's available RAM? If yes, then you would obviously want to increase the shared_buffers value accordingly. Similarly, understanding the workload is key to how you configure the checkpoint and the autovacuum processes. For example, you'll configure these very differently for an append-only workload compared to a mixed online transaction processing workload that meets the Transaction Processing Performance Council Type C benchmark.
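As a rough sketch of what those shared_buffers and checkpoint adjustments might look like (the values are illustrative assumptions, not recommendations, and shared_buffers changes only take effect after a restart):
-- Illustrative values only; benchmark against your own hardware and workload.
ALTER SYSTEM SET shared_buffers = '8GB';          -- postmaster context, requires a restart
ALTER SYSTEM SET effective_cache_size = '24GB';   -- planner hint, no restart needed
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET max_wal_size = '4GB';
SELECT pg_reload_conf();                          -- picks up the reloadable settings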
There are a lot of useful tools out there that provide query performance insights. You might check out my blog post on query performance insights, which discusses some of the open source options available, or see my presentation on YouTube.
At Percona, we have two tools that can help you immensely in understanding query performance patterns:
- PMM – Percona Monitoring and Management is a free, fully open source project that provides a graphical interface with detailed system statistics and query analytics. Feel free to try out the PMM demo, which caters to MySQL, MongoDB, and PostgreSQL.
- pg_stat_monitor – This is an enhanced version of pg_stat_statements that provides more detailed insights into query performance patterns, the actual query plan, and query text with parameter values. It is available on Linux from our download page or as RPM packages from the PostgreSQL community yum repositories.
Mistake #4: Improper connection management
The connections configuration looks innocuous at first glance. However, I've seen instances where a very large value for max_connections has caused out of memory errors. So configuring max_connections requires some attention.
The number of cores, the amount of memory available, and the type of storage must be factored in when configuring max_connections. You don't want to overload your server resources with connections that may never be used. Then there are kernel resources that are also allocated per connection. The PostgreSQL documentation on kernel resources has more details.
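A quick sanity check is to compare the number of open backends against the configured limit, for example:
-- How close is the cluster to its max_connections limit?
SELECT count(*) AS open_connections
     , current_setting('max_connections')::int AS max_connections
     , round(100.0 * count(*) / current_setting('max_connections')::int, 1) AS pct_used
FROM pg_stat_activity;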
When clients execute queries that take very little time, a connection pooler significantly improves performance, because the overhead of spawning a connection becomes significant in that type of workload.
Mistake #5: Vacuum isn't working properly
Hopefully, you have not disabled autovacuum. We've seen in many production environments that users have disabled autovacuum altogether, usually because of some underlying issue. If the autovacuum isn't really working in your environment, there can be only three reasons for it:
- The vacuum process is not being triggered, or at least not as frequently as it should be.
- Vacuuming is too slow.
- The vacuum isn't cleaning up dead rows.
Both 1 and 2 are directly related to configuration options. You can see the vacuum-related options by querying the pg_settings view.
SELECT name
     , short_desc
     , setting
     , unit
     , CASE
         WHEN context = 'postmaster' THEN 'restart'
         WHEN context = 'sighup' THEN 'reload'
         ELSE context
       END "server requires"
FROM pg_settings
WHERE name LIKE '%vacuum%';
The speed can potentially be improved by tuning autovacuum_work_mem and the number of parallel workers. The triggering of the vacuum process may be tuned via scale factors or thresholds.
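As a sketch, the relevant knobs look like this (the values are illustrative assumptions; derive real numbers from your table sizes and churn):
-- Illustrative values only.
ALTER SYSTEM SET autovacuum_work_mem = '512MB';
ALTER SYSTEM SET autovacuum_max_workers = 6;             -- requires a restart
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- trigger vacuum sooner on large tables
ALTER SYSTEM SET autovacuum_vacuum_threshold = 1000;
SELECT pg_reload_conf();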
When the vacuum process isn't cleaning up dead tuples, it's an indication that something is holding back key resources. The culprits could be one or more of these:
- Long-running queries or transactions.
- Standby servers in a replication environment with the hot_standby_feedback option turned on.
- A larger than required value of vacuum_defer_cleanup_age.
- Replication slots that hold down the xmin value and prevent the vacuum from cleaning up dead tuples.
If you want to manage the vacuum of a relation manually, then follow the Pareto principle (aka the 80/20 rule). Tune the cluster to an optimal configuration and then tune specifically for those few tables. Remember that autovacuum or toast.autovacuum may be disabled for a specific relation by specifying the associated storage option during the create or alter statement.
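For instance, assuming hypothetical table names, per-relation tuning through storage options might look like this:
-- Vacuum a high-churn table more aggressively than the cluster default.
ALTER TABLE hot_orders SET (autovacuum_vacuum_scale_factor = 0.01,
                            autovacuum_vacuum_threshold = 500);

-- Disable autovacuum for a specific relation and its TOAST table.
ALTER TABLE archive_events SET (autovacuum_enabled = false,
                                toast.autovacuum_enabled = false);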
Mistake #6: Rogue connections and long-running transactions
Quite a few things can hold your PostgreSQL cluster hostage, and rogue connections are one of them. Other than occupying connection slots that could be used by other applications, rogue connections and long-running transactions hold onto key resources that can wreak havoc throughout the system. To a lesser extent, in a replication environment with hot_standby_feedback turned on, long-running transactions on the standby may prevent the vacuum on the primary server from doing its job.
Think of a buggy application that opens a transaction and stops responding thereafter. It might be holding onto locks or simply preventing the vacuum from cleaning up dead tuples, as those remain visible to such transactions. What if that application were to open a huge number of such transactions?
More often than not, you can get rid of such transactions by configuring idle_in_transaction_session_timeout to a value tuned for your queries. Of course, always keep the behavior of your application in mind whenever you start tuning this parameter.
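A sketch of the setting (the five-minute and two-minute values are just examples, and the role name is hypothetical):
-- Terminate sessions that sit idle inside an open transaction for too long.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();

-- Or scope it to a single misbehaving application role.
ALTER ROLE reporting_app SET idle_in_transaction_session_timeout = '2min';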
Beyond tuning idle_in_transaction_session_timeout, monitor pg_stat_activity for any long-running queries or any sessions that have been waiting on client-related events for longer than expected. Keep an eye on the timestamps, the wait events, and the state columns.
backend_start | 2022-10-25 09:25:07.934633+00
xact_start | 2022-10-25 09:25:11.238065+00
query_start | 2022-10-25 09:25:11.238065+00
state_change | 2022-10-25 09:25:11.238381+00
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
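The output above comes from pg_stat_activity. A query along these lines (the five-minute threshold is arbitrary) surfaces sessions that have sat idle inside a transaction for too long:
-- Sessions idle inside an open transaction for more than five minutes.
SELECT pid
     , usename
     , state
     , wait_event_type
     , wait_event
     , now() - xact_start AS transaction_age
     , now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - INTERVAL '5 minutes'
ORDER BY xact_start;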
Besides these, prepared transactions (especially orphaned prepared transactions) can also hold onto key system resources (locks or the xmin value). I'd recommend establishing a naming convention for prepared transactions that defines their age. Say, a prepared transaction with a maximum age of five minutes may be created as PREPARE TRANSACTION 'foo_prepared 5m'.
SELECT gid
     , prepared
     , REGEXP_REPLACE(gid, '.* ', '') AS age
FROM pg_prepared_xacts
WHERE prepared + CAST(regexp_replace(gid, '.* ', '') AS INTERVAL) < NOW();
This provides a scheme for applications to define the age of their prepared transactions. A cron job or a scheduled task could then monitor and roll back any prepared transactions that remain active beyond their intended age.
Mistake #7: Over-indexing or under-indexing
Surely there's nothing wrong with over-indexing a relation. Or is there? To get the best performance out of your PostgreSQL instance, it is imperative that you understand how PostgreSQL manages indexes.
There are multiple types of indexes in PostgreSQL. Each has a different use case, and each has its own overheads. B-tree is the most commonly used index type, and it is used for primary keys as well. The past few major releases have seen many performance-related (and debloating) improvements in B-tree indexes. Here is one of my blog posts that discusses duplicate version churn in PostgreSQL 14.
When an index scan is executed on a relation, it accesses the heap for each matching tuple to fetch both data and visibility information, so that only the version visible to the current transaction is selected. Over-indexing causes more indexes to be updated, consuming more resources without reaping the desired benefits.
Similarly, under-indexing leads to more heap scans, which will potentially result in more I/O operations and therefore a drop in performance.
Indexing is not just about the number of indexes you have on a relation; it is about how optimized those indexes are for the desired use cases. Ideally, you'd want to hit an index-only scan every time, but there are limitations. Although B-tree indexes support index-only scans for all operators, GiST and SP-GiST indexes support them only for some operators. See the documentation for more details.
Following a simple checklist can help you validate that your system is optimally set up for indexes:
- Ensure that the configuration is set properly (e.g., random page cost is tuned for your hardware).
- Check that statistics are up to date, or at least that the analyze or vacuum commands are run on the relations with indexes. This ensures that statistics are more or less current so that the planner has a better chance of choosing an index scan.
- Create the right type of index (B-tree, hash, or another type).
- Use indexes on the right columns. Don't forget to include non-indexed columns to avoid heap access; not all index types allow covering indexes, so do check the documentation. (See the sketch after this list.)
- Get rid of unneeded indexes. See pg_statio_user_indexes for more insights into indexes and block hits.
- Understand the impact of covering indexes on features like deduplication, duplicate version churn, and index-only scans.
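To illustrate the covering-index point from the checklist (the table and columns are hypothetical), a B-tree index can INCLUDE non-key payload columns so that a query can be answered without touching the heap:
-- INCLUDE requires PostgreSQL 11 or later; names here are only examples.
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at)
    INCLUDE (total);

-- Once the visibility map is reasonably up to date, this query can use an
-- index-only scan because every referenced column is present in the index.
SELECT customer_id, created_at, total
FROM orders
WHERE customer_id = 42
  AND created_at >= now() - INTERVAL '30 days';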
See this wiki page on index maintenance for more useful queries.
Mistake #8: Inadequate backups and HA
HA is not just about keeping a service up and running. It's also about ensuring that the service responds within the defined acceptance criteria and that it meets the RPO (recovery point objective) and RTO (recovery time objective) targets. To match the uptime requirements and the number of nines you are targeting, refer to this wiki page for percentage calculations.