Restore a postgres cluster from files

All we need is an easy explanation of the problem, so here it is.

Not a postgres pro and the DBs in question are not that important (personal NAS), but being a technically minded person and knowing MS SQL / MySQL fairly well I thought that maybe I can give this a go.

I backed up the complete database folder via rsync -a to another HDD. And yes, I know this is not the way to go about things, but yet I did.

After moving the database folder back, the postgres instance (cluster? whatever the name) seems to be broken.

I guess the OIDs of the database are out of sync (or whatever the correct name)? See the following:

postgres=# \l
 caldav         | unknown (OID=16728) | SQL_ASCII | C       | C     |
 calendar       | unknown (OID=16726) | SQL_ASCII | C       | C     |
 mediaserver    | MediaIndex          | SQL_ASCII | C       | C     |
 notestation    | unknown (OID=16730) | SQL_ASCII | C       | C     |
 ong            | unknown (OID=16734) | SQL_ASCII | C       | C     |
 photo          | unknown (OID=16732) | SQL_ASCII | C       | C     |
 postgres       | postgres            | SQL_ASCII | C       | C     |
 synocontacts   | Contacts            | UTF8      | C       | C     |
 synodrive      | postgres            | SQL_ASCII | C       | C     |
 template0      | postgres            | SQL_ASCII | C       | C     | =c/postgres          +
                |                     |           |         |       | postgres=CTc/postgres
 template1      | postgres            | SQL_ASCII | C       | C     | =c/postgres          +
                |                     |           |         |       | postgres=CTc/postgres
 video_metadata | unknown (OID=16735) | SQL_ASCII | C       | C     |

or via select:

postgres=# SELECT * FROM pg_database
postgres-# WHERE datistemplate = false;
 postgres       |     10 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 mediaserver    |  16384 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 calendar       |  16726 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 caldav         |  16728 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 notestation    |  16730 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 photo          |  16732 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 video_metadata |  16735 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 synodrive      |     10 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 ong            |  16734 |        0 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |
 synocontacts   |  18876 |        6 | C          | C        | f             | t            |           -1 |         11814 |          179 |          1 |          1663 |

When trying to change to any of the databases I get a cache lookup failed:

postgres=# \c caldav
FATAL:  cache lookup failed for database 16385
Previous connection kept

Frankly pretty much only template0, template1 work. When checking the /base folder the folder names don’t match with the OIDs I see via \l:

--- /.../@database/pgsql/base --------------------------------------------------------------------------------------------------------------------------------------------------------
                         /..
  641.0 MiB [##########] /16403
   79.9 MiB [#         ] /16397
   29.6 MiB [          ] /16385
   21.8 MiB [          ] /18877
   12.9 MiB [          ] /16400
    9.6 MiB [          ] /31870
    7.7 MiB [          ] /16736
    7.4 MiB [          ] /16729
    7.4 MiB [          ] /16394
    6.9 MiB [          ] /16399
    6.8 MiB [          ] /17940
    6.7 MiB [          ] /16731
    6.6 MiB [          ] /16398
    6.1 MiB [          ] /16744
    6.1 MiB [          ] /16401
    6.0 MiB [          ] /11819
    5.9 MiB [          ] /16402
    5.9 MiB [          ] /23529
    5.9 MiB [          ] /16733
    5.9 MiB [          ] /11814
    5.9 MiB [          ] /1
    5.8 MiB [          ] /16727
e   0.0   B [          ] /pgsql_tmp

But as you can tell the folders with the database files contain data.

I also believe the users seem to be "of" since they’re all not member of anything.

postgres=# \du
                                      List of roles
         Role name          |                   Attributes                   | Member of
----------------------------+------------------------------------------------+-----------
 AudioStation               |                                                | {}
 CardDAV                    | Create DB                                      | {}
 Contacts                   | Create DB                                      | {}
 MediaIndex                 |                                                | {}
 NoteStation                |                                                | {}
 PhotoStation               |                                                | {}
 SynologyApplicationService | Create DB                                      | {}
 VideoStation               | Create DB                                      | {}
 postgres                   | Superuser, Create role, Create DB, Replication | {}
 synocalendar               |                                                | {}

I’d be happy about any pointers. How do I figure out what databases belong to what folder name / OID? Should I just go ahead and edit pg_database once I figure out the right IDs?

I didn’t try anything else with this cluster yet and I do still have the rsync -a folder tucked away. I just don’t know how to continue now. Aren’t the folders just containing database files? In other DBs I know I could just reattach the files to the new server – I guess not with postgres?


EDIT: I actually copied more than just base:

drwx------ 1 postgres postgres  412 Jul 11 12:57 .
drwxr-xr-x 1 admin    users     452 Jul  5 10:12 ..
drwx------ 1 postgres postgres  230 Jul  5 00:16 base
drwx------ 1 postgres postgres  750 Jul 11 18:00 global
drwx------ 1 postgres postgres    8 Dec 31  2019 pg_clog
-rw------- 1 postgres postgres 4564 Jul 11 12:54 pg_hba.conf
-rw------- 1 postgres postgres 1636 Dec 31  2019 pg_ident.conf
drwx------ 1 postgres postgres   28 Dec 31  2019 pg_multixact
drwx------ 1 postgres postgres    8 Jul 11 12:57 pg_notify
drwx------ 1 postgres postgres    0 Dec 31  2019 pg_serial
drwx------ 1 postgres postgres    0 Dec 31  2019 pg_snapshots
drwx------ 1 postgres postgres    0 Dec 31  2019 pg_stat
drwx------ 1 postgres postgres    0 Dec 31  2019 pg_stat_tmp
drwx------ 1 postgres postgres   16 Jul  2 15:33 pg_subtrans
drwx------ 1 postgres postgres   10 Jul  3 08:32 pg_tblspc
drwx------ 1 postgres postgres    0 Dec 31  2019 pg_twophase
-rw------- 1 postgres postgres    4 Dec 31  2019 PG_VERSION
drwx------ 1 postgres postgres  604 Jul  8 03:20 pg_xlog
lrwxrwxrwx 1 postgres postgres   31 Jul  3 16:05 postgresql.conf -> /etc/postgresql/postgresql.conf
-rw------- 1 postgres postgres   45 Jul 11 12:57 postmaster.opts
-rw------- 1 postgres postgres   88 Jul 11 12:57 postmaster.pid
-rw------- 1 postgres postgres 1094 Jan 13  2020 .psql_history

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Using rsync is fine as long as the database were shut down at the time, which apparently was not the case here. But what were you doing while the rsync was running? Did you start the rsync when the cluster was empty, and then run it very slowly (low --bwlimit) while you also restored large dump files into the system? The amount of corruption you are seeing here just in the pg_user and pg_database tables is baffling and bodes ill for the rest of your data. I would say your best course is to learn your lesson and start over from scratch.

I’m not sure I could create this level of corruption on a still-runnable system if I intentionally set out to do so. There are so many subdirectories in ‘base’ which have no corresponding entry in pg_database, and so many entries in pg_database whose owner has no corresponding entry in pg_user.

folder names don’t match with the OIDs I see via \l:

The OID you see in \l are the OID of owners, not of the databases. It tried to convert the owner OID to the owner name, but couldn’t due to corruption in pg_user, so it shows the owners raw OID instead.

To see the OID of the databases themselves (in your version of PostgreSQL), you would do

select oid, * from pg_database

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply