Postgres (9.6) hangs on start after storage upgrade

I have been running Postgres on a Linode with the data directory mapped to an external volume. Yesterday Linode prompted me to upgrade my storage to NVMe so I did. Unfortunately, following that, Postgres is unable to start.

When I attempt to start the process, it just hangs with no output. It also cannot be stopped at that point, which suggests it is in an "uninterruptible sleep" state.
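One way to confirm the uninterruptible-sleep suspicion is to read the process state from `/proc`. The following is a minimal sketch, assuming a Linux host; the helper names `parse_proc_state` and `is_uninterruptible` are made up for illustration and are not part of Postgres or any library.

```python
from pathlib import Path


def parse_proc_state(status_text: str) -> str:
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)"; the code is the second field.
            return line.split()[1]
    raise ValueError("no 'State:' line found")


def is_uninterruptible(pid: int) -> bool:
    """True if the process is in state D (uninterruptible sleep, usually waiting on I/O)."""
    status_text = Path(f"/proc/{pid}/status").read_text()
    return parse_proc_state(status_text) == "D"
```

Calling `is_uninterruptible(1752)` with the PID from the log would return `True` for a process stuck in state `D`. Such a process ignores signals, including `SIGKILL`, until the blocked I/O call returns, which would match the observation that it cannot be stopped.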

I’ve started Postgres with debug enabled and it doesn’t output anything useful (as best I can tell):

2022-03-11 01:39:36 EST [1752-1] DEBUG:  postgres: PostmasterMain: initial environment dump:
2022-03-11 01:39:36 EST [1752-2] DEBUG:  -----------------------------------------
2022-03-11 01:39:36 EST [1752-3] DEBUG:     TERM=xterm-256color
2022-03-11 01:39:36 EST [1752-4] DEBUG:     LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
2022-03-11 01:39:36 EST [1752-5] DEBUG:     PATH=/home/user/bin:/home/user/.nvm/versions/node/v14.17.0/bin:/home/user/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
2022-03-11 01:39:36 EST [1752-6] DEBUG:     LANG=en_ZA.UTF-8
2022-03-11 01:39:36 EST [1752-7] DEBUG:     HOME=/home/user
2022-03-11 01:39:36 EST [1752-8] DEBUG:     MAIL=/var/mail/postgres
2022-03-11 01:39:36 EST [1752-9] DEBUG:     LOGNAME=postgres
2022-03-11 01:39:36 EST [1752-10] DEBUG:    USER=postgres
2022-03-11 01:39:36 EST [1752-11] DEBUG:    USERNAME=postgres
2022-03-11 01:39:36 EST [1752-12] DEBUG:    SHELL=/bin/bash
2022-03-11 01:39:36 EST [1752-13] DEBUG:    SUDO_COMMAND=/usr/lib/postgresql/9.6/bin/postgres -d 3 -D /mnt/project-backup/postgres/project/data -c config_file=/etc/postgresql/9.6/project_db/postgresql.conf
2022-03-11 01:39:36 EST [1752-14] DEBUG:    SUDO_USER=user
2022-03-11 01:39:36 EST [1752-15] DEBUG:    SUDO_UID=1000
2022-03-11 01:39:36 EST [1752-16] DEBUG:    SUDO_GID=1000
2022-03-11 01:39:36 EST [1752-17] DEBUG:    PGLOCALEDIR=/usr/share/locale
2022-03-11 01:39:36 EST [1752-18] DEBUG:    PGSYSCONFDIR=/etc/postgresql-common
2022-03-11 01:39:36 EST [1752-19] DEBUG:    LC_COLLATE=en_ZA.UTF-8
2022-03-11 01:39:36 EST [1752-20] DEBUG:    LC_CTYPE=en_ZA.UTF-8
2022-03-11 01:39:36 EST [1752-21] DEBUG:    LC_MESSAGES=en_ZA.UTF-8
2022-03-11 01:39:36 EST [1752-22] DEBUG:    LC_MONETARY=C
2022-03-11 01:39:36 EST [1752-23] DEBUG:    LC_NUMERIC=C
2022-03-11 01:39:36 EST [1752-24] DEBUG:    LC_TIME=C
2022-03-11 01:39:36 EST [1752-25] DEBUG:  -----------------------------------------

When I look in the process’s file descriptor folder I also don’t see anything obviously weird:

lrwx------ 1 postgres postgres 64 Mar 11 01:39 0 -> /dev/pts/0
lrwx------ 1 postgres postgres 64 Mar 11 01:39 1 -> /dev/pts/0
lrwx------ 1 postgres postgres 64 Mar 11 01:39 2 -> /dev/pts/0
lr-x------ 1 postgres postgres 64 Mar 11 01:39 3 -> /dev/urandom
lrwx------ 1 postgres postgres 64 Mar 11 01:39 4 -> /mnt/project-backup/postgres/project/data/postmaster.pid

The postmaster.pid file looks like this:

1752
/mnt/project-backup/postgres/project/data
1646980776
5437

Any idea what could be happening here and how I can fix it? If I can’t recover from the current situation, is there at least some other way to recover the data?

How to solve:

Method 1

The root cause turned out to be an incompatibility between the OS and the attached volume. The entire directory / filesystem was broken in a way that caused I/O to hang, and the Postgres hang was just a symptom of that larger issue.
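A hang like this can be checked without involving Postgres at all, by probing the mount point from a throwaway child process. The sketch below is only an illustration under assumptions: the function name `io_responds` and the 5-second default timeout are invented here, and the probe only lists a directory, so it will not catch every kind of broken I/O.

```python
import subprocess
import sys


def io_responds(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if a directory listing of `path` finishes within timeout_s seconds.

    A listing that fails quickly (for example, a missing path) still counts as a
    response; only a probe that never returns is reported as a hang.
    """
    # Run the listing in a child process so a hung filesystem blocks the child,
    # not this script.
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        probe.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # No kill-and-wait here: a child stuck in uninterruptible sleep cannot be
        # killed, and waiting for it would hang this script as well.
        return False
    return True
```

For the setup in the question, `io_responds("/mnt/project-backup")` returning `False` would point at the volume rather than at Postgres. The child process is intentionally left behind on timeout, since it may not be killable until the underlying I/O call returns.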

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.