PostDock for Postgres on Docker for Windows - PGPOOL does not connect to the DBs

thePetester asked: August 26, 2018 at 04:45 in: postgresql

I am trying to get this setup working and I can't figure out why it fails. I have read about many people who download it and run it as is, but in my case pgpool never connects to the master or the slave. I took the docker-compose file from paunin's example in issue 57 and changed the image to the current postdock/postgres.

My docker-compose file is shown below, and I start it with the following command:

docker-compose -f .\basic.yml up -d

```
version: '2'
networks:
    cluster:
        driver: bridge

services:
    pgmaster:
        image: postdock/postgres
        environment:
            PARTNER_NODES: "pgmaster,pgslave1"
            NODE_ID: 1 # Integer number of node
            NODE_NAME: node1 # Node name
            CLUSTER_NODE_NETWORK_NAME: pgmaster
            POSTGRES_PASSWORD: monkey_pass
            POSTGRES_USER: monkey_user
            POSTGRES_DB: monkey_db
            CONFIGS: "listen_addresses:'*'"
        ports:
            - 5431:5432
        networks:
            cluster:
                aliases:
                    - pgmaster
    pgslave1:
        image: postdock/postgres
        environment:
            PARTNER_NODES: "pgmaster,pgslave1"
            REPLICATION_PRIMARY_HOST: pgmaster
            NODE_ID: 2
            NODE_NAME: node2
            CLUSTER_NODE_NETWORK_NAME: pgslave1
        ports:
            - 5441:5432
        networks:
            cluster:
                aliases:
                    - pgslave1

    pgpool:
        image: postdock/pgpool
        environment:
            PCP_USER: pcp_user
            PCP_PASSWORD: pcp_pass
            WAIT_BACKEND_TIMEOUT: 60
            CHECK_USER: monkey_user
            CHECK_PASSWORD: monkey_pass
            CHECK_PGCONNECT_TIMEOUT: 3
            DB_USERS: monkey_user:monkey_pass
            BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::"
            CONFIGS: "num_init_children:250,max_pool:4"
        ports:
            - 5432:5432
            - 9898:9898 # PCP
        networks:
            cluster:
                aliases:
                    - pgpool
```
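To make the `BACKENDS` value in the compose file easier to read, here is a minimal Python sketch that splits it into named fields. The field order (`index:host:port:weight:data_directory:flag`) and the defaults for empty fields are assumptions based on the comments I recall from the PostDock example compose file, not something confirmed in this post; `parse_backends`, `FIELDS`, and `DEFAULTS` are hypothetical helper names.

```python
from typing import Dict, List

# ASSUMPTION: field order and defaults as described in the PostDock example
# compose file; verify them against the image version you actually run.
FIELDS = ("index", "host", "port", "weight", "data_directory", "flag")
DEFAULTS = {
    "port": "5432",
    "weight": "1",
    "data_directory": "/var/lib/postgresql/data",
    "flag": "ALLOW_TO_FAILOVER",
}


def parse_backends(value: str) -> List[Dict[str, str]]:
    """Split a BACKENDS string into one dict per backend, filling empty fields."""
    backends = []
    for entry in value.split(","):
        parts = entry.split(":")
        # Pad short entries so every field name gets a (possibly empty) value.
        parts += [""] * (len(FIELDS) - len(parts))
        backend = {}
        for name, raw in zip(FIELDS, parts):
            backend[name] = raw or DEFAULTS.get(name, "")
        backends.append(backend)
    return backends


if __name__ == "__main__":
    value = "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::"
    for backend in parse_backends(value):
        print(backend)
```

Under these assumptions, the second entry `1:pgslave1::::` resolves to the same port, weight, data directory, and flag as the first, so both backends point at port 5432 on their respective network aliases.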

Both the master and the replica databases appear to come up. I can see both of them in pgAdmin, and I can create a table and see it in monkey_db. However, the table never propagates to the replica.

Here is the log for the master container:

```
PS C:\platform\docker\basic> docker logs basic_pgmaster_1
>>> Setting up STOP handlers...
>>> STARTING SSH (if required)...
No pre-populated ssh keys!
cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
>>> SSH is not enabled!
>>> STARTING POSTGRES...
>>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
>>> TUNING UP POSTGRES...
>>> Cleaning data folder which might have some garbage...
>>> Check all partner nodes for common upstream node...
>>>>>> Checking NODE=pgmaster...
psql: could not connect to server: Connection refused
        Is the server running on host "pgmaster" (172.22.0.3) and accepting
        TCP/IP connections on port 5432?
>>>>>> Skipping: failed to get master from the node!
>>>>>> Checking NODE=pgslave1...
psql: could not connect to server: Connection refused
        Is the server running on host "pgslave1" (172.22.0.2) and accepting
        TCP/IP connections on port 5432?
>>>>>> Skipping: failed to get master from the node!
>>> Auto-detected master name: ''
>>> Setting up repmgr...
>>> Setting up repmgr config file '/etc/repmgr.conf'...
>>> Setting up upstream node...
>>> Sending in background postgres start...
>>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
>>> Recovery is in progress:
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
waiting for server to start....2018-09-20 06:03:29.170 UTC [85] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-20 06:03:29.197 UTC [86] LOG:  database system was shut down at 2018-09-20 06:03:28 UTC
2018-09-20 06:03:29.202 UTC [85] LOG:  database system is ready to accept connections
 done
server started
CREATE DATABASE

CREATE ROLE

/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/entrypoint.sh
>>> Configuring /var/lib/postgresql/data/postgresql.conf
>>>>>> Config file was replaced with standard one!
>>>>>> Adding config 'listen_addresses'=''*''
>>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
>>> Creating replication user 'replication_user'
CREATE ROLE
>>> Creating replication db 'replication_db'

waiting for server to shut down...2018-09-20 06:03:30.494 UTC [85] LOG:  received fast shutdown request
.2018-09-20 06:03:30.514 UTC [85] LOG:  aborting any active transactions
2018-09-20 06:03:30.517 UTC [85] LOG:  worker process: logical replication launcher (PID 92) exited with exit code 1
2018-09-20 06:03:30.517 UTC [87] LOG:  shutting down
2018-09-20 06:03:30.542 UTC [85] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

2018-09-20 06:03:30.608 UTC [47] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2018-09-20 06:03:30.608 UTC [47] LOG:  listening on IPv6 address "::", port 5432
2018-09-20 06:03:30.616 UTC [47] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-20 06:03:30.646 UTC [131] LOG:  database system was shut down at 2018-09-20 06:03:30 UTC
2018-09-20 06:03:30.664 UTC [47] LOG:  database system is ready to accept connections
>>>>>> RECOVERY_WAL_ID is empty!
>>> Not in recovery state (anymore)
>>> Waiting for local postgres server start...
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
>>>>>> Schema replication_db.public exists on host pgmaster:5432!
>>> Registering node with role master
INFO: connecting to master database
INFO: master register: creating database objects inside the 'repmgr_pg_cluster' schema
INFO: retrieving node list for cluster 'pg_cluster'
[REPMGR EVENT] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-09-20 06:03:56.560674+00;  Details:
[REPMGR EVENT] will execute script '/usr/local/bin/cluster/repmgr/events/execs/master_register.sh' for the event
[REPMGR EVENT::master_register] Node id: 1; Event type: master_register; Success [1|0]: 1; Time: 2018-09-20 06:03:56.560674+00;  Details:
[REPMGR EVENT::master_register] Locking master...
[REPMGR EVENT::master_register] Unlocking standby...
NOTICE: master node correctly registered for cluster 'pg_cluster' with id 1 (conninfo: user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2)
>>> Starting repmgr daemon...
[2018-09-20 06:03:56] [NOTICE] looking for configuration file in current directory
[2018-09-20 06:03:56] [NOTICE] looking for configuration file in /etc
[2018-09-20 06:03:56] [NOTICE] configuration file found at: /etc/repmgr.conf
[2018-09-20 06:03:56] [INFO] connecting to database 'user=replication_user password=replication_pass host=pgmaster dbname=replication_db port=5432 connect_timeout=2'
[2018-09-20 06:03:56] [INFO] connected to database, checking its state
[2018-09-20 06:03:56] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
[2018-09-20 06:03:56] [INFO] checking node 1 in cluster 'pg_cluster'
[2018-09-20 06:03:56] [INFO] reloading configuration file
[2018-09-20 06:03:56] [INFO] configuration has not changed
[2018-09-20 06:03:56] [INFO] starting continuous master connection check
```

Here is the log for the slave. It looks like the primary DB was cloned successfully:

> ```
> 
> >>> Setting up STOP handlers...
> >>> STARTING SSH (if required)...
> No pre-populated ssh keys!
> cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
> >>> SSH is not enabled!
> >>> STARTING POSTGRES...
> >>> SETTING UP POLYMORPHIC VARIABLES (repmgr=3+postgres=9 | repmgr=4, postgres=10)...
> >>> TUNING UP POSTGRES...
> >>> Cleaning data folder which might have some garbage...
> >>> Check all partner nodes for common upstream node...
> >>>>>> Checking NODE=pgmaster...
> psql: could not connect to server: Connection refused
>         Is the server running on host "pgmaster" (172.22.0.3) and accepting
>         TCP/IP connections on port 5432?
> >>>>>> Skipping: failed to get master from the node!
> >>>>>> Checking NODE=pgslave1...
> psql: could not connect to server: Connection refused
>         Is the server running on host "pgslave1" (172.22.0.2) and accepting
>         TCP/IP connections on port 5432?
> >>>>>> Skipping: failed to get master from the node!
> >>> Auto-detected master name: ''
> >>> Setting up repmgr...
> >>> Setting up repmgr config file '/etc/repmgr.conf'...
> >>> Setting up upstream node...
> cat: /var/lib/postgresql/data/standby.lock: No such file or directory
> >>> Previously Locked standby upstream node LOCKED_STANDBY=''
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> psql: could not connect to server: Connection refused
>         Is the server running on host "pgmaster" (172.22.0.3) and accepting
>         TCP/IP connections on port 5432?
> >>>>>> Host pgmaster:5432 is not accessible (will try 30 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 29 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 28 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster is still not accessible on host pgmaster:5432 (will try 27 times more)
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> REPLICATION_UPSTREAM_NODE_ID=1
> >>> Sending in background postgres start...
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Starting standby node...
> >>> Instance hasn't been set up yet.
> >>> Clonning primary node...
> >>> Waiting for upstream postgres server...
> >>> Wait schema replication_db.repmgr_pg_cluster on pgmaster:5432(user: replication_user,password: *******), will try 30 times with delay 10 seconds (TIMEOUT=300)
> NOTICE: destination directory '/var/lib/postgresql/data' provided
> INFO: connecting to upstream node
> INFO: Successfully connected to upstream node. Current installation size is 37 MB
> INFO: checking and correcting permissions on existing directory /var/lib/postgresql/data ...
> >>>>>> Schema replication_db.repmgr_pg_cluster exists on host pgmaster:5432!
> >>> Waiting for cloning on this node is over(if any in progress): CLEAN_UP_ON_FAIL=, INTERVAL=30
> >>> Replicated: 4
> NOTICE: starting backup (using pg_basebackup)...
> INFO: executing: '/usr/lib/postgresql/10/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/postgresql/data -h pgmaster -p 5432 -U replication_user -c fast -X stream -S repmgr_slot_2 '
> NOTICE: standby clone (using pg_basebackup) complete
> NOTICE: you can now start your PostgreSQL server
> HINT: for example : pg_ctl -D /var/lib/postgresql/data start
> HINT: After starting the server, you need to register this standby with "repmgr standby register"
> [REPMGR EVENT] Node id: 2; Event type: standby_clone; Success [1|0]: 1; Time: 2018-09-20 06:04:08.427899+00;  Details: Cloned from host 'pgmaster', port 5432; backup method: pg_basebackup; --force: Y
> >>> Configuring /var/lib/postgresql/data/postgresql.conf
> >>>>>> Will add configs to the exists file
> >>>>>> Adding config 'shared_preload_libraries'=''repmgr_funcs''
> >>> Starting postgres...
> >>> Waiting for local postgres server recovery if any in progress:LAUNCH_RECOVERY_CHECK_INTERVAL=30
> >>> Recovery is in progress:
> 2018-09-20 06:04:08.517 UTC [163] LOG:  listening on IPv4 address "0.0.0.0", port 5432
> 2018-09-20 06:04:08.517 UTC [163] LOG:  listening on IPv6 address "::", port 5432
> 2018-09-20 06:04:08.521 UTC [163] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
> 2018-09-20 06:04:08.549 UTC [171] LOG:  database system was interrupted; last known up at 2018-09-20 06:04:06 UTC
> 2018-09-20 06:04:09.894 UTC [171] LOG:  entering standby mode
> 2018-09-20 06:04:09.903 UTC [171] LOG:  redo starts at 0/2000028
> 2018-09-20 06:04:09.908 UTC [171] LOG:  consistent recovery state reached at 0/20000F8
> 2018-09-20 06:04:09.908 UTC [163] LOG:  database system is ready to accept read only connections
> 2018-09-20 06:04:09.916 UTC [175] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
> >>> Cloning is done
> >>>>>> WAL id: 000000010000000000000003
> >>>>>> WAL_RECEIVER_FLAG=1!
> >>> Not in recovery state (anymore)
> >>> Waiting for local postgres server start...
> >>> Wait schema replication_db.public on pgslave1:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90)
> >>>>>> Schema replication_db.public exists on host pgslave1:5432!
> >>> Unregister the node if it was done before
> DELETE 0
> >>> Registering node with role standby
> INFO: connecting to standby database
> INFO: connecting to master database
> INFO: retrieving node list for cluster 'pg_cluster'
> INFO: registering the standby
> [REPMGR EVENT] Node id: 2; Event type: standby_register; Success [1|0]: 1; Time: 2018-09-20 06:04:38.676889+00;  Details:
> INFO: standby registration complete
> NOTICE: standby node correctly registered for cluster pg_cluster with id 2 (conninfo: user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2)
>  Locking standby (NEW_UPSTREAM_NODE_ID=1)...
> >>> Starting repmgr daemon...
> [2018-09-20 06:04:38] [NOTICE] looking for configuration file in current directory
> [2018-09-20 06:04:38] [NOTICE] looking for configuration file in /etc
> [2018-09-20 06:04:38] [NOTICE] configuration file found at: /etc/repmgr.conf
> [2018-09-20 06:04:38] [INFO] connecting to database 'user=replication_user password=replication_pass host=pgslave1 dbname=replication_db port=5432 connect_timeout=2'
> [2018-09-20 06:04:38] [INFO] connected to database, checking its state
> [2018-09-20 06:04:38] [INFO] connecting to master node of cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] retrieving node list for cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] checking role of cluster node '1'
> [2018-09-20 06:04:38] [INFO] checking cluster configuration with schema 'repmgr_pg_cluster'
> [2018-09-20 06:04:38] [INFO] checking node 2 in cluster 'pg_cluster'
> [2018-09-20 06:04:38] [INFO] reloading configuration file
> [2018-09-20 06:04:38] [INFO] configuration has not changed
> [2018-09-20 06:04:38] [INFO] starting continuous standby node monitoring
> ```

Here is the pgpool log:

> ```
> >>> STARTING SSH (if required)...
> cp: cannot stat '/home/postgres/.ssh/keys/*': No such file or directory
> No pre-populated ssh keys!
> >>> SSH is not enabled!
> >>> TURNING PGPOOL...
> >>> Opening access from all hosts by md5 in /usr/local/etc/pool_hba.conf
> >>> Adding user pcp_user for PCP
> >>> Creating a ~/.pcppass file for pcp_user
> >>> Adding users for md5 auth
> >>>>>> Adding user monkey_user
> >>> Adding check user 'monkey_user' for md5 auth
> >>> Adding user 'monkey_user' as check user
> >>> Adding user 'monkey_user' as health-check user
> >>> Adding backends
> >>>>>> Waiting for backend 0 to start pgpool (WAIT_BACKEND_TIMEOUT=60)
> 2018/09/20 06:03:26 Waiting for host: tcp://pgmaster:5432
> 2018/09/20 06:04:26 Timeout after 1m0s waiting on dependencies to become available: [tcp://pgmaster:5432]
> >>>>>> Will not add node 0 - it's unreachable!
> >>>>>> Waiting for backend 1 to start pgpool (WAIT_BACKEND_TIMEOUT=60)
> 2018/09/20 06:04:26 Waiting for host: tcp://pgslave1:5432
> 2018/09/20 06:05:26 Timeout after 1m0s waiting on dependencies to become available: [tcp://pgslave1:5432]
> >>>>>> Will not add node 1 - it's unreachable!
> >>> Checking if we have enough backends to start
> >>>>>> Will start pgpool REQUIRE_MIN_BACKENDS=0, BACKENDS_COUNT=0
> >>> Configuring /usr/local/etc/pgpool.conf
> >>>>>> Adding config 'num_init_children' with value '250'
> >>>>>> Adding config 'max_pool' with value '4'
> >>> STARTING PGPOOL...
> 2018-09-20 06:05:26: pid 62: LOG:  Backend status file /var/log/postgresql/pgpool_status does not exist
> 2018-09-20 06:05:26: pid 62: LOG:  Setting up socket for 0.0.0.0:5432
> 2018-09-20 06:05:26: pid 62: LOG:  Setting up socket for :::5432
> 2018-09-20 06:05:26: pid 62: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
> 2018-09-20 06:05:26: pid 320: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:05:26: pid 320: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:05:26: pid 320: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:05:26: pid 62: LOG:  child process with pid: 320 exits with status 256
> 2018-09-20 06:05:26: pid 62: LOG:  fork a new child process with pid: 333
> 2018-09-20 06:06:26: pid 319: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:06:26: pid 319: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:06:26: pid 319: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:06:26: pid 62: LOG:  child process with pid: 319 exits with status 256
> 2018-09-20 06:06:26: pid 62: LOG:  fork a new child process with pid: 351
> 2018-09-20 06:07:26: pid 333: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:07:26: pid 333: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:07:26: pid 333: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:07:26: pid 62: LOG:  child process with pid: 333 exits with status 256
> 2018-09-20 06:07:26: pid 62: LOG:  fork a new child process with pid: 370
> 2018-09-20 06:08:26: pid 370: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:08:26: pid 370: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:08:26: pid 370: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:08:26: pid 62: LOG:  child process with pid: 370 exits with status 256
> 2018-09-20 06:08:26: pid 62: LOG:  fork a new child process with pid: 388
> 2018-09-20 06:09:27: pid 302: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:09:27: pid 302: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:09:27: pid 302: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:09:27: pid 62: LOG:  child process with pid: 302 exits with status 256
> 2018-09-20 06:09:27: pid 62: LOG:  fork a new child process with pid: 406
> 2018-09-20 06:10:27: pid 316: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:10:27: pid 316: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:10:27: pid 316: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:10:27: pid 62: LOG:  child process with pid: 316 exits with status 256
> 2018-09-20 06:10:27: pid 62: LOG:  fork a new child process with pid: 424
> 2018-09-20 06:11:27: pid 351: FATAL:  pgpool is not accepting any new connections
> 2018-09-20 06:11:27: pid 351: DETAIL:  all backend nodes are down, pgpool requires at least one valid node
> 2018-09-20 06:11:27: pid 351: HINT:  repair the backend nodes and restart pgpool
> 2018-09-20 06:11:27: pid 62: LOG:  child process with pid: 351 exits with status 256
> 2018-09-20 06:11:27: pid 62: LOG:  fork a new child process with pid: 442
> ```

I thought it was a problem with WAL shipping, but judging by the logs the slave clones the DB successfully and registers as well. It looks like something on the PGPOOL side, and I can't see what I'm missing.
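The pgpool log shows a plain TCP wait on `pgmaster:5432` and `pgslave1:5432` timing out, so one way to narrow this down is to repeat that check by hand. Below is a minimal Python sketch of such a probe; it is not the tool the pgpool image uses, and `wait_for_tcp` is a hypothetical helper name. It assumes it is run from a container attached to the same `cluster` network that has Python available (from the Windows host, the published ports 5431 and 5441 on localhost would be the targets instead).

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True as soon as host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect is enough; the socket is closed right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, DNS failures, and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Host names are the network aliases from the compose file in the question.
    for host in ("pgmaster", "pgslave1"):
        ok = wait_for_tcp(host, 5432, timeout=10)
        print(f"{host}:5432 reachable: {ok}")
```

If this probe succeeds from another container while the pgpool container still times out, that would point at name resolution or networking inside the pgpool container rather than at Postgres or repmgr.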

Any help would be greatly appreciated.

Thanks.

0 answers