View Issue Details

ID: 0016411
Project: CentOS-7
Category: samba
View Status: public
Last Update: 2019-10-09 23:16
Reporter: dipsspid
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Product Version: 7.7-1908
Target Version:
Fixed in Version:
Summary: 0016411: packaging issues with ctdb
Description: The ctdb package appears to be missing something it needs to function properly.
If you install ctdb and set it up, the daemon fails at startup with:

ctdbd[5547]: CTDB starting on node
ctdbd[5548]: Starting CTDBD (Version 4.9.3) as PID: 5548
ctdbd[5548]: Created PID file /var/run/ctdb/ctdbd.pid
ctdbd[5548]: Removed stale socket /var/run/ctdb/ctdbd.socket
ctdbd[5548]: Listening to ctdb socket /var/run/ctdb/ctdbd.socket
ctdbd[5548]: Set real-time scheduler priority
ctdbd[5548]: Starting event daemon /usr/libexec/ctdb/ctdb-eventd -P 5548 -S
ctdbd[5548]: Set runstate to INIT (1)
ctdbd[5548]: ctdb exiting with error: Failed to run init event
ctdbd[5548]:
ctdbd[5548]: CTDB daemon shutting down
ctdb-eventd[5550]: PID 5548 gone away, exiting

with no hint as to why it fails.
Running ctdbd through strace gave me a clue.

A missing directory is causing this, so running:
mkdir -p /etc/ctdb/events/legacy
to create the missing legacy directory gets the daemon a bit further.

This time, however, it errors on:
ctdbd[4472]: tdb(/var/lib/ctdb/state/persistent_health.tdb.0): tdb_open_ex: could not open file /var/lib/ctdb/state/persistent_health.tdb.0: No such file or directory

So create the missing state directory:
mkdir -p /var/lib/ctdb/state

The daemon now starts, but it has trouble taking over the cluster IP.
Steps To Reproduce:
1. Install ctdb.
2. Set up ctdb.
3. Try to start the daemon and watch it fail with next to no clues in the log.
Tags: No tags attached.

Activities

dipsspid

2019-09-18 13:32

reporter   ~0035116

The version of CTDB used is CTDBD (Version 4.9.1)
dipsspid

2019-09-23 07:34

reporter   ~0035198

# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
dipsspid

2019-09-27 07:15

reporter   ~0035249

Any news on this? Was it possible to replicate the issue?
ailiev

2019-09-29 11:29

reporter   ~0035265

I can confirm that this is reproducible using the steps from this report, and also that the fixes mentioned (creating the /etc/ctdb/events/legacy and /var/lib/ctdb/state directories) help.
dipsspid

2019-10-01 12:57

reporter   ~0035282

Hi ailiev,

Creating the needed directories does not fix the takeover process; it only allows ctdb to start.
farkey_2000

2019-10-05 14:22

reporter   ~0035336

I am experiencing the same problem. Adding the directories, as ailiev suggests, allows the service to start; however, none of the VIPs/public addresses seem to be coming up.
ailiev

2019-10-05 16:09

reporter   ~0035337

dipsspid, farkey_2000, this is correct. I also needed to create the /var/lib/ctdb/persistent and /var/lib/ctdb/volatile directories before ctdb was functional.
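The directory workaround from this report (/etc/ctdb/events/legacy and /var/lib/ctdb/state in the original description, /var/lib/ctdb/persistent and /var/lib/ctdb/volatile from ailiev) can be collected into one step. The following is a minimal sketch, not something shipped with the ctdb package: create_ctdb_dirs is a hypothetical helper name, and the optional path-prefix argument is an assumption added only so the script can be tried in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper (not part of ctdb): creates the directories that,
# according to this report, the CentOS 7.7 ctdb package does not ship.
# Optional $1 is a path prefix (an assumption, for trying it in a scratch
# directory); with no argument the real system paths are used, which needs root.
create_ctdb_dirs() {
    prefix="${1:-}"
    for d in \
        /etc/ctdb/events/legacy \
        /var/lib/ctdb/state \
        /var/lib/ctdb/persistent \
        /var/lib/ctdb/volatile
    do
        # -p: no error if the directory already exists, parents created as needed
        mkdir -p "${prefix}${d}" || return 1
    done
}
```

On a real node it would be sourced and run as root with no argument. As the later notes say, this only lets ctdbd start; it does not make the public addresses come up.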
dipsspid

2019-10-07 07:30

reporter   ~0035354

I created all the needed directories. The daemon starts, but the takeover process still has issues:

The log reports this:
<cut>
2019/10/07 09:21:39.701502 ctdbd[7631]: monitor event OK - node re-enabled
2019/10/07 09:21:39.702375 ctdbd[7631]: Node became HEALTHY. Ask recovery master to reallocate IPs
2019/10/07 09:21:39.703981 ctdb-recoverd[7636]: Node 0 has changed flags - now 0x0 was 0x2
2019/10/07 09:21:39.724479 ctdb-recoverd[7636]: Unassigned IP 192.168.190.100 can be served by this node
2019/10/07 09:21:39.724519 ctdb-recoverd[7636]: Unassigned IP 192.168.189.100 can be served by this node
2019/10/07 09:21:39.724735 ctdb-recoverd[7636]: Trigger takeoverrun
2019/10/07 09:21:39.725001 ctdb-recoverd[7636]: Takeover run starting
2019/10/07 09:21:39.730178 ctdbd[7631]: Takeover of IP 192.168.190.100/24 on interface ens224
2019/10/07 09:21:39.730629 ctdbd[7631]: Takeover of IP 192.168.189.100/24 on interface ens192
2019/10/07 09:21:39.731128 ctdb-recoverd[7636]: Takeover run completed successfully
2019/10/07 09:21:40.726758 ctdb-recoverd[7636]: Assigned IP 192.168.190.100 not on an interface
2019/10/07 09:21:40.726887 ctdb-recoverd[7636]: Assigned IP 192.168.189.100 not on an interface
2019/10/07 09:21:40.726916 ctdb-recoverd[7636]: Trigger takeoverrun
<cut>
These messages repeat several times and no virtual IP is assigned.
I tried changing the permissions on the previously created directories to 777, but no luck :(

@farkey_2000: what is your experience? Did the actions suggested by ailiev fix it on your side?
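The repeated "Assigned IP ... not on an interface" lines suggest that, after each takeover run, the public addresses are not actually present on ens224/ens192. One way to check that directly on a node is to inspect the output of iproute2's `ip -o -4 addr show`, which prints one line per address. The sketch below assumes that one-line format; ip_on_interface is a hypothetical helper name, and it only reports what is configured, it does not explain why the takeover fails.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if address $2 is configured on interface $1,
# judged from `ip -o -4 addr show` output read on stdin. Assumed line format:
#   2: ens224    inet 192.168.190.100/24 brd 192.168.190.255 scope global ens224 ...
ip_on_interface() {
    awk -v ifc="$1" -v addr="$2" '
        # field 2 = interface name, field 3 = "inet", field 4 = "address/prefix";
        # matching addr "/" at the start avoids 192.168.190.10 matching .100
        $2 == ifc && $3 == "inet" && index($4, addr "/") == 1 { found = 1 }
        # exit status 0 if a match was seen, 1 otherwise
        END { exit !found }
    '
}
```

For example, `ip -o -4 addr show | ip_on_interface ens224 192.168.190.100 || echo "192.168.190.100 is not on ens224"` would print the message while the takeover loop above keeps repeating.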
dipsspid

2019-10-07 07:33

reporter   ~0035355

I also notice that no lock file is created (from ctdbd.conf, variable CTDB_RECOVERY_LOCK). I tried different locations, on both shared and local storage.
farkey_2000

2019-10-09 23:16

reporter   ~0035419

No, all of the fixes (directory creation, etc.) only allow the service to start. None of the public addresses are "pingable" or get assigned. I followed up with the Samba team, and their response was that the problem described in this bug lies with the backported 4.9 version CentOS is pushing in the repos. They suggest upgrading to 4.11, and the RPM is currently in FC 32. I have not tried this option yet.
Thanks

Issue History

Date Modified Username Field Change
2019-09-18 13:16 dipsspid New Issue
2019-09-18 13:32 dipsspid Note Added: 0035116
2019-09-23 07:34 dipsspid Note Added: 0035198
2019-09-27 07:15 dipsspid Note Added: 0035249
2019-09-29 11:29 ailiev Note Added: 0035265
2019-10-01 12:57 dipsspid Note Added: 0035282
2019-10-05 14:22 farkey_2000 Note Added: 0035336
2019-10-05 16:09 ailiev Note Added: 0035337
2019-10-07 07:30 dipsspid Note Added: 0035354
2019-10-07 07:33 dipsspid Note Added: 0035355
2019-10-09 23:16 farkey_2000 Note Added: 0035419