View Issue Details

ID:              0002416
Project:         CentOS-4
Category:        4Suite
View Status:     public
Last Update:     2007-11-27 17:02
Reporter:        butty
Priority:        normal
Severity:        block
Reproducibility: always
Status:          assigned
Resolution:      open
Product Version: 4.5 - x86_64
Target Version:
Fixed in Version:
Summary:         0002416: GFS with DLM blocks after a while
DescriptionI have a cluster of 2 nodes (HPProLiant BL460c G1) attached to a fibre channel SAN (HP EVA 4000) to share a GFS filesystem.

It works correctly for 4 or 5 days, but then the following message is logged several times in /var/log/messages:
"kernel: Extra connection from node * attempted"
where * is the name of the other server. Sometimes this message appears on both servers and sometimes on only one of them.

From that moment on we can't use the GFS-mounted disk. Any access to it hangs forever (so the processes which need it hang there too). The only way to free the session is to reboot one of the machines. When we reboot one of them, all the blocked processes (on the server which is still running) continue normally.

This happens every few days.

My cluster.conf file is pasted below. If you need anything else, just ask for it.

Thank you
Additional Information:
<?xml version="1.0"?>
<cluster alias="tibcocluster" config_version="9" name="tibcocluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="perseo-01-hb" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="perseo-01-ilo"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="perseo-02-hb" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="perseo-02-ilo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ilo" hostname="10.15.187.174" login="tibcouser" name="perseo-01-ilo" passwd="PASSWORD"/>
                <fencedevice agent="fence_ilo" hostname="10.15.187.175" login="tibcouser" name="perseo-02-ilo" passwd="PASSWORD"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources>
                        <script file="/opt/scripts/aliasrelocate" name="aliasrelocate"/>
                </resources>
                <service autostart="1" exclusive="1" name="ServicioAlias" recovery="relocate">
                        <script ref="aliasrelocate"/>
                </service>
        </rm>
</cluster>
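
For reference, a minimal sketch of the state worth capturing on both nodes the next time the mount hangs, assuming the CentOS 4 / RHEL 4 in-kernel cman, where cluster state is exposed under /proc/cluster (paths and tool output differ on later releases):

        # Membership and quorum as each node currently sees it
        cman_tool status
        cman_tool nodes

        # Fence domain, DLM lockspaces and GFS service groups with their states
        cat /proc/cluster/services

        # Processes stuck in uninterruptible sleep (D state) and the kernel function they are waiting in
        ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /D/'

Comparing this output from the two nodes against the timestamps of the "Extra connection from node * attempted" messages should show whether cman still considers both members alive while DLM requests are stalled.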
Tags: No tags attached.

Activities

mikeytag - 2007-11-27 17:00
File Added: cluster.conf (2,793 bytes)

mikeytag (reporter) - 2007-11-27 17:01 - Note ~0006414

I would like to confirm that I am seeing the same thing on my cluster. Attached below is my cluster.conf file.

Is it possible that this bug is caused by the kmod-gfs version that is installed?
kmod-gfs.x86_64 0.1.16-6.2.6.18_8.1.15 installed

According to this URL: https://rhn.redhat.com/errata/RHBA-2007-0577.html
the new version put out by RH is 0.1.19, and the notes there list several bug fixes. I am wondering if this would fix our issues.

BTW, I can verify that no node is failing when my GFS locks up; even cman_tool reports that everything is active and nothing has been fenced. I also do not get any messages in /var/log/messages; the GFS mount simply hangs when I try to use it (cd, ls, anything).
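
A rough way to check that suspicion against the errata, assuming yum has access to the update repositories (kmod-gfs is built against one specific kernel build, so the matching kernel update normally has to be pulled in with it):

        # Versions currently installed
        rpm -q kmod-gfs kernel

        # Whether the 0.1.19 kmod-gfs from RHBA-2007-0577 (and its matching kernel) is offered as an update
        yum list updates kmod-gfs kernel

        # Apply both together, then reboot into the new kernel
        yum update kmod-gfs kernel

This only confirms whether the newer module is available; whether 0.1.19 actually fixes the hang would still have to be checked against the bug list in the errata notes.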
mikeytag (reporter) - 2007-11-27 17:02 - Note ~0006416

Sorry guys, I was having a hard time uploading my cluster.conf file. Here it is:

<?xml version="1.0"?>
<cluster config_version="17" name="san1">
        <fence_daemon post_fail_delay="0" post_join_delay="120"/>
        <clusternodes>
                <clusternode name="db1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:30:48:2D:7C:F5"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="db2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:11:09:5B:59:55"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="web1" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:11:09:5B:59:27"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="web2" nodeid="4" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:30:48:20:B7:0D"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="web3" nodeid="5" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:30:48:20:B7:21"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="web4" nodeid="6" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:30:48:20:B7:8F"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="web5" nodeid="7" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="fence-e1.0" mac="00:30:48:20:E4:9D"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_aoemask" name="fence-e1.0" shelf="1" slot="0" interface="eth1"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
        </rm>
</cluster>

Issue History

Date Modified Username Field Change
2007-10-30 09:39 butty New Issue
2007-10-30 09:39 butty Status new => assigned
2007-11-27 17:00 mikeytag File Added: cluster.conf
2007-11-27 17:01 mikeytag Note Added: 0006414
2007-11-27 17:01 mikeytag Note Added: 0006415
2007-11-27 17:02 mikeytag Note Added: 0006416