
VCS IP, IPMultiNICB - option to start interface instances at interface:x

Created: 19 Oct 2011 • Updated: 27 Jan 2012
Status: In Review

Here's the problem:

Through some weird chain of circumstances (core switch/router upgrades and reboots, IP changes), we wound up with a couple of systems that had their system IPs bound to a higher instance than one or more application IPs.

(I suspect a switch rebooted, which caused mpathd to move the system IP over to another interface. Then VCS failed a service group over to the system, by which time the first interface was back up, and any IP/IPMNB resources in those SGs bound to the first available instance on an interface.)

Here's how we figured out this had happened: the syslog collection servers were reporting that NodeA had not reported in for 72 hours. We snooped NodeA and determined that its messages were instead being tagged as coming from an application IP. To wit:

root@nodea# ping -a nodea
nodea (192.168.20.69) is alive
root@nodea# ping -a appip1
appip1 (192.168.20.50) is alive
root@nodea# logger -p local4.notice "testing syslog - jcy"
appip1.xyz.com -> syslog01.xyz.com SYSLOG C port=54650 local4.notice: <165>Oct 13 15:58:25

We determined that it doesn't matter that nodea has a lower IP than appip1. The system IP has to be on a lower instance across all interfaces. In other words, if the nodea IP was bound to interface1:1 and appip1 was bound to interface2:1, we could still get messages tagged as coming from appip1. Very weird.

Here's the ifconfig, pre-fix:

root@dban3# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
ce0: flags=9040843 mtu 1500 index 2
        inet 192.168.20.70 netmask ffffff00 broadcast 192.168.20.255
        groupname public
        ether 0:3:ba:12:53:9a
ce0:1: flags=1000843 mtu 1500 index 2
        inet 192.168.20.50 netmask ffffff00 broadcast 192.168.20.255
ce0:2: flags=1000843 mtu 1500 index 2
        inet 192.168.20.17 netmask ffffff00 broadcast 192.168.20.255
ce0:4: flags=1000843 mtu 1500 index 2
        inet 192.168.20.69 netmask ffffff00 broadcast 192.168.20.255
ce2: flags=9040843 mtu 1500 index 3
        inet 192.168.20.71 netmask ffffff00 broadcast 192.168.20.255
        groupname public
        ether 0:3:ba:23:37:2e
ce2:1: flags=1000843 mtu 1500 index 3
        inet 192.168.20.77 netmask ffffff00 broadcast 192.168.20.255
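
A quick way to spot this condition on other nodes is to parse the ifconfig output. The following is only a rough sketch based on the observation above (the system IP needs a lower instance number than every other logical instance, on any interface). The function name is made up, and on Solaris you would probably need nawk or /usr/xpg4/bin/awk instead of plain awk, since the old awk has no -v option.

```shell
# Hypothetical check: read `ifconfig -a` output on stdin and print every
# logical instance (name:N) whose number is not higher than the instance
# holding the system IP given as $1. Exits 2 if the system IP is not
# found on any logical instance.
instances_at_or_below_system_ip() {
    awk -v sysip="$1" '
        # Header line, e.g. "ce0:4: flags=1000843 mtu 1500 index 2"
        $2 ~ /^flags=/ {
            name = $1
            sub(/:$/, "", name)
            next
        }
        # Address line under a logical instance, e.g. "inet 192.168.20.69 ..."
        $1 == "inet" && name ~ /:[0-9]+$/ {
            inst = name
            sub(/^.*:/, "", inst)
            count++
            names[count] = name
            addrs[count] = $2
            nums[count] = inst + 0
            if ($2 == sysip) { sysnum = inst + 0; found = 1 }
        }
        END {
            if (!found) exit 2
            for (i = 1; i <= count; i++)
                if (addrs[i] != sysip && nums[i] <= sysnum)
                    print names[i], addrs[i]
        }
    '
}
```

Run it as `ifconfig -a | instances_at_or_below_system_ip 192.168.20.69`. No output means the system IP already sits below everything else. Physical interfaces (no :N suffix) are skipped on purpose, since those carry the mpathd test addresses here.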

haipswitch didn't really help in this matter; we wound up having to move things around ourselves.

ifconfig ce0:3 plumb
ifconfig ce2:2 plumb
ifconfig ce0:1 unplumb
ifconfig ce0:3 192.168.20.50 netmask + up
ifconfig ce2:1 unplumb
ifconfig ce2:2 192.168.20.77 netmask + up
ifconfig ce0:1 plumb
ifconfig ce0:1 nodea netmask + up

Note that running the above as a script resulted in not even a blip from the IPMNB agent or mpathd.
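
Each move above follows the same pattern: plumb the target instance, unplumb the old one, then bring the address up on the target. If you have to do this on several nodes, something like the following could generate the commands for review first. This is just a sketch; the function name is invented, and it only prints the commands rather than running them.

```shell
# Hypothetical dry-run helper: print (do not run) the commands that move
# address $3 from logical instance $1 to logical instance $2, using the
# same ifconfig invocations as the manual fix.
print_move_cmds() {
    old=$1
    new=$2
    addr=$3
    printf 'ifconfig %s plumb\n' "$new"
    printf 'ifconfig %s unplumb\n' "$old"
    printf 'ifconfig %s %s netmask + up\n' "$new" "$addr"
}
```

For example, `print_move_cmds ce0:1 ce0:3 192.168.20.50` prints the three commands used for the first application IP. Check the output before piping it to sh.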

So, the enhancement request. I would have actually cloned the IPMultiNICB agent, made the code changes myself (start at instance x), and submitted here for use, but y'all COMPILED the sucker, and I am too lazy to bother with resource types we aren't using (IP, IPMultiNIC).

Can we have another attribute, say StartInstanceNumbering, which could be a boolean or an integer? The default behavior would be as now, i.e., look for the first available instance on an interface and use that. The new behavior would be to start looking for an "empty" instance slot at the given integer, or just start at 2 if the attribute is a boolean. This way the instance:1 slots are always reserved for mpathd and the system IP, and logging remains consistent.
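
To make the request concrete, here is roughly the slot search I have in mind, written as a shell sketch. It is not how the agent works internally (I can't see that code); the function name and the way the start value is passed in are made up for illustration. On Solaris, swap awk for nawk or /usr/xpg4/bin/awk.

```shell
# Hypothetical helper: read `ifconfig -a` output on stdin and print the
# first unused logical instance number on interface $1 that is >= $2
# (default 1, which matches "first available instance").
first_free_instance() {
    iface=$1
    start=${2:-1}
    awk -v iface="$iface" -v start="$start" '
        # Logical instance header lines look like "ce0:3: flags=..."
        $1 ~ ("^" iface ":[0-9]+:$") {
            split($1, parts, ":")
            used[parts[2] + 0] = 1
        }
        END {
            n = start + 0
            while (n in used) n++
            print n
        }
    '
}
```

With the pre-fix ifconfig output above, `ifconfig -a | first_free_instance ce2 1` would pick instance 1 if it were free, while a start value of 2 always leaves the :1 slot alone regardless of what is plumbed at the time.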

Thanks

fdiskit (now fmthard)