Wednesday, 8 March 2017

AIX TL upgrade

AIX Patching

Upgrading AIX 5.3 using alt_disk_install
Assumptions
1. The rootvg volume group is mirrored across hdisk0 and hdisk1
2. "bos.alt_disk_install" and "bos.alt_disk_install.boot_images" are installed
3. The dump device is hd7
Commands (Assumptions)
  1. lsvg -l rootvg
  2. lslpp -al bos.alt_disk_install  or  lslpp -l | grep -i bos.alt_disk_install
  3. sysdumpdev -l
note: To designate logical volume hd7 as the primary dump device, enter: sysdumpdev -P -p /dev/hd7
Performing mksysb and pre-installation tasks:
Mount nimserver in a temporary mount point and take mksysb
1. Create an mksysb image and copy it to <NIM SERVER>:/mksysb_images
Steps:
a) mkdir /mnt-tma on the host server, then follow the steps below.
b) mount nim1.srv.uk.deuba.com:/usr/sys/inst.images/SOURCE /mnt-tma
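The mksysb command itself is not shown above. A minimal sketch, assuming the image is written straight into the mounted NIM directory (adjust the target path to your NIM server's mksysb_images location):
# mksysb -i /mnt-tma/$(hostname).mksysb     <== -i regenerates /image.data before the backup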
2. Remove Relicore packages, if already installed on the system
3. Check the current AIX level:
# oslevel -s | tee -a /tmp/oslevel.txt
# lslpp -l | tee -a /tmp/lslpp.txt
Copy the above two files to a remote server for safekeeping.
4. Check the kernel mode (TL11 requires a 64-bit kernel):
# bootinfo -K     <== verify the output is 64
5. Before updating to AIX 5.3 TL10 or later (or AIX 6.1 TL3 or later), make sure all interim fixes (ifixes) have been removed from the system.
# emgr -l
To remove an ifix:
# emgr -r -L <ifix label>
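If several ifixes are installed, a minimal sketch of removing them all in one pass (the awk test assumes data lines in emgr -l start with a numeric ifix ID and that the label is the third column; verify against your own emgr output before running):
# for label in $(emgr -l | awk 'NR > 2 && $1 ~ /^[0-9]+$/ {print $3}'); do emgr -r -L "$label"; done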
6. Check if all filesets are applied and are valid:
# instfix -i | grep ML
# lppchk -v
7. Check filesets in APPLIED state and commit them:
# installp -s      
# installp -c all     <== Commit applied filesets (optional)
# installp -s                   
0503-459 installp:  No filesets were found in the Software
        Vital Product Database in the APPLIED state.

8. Stop all applications and databases.
9. Set the auto-varyon flag of all non-rootvg volume groups to no:
# chvg -a n <vgname>
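A minimal sketch for changing every varied-on, non-rootvg volume group in one pass (assumes lsvg -o lists only the currently active volume groups):
# for vg in $(lsvg -o | grep -v '^rootvg$'); do chvg -a n "$vg"; done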
10. Check the last boot device:
# bootinfo -b
11. Create a boot image on hdisk0 and hdisk1:
# bosboot -ad /dev/hdisk0
# bosboot -ad /dev/hdisk1
12. Check the bootlist:
# bootlist -m normal -o
13. Change the boot device if required:
# bootlist -m normal hdisk0 hdisk1
14. Reboot the server to ensure the system comes up without any problem:
# touch /tmp/allowshutdown
# shutdown -Fr

Break the rootvg mirrors
1. Check that rootvg is mirrored across 2 disks (dump area may not be mirrored):
# lsvg -p rootvg
# lsvg -l rootvg
2. Unmirror rootvg:
# unmirrorvg rootvg hdisk1    <= Don't remove the mirror OS was booted from
3. Remove the boot block from hdisk1 (optional, but it is safer):
# chpv -c hdisk1
4. Reconfigure hdisk0 as the only boot device:
# bosboot -a -d /dev/hdisk0
# bootlist -m normal hdisk0
# bootlist -m normal -o    <== verify the bootlist
5. Display the dump devices:
# sysdumpdev -l
6. Record the size of the dump device so that it can be recreated later:
# lslv hd7 > /tmp/sysdump.save
7. Remove the dump device:
# sysdumpdev -P -p /dev/sysdumpnull
# rmlv hd7
8. Check that the "LPs" and "PVs" values are the same for each logical volume:
# lsvg -l rootvg


9. If any logical volumes are still mirrored (i.e. the LPs and PVs columns are not the same), reduce the number of copies to 1:
# lslv -l <lvname>
# rmlvcopy <lvname> 1 hdisk1
10. Make sure hdisk1 contains no LVs:
# lspv -l hdisk1
11. Remove hdisk1 from rootvg:
# reducevg rootvg hdisk1
12. Make a backup (clone) of rootvg on hdisk1:
# alt_disk_install -C -P all -B hdisk1    - OR -
# alt_disk_copy -P "all" -B -d "hdisk1"
Note: "-B" prevents the bootlist from being set to hdisk1 at the end of cloning.
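Before moving on to the update, it is worth confirming the clone and the bootlist; a quick hedged check, not part of the original procedure:
# lspv | grep altinst_rootvg      <== hdisk1 should now belong to altinst_rootvg
# bootlist -m normal -o           <== should still show only hdisk0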

Perform OS update:

1. Location of the update media:
/net/<NIM SERVER>/export/lpp_source/53/lpp-5300-11-2/installp/ppc
2. Update the installp utilities (bos.rte.install) to the TL11 level first:
# cd <location of update media>
# installp -agX -d . bos.rte.install
3. Run a preview of the update to the desired TL level:
# cd <location of update media>
# /usr/lib/instl/sm_inst installp_cmd -apgXY -d . -f '_update_all' \
  | tee -a /tmp/preview.out  
4. Install the updates to APPLIED state:
# cd <location of update media>
# /usr/lib/instl/sm_inst installp_cmd -agXY -d . -f '_update_all' \
  | tee -a /tmp/update.out
5. Check the new TL level:
# lppchk -v
# oslevel -s
6. If the output does not show the correct OS level (5300-11-02-1007), determine which filesets were not updated:
# oslevel -rl 5300-11
Note: You may need to run installp again to bring these filesets up to the correct level; a minimal sketch follows.
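A hedged sketch of re-applying the update for a single down-level fileset (replace <fileset> with an actual fileset name reported by oslevel -rl):
# cd <location of update media>
# installp -agXY -d . <fileset>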
7. Set hdisk0 (or the disk with the active rootvg) to first in the bootlist:
# bootlist -m normal hdisk0
# bootlist -m normal -o
8. Reboot the server:
# touch /tmp/allowshutdown
# shutdown -Fr now







Validate after OS update is complete

1. Check that the system booted from the correct disk:
# bootinfo -b
2. Check that all patches were installed:
# oslevel -s
# lppchk -v
3. Set the auto-varyon flag on all non-rootvg volume groups to yes:
# varyonvg <vgname>
# chvg -a y <vgname>
4. Contact other support groups for application/database validation.


Mirror rootvg if OS update is successful

1. Clean up the alt_disk_install:
# alt_disk_install -X altinst_rootvg   - OR -
# alt_rootvg_op -X altinst_rootvg
2. Add hdisk1 back to rootvg:
# extendvg -f rootvg hdisk1
3. Mirror rootvg in the background:
# mirrorvg -S rootvg hdisk1
4. Check periodically if mirrorvg is complete:
# lsvg -l rootvg
5. Mirror any logical volumes that were not mirrored:
# mklvcopy -k <lvname> 2 hdisk1
6. Recreate the dump device (xx = the number of LPs recorded in /tmp/sysdump.save; see the sketch below):
# mklv -y hd7 -t sysdump rootvg xx hdisk0
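A quick, hedged way to recover the LP count recorded earlier from the saved lslv output:
# grep "LPs:" /tmp/sysdump.save     <== prints the MAX LPs and LPs lines; use the LPs value in place of xx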
7. Change the dump device back to the original configuration:
# sysdumpdev -P -p /dev/hd7
8. Configure hdisk0 and hdisk1 as the boot devices:
# bosboot -a -d /dev/hdisk0
# bootlist -m normal hdisk0 hdisk1
# bootlist -m normal -o









  

Understanding Micro Partitioning and Entitled Capacity

Micro-Partitioning was introduced as a feature of the POWER5 processor-based product line back in 2004, yet I still get a number of questions on a regular basis around implementing and understanding Micro-Partitioning. In this article, I'll try and paint a concise and clear picture of everything you need to know about Micro-Partitioning in the Power Systems environment and address the most frequently asked questions with regards to best practices. Every reference I'll be making throughout this article will be in the context of shared uncapped LPARs.

Understanding Entitled Capacity

The entitled capacity of an LPAR plays a crucial role. It determines two very important factors: the guaranteed CPU cycles the LPAR will get at any point in time, and the base unit of measurement for utilization statistics. One aspect of a managed system's entitled capacity is that the total entitled capacity on the system cannot exceed the number of physical processors in that system. In plain English: the system's processors can not be over-subscribed by the total of the entitlements. As a side effect, this means every LPAR on a managed system will always be able to use its entitled capacity at any point in time. This capacity is guaranteed to be available to its LPAR within one dispatch cycle (10ms).

On the other hand, if an LPAR isn’t making full use of its entitlement, those cycles are yielded back to the shared processor pool that LPAR is part of. The second crucial aspect of entitled capacity is that it is the basis for utilization statistics and performance reporting. Or, more simply: an LPAR consuming all of its entitled CPU capacity will report 100 percent utilization. Now, that LPAR will not necessarily be limited to 100 percent utilization. Depending on that LPAR's virtual-processor configuration, it'll be able to borrow unused cycles from the shared processor pool and report more than 100 percent utilization. In that case, it’s important to know that any capacity used beyond an LPAR's entitled capacity isn’t guaranteed (as it might be some other LPAR's entitlement). Therefore, if an LPAR is running beyond 100 percent CPU, it might be forced back down to 100 percent if another LPAR requires that borrowed capacity.
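To see this on a live LPAR, compare the entitlement with the consumption relative to it using lparstat; a minimal sketch (the interval and count are arbitrary):
# lparstat -i | egrep "Entitled Capacity|Online Virtual CPUs|Mode"
# lparstat 2 5     <== physc is the physical CPU consumed; %entc above 100 means the LPAR is borrowing from the pool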

Then why is there a minimum/desired/maximum setting for entitlement? Because the entitled capacity of an LPAR can be changed dynamically. The minimum and maximum values of entitled capacity are there to set the limits to which a running LPAR's entitled capacity may be varied.

The Role of Virtual Processors

Virtual processors are what AIX sees as actual CPUs from an OS standpoint. You have to look at them as logical entities that are backed by physical processor cycles. For each virtual processor, between 0.1 and 1.0 physical processor can be dispatched to execute tasks in that virtual processor's run queue. There are no conditions under which a single virtual processor will consume more than 1.0 physical processor. Therefore, the number of online virtual processors dictates the absolute maximum CPU consumption an LPAR can achieve (should enough capacity be available in its shared processor pool). That being said, if an LPAR has an entitlement of 2.0 processors and four virtual processors, this LPAR would be able to consume up to four physical processors, in which case it will report 200 percent CPU utilization. You must keep in mind, while configuring virtual processors on a system, that it’s possible to dispatch more virtual processors than there are physical processors in the shared processor pool, so you might not be able to have an LPAR peak all the way up to its number of virtual processors.

Again, in configuring an LPAR, a minimum/desired/maximum value must be set for the number of virtual processors. These values serve strictly as boundaries for dynamic LPAR operations when varying the number of virtual processors on a running LPAR.
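For reference, a dynamic change of this kind is driven from the HMC; a minimal sketch using the HMC command line, with placeholder managed-system and partition names:
chhwres -r proc -m <managed_system> -p <lpar_name> -o a --procs 1           <== add one virtual processor
chhwres -r proc -m <managed_system> -p <lpar_name> -o a --procunits 0.1     <== add 0.1 processing units of entitlement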

From Dedicated to Shared

Still today, a very large number of customers are running LPARs in dedicated mode. A question I often get is: what's the best way to convert an LPAR from dedicated mode to shared uncapped mode without affecting performance? The simplest way is to just change its mode from dedicated to shared uncapped and make sure the number of virtual processors is equal to the entitled capacity. By doing so, the LPAR will preserve its entitled capacity, which is its guaranteed cycles. Its entitled capacity being guaranteed will ensure this LPAR can’t starve and application response time isn’t impacted. The immediate advantage is that any unused CPU cycles from that LPAR go from being wasted (dedicated mode) to being yielded back to the shared processor pool (shared uncapped) and readily available for any other LPAR that might need them. A second step is to look at the LPAR's CPU consumption and determine if it could use more CPU. If the LPAR does show signs of plateauing at 100 percent, then adjusting the number of virtual processors up (one virtual processor at a time) will allow that LPAR to borrow cycles from other LPARs that might not be using their full entitlement.

There are a few important considerations when implementing this very simple method. If your LPAR now uses more than its entitled capacity, it’ll report more than 100 percent CPU utilization. That can result in issues with some performance-monitoring software and make some people uneasy. The other consideration is that users might see varying application response times at different times of day, based on the available CPUs in the processor pool. If your processor pool reaches full utilization, applications using more than their entitled capacity might see their response time return to dedicated-like performance. Fortunately, this isn’t often the case. If you haven’t enabled processor pooling in your configuration, give it a try; you'll be surprised just how much free CPU you end up with.

Optimizing Pooled Resources

Over the years, I've developed a very simple approach to getting the most out of micro-partitioned environments. This approach is based on a good understanding of entitled capacity. In maximizing a system's utilization, you'll want to drive each LPAR's utilization as close as possible to 100 percent, on average. Once an LPAR has been converted from dedicated to shared uncapped, you’ll want to gradually reduce its entitled capacity so it reports higher utilization, until your LPAR's average utilization is at a level you feel comfortable with. Your LPAR's peaks, more than likely, will exceed the LPAR's entitled capacity (100 percent), and that's fine. If all of the LPARs on your managed system run at 90 percent utilization on average, and all your entitled capacity is dispatched, then your entire managed system will be running at 90 percent utilization.

One very important factor in determining the average utilization you wish to have on your LPARs is the managed system size. The larger the system and the more LPARs on the system, the higher the utilization target can be set. This is simply a reflection of the law of large numbers in probability theory.

Load Testing on AIX (note how %entc peaks at around 500%)
---------------------------------------
root@LPAR-104#sar 1 2

AIX LPAR-104 1 7 000C74F0D600    10/23/70

System configuration: lcpu=2 ent=0.20 mode=Uncapped

16:03:56    %usr    %sys    %wio   %idle   physc   %entc
16:03:57       1       3       1      95    0.01     6.6
16:03:58      68       7       0      26    0.16    81.2

Average       34       5       1      60    0.09    43.9
root@LPAR-104#sar 1 2

AIX LPAR-104 1 7 000C74F0D600    10/23/70

System configuration: lcpu=2 ent=0.20 mode=Uncapped

16:05:44    %usr    %sys    %wio   %idle   physc   %entc
16:05:45       1       3       0      97    0.01     6.5
16:05:46       0       2       0      98    0.01     4.7

Average        1       2       0      97    0.01     5.6
root@LPAR-104#sar 1 2

AIX LPAR-104 1 7 000C74F0D600    10/23/70

System configuration: lcpu=2 ent=0.20 mode=Uncapped

16:05:56    %usr    %sys    %wio   %idle   physc   %entc
16:05:57       1       3       0      97    0.01     6.9
16:05:58      88       7       0       6    0.59   295.8

Average       66       6       0      29    0.30   150.7
root@LPAR-104#lparstat -i
Node Name                                  : LPAR-104
Partition Name                             : LPAR-104-Rajeev-Delhi
Partition Number                           : 5
Type                                       : Shared-SMT
Mode                                       : Uncapped
Entitled Capacity                          : 0.20
Partition Group-ID                         : 32773
Shared Pool ID                             : 0
Online Virtual CPUs                        : 1
Maximum Virtual CPUs                       : 1
Minimum Virtual CPUs                       : 1
Online Memory                              : 1024 MB
Maximum Memory                             : 2048 MB
Minimum Memory                             : 512 MB
Variable Capacity Weight                   : 64
Minimum Capacity                           : 0.10
Maximum Capacity                           : 0.30
Capacity Increment                         : 0.01
Maximum Physical CPUs in system            : 8
Active Physical CPUs in system             : 8
Active CPUs in Pool                        : 8
Shared Physical CPUs in system             : -
Maximum Capacity of Pool                   : -
Entitled Capacity of Pool                  : -
Unallocated Capacity                       : 0.00
Physical CPU Percentage                    : 20.00%
Unallocated Weight                         : 0
Memory Mode                                : Dedicated
Total I/O Memory Entitlement               : -
Variable Memory Capacity Weight            : -
Memory Pool ID                             : -
Physical Memory in the Pool                : -
Hypervisor Page Size                       : -
Unallocated Variable Memory Capacity Weight: -
Unallocated I/O Memory entitlement         : -
Memory Group ID of LPAR                    : -
Desired Virtual CPUs                       : 1
Desired Memory                             : 1024 MB
Desired Variable Capacity Weight           : 64
Desired Capacity                           : 0.20
Target Memory Expansion Factor             : -
Target Memory Expansion Size               : -
Power Saving Mode                          : -
root@LPAR-104#lparstat

System configuration: type=Shared mode=Uncapped smt=On lcpu=2 mem=1024MB psize=8 ent=0.20

%user  %sys  %wait  %idle physc %entc  lbusy  vcsw phint
----- ----- ------ ------ ----- ----- ------ ----- -----
  0.0   0.1    0.1   99.8  0.00   0.2    0.5 19444135  4312
root@LPAR-104#sar 1 2

AIX LPAR-104 1 7 000C74F0D600    10/23/70

System configuration: lcpu=2 ent=0.20 mode=Uncapped

16:07:13    %usr    %sys    %wio   %idle   physc   %entc
16:07:14       1       3       0      97    0.01     6.9
16:07:15       0       2       0      98    0.01     4.7

Average        1       2       0      97    0.01     5.8
root@LPAR-104#sar 1 2

AIX LPAR-104 1 7 000C74F0D600    10/23/70

System configuration: lcpu=2 ent=0.20 mode=Uncapped

16:07:20    %usr    %sys    %wio   %idle   physc   %entc
16:07:21       1       3       0      97    0.01     6.1
16:07:22       0       2       0      98    0.01     4.6

Average        0       2       0      97    0.01     5.3
root@LPAR-104#sar 1 30

AIX LPAR-104 1 7 000C74F0D600    10/23/70

System configuration: lcpu=2 ent=0.20 mode=Uncapped

16:07:32    %usr    %sys    %wio   %idle   physc   %entc
16:07:33      67      16       0      17    0.18    90.9
16:07:34       0       2       0      98    0.01     4.7
16:07:35       0       2       0      98    0.01     5.0
16:07:36      88       6       0       6    0.85   424.0
16:07:37      88       6       0       6    0.39   197.4
16:07:38       0       2       0      98    0.01     4.8
16:07:39       0       4       1      95    0.02     8.0
16:07:40      88       7       0       6    0.48   241.0
16:07:41      89       6       0       5    0.75   374.2
16:07:42       0       2       0      98    0.01     4.7
16:07:43       5      11       0      84    0.04    21.6
16:07:44      61       6       0      33    0.15    73.8
16:07:45      89       6       0       5    1.00   499.7
16:07:46      39       4       0      58    0.09    47.1
16:07:47       0       2       0      98    0.01     4.8
16:07:48       0       2       0      98    0.01     5.0
16:07:49       0       2       0      98    0.01     5.0
16:07:50      86       7       0       7    0.31   156.7
16:07:51      89       6       0       5    1.00   499.9
16:07:52      86       7       0       7    0.27   134.0
16:07:53       5      11       0      84    0.04    21.7
16:07:54       0       2       0      98    0.01     4.7
16:07:55       0       2       0      98    0.01     4.7
16:07:56      88       7       0       6    0.43   213.0
16:07:57      89       6       0       5    1.00   499.7
16:07:58      87       7       0       7    0.21   103.0
16:07:59       0       2       0      98    0.01     5.3
16:08:00       0       2       0      98    0.01     5.0
16:08:01      89       6       0       5    0.93   464.7
16:08:02      88       6       0       6    0.31   154.8

Average       65       6       0      29    0.29   142.7
root@LPAR-104#

Understanding location Codes in AIX

Decoding location codes

Where is my device?

When you have a faulty device, you need to know where the device is located physically on your system so it can be replaced. The errpt or lscfg command provides a location code specifying where the faulty device is located. Armed with the location code and a server manual or an IBM® Redbooks® title covering your model, or even better with access to the IBM information center on the web, you should be able to identify exactly where the device is located.

Introduction

Getting a device failure is definitely an inconvenience. The type of device that is failing might be hot swappable, such as a fan cooling unit or a hot-swap Peripheral Component Interconnect (PCI) card. In either case, you need to know the physical location of the device for it to be replaced. So, you need to know the location code of the device. A failing device will show up in the error report (using the errpt command), where the physical location code will be posted as well. Alternatively, the lscfg command also tells you the physical locations of devices. After getting the location, how do you go about locating the device?

AIX internal codes and physical codes

AIX provides two different codes, and they are:
  • IBM AIX® (internal) location system codes
  • Physical location codes
AIX internal location codes can be used in conjunction with physical codes to identify devices, as we will see later in this article. The ones generated by AIX reference certain devices, for example:

10-80-00-3,0    SCSI CD Drive
10-80-00-2,0    SCSI disk
02-08-00        SAS disk

The above codes are the internal paths to the actual device, which can be viewed with the lsdev command.

The other location code, which is the physical type and the one we are particularly interested in, is generated by the firmware. For example:

U789C.001.DQD3F62-P2-D3  SAS Disk Drive

Since the release of IBM POWER5 processors a few years back, the physical location code has been the preferred method for locating devices. As a rule, the physical code is generally all that you need, and that is what I focus on in this article. The commands provided in Table 1 enable you to get various information about your devices.

Table 1. Commands to get information about your devices

  lsdev -C -H -F "name status physloc location description"
      Get the AIX (if present) and physical location codes.
  lsdev -Cc disk -F 'name location physloc'
      Get the AIX and physical location codes of all disks.
  lsdev -Cl hdisk0 -F physloc
      Get the physical location code of hdisk0.
  lscfg -vpl hdisk0
      Get extended information on hdisk0.
  lsdev -C | grep hdisk0
      Get the AIX location code of hdisk0.
  lsparent -Cl hdisk0
      Get the parent devices of hdisk0.
  lscfg -l fcs0
      Get information about the fcs0 device.

Use the IBM information center or Redbooks

How can you locate a device using the physical location code? It depends on what type of system you have, as the codes might be slightly different across the system ranges. Always make sure that you have your server system manual, or refer to the online information in the IBM information center. These references provide the schematics of your model, including the location codes, for easy identification. That said, all is not lost: there are ways to physically identify a device.

What's in a code?

The location code of a physical device comes from the firmware side. If you follow the location code correctly, it eventually points to the device you are looking for.

The actual format of a location code is the same format no matter what server you have, it is just the codes (in numbers/letters) that can point to a different physical location on your system. The first character of a locatable device is always "U", so far so good. Next, it gets interesting. Here is the general format of a location code, with an example, taken from an IBM Power Systems™ 520 model (floor standing), which is what is used for examples unless otherwise stated. All location code examples are physical locations unless otherwise stated.
  Unit enclosure type : U789C
  Enclosure model     : 001
  Serial number       : DQD3F62
  Location            : P2-D3

The first three fields identify the enclosure type, model and serial number of the unit/drawer. Your system might contain different enclosure types, so do not expect to see the same enclosure in all your location codes. This is especially true if you have expansion units, such as additional disk drawers.

For this article, the location field is the interesting bit.

If the physical location cannot be resolved, AIX assumes it is a logical device that at some point is linked to a physical device. Typically, these can be logical devices connected to, say, external storage such as Redundant Array of Independent Disks (RAID) SCSI devices or tape units. The codes can have different meanings depending on the type of hardware, for instance SCSI, serial, ttys and adapters.

The location code can be made up of several prefix letters and numbers. Common prefixes are shown in Table 2.

Table 2. Common prefixes

  A  Air-moving device, for example, a fan
  C  Card, for example, PCI slots, memory slots
  D  Device, for example, disk slot, disk drawer
  E  Electrical, for example, power supply
  L  Logical path, for example, Fibre Channel
  P  Planar, for example, a system or I/O backplane, system board
  T  Interface connector/port, for example, a serial port; usually followed by a number to denote which port
  U  Unit
  V  Virtual planar

That's a code

Let's now look at an example, say hdisk0. Below is a partial output from lscfg for hdisk0:

lscfg -vpl hdisk0
hdisk0 U789C.001.DQD3F62-P2-D3 SAS Disk Drive (146800 MB)

Manufacturer................IBM
Machine Type and Model......ST3146356SS
FRU Number..................10N7204
ROS Level and ID............45363045
Serial Number...............3QN2JFEP

Hardware Location Code......U789C.001.DQD3F62-P2-D3

PLATFORM SPECIFIC
Name:disk
Node: disk
Device Type:block

Looking at the above lscfg output, it first tells me this is a SAS disk, but let's look more closely at the location code and break it down:

U789C.001.DQD3F62-P2-D3

  U789C    Unit type
  001      Enclosure model
  DQD3F62  Serial number
  P2       Planar 2 (this is accessible from the front of the unit)
  D3       Device slot 3 (disk drive number 1, the first disk on the left in the second bay down)
How did I know all the above? By using my Redbooks and referencing the location with the schematic representation of the model, I know exactly where it is located!

Typically, you will find location codes referenced by the location part only and not by the full unit details. This is certainly true if the devices are all coming off the same planar, and in such cases the following format would be used: Un<location>

These might be in the following format, which translates to: unit 1, drawer/frame 6, first planar, I/O slot 2:

U1.6-P1-I2

When dealing with card slots, these could contain dual ports, for example Ethernet or fiber cards. If this is the case, the location will have the letter "T" associated with it, and the number following the T is the port number. If a location has a "T" but is not a card slot (and no C in the location code), then you can be pretty much assured that this is an integrated (on-board) interface. Here, I am thinking of serial ports or Ethernet ports.

Let's now turn our attention to a fiber card slot. Looking at the fiber card (fcs0) location:

lscfg -vl fcs0

Physical Location: U789C.001.DQD3F62-P1-C1-T1

Looking at the location code:

P1-C1-T1

We know that:
  P1  Planar 1. This is accessible from the back of the unit.
  C1  Card (PCI) slot 1. This is the first PCI slot looking down from the top of the unit.
  T1  The first port (upper port).
Looking at an integral (on-board) port, say a Hardware Management Console (HMC) Ethernet port:

P1-T5

We know that this is not a card, as there is no "C" in the location.

  P1  Planar 1. This is accessible from the back of the unit.
  T5  Port 5. Located on the left side of the machine (left port).
Another way to tell if a device could be on-board is when the device returns an AIX internal location code as well. For example, taken from a Power Systems 570 model, here are a couple of Ethernet devices:

ent0      Available 02-08 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
ent1      Available 02-09 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)

ent0      U7879.001.DQD1AE7-P1-T6
ent1      U7879.001.DQD1AE7-P1-T7

The above AIX internal location codes (02-08 and 02-09) inform us that the Ethernet devices are both using the same address location, 02. As there is no "C" in the physical location, that is, no card slot, we can assume that this is an on-board dual port. As a rule of thumb, if you have two on-board devices that are T1 and T2 and the pair is horizontal, T1 will be on the right and T2 on the left. If the devices are vertical, then T1 will be at the top and T2 beneath it.
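For reference, the two listings above could be produced with something like the following (a sketch; lsdev's -F field list is only one of several ways to show the physical location):
# lsdev -Cc adapter | grep ent
# lsdev -Cc adapter -F 'name physloc' | grep ent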

Many systems have storage area network (SAN) storage, so it is good to know how to locate the logical unit number (LUN). Here we look at hdisk20, which sits on an IBM System Storage® DS3400 disk system, on a 570 model:

lscfg -vl hdisk20
hdisk20 U7879.001.DQD1AE7-P1-C2-T1-W202500A0B85B6194-LE000000000000 MPIO Other DS3K Array Disk

The above output is pretty long, so let's see what is going on with that location code:
  P1                 Planar 1 (accessible from the back of the unit)
  C2                 Card (PCI) slot 2, the fiber card
  T1                 First fiber port (top)
  W202500A0B85B6194  Worldwide port identifier on the remote SAN switch
  LE000000000000     (Logical) LUN ID (in hexadecimal) of the remote disk
Some codes will have the letter "L" followed by a number; these are logical paths. Typical users of logical paths are SCSI disks, including RAID disks. For example, a SCSI disk array location is shown below:

P1-C8-T1-L0-L0  SCSI RAID 5 Disk Array

Hot plugs to go

Knowing your PCI hot-plug cards is always good, because if a hot-plug device fails, it takes only a few minutes to replace it. To view your PCI hot-plug slots, use the following command:

lsslot -c pci
# Slot Description Device(s)
U789C.001.DQD3F62-P1-C1 PCI-E capable, Rev 1 slot with 8x lanes fcs0
U789C.001.DQD3F62-P1-C2 PCI-E capable, Rev 1 slot with 8x lanes fcs1
U789C.001.DQD3F62-P1-C3 PCI-E capable, Rev 1 slot with 16x lanes Empty
U789C.001.DQD3F62-P1-C4 PCI-X capable, 64 bit, 266MHz slot Empty
U789C.001.DQD3F62-P1-C5 PCI-X capable, 64 bit, 266MHz slot Empty

We have already covered the location of the fcs0 card. However, we can see that both active cards (fcs0 and fcs1) are next to each other. We already know that fcs0 is the first card and is at the top of the slots, so the second one down is the fcs1 card. The other three slots (P1-C3, C4 and C5) are empty.

To view all your logical slots, including virtual slots, use the following command:

lsslot -c slot

Get that location code quickly

If you are still confused about the locations, and you have a failed unit and the IBM engineer is knocking at the door expecting you to know where the failing device is, you can always go into diag (or 'smit diag') and identify the failing device. This tells you the location code as well. Be sure to review your errpt beforehand to confirm that you are identifying the correct device.

Fix that attention light flashing

After a device failure, you will be able to identify the device by its amber indicator light flashing, or through a symbol when you are logged on to the HMC. After the device is replaced or fixed, turn the status back to normal with:

/usr/lpp/diagnostics/bin/usysfault -s normal
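Related to this, the identify indicator for a specific device can be toggled with usysident; a hedged sketch, reusing a location code from the earlier examples:
/usr/lpp/diagnostics/bin/usysident -s identify -l U789C.001.DQD3F62-P2-D3     <== turn the identify LED on
/usr/lpp/diagnostics/bin/usysident -s normal -l U789C.001.DQD3F62-P2-D3       <== turn it back off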

Conclusion

So, now you know how location codes can help you. As mentioned earlier, you need to know this if you are replacing devices. However, as described, it very much depends on having access to your system hardware documentation, as the codes are specific to your particular system.