IBM Power System 8335-GCA, Power System S812LC, Power System 8335-GTB, Power System S822LC, Power System 8348-21C, Power System 8335-GTA User Manual
Power Systems
Problem analysis, system parts, and
locations for the IBM Power System
S822LC (8335-GCA, 8335-GTA, and
8335-GTB), and IBM Power System
S812LC (8348-21C)
IBM
Note
Before using this information and the product it supports, read the information in “Safety notices” on page v, “Notices” on
page 145, the IBM Systems Safety Notices manual, G229-9054, and the IBM Environmental Notices and User Guide, Z125–5823.
This edition applies to IBM Power Systems™ servers that contain the POWER8® processor and to all associated
models.
ivProblem analysis, system parts, and locations for the 8335-GCA, 8335-GTA, 8335-GTB, and 8348-21C
Safety notices
Safety notices may be printed throughout this guide:
• DANGER notices call attention to a situation that is potentially lethal or extremely hazardous to
people.
• CAUTION notices call attention to a situation that is potentially hazardous to people because of some
existing condition.
• Attention notices call attention to the possibility of damage to a program, device, system, or data.
World Trade safety information
Several countries require the safety information contained in product publications to be presented in their
national languages. If this requirement applies to your country, safety information documentation is
included in the publications package (such as in printed documentation, on DVD, or as part of the
product) shipped with the product. The documentation contains the safety information in your national
language with references to the U.S. English source. Before using a U.S. English publication to install,
operate, or service this product, you must first become familiar with the related safety information
documentation. You should also refer to the safety information documentation any time you do not
clearly understand any safety information in the U.S. English publications.
Replacement or additional copies of safety information documentation can be obtained by calling the IBM
Hotline at 1-800-300-8751.
German safety information
This product is not suitable for use at visual display workplaces within the meaning of Section 2 of the
German Ordinance on Work with Visual Display Units (Bildschirmarbeitsverordnung).
Laser safety information
IBM® servers can use I/O cards or features that are fiber-optic based and that utilize lasers or LEDs.
Laser compliance
IBM servers may be installed inside or outside of an IT equipment rack.
DANGER: When working on or around the system, observe the following precautions:
Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid
a shock hazard:
• If IBM supplied the power cord(s), connect power to this unit only with the IBM provided power cord.
Do not use the IBM provided power cord for any other product.
• Do not open or service any power supply assembly.
• Do not connect or disconnect any cables or perform installation, maintenance, or reconfiguration of this
product during an electrical storm.
• The product might be equipped with multiple power cords. To remove all hazardous voltages,
disconnect all power cords.
– For AC power, disconnect all power cords from their AC power source.
– For racks with a DC power distribution panel (PDP), disconnect the customer’s DC power source to
the PDP.
• When connecting power to the product, ensure all power cables are properly connected.
– For racks with AC power, connect all power cords to a properly wired and grounded electrical
outlet. Ensure that the outlet supplies proper voltage and phase rotation according to the system
rating plate.
– For racks with a DC power distribution panel (PDP), connect the customer’s DC power source to
the PDP. Ensure that the proper polarity is used when attaching the DC power and DC power
return wiring.
• Connect any equipment that will be attached to this product to properly wired outlets.
• When possible, use one hand only to connect or disconnect signal cables.
• Never turn on any equipment when there is evidence of fire, water, or structural damage.
• Do not attempt to switch on power to the machine until all possible unsafe conditions are corrected.
• Assume that an electrical safety hazard is present. Perform all continuity, grounding, and power checks
specified during the subsystem installation procedures to ensure that the machine meets safety
requirements.
• Do not continue with the inspection if any unsafe conditions are present.
• Before you open the device covers, unless instructed otherwise in the installation and configuration
procedures: Disconnect the attached AC power cords, turn off the applicable circuit breakers located in
the rack power distribution panel (PDP), and disconnect any telecommunications systems, networks,
and modems.
DANGER:
• Connect and disconnect cables as described in the following procedures when installing, moving, or
opening covers on this product or attached devices.
To Disconnect:
1. Turn off everything (unless instructed otherwise).
2. For AC power, remove the power cords from the outlets.
3. For racks with a DC power distribution panel (PDP), turn off the circuit breakers located in the
PDP and remove the power from the Customer's DC power source.
4. Remove the signal cables from the connectors.
5. Remove all cables from the devices.
To Connect:
1. Turn off everything (unless instructed otherwise).
2. Attach all cables to the devices.
3. Attach the signal cables to the connectors.
4. For AC power, attach the power cords to the outlets.
5. For racks with a DC power distribution panel (PDP), restore the power from the Customer's DC
power source and turn on the circuit breakers located in the PDP.
6. Turn on the devices.
Sharp edges, corners and joints may be present in and around the system. Use care when handling
equipment to avoid cuts, scrapes and pinching. (D005)
(R001 part 1 of 2):
DANGER: Observe the following precautions when working on or around your IT rack system:
• Heavy equipment–personal injury or equipment damage might result if mishandled.
• Always lower the leveling pads on the rack cabinet.
• Always install stabilizer brackets on the rack cabinet.
• To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest devices
in the bottom of the rack cabinet. Always install servers and optional devices starting from the bottom
of the rack cabinet.
• Rack-mounted devices are not to be used as shelves or work spaces. Do not place objects on top of
rack-mounted devices. In addition, do not lean on rack-mounted devices and do not use them to
stabilize your body position (for example, when working from a ladder).
• Each rack cabinet might have more than one power cord.
– For AC powered racks, be sure to disconnect all power cords in the rack cabinet when directed to
disconnect power during servicing.
– For racks with a DC power distribution panel (PDP), turn off the circuit breaker that controls the
power to the system unit(s), or disconnect the customer’s DC power source, when directed to
disconnect power during servicing.
• Connect all devices installed in a rack cabinet to power devices installed in the same rack cabinet. Do
not plug a power cord from a device installed in one rack cabinet into a power device installed in a
different rack cabinet.
• An electrical outlet that is not correctly wired could place hazardous voltage on the metal parts of the
system or the devices that attach to the system. It is the responsibility of the customer to ensure that
the outlet is correctly wired and grounded to prevent an electrical shock.
(R001 part 2 of 2):
CAUTION:
• Do not install a unit in a rack where the internal rack ambient temperatures will exceed the
manufacturer's recommended ambient temperature for all your rack-mounted devices.
• Do not install a unit in a rack where the air flow is compromised. Ensure that air flow is not blocked
or reduced on any side, front, or back of a unit used for air flow through the unit.
• Consideration should be given to the connection of the equipment to the supply circuit so that
overloading of the circuits does not compromise the supply wiring or overcurrent protection. To
provide the correct power connection to a rack, refer to the rating labels located on the equipment in
the rack to determine the total power requirement of the supply circuit.
• (For sliding drawers.) Do not pull out or install any drawer or feature if the rack stabilizer brackets are
not attached to the rack. Do not pull out more than one drawer at a time. The rack might become
unstable if you pull out more than one drawer at a time.
• (For fixed drawers.) This drawer is a fixed drawer and must not be moved for servicing unless specified
by the manufacturer. Attempting to move the drawer partially or completely out of the rack might
cause the rack to become unstable or cause the drawer to fall out of the rack.
CAUTION:
Removing components from the upper positions in the rack cabinet improves rack stability during
relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a
room or building.
• Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack
cabinet. When possible, restore the rack cabinet to the configuration of the rack cabinet as you
received it. If this configuration is not known, you must observe the following precautions:
– Remove all devices in the 32U position (compliance ID RACK-001) or 22U (compliance ID RR001)
and above.
– Ensure that the heaviest devices are installed in the bottom of the rack cabinet.
– Ensure that there are little-to-no empty U-levels between devices installed in the rack cabinet
below the 32U (compliance ID RACK-001) or 22U (compliance ID RR001) level, unless the
received configuration specifically allowed it.
• If the rack cabinet you are relocating is part of a suite of rack cabinets, detach the rack cabinet from
the suite.
• If the rack cabinet you are relocating was supplied with removable outriggers, they must be
reinstalled before the cabinet is relocated.
• Inspect the route that you plan to take to eliminate potential hazards.
• Verify that the route that you choose can support the weight of the loaded rack cabinet. Refer to the
documentation that comes with your rack cabinet for the weight of a loaded rack cabinet.
• Verify that all door openings are at least 760 x 2030 mm (30 x 80 in.).
• Ensure that all devices, shelves, drawers, doors, and cables are secure.
• Ensure that the four leveling pads are raised to their highest position.
• Ensure that there is no stabilizer bracket installed on the rack cabinet during movement.
• Do not use a ramp inclined at more than 10 degrees.
• When the rack cabinet is in the new location, complete the following steps:
– Lower the four leveling pads.
– Install stabilizer brackets on the rack cabinet.
– If you removed any devices from the rack cabinet, repopulate the rack cabinet from the lowest
position to the highest position.
• If a long-distance relocation is required, restore the rack cabinet to the configuration of the rack
cabinet as you received it. Pack the rack cabinet in the original packaging material, or equivalent.
Also lower the leveling pads to raise the casters off of the pallet and bolt the rack cabinet to the
pallet.
(R002)
DANGER: Hazardous voltage, current, or energy levels are present inside any component that has this
label attached. Do not open any cover or barrier that contains this label. (L001)
DANGER: Rack-mounted devices are not to be used as shelves or work spaces. (L002)
DANGER: Multiple power cords. The product might be equipped with multiple AC power cords or
multiple DC power cables. To remove all hazardous voltages, disconnect all power cords and power
cables. (L003)
CAUTION: A hot surface nearby. (L007)
CAUTION: Hazardous moving parts nearby. (L008)
All lasers are certified in the U.S. to conform to the requirements of DHHS 21 CFR Subchapter J for class
1 laser products. Outside the U.S., they are certified to be in compliance with IEC 60825 as a class 1 laser
product. Consult the label on each part for laser certification numbers and approval information.
CAUTION:
This product might contain one or more of the following devices: CD-ROM drive, DVD-ROM drive,
DVD-RAM drive, or laser module, which are Class 1 laser products. Note the following information:
• Do not remove the covers. Removing the covers of the laser product could result in exposure to
hazardous laser radiation. There are no serviceable parts inside the device.
• Use of the controls or adjustments or performance of procedures other than those specified herein
might result in hazardous radiation exposure.
(C026)
CAUTION:
Data processing environments can contain equipment transmitting on system links with laser modules
that operate at greater than Class 1 power levels. For this reason, never look into the end of an optical
fiber cable or open receptacle. Although shining light into one end and looking into the other end of
a disconnected optical fiber to verify the continuity of optic fibers may not injure the eye, this
procedure is potentially dangerous. Therefore, verifying the continuity of optical fibers by shining
light into one end and looking at the other end is not recommended. To verify continuity of a fiber
optic cable, use an optical light source and power meter. (C027)
CAUTION:
This product contains a Class 1M laser. Do not view directly with optical instruments. (C028)
CAUTION:
Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following
information: laser radiation when open. Do not stare into the beam, do not view directly with optical
instruments, and avoid direct exposure to the beam. (C030)
CAUTION:
The battery contains lithium. To avoid possible explosion, do not burn or charge the battery.
Do not:
• Throw or immerse into water
• Heat to more than 100°C (212°F)
• Repair or disassemble
Exchange only with the IBM-approved part. Recycle or discard the battery as instructed by local
regulations. In the United States, IBM has a process for the collection of this battery. For information,
call 1-800-426-4333. Have the IBM part number for the battery unit available when you call. (C003)
CAUTION:
Regarding IBM provided VENDOR LIFT TOOL:
• Operation of LIFT TOOL by authorized personnel only.
• LIFT TOOL intended for use to assist, lift, install, remove units (load) up into rack elevations. It is
not to be used loaded transporting over major ramps nor as a replacement for such designated tools
like pallet jacks, walkies, fork trucks and such related relocation practices. When this is not
practicable, specially trained persons or services must be used (for instance, riggers or movers).
• Read and completely understand the contents of LIFT TOOL operator's manual before using.
Failure to read, understand, obey safety rules, and follow instructions may result in property
damage and/or personal injury. If there are questions, contact the vendor's service and support.
Local paper manual must remain with machine in provided storage sleeve area. Latest revision
manual available on vendor's web site.
• Test verify stabilizer brake function before each use. Do not over-force moving or rolling the LIFT
TOOL with stabilizer brake engaged.
• Do not move LIFT TOOL while platform is raised, except for minor positioning.
• Do not exceed rated load capacity. See LOAD CAPACITY CHART regarding maximum loads at
center versus edge of extended platform.
• Only raise load if properly centered on platform. Do not place more than 200 lb (91 kg) on edge of
sliding platform shelf also considering the load's center of mass/gravity (CoG).
• Do not corner load the platform tilt riser accessory option. Secure platform riser tilt option to main
shelf in all four (4x) locations with provided hardware only, prior to use. Load objects are designed
to slide on/off smooth platforms without appreciable force, so take care not to push or lean. Keep
riser tilt option flat at all times except for final minor adjustment when needed.
• Do not stand under overhanging load.
• Do not use on uneven surface, incline or decline (major ramps).
• Do not stack loads.
• Do not operate while under the influence of drugs or alcohol.
• Do not support ladder against LIFT TOOL.
• Tipping hazard. Do not push or lean against load with raised platform.
• Do not use as a personnel lifting platform or step. No riders.
• Do not stand on any part of lift. Not a step.
• Do not climb on mast.
• Do not operate a damaged or malfunctioning LIFT TOOL machine.
• Crush and pinch point hazard below platform. Only lower load in areas clear of personnel and
obstructions. Keep hands and feet clear during operation.
• No Forks. Never lift or move bare LIFT TOOL MACHINE with pallet truck, jack or fork lift.
• Mast extends higher than platform. Be aware of ceiling height, cable trays, sprinklers, lights, and
other overhead objects.
• Do not leave LIFT TOOL machine unattended with an elevated load.
• Watch and keep hands, fingers, and clothing clear when equipment is in motion.
• Turn Winch with hand power only. If winch handle cannot be cranked easily with one hand, it is
probably over-loaded. Do not continue to turn winch past top or bottom of platform travel.
Excessive unwinding will detach handle and damage cable. Always hold handle when lowering,
unwinding. Always assure self that winch is holding load before releasing winch handle.
• A winch accident could cause serious injury. Not for moving humans. Make certain clicking sound
is heard as the equipment is being raised. Be sure winch is locked in position before releasing
handle. Read instruction page before operating this winch. Never allow winch to unwind freely.
Freewheeling will cause uneven cable wrapping around winch drum, damage cable, and may cause
serious injury. (C048)
Power and cabling information for NEBS (Network Equipment-Building System) GR-1089-CORE
The following comments apply to the IBM servers that have been designated as conforming to NEBS
(Network Equipment-Building System) GR-1089-CORE:
The equipment is suitable for installation in the following:
• Network telecommunications facilities
• Locations where the NEC (National Electrical Code) applies
The intrabuilding ports of this equipment are suitable for connection to intrabuilding or unexposed
wiring or cabling only. The intrabuilding ports of this equipment must not be metallically connected to the
interfaces that connect to the OSP (outside plant) or its wiring. These interfaces are designed for use as
intrabuilding interfaces only (Type 2 or Type 4 ports as described in GR-1089-CORE) and require isolation
from the exposed OSP cabling. The addition of primary protectors is not sufficient protection to connect
these interfaces metallically to OSP wiring.
Note: All Ethernet cables must be shielded and grounded at both ends.
The ac-powered system does not require the use of an external surge protection device (SPD).
The dc-powered system employs an isolated DC return (DC-I) design. The DC battery return terminal
shall not be connected to the chassis or frame ground.
The dc-powered system is intended to be installed in a common bonding network (CBN) as described in
GR-1089-CORE.
Beginning troubleshooting and problem analysis
This information provides a starting point for analyzing problems.
This information is the starting point for diagnosing and repairing systems. From this point, you are
guided to the appropriate information to help you diagnose problems, determine the appropriate repair
action, and then complete the necessary steps to repair the system.
Note: Update the system firmware to the latest level before you start problem analysis. If you update the
system firmware, you will have the latest available fixes and improvements for error handling, reporting,
and isolation. For instructions about updating the system firmware, see Getting fixes.
What type of problem are you dealing with?
    Problem analysis procedure
You do not know the type of problem.
    Go to “Determining the problem analysis procedure to perform.”
A baseboard management controller (BMC) access problem occurred.
    Go to “Resolving a BMC access problem” on page 2.
The system does not power on (the power button or the BMC power on command does not power on the
system).
    Go to “Resolving a power problem” on page 3.
A system firmware boot failure occurred (the system started but was not able to boot to the Petitboot menu).
    Go to “Resolving a system firmware boot failure” on page 4.
A video graphics array (VGA) monitor problem occurred (the system started but video is not displayed on the
monitor).
    Go to “Resolving a VGA monitor problem” on page 8.
An operating system boot failure occurred (the system booted to the Petitboot menu but the operating system
did not start).
    Go to “Resolving an operating system boot failure” on page 9.
A BMC dashboard sensor is red.
    Go to “Resolving a sensor indicator problem” on page 11.
A processor, memory, power, or cooling hardware failure occurred.
    Go to “Resolving a hardware problem” on page 12.
Missing or faulty graphics processing unit (GPU), PCIe adapter, disk drive, or solid-state drive.
    Go to Resolving a GPU, PCIe adapter, or device problem.
Determining the problem analysis procedure to perform
Learn how to identify the correct problem analysis procedure to perform.
To determine the correct problem analysis procedure to perform, complete the following steps:
1. After you apply power to the system, do the power supply LEDs display XXX and after 30 seconds
the power button flashes?
If      Then
Yes:    Continue with the next step.
No:     Go to “Resolving a power problem” on page 3.
2. Can you access the baseboard management controller (BMC) across the network?
If      Then
Yes:    Continue with the next step.
No:     Go to “Resolving a BMC access problem.”
3. Can you boot the system to the Petitboot menu?
If      Then
Yes:    Continue with the next step.
No:     Go to “Resolving a system firmware boot failure” on page 4.
4. Is video displayed on the video graphics array (VGA) monitor?
If      Then
Yes:    Continue with the next step.
No:     Go to “Resolving a VGA monitor problem” on page 8.
5. Can you start the operating system?
If      Then
Yes:    Continue with the next step.
No:     Go to “Resolving an operating system boot failure” on page 9.
6. On the BMC dashboard, are any sensors red?
If      Then
Yes:    Go to “Resolving a sensor indicator problem” on page 11.
No:     Continue with the next step.
7. Go to “Resolving a hardware problem” on page 12. This ends the procedure.
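The seven steps above form a simple ordered decision chain: the first check that fails selects the procedure. A minimal Python sketch of that chain follows. The function name, the dictionary keys, and the idea of recording the answers as booleans are illustrative assumptions and are not part of any IBM tool; only the order of the checks and the procedure titles come from the steps above.

```python
# Ordered checks from the procedure above. Each entry is:
# (hypothetical answer key, answer that means "continue", procedure to use otherwise).
CHECKS = [
    ("power_leds_ok", True, "Resolving a power problem"),
    ("bmc_reachable", True, "Resolving a BMC access problem"),
    ("petitboot_reached", True, "Resolving a system firmware boot failure"),
    ("vga_video_shown", True, "Resolving a VGA monitor problem"),
    ("os_started", True, "Resolving an operating system boot failure"),
    ("any_sensor_red", False, "Resolving a sensor indicator problem"),
]


def choose_procedure(answers: dict) -> str:
    """Return the title of the first procedure whose check fails (steps 1 - 6)."""
    for key, continue_value, procedure in CHECKS:
        if answers[key] != continue_value:
            return procedure
    # Step 7: every earlier check passed.
    return "Resolving a hardware problem"


if __name__ == "__main__":
    observed = {
        "power_leds_ok": True,
        "bmc_reachable": True,
        "petitboot_reached": False,  # the system never reached the Petitboot menu
        "vga_video_shown": True,
        "os_started": False,
        "any_sensor_red": False,
    }
    print(choose_procedure(observed))
```

Because the checks are evaluated in order, a system that fails several checks is routed to the earliest matching procedure, which mirrors how the numbered steps are meant to be read.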
Resolving a BMC access problem
Learn how to identify the service action that is needed to resolve a baseboard management controller
(BMC) access problem.
1. Ensure that the BMC password is not set to the default password. For information about changing the
default password, see Logging on to the BMC GUI. Does the problem persist?
If      Then
Yes:    Continue with the next step.
No:     This ends the procedure.
2. Are both ends of the network cable seated securely?
If      Then
Yes:    Continue with the next step.
No:     Seat both ends of the cable securely. If the problem persists, continue with the next step.
3. Power off the system and disconnect all ac power cords for 30 seconds. Then, reconnect the ac power
cords and power on the system. Does the BMC access problem persist?
If      Then
Yes:    Continue with the next step.
No:     This ends the procedure.
4. Verify that the BMC network settings are correct.
a. Power on the system by using the power button on the front of the system. Wait 1 - 2 minutes for
the system to display the Petitboot menu.
b. When the Petitboot menu is displayed, press any key to interrupt the boot process. Then, select
Exit to Shell.
c. Type the following command and press Enter:
ipmitool lan print 1
d. Verify that the MAC address and the IP address settings are correct. Then, continue with the next
step.
Note: If the IP address setting is incorrect, go to Configuring the firmware IP address
(http://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwenablenetwork.htm).
If the MAC address is 00:00:00:00:00:00, go to “Contacting IBM service and support” on page 110.
5. Complete the following actions:
a. Power on to the Petitboot menu.
b. Use the BMC to update the system firmware. For instructions, see Updating the system firmware
by using the BMC.
Are you able to access the BMC?
If      Then
Yes:    This ends the procedure.
No:     Continue with the next step.
6. Complete the service action that is indicated for your system:
• If your system is an 8335-GCA or 8335-GTA, replace the system backplane. Go to “8335-GCA and
8335-GTA locations” on page 111 to identify the physical location and the removal and replacement
procedure. This ends the procedure.
• If your system is an 8335-GTB, replace the BMC card. Go to “8335-GTB locations” on page 121 to
identify the physical location and the removal and replacement procedure. This ends the
procedure.
• If your system is an 8348-21C, replace the system backplane. Go to “8348-21C locations” on page
133 to identify the physical location and the removal and replacement procedure. This ends the
procedure.
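Step 4 of this procedure asks you to inspect the output of ipmitool lan print 1. The check can be sketched in Python as shown below, assuming the output was captured as text and uses the usual "Field : Value" layout with fields named "IP Address" and "MAC Address". The helper names, the sample text, and the 192.0.2.10 address are illustrative assumptions; real output contains many more fields, and the sketch only flags the all-zero MAC address case described in the note.

```python
def parse_lan_print(text: str) -> dict:
    """Split 'Field : Value' lines into a dictionary (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip()
        if sep and key:  # skip continuation lines that have no field name
            fields[key] = value.strip()
    return fields


def next_action(fields: dict) -> str:
    """Apply the note in step 4: an all-zero MAC address means contacting service."""
    if fields.get("MAC Address") == "00:00:00:00:00:00":
        return "Contact IBM service and support"
    return "Verify the MAC and IP address settings, then continue with the next step"


# Illustrative sample only; not captured from a real system.
SAMPLE = """\
IP Address Source       : Static Address
IP Address              : 192.0.2.10
MAC Address             : 00:00:00:00:00:00
"""

if __name__ == "__main__":
    parsed = parse_lan_print(SAMPLE)
    print(parsed["IP Address"], "->", next_action(parsed))
```

The sketch does not decide whether an IP address is correct, because that depends on your network plan. It only extracts the values so that you can compare them with the expected settings.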
Resolving a power problem
Learn how to identify the service action that is needed to resolve a power problem.
1. Is the amber LED of a power supply on solid and is the amber LED on the front of the system turned
off?
If      Then
Yes:    Ensure that the power cords for both power supplies are fully seated and that the power
        distribution units (PDUs) and power outlets are supplying electricity. This ends the
        procedure.
No:     Continue with the next step.
2. Are the power supply LEDs turned off?
If      Then
Yes:    Continue with the next step.
No:     Continue with step 4.
3. Perform the following actions, one at a time, until the problem is resolved:
a. Ensure that all of the power cords are fully seated in the power supplies.
b. Ensure that all of the power cords are fully seated in the power distribution units (PDUs) or wall
outlets.
c. If the power cords are plugged into PDUs, ensure that the PDUs are turned on.
d. Ensure that all of the power cords are plugged into PDUs or wall outlets that are supplying
electricity.
e. Replace the power cords.
f. Replace the power supplies.
• If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on page
111 to identify the physical location and the removal and replacement procedure.
• If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure.
• If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure.
This ends the procedure.
4. Is the amber LED of a power supply on solid and is the red LED on the front of the system flashing
at 0.25 Hz?
If      Then
Yes:    Continue with the next step.
No:     Go to “Contacting IBM service and support” on page 110. This ends the procedure.
5. Perform the following actions, one at a time, until the problem is resolved:
a. Ensure that the power supply is fully seated in the system.
b. Ensure that the power supply fan is not blocked.
c. Replace the power supply.
• If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on page
111 to identify the physical location and the removal and replacement procedure.
• If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure.
• If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure.
This ends the procedure.
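The LED checks in this procedure can be summarized as a small decision function. The following Python sketch is one way to express them, assuming that the four observations are recorded as booleans. The function name, parameter names, and returned strings are hypothetical; the branching order follows steps 1, 2, and 4 above. A flash rate of 0.25 Hz corresponds to one flash every 4 seconds.

```python
# 0.25 Hz means one flash every 1 / 0.25 = 4 seconds.
FLASH_PERIOD_SECONDS = 1 / 0.25


def power_action(
    psu_amber_solid: bool,
    front_amber_off: bool,
    psu_leds_off: bool,
    front_red_flashing_quarter_hz: bool,
) -> str:
    """Map the LED observations from steps 1, 2, and 4 to the next action."""
    # Step 1: a solid amber power supply LED with the front amber LED off.
    if psu_amber_solid and front_amber_off:
        return "Check that both power cords are seated and that PDUs and outlets supply electricity"
    # Step 2: all power supply LEDs are off, so work through step 3.
    if psu_leds_off:
        return "Step 3: check cords and PDUs, then replace the power cords, then the power supplies"
    # Step 4: a solid amber power supply LED with the front red LED flashing at 0.25 Hz.
    if psu_amber_solid and front_red_flashing_quarter_hz:
        return "Step 5: reseat the power supply, check its fan, then replace the power supply"
    return "Contact IBM service and support"


if __name__ == "__main__":
    print(FLASH_PERIOD_SECONDS)
    print(power_action(False, False, True, False))
```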
Resolving a system firmware boot failure
Learn how to identify the service action that is needed to resolve a failure while booting your system
firmware.
1. After you pressed the power button, did the system turn on but fail to display the Petitboot menu?
If      Then
Yes:    Continue with the next step.
No:     Continue with step 5.
2. Does the baseboard management controller (BMC) respond to commands?
Note: To determine whether the BMC responds to commands, run the following ipmitool command:
ipmitool -I lanplus -U <username> -P <password> -H <bmc ip or bmc hostname> chassis status
If      Then
Yes:    Continue with the next step.
No:     Continue with step 4.
3. Complete the following actions:
a. Use the BMC to update the system firmware. For instructions, see Updating the system firmware
by using the BMC.
b. Check the system event logs. For instructions, see “Identifying a service action by using system
event logs” on page 27. Then, continue with step 5.
4. Complete the following actions, one at a time, until the problem is resolved:
a. Reset the BMC remotely by entering the following command:
ipmitool -I lanplus -U <username> -P <password> -H <bmc ip or bmc hostname> mc reset cold
b. Disconnect the power cords from the system for 30 seconds. Reconnect the power cords, wait 5
minutes, and then go to step 2.
c. Use the IPMI tool to update the system firmware. For instructions, see Updating the system
firmware by using the IPMI tool.
d. Complete the service action that is indicated for your system:
• If your system is an 8335-GCA or 8335-GTA, replace the system backplane. Go to “8335-GCA
and 8335-GTA locations” on page 111 to identify the physical location and the removal and
replacement procedure.
• If your system is an 8335-GTB, replace the BMC card. Go to “8335-GTB locations” on page 121
to identify the physical location and the removal and replacement procedure.
• If your system is an 8348-21C, replace the system backplane. Go to “8348-21C locations” on
page 133 to identify the physical location and the removal and replacement procedure.
This ends the procedure.
5. Are you here because of a system event log (SEL) event with the value OEM record c0 and OEM c0
specific log information 3a1503xxxxxx?
If      Then
Yes:    Continue with step 8 on page 6.
No:     Continue with the next step.
6. Are you here because of a SEL event with the value OEM record c0 and OEM c0 specific log
information 3a1504xxxxxx?
If      Then
Yes:    Continue with step 12 on page 7.
No:     Continue with the next step.
7. Power off the system and disconnect all ac power cords for 30 seconds. Then, reconnect the ac
power cords and power on the system. Does the system boot successfully?
If      Then
Yes:    This ends the procedure.
No:     Go to “Resolving a hardware problem” on page 12. This ends the procedure.
8. Did the system complete the boot process successfully?
If      Then
Yes:    Continue with the next step.
No:     Continue with step 12 on page 7.
9. Determine whether the system is booted from the user-updated level of the system firmware image
(primary side) or the manufacturing level of the system firmware image (golden side).
v For in-band networks, enter the following command:
ipmitool sensor list | grep -i golden
v To run the command remotely over the LAN, enter the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sensor list | grep -i golden
Do both of the returned records show 0x0080 in the data fields?
If    Then
Yes:  The error was temporary. No service action is required. This ends the procedure.
No:   One or both of the returned records have 0x0180 in the data fields. The system was booted
      from the golden side. Continue with the next step.
10. Search for processor deconfiguration SEL events that have a time stamp in close proximity to the
time stamp of the event with value OEM record c0 that sent you here. Processor deconfiguration
SEL events are displayed in the following form:
v Processor CPU Func x | Transition to Non-recoverable | Asserted
Are processor deconfiguration events present?
If    Then
Yes:  Complete the service actions for the processor deconfiguration events.
      v If your system is an 8335-GCA or 8335-GTA, go to “Identifying a service action by using
        sensor and event information for the 8335-GCA and 8335-GTA” on page 37. This ends
        the procedure.
      v If your system is an 8335-GTB, go to “Identifying a service action by using sensor and
        event information for the 8335-GTB” on page 57. This ends the procedure.
      v If your system is an 8348-21C, go to “Identifying a service action by using sensor and
        event information for the 8348-21C” on page 78. This ends the procedure.
No:   Continue with the next step.
11. Are there other types of SEL events that require a service action and have a time stamp in close
proximity to the time stamp of the event with value OEM record c0 that sent you here?
If    Then
Yes:  Complete the service actions for the SEL events that require service actions.
      v If your system is an 8335-GCA or 8335-GTA, go to “Identifying a service action by using
        sensor and event information for the 8335-GCA and 8335-GTA” on page 37. This ends
        the procedure.
      v If your system is an 8335-GTB, go to “Identifying a service action by using sensor and
        event information for the 8335-GTB” on page 57. This ends the procedure.
      v If your system is an 8348-21C, go to “Identifying a service action by using sensor and
        event information for the 8348-21C” on page 78. This ends the procedure.
No:   If the boot problem persists, reload or update the system firmware image. Go to Getting
      fixes and reload the system firmware with the same level of firmware or update the system
      firmware with a more recent level of firmware. Then, reboot the system. This ends the
      procedure.
12. Search for processor deconfiguration SEL events that have a time stamp in close proximity to the
time stamp of the event with value OEM record c0 that sent you here. Processor deconfiguration
SEL events are displayed in the following form:
v Processor CPU Func x | Transition to Non-recoverable | Asserted
Are processor deconfiguration events present?
If    Then
Yes:  Complete the service actions for the processor deconfiguration events.
      v If your system is an 8335-GCA or 8335-GTA, go to “Identifying a service action by using
        sensor and event information for the 8335-GCA and 8335-GTA” on page 37. This ends
        the procedure.
      v If your system is an 8335-GTB, go to “Identifying a service action by using sensor and
        event information for the 8335-GTB” on page 57. This ends the procedure.
      v If your system is an 8348-21C, go to “Identifying a service action by using sensor and
        event information for the 8348-21C” on page 78. This ends the procedure.
No:   Continue with the next step.
13. Are there other types of SEL events that require a service action and have a time stamp in close
proximity to the time stamp of the event with value OEM record c0 that sent you here?
If    Then
Yes:  Complete the service actions for the SEL events that require service actions.
      v If your system is an 8335-GCA or 8335-GTA, go to “Identifying a service action by using
        sensor and event information for the 8335-GCA and 8335-GTA” on page 37. This ends
        the procedure.
      v If your system is an 8335-GTB, go to “Identifying a service action by using sensor and
        event information for the 8335-GTB” on page 57. This ends the procedure.
      v If your system is an 8348-21C, go to “Identifying a service action by using sensor and
        event information for the 8348-21C” on page 78. This ends the procedure.
No:   Continue with the next step.
14. Power off the system and disconnect all AC power cords for 30 seconds. Then, reconnect the AC
power cords and power on the system. Does the system boot successfully?
If    Then
Yes:  This ends the procedure.
No:   Continue with the next step.
15. Is the system an 8348-21C, and are all 32 of the DIMM locations populated with 32 GB DIMMs?
If    Then
Yes:  Continue with the next step.
No:   Go to step 18.
16. Use the baseboard management controller (BMC) to update the system firmware. For instructions,
see Updating the system firmware by using the BMC. Does the problem persist?
If    Then
Yes:  Continue with the next step.
No:   This ends the procedure.
17. Is your system an 8335-GTB?
If    Then
Yes:  Replace the baseboard management controller (BMC) card. Go to “8335-GTB locations” on
      page 121 to identify the physical location and the removal and replacement procedure. If
      the problem persists, continue with the next step. Otherwise, this ends the procedure.
No:   Continue with the next step.
18. Replace the system backplane.
v If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on page
111 to identify the physical location and the removal and replacement procedure. Then, continue
with the next step.
v If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure. Then, continue with the next step.
v If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure. Then, continue with the next step.
19. Does the problem persist?
If    Then
Yes:  Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and
      support” on page 110. This ends the procedure.
No:   This ends the procedure.
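Steps 9, 10, and 12 of the preceding procedure ask you to read ipmitool output by eye. The following Python sketch shows one way those checks might be automated. It is not part of the IBM procedure: the function names, the 5-minute window that stands in for "close proximity," and the sample lines are assumptions for illustration, and the column layout of ipmitool output can vary with the ipmitool version and firmware level.

```python
import re
from datetime import datetime, timedelta

# Form given in steps 10 and 12 for processor deconfiguration SEL events.
DECONFIG = re.compile(
    r"Processor\s+CPU Func\s+\S+\s*\|\s*Transition to Non-recoverable\s*\|\s*Asserted"
)


def booted_from_golden_side(sensor_lines):
    """Step 9: True if any golden-side record shows 0x0180, False if every
    record found shows 0x0080, None if no such record is recognized."""
    states = []
    for line in sensor_lines:
        if "golden" not in line.lower():
            continue
        fields = [field.strip().lower() for field in line.split("|")]
        if "0x0180" in fields:
            states.append(True)
        elif "0x0080" in fields:
            states.append(False)
    return any(states) if states else None


def parse_timestamp(sel_line):
    """Read the date and time columns of a `sel elist` line (assumed layout)."""
    fields = [field.strip() for field in sel_line.split("|")]
    if len(fields) < 3:
        return None
    try:
        return datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None


def deconfig_events_near(sel_lines, reference_line, window=timedelta(minutes=5)):
    """Steps 10 and 12: deconfiguration events within `window` of the reference
    event. Events whose time cannot be parsed are kept for manual review."""
    reference = parse_timestamp(reference_line)
    hits = []
    for line in sel_lines:
        if not DECONFIG.search(line):
            continue
        stamp = parse_timestamp(line)
        if reference is None or stamp is None or abs(stamp - reference) <= window:
            hits.append(line.strip())
    return hits


# Hypothetical sample data, not captured from a real system.
sensors = [
    "BIOS Golden Side | 0x0 | discrete | 0x0080| na | na | na | na | na | na",
    "BMC Golden Side  | 0x0 | discrete | 0x0180| na | na | na | na | na | na",
]
sel = [
    "   1 | 04/20/2017 | 10:15:02 | OEM record c0 | 000000 | 3a1503000000",
    "   2 | 04/20/2017 | 10:15:40 | Processor CPU Func 1 | Transition to Non-recoverable | Asserted",
    "   3 | 04/21/2017 | 08:00:00 | Processor CPU Func 2 | Transition to Non-recoverable | Asserted",
]
print(booted_from_golden_side(sensors))
for event in deconfig_events_near(sel, sel[0]):
    print(event)
```

In practice, the input lines would come from the `ipmitool sensor list` and `ipmitool sel elist` commands that the procedure already names. The window is deliberately a parameter, because the procedure does not define how close two time stamps must be.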
Resolving a VGA monitor problem
Learn how to identify the service action that is needed to resolve a video graphics array (VGA) monitor
problem.
1. Is the system powered on and is the VGA monitor connected to the VGA display port, but video is
not displayed?
If    Then
Yes:  Continue with the next step.
No:   This ends the procedure.
2. Complete the following steps, one at a time until the problem is resolved:
a. Ensure that the VGA cable is properly seated to the server port and to the monitor port.
b. Verify that the monitor and the VGA cable are working properly by testing them on a system that
is known to be working properly. If the monitor or the VGA cable does not work properly, replace
it.
c. Verify that the system is powered on by activating a serial over LAN (SOL) session through the
baseboard management controller (BMC). If the system is not active, go to “Resolving a system
firmware boot failure” on page 4.
d. Replace the system backplane.
v If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on page
111 to identify the physical location and the removal and replacement procedure.
v If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure.
v If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure.
This ends the procedure.
Resolving an operating system boot failure
Learn how to identify the service action that is needed to resolve a failure while booting your operating
system.
1. Was the system recently installed, serviced, moved, or upgraded?
If    Then
Yes:  Ensure that all cables are properly seated in the connection path to the designated boot
      device. This ends the procedure.
No:   Continue with the next step.
2. Are you booting the operating system from a network location?
If    Then
Yes:  Continue with the next step.
No:   Continue with step 4.
3. Complete the following actions, one at a time, until the problem is resolved:
a. Ensure that a problem does not exist with the connection to the network location.
b. Ensure that the adapter has a valid IP address for the network.
c. Replace the network adapter.
v If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on
page 111 to identify the physical location and the removal and replacement procedure.
v If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure.
v If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure.
4. By default, Petitboot displays all recognized bootable images. Is the boot image recognized by
Petitboot?
If    Then
Yes:  Continue with step 11 on page 11.
No:   Select the Petitboot menu option to refresh the boot images. If the problem persists,
      continue with the next step.
5. Is the system an 8348-21C, and is the boot image on a storage device that is configured in a RAID
configuration?
If    Then
Yes:  Continue with the next step.
No:   Continue with step 11 on page 11.
6. On the Petitboot command line, type the following command:
arcconf getconfig 1 LD
Is the logical boot drive recognized and in optimal status?
If    Then
Yes:  Reinstall the operating system on the logical drive. This ends the procedure.
No:   Continue with the next step.
7. Are the drives properly seated in their respective drive bays?
Note:
v If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on page
111 to identify the physical location and the removal and replacement procedure.
v If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure.
v If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure.
If    Then
Yes:  Continue with the next step.
No:   Properly seat the drives in the drive bays. Then, go to step 4 on page 9.
8. Refresh the Petitboot boot options. Is the boot image on the logical drive recognized?
If    Then
Yes:  Boot the operating system. Then, continue with step 11 on page 11.
No:   Continue with the next step.
9. Verify that the physical drives are in the RAID array. On the Petitboot command line, type the
following command:
arcconf getconfig 1 PD
Are the physical drives that are known to be in the RAID array recognized?
If    Then
Yes:  Reinstall the operating system on the logical drive. This ends the procedure.
No:   Continue with the next step.
10. Complete the following actions, one at a time, until the physical drives are recognized in the RAID
array:
Note:
v If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations” on page
111 to identify the physical location and the removal and replacement procedure.
v If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the physical
location and the removal and replacement procedure.
v If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the physical
location and the removal and replacement procedure.
a. Ensure that the SAS cable is securely seated in the RAID adapter and the storage backplane.
b. Replace the RAID adapter.
c. Replace the SAS cable.
This ends the procedure.
11. Does an operating system error occur during the boot?
If    Then
Yes:  Recover the operating system with the tools provided for the operating system. If that does
      not resolve the problem, reinstall the operating system. This ends the procedure.
No:   Reinstall the operating system. This ends the procedure.
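Step 6 of the preceding procedure asks whether the logical boot drive is recognized and in optimal status. The following Python sketch shows how that check might be scripted against saved output from `arcconf getconfig 1 LD`. The field label `Status of Logical Device` and the sample text are assumptions about the arcconf report format, which can differ between arcconf versions. Compare them with the output from your own controller before you rely on the result.

```python
import re

# Assumed label for the status field in the arcconf logical device report.
STATUS = re.compile(r"^\s*Status of Logical Device\s*:\s*(.+?)\s*$", re.IGNORECASE)


def logical_device_statuses(arcconf_output):
    """Return the status string of every logical device found in the report."""
    statuses = []
    for line in arcconf_output.splitlines():
        match = STATUS.match(line)
        if match:
            statuses.append(match.group(1))
    return statuses


def all_optimal(arcconf_output):
    """True only if at least one logical device is listed and all are Optimal."""
    statuses = logical_device_statuses(arcconf_output)
    return bool(statuses) and all(status.lower() == "optimal" for status in statuses)


# Hypothetical sample report, not captured from a real controller.
sample_report = """\
Logical Device number 0
   Logical Device name                      : boot
   RAID level                               : 1
   Status of Logical Device                 : Optimal
"""
print(all_optimal(sample_report))
```

An empty report returns False, which corresponds to the "logical drive not recognized" branch of step 6.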
Resolving a sensor indicator problem
Learn how to resolve a sensor indicator problem by using the BMC dashboard.
After the system is powered on, some sensors retain their status from the last time the system was
operational. As a result, the sensor indicator LED might not reflect the status of the physical sensor, and
it can be unclear whether the sensor indicator LED indicates an actual problem that requires a service
action. For more information about BMC dashboard sensors, see the Event sensor status GUI display
topic for your system (8335-GCA or 8335-GTA, 8335-GTB, or 8348-21C).
To refresh the sensor indicator LEDs and to determine whether a service action is required, complete the
following procedure:
1. Power off the system. Then, boot the system to the operational state. Click Refresh on the BMC
dashboard.
Are any of the sensor indicator LEDs still red?
v Yes: Continue with the next step.
v No: This ends the procedure.
2. Record the names of any sensors that have a red LED indicator status.
Note: Repeat steps 3 - 6 for every sensor that you record in this step.
3. Use one of the following commands to list the system event logs (SELs).
v To list SELs by using an in-band network, enter the following command:
ipmitool sel elist
v To list SELs remotely over the LAN, enter the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel elist
4. Review the list of SELs and locate a log entry that meets all of the following criteria:
v The entry contains the name of one of the sensors that you recorded in step 2.
v The entry contains a service action keyword. For a list of service action keywords, see “Identifying service
action keywords in system event logs” on page 36.
v The description contains Asserted.
Did you identify a log entry that meets these criteria?
v Yes: Continue with the next step.
v No: Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and
support” on page 110. This ends the procedure.
5. Use one of the following options to display the SEL details for the sensor:
Note: You must specify the SEL record ID in hexadecimal format. For example: 0x1a.
v To display SEL details by using an in-band network, enter the following command:
ipmitool sel get <SEL record ID>
v To display SEL details remotely over the LAN, enter the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>
6. The sensor ID field contains sensor information in the sensor name (sensor ID) format. Record the
sensor name, sensor ID, and event description. Then, use this information to determine the service
action to perform:
v If your system is an 8335-GCA or 8335-GTA, go to “Identifying a service action by using sensor and
event information for the 8335-GCA and 8335-GTA” on page 37 to determine the service action to
perform. This ends the procedure.
v If your system is an 8335-GTB, go to “Identifying a service action by using sensor and event
information for the 8335-GTB” on page 57 to determine the service action to perform. This ends the
procedure.
v If your system is an 8348-21C, go to “Identifying a service action by using sensor and event
information for the 8348-21C” on page 78 to determine the service action to perform. This ends the
procedure.
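Steps 3 - 5 of the preceding procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not an IBM-provided tool: the function names are hypothetical, the first column of `sel elist` output is assumed to be the record ID in hexadecimal without a prefix, and the keyword in the sample is a placeholder. Take the real service action keywords from “Identifying service action keywords in system event logs.”

```python
def ipmitool_args(subcommand, host=None, username=None, password=None):
    """Build the argument list for an in-band call (no host) or a remote
    call over the LAN, matching the two command forms in steps 3 and 5."""
    args = ["ipmitool"]
    if host is not None:
        args += ["-I", "lanplus", "-U", username, "-P", password, "-H", host]
    return args + list(subcommand)


def record_id_arg(first_column):
    """Step 5: format the record ID as hexadecimal, for example '1a' -> '0x1a'."""
    return hex(int(first_column.strip(), 16))


def candidate_entries(sel_lines, sensor_names, keywords):
    """Step 4: entries that name a recorded sensor, contain a service action
    keyword, and end with Asserted (Deasserted entries are skipped)."""
    hits = []
    for line in sel_lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or fields[-1] != "Asserted":
            continue
        if not any(name in line for name in sensor_names):
            continue
        if not any(keyword in line for keyword in keywords):
            continue
        hits.append((record_id_arg(fields[0]), line.strip()))
    return hits


# Hypothetical sample data; sensor names and the keyword are placeholders.
sel_lines = [
    "   1 | 04/20/2017 | 10:15:02 | Fan Fan 1 | Lower Critical going low | Asserted",
    "  1a | 04/20/2017 | 10:16:00 | Temperature Ambient Temp | Upper Critical going high | Asserted",
    "  1b | 04/20/2017 | 10:17:00 | Temperature Ambient Temp | Upper Critical going high | Deasserted",
]
for record_id, entry in candidate_entries(sel_lines, ["Ambient Temp"], ["Upper Critical"]):
    print(record_id, entry)
    print(ipmitool_args(["sel", "get", record_id]))
```

The argument list can be passed to `subprocess.run` to run the command. Building a list rather than a single string avoids shell quoting problems with passwords that contain special characters.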
Resolving a hardware problem
Learn how to identify the service action that is needed to resolve a hardware problem.
1. If you have not already done so, manually boot the system.
2. Go to “Identifying a service action by using system event logs” on page 27. Then, continue with the
next step.
3. Was a service action identified?
If    Then
Yes:  Continue with the next step.
No:   Go to step 5.
4. Did the service action fix the problem?
If    Then
Yes:  This ends the procedure.
No:   Go to step 5.
5. Go to “Resolving a GPU, PCIe adapter, or device problem” on page 13. Then, continue with the next
step.
6. Was a service action identified?
If    Then
Yes:  Continue with the next step.
No:   Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and
      support” on page 110. This ends the procedure.
7. Did the service action fix the problem?
If    Then
Yes:  This ends the procedure.
No:   Go to “Collecting diagnostic data” on page 109. Then, go to “Contacting IBM service and
      support” on page 110. This ends the procedure.
Resolving a GPU, PCIe adapter, or device problem
Learn how to access log files, information to identify types of events, and a list of potential problems and
service actions.
1. Are all of the adapters in the system missing or failed?
If    Then
Yes:  Replace the system backplane.
      v If your system is an 8335-GCA or 8335-GTA, go to “8335-GCA and 8335-GTA locations”
        on page 111 to identify the physical location and the removal and replacement procedure.
      v If your system is an 8335-GTB, go to “8335-GTB locations” on page 121 to identify the
        physical location and the removal and replacement procedure.
      v If your system is an 8348-21C, go to “8348-21C locations” on page 133 to identify the
        physical location and the removal and replacement procedure.
No:   Continue with the next step.
2. To identify the correct service procedure to perform by using operating system log information,
complete the following steps:
a. Log in as the root user.
b. At the command prompt, type dmesg and press Enter.
3. Scan the operating system logs for the first occurrence of keywords, such as fail, failure, or failed.
When you find a keyword that accompanies one or more of the resource names in the following table,
a service action is required. Use the following table to determine the service procedure to perform for
your type of problem.
Table 1. Resource names, examples, and service procedures for different types of operating system logs.
Resource name | Example of a log requiring a service action | Type of problem | Service procedure
aacraid | PCI error detected 2 | RAID (Note: This adapter is available only for 8348-21C systems.) | Go to “Resolving a RAID adapter problem” on page 14.
eth1, eth2, eth3 | Failed to re-initialize device | Network | Go to “Resolving a network adapter problem” on page 15.
NVRM | aborting RmInitAdapter failed! | Graphics | Go to “Resolving a graphics processing unit problem” on page 16.
nvidia-nvlink | IBMNPU: NPU FENCE detected, machine power cycle required | Graphics | Go to “Resolving a graphics processing unit problem” on page 16.
nvme | Failed status: ffffffff, reset controller | NVMe Flash adapter (Note: This adapter is available only for 8335-GCA systems.) | Go to “Resolving an NVMe Flash adapter problem” on page 19.
ata1, ata2 | SError: { RecovComm PHYRdyChg 10B8B Dispar } | Marvell storage adapter (Note: This adapter is available only for 8348-21C systems.) | Go to “Resolving a storage device problem” on page 20.
sda, sdb, sdc | FAILED Result | Storage | Go to “Resolving a storage device problem” on page 20.
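The scan that step 3 describes, finding the first log line that pairs a resource name from Table 1 with a failure keyword, can be sketched in Python as follows. This is a minimal illustration, not an IBM tool. The rule list copies the resource names and example text from Table 1, the extra trigger phrases cover the table examples that contain no "fail" keyword, and the sample log lines are hypothetical. The sketch also assumes that both storage rows of the table lead to the storage device procedure.

```python
import re

# Keywords named in step 3: fail, failure, failed.
KEYWORD = re.compile(r"\bfail(?:ed|ure)?\b", re.IGNORECASE)

# (resource name pattern, extra trigger text from the Table 1 example, procedure)
RULES = [
    (r"\baacraid\b", "PCI error detected", "Resolving a RAID adapter problem"),
    (r"\beth[123]\b", None, "Resolving a network adapter problem"),
    (r"\bNVRM\b", None, "Resolving a graphics processing unit problem"),
    (r"\bnvidia-nvlink\b", "NPU FENCE", "Resolving a graphics processing unit problem"),
    (r"\bnvme", None, "Resolving an NVMe Flash adapter problem"),
    (r"\bata[12]\b", "SError", "Resolving a storage device problem"),
    (r"\bsd[abc]\b", None, "Resolving a storage device problem"),
]


def first_service_procedure(dmesg_text):
    """Return (procedure, log line) for the first line that pairs a Table 1
    resource name with a keyword or example phrase, or None if none matches."""
    for line in dmesg_text.splitlines():
        for resource, extra, procedure in RULES:
            if not re.search(resource, line):
                continue
            if KEYWORD.search(line) or (extra and extra in line):
                return procedure, line.strip()
    return None


# Hypothetical dmesg excerpt, not captured from a real system.
sample_log = """\
[    5.100000] eth0: link up
[   42.000001] nvme nvme0: Failed status: ffffffff, reset controller
[   43.000002] sd 0:0:0:0: [sda] FAILED Result
"""
print(first_service_procedure(sample_log))
```

The text to scan could be obtained by saving the output of the dmesg command from step 2 to a file. Only the literal resource names from the table are matched, so other device names, such as additional Ethernet interfaces, would need their own rules.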
Resolving a RAID adapter problem
Learn about the possible problems and service actions that you can perform to resolve a RAID adapter
problem.
Note: To determine the location of the PCIe adapter, see “Identifying the location of the PCIe adapter by
using the slot number” on page 21.
Table 2. RAID adapter problems and service actions.
Problem: System unable to find adapter
Service action:
1. Verify that the adapter is properly seated in a compatible slot.
2. Install the adapter in a different compatible slot.
3. Verify that the drivers for the adapter are installed.
4. Verify that the most recent firmware is installed on the system. If it is not, install it.
5. Restart the system.
6. Replace the adapter.
7. Replace the system backplane.
8. Replace the central processing unit (CPU).
Problem: Adapter stops working suddenly
Service action:
1. If the system was recently installed, moved, serviced, or upgraded, verify that the adapter is
seated properly and all associated cables are connected correctly.
2. Inspect the PCIe socket and verify that there is no dirt or debris in the socket.
3. Inspect the card and verify that it is not physically damaged.
4. Verify that all cables are properly seated and are not physically damaged. If you recently added
one or more new adapters, remove them and then test to determine whether the failing adapter is
functioning properly again. If the RAID adapter is functioning again, review the IBM support tips
to confirm that there are no PCI address, driver, or firmware conflicts. Then, reinstall the new
adapters one at a time until all adapters function properly.
5. Replace the adapter.
6. Replace the system backplane.
7. Replace the CPU.