iSeries / AS400 monitoring solution for New Relic Infrastructure. It is split into 5 separate on-host integrations (OHIs), all described below.
- New Relic Infrastructure Agent installed on a Linux host
- Linux host must be able to remotely address the iSeries/AS400 server(s) in question
- Unzip/gunzip the nri-as400 package on the host where the NRI Agent is installed
- Test your credentials and parameters with nri-as400-test.sh
- Edit nri-as400/nri-as400-config.yml, configuring instances of the 5 OHIs outlined below as needed
- Run nri-as400/install_linux.sh
Configuration for monitoring iSeries active jobs.
Configuration for monitoring iSeries server memory usage KPIs.
Configuration for monitoring iSeries message queues.
Configuration for monitoring iSeries server KPIs.
Configuration for monitoring iSeries disk usage.
This collection of OHIs comes with nri-as400-test.sh, a shell script to verify:
- connectivity to AS400 host
- credentials and other parameters
- data returned by the OHI instances
To use nri-as400-test.sh:
- Edit nri-as400-test.sh, setting the environment variables at the top to the settings you will use in this configuration.
- If not executable, run chmod +x nri-as400-test.sh
- Run ./nri-as400-test.sh [job-list|memory-status|message-queue|system-status|disk-usage]
This lets you test the configuration and confirm that the credentials and parameters are correct for each of the supported commands, including the new disk-usage command.
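The five commands can also be exercised in one pass. The following is a minimal sketch, not part of the package: run_all_checks and the TEST_SCRIPT variable are names introduced here for illustration, and it assumes nri-as400-test.sh exits non-zero when a check fails, which should be verified against the script itself.

```shell
#!/usr/bin/env bash
# Sketch: run every supported command through nri-as400-test.sh in turn.
# TEST_SCRIPT and run_all_checks are illustrative names, not part of nri-as400.
TEST_SCRIPT="${TEST_SCRIPT:-./nri-as400-test.sh}"
COMMANDS="job-list memory-status message-queue system-status disk-usage"

run_all_checks() {
  local cmd start elapsed status failed=0
  for cmd in $COMMANDS; do
    start=$(date +%s)
    # Assumption: the test script exits non-zero when a check fails.
    if "$TEST_SCRIPT" "$cmd" >/dev/null 2>&1; then
      status=ok
    else
      status=FAILED
      failed=$((failed + 1))
    fi
    elapsed=$(( $(date +%s) - start ))
    echo "$cmd $status ${elapsed}s"
  done
  # The return status is the number of failed commands (0 means all passed).
  return "$failed"
}
```

Calling run_all_checks from the directory that holds the script prints one line per command with its status and elapsed seconds. Output from the script is discarded to keep the summary short; remove the redirection to inspect the data returned by each OHI.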
Each section below details the possible settings for an instance of that OHI. All of these settings should be in the instances: stanza of nri-as400-config.yml.
instances:
  - name: pub400_jobs
    command: job-list
    arguments:
      as400host: pub400.com
      userid: USER0465
      passwd: user0465
      reset_wait_delay: 100
- name: The name of the instance, usually the host and "_jobs".
- as400host: The iSeries host.
- userid: The user that has access to the message queue.
- passwd: The password for the user.
- reset_wait_delay: The delay interval (in ms) between resetting stats and capturing them. Optional, defaults to 100 ms.
- retrieve_msgw: Retrieves the last outstanding message for jobs with an active job status of MSGW. Optional, defaults to false.
- This OHI can be configured to pull job data from multiple iSeries servers
- Recommended: Depending on the server, this could take over 60 seconds to complete. Test using nri-as400-test.sh job-list and note the time it takes to complete. If longer than 60 seconds, increase the interval in nri-as400-definition.yml to be at least as long as that time.
- The attributes collected for each job are a subset of those that can be included, and it would not be difficult to extend the agent to collect additional attributes.
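The timing recommendation above can be scripted. This is a rough sketch under stated assumptions: elapsed_seconds and suggest_interval are hypothetical helper names, and the 60-second default simply mirrors the threshold mentioned in the recommendation rather than a value read from nri-as400-definition.yml.

```shell
#!/usr/bin/env bash
# Sketch: time a command and derive an interval that is at least as long.
# elapsed_seconds and suggest_interval are illustrative helper names.

# Print the wall-clock seconds a command takes, discarding its output.
elapsed_seconds() {
  local start
  start=$(date +%s)
  "$@" >/dev/null 2>&1 || true
  echo $(( $(date +%s) - start ))
}

# Print the larger of the measured time and the current interval.
# Assumption: the current interval is 60 seconds unless given as $2.
suggest_interval() {
  local elapsed=$1 current=${2:-60}
  if [ "$elapsed" -gt "$current" ]; then
    echo "$elapsed"
  else
    echo "$current"
  fi
}
```

For example, suggest_interval "$(elapsed_seconds ./nri-as400-test.sh job-list)" prints a value that could be used as the interval. Measuring several runs and adding headroom is likely safer than relying on a single timing.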
instances:
  - name: pub400_status
    command: memory-status
    arguments:
      as400host: pub400.com
      userid: USER0465
      passwd: user0465
- name: The name of the instance, usually the host and "_status".
- as400host: The iSeries host.
- userid: The user that has access to the message queue.
- passwd: The password for the user.
- This OHI can be configured to pull server memory usage KPIs from multiple iSeries servers.
- Execution interval should be fine at 30 to 60 seconds.
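When several servers share the same settings, the instance entries can be generated rather than typed by hand. The sketch below is illustrative only: emit_instance and emit_instances are hypothetical helpers, the second host name in the usage note is a placeholder, and the YAML indentation is an assumption about the layout nri-as400-config.yml expects.

```shell
#!/usr/bin/env bash
# Sketch: print one instance per host for the instances: stanza.
# emit_instance and emit_instances are illustrative helpers, and the
# indentation used here is an assumption about the expected YAML layout.
emit_instance() {
  local name=$1 command=$2 host=$3 user=$4 pass=$5
  cat <<EOF
  - name: ${name}
    command: ${command}
    arguments:
      as400host: ${host}
      userid: ${user}
      passwd: ${pass}
EOF
}

emit_instances() {
  local command=$1 user=$2 pass=$3 host
  shift 3
  echo "instances:"
  for host in "$@"; do
    # Name each instance after the first label of the host plus "_status".
    emit_instance "${host%%.*}_status" "$command" "$host" "$user" "$pass"
  done
}
```

Running emit_instances memory-status USER0465 user0465 pub400.com other400.example.com prints two instances, pub400_status and other400_status. Review the output before merging it into the configuration file.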
instances:
  - name: pub400_queue_QSYSOPR
    command: message-queue
    arguments:
      instance: pub400.QSYSOPR
      as400host: pub400.com
      msgqueue: /QSYS.LIB/QSYSOPR.MSGQ
      userid: USER0465
      passwd: user0465
- name: The name of the instance, an instance being a combination of iSeries host and message queue.
- as400host: The iSeries host.
- userid: The user that has access to the message queue.
- passwd: The password for the user.
- instance: Can be the same as name, used to build / maintain the checkpoint file.
- msgqueue: The iSeries message queue to be monitored.
- This OHI can be configured to monitor multiple message queues on multiple iSeries servers.
- On startup, the agent reads and posts the last message in the queue; subsequent executions read from the last-read message and process only new messages.
- Checkpoint files in the OHI working directory (usually /var/db/newrelic-infra/custom-integrations) contain the binary value of the 4-byte message key that was last processed. Deleting the checkpoint file forces the agent to read only the latest message; this might be helpful or required if the message queue does not reset after an IPL (like QSYSOPR).
- Execution interval should take into account the verbosity of the queue, however in general 30 to 60 seconds should be fine.
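The checkpoint handling described above can be inspected from the shell. This is a hedged sketch: show_message_key and reset_checkpoint are hypothetical helpers, the checkpoint file name is not documented here and so is passed in as an argument, and the code assumes the 4-byte key sits at the start of the file.

```shell
#!/usr/bin/env bash
# Sketch: inspect or reset a message-queue checkpoint file.
# CHECKPOINT_DIR defaults to the usual OHI working directory; the checkpoint
# file name is not documented here, so callers must supply it.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-/var/db/newrelic-infra/custom-integrations}"

# Print the first 4 bytes of a file as 8 hex digits.
# Assumption: the message key is stored at the start of the file.
show_message_key() {
  od -An -tx1 -N4 "$1" | tr -d ' \n'
  echo
}

# Remove a checkpoint so the next run reads only the latest message.
reset_checkpoint() {
  rm -f -- "$CHECKPOINT_DIR/$1"
}
```

Printing the key before deleting a checkpoint makes it easier to confirm later which message was last processed.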
instances:
  - name: pub400_status
    command: system-status
    arguments:
      as400host: pub400.com
      userid: USER0465
      passwd: user0465
- name: The name of the instance, usually the host and "_status".
- as400host: The iSeries host.
- userid: The user that has access to the message queue.
- passwd: The password for the user.
- This OHI can be configured to pull server KPIs from multiple iSeries servers.
- Execution interval should be fine at 30 to 60 seconds.
instances:
  - name: pub400_disk_usage
    command: disk-usage
    arguments:
      as400host: pub400.com
      userid: USER0465
      passwd: user0465
    labels:
      env: production
- name: The name of the instance, usually the host and "_disk_usage".
- as400host: The iSeries host.
- userid: The user that has access to the disk usage statistics.
- passwd: The password for the user.
- env: Optional label specifying the environment.
- This OHI can be configured to pull disk usage statistics from multiple iSeries servers.
- Execution interval should be fine at 30 to 60 seconds.
Event Type: AS400:JobList
Attributes:
- event_type - Required for all OHI events.
- summary - Summary of the event. Optional for OHI events.
- jobQueue - The job queue that the job was submitted to.
- jobName - The executing job name.
- jobUser - The user which submitted the job.
- jobNumber - The unique job number assigned to the job.
- jobCPUUsed - Amount of CPU used by the job.
- jobRunPriority - Current job run priority assigned to the job.
- jobStatus - Current status of the active job.
- jobSubsystem - The job subsystem that the job is currently executing in.
Event Type: AS400:MemoryStatusEvent
Attributes:
- event_type - Required for all OHI events.
- eventInstanceId - A unique ID correlating the various storage pool metrics captured together.
- systemName - iSeries name.
- dateTimeStatusGathered - Date / Time of collection.
- mainStorageSize - The amount of main storage, in kilobytes, in the system.
- minimumMachinePoolSize - The minimum size, in kilobytes, for the machine pool.
- minimumBasePoolSize - The minimum size, in kilobytes, for the base pool.
- numberOfPools - The number of pools allocated when the information was gathered.
- poolName - The name of this storage pool.
- subsystemName - The name of the system where the statistics were collected.
- susbsystemLibraryName - The subsystem with which this storage pool is associated.
- pagingOption - Whether the system will dynamically adjust the paging characteristics of the storage pool for optimum performance.
- description - The description of the shared pool.
- status - The status of the pool: 0=Active 1=Inactive.
- systemPools - The system-related pool identifier for each of the system storage pools that currently has main storage allocated to it.
- poolSize - The amount of main storage, in kilobytes, in the pool.
- maximumActiveThreads - The maximum number of threads that can be active in the pool at any one time.
- databaseFaults - The rate (in tenths), shown in page faults per second, of database page faults against pages containing either database data or access paths. A page fault is a program notification that occurs when a page that is marked as not in main storage is referred to by an active program.
- databasePages - The rate (in tenths), in pages per second, at which database pages are brought into the storage pool. A page is a 4096-byte block of information that is transferable between auxiliary storage and main storage.
- nondatabaseFaults - The rate (in tenths), in page faults per second, of nondatabase page faults against pages other than those designated as database pages.
- nondatabasePages - The rate (in tenths), in pages per second, at which nondatabase pages are brought into the storage pool.
- activeToWait - The rate (in tenths), in transitions per minute, of transitions of threads from an active condition to a waiting condition.
- waitToIneligible - The rate (in tenths), in transitions per minute, of transitions of threads from a waiting condition to an ineligible condition.
- activeToIneligible - The rate (in tenths), in transitions per minute, of transitions of threads from an active condition to an ineligible condition.
- definedSize - The size of the pool, in kilobytes, as defined in the shared pool, subsystem description, or system value QMCHPOOL.
- currentThreads - The number of threads currently using the pool's activity level.
- currentIneligibleThreads - The number of ineligible threads in the pool's activity level.
Event Type: AS400:MessageQueueEvent
Attributes:
- event_type - Required for all OHI events.
- summary - Summary of the event. Optional for OHI events.
- host - The iSeries host that the message queue resides on.
- queue - The iSeries message queue being polled.
- messageID - The unique 4-byte message identifier.
- job - The name of the job which produced the message.
- jobNumber - The job number of the job which produced the message.
- type - The message type.
- program - The program from which the message originated.
- severity - The numeric message severity, 00 through 99.
- replyStatus - If the message requires a reply from an operator, the status of that reply.
- user - The user associated with the job that originated the message.
- message - The message text that appears on the operator console.
- messageHelp - The message help text (if any) associated with the message.
Event Type: AS400:SystemStatusEvent
Attributes:
- event_type - Required for all OHI events.
- numberActiveJobsInSystem - Number of jobs in the system that are currently active and running.
- activeThreadsInSystem - Number of active threads in the system.
- batchJobsEndedWithPrinterOutputWaitingToPrint - Number of jobs that have ended, but are currently waiting for output to spool to a printer.
- numberOfBatchJobsEnding - Number of batch jobs ending.
- numberOfBatchJobsHeldOnQueue - Number of batch jobs that have been held in the job queue.
- nummberOfBatchJobsOnUnassignedQueues - Number of batch jobs on unassigned queues.
- numberOfBatchJobsRunning - Number of batch (versus interactive) jobs running on the system.
- numberOfBatchJobsWaitingForMessage - Number of batch jobs that have produced a message requiring a reply to a message queue.
- numberOfBatchJobsWaitingToRunOrAlreadyScheduled - Number of batch jobs awaiting execution.
- currentProcessingCapacity - Current processing capacity.
- currentUnprotectedStorageUsed - Unprotected storage currently in use.
- dateTimeStatusGathered - Date / Time of collection.
- numberOfJobsInSystem - Total number of jobs currently in the system.
- mainStorageSize - Main storage capacity.
- maxJobsInSystem - Maximum number of jobs allowed in the system.
- maxUnprotectedStorageUsed - Maximum unprotected storage units allowed in the system.
- numberOfPartitions - Number of partitions active on the system.
- numberOfProcessors - Number of processors enabled on the system.
- partitionIdentifier - Logical partition ID.
- percentCurrentInteractivePerformance - Percent of current interactive performance.
- percentDBCapability - Percent of database capability.
- percentPermanent4GBSegmentUsed - Percent of permanent 4GB segments in use.
- percentProcessingUnitUsed - Percent of processing units currently in use.
- percentSharedProcessorPoolUsed - Percent of shared processing pool currently in use.
- percentSystemASPUsed - Percent of auxiliary storage pools in use.
- percentTemporary256MBSegmentsUsed - Percent of temporary 256MB segments in use.
- percentTemporary4GBSegmentsUsed - Percent of temporary 4GB segments in use.
- percentTemporaryAddresses - Percent of temporary addresses in use.
- percentUncappedCPUCapacityUsed - Percent of uncapped CPU capacity in use.
- poolsNumber - Number of pools.
- processorSharingAttribute - Processor sharing disposition.
- restrictedStateFlag - Restricted state flag setting.
- systemASP - Total size of auxiliary storage pool.
- systemName - iSeries name.
- totalAuxiliaryStorage - Total size of auxiliary storage.
- currentUsersSignedOn - Total number of users logged into the system.
- usersSignedOffWithPrinterOutputWaitingToPrint - Users signed off with output pending print.
- usersSuspendedBySystemRequest - Users suspended by a system request.
Event Type: AS400:DiskUsageEvent
Attributes:
- event_type - Required for all OHI events.
- aspNumber - The storage pool (ASP) number.
- unitNumber - The unit number of the disk.
- unitType - The type of disk unit.
- diskType - The disk type number of the disk.
- diskModel - The model number of the disk.
- serialNumber - The serial number of the disk unit.
- resourceName - The unique system-assigned name of the disk unit.
- resourceStatus - The status of the resource. Possible values include:
  - ACTIVE - RESOURCE_NAME is active.
  - PASSIVE - RESOURCE_NAME is not active.
- capacityMB - The storage capacity of the unit in megabytes.
- availableMB - The available space on the unit in megabytes.
- percentUsed - The percentage of the disk unit that has been consumed.
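The three sizing attributes are presumably related. As a rough illustration only, the sketch below assumes percentUsed equals (capacityMB - availableMB) / capacityMB * 100; that formula is an assumption, not something the OHI documents, and percent_used is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: derive a used percentage from capacity and available space.
# Assumption: percentUsed = (capacityMB - availableMB) / capacityMB * 100.
percent_used() {
  awk -v cap="$1" -v avail="$2" 'BEGIN {
    # Guard against division by zero for empty or unreported units.
    if (cap <= 0) { print "n/a"; exit }
    printf "%.2f\n", (cap - avail) / cap * 100
  }'
}
```

A call such as percent_used "$capacityMB" "$availableMB" can serve as a sanity check against the reported percentUsed value when validating test output.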
New Relic has open-sourced this project. This project is provided AS-IS WITHOUT WARRANTY OR DEDICATED SUPPORT. Issues and contributions should be reported to the project here on GitHub. We encourage you to bring your experiences and questions to the Explorers Hub where our community members collaborate on solutions and new ideas.
We encourage your contributions to improve nri-as400! Keep in mind when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. You only have to sign the CLA one time per project. If you have any questions, or to execute our corporate CLA, required if your contribution is on behalf of a company, please drop us an email at [email protected].
nri-as400 is licensed under the Apache 2.0 License.