TSCO Agent data transfers are failing for some entities with error 87 [Collect Request - No data or permission on Agent repository (Local repository) for time]. There are two common causes.

The most common cause, described in KA 000097173, 'CO Gateway Manager (BPA General Manager) Error codes and remediation suggestions' (https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=000097173), is that two Manager runs are attempting to manage data collection for the same agent.

The other possible cause is that the TSCO Agent accepts the collection request but the bgscollect process is in a state where it cannot collect data. For example, this can happen on Windows if the bgscollect.exe process fails with a MEMORY_VIOLATION but does not fully terminate.
Option A: Two active Manager runs both attempting to manage data collection against the same TSCO Agent with the same collection start time

Q: What is the correct way to configure data collection when it is intentional for two TSCO Gateway Servers to manage data collection against a single entity?

Having multiple Manager runs support data collection, transfer, and processing for the same TSCO Agent concurrently is not a supported configuration if the Manager runs would share the same UDR data on the TSCO Agent side. If both consoles try to manage the same UDR data repository on the TSCO Agent side, there is no way to prevent one console from deleting data out from under the other.

To have multiple TSCO Gateway Servers manage data collection on the same TSCO Agent concurrently, configure the two consoles differently so that they create separate collection requests on the TSCO Agent side. The easiest way to do that is to use a different 'Start Time' for each Manager run. That splits the UDR data collection requests for each TSCO Gateway Server into separate directories on the TSCO Agent side so they can be managed separately. For example, on one Gateway Server the Manager runs could be configured to run from 00:00 to 23:59 and on the other from 01:00 to 00:59.

One best practice to keep in mind is that the Start Time should be evenly divisible by the data processing summarization interval (VIS_INTERVAL). If the VIS interval is 15 minutes, valid Start Times are 00:00, 00:15, 00:30, 00:45, and so on. If it is 60 minutes, valid Start Times are 00:00, 01:00, 02:00, and so on.

Note that the Manager runs must be stopped and resubmitted for that change to take effect. The TSCO Gateway Server does not support changing the Start Time/End Time of an active Manager run. The run needs to be reinitialized after a change like that.
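The Start Time rule above (evenly divisible by the summarization interval) can be checked with a few lines of Python. This is only a minimal sketch and not part of the product: the function names are hypothetical, and it assumes the VIS interval is given in whole minutes and Start Times are written as HH:MM.

```python
def minutes_since_midnight(hhmm):
    """Convert an 'HH:MM' string to minutes past midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def is_valid_start_time(hhmm, vis_interval_minutes):
    """True if the Start Time falls on a multiple of the VIS interval."""
    return minutes_since_midnight(hhmm) % vis_interval_minutes == 0


def valid_start_times(vis_interval_minutes):
    """List every Start Time in a day that is aligned to the VIS interval."""
    return [
        f"{m // 60:02d}:{m % 60:02d}"
        for m in range(0, 24 * 60, vis_interval_minutes)
    ]
```

For instance, is_valid_start_time("01:00", 15) returns True, while is_valid_start_time("00:30", 60) returns False, so 00:30 would not be a good offset for a second Gateway Server when the VIS interval is 60 minutes.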
A potentially easier workaround than stopping and rescheduling the runs is to edit the Manager Run Configuration for the run and make two changes:

* Change the Analyze tab "Analyze time interval" from 00:00 to 23:59 to 01:00 to 00:59.
* Make a corresponding change to the Scheduling tab "Schedule the scripts to run" value (minutes before collection). For example, if the Start Time moved from 00:00 to 01:00 (60 minutes later), change the value from 30 minutes to 90 minutes.

Together, those two changes keep the Manager script scheduled at 23:30 (where it is currently scheduled based on the 00:00 start time) while shifting the actual collection window to 01:00 - 00:59, without the need to reschedule the run. That will generally be easier than trying to stop the current Agent List and resubmit it.

Option B: TSCO collector is running but in a state where it is unable to service the collection request

In this situation there will be error messages on the TSCO Agent side that indicate a data collection problem, but in the Node History the error will appear as a data transfer error.

Here is an example scenario from the TSCO Agent running on Windows. The [hostname]-bgsagent_6767.log reports the following error:

UDR Collect Request - WARNING: spilltime not within requested range (hostname.domain.com:noInstance:Sep-28-2021.00.00:'[metric group])

The error is repeated for all registered metric groups.
The [hostname]-bgscollect.exe-noInstance.log ends with an access violation exception, a status change to MEMORY_VIOLATION, and an attempted exit:

Fri Sep 17 20:17:41 2021 bgscollect.exe (61228:71100) Error : caught unhandled exception
Fri Sep 17 20:17:41 2021 bgscollect.exe (61228:71100) Info : unhandled exception at 0xf2d8e743: 0xc0000005 - EXCEPTION_ACCESS_VIOLATION
Fri Sep 17 20:17:41 2021 bgscollect.exe (61228:71100) Info : bgscollect.exe received exception in NT System Statistics
Fri Sep 17 20:17:41 2021 bgscollect.exe (61228:71100) Changing status for NT System Statistics from OK to MEMORY_VIOLATION
Fri Sep 17 20:17:41 2021 bgscollect.exe (61228:71100) Info : Exitting(1)

However, the timestamp of that message is well before the latest timestamp in the TSCO Agent log, which indicates that the message is old and the collector has logged nothing since.

The problem here is that the bgscollect process has hung on termination but has not been deregistered from the Collector Registration Queue (CRQ), and the check the TSCO Agent performs to see whether the process is still running reports that it is running (although it is unable to collect data). As a result, no data is collected over the day, so the timestamp directory does not exist when it is time to transfer the data.
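The comparison described above (an old exit message in the collector log versus recent activity in the agent log) could be automated along these lines. This is a hedged sketch, not a BMC tool: the function names and the one-hour threshold are assumptions, the timestamp layout is taken only from the log excerpt shown here, and the latest agent log time is passed in by the caller because the agent log's timestamp format is not shown in this article.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the excerpt, e.g. "Fri Sep 17 20:17:41 2021".
TS_PATTERN = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) ")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    match = TS_PATTERN.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")


def last_entry(lines):
    """Return (timestamp, line) for the last timestamped line, or None."""
    for line in reversed(lines):
        timestamp = parse_timestamp(line)
        if timestamp is not None:
            return timestamp, line
    return None


def is_stale_exit(collect_log_lines, agent_latest, threshold=timedelta(hours=1)):
    """True if the collector log ends with an exit message that is older
    than the latest agent log activity by more than the threshold."""
    entry = last_entry(collect_log_lines)
    if entry is None:
        return False
    timestamp, line = entry
    # "Exitting(" is spelled as it appears in the collector log excerpt.
    return "Exitting(" in line and agent_latest - timestamp > threshold
```

A True result only suggests the hung-collector scenario in Option B. It does not confirm that the bgscollect.exe process is still registered in the CRQ, which still needs to be checked on the TSCO Agent host.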