The problem symptoms include:
(A) TSCO Manager runs scheduled in the environment are duplicated after the upgrade to TSCO 11.3.01 or 11.5.00.
(B) Nightly Manager runs may take much longer to complete (twice as long) because of the run duplication.
(C) There may be errors in the /tmp/manager_runs_log.txt file that is created by the migrateManagerRuns.pl command (which is executed by the TSCO Gateway Server 11.5.01 b1config11500.sh file to migrate Manager runs from the existing environment).
(D) This issue is seen for TSCO Gateway Server on Linux.

Errors reported by migrateManagerRuns.pl:
Creating Lock file /usr/adm/best1_workspace/results/Nov-08-2019.19.17_[run]/lock_file
Lock file already exists
ERROR: Cannot start this run, duplicate output directory.
A Manager run has just started at this minute.
Please wait for at least one minute before another run.
Or if you did not select timestamp as a subfolder, a Manager run
has already started in the specified output directory.
INFO: Cannot start this Manager run([run].vcmds) this minute: sleeping for 60 seconds
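
To check whether an upgraded Gateway Server shows symptom (C), the migration log can be searched for the error signatures in the excerpt above. The following is a minimal sketch, not a supported BMC tool: the function names are hypothetical, and it assumes only that the log is a plain-text file at /tmp/manager_runs_log.txt containing those messages verbatim.

```python
from pathlib import Path

# Error signatures copied verbatim from the log excerpt above.
SIGNATURES = (
    "Lock file already exists",
    "ERROR: Cannot start this run, duplicate output directory.",
)


def find_duplicate_run_errors(lines):
    """Return (line_number, text) pairs for lines containing a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if any(signature in text for signature in SIGNATURES):
            hits.append((number, text))
    return hits


def scan_log(path="/tmp/manager_runs_log.txt"):
    """Scan the migration log; return an empty list if the file is absent."""
    log = Path(path)
    if not log.is_file():
        return []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with log.open(errors="replace") as handle:
        return find_duplicate_run_errors(handle)


if __name__ == "__main__":
    for number, text in scan_log():
        print(f"{number}: {text}")
```

Any output from this scan only suggests that the duplication occurred; it does not identify which Manager runs are duplicates.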

This problem can only occur in an environment that has 'Agent List' Manager runs defined. The migrateManagerRuns.pl script migrates both the base (parent) 'Agent List' run and the individual (child) runs for the domains. It does not recognize that the base run has been split into multiple runs, so it migrates every run that appears in the listManagerRuns.pl output when it should migrate only the base run. In-house, Manager's own duplicate run scheduling check blocks the duplicates, but it is believed that a race condition may exist that prevents that blocking from happening.

Solution
This problem can be avoided during the TSCO Gateway Server upgrade by upgrading to TrueSight Capacity Optimization (TSCO) Gateway Server version 20.02 rather than version 11.5.

Workaround
If the problem has occurred, contact Technical Support for assistance removing the duplicate Manager runs. The duplicates must _not_ be removed using a "run.quit" file, to avoid data loss on the day of the removal (the duplicate runs share a common set of udrCollectMgr processes that manage data collection and transfer). The duplicates can instead be removed by cleaning them up in pcron.