Megatest

Check-in [bd581f5fff]
Login
Overview
Comment:Eliminated load delays of less than 1 second. Eliminated INFO when load is acceptable.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | v1.65-real
Files: files | file ages | folders
SHA1: bd581f5fff38d1f72ac52eb21792e519a2ab1cdb
User & Date: mmgraham on 2021-02-12 16:44:51
Other Links: branch diff | manifest | tags
Context
2021-02-14
19:54
Bunch of minor fixes/cleanup check-in: 4aae235143 user: matt tags: v1.65-real
2021-02-12
16:44
Eliminated load delays of less than 1 second. Eliminated INFO when load is acceptable. check-in: bd581f5fff user: mmgraham tags: v1.65-real
12:23
Avoided server logs that have exited. Detected when a server is in dbprep, and wait 25 seconds for that to finish. check-in: bbf1632194 user: mmgraham tags: v1.65-real
Changes

Modified common.scm from [28bc361690] to [5dd52b20b3].

2185
2186
2187
2188
2189
2190
2191
2192
2193
2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204

2205
2206
2207
2208
2209
2210
2211

2212
2213
2214
2215
2216
2217
2218
	 ;; effective load accounts for load jumps, this should elminate all the first-next-avg, adjwait, load-jump-limit
	 ;; etc.
	 (effective-load    (common:get-intercept first next))
	 (recommended-delay (common:get-delay effective-load numcpus))
	 (effective-host    (or remote-host "localhost"))
	 (normalized-effective-load (/ effective-load numcpus))
	 (will-wait                 (> normalized-effective-load maxnormload)))
    (if (> recommended-delay 0)
	(let* ((actual-delay (min recommended-delay 30)))
	  (if (common:low-noise-print 30 (conc (round actual-delay) "-safe-load"))
	      (debug:print-info 0 *default-log-port* "Load control, delaying "
				actual-delay " seconds to maintain safe load. current normalized effective load is "
				normalized-effective-load". maxnormload = " maxnormload " numcpus = " numcpus " loadavg = " loadavg " effective-load = " effective-load))
	  (thread-sleep! actual-delay)))
    
    (cond
     ;; bad data, try again to get the data
     ((not will-wait)
      (if (common:low-noise-print 30 (conc (round normalized-effective-load) "-load-acceptable-" effective-host))
	  (debug:print 0 *default-log-port* "Effective load on " effective-host " is acceptable at " effective-load " continuing.")))

     ((and (< first 0) ;; this indicates the loadavg data is bad - machine may not be reachable
	   (> num-tries 0))
      (debug:print 0 *default-log-port* "WARNING: received bad data from get-cpu-load "
		   first ", we'll sleep 10s and try " num-tries " more times.")
      (thread-sleep! 10)
      (common:wait-for-cpuload maxnormload numcpus-in
			       count: count remote-host: remote-host num-tries: (- num-tries 1)))

     ;; need to wait for load to drop
     ((and will-wait ;; (> first adjmaxload)
	   (> count 0))
      (debug:print-info 0 *default-log-port*
			"Delaying 15" ;; adjwait
			" seconds due to normalized effective load " normalized-effective-load ;; first
			" exceeding max of " adjmaxload







|










|
|
>







>







2185
2186
2187
2188
2189
2190
2191
2192
2193
2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204
2205
2206
2207
2208
2209
2210
2211
2212
2213
2214
2215
2216
2217
2218
2219
2220
	 ;; effective load accounts for load jumps, this should elminate all the first-next-avg, adjwait, load-jump-limit
	 ;; etc.
	 (effective-load    (common:get-intercept first next))
	 (recommended-delay (common:get-delay effective-load numcpus))
	 (effective-host    (or remote-host "localhost"))
	 (normalized-effective-load (/ effective-load numcpus))
	 (will-wait                 (> normalized-effective-load maxnormload)))
    (if (> recommended-delay 1) 
	(let* ((actual-delay (min recommended-delay 30)))
	  (if (common:low-noise-print 30 (conc (round actual-delay) "-safe-load"))
	      (debug:print-info 0 *default-log-port* "Load control, delaying "
				actual-delay " seconds to maintain safe load. current normalized effective load is "
				normalized-effective-load". maxnormload = " maxnormload " numcpus = " numcpus " loadavg = " loadavg " effective-load = " effective-load))
	  (thread-sleep! actual-delay)))
    
    (cond
     ;; bad data, try again to get the data
     ((not will-wait)
      ;; (if (common:low-noise-print 30 (conc (round normalized-effective-load) "-load-acceptable-" effective-host))
      ;;	  (debug:print 0 *default-log-port* "Effective load on " effective-host " is acceptable at " effective-load " continuing.")))

     ((and (< first 0) ;; this indicates the loadavg data is bad - machine may not be reachable
	   (> num-tries 0))
      (debug:print 0 *default-log-port* "WARNING: received bad data from get-cpu-load "
		   first ", we'll sleep 10s and try " num-tries " more times.")
      (thread-sleep! 10)
      (common:wait-for-cpuload maxnormload numcpus-in
			       count: count remote-host: remote-host num-tries: (- num-tries 1)))

     ;; need to wait for load to drop
     ((and will-wait ;; (> first adjmaxload)
	   (> count 0))
      (debug:print-info 0 *default-log-port*
			"Delaying 15" ;; adjwait
			" seconds due to normalized effective load " normalized-effective-load ;; first
			" exceeding max of " adjmaxload