Megatest

Check-in [f1ba33210e]
Login
Overview
Comment:Add forced reconnect to allow old servers to gracefully die
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | v1.70-refactor-procedures
Files: files | file ages | folders
SHA1: f1ba33210eb39c42db71ad3916266f73ff122310
User & Date: matt on 2022-06-01 06:08:44
Other Links: branch diff | manifest | tags
Context
2022-06-01
21:14
speculative fix check-in: 2f1850785d user: matt tags: v1.70-refactor-procedures
06:08
Add forced reconnect to allow old servers to gracefully die check-in: f1ba33210e user: matt tags: v1.70-refactor-procedures
2022-05-31
06:01
Stop touching log after 600 seconds, clobber *runremote* on client side insteade of setting the remote in the record check-in: a732b9439d user: matt tags: v1.70-refactor-procedures
Changes

Modified common.scm from [49d3d99fc7] to [fda00ca967].

308
309
310
311
312
313
314

315
316
317
318
319
320
321
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322







+








(defstruct remote
  (hh-dat            (common:get-homehost)) ;; homehost record ( addr . hhflag )
  (server-url        #f) ;; (server:check-if-running *toppath*) #f))
  (server-id         #f)
  (server-info       (if *toppath* (server:check-if-running *toppath*) #f))
  (last-server-check 0)  ;; last time we checked to see if the server was alive
  (connect-time      (current-seconds))
  (conndat           #f)
  (transport         *transport-type*)
  (server-timeout    (server:expiration-timeout))
  (force-server      #f)
  (ro-mode           #f)  
  (ro-mode-checked   #f)) ;; flag that indicates we have checked for ro-mode

Modified rmt.scm from [8bbca69519] to [f2baad3376].

62
63
64
65
66
67
68
69

70
71
72
73
74
75
76
62
63
64
65
66
67
68

69
70
71
72
73
74
75
76







-
+







;; RA => e.g. usage (rmt:send-receive 'get-var #f (list varname))
;;
(define (rmt:send-receive cmd rid params #!key (attemptnum 1)(area-dat #f)) ;; start attemptnum at 1 so the modulo below works as expected

  #;(common:telemetry-log (conc "rmt:"(->string cmd))
                        payload: `((rid . ,rid)
                                   (params . ,params)))
                          

  (if (> attemptnum 2)
      (debug:print 0 *default-log-port* "INFO: attemptnum in rmt:send-receive is " attemptnum))
    
  (cond
   ((> attemptnum 2) (thread-sleep! 0.05))
   ((> attemptnum 10) (thread-sleep! 0.5))
   ((> attemptnum 20) (thread-sleep! 1)))
118
119
120
121
122
123
124







125
126
127
128
129
130
131
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138







+
+
+
+
+
+
+







    ;; ensure we have a homehost record
    (if (not (pair? (remote-hh-dat runremote)))  ;; not on homehost
	(thread-sleep! 0.1) ;; since we shouldn't get here, delay a little
	(remote-hh-dat-set! runremote (common:get-homehost)))
    
    ;;(print "BB> readonly-mode is "readonly-mode" dbfile is "dbfile)
    (cond
     ((> (- (current-seconds)(remote-connect-time runremote)) 180) ;; reconnect to server every 180 seconds
      (debug:print 0 *default-log-port* "Forcing reconnect to server(s) due to 180 second timeout.")
      (set! *runremote* #f)
      ;; BUG: close-connections should go here?
      (mutex-unlock! *rmt-mutex*)
      (rmt:send-receive cmd rid params attemptnum: 1 area-dat: area-dat))
     
     ;;DOT EXIT;
     ;;DOT MUTEXLOCK -> EXIT [label="> 15 attempts"]; {rank=same "case 1" "EXIT" }
     ;; give up if more than 150 attempts
     ((> attemptnum 150)
      (debug:print 0 *default-log-port* "ERROR: 150 tries to start/connect to server. Giving up.")
      (exit 1))