Megatest

Check-in [4e1162ffe9]
Login
Overview
Comment:Added checks for vector instead of just true before accessing some data stucts that are generated on the fly and can fail due to communication errors
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | v1.60
Files: files | file ages | folders
SHA1: 4e1162ffe9db11db49179b4e117b224530942892
User & Date: mrwellan on 2014-12-08 13:28:04
Other Links: branch diff | manifest | tags
Context
2014-12-08
17:36
Added few more defensive layers to calls that *may* be part of the crash-on-startup-at-weird-random-times bug check-in: 1510977b0a user: mrwellan tags: v1.60
13:28
Added checks for vector instead of just true before accessing some data stucts that are generated on the fly and can fail due to communication errors check-in: 4e1162ffe9 user: mrwellan tags: v1.60
12:39
Fixed call where :state and :status were not aliased to -state and -status. Improved watch dog exit to not wait gratuitious five seconds before exiting check-in: a834ac5f9e user: mrwellan tags: v1.60
Changes

Modified rmt.scm from [9cda1b0f8b] to [c506e20989].

79
80
81
82
83
84
85
86

87
88
89
90
91
92
93
79
80
81
82
83
84
85

86
87
88
89
90
91
92
93







-
+







(define (rmt:send-receive cmd rid params #!key (attemptnum 1)) ;; start attemptnum at 1 so the modulo below works as expected
  ;; clean out old connections
  (mutex-lock! *db-multi-sync-mutex*)
  (let ((expire-time (- (current-seconds) (server:get-timeout) 10))) ;; don't forget the 10 second margin
    (for-each 
     (lambda (run-id)
       (let ((connection (hash-table-ref/default *runremote* run-id #f)))
         (if (and connection 
         (if (and (vector? connection)
        	  (< (http-transport:server-dat-get-last-access connection) expire-time))
             (begin
               (debug:print-info 0 "Discarding connection to server for run-id " run-id ", too long between accesses")
               ;; SHOULD CLOSE THE CONNECTION HERE
	       (case *transport-type*
		 ((nmsg)(nn-close (http-transport:server-dat-get-socket 
				   (hash-table-ref *runremote* run-id)))))
104
105
106
107
108
109
110
111
112
113



114
115
116
117
118
119
120
104
105
106
107
108
109
110



111
112
113
114
115
116
117
118
119
120







-
-
-
+
+
+







			  ((http)(condition-case
				  (http-transport:client-api-send-receive run-id connection-info cmd params)
				  ((commfail)(vector #f "communications fail"))))
			  ((nmsg)(condition-case
				  (nmsg-transport:client-api-send-receive run-id connection-info cmd params)
				  ((timeout)(vector #f "timeout talking to server"))))
			  (else  (exit))))
	       (success (if (and dat (vector? dat)) (vector-ref dat 0) #f))
	       (res     (if (and dat (vector? dat)) (vector-ref dat 1) #f)))
	  (http-transport:server-dat-update-last-access connection-info)
	       (success (if (vector? dat) (vector-ref dat 0) #f))
	       (res     (if (vector? dat) (vector-ref dat 1) #f)))
	  (if (vector? connection-info)(http-transport:server-dat-update-last-access connection-info))
	  (if success
	      (begin
		;; (mutex-unlock! *send-receive-mutex*)
		(case *transport-type* 
		  ((http) res) ;; (db:string->obj res))
		  ((nmsg) res))) ;; (vector-ref res 1)))
	      (begin ;; let ((new-connection-info (client:setup run-id)))