Megatest

2017-09-05
17:21 Ticket [827909e4da] dashboard running on non-homehost with large db can hang status still Open with 5 other changes artifact: d8597228f5 user: bjbarcla
17:17 New ticket [827909e4da]. artifact: 9f47737748 user: bjbarcla

Ticket Hash: 827909e4da935424995f21a683873114abb4bbcf
Title: dashboard running on non-homehost with large db can hang
Status: Open
Type: Code_Defect
Severity: Important
Priority: Immediate
Subsystem:
Resolution: Open
Last Modified: 2017-09-05 17:21:49
Version Found In: 1.6429
Description:
I have observed that if you are running dashboard on a non-homehost it will work until the database size reaches some threshold, at which point it will freeze. When restarting, it immediately freezes. I suspect the database size is the critical dimension -- syncing likely takes longer than the action loop in dashboard.

In the instance where I see it, megatest.db is 5.4M with 4k tests and 11 runs.

Running dashboard on homehost, there is no issue at all and dashboard responds very quickly.

Here is the console log from dashboard when we see the freezing issue:

Checking for user runconfigs config ... WARNING lefkowit.runconfigs.config file ...doesn't exist do this if you care ### Add custom targets/overrides to global targets here to ./configs/lefkowit.runconfigs.config

Checking for required/common linktree path: /p/fdk/gwa/lefkowit/qa/libanaacatqa/links ... exists /p/fdk/gwa/lefkowit/qa/libanaacatqa/links
Defined disks:
------------------------------------
exists /p/fdk/gwa/lefkowit/qa/libanaacatqa/runs
------------------------------------
Checking if megatest variables are already defined ...
STARTING DASHBOARD ...
WARNING: Current policy requires running dashboard on homehost: (10.38.45.46 . #f)
INFO: Trying to start server (megatest -server 10.38.45.46 -m testsuite:libanaacatqa) ...
INFO: (0) Starting server on 10.38.45.46, logfile is /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/libanaacatqa/logs/server.log
#======================================================================
# NBFAKE logging command to: /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/libanaacatqa/logs/server.log
# megatest -server 10.38.45.46 -m testsuite:libanaacatqa
#======================================================================
WARNING: problem with directory /p/fdk/gwa/lefkowit/fossil/ext/acatqa_ext/trunk/tests, dropping it from tests path
WARNING: problem with directory /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/relqa/tests, dropping it from tests path
WARNING: problem with directory /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/libanaacatqa/tests, dropping it from tests path
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 1000 to 2000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 2000 to 4000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 4000 to 8000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 8000 to 16000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 16000 to 32000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 32000 to 64000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 64000 to 128000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 128000 to 256000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 256000 to 512000
...
NOTE: increasing poll interval from 1125899906842624000 to 2251799813685248000
NOTE: updates are taking a long time, 3.0s elapsed.
NOTE: increasing poll interval from 2251799813685248000 to 4503599627370496000
NOTE: updates are taking a long time, 3.0s elapsed.

Call history:
db.scm:1972: call-with-current-continuation
db.scm:1972: with-exception-handler
db.scm:1972: ##sys#call-with-values
db.scm:1972: k3436
db.scm:1972: g3440
dashboard.scm:795: current-seconds
dashboard.scm:798: dboard:tabdat-allruns-by-id
dashboard.scm:798: hash-table-set!
dashboard.scm:803: debug:print
common_records.scm:131: debug:debug-mode
common_records.scm:132: with-output-to-port
dashboard.scm:804: iup-base#attribute
dashboard.scm:805: floor
dashboard.scm:3551: k7615
dashboard.scm:3551: g7619
dashboard.scm:3551: print-call-chain <--
inexact number cannot be represented as an exact number
Callback error in dashboard:runs-tab-updater
Full condition info:
((exn (location inexact->exact)
      (call-chain (#(db.scm:1978: loop #f #f)
                   #(db.scm:1978: loop #f #f)
                   #(db.scm:1978: loop #f #f)
                   #(db.scm:1972: call-with-current-continuation #f #f)
                   #(db.scm:1972: with-exception-handler #f #f)
                   #(db.scm:1972: ##sys#call-with-values #f #f)
                   #(db.scm:1972: k3436 #f #f)
                   #(db.scm:1972: g3440 #f #f)
                   #(dashboard.scm:795: current-seconds #f #f)
                   #(dashboard.scm:798: dboard:tabdat-allruns-by-id #f #f)
                   #(dashboard.scm:798: hash-table-set! #f #f)
                   #(dashboard.scm:803: debug:print #f #f)
                   #(common_records.scm:131: debug:debug-mode #f #f)
                   #(common_records.scm:132: with-output-to-port #f #f)
                   #(dashboard.scm:804: iup-base#attribute #f #f)
                   #(dashboard.scm:805: floor #f #f)))
      (arguments (9.00719925474099e+18))
      (message inexact number cannot be represented as an exact number))
 (type))
NOTE: updates are taking a long time, 3.0s elapsed.
<stack dump repeats regularly, probably in the dashboard event loop>
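
The numbers in the log fit a simple picture: the poll interval starts at 1000 and doubles on every slow update, and the value in the condition (9.00719925474099e+18) is 1000 * 2^53. The Python sketch below checks that arithmetic. It assumes the dashboard was built for 64-bit CHICKEN 4 without bignum support, where the largest fixnum is 2^62 - 1; that build detail is not stated in the ticket. The `next_poll_interval` helper and its 60000 ms cap are hypothetical, shown only as one possible mitigation and not as the project's fix.

```python
# Assumption: largest exact integer the runtime can represent (64-bit fixnum,
# no bignums). This limit is not stated anywhere in the ticket.
FIXNUM_MAX = 2**62 - 1


def doublings_until_overflow(start_ms=1000, limit=FIXNUM_MAX):
    """Double start_ms until it exceeds limit.

    Returns (number of doublings, first interval above the limit).
    """
    interval = start_ms
    doublings = 0
    while interval <= limit:
        interval *= 2
        doublings += 1
    return doublings, interval


def next_poll_interval(current_ms, cap_ms=60_000):
    """Hypothetical capped backoff: double the interval, never beyond cap_ms."""
    return min(current_ms * 2, cap_ms)


count, first_bad = doublings_until_overflow()
print(count, first_bad)
print(next_poll_interval(32_000))
```

Under that assumption, the last interval in the log (4503599627370496000) still fits, and the very next doubling is the first value that does not, which matches the point where inexact->exact starts failing. With a cap in place the interval would stop growing long before that limit.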
User Comments:
bjbarcla added on 2017-09-05 23:21:49:
One interesting side-note: running dashboard on a non-homehost as a different user triggers read-only mode, and dashboard works great in this mode. Read-only mode starts an in-line "server" which talks to megatest.db directly, bypassing http-transport. A potential approach would be to always force read-only mode, intercept the rmt: calls which modify the database, and use http-transport only for those calls.
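
The routing idea in this comment can be sketched roughly as follows, in Python rather than Scheme. Everything named here is hypothetical: `WRITE_CALLS`, `CallRouter`, the `tests` table, and the `fake_server` stub stand in for the real rmt: call list, megatest.db schema, and http-transport, none of which are described in the ticket.

```python
import sqlite3

# Hypothetical list of call names that modify the database.
WRITE_CALLS = {"set-test-state"}


class CallRouter:
    """Serve reads straight from the db; forward only writes to the server."""

    def __init__(self, conn, send_to_server):
        self.conn = conn                      # direct (read-only style) access
        self.send_to_server = send_to_server  # stand-in for the remote transport

    def call(self, name, sql, params=()):
        if name in WRITE_CALLS:
            return self.send_to_server(name, sql, params)
        return self.conn.execute(sql, params).fetchall()


# Demo with an in-memory database and a stub "server" that records
# what it was asked to do before applying the write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO tests (id, state) VALUES (1, 'RUNNING')")

sent = []


def fake_server(name, sql, params):
    sent.append(name)
    conn.execute(sql, params)
    conn.commit()


router = CallRouter(conn, fake_server)
before = router.call("get-test-state", "SELECT state FROM tests WHERE id = ?", (1,))
router.call("set-test-state", "UPDATE tests SET state = ? WHERE id = ?", ("COMPLETED", 1))
after = router.call("get-test-state", "SELECT state FROM tests WHERE id = ?", (1,))
print(before, after, sent)
```

In this sketch the dashboard's frequent read queries never touch the transport, so only the comparatively rare state changes would pay the cost of a round trip to the server.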