Megatest

Ticket Change Details

Artifact ID: 9f477377482b90da01c58972ace115b62038f60d
Ticket: 827909e4da935424995f21a683873114abb4bbcf
dashboard running on non-homehost with large db can hang
User & Date: bjbarcla on 2017-09-05 17:17:35
Changes

  1. comment changed to:
    I have observed that dashboard running on a non-homehost works until the database size reaches some threshold, at which point it freezes.  After a restart, it freezes again immediately.  I suspect the database size is the critical dimension: syncing likely takes longer than the action loop in dashboard.
    
    In the instance where I see this, megatest.db is 5.4M with 4k tests and 11 runs.
    
    When dashboard runs on homehost, there is no issue at all and dashboard responds very quickly.
    
    Here is the console log from dashboard when we see the freezing issue:
    
    
    Checking for user runconfigs config ...
    WARNING  lefkowit.runconfigs.config file ...doesn't exist
    do this if you care ### Add custom targets/overrides to global targets here to  ./configs/lefkowit.runconfigs.config
    
    Checking for required/common linktree path: /p/fdk/gwa/lefkowit/qa/libanaacatqa/links ...
    exists   /p/fdk/gwa/lefkowit/qa/libanaacatqa/links
    Defined disks:
    ------------------------------------
    exists   /p/fdk/gwa/lefkowit/qa/libanaacatqa/runs
    ------------------------------------
    Checking if megatest variables are already defined ...
    STARTING DASHBOARD ...
    WARNING: Current policy requires running dashboard on homehost: (10.38.45.46 . #f)
    INFO: Trying to start server (megatest -server 10.38.45.46 -m testsuite:libanaacatqa) ...
    INFO: (0) Starting server on 10.38.45.46, logfile is /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/libanaacatqa/logs/server.log
    #======================================================================
    # NBFAKE logging command to: /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/libanaacatqa/logs/server.log
    #      megatest -server 10.38.45.46 -m testsuite:libanaacatqa
    #======================================================================
    WARNING: problem with directory /p/fdk/gwa/lefkowit/fossil/ext/acatqa_ext/trunk/tests, dropping it from tests path
    WARNING: problem with directory /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/relqa/tests, dropping it from tests path
    WARNING: problem with directory /nfs/pdx/disks/ch_icf_megatest_001/lefkowit/mtTesting/megatestqa/libanaacatqa/tests, dropping it from tests path
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 1000 to 2000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 2000 to 4000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 4000 to 8000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 8000 to 16000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 16000 to 32000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 32000 to 64000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 64000 to 128000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 128000 to 256000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 256000 to 512000
    ...
    NOTE: increasing poll interval from 1125899906842624000 to 2251799813685248000
    NOTE: updates are taking a long time, 3.0s elapsed.
    NOTE: increasing poll interval from 2251799813685248000 to 4503599627370496000
    NOTE: updates are taking a long time, 3.0s elapsed.
    
            Call history:
    
            db.scm:1972: call-with-current-continuation
            db.scm:1972: with-exception-handler
            db.scm:1972: ##sys#call-with-values
            db.scm:1972: k3436
            db.scm:1972: g3440
            dashboard.scm:795: current-seconds
            dashboard.scm:798: dboard:tabdat-allruns-by-id
            dashboard.scm:798: hash-table-set!
            dashboard.scm:803: debug:print
            common_records.scm:131: debug:debug-mode
            common_records.scm:132: with-output-to-port
            dashboard.scm:804: iup-base#attribute
            dashboard.scm:805: floor
            dashboard.scm:3551: k7615
            dashboard.scm:3551: g7619
            dashboard.scm:3551: print-call-chain            <--
    inexact number cannot be represented as an exact number
    Callback error in dashboard:runs-tab-updater
    Full condition info:
    ((exn (location inexact->exact) (call-chain (#(db.scm:1978: loop #f #f) #(db.scm:1978: loop #f #f) #(db.scm:1978: loop #f #f) #(db.scm:1972: call-with-current-continuation #f #f) #(db.scm:1972: with-exception-handler #f #f) #(db.scm:1972: ##sys#call-with-values #f #f) #(db.scm:1972: k3436 #f #f) #(db.scm:1972: g3440 #f #f) #(dashboard.scm:795: current-seconds #f #f) #(dashboard.scm:798: dboard:tabdat-allruns-by-id #f #f) #(dashboard.scm:798: hash-table-set! #f #f) #(dashboard.scm:803: debug:print #f #f) #(common_records.scm:131: debug:debug-mode #f #f) #(common_records.scm:132: with-output-to-port #f #f) #(dashboard.scm:804: iup-base#attribute #f #f) #(dashboard.scm:805: floor #f #f))) (arguments (9.00719925474099e+18)) (message inexact number cannot be represented as an exact number)) (type))
    NOTE: updates are taking a long time, 3.0s elapsed.
    <stack dump repeats regularly, probably in the dashboard event loop>
    
  2. foundin changed to: "1.6429"
  3. login: "bjbarcla"
  4. private_contact changed to: "9a900f538965a426994e1e90600920aff0b4e8d2"
  5. severity changed to: "Important"
  6. status changed to: "Open"
  7. title changed to:
    dashboard running on non-homehost with large db can hang
    
  8. type changed to: "Code_Defect"
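
The console log in the comment shows the poll interval doubling from 1000 on every slow update, with no upper bound, until inexact->exact fails on 9.00719925474099e+18. The sketch below reproduces that arithmetic in Python. It is an illustration and not taken from dashboard.scm: the fixnum limit of 2^62 - 1 is an assumption about a 64-bit CHICKEN build without bignum support, and `next_poll_interval` with its `cap_ms` parameter is a hypothetical capped backoff, not an existing Megatest function.

```python
# Assumption: largest exact integer on a 64-bit CHICKEN build without bignums.
# This value is not stated in the ticket.
FIXNUM_MAX = 2**62 - 1


def doublings_until_overflow(start_ms, limit=FIXNUM_MAX):
    """Return (count, last): how many times start_ms can be doubled while
    staying within limit, and the last interval that still fits."""
    interval, count = start_ms, 0
    while interval * 2 <= limit:
        interval *= 2
        count += 1
    return count, interval


def next_poll_interval(current_ms, cap_ms=60_000):
    """Hypothetical mitigation: double the interval but clamp it at cap_ms."""
    return min(current_ms * 2, cap_ms)


count, last = doublings_until_overflow(1000)
print(count, last)               # last interval that still fits
print(float(last * 2))           # the next doubling, as a float
print(next_poll_interval(1000))  # capped backoff, normal case
print(next_poll_interval(40_000))  # capped backoff, clamped
```

Under these assumptions the last interval that fits is 4503599627370496000, which matches the final "increasing poll interval" line in the log, and the next doubling is the value reported in the exception arguments. This would explain why the error only appears after many consecutive slow updates; a cap on the interval would avoid the overflow but would not address the slow sync itself.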