
Thread: Cortex PC BSOD overnight :-(

  1. #1
    Automated Home Ninja marcuslee's Avatar
    Join Date
    Dec 2009
    Posts
    261

    Cortex PC BSOD overnight :-(

    Hi Guys,

    Woke up this morning to find the dreaded DFP panel button lights lit and "Idratek" shown on the LCD display, giving me the heads-up that the system was down :-(

    Logging in remotely to the Cortex PC was a no-go. Going to the PC, I found it had blue-screened with a 'dumping Kernel' message.

    As it's all in Production/not at a scheduled down time (!!), a quick hardware reboot solved it.

    Forensically:
    - I wonder how to check when it went offline? I'm not that familiar with WinXP Event Viewer, but clicking around, I don't see anything
    - I didn't see anything in the Cortex Logs, but then again I wouldn't expect to if it went down suddenly
    - I looked at the temperature graphs for a blank patch in the data, but there isn't one. However, the line goes dead straight from just after 5am until the reboot this morning, so that may be when it went down.
    - I've now enabled logging on the switch the PC connects to, so I have a means of seeing when it goes off in future
    - I don't suppose anyone else has seen this before or any insight? Obviously I'm a bit worried now about this being something that's going to come up time and again
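    The flat-line observation above can be turned into a rough timestamp. This is only a minimal sketch: it assumes the temperature readings have been exported as (timestamp, value) pairs, and the function name and sample values are made up for illustration rather than taken from Cortex.

```python
def longest_flat_run(samples, min_run=3):
    """Find the longest run of identical consecutive readings.

    samples: list of (timestamp, value) pairs, oldest first.
    Returns (length, first_timestamp, last_timestamp), or None if no
    run reaches min_run readings.
    """
    best = None
    i = 0
    while i < len(samples):
        j = i
        # Extend the run while the next reading repeats the same value.
        while j + 1 < len(samples) and samples[j + 1][1] == samples[i][1]:
            j += 1
        length = j - i + 1
        if length >= min_run and (best is None or length > best[0]):
            best = (length, samples[i][0], samples[j][0])
        i = j + 1
    return best


# Hypothetical readings: the value freezes from 05:00 until the reboot.
readings = [
    ("04:40", 20.1), ("04:50", 20.0), ("05:00", 19.8),
    ("05:10", 19.8), ("05:20", 19.8), ("05:30", 19.8),
    ("07:40", 18.9),
]
print(longest_flat_run(readings))
```

    The start of the longest flat run is only an estimate of when logging stopped; a genuinely stable temperature would look the same, so it is best cross-checked against the event log or switch logs.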

    [PC is WinXP SP3, AVG Free antivirus, Cortex 26.7.2. Been rock steady for over a year etc, other than Cortex upgrades and AVG upgrades]

    Going forward:
    - I don't think Reflex would have helped as the system only uses DFPH panels, QLD for turning on/off lights and QRI relays for heating and Cortex for heating temp set points?
    - other than rebuild on the cards for Win7, is there any benefit to using Win7 64bit?
    - also has anyone any experience with Cortex redundant PC setup?
    - also I'm beginning to see some benefits to having Cortex as VM (though I tend to distrust the reliability of VMs!)

    Cheers,

    Marcus

  2. #2
    Automated Home Ninja marcuslee's Avatar
    Join Date
    Dec 2009
    Posts
    261

    Default

    - or of course Windows 8.1, if anyone has any insight into going to that (with its faster boot times)?

    Cheers,

    Marcus

  3. #3
    Automated Home Legend Paul_B's Avatar
    Join Date
    Jul 2006
    Location
    Eastbourne, UK
    Posts
    604

    Default

    Quote Originally Posted by marcuslee View Post
    Hi Guys,

    Forensically:
    - I wonder how to check when it went offline? I'm not that familiar with WinXP Event Viewer, but clicking around, I don't see anything
    • If you look in the System event log in Event Viewer, do you see an obvious gap in the event times?
    • If you don't see a gap, do you see EventLog entries with the Event IDs 6005, 6006, 6008 and 6009? Events 6005 and 6009 are generated on every boot when the OS starts up. Just before or after them you might see an event 6008; if it exists, it might give some more info. See http://support.microsoft.com/kb/196452 for more info.
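    As a rough illustration of the check above, here is a minimal Python sketch. It assumes the System log has already been exported and parsed into (timestamp, event ID) pairs; the function name and the sample events are hypothetical. It flags every boot (6005) that was not preceded by a clean Event Log stop (6006), and reports the last ordinary event seen before that boot as a hint of when the machine went down.

```python
from datetime import datetime

# 6005 = Event Log service started, 6006 = Event Log service stopped (clean),
# 6008 = previous shutdown was unexpected, 6009 = OS version logged at boot.
BOOT_TIME_IDS = {6005, 6008, 6009}


def find_unclean_boots(events):
    """events: list of (datetime, event_id), oldest first.

    Returns a list of (last_event_before_boot, boot_time) for each boot
    that was not preceded by a clean stop (6006).
    """
    results = []
    clean_stop = True   # assume whatever happened before the log starts was clean
    last_seen = None    # time of the last event not written during boot
    for when, event_id in events:
        if event_id == 6006:
            clean_stop = True
        elif event_id == 6005:
            if not clean_stop:
                results.append((last_seen, when))
            clean_stop = False
        if event_id not in BOOT_TIME_IDS:
            last_seen = when
    return results


# Made-up sample log: one unclean boot, then a clean restart.
sample = [
    (datetime(2013, 10, 1, 8, 0), 6009),
    (datetime(2013, 10, 1, 8, 0), 6005),
    (datetime(2013, 10, 14, 21, 30), 7036),
    (datetime(2013, 10, 22, 7, 45), 6009),
    (datetime(2013, 10, 22, 7, 45), 6005),
    (datetime(2013, 10, 22, 7, 45), 6008),
    (datetime(2013, 10, 23, 9, 0), 6006),
    (datetime(2013, 10, 23, 9, 2), 6005),
]
for last_event, boot in find_unclean_boots(sample):
    print("Unclean boot at", boot, "- last event before it:", last_event)
```

    On a quiet machine the last logged event can be days older than the actual crash, so the first timestamp is a lower bound rather than the crash time itself.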


    Quote Originally Posted by marcuslee View Post
    [PC is WinXP SP3, AVG Free antivirus, Cortex 26.7.2. Been rock steady for over a year etc, other than Cortex upgrades and AVG upgrades]
    • If this is the first time this has happened it might have been a one off, for example a bit flip on a cluster written to disk (OS sends a 1 to disk but disk records a 0 due to electronic interpretation). It might not happen again for another year or more.



    Quote Originally Posted by marcuslee View Post
    Going forward:
    - other than rebuild on the cards for Win7, is there any benefit to using Win7 64bit?
    • Not really because Cortex is a 32 bit application, plus drivers for hardware are probably more mature in 32 bit guise than 64 bit.


    Quote Originally Posted by marcuslee View Post
    - also I'm beginning to see some benefits to having Cortex as VM (though I tend to distrust the reliability of VMs!)
    • To counter this thinking a VM can make Cortex harder to setup and run because of the abstraction layer and trying to pass-through devices. In addition if the memory dump was down to a hardware problem a VM wouldn't have helped




    To check whether your machine is set up to generate a memory dump file, where to find the file, and which tools can read it, refer to http://support.microsoft.com/kb/254649
    Last edited by Paul_B; 22nd October 2013 at 10:02 PM.

  4. #4
    Automated Home Ninja marcuslee's Avatar
    Join Date
    Dec 2009
    Posts
    261

    Default

    Quote Originally Posted by Paul_B View Post
    • If you look in the System event log in Event Viewer, do you see an obvious gap in the event times?
    • If you don't see a gap, do you see EventLog entries with the Event IDs 6005, 6006, 6008 and 6009? Events 6005 and 6009 are generated on every boot when the OS starts up. Just before or after them you might see an event 6008; if it exists, it might give some more info. See http://support.microsoft.com/kb/196452 for more info.
    Thanks for reply Paul.

    In answer:
    - no obvious gap, insofar as there were no log entries after 14/10/13 (I guess the last time something happened) until the reboot this morning and its 6009 log entry
    - thanks for the MS kb link

    Quote Originally Posted by Paul_B View Post
    • If this is the first time this has happened it might have been a one off, for example a bit flip on a cluster written to disk (OS sends a 1 to disk but disk records a 0 due to electronic interpretation). It might not happen again for another year or more.
    - thanks muchly for the reassurance and insight. Makes me feel better than previously where I saw a rebuild as being of extremely high priority


    Quote Originally Posted by Paul_B View Post
    • Not really because Cortex is a 32 bit application, plus drivers for hardware are probably more mature in 32 bit guise than 64 bit.
    Got it. My thinking also, but I was surprised to see a recent reply from Karam about building Cortex on 64bit Windows (I wasn't sure if that meant Cortex was 64bit available/compatible, or if it was simply running 32bit mode on 64bit OS). So I wasn't sure if there was some meat in moving to 64bit.


    Quote Originally Posted by Paul_B View Post
    • To counter this thinking a VM can make Cortex harder to setup and run because of the abstraction layer and trying to pass-through devices. In addition if the memory dump was down to a hardware problem a VM wouldn't have helped
    - understood. To be frank, I dislike VMs for that reason too. I also find vmplayer not great (used on a daily basis, over months, I'll always find something odd happening, like USB pass-through failing). VMware ESXi, in my limited use, seems to be better, though I didn't punish it nearly as much as vmplayer on Windows to know.
    - additionally, I once worked at a bank which used VMs exclusively (I'm in networks, so not very OS- or app-savvy). At some point I asked the Linux and Windows teams there how robust that was. Their answer was that at another bank (of higher standing) they never used VMs, other than for lab purposes at most. Suffice it to say, they absolutely stood by this (even though they were admin'ing this bank's VM environment).
    - thanks though for the information on it not helping.

    Quote Originally Posted by Paul_B View Post
    To check whether your machine is set up to generate a memory dump file, where to find the file, and which tools can read it, refer to http://support.microsoft.com/kb/254649
    Thanks for the link. I had a look, and the last minidump file was from back in September 2012. So it appears that it was definitely a completely catastrophic failure that came out of the blue.


    I should add that I've been meaning to start a "Best Practice for Cortex?" thread, with the aim of producing a template that new Idratek people can use as a reference platform. For instance:
    - I have a shortcut to the Cortex log file on my desktop, for a quick double click to check the logs
    - I usually use Ctrl+F in the logs to search for "error" to grab what's going on
    - Also a shortcut to Event Viewer
    - also I guess I should (but haven't so far), have some form of monitoring or at least some event notification taking place if the Cortex PC is running low on disk, memory, or possibly even high temp (if that exists)
    - also what's the preferred OS people are using.
    - and I suppose I should setup proper error handling from Cortex also
    - disable auto updates and only update Cortex, OS, anti virus etc when necessary or at least vetted and confirmed to be ok?
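    For the low-disk monitoring idea in the list above, a minimal Python sketch might look like the following. It is not a Cortex feature: the function names and the 10% threshold are assumptions for illustration, and how the warning is delivered (email, Cortex event, etc.) is left open.

```python
import shutil


def is_low(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when the free share of the disk is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def disk_is_low(path, min_free_fraction=0.10):
    """Check the disk that holds `path` using the standard library."""
    usage = shutil.disk_usage(path)
    return is_low(usage.free, usage.total, min_free_fraction)


if __name__ == "__main__":
    # "." is a placeholder; point this at the drive holding the Cortex database.
    if disk_is_low("."):
        print("WARNING: less than 10% disk space free")
```

    Run from a scheduled task, a check like this would give some warning before the disk fills, without touching Cortex itself.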

    I think it'd assist greatly, and I'd like the opportunity to pool efforts. If there were a reference platform, we could help out those moving to newer versions of Cortex, system updates etc, flagging any gotchas and whether to hang back. That applies especially to those on the forum with more resources (such as VMs, which make rollback easier) helping those who run a single PC, where if you somehow trash the system you're completely stuck.

    I should add, my own strategy has been:
    - single dedicated PC (as I deem Cortex to be a Priority 1 device ie the function it serves is too important since in my case it covers climate control for babies etc)
    - regular copy to separate NAS of the Idratek Database
    - I also have an identical cold spare PC to be used in the event of catastrophic failure. It's a mirror image, and would only require the database to be copied to it

    As of this morning, however, I've determined it's flawed, as:
    - the cold spare hasn't had its Cortex updated, so it couldn't actually be brought into service at the time of failure, as the production database relies on a minimum version of Cortex (and I'm not 100% sure where to find what minimum version is required)
    - Node 0 doesn't have any lighting that works independently of Cortex, so I was fumbling in the dark!
    - what also bothers me is that if I bring the cold spare up to the same level as the Production Cortex PC, a bug which kills the Production PC could very well be introduced into the cold spare too. So I suppose best practice would dictate leaving it one compatible revision older??
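    The cold-spare version check could be made routine with something like the sketch below. It assumes Cortex versions are dotted integers like the 26.7.2 mentioned in this thread; the minimum version a given database needs is not stated anywhere here, so the values used are purely hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '26.7.2' into a comparable tuple (26, 7, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def spare_is_usable(spare_version, minimum_required):
    """True if the spare's Cortex version is at least the required minimum."""
    return parse_version(spare_version) >= parse_version(minimum_required)


# Hypothetical values: the spare lags production by one revision.
print(spare_is_usable("26.7.2", "26.7.2"))   # spare meets the minimum
print(spare_is_usable("26.6.9", "26.7.2"))   # spare is too old
```

    Comparing tuples of integers rather than raw strings matters once a component reaches two digits, since "26.10.0" sorts before "26.9.5" as text.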

  5. #5
    Automated Home Sr Member mcockerell's Avatar
    Join Date
    Jan 2009
    Posts
    71

    Default

    For what it's worth:

    I used to have problems with our Cortex box hanging occasionally (WinXP SP3) which I suspected were due to hard disk reliability problems.
    I switched to a solid-state disk last Christmas and, touch wood, have experienced no problems since.
    The system is backed up automatically every night to our home server, and I also 'lock' a backup when I update Cortex or make significant changes.

    One useful side effect of using the SSD is that the system boots very quickly now.
    I disabled Windows Update some time ago - on more than one occasion an update prevented Cortex from running.
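    For anyone wanting to script a similar nightly backup with "locked" copies, here is a minimal Python sketch. It is not how the backup above is actually done: the file name cortex.db, the ".locked" naming convention and the function names are all assumptions for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(source, backup_dir, locked=False, now=None):
    """Copy `source` into `backup_dir` under a timestamped name.

    Locked backups get '.locked' in the name so pruning skips them.
    """
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    marker = ".locked" if locked else ""
    target = backup_dir / f"{source.stem}-{stamp}{marker}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


def prune_backups(backup_dir, keep=7):
    """Delete all but the newest `keep` unlocked backups; return what was removed."""
    unlocked = sorted(
        p for p in Path(backup_dir).iterdir()
        if p.is_file() and ".locked" not in p.name
    )
    # Timestamped names sort oldest first, so the surplus is at the front.
    surplus = unlocked[:max(0, len(unlocked) - keep)]
    for old in surplus:
        old.unlink()
    return surplus
```

    A scheduled task could call backup_database() nightly and prune_backups() afterwards, with locked=True used just before a Cortex update or other significant change.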

  6. #6
    Automated Home Ninja marcuslee's Avatar
    Join Date
    Dec 2009
    Posts
    261

    Default

    Also Karam, Viv, if you get to reading this: oddly, in the morning after the crash, some parts of the house had their QLD channels lit (whereas they were off when we went to bed).

    Not sure if there's any insight as to how/why that should be?

    Marcus

  7. #7
    Automated Home Sr Member
    Join Date
    Jul 2004
    Posts
    55

    Default

    Hi Marcus,

    Interesting to read your backup precautions. My PC occasionally BSODs, but always seems to restart itself (not sure if this is a W7 thing or I've got lucky with a setting somewhere!).

    On the one occasion so far that the PC didn't restart itself, Reflex was plenty capable enough to allow lights to be turned on and off, presence to work at a basic level, etc. Your story reminds me I must at the very least auto-generate the Reflex vectors again. I probably should put the effort in to understand Reflex more fully too!

    Peter.

  8. #8
    Automated Home Legend Karam's Avatar
    Join Date
    Mar 2005
    Posts
    818

    Default

    Marcus, do you have defibrillator running on your machine? If the problem affected Cortex in the first instance then defibrillator will have tried to reboot the PC and possibly something might have gone AWOL after that. However in that case defibrillator will also have written a log file with defibrillator in the name before trying the reboot. So I guess either it wasn't running or whatever crashed the machine was some more global event.

    We noticed a couple of bugs in 26.7.2 - one was: having moved ourselves to a newer version of jQuery (Cortex mobile related) we found that sliders in the Cortex mobile interface didn't seem to update values correctly so we have rolled back to a previous version. Also a bug was introduced which meant that Cortex would immediately fail an object if it failed to receive a communication acknowledge at first attempt no matter the retry settings in network supervisor. The symptoms were then that people suddenly started finding Cortex reporting failed modules (flashing icons) every so often where their system had been running just fine before. So there is an update now to correct these - 26.7.3 in case you haven't noticed it yet.

    Regarding running Cortex on other platforms, I can't say I'm an expert on the details but yes Cortex is a 32 bit application and as to whether it would benefit from a 64bit environment I'd say would have more to do with other processes that are running on that platform which can take advantage of that - in other words indirect speed benefits for example. We have a test installation running on Windows 8.0 (don't dare to try the upgrade to 8.1 just yet from the various reports I've heard). There were a number of initial hurdles to overcome mainly relating to more pernickety program privilege requirements but otherwise the installation has been running fine for a couple of months now. Incidentally the machine in question is an HP Pavilion laptop running an i3 processor which was purchased at the time for around 320 inc VAT. When you consider you also get a screen and integral battery backup, IMHO it's quite good value, and needless to say the performance improvement relative to its XP predecessor (otherwise reliable for over two years) was very noticeable - especially so when it came to camera handling, the feel of the Cortex mobile interface and the overall machine. There is still plenty to irritate with Windows 8 itself (IMO) but seeing as the machine is dedicated to Cortex that doesn't really matter much.

  9. #9
    Automated Home Ninja marcuslee's Avatar
    Join Date
    Dec 2009
    Posts
    261

    Default

    Quote Originally Posted by mcockerell View Post
    I used to have problems with our Cortex box hanging occasionally (WinXP SP3) which I suspected were due to hard disk reliability problems.
    I switched to a solid-state disk last Christmas and, touch wood, have experienced no problems since.
    I've also considered an SSD, given that the storage requirements of Cortex + OS aren't large (so a small SSD is affordable!). And whilst lack of mechanical failure = good, I was wondering about SSD failure. A colleague used an SSD for a syslog server and believed it failed completely because of the many small file writes, which I thought Cortex might end up doing too.

    On a quick re-read, I see there have been gains in SSD reliability, but I'm not sure if they're sufficient. Hence I was going to live with slower boots in exchange for reliability.

    The other thing is that I don't think faster execution after boot helps performance much (at least on non-Chris-Hunter-sized Idratek installs!), since I believe the Idratek serial comms are the bottleneck?

    Quote Originally Posted by mcockerell View Post
    The system is backed up automatically every night to our home server, and I also 'lock' a backup when I update Cortex or make significant changes.

    One useful side effect of using the SSD is that the system boots very quickly now.
    I disabled Windows Update some time ago - on more than one occasion an update prevented Cortex from running.
    Could I ask if this is a manual backup? And is it just the Cortex database or something more?

    Agreed on Windows Update!



    Quote Originally Posted by pbj View Post
    Interesting to read your backup precautions. My PC occasionally BSODs, but always seems to restart itself (not sure if this is a W7 thing or I've got lucky with a setting somewhere!).
    Indeed which is what I found so disturbing in this particular black out. I'm used to BSOD, and then some sort of dumps which can take an eternity, but they usually do come back to life. But in this case it was 2+ hours in and it was still stuck :-(

    Quote Originally Posted by pbj View Post
    On the one occasion so far that the PC didn't restart itself, Reflex was plenty capable enough to allow lights to be turned on and off, presence to work at a basic level, etc.
    You're absolutely right. For readers of this thread, I should concur: DFP-mapped light buttons will continue to work. Unfortunately for me, though, heating doesn't.



    Quote Originally Posted by Karam View Post
    Marcus, do you have defibrillator running on your machine? If the problem affected Cortex in the first instance then defibrillator will have tried to reboot the PC and possibly something might have gone AWOL after that. However in that case defibrillator will also have written a log file with defibrillator in the name before trying the reboot. So I guess either it wasn't running or whatever crashed the machine was some more global event.
    Thanks for the reply, Karam. It is running (as part of the standard install, so untouched in that regard), and indeed there are no logs, so as you say it points to a global event.

    Quote Originally Posted by Karam View Post
    We noticed a couple of bugs in 26.7.2 - one was: having moved ourselves to a newer version of jQuery (Cortex mobile related) we found that sliders in the Cortex mobile interface didn't seem to update values correctly so we have rolled back to a previous version. Also a bug was introduced which meant that Cortex would immediately fail an object if it failed to receive a communication acknowledge at first attempt no matter the retry settings in network supervisor. The symptoms were then that people suddenly started finding Cortex reporting failed modules (flashing icons) every so often where their system had been running just fine before. So there is an update now to correct these - 26.7.3 in case you haven't noticed it yet.
    I haven't got round to using Cortex Mobile extensively yet, and no, I haven't had failed modules, so indeed I haven't had to move to 26.7.3 yet.

    Quote Originally Posted by Karam View Post
    Regarding running Cortex on other platforms, I can't say I'm an expert on the details but yes Cortex is a 32 bit application and as to whether it would benefit from a 64bit environment I'd say would have more to do with other processes that are running on that platform which can take advantage of that - in other words indirect speed benefits for example. We have a test installation running on Windows 8.0 (don't dare to try the upgrade to 8.1 just yet from the various reports I've heard). There were a number of initial hurdles to overcome mainly relating to more pernickety program privilege requirements but otherwise the installation has been running fine for a couple of months now.
    A couple of months? hmmm... well I'd trade a couple of months Win8 install vs another user's report of Win7 install known to be running for couple of years.

    Just kidding of course, thanks for feedback. It does sound like either platform will suffice, though realistically speaking other than faster boot with Win8, I would probably put forward Win7 for the time being, until the leap-aheaders like yourself etc have come back and let us all know Win8 can hit the same reliability benchmarks :-S :-D

    Quote Originally Posted by Karam View Post
    Incidentally the machine in question is an HP Pavilion laptop running an i3 processor which was purchased at the time for around 320 inc VAT. When you consider you also get a screen and integral battery backup, IMHO it's quite good value, and needless to say the performance improvement relative to its XP predecessor (otherwise reliable for over two years) was very noticeable - especially so when it came to camera handling, the feel of the Cortex mobile interface and the overall machine. There is still plenty to irritate with Windows 8 itself (IMO) but seeing as the machine is dedicated to Cortex that doesn't really matter much.
    I should add that I somewhat agree with this. As a compromise (vs running enterprise servers etc), this is what I go for also:
    - being a notebook, it's already been tuned with low power consumption in mind (for those of us energy conscious)
    - it's a quick swap hard drive if need be - Thinkpads, and I've seen others, have a 1 screw removal hard drive swap
    - in my case I go for Thinkpads as they are tested to destruction, but that aside (as I don't advocate throwing laptops around to begin with!), they also offer onsite hardware repair. I think it's 3 years, and it's global regardless of where it was purchased and where it's now homed (I've verified this), and regardless of whether you purchased it new / second hand / however you came to be the owner of it (!!), as the warranty tracks the machine, not the owner (also verified).

  10. #10
    Automated Home Legend Paul_B's Avatar
    Join Date
    Jul 2006
    Location
    Eastbourne, UK
    Posts
    604

    Default

    Just an update on my experience of SSDs. I originally purchased an OCZ and had no end of problems, including BSoDs. I then purchased an Intel SSD and haven't had any problems.
