Mass communication loss overnight?

  • Jemster
    Automated Home Guru
    • Dec 2018
    • 123

    I came down this morning to a communication failure on the EvoHome controller. When I cleared the message, most (not all) of my zones, including hot water, are now reading '--' for the temperature.

    Took a look at the fault log and lots of communication loss at about 2:30 last night. A few have returned, most have not.

    I had breakfast, took another look and 4 more actuators had restored communication so, having to go to work, I hoped the rest would return. Looking at the app, I can see they haven't - it also seems to have overrides indicated on the zones that are back up and running so I've reset these remotely (probably!) but see that one of these hasn't 'stuck' and has gone back up to 20.

    I also note it's been calling for heat overnight. So when communication was lost, it seemed to think heat was required, even though all the valves should've been shut down for the night.

    Nothing weird happened during the night to the best of my knowledge. Normal day, no electricals on to speak of overnight. Anyone seen this or got any thoughts on getting it back and running with minimal disruption? Don't want to have to sit and rebind everything as nothing has changed, but not sure how to get it to pick everything up again?

    Guess I'll be working at home this afternoon to sort this mess out.
  • Jemster
    Automated Home Guru
    • Dec 2018
    • 123

    #2
    No explanations, nothing.

    Pulled the battery on the controller and put it back. Now, about 90 minutes later, all appears to have returned to normal. Although in between it did give me a variety of randomness, telling me zones had overrides (when they hadn't) that I had to manually cancel. I've a Loop Energy Monitor and I read somewhere that these also occupy the same frequency range for communication. I've unplugged it for now and will see what happens but it's been working alongside the EvoHome for the past month without any problems.

    Not very impressed at having to kick this thing up the arse with a physical battery removal, to be honest. What if that happened while I was unable to get to the house (e.g. on holiday)? Although I had remote communication with the controller through the app, there was nothing I could do to resolve the issue - a reboot option would be nice. At least with the Loop Energy Monitor connected I had the pleasure of watching my bank account empty.

    • dty
      Automated Home Ninja
      • Aug 2016
      • 489

      #3
      I have Loop and Evohome (and ZWave) and they all operate in the same frequency range. Of these, the most likely to get disrupted is Evohome as it's not a very robust radio protocol (despite Honeywell's advertising) - most messages are just sent "fire and forget" with no acknowledgement. Having said that, the radios in these devices should be listening for other transmissions before they attempt to transmit, but there's still the chance that two things will try and begin transmission at the exact same time. The chances of a collision are therefore pretty slim, and I've certainly not observed any significant issues with having all these things chatting away at once.

      • Jemster
        Automated Home Guru
        • Dec 2018
        • 123

        #4
        Good to know, I honestly have absolutely no idea what happened yesterday. The system has behaved itself since I restarted it in the afternoon. I haven't plugged Loop back in yet. Guess I'll do that tonight and hopefully everything will remain stable.

        What gets me is that if it were a collision happening, it would happen once and the next time around everything would be ok - chances of repeat collision are surely minimal. So if I eliminate the Loop from my enquiries that leaves me with either something blocking communications or that EvoHome gave up all on its own.
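
The reasoning above (a repeat collision is far less likely than a single one) can be put into rough numbers. This is only a sketch: the 1% per-message loss figure is a made-up assumption, not a measured Evohome value, and it treats each transmission as independent.

```python
def prob_consecutive_losses(p_single: float, repeats: int) -> float:
    """Probability that `repeats` independent transmissions are ALL lost."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if repeats < 0:
        raise ValueError("repeats must not be negative")
    return p_single ** repeats


# ASSUMPTION: 1% chance of losing any one message to a collision (hypothetical).
P_SINGLE = 0.01

for n in (1, 2, 3, 6):
    print(f"{n} lost in a row: {prob_consecutive_losses(P_SINGLE, n):.2e}")
```

Under that assumption, several hours of continuous loss across six zones does not look like random collisions, which fits the conclusion that something else was going on.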

        If something physical was blocking the communications, then simply resetting the unit would not have caused the blocking to stop, so I think I can rule that out.

        Leaves me with EvoHome sh*t itself. But judging from the lack of replies to this thread, this is not a common occurrence either. Well, not to the extent that I saw - I have 9 zones + HW. I lost 6 of them. 4 upstairs, 1 split up and down (2 bathrooms), the other being the room with the EvoHome controller itself. The Hot Water came and went a couple of times through the morning.

        • dty
          Automated Home Ninja
          • Aug 2016
          • 489

          #5
          I would agree with your assessment. I have 24 HR92, half a dozen assorted wall stats, etc., and I'm not aware of any significant problem ever. Sure, the odd thing might have been missed, but most of the system works on just repeating itself every 5-10 minutes anyway (this might be Honeywell's definition of "robust"!) so I wouldn't have noticed.

          • Jemster
            Automated Home Guru
            • Dec 2018
            • 123

            #6
            UPDATE:

            Everything has been working faultlessly since ‘The Incident’. I’ve been checking fault logs religiously and in the last 6 weeks I’ve had one momentary lost hot water sensor but nothing else.

            This morning I arrived in at 2:30am to find the heating on. I looked at the controller and it was reporting a failure on a bedroom actuator. I checked the fault log and it gave me...
            Bedroom 1: lost communication with sensor, lost communication with actuator (1 hr92 zone)
            Living Room: lost communication with sensor (3 hr92s and 1 dt92)
            The main display was showing overrides for 5 zones! (Nobody had touched anything on the system) and loss of temperature reading for 2 of them, the remaining 4 zones were all normal.

            I cleared all the overrides then went into the Living Room and saw that the DT92 was ‘off’ (no display). I popped a battery and put it back in and it re-synced.

            By this stage it was 3am and I’d had enough. Went to bed. This morning, I see one of the cleared overrides came back on 20 mins after I cleared it and had been causing demand all night. (Thanks Honeywell, let me know where you’d like me to send your gas bill)

            I’ve looked into it further with Domoticz and it would appear that the overrides happened, for the most part, at about 11:45 on a few zones. The DT92 disappeared offline at about 1:30am. I have noticed in both Incidents the DT92 has gone off... but given the times, I am not sure if this is a cause or an effect.

            What is really puzzling me is where these overrides are coming from. They have the clock symbol in the UI so aren’t coming from the phone app, they obviously aren’t coming from multiple hr92s simultaneously so how are they getting set???

            Twice in 6 weeks is not good after an initial 5 months of reliable operation and I’m unsure where to turn. I don’t know which bit of kit is faulty. There’s no obstructions, no weirdly placed objects, no changes in the house, just total random failure with total inability to manage the failure.

            Got to be either the controller, or that DT92 that goes off on its own... maybe...

            • Jemster
              Automated Home Guru
              • Dec 2018
              • 123

              #7
              I've exported all my data from Domoticz and here's a timeline of what happened...
              At about 11pm all the radiators go back down to 10 degrees on their schedule and at this stage everything was running normally.

              23:20 Bedroom 3 - Set point unexpectedly goes up to 20 degrees
              23:25 TV Room - Set point unexpectedly jumps to 20.5 degrees
              23:45 Living Room - Set point unexpectedly goes up to 20.5 degrees
              23:50 Computer Room - Set point mysteriously goes up to 20 degrees
              00:30 TV Room - Set point drops to 20 degrees (!?!??!)
              00:50 Living Room - Set point drops to 20 degrees (!?! like... why???? I can kind of understand a random override but this indicates more than 1 override happened)
              01:25 Bedroom 2 - Stopped reporting temperature (This is the one I got the fault message for)
              01:30 TV Room - Stopped reporting temperature
              01:40 Bedroom 1 - Set point goes up to 20 degrees

              02:45 I arrived in and cleared faults and overrides...

              02:45 Computer Room - Set point goes back to 10 degrees when I cleared override
              Bedroom 1 - Set point goes back to 10 degrees when I cleared override
              Living Room - Set point goes back to 10 degrees when I cleared override
              Bedroom 3 - Set point goes back to 10 degrees when I cleared override
              TV Room - Set point goes back to 10 degrees when I cleared override

              02:50 Computer Room - Set point magically goes back to 20 degrees
              Bedroom 2 - Restored temperature readings
              02:55 Computer Room - Set point goes back to 10 degrees when I cleared override for a 2nd time
              03:10 TV Room - Starts reporting temperature again (I did a battery reset of DT92 about 15 minutes earlier)

              03:20 Bedroom 3 - Set point went back up to 20 degrees!
              03:20 TV Room - Set point went back up to 20 degrees!
              03:45 Bedroom 1 - Set point went back up to 20 degrees!


              Right now, re-reading this, I am having trouble not over-reacting to what I think of EvoHome and 'fit for purpose'. This is not a simple case of interference. This is total system collapse. Bear in mind that while this was going on, 3 other zones Plus the hot water were functioning normally.

              Please... Anybody... any ideas or theories??

              • paulockenden
                Automated Home Legend
                • Apr 2015
                • 1719

                #8
                Is your Domoticz setup exposed to the outside world? If so, is it password protected?

                A rogue 'script kiddie' attack that probes forms could end up doing something like this.

                Have a look in domoticz/var/domoticz.log and see if there's anything suspicious around that time.

                (Actually the location of the logfile will depend on what you're running it on).

                P.
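
A minimal sketch of pulling out just the log lines around the incident, assuming each Domoticz log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (check your own log first). The sample lines and dates below are invented placeholders, not real Domoticz output.

```python
from datetime import datetime


def lines_in_window(lines, start, end):
    """Yield log lines whose leading timestamp falls within [start, end]."""
    for line in lines:
        try:
            # ASSUMPTION: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if start <= stamp <= end:
            yield line.rstrip("\n")


# Hypothetical sample lines; the dates are placeholders.
sample = [
    "2019-04-27 23:19:58.101  Status: hypothetical entry just before the first override",
    "2019-04-27 23:20:03.512  Status: hypothetical entry at the first override",
    "no timestamp on this line",
    "2019-04-28 04:00:00.000  Status: hypothetical entry after the window",
]

window = list(
    lines_in_window(sample, datetime(2019, 4, 27, 23, 0), datetime(2019, 4, 28, 3, 45))
)
for line in window:
    print(line)
```

To run it against a real file, pass an open file object instead of `sample`, for example `lines_in_window(open(path), start, end)`, where `path` is wherever your install writes its log.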

                • Jemster
                  Automated Home Guru
                  • Dec 2018
                  • 123

                  #9
                  Originally posted by paulockenden
                  Is your Domoticz setup exposed to the outside world? If so, is it password protected?

                  A rogue 'script kiddie' attack that probes forms could end up doing something like this.

                  Have a look in domoticz/var/domoticz.log and see if there's anything suspicious around that time.

                  (Actually the location of the logfile will depend on what you're running it on).

                  P.
                  Interesting theory. It's certainly not intended to be exposed to the outside world. I have just tried using my external IP and port 8080 but got no response, so I'm thinking it is correctly closed off. I've also run an external port scanning check tool and it's not finding anything open. Does the Domoticz server support UPnP to set up the port forwarding? I've not seen any evidence of this on my router... but I don't check it all that often.
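
The "try the external IP and port 8080" check can be scripted as a plain TCP connect test. This is a sketch using only the Python standard library; the function name is made up for illustration. A closed or filtered port shows up as a refused connection or a timeout.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Call it as `port_is_open("203.0.113.10", 8080)`, replacing the documentation-range placeholder address with your own external IP. Testing your own external address from inside the LAN can give a misleading result on routers that don't loop such traffic back, so a check from an outside connection (such as a phone on mobile data) is more trustworthy.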

                  I'm on OS X and according to my LaunchAgent, my log is meant to be at /var/log/domoticz.log and of level "normal", but I don't appear to have a log file at that location... I've scanned my system but can't find any domoticz.log file. That would've been very useful to see.

                  Chicken and egg. If I kill Domoticz I'll not be able to see it happening in such detail.

                  • Jemster
                    Automated Home Guru
                    • Dec 2018
                    • 123

                    #10
                    Actually, I’ve just tried an override from Domoticz and that gives me the mobile phone override symbol on the EvoHome GUI. That’s not what I had, I had the clock symbol override that you get when changing it on the controller itself.

                    • DBMandrake
                      Automated Home Legend
                      • Sep 2014
                      • 2361

                      #11
                      Originally posted by Jemster
                      UPDATE:
                      I cleared all the overrides then went into the Living Room and saw that the DT92 was ‘off’ (no display). I popped a battery and put it back in and it re-synced.
                      Evohome is notorious for poor battery connections. You need to re-tension the battery contacts. The contacts are (probably, in my opinion) made of chrome plated mild steel rather than using proper spring steel normally used for spiral battery terminals so over time they lose their tension and become intermittent.

                      Every single one of my HR92's has needed to have this done over the last 3 years because they became intermittent. Some have needed it done twice! My CS92A has needed it done, and so has one of my DTS92's which like yours decided to go blank one day even though the batteries were fine.
                      What is really puzzling me is where these overrides are coming from. They have the clock symbol in the UI so aren’t coming from the phone app, they obviously aren’t coming from multiple hr92s simultaneously so how are they getting set???
                      One possibility is the so called "phantom override" issue. Check for other threads about this, me and a few others have discussed this problem a few times. In essence it can cause the set point to be reverted to the previous set point. The workaround (there is no true fix) is to reboot all the HR92's in the affected zones preferably at random times in the hour, as the problem occurs when the bootup time of the HR92 within an hour clashes with set point changes.
                      Got to be either the controller, or that DT92 that goes off on its own... maybe...
                      The DTS92 going off is a fault but wouldn't cause all your other issues, and certainly wouldn't affect other zones.
                      Originally posted by Jemster
                      I've exported all my data from Domoticz and here's a timeline of what happened...
                      At about 11pm all the radiators go back down to 10 degrees on their schedule and at this stage everything was running normally.

                      23:20 Bedroom 3 - Set point unexpectedly goes up to 20 degrees
                      23:25 TV Room - Set point unexpectedly jumps to 20.5 degrees
                      23:45 Living Room - Set point unexpectedly goes up to 20.5 degrees
                      23:50 Computer Room - Set point mysteriously goes up to 20 degrees
                      00:30 TV Room - Set point drops to 20 degrees (!?!??!)
                      00:50 Living Room - Set point drops to 20 degrees (!?! like... why???? I can kind of understand a random override but this indicates more than 1 override happened)
                      01:25 Bedroom 2 - Stopped reporting temperature (This is the one I got the fault message for)
                      01:30 TV Room - Stopped reporting temperature
                      01:40 Bedroom 1 - Set point goes up to 20 degrees
                      20 degrees is the default set point for an HR92 when it boots up. If you reboot an HR92 by removing the batteries and reinserting them it will jump to 20C. This will be registered by the controller as a local override (clock icon) as if the user had turned the HR92's dial to 20.

                      HR92's are very prone to random spontaneous reboots when the battery contacts start to lose their tension. You can see where I'm going with this...if an HR92 spontaneously reboots in the night you'll end up with a 20C override until your next set point in the morning. It's happened to me a few times.
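
The two explanations given so far (an HR92 reboot landing on its 20C default, or a phantom override reverting to the previous set point) suggest a rough way to label each unexpected change in a log. This is a sketch only; the function and its labels are made up for illustration and nothing here comes from Honeywell.

```python
REBOOT_DEFAULT_C = 20.0  # per the post: set point an HR92 shows after a reboot


def classify_override(new_sp, scheduled_sp, previous_sp=None):
    """Return a best-guess label for an observed set point.

    new_sp       -- set point seen in the log
    scheduled_sp -- what the schedule says it should be at that time
    previous_sp  -- the scheduled set point before that one, if known
    """
    if new_sp == scheduled_sp:
        return "scheduled"
    if new_sp == REBOOT_DEFAULT_C:
        # Checked first, so a previous set point of 20 is reported as a reboot.
        return "possible HR92 reboot (20C default)"
    if previous_sp is not None and new_sp == previous_sp:
        return "possible phantom override (revert to previous set point)"
    return "unexplained"


print(classify_override(20.0, 10.0))        # jump to exactly 20
print(classify_override(20.5, 10.0, 20.5))  # revert to the earlier 20.5
print(classify_override(20.5, 10.0))        # previous set point not known
```

If the previous scheduled set point was itself 20, the two causes cannot be told apart from the set point alone.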
                      02:45 I arrived in and cleared faults and overrides...

                      02:45 Computer Room - Set point goes back to 10 degrees when I cleared override
                      Bedroom 1 - Set point goes back to 10 degrees when I cleared override
                      Living Room - Set point goes back to 10 degrees when I cleared override
                      Bedroom 3 - Set point goes back to 10 degrees when I cleared override
                      TV Room - Set point goes back to 10 degrees when I cleared override

                      02:50 Computer Room - Set point magically goes back to 20 degrees
                      That sounds like either the Phantom override problem (which causes a revert to the previous set point) or an HR92 in the zone rebooted again - which would revert to 20C. Low battery/poor contact tension can cause spontaneous reboots when the motor tries to turn, especially if it turns all the way to the limit stop as that causes a sudden large current drain.

                      Bedroom 2 - Restored temperature readings
                      02:55 Computer Room - Set point goes back to 10 degrees when I cleared override for a 2nd time
                      03:10 TV Room - Starts reporting temperature again (I did a battery reset of DT92 about 15 minutes earlier)

                      03:20 Bedroom 3 - Set point went back up to 20 degrees!
                      03:20 TV Room - Set point went back up to 20 degrees!
                      03:45 Bedroom 1 - Set point went back up to 20 degrees!
                      Ditto.

                      Right now, re-reading this, I am having trouble not over-reacting to what I think of EvoHome and 'fit for purpose'. This is not a simple case of interference. This is total system collapse. Bear in mind that while this was going on, 3 other zones Plus the hot water were functioning normally.

                      Please... Anybody... any ideas or theories??
                      I would check the status of the batteries in all your DTS92's and HR92's and then re-tension their contacts, and make sure all HR92's/DTS92 have been rebooted and then finally reboot the controller itself.

                      The way I test the battery tension on the HR92 is simple but effective - remove the HR92 from its base, hold it vertically in one hand, hold your other palm open facing up about 6" below the HR92, then bring the HR92 down with a medium-force "thump" into your palm. If there is any trouble with the battery contact tension the HR92 will spontaneously reboot. Often you'll find the slightest bump will reboot it and you'll wonder how it ever worked...

                      To tension the contacts you'll need a small flat jewellers screwdriver or similar to bend up the middle prong of the two bottom contacts by pushing the screw driver up through the base. Bend it up a couple of millimetres. Sometimes the little shorting bar at the top needs to be bent slightly as well, (it does a really poor job and is as soft as butter) and it's helpful to fit the two security screws as that helps to hold it down as normally it relies on the pathetic clip in the middle of the bar.

                      See how you go after that. Might just be a combination of many of the HR92's batteries starting to get low (but not low enough to show a fault log entry) and many of them starting to lose contact tension.
                      Last edited by DBMandrake; 28 April 2019, 10:11 PM.

                      • Jemster
                        Automated Home Guru
                        • Dec 2018
                        • 123

                        #12
                        DBMandrake - invaluable advice as always...

                        I believe all my batteries are strong (they are under 6 months old) but I will check. But the contacts, I’ve already had issues with 1 in the kitchen so it is possible. If a little disappointing given the amount spent on the system.

                        BUT my one niggle here is: how has this happened to 5 zones almost simultaneously (no, we haven’t had an earthquake)? If it was 1 messing about, fair enough, but can 1 bad connection ‘infect’ many other zones? And it's more than slightly coincidental that the DTS92 turns off at the same time. No? Just too much coincidence for me.

                        I’ll read up on the phantom override issue, maybe that explains the multiple-zone failure. Thanks for the pointer!

                        • Jemster
                          Automated Home Guru
                          • Dec 2018
                          • 123

                          #13
                          All batteries reporting 2/3 bars. All signal strengths are full and all pass the bash-on-hand test. The mystery continues...

                          Also, doesn’t the system round-robin every few minutes and send out the zone temperatures? I didn’t think it waited until the next set change?

                          • Jemster
                            Automated Home Guru
                            • Dec 2018
                            • 123

                            #14
                            Ahhh. Wow. Just been reading phantom override. Will reset it tomorrow avoiding the hour and half hours. What a crazy bug!
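
A small sketch of one way to pick the reboot times: a random, distinct minute past the hour for each zone, keeping clear of the hour and half hour. The 5-minute guard band is an arbitrary assumption, not a figure from the phantom override threads, and the zone names are just the five affected ones from the timeline.

```python
import random


def pick_reboot_minutes(zones, guard_minutes=5, seed=None):
    """Assign each zone a distinct minute past the hour, away from :00 and :30."""
    rng = random.Random(seed)
    # Keep minutes at least `guard_minutes` from the nearest :00 or :30 mark.
    allowed = [
        m for m in range(60)
        if min(m % 30, 30 - m % 30) >= guard_minutes
    ]
    if len(zones) > len(allowed):
        raise ValueError("not enough allowed minutes for that many zones")
    return dict(zip(zones, rng.sample(allowed, k=len(zones))))


zones = ["Bedroom 1", "Bedroom 3", "TV Room", "Living Room", "Computer Room"]
plan = pick_reboot_minutes(zones)
for zone, minute in plan.items():
    print(f"{zone}: pull and refit batteries at about :{minute:02d}")
```

Pass `seed=` to get the same plan back on a later run.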

                            • DBMandrake
                              Automated Home Legend
                              • Sep 2014
                              • 2361

                              #15
                              Originally posted by Jemster
                              All batteries reporting 2/3 bars. All signal strengths are full and all pass the bash-on-hand test. The mystery continues...
                              Hmm..
                              Also, doesn’t the system round-robin every few minutes and send out the zone temperatures? I didn’t think it waited until the next set change?
                              No, the controller only sends set point changes to HR92's when there is a change in set point. Additionally identical set points in a row do not send out redundant set point changes.

                              So if you have 5C at 11pm and then 5C again at 2am (which I do on downstairs zones, to catch any after 11pm manual override I make if I'm staying up late) then the second one won't be sent unless a manual set point change occurred between the two to some other temperature.

                              This not sending "redundant" set point changes also causes issues of its own: if an HR92 set point override isn't registered by the controller properly, it won't be cancelled. This is the case with multi-room zones - so if you have a multi-room zone with 5C at 11pm and 5C at 2am, and manually turn the HR92 up at midnight, it will not be turned down to 5C again at 2am, as the controller is not aware of the set point change and thinks there is nothing to revert! This can happen in single room zones occasionally if there is any minor loss of comms. (The workaround is to vary "duplicate" set points in the schedule slightly, like 5.5/5.0, so that it always considers there to be a change in set point that needs to be transmitted.)
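
A toy model of the behaviour described above, to show why the 5.5/5.0 workaround helps. It is a simplified sketch of the logic as explained in this post, not Honeywell's actual firmware; times are minutes after 11pm and all names are made up.

```python
def run_night(schedule, manual, controller_hears_manual):
    """Return the HR92's final set point after a night of events.

    schedule -- list of (minute, set point) scheduled changes
    manual   -- list of (minute, set point) dial turns on the HR92
    controller_hears_manual -- False models a multi-room zone or lost comms
    """
    events = sorted(
        [(t, "schedule", sp) for t, sp in schedule]
        + [(t, "manual", sp) for t, sp in manual]
    )
    believed = None  # what the controller thinks the zone is set to
    actual = None    # what the HR92 is really set to
    for _, kind, sp in events:
        if kind == "manual":
            actual = sp
            if controller_hears_manual:
                believed = sp
        elif sp != believed:
            # Only a set point the controller sees as a change is transmitted.
            believed = sp
            actual = sp
    return actual


dial_up_at_midnight = [(60, 21.0)]
same = [(0, 5.0), (180, 5.0)]         # 5C at 11pm, 5C again at 2am
alternating = [(0, 5.5), (180, 5.0)]  # the suggested workaround

print(run_night(same, dial_up_at_midnight, controller_hears_manual=True))
print(run_night(same, dial_up_at_midnight, controller_hears_manual=False))
print(run_night(alternating, dial_up_at_midnight, controller_hears_manual=False))
```

With identical 5C entries and an override the controller never heard about, the zone stays at 21 all night; the alternating schedule forces the 2am transmission and brings it back down.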
                              Last edited by DBMandrake; 29 April 2019, 10:56 AM.
