Generic data store - what to use?

  • jpdw
    Automated Home Guru
    • Oct 2007
    • 169

    Generic data store - what to use?

    Hi,

    What's a good tool in which to store data collected from around the house - i.e. a database or one of the NoSQL-type tools?
    I was wondering what other people might use / would recommend (or are thinking about, or would avoid).

    The data I want to store comes from a number of sources, including 3 or 4 DIY temperature sensors and electricity usage read from a CurrentCost Envi. The CurrentCost is currently just 'whole house' but I'll soon add at least one "appliance" monitor. We also have Idratek's Cortex running our heating and I already have that exporting heating zone status and house occupancy, so I'd like to store that too. If suitable I'd also add broadband stats (something else I already monitor elsewhere).

    Once collected I want to be able to pull out data for graphing & comparison, plus some relatively simple querying, but nothing exceedingly complicated I don't think (so a mark against a relational DB and towards a simpler NoSQL method, perhaps). I'm not sure yet what tool I'd use.

    I also don't want to lose the resolution of older data in the way that I think RRD would (i.e. older data gets averaged down). I'd rather know that I've got "full-definition" rolling data going back X days (or months) etc.

    At the moment I've got some of these things going into a MySQL database (and some rudimentary graphs for some of the sensors), but the schema is currently a column per sensor source, so it requires a schema change each time I add something. Fixable of course, but it made me start to wonder whether a full relational database is the best way to go for easily adding 'disparate' data.

    I'm completely happy with creating the integration - I'm already doing some with the MySQL, albeit with a currently hacky mix of Python and PHP, and I figure pretty much any data store would support either anyway (though I'd probably implement it all in Python). Similar with the graphing: a nice full-fat solution that you just point at a data source would be good (a la Visage pointed at RRDs), but I wouldn't be averse to creating the layer that retrieves the data from the store and uses a graphing framework of some sort to do the actual rendering (what I have at the moment uses RGraph, for instance).

    Any thoughts? Has anyone done something similar? I'm sure there's no right or wrong answer here!

    /jon
    Jon
  • steve_the_greek
    • Oct 2013
    • 1

    #2
    I'm probably in a similar boat to you - I collect data from 1-Wire sensors, wireless devices (weather station, electricity monitor, HomeEasy), a solar controller, and web scraping.

    I used to use rrdtool, but got fed up with having to recreate the rrd every time I added a new sensor, and it wasn't easy (or quick) to extract data in the way I wanted.

    I now store my data in a MySQL database. I have two tables: sensor_names, which contains the human-readable name of each sensor along with its auto-generated unique id (around 20-odd sensors at the mo), and the big one, sensor_values, which holds the data. Each entry in that table contains its own unique id, a timestamp for when the sensor value was taken, the id of the sensor (matching the sensor_names entry), and the actual value of the sensor.
    This layout minimises the amount of space taken by each entry, you don't need to redo the schema every time you add a sensor, and you can handle sensors with varying update rates. The disadvantage is that you end up with a lot of entries in the data table. As it grows I suspect it will get slow, so I'll have to decide whether to start purging or archiving data, or do something with indexes (if that's possible, or helps - see the sketch after the SQL below). Anyway, here's some SQL to give you an idea:


    mysql> show tables;
    +-----------------------+
    | Tables_in_environment |
    +-----------------------+
    | sensor_names          |
    | sensor_values         |
    +-----------------------+
    2 rows in set (0.00 sec)

    mysql> describe sensor_names;
    +-------------+-----------------+------+-----+---------+----------------+
    | Field       | Type            | Null | Key | Default | Extra          |
    +-------------+-----------------+------+-----+---------+----------------+
    | sensor_id   | int(5) unsigned | NO   | PRI | NULL    | auto_increment |
    | sensor_name | varchar(64)     | NO   | PRI | NULL    |                |
    +-------------+-----------------+------+-----+---------+----------------+
    2 rows in set (0.01 sec)

    mysql> select * from sensor_names;
    +-----------+-----------------------------------+
    | sensor_id | sensor_name                       |
    +-----------+-----------------------------------+
    |         1 | Oregon:Lounge:Humidity            |
    |         2 | Oregon:Lounge:Temperature         |
    |         5 | OWL:Instant                       |
    |         6 | OWL:Total                         |
    |         7 | Oregon:Patio:Humidity             |
    |         8 | Oregon:Patio:Temperature          |
    |        10 | Oregon:Rain:Rate                  |
    |        11 | Oregon:Rain:Total                 |
    |        12 | Oregon:Wind:Average               |
    |        13 | Oregon:Wind:Gust                  |
    |        14 | Oregon:Windirection               |
    |        17 | Temperatures:OWFS:Garage          |
    |        34 | Temperatures:BBC:Lymington        |


    mysql> describe sensor_values;
    +--------------+---------------------+------+-----+---------+----------------+
    | Field        | Type                | Null | Key | Default | Extra          |
    +--------------+---------------------+------+-----+---------+----------------+
    | value_id     | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
    | value_time   | int(11)             | NO   |     | NULL    |                |
    | sensor_id    | int(5)              | NO   | MUL | NULL    |                |
    | sensor_value | float               | NO   |     | NULL    |                |
    +--------------+---------------------+------+-----+---------+----------------+
    4 rows in set (0.00 sec)

    mysql> select * from sensor_values where sensor_id=2 order by value_time desc limit 10;
    +----------+------------+-----------+--------------+
    | value_id | value_time | sensor_id | sensor_value |
    +----------+------------+-----------+--------------+
    | 10368200 | 1382355001 |         2 |         20.2 |
    | 10368174 | 1382354701 |         2 |         20.2 |
    | 10368146 | 1382354401 |         2 |         20.2 |
    | 10368118 | 1382354101 |         2 |         20.2 |
    | 10368090 | 1382353801 |         2 |         20.2 |
    | 10368062 | 1382353501 |         2 |         20.2 |
    | 10368033 | 1382353202 |         2 |         20.2 |
    | 10368007 | 1382352901 |         2 |         20.2 |
    | 10367979 | 1382352602 |         2 |         20.2 |
    | 10367951 | 1382352302 |         2 |         20.2 |
    +----------+------------+-----------+--------------+
    10 rows in set (0.08 sec)

    mysql> select * from sensor_values order by value_time desc limit 10;
    +----------+------------+-----------+--------------+
    | value_id | value_time | sensor_id | sensor_value |
    +----------+------------+-----------+--------------+
    | 10368252 | 1382355308 |        42 |         45.5 |
    | 10368251 | 1382355308 |        45 |        51.37 |
    | 10368250 | 1382355306 |        35 |           21 |
    | 10368249 | 1382355305 |        41 |        20.87 |
    | 10368248 | 1382355305 |        43 |        16.31 |
    | 10368247 | 1382355305 |        40 |        18.62 |
    | 10368246 | 1382355304 |        39 |          0.1 |
    | 10368245 | 1382355303 |        34 |           16 |
    | 10368244 | 1382355302 |        38 |          0.4 |
    | 10368243 | 1382355302 |        17 |        19.25 |
    +----------+------------+-----------+--------------+
    10 rows in set (0.44 sec)

    mysql> select sensor_name, from_unixtime(value_time), sensor_value from sensor_values sv join sensor_names sn on sv.sensor_id = sn.sensor_id order by value_time desc limit 10;
    +-----------------------------------+---------------------------+--------------+
    | sensor_name                       | from_unixtime(value_time) | sensor_value |
    +-----------------------------------+---------------------------+--------------+
    | Temperatures:Quasar:Boiler Output | 2013-10-21 12:35:08       |         45.5 |
    | Temperatures:Quasar:Hot Water     | 2013-10-21 12:35:08       |        51.37 |
    | Temperatures:Quasar:Bedroom 1     | 2013-10-21 12:35:05       |        20.87 |
    | Temperatures:Quasar:External      | 2013-10-21 12:35:05       |        16.31 |
    | Temperatures:Quasar:Attic         | 2013-10-21 12:35:05       |        18.62 |
    | Weather:River:Meerut Road         | 2013-10-21 12:35:04       |          0.1 |
    | Temperatures:BBC:Lymington        | 2013-10-21 12:35:03       |           16 |
    | Weather:River:Brockenhurst        | 2013-10-21 12:35:02       |          0.4 |
    | Temperatures:OWFS:Garage          | 2013-10-21 12:35:02       |        19.25 |
    +-----------------------------------+---------------------------+--------------+
    10 rows in set (0.71 sec)
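
    On the "something with indexes" point: a composite index on (sensor_id, value_time) is the usual trick, so that the "where sensor_id = ... order by value_time desc" query above can be walked straight off the index for one sensor instead of sorting the matching rows. A rough sketch of doing it from Python - the connection details and the index name here are made up, only the table/column names come from the schema above:

import pymysql

# Assumed connection details, for illustration only.
conn = pymysql.connect(host="localhost", user="env",
                       password="secret", database="environment")
try:
    with conn.cursor() as cur:
        # Composite index so "WHERE sensor_id = ? ORDER BY value_time DESC"
        # can be served from the index instead of scanning the whole table.
        cur.execute("CREATE INDEX idx_sensor_time "
                    "ON sensor_values (sensor_id, value_time)")
    conn.commit()
finally:
    conn.close()

    Whether it's worth the extra write cost probably depends on how big the table gets before any purging or archiving kicks in.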


    For displaying the data, I use jpgraph, which is definitely a bit more work than rrdtool due to having to write the php, but I think the end results are easily as good.
    Last edited by steve_the_greek; 21 October 2013, 01:52 PM. Reason: forgot to mention graphing


    • jpdw
      Automated Home Guru
      • Oct 2007
      • 169

      #3
      I thought I should come back and add to this thread since steve_the_greek replied & since I've thought about this some more...

      steve_the_greek's comments got me questioning again why I was looking toward the 'nosql' type thing rather than a traditional DB.

      What I've come up with doing (and am part way into implementing) is:
      - a traditional DB (I've chosen postgres instead of mysql for some reason)
      - a schema that's pretty flat -- the main table is pretty much just: sensor_id, timestamp, value
      - a generic web API to access the DB -- I'll say it "nods" towards being RESTful, but I'm sure knowledgeable people would say it's not
      - using simple JSON payloads for URLs that POST or GET data to/from the DB (JSON looks easier than XML to do on low-power 8-bit embedded devices)
      - crafting the API to be as non-application-specific as I can (it's "just" an HTTP front end to a key/value store)
      - all data providers (i.e. sensors) and data consumers (web frontend, other integration stuff) access the db via the api
      - using Python as the 'glue' -- the 'flask' framework for the web server providing the API, psycopg2 to access the DB behind it (though I'm sure PHP, amongst others, would be equally capable)

      This means that the data providers (first off being a RasPi grabbing electricity stats from the CurrentCost Envi) simply make a POST call to http://<server>/ds/data with a JSON payload containing the sensor ID and the value - the simple "lobbing" of data mentioned above. I know the RasPi can do much more and could run real DB clients OK, but the simplicity is really aimed at the 2 Nanodes I have. Additional URLs allow querying/manipulating a mapping of sensor ID to name/location, or pulling back subsets of the stored data.
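
      To give a flavour of the glue, here's a minimal sketch of what the Flask + psycopg2 receiving end looks like. It's purely illustrative - the table/column names, connection string and URL shape below are assumptions rather than my actual code:

import time

import psycopg2
from flask import Flask, jsonify, request

app = Flask(__name__)
DSN = "dbname=datastore user=datastore"  # assumed connection string


@app.route("/ds/data", methods=["POST"])
def store_reading():
    # Expecting a payload along the lines of {"sensor_id": 2, "value": 340}
    payload = request.get_json(force=True)
    conn = psycopg2.connect(DSN)  # a real version would reuse/pool this
    try:
        # The connection context manager commits on success, rolls back on error.
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO sensor_values (sensor_id, ts, value) "
                "VALUES (%s, %s, %s)",
                (payload["sensor_id"], time.time(), payload["value"]),
            )
    finally:
        conn.close()
    return jsonify({"status": "ok"}), 201


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

      A real version would also validate the payload, but the point is how little is needed to get a reading into the flat table.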

      One downside is that I can see the main flat data table will balloon in terms of number of records. A dozen (say) data points recorded once per minute works out at around 17k records a day, 120k per week, and roughly 6.3 million in a year....

      So in conclusion, I think I'm getting the simple key/value interface I wanted but still using a full-fat DB.
      Jon


      • jpdw
        Automated Home Guru
        • Oct 2007
        • 169

        #4
        I thought I'd add to this now that a couple of months have gone by and I've got quite a bit running.

        After the comments on here got me thinking, I created the relatively "flat" schema in Postgres, with a central table that has just a sensor id, timestamp and value - similar to what steve_the_greek posted - plus only two associated tables to help reference the sensor id.
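
        For anyone wanting to copy the idea, something along these lines would do it - a sketch only, with assumed names and types rather than my exact schema, and only one of the reference tables shown:

import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS sensors (
    sensor_id   serial PRIMARY KEY,
    sensor_name text NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS sensor_values (
    value_id  bigserial PRIMARY KEY,
    sensor_id integer NOT NULL REFERENCES sensors (sensor_id),
    ts        double precision NOT NULL,  -- unix timestamp, matching what the API returns
    value     double precision NOT NULL
);
"""

conn = psycopg2.connect("dbname=datastore user=datastore")  # assumed DSN
try:
    with conn, conn.cursor() as cur:
        cur.execute(DDL)  # psycopg2 sends both statements in one go
finally:
    conn.close()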

        I've given myself the "rule" that the only thing allowed to access the database directly is the API: sensors providing data to be stored have to go via that, using simple POST calls, and similarly data is retrieved using GETs. At its most basic it means that readings from the lower-powered sensors can be as simple as this:

        POST http://hostname/ds/data/<sensor_id>
        Payload: [{ "value": 340 }]


        GET http://hostname/ds/data/2?qdp=5&ts [ sensor#2, 5 data points, and timestamp ]
        Payload:
        [{"ts": 1388883391.83042, "value": 342.0}, {"ts": 1388883402.95736, "value": 342.0}, {"ts": 1388883408.7056, "value": 340.0}, {"ts": 1388883414.18294, "value": 340.0}, {"ts": 1388883419.84238, "value": 340.0}]
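
        From the Python side those two calls boil down to very little with the 'requests' library - a rough sketch, with the hostname and sensor id as placeholders:

import requests

BASE = "http://hostname/ds"

# Push one reading for sensor 2 (matches the POST example above).
requests.post(BASE + "/data/2", json=[{"value": 340}], timeout=5)

# Pull back the last 5 data points, with timestamps (matches the GET example above).
resp = requests.get(BASE + "/data/2?qdp=5&ts", timeout=5)
for point in resp.json():
    print(point["ts"], point["value"])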


        This is all purely APIs, nothing pretty. To make use of any of this data I've got a separate set of calls that retrieve data from the datastore (i.e. make HTTP GETs like the above), do whatever repackaging is needed and present a couple of web pages - mainly a live-updating "energy consumed now" chart and an overall status page (energy now, outside temp so far... heating zone status to be added next... then Sonos etc). The idea is that this "application" layer will get data from wherever; the datastore is just one source (Sonos / DLNA traffic will be another... one day).

        The datastore writes were working so well that, as an interim measure, I've bodged my old PHP scripts (which received data and pushed it into MySQL) to make a call to push the data into the datastore as well... the only reason I've not yet stopped putting the data into MySQL entirely is that so far I've only implemented the power usage charts using the datastore... but it's so generic that I don't think it'll take long to switch that too.
        Jon


        • toscal
          Moderator
          • Oct 2005
          • 2061

          #5
          Missed this thread.
          Many, many years ago I helped implement a logging program that basically took 6 serial data feeds (these were being sent to RS232 dot-matrix printers; the PC had what was called an octopus serial card) and fed them into a spreadsheet.
          The data feeds came into the PC via 6 HyperTerminal windows. The data in each window wasn't continuous - we got a printout every 15 seconds or so, and some feeds only printed when there was an error. So we used a VB program that was triggered every time serial port 1 got data; this then scanned the recent events in the data windows, checked whether there were any errors, and then used some Excel macros to put the important data into an Excel spreadsheet.
          This was done in the days of Win 95.
          Just thought I would share.
          IF YOU CAN'T FIX IT WITH A HAMMER, YOU'VE GOT AN ELECTRICAL PROBLEM.
          Renovation Spain Blog

