Thread: Motion & speech recognition ...

  1. #1
    Automated Home Legend chris_j_hunter's Avatar
    Join Date
    Dec 2007
    Location
    North Lancashire
    Posts
    1,675

    Motion & speech recognition ...

    'thought Karam's 15th Anniversary article for AH (17th August '11) was first-rate & very thoughtful :

    http://www.automatedhome.co.uk/Annou...m-Idratek.html

    not surprisingly, perhaps, his plea for better speech recognition & presence-sensing possibilities brought to mind the MS Kinect sensor ...

    there've been lots of reviews & comments on it - hmm, it clearly has some short-comings, but it's really not that expensive & it's proved v.popular so, surely, MS must be working seriously to improve it ...

    not only that, though, they've created an SDK to open it up to wider development & other uses, beyond gaming ... here :

    http://research.microsoft.com/en-us/...cts/kinectsdk/

    ie: there's no need to hack (OK it's currently beta, and non-commercial, but a commercial version is promised)

    reviews have said the speech aspect works well, lag & accuracy less well, and it's said to need space - eg:

    http://www.techradar.com/reviews/gam...view?artc_pg=4

    and :

    http://www.techradar.com/reviews/gam...view?artc_pg=3

    of course, that review had games in-mind - but, in an HA context, its lag & accuracy limitations might not be so serious, and its covering relatively large spaces sounds more like a plus than a minus ...

    and the projects reviews linked-to in the SDK link (above) seem to confirm it might have a lot of relevance to HA :

    http://channel9.msdn.com/coding4fun/kinect/

    there's an overview here (of course) :

    http://en.wikipedia.org/wiki/Kinect

    and there's a lot more insight here :

    http://research.microsoft.com/en-us/..._KinectSDK.pdf

    (see page 11 therein for a good summary)

    its downsides, perhaps, are that it connects via USB, and the motor & fan will make a noise ... plus, while the practical ranging limit of 1.2–3.5m (0.7–6m for tracking) maybe isn't so bad, the 640×480 resolution is in-combination with an angular field of view of 57° H x 43° V (27° up or down from the motor), meaning (from an HA point of view) rather a lot of it might be overly given-up to floor & ceiling and/or foreground activities (depending on what's going-on) ...
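
To put rough numbers on the resolution concern, here is a minimal sketch that works out how much of a room the quoted field of view covers and how coarse each pixel becomes at the quoted ranging limits. It assumes a simple pinhole-camera model with pixels spread evenly across the field of view, which is an approximation and not something taken from the Kinect documentation; the function names are made up for illustration.

```python
import math

# Figures quoted in the post above.
H_FOV_DEG = 57.0
V_FOV_DEG = 43.0
H_PIXELS = 640
V_PIXELS = 480


def coverage_m(fov_deg, distance_m):
    """Width (or height) of the scene covered at a given distance, in metres."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def mm_per_pixel(fov_deg, pixels, distance_m):
    """Approximate size of one pixel's footprint at a given distance, in mm."""
    return 1000.0 * coverage_m(fov_deg, distance_m) / pixels


for d in (1.2, 3.5):
    width = coverage_m(H_FOV_DEG, d)
    height = coverage_m(V_FOV_DEG, d)
    step = mm_per_pixel(H_FOV_DEG, H_PIXELS, d)
    print(f"at {d} m: {width:.2f} m wide x {height:.2f} m high, "
          f"about {step:.1f} mm per pixel horizontally")
```

Under these assumptions the view at 3.5 m is a little under 4 m wide and nearly 3 m high, so in a room with an ordinary ceiling a fair share of the rows would indeed land on floor and ceiling.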

    hopefully future updates could provide a way to focus-in - ie: a way to have the full resolution applied to just a defined zone of the field of view, to maximise the usefulness of its limited resolution ...

    OK, just some thoughts !
    Last edited by chris_j_hunter; 22nd August 2011 at 12:46 AM. Reason: clarity

  2. #2
    Moderator toscal's Avatar
    Join Date
    Oct 2005
    Location
    Near Alicante Spain
    Posts
    2,014

    Speech recognition is a kind of Holy Grail for HA.

    I remember using Dragon Dictate for a while, and this ran on a Pentium 133 MHz processor. It was quite good if you took the time to train the software to your voice, but the lag was awful. I could have typed it quicker. For single commands like "computer run word" the lag was OK.
    Later on I tried the speech recognition in HomeSeer. Much faster PC etc., so I expected it to be better, but once again you needed to train the software to your voice. Without any training it still recognised the action about 50% of the time. With training I would say it went up to 80 or 90% of the time. But switch the TV on or have music playing, and we were back to 40 or 50%. The simpler the command, the better it was at executing it, i.e. "computer bathroom on" worked, but "computer turn on bathroom lights" was a bit hit and miss. I also had the Star Trek computer voice giving me a reply, "initiating program", quite cool when it worked.
    I know some people have had better success with Bluetooth-type earpieces or a very high quality microphone placed in a room. Location of these is critical, as is background noise. One way to help counter background noise (this is the noise in the room without the TV on etc.) is to have more than one microphone and use some form of noise cancellation hardware (like in headphones) or software.
    It is possible to add the two microphone outputs together to improve sound quality: since a lot of background noise is random, adding them together will essentially cancel a lot of that noise out and amplify the needed sound, i.e. your voice. The problem once again is when you have the TV on or are playing music, as it will amplify this as well. You could also apply filtering so anything below 20 Hz and above 20 kHz is automatically removed. All this has to be done in real time.
    Adding more microphones may be the answer: for random, uncorrelated noise the signal-to-noise improvement is roughly the square root of the number of microphones used, so about 1.4 (square root of 2) for two microphones.
    Having more than 2 microphones in different locations may improve the sound quality. But this will be a kind of trial and error approach.
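
The effect of adding microphones can be checked numerically. The sketch below assumes every microphone hears the same voice plus its own independent Gaussian noise, which is an idealisation (a TV or music would be correlated across microphones and would not average away). Under that assumption the voice survives averaging unchanged while the residual noise falls by about the square root of the number of channels. The function names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def averaged_noise_rms(num_mics, num_samples=20000, noise_rms=1.0, seed=0):
    """RMS of the noise left after averaging num_mics independent channels.

    The voice is identical on every channel, so it averages to itself;
    only the noise part needs simulating.
    """
    rng = random.Random(seed)
    residual = []
    for _ in range(num_samples):
        noise = [rng.gauss(0.0, noise_rms) for _ in range(num_mics)]
        residual.append(sum(noise) / num_mics)
    return rms(residual)


single = averaged_noise_rms(1)
for n in (1, 2, 4, 8):
    gain = single / averaged_noise_rms(n)
    print(f"{n} mics: noise reduced by ~{gain:.2f}x "
          f"(sqrt({n}) = {math.sqrt(n):.2f})")
```

Doubling the microphone count buys only about 3 dB each time, which fits the trial-and-error feel of adding more microphones.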
    The HA system could have a built-in noise generator to help with placement of microphones. You could also have a microphone located at the TV or speakers and then subtract its signal from the main microphone which is listening to your voice. But once again, doing this in real time is costly.
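
One common way to do this kind of subtraction is an adaptive filter. The sketch below uses a basic LMS (least mean squares) canceller, which is my choice of technique and not something named in the post. The "room" is a made-up assumption too: the TV sound simply reaches the voice microphone two samples late at 80% level. The filter learns that path from the reference microphone and removes its estimate from the main microphone.

```python
import math
import random


def lms_cancel(primary, reference, num_taps=8, step=0.005):
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    Returns the cleaned signal, which is the LMS error signal.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # recent reference samples, newest first
    cleaned = []
    for p, r in zip(primary, reference):
        history = [r] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = p - estimate            # what remains once the estimated TV sound is removed
        weights = [w + 2.0 * step * error * x
                   for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


rng = random.Random(1)
n = 5000
voice = [0.5 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
tv = [rng.gauss(0.0, 1.0) for _ in range(n)]      # reference mic at the TV
# Hypothetical room path: delayed, attenuated TV sound at the voice mic.
primary = [voice[i] + 0.8 * (tv[i - 2] if i >= 2 else 0.0) for i in range(n)]

cleaned = lms_cancel(primary, tv)

# Compare the interference in the second half, after the filter has settled.
half = n // 2
before = rms([p - v for p, v in zip(primary[half:], voice[half:])])
after = rms([c - v for c, v in zip(cleaned[half:], voice[half:])])
print(f"TV interference RMS before: {before:.3f}, after: {after:.3f}")
```

Each sample costs only a handful of multiplies per tap, so the arithmetic itself is cheap on modern hardware. Real rooms need far longer filters than eight taps because of echoes, and that is where the cost climbs.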
    IF YOU CAN'T FIX IT WITH A HAMMER, YOU'VE GOT AN ELECTRICAL PROBLEM.
    www.casatech.eu Renovation Spain Blog

  3. #3
    Automated Home Legend Karam's Avatar
    Join Date
    Mar 2005
    Posts
    819

    Thanks Chris.

    I used to find that I could do reasonably well a couple of meters away from our panels using the MS SAPI4 engines without training and could even get away with the TV being on but not so loud. Much better via the phone and the phone recognition engine - no doubt because the bandwidth was constrained and also you had the microphone right next to your mouth. But on the other hand my colleagues had less success. Now that we have SAPI5 the situation is reversed. No doubt it is a combination of factors such as sound quality, engine suiting your particular voice, and tweaking the vocabulary to help the engine.

    Perhaps trying to locate the speaker and then doing some clever phase angle beam steering might be a way to improve the signal-to-noise ratio. Maybe the Kinect does this or maybe it just relies on better suited audio hardware and speech processing algorithms. Either way, if its speech recognition is reliable then that alone may make it useful without even the gesture control aspects.
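
As an illustration of the beam steering idea, here is a minimal delay-and-sum sketch. It is not a description of what the Kinect or any particular product does. The assumptions are a straight line of four microphones 10 cm apart, a distant source (plane wave), delays rounded to whole samples, and a 1 kHz test tone; all names and figures are made up for the example.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate at room temperature
FS = 16000               # sample rate in Hz (assumed)
MICS = [0.0, 0.1, 0.2, 0.3]   # microphone positions along a line, in metres


def steering_delays(mic_positions_m, angle_deg, sample_rate):
    """Arrival delay at each mic, in whole samples, for a distant source.

    The angle is measured from straight ahead; positive angles lie towards
    the high-position end of the array, so those mics hear the sound first.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [-x * s / SPEED_OF_SOUND * sample_rate for x in mic_positions_m]
    base = min(raw)
    return [round(d - base) for d in raw]


def simulate_array(source, delays):
    """Each channel is the source delayed by its arrival delay (zero-padded)."""
    return [[source[i - d] if i >= d else 0.0 for i in range(len(source))]
            for d in delays]


def delay_and_sum(channels, delays):
    """Undo the assumed arrival delays, then average the channels."""
    n = len(channels[0]) - max(delays)
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


tone = [math.sin(2.0 * math.pi * 1000.0 * i / FS) for i in range(1600)]
channels = simulate_array(tone, steering_delays(MICS, 30.0, FS))

on_target = rms(delay_and_sum(channels, steering_delays(MICS, 30.0, FS)))
off_target = rms(delay_and_sum(channels, steering_delays(MICS, -30.0, FS)))
print(f"steered at the talker: {on_target:.2f}, steered away: {off_target:.2f}")
```

Steering towards the talker makes the channels add in step, while sound from other directions adds out of step and is attenuated. How much attenuation you get depends heavily on frequency and microphone spacing, so this is only a toy.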
