A Sleeping Alexa Can Listen for More Than Just Its Name (ieee.org)
150 points by teklaperry on Feb 9, 2018 | 168 comments


According to the article, "can" = has the capability to.

Isn't that obvious though? Any internet connected device with a microphone has the capability to always listen to what you say. Whether they choose to or not is up to the manufacturer, and you're placing your trust in them. And it's funny to me that people freak out about smart speakers when your cellphone is within listening distance of you all the time.


according to security principles, "can" = we must assume that it will or has been, given the proper motivation.

the iphone has a low-power asic whose sole purpose is to listen for "hey siri" with relaxed confidence intervals. the real voice processing isn't done until the main cpu is woken by this asic. because of this, we can identify discrete modes of operation in the iphone, one of which is incapable of doing arbitrary "spying".

based on the article, it sounds like the alexa doesn't have this affordance. while unsurprising in this case, i don't think we must accept that anything with an internet-connected microphone has the capability to always listen to what you say.


I'm not sure the distinction is as notable as you're making it. The iPhone as you describe it is perfectly capable of listening to everything you say, it just has to never release control of the mic to the low-power ASIC.

I'm also not sure the article actually indicates what you're saying, it's kind of vague. For example, it never states that anybody was able to actually program Alexa with a new wake-word, just the possibility of being able to create different signatures that could be used as wake-words. And the description of how they prevent things like the Super Bowl ad could all be done after Alexa detects the wake-word and wakes up the main processor, but before Alexa actually responds to the user. I don't see anything in the article that indicates Alexa can't be using a separate chip dedicated to listening to the wake-word.

To that end, not too long ago a Reddit user claimed that Alexa did have two separate modes it operated in [0]. It is entirely possible they could be lying, but you could always just open the thing up and take a look. But even if it does have a separate chip, my previous point about the iPhone still applies, so besides power usage I don't think it is extremely notable.

[0] https://np.reddit.com/r/Showerthoughts/comments/7m91u9/if_go...


The other angle is that were Alexa actually recording and sending voice data frequently, it would show up in the traffic data. Unless they're smuggling past recordings out piecemeal along with actual recordings, traffic analysis would work.


Unless they are capable of doing the speech-to-text on-device, then the amount of data is negligible.

Hell, if they were really crafty they could have it listen for a specific phrase or word, then send back as little as a single byte signifying that it was heard.

(I own an echo and a few Google homes, and I don't actually think this is happening, but the idea is kind of fun to chase)


Yeah, that is an interesting idea. I'm not sure anyone is making an ASIC for speech-to-text right now, but if they were, it could enable this sort of thing. Might show up in a teardown, though. A general-purpose CPU could, of course, do it, but that would push the bill of materials for the device up quite a bit.


> Unless they are capable of doing the speech-to-text on-device, then the amount of data is negligible.

What is the lowest bitrate codec and how does it compare to the plaintext version?


Well considering text is literally hundreds of bytes, I can't imagine audio would be able to keep up.
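
A rough sketch of that comparison, assuming a speaking rate of 150 words per minute and about 6 bytes per word of plain text (both figures are assumptions for illustration, not measurements), set against the 5.6 kbit/s GSM-HR rate used in the back-of-envelope calculation elsewhere in this thread:

```python
# Back-of-envelope: bytes per hour of continuous speech, text vs. audio.
# WORDS_PER_MINUTE and BYTES_PER_WORD are illustrative assumptions.
WORDS_PER_MINUTE = 150
BYTES_PER_WORD = 6             # average word plus a space, uncompressed ASCII
CODEC_BITS_PER_SECOND = 5600   # GSM-HR at 5.6 kbit/s


def text_bytes_per_hour(wpm: int = WORDS_PER_MINUTE,
                        bytes_per_word: int = BYTES_PER_WORD) -> int:
    """Size of a plain-text transcript of one hour of nonstop talking."""
    return wpm * 60 * bytes_per_word


def audio_bytes_per_hour(bits_per_second: int = CODEC_BITS_PER_SECOND) -> int:
    """Size of one hour of audio at a constant codec bitrate."""
    return bits_per_second * 3600 // 8


if __name__ == "__main__":
    text = text_bytes_per_hour()
    audio = audio_bytes_per_hour()
    print(f"text:  {text} bytes/hr")
    print(f"audio: {audio} bytes/hr")
    print(f"audio is about {audio / text:.0f}x larger")
```

Under these assumptions the transcript is roughly 54 kB per hour against about 2.5 MB for the audio, so text is a few percent of the size even before any compression.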


Take a look at the BoE calc that someone else did in this thread, it might shed some light on what is possible.


Alexa and all other "always-listening" devices do have the standard on-chip processing mechanism. The article is about how the trigger can be expanded from a single word to a broader range of sounds.


iPhone vs. Alexa, HomePod, Google Home: phones are often on battery power. They need a low-power mode for listening otherwise it will kill user batteries. A device plugged into an outlet? No incentive for power saving.


Even Amazon, with all the server capacity in the world, is not going to be able to store and process every sound heard by every Alexa in real time.


> is not going to be able to store

Voice codecs like AMR[1] (or GSM-HR) only need 2-3MB/hr. A trivial noise gate[2] that cuts out silence should allow a recording duty cycle under ~1%.

Amazon can easily store all the voice recording they want.

> and process ... in real time

Processing isn't that expensive once you have a trained classifier. However, realtime processing often isn't necessary.

[1] https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_code...

[2] https://en.wikipedia.org/wiki/Noise_gate
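
For a sense of how little computation a gate needs, here is a minimal sketch. It assumes audio arrives as frames of float samples in the range -1.0 to 1.0, and the 0.05 threshold is an arbitrary illustrative value; a real gate would add attack, hold, and release behaviour that this omits.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(frames: Iterable[Sequence[float]],
               threshold: float = 0.05) -> List[Sequence[float]]:
    """Keep only the frames whose RMS level reaches the threshold."""
    return [f for f in frames if rms(f) >= threshold]


def duty_cycle(frames: Sequence[Sequence[float]],
               threshold: float = 0.05) -> float:
    """Fraction of frames that pass the gate (0.0 for empty input)."""
    if not frames:
        return 0.0
    return len(noise_gate(frames, threshold)) / len(frames)
```

The duty cycle returned here is the quantity the ~1% estimate refers to: the share of time the device would actually be recording.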


I don't doubt that what you say is correct, but I think the author's point is that even with those optimizations, the sheer amount of data from [Every Alexa in the world] * [Every second of every day] is too much for any remote service to process. The wake word needs to be handled locally so that the remote service is only involved when needed. The amount of data for a single recording can be quite small, but multiplied by a large population it still adds up in a hurry.


> still adds up in a hurry

Never mind the wake word; a noise gate (which is a trivial computation locally in either hardware[1] or software) means we are not talking about [every second of every day]. I'm sure Amazon/whomever can do better than a simple noise gate, too.

Using GSM-HR which records voice at 5.6 kbit/s (a bit over 700 bytes/s).

    5.6 * 60 * 60 / 1024 / 8 = 2.4609375 MB/hr

Assuming a gate that only records 1% of the time, this is

    2.4609375 * 0.01 =   0.024609375 MB/hr
                     =  25.2 kB/hr
                     = 604.8 kB/day

    604.8 * 365.25 / 1024 = 215.72578125 MB/year

Assuming 40M devices[2], storing all the voice said around these devices for a year is only ~8 PB (8229.28 TB). From a brief look at current prices[3], renting the necessary storage from Amazon AWS costs at most $200,000/month ("standard storage").

Yes, the storage costs add up over time, but the amount of data needed to record all the voice said around every Alexa in the world is trivially within Amazon's storage capability.

[1] voice activated tape recorders have been around for decades

[2] https://techcrunch.com/2018/01/12/39-million-americans-now-o...

[3] https://aws.amazon.com/s3/pricing/
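
The estimate above can be reproduced with a few lines of Python. The bitrate, duty cycle, and device count are the comment's own figures and the unit conversions follow the same steps; the flat price per GB-month is an assumption added here for illustration, not a quoted AWS rate.

```python
# Reproduces the back-of-envelope storage estimate using the thread's figures.
BITRATE_KBIT_S = 5.6            # GSM-HR
DUTY_CYCLE = 0.01               # gate records 1% of the time
DEVICES = 40_000_000
ASSUMED_USD_PER_GB_MONTH = 0.023  # assumption, check the pricing page


def mb_per_hour(kbit_s: float) -> float:
    """kbit/s to MB/hr, using the same /1024 and /8 steps as the comment."""
    return kbit_s * 60 * 60 / 1024 / 8


def mb_per_device_year(kbit_s: float, duty: float) -> float:
    """Gated recording volume for one device over 365.25 days."""
    return mb_per_hour(kbit_s) * duty * 24 * 365.25


def fleet_tb_per_year(kbit_s: float, duty: float, devices: int) -> float:
    """Total yearly volume across all devices, in TB."""
    return mb_per_device_year(kbit_s, duty) * devices / 1024 / 1024


def monthly_cost_usd(terabytes: float,
                     usd_per_gb_month: float = ASSUMED_USD_PER_GB_MONTH) -> float:
    """Cost of keeping that volume stored for one month at a flat rate."""
    return terabytes * 1024 * usd_per_gb_month


if __name__ == "__main__":
    tb = fleet_tb_per_year(BITRATE_KBIT_S, DUTY_CYCLE, DEVICES)
    print(f"{tb:.2f} TB/year, ${monthly_cost_usd(tb):,.0f}/month")
```

With the assumed flat rate, a full year of recordings comes out somewhat under the $200,000/month ceiling given above.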


Excellent response.


no, but it's entirely feasible to store all spoken word transcripts.


Since Siri is mostly processed remotely ("in the cloud"), you can test this by trying Hey Siri in airplane mode. It will hear its name but can't act any further.


Your most solid guarantee that your device isn't constantly spying on you is a battery.

Listening for words is power-intensive.


We’ve all had unexpected battery drains on our phones.

For me the most solid guarantee is the lack of a business case. Why risk the huge fallout when this spying is inevitably uncovered, when the profit to be had from listening in to people’s private conversations at scale is small? Individual private conversations might carry huge value, but discerning those from casual conversation at scale is next to impossible.


Recording at a low (but still intelligible) bitrate and uploading it when connected to power would be pretty easy on the battery.


I'm not familiar with the current state of the art in speech recognition. Is there a low bitrate which is still intelligible to speech recognition engines, or is this still a domain in which the human brain has enough of an advantage?


What do you consider to be low? Anecdotally, the Google app on my phone almost perfectly recognised, played over speakers, this fairly rapid dialogue between two people encoded as 6 Kbit/sec Opus.

https://ryanplant.net/love.wav

Its only mistake was missing the "you" in "you want to be."

Codec2 remains fairly clear to human ears down to 2.4 Kbit/sec but I had no luck getting it recognised, even with much simpler and clearer samples.

This individual anecdote tells you nothing about the state of the art, but I wanted to note it anyway because I was astonished at how well 6 Kbit/sec is handled.


I'm aware I don't know what I'm talking about.

But the love.wav gives a bit rate of 768kbps, 6 sec of audio at 571 KB storage size.

Another poster also did the math on 5.6 kbit/s (700 bytes/s).

> 5.6 * 60 * 60 / 1024 / 8 = 2.4609375 MB/hr

But should that 5.6 not be 768 to get love.wav voice quality? What am I missing like a moron?

Thanks for the example and anecdote, something to actually grasp in trying to understand the problem.


The wav file is decompressed, so you can’t just look at the size of the file.


I'm not looking at the file size. I'm looking at the kbps the OS reported for the wav file, which is 768kbps, whereas 6 Kbit/sec was mentioned.


The reported kbps is just the file size divided by the duration.
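
To make the arithmetic concrete: an uncompressed PCM WAV has a fixed bitrate set by its sample rate, bit depth, and channel count, regardless of which codec the audio passed through before being decoded. The sketch below assumes 48 kHz, 16-bit, mono, which is one combination that yields 768 kbps; the actual parameters of love.wav are not stated in the thread.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int,
                     channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def pcm_size_bytes(sample_rate_hz: int, bits_per_sample: int,
                   channels: int, seconds: float) -> int:
    """Size of the raw sample data, ignoring the small WAV header."""
    return int(sample_rate_hz * bits_per_sample * channels * seconds / 8)


if __name__ == "__main__":
    # Assumed decode format: 48 kHz, 16-bit, mono.
    wav_kbps = pcm_bitrate_kbps(48_000, 16, 1)
    print(f"WAV bitrate: {wav_kbps} kbps")
    print(f"6 s of audio: {pcm_size_bytes(48_000, 16, 1, 6)} bytes")
    print(f"expansion over a 6 kbit/s encode: {wav_kbps / 6:.0f}x")
```

Under that assumption, six seconds occupy about 576,000 bytes, in the same range as the 571 KB reported, even though only 6 kbit/s of information survived the Opus encode.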


> Isn't that obvious though?

You'd think so, but no. I tried to explain this to some co-workers back when the Samsung TV issue came out. They didn't understand and literally made me out to be some conspiracy theorist. Their thinking is/was 'Why would X be listening to me if I didn't press the button or say the special word?' Same deal w/the cellphones. I've stopped trying to explain.


At least for me the reason I’m fine with my phone is that it’s a metered connection. If it was sending that much data to the mothership you’d probably hit the ceiling on your data plan really quickly. Since my home connection doesn’t have such a ceiling it’s a bit scarier.


Certainly getting into the conspiratorial, but telco providers have this concept of zero-rating for content they don't want applied to data rates. For example, there are zero-rating projects for stuff like calls to analytics platforms. So were a telco to have a zero rate agreement in conjunction with some eavesdropping system, it would not show up.


With decent audio compression you're talking something like 50 or 60 megabytes per hour of audio (half that in mono, come to think of it). Assuming your listener software is doing a little work to not send silence or unintelligible noise, I bet the data usage would be totally dwarfed by regular use of the phone.


With a low bitrate speech codec, you can get well down to 1 or 2 megabytes per hour.
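
Both figures are easy to check. The sketch below converts a constant bitrate into decimal megabytes per hour; the specific bitrates in the loop are illustrative picks (128 kbit/s as a typical music-quality stereo rate, the others being speech-codec rates mentioned in this thread), not values taken from the two comments above.

```python
def megabytes_per_hour(kbit_per_s: float) -> float:
    """Decimal megabytes per hour of continuous audio at a constant bitrate."""
    return kbit_per_s * 1000 * 3600 / 8 / 1_000_000


if __name__ == "__main__":
    # Illustrative bitrates: music-quality stereo, its mono half, and speech codecs.
    for kbps in (128, 64, 6, 5.6, 2.4):
        print(f"{kbps:>5} kbit/s -> {megabytes_per_hour(kbps):.2f} MB/hr")
```

At 128 kbit/s this gives 57.6 MB/hr, matching the "50 or 60 megabytes" estimate, while rates between roughly 2.4 and 5.6 kbit/s land between about 1 and 2.5 MB/hr.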


Why would it need to send audio? Convert it to text in the device and send that.


You're right, but most of the real intelligence in the voice-to-text these devices are relying on comes from computation done on the server side, I believe. That way they get to constantly train the algorithm with new data, and so on and so forth. (the point still stands, either way: the network bandwidth these things use is not a big deal)


Yes, it is a metered connection, but unless you implement the metering, you don't know how much data you're leaking. For example, in some plans, Netflix data is not counted towards the plan.


Not necessarily, there are lots of phone service providers who exempt certain websites or services from the data tracking. Some of them are Google services already -- Virgin Mobile in Australia exempted YouTube traffic for years.


Alexa should come with a physical switch on the top of it to turn the mike on/off. Not a software switch. It should also have an LED that glows when the mike is on.

If it had that, I'd buy one.


I've always asked for hardware safeguards, and I wish more people would demand them. I want the camera & microphone on my laptops & desktops to get their electrical power THROUGH an in-series LED, so it simply doesn't matter how the decision to start watching/listening is made, the device can't be powered on without making the LED glow. No other source of power except thru the LED.

There would also be a hardware switch in the same series. Turn it off, and no power can reach the device, no matter what the software decides. Manufacturers could easily make the circuit testable, so if you tell the machine to start using its camera, it could test the circuit and tell you that the switch appeared to be turned off and show you a little diagram to help you find the switch in case you accidentally switched it off and didn't even know it was there.

Most manufacturers who claimed to have this feature would do it correctly, because they would have to assume that somebody would disassemble a unit and damage their reputations if they faked it.


> THROUGH an in-series LED

That's a pain electronically since the brightness of the LED would vary depending on the current drawn by the camera and/or microphone. In order to not blow the LED, the circuit may have to be set such that the LED wouldn't be bright enough to notice reliably when the camera is in a lower power (but still functioning) mode.

You could easily get what you actually want by putting an LED in parallel with the power supply to the camera and/or microphone though. Then the LED would maintain a fixed brightness but wouldn't be able to be switched off unless the power to the camera/microphone is cut.

Separately, it's possible to have unpowered microphones. I suspect laptop microphones are of the unpowered type (and amplified elsewhere). It's still quite possible to have a circuit which is hardwired to be incapable of passing the microphone signal without lighting an LED.

I'm just saying that your "in series" demand is unrealistic, and instead you should prescribe what you actually want rather than specifying an electronic circuit that can't necessarily achieve it.
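
As a small illustration of why the parallel arrangement gives a fixed brightness: the LED current is then set only by the supply rail and its own series resistor, not by whatever the camera or microphone draws. The sketch below sizes that resistor with Ohm's law; the 3.3 V rail, 2.0 V forward voltage, and 5 mA target are assumed example values, not figures from any real device.

```python
def led_resistor_ohms(supply_v: float, forward_v: float,
                      current_a: float) -> float:
    """Series resistor for an indicator LED across a supply rail.

    R = (V_supply - V_forward) / I_led
    """
    if current_a <= 0:
        raise ValueError("LED current must be positive")
    if supply_v <= forward_v:
        raise ValueError("supply voltage must exceed the LED forward voltage")
    return (supply_v - forward_v) / current_a


if __name__ == "__main__":
    # Assumed example values: 3.3 V rail, 2.0 V red LED, 5 mA indicator current.
    print(f"{led_resistor_ohms(3.3, 2.0, 0.005):.0f} ohms")
```

With the LED in series instead, its current would equal the load current, which is why brightness would swing with the camera's power mode.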


The original Amazon Echo has a physical switch for muting it. A friend of a friend worked on that and he is a giant security nerd so he made sure a physical switch was designed in.


I just unplug it, and when I want to listen to music plug it in, problem solved!


Putting it on a bus strip with a power switch is definitely Plan B, but it's inconvenient.

My bluray player spends a lot of time fruitlessly searching for an internet connection when I put a bluray in it, and eventually it gives up and just plays the disk. I see no reason to connect it to the internet so it can spy on me.


Don't Blu-ray players have to get updates sometimes or certain discs won't play?


Technically yes, but due to how first generation Bluray players will never have updates to their DRM keysets, there have been extremely few discs that chose to employ new keys.

This may become a thing for 4k Blurays (since all 4k Bluray players are long since compliant with the over-the-internet key updating mechanism), but unlikely because most manufacturers still refuse to update anything after warranty ends.

Ergo, the Bluray industry doesn't want to tie themselves to a model where people have to buy new players every 3 years: people will just stop buying discs instead.

The other OTHER side of this is: the disc is dead. Most people own very few discs, only of their absolute favorite movies they want in a high quality copy or for the few people who are bothering with 4k content (since most people still do not have enough internet to stream it); so why rock the boat, just let the format play itself out.


The disc may be dead, but most of the movies I want to watch are available from Netflix only with their disc service. I also borrow discs from the library.

But you're right that I'll only buy a disc for a special movie.


Its long boot-time makes this inconvenient, unfortunately.


The dot I was developing for today has a mute button and the LEDs show red when it's on. Pretty sure I've read that the switch disconnects the mic. Is this comment sarcastic?



Would you still trust that it's off when it says that it's off?


If you could open the case and verify that the switch does what it's supposed to do, why not?


Yup. That's the point of a physical switch.


Since the board schematic is presumably proprietary, and the board could be multi-layered, you'd have to do some fairly destructive investigating in order to confirm that the switch is actually doing what it says it is doing.


Not as long as it cuts the wire that breaks all connections between board and the microphone(s) (and board is somehow audited to not contain any extra microphones or otherwise sound-sensitive components).

<tinfoil hat on>But this is only as long as you actually can verify or trust that the switch is actually a switch and not a device that pretends to be one. A transparent casing where one can visibly confirm the actuator operation is a good idea.</tinfoil hat off>


If you want extra tinfoil hattiness, most multilayer ceramic capacitors (the surface mount kind that are everywhere on PCBs) are somewhat microphonic. Condenser microphones, especially electrets are basically just capacitors designed in a special way to maximize this effect.


Interesting! I know about condenser mics, but I thought typical SMDs are way too small to sense anything useful. Has anyone experimented with this?


They generally are too small to be useful. That's why I called it "tinfoil hatty". Most of the issues with them come from mechanically coupled vibrations in a system (from fans and the like) flexing the whole circuit board, so it's not as big an issue with phones. Also most of the capacitors are power supply decoupling caps, so difficult to sense variations from.

It's mostly just a fiendishly annoying effect in high-sensitivity test equipment.


The problem with that is that you can turn a speaker into a microphone :)


True, but a multi-pole switch can also cut connection to those.


I figure one of those teardown article/video people would tear it down and confirm.

If Amazon advertised it as a physical switch, and it was not, that is fraud and actionable.


> If Amazon advertised it as a physical switch, and it was not, that is fraud and actionable.

A physical switch is still technically a physical switch, even if it's not connected to anything. It would come down to how they are describing what it does, and what guarantees they make (or not). Do they have any, or is this discussion entirely hypothetical?


They need to advertise it as a physical disconnection of the microphone.

The switch doesn't need to be part of the main board --- it could be located next to the mic, so it's trivial for anyone with even the most basic electronics knowledge to see that it does what it says it does.


Well, that's trivial - the 'Echo' functionality is just an AWS service:

http://www.instructables.com/id/How-to-Make-an-Amazon-Echo-W...

On a Raspberry Pi or similar machine, you can hook the logic up to anything; a button, a switch, an infrared remote, whatever.


My echo has a switch, I can press it and it lights up red to indicate it's muted.


The entire point of Alexa is to be able to talk to it from wherever you are, so doesn't walking up to it every time defeat the purpose? A regular speaker is cheaper and has much better sound quality.


And yet I still want the choice to lower the Cone of Silence when I want to.


> Alexa should come with a physical switch on the top of it to turn the mike on/off.

Axe should do it.


Can't you just unplug it?


In the end, we don't know what Alexa is listening to.

If we do dead-listing style analysis with teardowns and eeprom dumps, it can only tell us what it was listening to at that time.. and sometimes not even that.

How do we know what [.0076162328 3.27617819 91817.111121123 -65.2129] is listening to, especially when combined with complicated neural network weights? We can only check and verify every possible input, and that's effectively infeasible.

We chose not to buy these types of devices for our house, because of the great deal of unknowns.


You have good reason not to trust it or any device like it; proprietary software (which is always untrustworthy) + mic and/or camera + Internet connection equals spy device that doesn't respect your software freedom, pure and simple. You deserve to have as much control over your computers as you wish and no device like that will respect your privacy. The alleged convenience is highly overrated (everyone who has one has lived most of their lives without one and apparently been able to do just fine without) and, more importantly than any alleged convenience, not a good trade-off in light of paying with your freedom and privacy.

But the same can be said of the tracker devices residing in many pockets. Some call it a cell phone or mobile phone, but its primary job and the job it does most of the time is to geolocate or track itself. And a tracker's mic has no indicator when it's listening, or a way for the user to vet the captured data, or control where the captured data goes. Those trackers that also have cameras are even worse in this regard. And they all run non-free, user-subjugating, proprietary software. So none of them respect a user's software freedom or privacy. I suspect that those with user-unmodifiable batteries aren't truly ever off except for when all of the batteries run down. But most users never turn their trackers off anyhow, so that's a bit of a moot point.

What other freedoms can proprietors convince you to do away with in exchange for some minor convenience or possibly a little innumerate fear-baiting (such as "you'll want this tracker in the case of an emergency!")?


That's my big concern as well. Hidden trigger phrases and self-incrimination issues (i.e., valid trigger phrase + sending potentially incriminating speech) - either separately (or worse, combined) are something I'd rather not worry about in my own home.


I've played around with microphone arrays with a Kinect and http://hark.jp and it's amazing what you can do even with 4 microphones.

-----------------

I can figure out where people are talking spatially.

I can figure out how people are moving around when speaking.

I can voiceprint people.

I can cancel voices individually, or mute all noise past a certain distance from the center of the mic array.

I can 3d map rooms by recording over time how people talk and how they move.

I can connect multiple nodes together to create wider areas of coverage of above.

I can convert all sounds to text if appropriate. I can send tiny messages that include voiceprint, position, and content.

I can tell how many people are in your house at any one time.

I can tell if you have friends over, and when. If they have a similar system, I can correlate backend data across all of the above.

------------

So yeah, there's plenty of things people aren't thinking of. I doubt they even have the experience to know how possible it is with open source tooling and cheap hardware. But all of these things are possible, and my guess is being done.

The distinction with phones is hard. Apple has stayed within the "we support privacy" aspect. And with things like limeSDR being cheap and common, wouldn't be as hard to attack and prove. Android is tricky, but with some of the anti-crapware systems and basebuild efforts going on, can be considered possibly trustworthy (if edited).

And in general, phones move a lot. Amazon tunacan/tomatocan doesn't. Just add a $1 accelerometer, and detection of movement is trivial. Phones can have this, but doing the above would be an order of magnitude harder.


Remember the FBI's request for a "special" version of iOS? It doesn't matter what the current firmware does if it can be replaced as-needed over the network.


I'm surprised we haven't seen any warrants from police/FBI that would require wire tapping using one of these devices.


Not for wire tapping, but just in case you weren't aware, there was a case where law enforcement wanted all recordings from an Alexa in a homicide case[1]

[1] https://www.cnn.com/2017/03/07/tech/amazon-echo-alexa-benton...




Exactly. Wouldn't we have to have access to all the relevant source code, both for the devices and the server-side components, to tell what Alexa and other home assistant devices are actually doing and storing?


And rolling hashes of every build. And a device to constantly cross-check the hashes. And a device to constantly verify that the cross-checker was working. And a workflow to show diffs of each commit. And time to read through them, grok them, and sandbox-test that they weren't doing anything nasty in an obfuscated way.

At the end of the day, you really can't trust anything completely, and the only real solution is to 1) trust Amazon (which doesn't seem like a great bet right now), or 2) don't own one.
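
The cross-checking step on its own is simple; the hard part is everything around it. A minimal sketch, assuming a hypothetical setup in which the vendor publishes a SHA-256 digest for each firmware build and you can obtain the image bytes yourself (neither is known to be available for these devices):

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a firmware image."""
    return hashlib.sha256(data).hexdigest()


def matches_published_hash(image: bytes, published_hex: str) -> bool:
    """Compare a computed digest with a published one.

    `published_hex` is a hypothetical vendor-supplied value; the comparison
    is constant-time and ignores case and surrounding whitespace.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_hex(image), expected)
```

A match only shows that the image is the one the vendor published. It says nothing about what that build does, which is why the list above goes on to diffs, review, and sandbox testing.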


Or 3) just don't worry about it


Isn't 3 the same as 1?


I considered the "don't buy" strategy, but I'm not willing to go all the way to the "don't buy a phone" point, where we've got the same situation, which leaves me with little other choices. :|


Siri can be turned off on iPhones. I don’t see how buying a phone gets you in the same situation as buying a device that has a voice-only interface.


Just because some UI element labeled "Enable Siri" can be toggled doesn't mean it works exactly as you, or even Apple devs, expect it to. And even phones without digital assistant technology can be turned into remote listening devices.


That is of course correct from a technical perspective, but not from a trust perspective. I can’t remember that Apple has done anything like that on purpose. Sure, devs get wild, people hack, but I’d never expect them to do this systematically. With amazon, I’m not so sure. But maybe that’s just me


I also can’t recall anything Amazon has done to question their honesty.

Can you point us to facts please?


No.

Because of TLS 1.2 and key pinning, we cannot see what's actually going on with all the data transfer between this hardware and the mothership/owner.

We can only do differential analysis, and say that X data was transmitted after doing X action. What that data entails, we do not know.

If you're happy with that, cool. I'm not.


So you want for Amazon to not encrypt their service?


Apple doesn't even let you turn off your own wifi

https://www.theverge.com/2017/9/20/16340460/apple-ios-11-con...


You can still turn it off in settings.


Ah, thanks. I wasn't aware of that


Yeah, exactly. You can tell the software to disable the feature, but the hardware is still wired up, and it could still be doing everything but the user-facing bits, or a future update could flip that on even if it's not presently doing it.

I have an Android phone and I trust Google somewhat less in terms of personal-info-collecting than Amazon. :|


I have an Android too but I lost all trust in Google when I realized the default keyboard sends my keystrokes to Google by default. And the number of times I've had to re-disable highly undesired defaults after an update is infuriating.


We don't know what alexa is listening to, but we do know that the data it transmits to the cloud is roughly proportional to the amount of time that you are talking to it.

https://www.iot-tests.org/2017/06/careless-whisper-does-amaz...


I've been thinking about this recently. I have multiple Echos in the house with really good microphones, as well as smart smoke alarms with farsight[0] which senses motion. This could be turned into a pretty sophisticated home security system that could alert you when sleeping (wakes up the Echo in your bedroom) or out (notifies you via the app) when suspicious activity occurs - like the window glass breaking example, or motion detected when everyone has gone to bed or is out of the house.

With machine learning/AI systems these could potentially be more accurate than traditional home security systems.

[0] - https://nest.com/uk/support/article/What-is-Farsight


That page makes it seem like only the Thermostat has Farsight. You mention smoke alarms.

I do know that my Nest Protects (smoke alarms) have motion detection though. Very handy nightlight in the middle of the night. Though if tied into the sole source of an alarm system I know it'd be a problem because my dogs can activate the nightlights in the Protects.

Pets really make home security hard, I'd bet.


I don’t understand why this was downvoted


I was in a small group of people having an intimate (private) conversation in a church. Out of nowhere we were interrupted by a Windows computer sitting in a cabinet, connected to the booming AV system: "I'm sorry, I didn't get that. I'm sorry, I didn't catch that."

I hate these things.


I'm still baffled why anyone would put those voice-activated devices in their houses. Smartphones are bad enough, and I'm hoping to get rid of mine soon.


Same here. I told my roommate/landlord I'd be totally against an Alexa/Google/Siri mic/speaker in the house. I see so many people in tech who have them that it's scary.

I've thought about making my own open source version using something like Jarvis.


I have an interest in home automation and smart lighting but it looks like all the current systems need to be hooked up to wifi/cloud to do anything.

Are there any systems out there that you can hook up via ethernet and hook up to your own server?

Things like having bedroom lights change color temp on time of day. perhaps stepper motor control of blinds. I really don't know why my lights need to be hooked up to the cloud via wifi for things like that.


Look at home assistant as a hub. It's a fully open source python-3 based system that runs happily on a raspberry pi. It has many ways of interfacing with just about every device including local-only things like zwave, ZigBee, or lutron.

They also have support for some wifi based things that are still local-only, but I tend to treat them as hostile and firewall them off, and even then I don't really like having them as they tend to be made by unknown manufacturers and have pretty shitty first party apps.


I don't know much about it, but Mozilla is working on this:

https://blog.mozilla.org/blog/2017/11/29/announcing-the-init...


There are several smart bulbs that people have reverse-engineered enough to put open-source firmware on them. They'd still be using wifi, but you could use firmware that doesn't require any "cloud" or external connections.


Homeseer, it can work with multiple protocols. I use Insteon, you can just buy a USB insteon modem and skip the cloud hub.


have you heard of https://mycroft.ai/ ? Opinion?


It still (by default) uses cloud STT, and there's no good local STT that's anywhere near simple to set up. It's hardly any better.


That's a good point Vendan, and that's a dependency that we're working pretty hard to remove. Here's our CTO, Steve Penrods, on our STT roadmap: https://mycroft.ai/blog/mycroft-speech-to-text-and-balance/


Smartphones are vastly worse, since they’re always with me. Once I’ve accepted that, putting a smart speaker in my house doesn’t make things noticeably worse.


Normalizing the use of always-on microphones in previously "private" areas (such as the home) such that people expect that their voice might be recorded and transmitted to a 3rd party removes the warrant requirement for the police to also use that technology. (Kyllo v United States)

https://news.ycombinator.com/item?id=16061681


The more devices you add (giving a larger number of unknown entities access to information about you), the higher the risks.

Also, the solution to the smartphone problem isn't just to passively accept defeat. That will allow the problem to become worse.


Passively accepting defeat may not be “the solution” but it’s what approximately everybody does, including approximately everybody complaining about smart speakers.


Yes, that's the problem that needs solving.


Unfortunately, every story about smart speakers is flooded with comments like yours, saying that they're evil and they can't understand why anyone would ever buy one, while stories about smartphones never get any comments like that. If smartphones are the problem, why are you complaining about smart speakers instead?


I comment about smartphones all the time, including at the top of this thread. :)


Yeah, it sucks having a powerful computer in my pocket all the time that can handle nearly all of my regular tasks. Wait, what?



My phone does a much better job of redefining "normal tasks" than it does solving problems I had before smartphones.


[flagged]


> having nothing to hide

Everyone has things to hide when malicious people control the machine. Even with companies who you might view as benevolent, you don't know who will control those companies in the future.

> ...understanding the technology that goes behind keeping your identity obfuscated from advertisers even with intense personalization.

I understand it. The idea that personal data can be easily anonymized is a myth. Also, I've worked for people who turned out to have malicious intentions, so I'm sure that many of the companies/people collecting that kind of data don't have good intentions.

One quick example of how even companies that talk "security" don't really practice it sufficiently: Equifax. Another quick example: look at all the Android apps that are completely abusing users' data.


I'll bet most people who have "nothing to hide" could be motivated to reconsider if someone went through all their private correspondence and reported the results to them.


I would suggest that most people who think someone is going to spend their time doing this to get over themselves.

If you are notable, intensely rich, or some other special person...


> I would suggest that most people who think someone is going to spend their time doing this to get over themselves.

No person is going to personally do that. We have computers and increasingly more advanced 'learning' algorithms to do that for us, and to help us draw conclusions (correct or not) about the person of interest.


I agree that large-scale machine learning is a bigger risk, but there are people who personally creep around in others' personal data.[1]

Also, it doesn't need to be just a few people looking at your private data. One early worst-case example is Ashley Madison[2], but it's going to increasingly be a problem for society in the upcoming years, especially with the expansion of technologies like face recognition, voiceprinting, license plate cameras, etc.

[1] http://www.telegraph.co.uk/technology/google/8003925/Google-...

[2] https://en.wikipedia.org/wiki/Ashley_Madison_data_breach


Maybe in the future you will be one of those things, and your data will still be waiting for you. Unless you’re special enough to see into the future I guess.


Yeah, or you just piss off (or just attract the attention of) the wrong person or group. I can't imagine Ken Bone envisioned someone combing through his entire Reddit history or Justine Sacco imagined her joke would lead to people tracking down her employer and getting her essentially blackballed from her profession. And that's just nobodies working with only public information.



I had heard about that specific incident during WW2 a few years ago. I can't help but think of that every time politicians and special interest groups push the US Census to collect more and more personal information from citizens.


>Also understanding the technology that goes behind keeping your identity obfuscated from advertisers even with intense personalization.


Which is laughably easy to work around. We've known that for years.


Why do you carry a GPS tracker around? Why do you put wireless in your home? Lots of things are easy to question until you step back from your single point of view and realize a large population out there doesn't live with a tinfoil hat on. It is not because they don't care about the same security problems you do, it is because they are naive and don't understand how stuff works or what the problematic implications of using them are.


It's also because this isn't a problem well-suited for addressing by individuals. I doubt even Richard Stallman audits every line of code he ever runs on his computer, and I doubt most of us would want the extreme level of inconvenience that comes with living his life.


Which is it - inconvenience or comfort? I don’t see how you get both here? If you don’t want something listening don’t buy it. If you don’t want tracking enabled on your phone that you feed to cloud services don’t use it. What am I missing ?


You're right, the Bill of Rights only protects criminals.


When did I say anything about the bill of rights?


"God, please end poverty in the world"

"I'm sorry, I didn't get that. I'm sorry, I didn't catch that."


Heh. To be fair the conversation in question was a lot more personal than that -- matters of crime, addiction, forgiveness, and personal change.

The "nothing to hide" arguments are irrelevant here -- people absolutely do have things to hide. This is where people seek advice about lawbreaking before they are ready or able to discuss with authorities. We can't have even the pretense that something could be listening in on these convos.

It's appropriate to blame the church for having the listening device.... but as technologists we know that off-the-shelf products don't give them much choice without technical proficiency/awareness to reconfigure the defaults.

Edit: and I think I interpreted your joke as you intended; no offense taken.


"For ye have the poor with you always, and whensoever ye will ye may do them good."


Does taking potshots at the OP's religion add anything at all to the discussion?


Huh? It's an amusing joke. How is it taking a "potshot" at his religion?

Imagine being in church, praying to God for the betterment of humanity, and then you hear Him answer you back, filling the church with His voice, but...it's just Cortana with an inane canned response.

It's funny, and not in an insulting way, except perhaps if you're on the team working on one of these products.


I guess I could see how you might be amused by that reading, but the way I imagined it was more like ridiculing someone who attends church with a joke about how prayer is futile.


Was this Cortana, or Alexa?


Cortana, didn't mean to conflate it with Alexa -- more the product category as a whole and its (presumably) unintended consequences.


My Echos answer the TV all the time, whenever they hear phonemes remotely similar to Alexa, like "electric". Amazon doesn't allow users to set their own arbitrary wake words, which would fix the problem, and the only other two wake words they allow, "Echo" and "Computer", are much more common than Alexa, so they're useless.

I also have a Google Home and it never, _ever_, answers by accident. "Hey Google" doesn't sound like other phonemes. Simple as that.


Have you gone through voice training? I have not had this problem after that.


Yep, doesn't fix it for me.


I unplug it when I don't want to listen to music or put together a shopping list.

Also, I wouldn't have one if it wasn't given as a gift. I can't believe these are even for sale. They are clearly not ready for consumers. Nearly 90% of the commands I try to give result in Alexa not knowing or misunderstanding. Why should anyone pay to beta-test the product?


Security minded people should install more physical switches to outlets for this type of thing.


Wait, in the UK all outlets have to have physical switches. Is it not like that in the US?


Nope. Switched outlets are pretty uncommon in the US. (I've seen them, but not recently.)


Is that actually a requirement? I know it's the norm, but my house still has plenty of unswitched outlets.


In old houses some are unswitched, but modern ones are always switched I think.


I'm curious what those use cases are in the 90% that don't work? I don't use mine a lot but I also don't seem to have the trouble you do.


> I unplug it when I don't want to listen to music or put together a shopping list.

Do the Echo devices not have a hard mute switch?

I have a Google Home mini and it's usually on mute unless we want it for the same reasons, or to play trivia.

It is muted for weeks at a time, sometimes.

Of course I wonder how much of a hard mute it is. If you touch the thing while it's on and muted it will kindly remind you it's muted and can't hear you.

It would be nice to gain more insight.


Yes, you can mute an echo and it will glow red.


No programmatic interface to do it as far as I know.



> I unplug it when I don't want to listen to music or put together a shopping list.

...which is when the device stores everything it hears, ready to send it to the mothership as soon as connectivity is reacquired.

No idea whether it already does this but a small battery and a bit of flash go a long way when it comes to overhearing that which was not meant to be overheard...


> Why should anyone pay to beta-test the product?

I guess the pioneer of that was Microsoft and Windows 95.


I have been waiting for some kind of audio fingerprinting for security systems for years. Personally I'd prefer a training model that also lets you teach the device other sounds (like "this beeping means the stove is on") and set up notifications for those, but even having my phone tell me "I heard breaking glass at home" is a start. Otherwise I don't see much here that is new. Presumably there's still no voice recognition on the non-wake-word speech (it's just noticing there are people around, talking to each other, or that you talk to yourself, your pet, etc).


In the community I live in, we think people have gone insane by marketing or technophilia if they allow one of these in their home and we are trying to back away from cell phones.


What community is this?


where are you living?


i think the article misses some details. from what i understand, here's how alexa operates in the superbowl ad context:

- ad plays on tv

- alexa hears its name

- alexa sees that name contains the digital fingerprint

- (maybe alexa sends this to amzn servers so they know how many people r watching the sb & have alexa on)

- alexa stops listening for additional commands

therefore alexa is still only listening for its name
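
The flow in the list above can be sketched roughly as follows. This is a toy illustration, not Amazon's implementation: the names `fingerprint`, `WakeHandler`, and `on_wake_word` are hypothetical, and an exact SHA-256 hash stands in for a real acoustic fingerprint, which would have to tolerate noise and room acoustics. The point it shows is that the fingerprint check runs only after the wake word has already been detected.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(audio: bytes) -> str:
    """Toy stand-in for an acoustic fingerprint: an exact hash of the bytes."""
    return hashlib.sha256(audio).hexdigest()


@dataclass
class WakeHandler:
    """Hypothetical handler that runs only after the wake word is detected."""
    known_ad_fingerprints: set = field(default_factory=set)
    suppressed_count: int = 0  # how many wake events were ignored as ads

    def on_wake_word(self, wake_audio: bytes) -> str:
        # If the wake-word audio matches a known ad, stop here and do not
        # listen for any follow-up command.
        if fingerprint(wake_audio) in self.known_ad_fingerprints:
            self.suppressed_count += 1
            return "ignore"
        # Otherwise behave as usual and start listening for a command.
        return "listen_for_command"


ad_audio = b"alexa-as-spoken-in-the-ad"
handler = WakeHandler(known_ad_fingerprints={fingerprint(ad_audio)})
print(handler.on_wake_word(ad_audio))
print(handler.on_wake_word(b"alexa-as-spoken-by-a-person"))
```

In this sketch nothing beyond the wake-word audio is ever inspected, which matches the claim that the device is still only listening for its name.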


You missed a few points. In order for "alexa hears its name" to work, Alexa needs to be able to hear sounds and actively process them.


The echo has a separate processor that listens for the wake word; when it hears the wake word it fires up the main processor to start doing the actual audio processing. Some simple commands are processed directly on the Echo without going to Amazon's servers but the rest are sent over the Internet to be processed.


From my experience reverse engineering the first generation Echo, there was no coprocessor. However, wake word detection was done offline. There was a software controlled hardware switch to disconnect the microphone when it was muted.


I don't know how accurate this is, but this is what I found on how the Echo works:

"

Echo is built on Texas Instruments DM3725 Digital Media Processor.

This TI SoC has two key pieces inside, first is ARM Cortex-A8 MPU, and the second one is TMS320DM64x+ DSP. The ARM core should be running Linux and the DSP is running firmware.

When idling, the ARM core is taken to lowest possible power state and Linux is completely suspended. At this time the DSP and 64KB On-Chip RAM are active. The DSP firmware processes noise coming in from the mics and attempts to identify if a keyword (e.g., Alexa) is spoken. As soon as it identifies there's a keyword, DSP sends an interrupt to wake up the ARM core which in turn resumes Linux.

"
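
As a rough sketch of the two-stage design described in that quote (a simulation under stated assumptions, not the Echo's actual firmware): a small always-on detector sees every frame but only compares it to the keyword, and the main processor receives nothing until it is woken. `MainProcessor` and `run_detector` are hypothetical names, and audio frames are modeled as plain strings for simplicity.

```python
from typing import Iterable, List


class MainProcessor:
    """Stands in for the suspended ARM core running Linux."""

    def __init__(self) -> None:
        self.awake = False
        self.received: List[str] = []  # frames seen after waking

    def wake(self) -> None:
        # Models the interrupt sent by the detector.
        self.awake = True

    def process(self, frame: str) -> None:
        if self.awake:
            self.received.append(frame)


def run_detector(frames: Iterable[str], keyword: str, main: MainProcessor) -> None:
    """Models the always-on DSP loop: match the keyword, then hand off."""
    for frame in frames:
        if main.awake:
            # After the wake-up, audio goes to the main processor.
            main.process(frame)
        elif frame == keyword:
            # Keyword spotted: wake the main processor.
            main.wake()
        # Any other frame is dropped without being stored or forwarded.


main = MainProcessor()
run_detector(["hello", "private", "alexa", "what", "time"], "alexa", main)
print(main.received)
```

In this model the frames before the keyword never reach the main processor. Whether a real device enforces that separation in hardware or only in software is exactly what the thread is arguing about.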


Which she does locally...no server required




