ENGtechnica.TV
Bringing technology into focus. We talk to leaders with technology of interest to engineers.
Seeing Around Corners? We Can Do That, says Tristan Swedish of Ubicept
We dig into SPAD photon-counting sensors and why converting photons into bits can change low-light imaging, motion blur, and dynamic range for real perception systems. We also connect the physics of uncertainty to modern AI so vision models stop guessing when the data is ambiguous.
• How SPAD arrays work and why they differ from CMOS image sensors
• Why SPAD data volume can reach terabytes and how near-sensor processing makes it usable
• Photon noise as a physics limit and how integrating over time improves confidence
• Motion compensation that enables longer effective exposure without motion blur
• Why night color is often a noise problem, not a missing-color problem
• How uncertainty-aware imaging can reduce AI hallucinations in computer vision
• Automotive and ADAS use cases plus reflections and extreme dynamic range
• What “seeing around corners” means using time-tagged light echoes and real constraints
Welcome And Seeing Around Corners
Roopinder: Welcome to ENGtechnica TV, where we bring technology into focus by talking to leaders with technology of interest to engineers. It could even see around corners. Welcome to the show.
Tristan Swedish: Thank you.
Roopinder: It sounds like groundbreaking technology, seeing around corners.
Tristan Swedish: Sebastian was a postdoc and I was working on my PhD. I was at the MIT Media Lab and Sebastian was at UW Madison, and we were both working on a project to see around corners. The idea was that we could bounce light off of walls and time how long it took for that light to bounce around. Think of it like echolocation with sound, but with light, because it turns out there's a type of sensor called a SPAD, a single-photon avalanche diode, that allows you to time-tag light extremely precisely. They're used in LIDAR and other ranging, and you can use that data to see around corners, like echolocation with light.
Roopinder: Madison? Oh, by the way, I spent some time at the University of Wisconsin, Milwaukee. So we have some city. You guys are where now? Boston?
Tristan Swedish: Yeah, so we're split between Madison and Boston.
Roopinder: Okay. Boston, the Cambridge area?
Tristan Swedish: Yeah, so our offices are downtown, right across the river from the Cambridge area.
Roopinder: Oh, okay. Good. That's a good place to be. I'd say that's the second most important tech corridor. I happen to think Silicon Valley is the center of the universe, because that's near where I live. But Boston's right up there. So, all right. The SPAD is, say, a next-generation type of chip compared with CMOS, would you say?
Tristan Swedish: Yeah. So we think that SPADs, with the right processing, have the properties to potentially replace CMOS for many applications. They're already widely used for specific applications like LIDAR. SPAD arrays in particular look a lot like image sensors. They have pixels, and they collect light in a slightly different way than CMOS image sensors. I can show you a little later that you can turn the output of those sensors into imagery that's better than CMOS, primarily for perception applications, but for general-purpose imaging as well.
What SPAD Sensors Actually Do
Roopinder: Okay. So they're used in LIDAR units right now. Autonomous vehicles might use them, robotaxis might use them. And I guess LIDAR sensing equipment that's used in the field for terrain mapping could use them as well.
Tristan Swedish: Exactly. It might be worth taking a step back to describe how SPADs work, then how they're used today and how we're using them, because we're using them in a slightly unconventional way. The original SPAD pixels are a few decades old. It's not a brand-new technology. Most of the SPADs that existed until about 10 years ago were single pixel, though. They were a single photodiode. The way they work is that they're sensitive to a single photon. It's all made out of silicon, and it's called an avalanche diode because it's a reverse-biased diode. You can think of all these electrons as being right at the edge of the breakdown voltage, and when a photon hits the piece of silicon, there's a huge current spike. So it's right at the edge of criticality, or breakdown, for the semiconductor.
Roopinder: So that could damage the semiconductor, correct?
Tristan Swedish: It can if you let that run away. So the big innovation with SPADs is that as the electrons start going, you can collect that energy and recharge it, so with the right electronics it doesn't damage the sensor. And that's really how SPADs have worked for decades: they are very precise, sensitive devices used in microscopy and in LIDAR. Because they're single-photon sensitive, they emit a pulse with each photon count. The innovation about 10 to 15 years ago was that you could create an array of these pixels. So not just a single pixel, but a 2D array, just like an image sensor. Each pixel can count a single photon, and that single photon is converted into a bit. So one way we like to think about it is that SPADs convert photons into bits. And once you have bits, you can use all your digital logic and everything from there.
Roopinder: Got it. A lot of bits, right? I did a tiny bit of research on this, and it can be like a terabyte of information, right? It can be that much. Now, correct me if I'm wrong, Ubicept doesn't make the SPADs, right? It processes the information from SPADs. Is that correct?
Tristan Swedish: That's right, that's right.
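As a rough illustration of the photons-to-bits idea, the sketch below models a single idealized pixel that outputs 1 when at least one photon is detected in a sampling window and 0 otherwise. The Poisson arrival model, the absence of dead time and dark counts, and all names and numbers (`photon_rate_hz`, `window_s`, 20,000 photons per second, 10 microsecond windows) are assumptions for illustration, not a description of any real sensor or of Ubicept's processing.

```python
import math
import random


def detection_probability(photon_rate_hz: float, window_s: float) -> float:
    """Chance that at least one photon arrives in one sampling window.

    Assumes Poisson arrivals and an idealized pixel
    (no dead time, no dark counts, perfect detection efficiency).
    """
    return 1.0 - math.exp(-photon_rate_hz * window_s)


def binary_frames(photon_rate_hz: float, window_s: float,
                  n_frames: int, seed: int = 0) -> list[int]:
    """Simulate one pixel: each window yields 1 if any photon was detected."""
    rng = random.Random(seed)
    p = detection_probability(photon_rate_hz, window_s)
    return [1 if rng.random() < p else 0 for _ in range(n_frames)]


# Hypothetical numbers: 20,000 photons/s on the pixel, 10 microsecond windows.
bits = binary_frames(photon_rate_hz=20_000, window_s=10e-6, n_frames=10_000)
print("fraction of windows with a detection:", sum(bits) / len(bits))
```

Once the output is a stream of ones and zeros like `bits`, everything downstream is ordinary digital logic, which is the point made in the conversation.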
From Single Pixels To SPAD Arrays
Tristan Swedish: So the SPAD will be triggered, we turn that into a bit, and then it's reset. The way that SPADs are designed means that they can be reset very fast, within nanoseconds. That means that in principle you could count millions, even billions, of photons per second. In practice, in most applications, you're running these SPADs at kilohertz frequencies. And you can imagine that with a full-resolution image sensor, let's say one megapixel, so a million pixels, if you're capturing tens of thousands of photons per second at every pixel, you can easily reach terabytes of data. So having the raw sensor emit this bitstream in a naive way is impractical for most applications. This is a ton of data. What we've done at Ubicept is come up with ways, because these photons are turned into bits, to convert those bits into representations very efficiently using digital logic next to the sensor, and reduce that bandwidth. That makes it feasible to treat these sensors like a conventional image sensor, because at the end of the day, what you get is an image-like representation.
Roopinder: Is this recording a still photo or is it recording a video?
Tristan Swedish: That's an excellent question, and I think it highlights the difference between SPADs and conventional imaging. In a conventional camera, you typically have an exposure time. You set a certain amount of time that you want the camera to be sensitive to light, and it gives you an image. Even for video, what you're doing is just capturing exposures as fast as 30 or 60 frames per second. Whereas SPADs are streaming data constantly. You can treat them like a conventional camera, but SPADs capture light in a way that's similar to our own eye: as photons are detected, you can do something with them.
Roopinder: Okay.
Tristan Swedish: So with SPADs the line between frames and video is blurry, and you can select between them.
Roopinder: A lot of photons, like the SPAD. They're hitting my eye, and my brain is processing them into objects, right? Into useful information. My eyes alone are not enough to be useful. I need to process that image. Similar? Is that the SPAD? Yeah, software is doing that.
Tristan Swedish: Exactly, exactly.
Unknown: Right.
Tristan Swedish: I would say the analogy is really close to how our own visual system works.
Roopinder: Your SPAD must have to filter out a lot of noise too, right? Because a lot of those photons are just moving around erratically, and you can't make sense of them. They may not be part of any detectable object. How do you filter out that noise?
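The data-volume claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes a one-megapixel array read out as one-bit frames at 50,000 binary frames per second; that frame rate is an illustrative pick within the "tens of thousands" mentioned above, and the function names are hypothetical.

```python
def raw_bitstream_rate(pixels: int, binary_frames_per_s: float,
                       bits_per_sample: int = 1) -> float:
    """Raw output in bytes per second for an array read out as binary frames."""
    return pixels * binary_frames_per_s * bits_per_sample / 8


def seconds_to_fill(bytes_target: float, bytes_per_s: float) -> float:
    """How long the raw stream takes to produce bytes_target bytes."""
    return bytes_target / bytes_per_s


# Assumed operating point: 1 megapixel, 50,000 one-bit frames per second.
rate = raw_bitstream_rate(pixels=1_000_000, binary_frames_per_s=50_000)
print(f"raw rate: {rate / 1e9:.2f} GB/s")
print(f"time to reach 1 TB: {seconds_to_fill(1e12, rate) / 60:.1f} minutes")
```

Under these assumptions the unprocessed stream reaches a terabyte within minutes, which is why reducing bandwidth next to the sensor matters.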
Taming Terabytes With On-Sensor Processing
Tristan Swedish: So that noise, the photon noise, is fundamental to physics. Whether you're in a dark room or outside in really bright light, all the information you get when you perceive the scene in front of you is a result of photons hitting your retina. In a dark room, those photons don't arrive very fast. And, this goes down to quantum mechanics, the arrival time of those photons is inherently unpredictable. So if you want to know what the image looks like at a precise moment in time, over a short period, the image is going to be noisy, and it has to be noisy because there's inherent uncertainty. In order to create a nice image, you need to collect enough photons. For a conventional camera, that's your exposure time. If you're a photographer, you might be familiar with nighttime photography. You can adjust your lens as best you can, but really the knob you have available is increasing the exposure time. So night photography typically requires that you keep your camera very still for a few seconds of exposure time or longer in order to collect enough photons. And that's a big part of our own visual system. We talked about our eyes not being enough. Our retina is doing a ton of work. That's the part at the back of our eye that's capturing light, but it's also doing a little bit of processing. It's technically part of the nervous system. It captures light and then transmits to the brain, and the brain is also doing some processing. Our approach is similar. As photons arrive, the image is noisy, and we can't get around that. However, we're able to use the right algorithms and processing to take that stream of data, which is uncertain over a short period of time, and aggregate it over time until we're more certain.
And that allows us to create a nice image that's filtered, but it also allows us to pass that uncertainty on to downstream perception. If I'm in a dark room, and I'm speaking for myself, but I think it's a common experience, you can kind of tell that you don't really know what you're looking at. It's a dark image, but if you stay still, you start to see a little more detail. You move slower. That's the effect of your brain understanding the inherent uncertainty in the arrival time of light over a short period of time. And our processing does something similar.
Roopinder: You mentioned staying still, and that's key. Now, the objects that you're sensing may not cooperate. They may cause image blur. Can you filter that out too?
Tristan Swedish: Yeah, absolutely. That's actually a really important piece of making our technology work. With a conventional camera, you set your exposure time, and if anything moves, you get motion blur. With our technology, because we're capturing this light as digital bits as they come out of the sensor, we're able to compensate for that motion dynamically. It allows you to create an effectively longer exposure time while undoing motion blur. I can give an example of what that looks like in practice. You can see this. Zoom's not going to be full resolution or full speed, but this is running at about 40 frames per second. We're going to see how well this works. But I think the key is that we're running our processing. This is a SPAD camera that you're seeing in real time, and what I'm running now is our motion compensation. So as I move my hand, you can actually see the details of my hand. What I'm going to switch to is a simulation of a camera with the same exposure time. So, if I can... I need to use the other capture from this.
Roopinder: What are the time increments I'm seeing your hand move at? Looks like a fraction of a second.
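The link between photon count and confidence can be made concrete with standard shot-noise statistics. The sketch below assumes Poisson-distributed counts, for which the relative uncertainty of a brightness estimate is roughly one over the square root of the number of photons collected, and it simulates averaging one-bit samples from a dim pixel. The detection probability of 0.05 per window and the function names are illustrative assumptions; this is not Ubicept's algorithm.

```python
import math
import random


def relative_uncertainty(mean_photons: float) -> float:
    """Poisson statistics: standard deviation / mean = 1 / sqrt(mean count)."""
    return 1.0 / math.sqrt(mean_photons)


def estimate_brightness(p_detect: float, n_frames: int,
                        rng: random.Random) -> float:
    """Average n_frames one-bit samples from a pixel whose true
    per-window detection probability is p_detect."""
    hits = sum(1 for _ in range(n_frames) if rng.random() < p_detect)
    return hits / n_frames


rng = random.Random(42)
true_p = 0.05  # assumed dim scene: a detection in about 1 window out of 20
for n in (10, 100, 1_000, 10_000):
    estimate = estimate_brightness(true_p, n, rng)
    expected_photons = true_p * n
    print(n, round(estimate, 4), round(relative_uncertainty(expected_photons), 3))
```

Collecting 100 times more photons tightens the estimate by about a factor of 10, which is the same trade a night photographer makes with a longer exposure.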
Photon Noise And Confidence Over Time
Tristan Swedish: Yeah, so it should be about 40 frames per second. We'll see how this capture system works. But what I just switched to is a conventional camera at about 30 milliseconds, so this would be the exposure time at 30 frames per second.
Roopinder: Oh, it's blurry.
Tristan Swedish: This is really blurry, and you can't see much of the hand detail. If I switch to our processing right now, you can see the details of the hand. So that's really the high-level idea: we're able to capture 30 milliseconds and you can see all the detail. If you're a robot, a self-driving car, or in an industrial inspection environment, what you really want to see is a sharp image. You don't want to deal with motion blur, but in low light you need that long exposure time. With a conventional camera, your perception pipeline, let's say in an autonomous vehicle, might be doing its perception at 30 frames per second. Actually, that's pretty high for your typical ADAS system. But let's say you're able to run your computer vision and perception algorithms at 30 frames per second. In practice, the exposure time needs to be short enough that things in front of you don't look blurry. So at night, the longest you would set your exposure time is somewhere around 10 milliseconds. That means that two-thirds of the light you would normally make use of is completely dropped. You don't actually use those photons, because in a conventional camera, if you set the exposure time to 30 milliseconds, the image would be too blurry. What we're able to do is capture light and compensate for motion over that full 30 milliseconds, and then undo all the motion blur.
Roopinder: Another thing you lose at night, though, is your color vision, right? And I imagine your software is able to put color back in.
Tristan Swedish: You know, it's interesting, because people will say that you lose color at night. There are a few different reasons for that, but really what they're referring to is that it's too noisy to see good color. If you're a photographer and you set your exposure time long enough, you can resolve color, especially if you compensate for the illumination, say it's moonlight, or you know what your illumination source is. We're not used to seeing color at night because our own eyes are much more sensitive to motion than to color at night. But a camera system doesn't have that limitation. So the real problem is not that there isn't color. It's that the image is so noisy that when you look at a night image captured in color with a conventional camera, it just doesn't look very good. We're able to resolve colors at night just by integrating more photons, so more light is combined together.
Unknown: Yeah.
Roopinder: Tell me how AI comes in, because honestly, we can't have a conversation these days without talking about it.
Tristan Swedish: Oh, absolutely. The company is Ubicept, and that stands for ubiquitous perception. We started the company a little over four years ago. As I mentioned, we were familiar with these SPAD arrays, and we were looking at where perception is going. These sensors that we were using can capture at the limit of physics. And no matter how good your AI is, what you're able to do is limited by your sensing, your perception. There's this classic adage in AI, garbage in, garbage out: if you give high-quality data to a perception system, it's going to work better. And there are deeper reasons. If you give it a noisy image, there's inherent uncertainty, and AI systems are trained on lots of data. So when they see an image they're not so sure about, they use the data they've been trained on to try to fill in the gaps. We often call this hallucination. I think people are familiar with the idea that if you talk to ChatGPT or Claude, it will make things up. It's gotten a lot better, but it has a tendency to make things up. And that's also the case for imaging AI, so AI that is capturing or processing image data. When there's some uncertainty, it tends to just guess, because of the nature of how these models are trained. With our approach, we're able to capture this raw data, but also measure how uncertain we are. And that allows us to eliminate things like hallucinations. So really, we didn't build a system that replaces AI. We built a system that gives AI the best data, better matched to it. We talked about the eye and the brain. You can think of AI as the high-level part of your brain that's thinking, that's contemplating the scene. What Ubicept is building is more like the retina and the low-level visual system that runs fast and subconsciously to give you the best image it can, which you can then use for downstream analysis.
Roopinder: You're working with autonomous vehicles, right?
Tristan Swedish: We're working with the automotive industry generally. There are a few different use cases for imaging in the automotive industry. There's, of course, ADAS, which is the automatic systems for safety and for things like lane assist and so on, and eventually self-driving. There are also applications for understanding what's going on around the car for the human driver, like a backup camera, or even detection systems inside the car for driver drowsiness detection.
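One textbook way to get a long effective exposure without blur is shift-and-add: split the exposure into many short frames, shift each one back by its displacement, then sum. The sketch below shows this on a tiny 1-D scene with a known, constant motion of one pixel per frame. It is a generic illustration under those assumptions (known motion, circular shifts, noise-free frames, hypothetical function names), not the motion compensation Ubicept actually uses, which the interview does not describe.

```python
def make_frames(scene: list[int], n_frames: int,
                velocity_px: int) -> list[list[int]]:
    """Short exposures of a 1-D scene that moves velocity_px pixels per frame
    (circular shift, so nothing leaves the field of view)."""
    width = len(scene)
    return [[scene[(x - velocity_px * t) % width] for x in range(width)]
            for t in range(n_frames)]


def naive_sum(frames: list[list[int]]) -> list[int]:
    """Equivalent to one long exposure: moving detail smears across pixels."""
    return [sum(column) for column in zip(*frames)]


def motion_compensated_sum(frames: list[list[int]],
                           velocity_px: int) -> list[int]:
    """Shift each frame back by its known displacement before adding."""
    width = len(frames[0])
    out = [0] * width
    for t, frame in enumerate(frames):
        for x in range(width):
            out[x] += frame[(x + velocity_px * t) % width]
    return out


scene = [0, 0, 0, 9, 0, 0, 0, 0]  # one bright feature
frames = make_frames(scene, n_frames=3, velocity_px=1)
print("long exposure:      ", naive_sum(frames))
print("motion compensated: ", motion_compensated_sum(frames, velocity_px=1))
```

The naive sum spreads the bright feature over three pixels, while the aligned sum keeps it in one pixel with three times the signal. In a real system the motion is not known in advance and has to be estimated from the data.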
Motion Blur Fix And Night Color
Roopinder: Am I going too far from what Ubicept does? I imagine that's something the automotive company may have to handle: the object recognition and the behavior of that object. Is that a person across the street? That's behavior. That's probably outside the realm of Ubicept, right?
Tristan Swedish: Exactly, exactly. We don't create the computer vision models themselves that do the detection, but what we are building is the uncertainty-aware layer that can be passed to those models.
Roopinder: How is that uncertainty measured?
Tristan Swedish: Something that's absolutely amazing about the photons we get from SPADs, and to some extent from conventional cameras too... And I should mention that we started with SPADs and then realized that we can actually process conventional CMOS data as well. You can actually derive how certain you are that there's a certain object in the scene given the contrast. You can write out the physics and say, okay, I'm able to detect something that has this contrast level and this much light because I've counted this number of photons. So because we have the photon counts, we know with what level of confidence we're able to see different features. Computer vision algorithms that are trained on data to detect objects are based on looking at features in the image, things like a corner or an edge at a very low level. Those are then combined within the neural network to say, hey, this is a bicycle or a pedestrian. But that contrast can be derived. So we can actually say, here is our uncertainty for this region of the image, and that can be correlated with the detection itself. The detection piece is really the domain of the computer vision model, which has been trained on a ton of imagery. What we're really perceiving is an image of the scene. We have an array of pixels, and we're trying to see how bright each point in the scene is. The computer vision models aren't able to tell, and actually any imaging isn't able to tell, that there must be an object there. But what you do see is that there's contrast between that object and the background, for example.
And so by looking at the shape of the edge of the object or the different ways that the features are composed within the object, the computer vision model, which is trained on a large library, is able to find those correlations.
Roopinder: That's outside the realm of Ubicept.
Tristan Swedish: Exactly.
Roopinder: Okay, right. But you're detecting contrast, contrast in lighting from one spot to another. Can that be fooled by a reflection?
Tristan Swedish: Reflection is a challenge for general imaging. And the challenge is not necessarily fooling the system. It's that most imaging systems have limited dynamic range, which determines the contrast levels you can distinguish. With a reflection, if you're looking at night, for example, there might be a really bright headlight, or floodlights might be on in some area, and that reflects off a car. That goes into your imaging system, and it just looks like an extremely bright spot. With SPADs, and this is a little counterintuitive, you'll have to take my word for it, it turns out that because we're counting photons, we can see a really large dynamic range. When there's a really bright light, we don't have to count every photon, but we also don't become saturated. We are able to see both really bright areas and really dark areas. And that dynamic range is huge.
Roopinder: Can your system look at an image of the solar eclipse?
Tristan Swedish: I think we maybe could. The dynamic ranges that we've demonstrated depend on a number of things, but the numbers that we've measured are 156 dB. This is a logarithmic scale. The human eye is roughly around 100 to 210.
Roopinder: Every three decibels is doubling, correct?
Tristan Swedish: Yep, exactly. So this is an enormous range.
Roopinder: How many orders of magnitude more than CMOS?
Tristan Swedish: CMOS has done a lot to try to increase dynamic range using various methods. There's also a debate about how you make that measurement, so I don't want to say here's how much it is for sure. But the best CMOS sensors I've seen advertised are between 120 and 140 dB of dynamic range, and those are doing additional processing. It's not just a single capture like you can do with SPADs. They combine multiple frames to get that larger number. A conventional CMOS sensor, with just a single frame, is about 80 dB.
Roopinder: Got it. Hey, what have I got in here? Is this a CMOS?
Tristan Swedish: Yeah, so almost certainly your smartphone is using a CMOS sensor.
Roopinder: Yep. Okay. All right. How long before I can have a SPAD in there?
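The idea that photon counts tell you how confidently a contrast can be seen can be sketched with basic Poisson statistics. Below, two neighboring regions are compared by dividing the difference in their counts by the standard deviation of that difference, which for independent Poisson counts is the square root of their sum. The 3-sigma threshold, the example counts, and the function names are illustrative assumptions; the interview does not specify how Ubicept computes or reports uncertainty.

```python
import math


def contrast_z_score(counts_a: float, counts_b: float) -> float:
    """Difference between two photon counts, in units of its Poisson
    standard deviation (independent counts: variance = counts_a + counts_b)."""
    total = counts_a + counts_b
    if total == 0:
        return 0.0  # no photons at all: no evidence of any contrast
    return (counts_a - counts_b) / math.sqrt(total)


def is_detectable(counts_a: float, counts_b: float,
                  threshold_sigma: float = 3.0) -> bool:
    """True when the observed contrast stands clear of shot noise."""
    return abs(contrast_z_score(counts_a, counts_b)) >= threshold_sigma


# Same 10% contrast between object and background, two light levels.
print("dim scene:   ", is_detectable(110, 100))
print("bright scene:", is_detectable(11_000, 10_000))
```

The same 10% edge is lost in noise with about a hundred photons per region and is unambiguous with about ten thousand, so a per-region confidence like this could be handed to a downstream detector alongside the image.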
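To put the decibel figures on a common footing, the sketch below converts dB to a brightest-to-darkest ratio. It assumes the 20·log10 convention that image sensor datasheets commonly use for dynamic range, under which a doubling is about 6 dB; the 3 dB-per-doubling rule mentioned in the conversation corresponds to the 10·log10 power convention. Which convention lies behind each quoted number is not stated in the interview, so the ratios here are indicative only.

```python
import math


def db_to_ratio(db: float, factor: float = 20.0) -> float:
    """Convert decibels to a linear ratio (factor=20 for amplitude-style
    sensor dynamic range, factor=10 for power ratios)."""
    return 10 ** (db / factor)


def ratio_to_db(ratio: float, factor: float = 20.0) -> float:
    """Convert a linear ratio to decibels under the chosen convention."""
    return factor * math.log10(ratio)


quoted = (("single-frame CMOS", 80),
          ("multi-frame CMOS, upper figure", 140),
          ("SPAD figure quoted", 156))
for label, db in quoted:
    ratio = db_to_ratio(db)
    print(f"{label}: {db} dB is about {ratio:.1e}:1 "
          f"({math.log10(ratio):.1f} orders of magnitude)")

print("doubling, 20*log10 convention:", round(ratio_to_db(2), 2), "dB")
print("doubling, 10*log10 convention:", round(ratio_to_db(2, factor=10.0), 2), "dB")
```

Under the 20·log10 assumption, the gap between 80 dB and 156 dB is 76 dB, or close to four orders of magnitude in brightness ratio.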
AI Hallucinations And Uncertainty-Aware Vision
Tristan Swedish: Okay, there are two problems. The first problem is that we need to solve the data problem, and we think we have that solution. If you want to put a SPAD in a smartphone for imaging, then you want a high-resolution sensor, and you need to be able to capture and process that data. We think we have a good solution. So now it's just a matter of time. But I will say, I have an iPhone Pro that has this LIDAR on the back. This is a single-shot LIDAR. There's actually a SPAD array in there.
Roopinder: Yeah, same. Yeah.
Tristan Swedish: So in some ways you already have a SPAD. Now, it's super specialized for depth detection, essentially LIDAR. But it's not infeasible that you could have these SPAD arrays in a smartphone form factor.
Roopinder: We started talking about this, and I still don't get it. Seeing around corners.
Tristan Swedish: I'll try to explain it two different ways, because it is super non-intuitive. My PhD advisor, Ramesh Raskar, was actually one of the people who proposed this idea almost 20 years ago, and no one believed that it was possible. In the intervening time, there's been quite a bit of work showing that you can do real-time reconstruction using light around corners. The simple way I like to think about it is that it's like sonar. You send out a pulse of sound, it propagates through the environment, it echoes as it reflects off the different parts of the environment, and then you measure that echo and you can reconstruct a view of the scene. With light, you can do something similar. You can shine a point of light onto a wall, and that light scatters off of that point in every direction. That's like your sonar pulse. It goes and hits things in the hidden scene. Maybe I should try to provide a visual here, a verbal visual. You can illuminate a point on your wall with a laser pointer. If you can see that laser point, some light is hitting you, so it's reflecting in all directions. That light, think of a small pulse, hits the wall and reflects everywhere. Imagine that this is like a sphere that's expanding over time, and there's this wavefront. As it hits objects, you get a reflection from those objects. That's the second bounce, and it goes back to the wall that you're observing. On a different position of the wall, you have your SPAD array, with a number of different pixels focused on that wall, and you can see these reflections, these echoes, going across that SPAD array. Because the SPAD array is able to capture imagery so fast, you can time-tag light with very high precision, within a few hundred picoseconds. And that's like a little less than a centimeter in distance.
And by looking at that data, you can reconstruct the hidden scene, just like sonar. Another way to think about it is that it's very similar to CT, or computed tomography.
Roopinder: You finally did it. Thank you so much. I finally get it. Okay, tell me if I got it right, though. I can see that spot on the wall where the laser is hitting it, right? That wall can also see the laser. So if I'm like eyes on the wall, I can see the laser.
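The echo picture can be turned into simple geometry. The sketch below computes how long light takes to travel from the laser spot on the wall to a hidden point and back to a second observed spot on the wall, and converts a timing resolution into a path-length resolution using the speed of light. The coordinates, the flat wall at z = 0, and the function names are hypothetical, and the legs between the device and the wall are left out for simplicity.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def path_time_s(laser_spot: tuple, hidden_point: tuple,
                sensed_spot: tuple) -> float:
    """Travel time for wall -> hidden object -> wall, the two legs that
    carry information about the hidden scene."""
    length_m = (math.dist(laser_spot, hidden_point)
                + math.dist(hidden_point, sensed_spot))
    return length_m / C_M_PER_S


def path_resolution_m(timing_resolution_s: float) -> float:
    """Path-length difference that a given timing resolution can resolve."""
    return C_M_PER_S * timing_resolution_s


# Hypothetical geometry in meters: wall in the plane z = 0,
# hidden object one meter in front of it.
t = path_time_s((0.0, 0.0, 0.0), (0.5, 0.0, 1.0), (1.0, 0.0, 0.0))
print(f"echo delay between the two wall spots: {t * 1e9:.2f} ns")

for ps in (30, 100, 300):
    cm = path_resolution_m(ps * 1e-12) * 100
    print(f"{ps} ps of timing is about {cm:.1f} cm of path length")
```

Every hidden point that produces the same delay lies on an ellipsoid whose foci are the two wall spots, so combining many spot pairs and time bins narrows down where the hidden surfaces are, much like back-projection in tomography. By this arithmetic, 100 ps corresponds to roughly 3 cm of path and about 30 ps to roughly 1 cm, so the figures mentioned in conversation are best read as orders of magnitude.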
Dynamic Range Plus Real See-Around-Corners Uses
Tristan Swedish: Okay, it proves that if you can see the laser point, then in theory, and there are some practical limitations, light is hitting you and reflecting back to the wall that the laser point is projected on. And that means you're affecting the way the light is bouncing off of you. We've looked at different applications for seeing around corners. First responders is the first thing we're looking at: can you see around corners in a building to tell if there's a fire, or if there's someone in a room, for example. I think the coolest application came from one of the professors involved in our founding, Andreas Velten, who worked on a proof of concept for using space-based methods to look at lunar caves. So, of course, there are broad applications, from defense, security, and first response to exploration and surveying. As a technology for seeing around corners or seeing off of reflecting objects, there are a lot of use cases. There are practical challenges too. The main one is that you need a lot of light, so you need a pretty powerful laser. That limits how portable these systems are today. That said, we started this company with that in mind. What we realized is that if we turned the laser off, we can do a ton. And that's kept us busy. At some point, I think we'll turn the laser back on.
Roopinder: I think I saw Ubicept on a list of million-dollar companies already. So it's making money, it's got customers, right? But quite a bit of your money must still be devoted to research.
Tristan Swedish: We're right at the point where we think we've solved all the fundamental challenges. We have a really good idea of what the architecture should be for processing SPAD data, solving that compression problem, and then solving a real use case. But like any new technology, there's a ramp-up time to bring the costs down for production. So we're now past the phase of technology uncertainty, and we're moving towards reducing those costs for certain markets, which leads to production. We see a path there for SPADs. I think it's worth noting that our revenue today is not just from processing SPAD data. With SPADs, there were technical challenges that we've overcome, and now there's the market challenge of bringing those costs down. But our processing works today on CMOS cameras, so that is an immediate way that we can provide value in the short term.
Roopinder: Advice from an old journalist. We get all this new technology, "this is the greatest thing since sliced bread," from multiple sources a day, right? But I gotta say, if you lead with seeing around corners and you can explain it as well as you did to me today, I think that's a clear shot through the door. People will be saying, wow, this is groundbreaking stuff. Great talking. I wanted to leave you with one final question. Which movie star do people say you look most like?
Costs, Deployment Path, And Closing
Tristan Swedish: I do need a haircut. I went into the hair salon and felt like I came out looking like a member of Oasis. So I don't know about movie star, but I think British rocker tradition is... Thank you so much.
Roopinder: Thanks for the explanations.
Tristan Swedish: It was great chatting with you, Roopinder.
Roopinder: All right, talk to you soon, I hope. Okay, bye bye. Thanks for listening to ENGtechnica TV. If you'd like to tell your story on this podcast, contact me, Roopinder, at engtechnica dot com or message me on LinkedIn.