Deep Learning para carros autônomos

Mensagem copiada/colada da newsletter The Batch do dia 29/04/2020

Tracking the Elusive Stop Sign

Recognizing stop signs, with their bold color scheme and distinctive shape, ought to be easy for computer vision — but it turns out to be a tricky problem. Tesla pulled back the curtain on what it takes to train its self-driving software to perform this task and others.

What’s new

Tesla AI chief Andrej Karpathy describes in a video presentation how the electric car maker is moving toward fully autonomous vehicles. Shot at February’s ScaledML Conference, the video was posted on YouTube last week.

Not just a big red hexagon

Stop signs take a surprising variety of forms and appearances, and that can make them hard to identify. Rather than an oversized icon on a pole, they’re often waved by construction workers, hanging off school buses, or paired with other signs. Karpathy describes how his team trained the company’s Autopilot system to detect a particularly vexing case: stop signs partially obscured by foliage.

  • Engineers understood that AutoPilot was having trouble recognizing occluded stop signs because, among other things, the bounding boxes around them flickered.
  • Using images from the existing dataset, they trained a model to detect occluded stop signs. They sent this model to the fleet with instructions to send back similar images. This gave them tens of thousands of new examples.
  • They used the new examples to improve the model’s accuracy. Then they deployed it to HydraNet, the software that fuses outputs from AutoPilot’s 48 neural networks into a unified, labeled field of view.

Behind the news

Tesla is the only major autonomous driving company that doesn’t use lidar as its primary sensor. Instead, it relies on computer vision with help from radar and ultrasonic sensors. Cameras are relatively cheap, so Tesla can afford to install its self-driving sensor package into every car that comes off the line, even though self-driving software is still in the works. It’s also easier to label pictures than point clouds. The downside: Cameras need a lot of training to sense the world in three dimensions, which lidar units do right out of the box.

2 curtidas