There simply isn’t enough labeled data to train a model for each of these. The long tail: There are a massive number of poses people can be in when picking an object off the shelf, especially when you consider multiple customers in close proximity. ![]() To solve for this, the system needs to count all the items on the shelf rather than using a simple assumption based on space. Instead, a customer put an item back and pushed the remaining ones further back on the shelf. The obvious answer to the question is that an item was taken, but this is incorrect. One of the problems in this area can be seen in the picture above. They showed that it can be used on any video clips to aid in solving many other problems that rely on pose estimation.Īction determination: to avoid charging customers for items they didn’t take, the system must accurately account for a world where the customer can put items back on the shelf. This model is extremely interesting in and of itself. It uses a CNN with a cross entropy loss function to build the joint detection point cloud, self regression for vector generation, and pairwise regression to group the vectors together. The problems experienced in this phase were:Ī novel new Deep Learning model was needed to build an articulated model of each customer from the video. Linker: The next task was to ensure the labels are preserved across frames in the video, moving from locating to tracking the customers in the store. Finally, they build a location map from the frame using triangulation of each person across multiple cameras. ![]() From there, they segment the images into pixels, group pixels into blobs, and label each blob as person/not-person. To address these problems, Amazon uses custom camera hardware that does both RGB video and distance calculation.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |