{"author":{"name":"Ray Wang","slug":"ray-wang","article_count":1,"latest_published_at":"2026-05-06T17:29:40.192+00:00","profile_url":"https://vff.ai/authors/ray-wang","api_url":"https://vff.ai/api/authors/ray-wang"},"articles":[{"slug":"cost-effective-deployment-of-vision-language-models-for-pet-behavior-detection-o","title":"Pet Camera Startup Cuts Inference Costs with AWS Inferentia2","url":"https://vff.ai/article/2026/05/06/cost-effective-deployment-of-vision-language-models-for-pet-behavior-detection-o","content_type":"aggregated_news","summary":"Tomofun, maker of the Furbo pet camera, migrated its vision-language model inference from GPU-based EC2 instances to AWS Inferentia2 chips to reduce costs while maintaining real-time pet behavior detection at scale. The company deployed the BLIP model on Inf2 instances using the Neuron SDK, allowing it to handle continuous inference workloads across hundreds of thousands of devices without rewriting existing PyTorch code. The architecture uses a two-tier Auto Scaling setup that can route requests to either GPU or Inferentia2 backends in real time, providing both cost efficiency and high availability.","published_at":"2026-05-06T17:29:40.192+00:00","updated_at":"2026-05-06T17:29:39.662331+00:00","source":{"url":"https://aws.amazon.com/blogs/machine-learning/cost-effective-deployment-of-vision-language-models-for-pet-behavior-detection-on-aws-inferentia2/","name":"AWS Machine Learning Blog"},"featured_image":{"url":"https://compote.slate.com/images/ca61f102-33b2-488a-9151-35fe0e95d407.jpeg?crop=1560%2C1040%2Cx0%2Cy0","alt":null},"categories":[{"name":"Multimodal","slug":"multimodal"},{"name":"AI Hardware","slug":"ai-hardware"},{"name":"Infrastructure","slug":"infrastructure"}]}]}