Keynote – Running Anomaly Detection at Scale!
We all know how to create ML models, but the path to turning them into a highly scalable, easy-to-use system is not always clear. What happens when you need to run thousands of them, on many different datasets, simultaneously and at huge scale? Will your users trust your results? And if they don't, will they even use the system you've spent so much time building?
AND, how do you build it reliably enough to sleep well at night?
To achieve exactly that, we decided to go down the serverless route and build an anomaly detection system on top of it. We'll go over the scale and reliability advantages of building such a system with serverless, and what you need to do to earn your users' trust.
Our Spotlight anomaly detection system easily reuses ML models and scales to run against millions of datasets simultaneously. The system eliminates manual work and lets end users with no scientific background set up the anomalies to detect and start getting alerts in no time.
In this talk, I'll walk you through the architecture, highlight the critical points to pay attention to, and share useful ideas you can adopt and implement in your own projects.
Talk language: English
I am a director of data engineering at Nielsen.
My group builds massive data pipelines that are cost-effective and scalable (~250 billion events/day). Our projects run on AWS, using Kafka, Spark, Aerospike, serverless Lambda functions, Airflow, OpenFaaS, Kubernetes and more.
I am passionate about new technologies, data, algorithms and machine learning. I love to tackle difficult problems and come up with amazing solutions.
I have 4 patents in the area of security, and lots of ideas for more...