diff --git a/authors.html b/authors.html index 5b4471ae..db10ce01 100644 --- a/authors.html +++ b/authors.html @@ -599,6 +599,11 @@

Amanda Ng

@@ -1031,6 +1036,11 @@

Bjorn Jee

@@ -1471,6 +1481,11 @@

Daniel Tai

@@ -1850,6 +1865,11 @@

Feng Cheng

15 Jul 2024 +
  • + Enabling conversational data discovery with LLMs at Grab + 26 Sep 2024 +
  • + @@ -5111,6 +5131,11 @@

    Shreyas Parbat

    @@ -5292,6 +5317,11 @@

    Siddharth Pandey

    @@ -6096,6 +6126,11 @@

    Varun Torka

    @@ -6221,6 +6256,11 @@

    Vinnson Lee

    24 Aug 2020 +
  • + Enabling conversational data discovery with LLMs at Grab + 26 Sep 2024 +
  • + @@ -6292,6 +6332,11 @@

    Vishal Sharma

    5 Jun 2024 +
  • + Evolution of Catwalk: Model serving platform at Grab + 1 Oct 2024 +
  • + @@ -6444,6 +6489,11 @@

    Wen Bo Wei

    19 Apr 2022 +
  • + Evolution of Catwalk: Model serving platform at Grab + 1 Oct 2024 +
  • + @@ -7009,6 +7059,11 @@

    Yucheng Zeng

    diff --git a/blog/10/index.html b/blog/10/index.html index 0e1fa208..ca84de5b 100644 --- a/blog/10/index.html +++ b/blog/10/index.html @@ -148,6 +148,179 @@ +
  • +
    + +
    + + Protecting Personal Data in Grab's Imagery cover photo + +
    + +
    +
    +
    + + + +

    + Protecting Personal Data in Grab's Imagery +

    +
    Learn how Grab improves privacy protection to cater to various geographical locations.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + Processing ETL tasks with Ratchet cover photo + +
    + +
    +
    +
    + + + +

    + Processing ETL tasks with Ratchet +

    +
    Read about what Data and ETL pipelines are and how they are used for processing multiple tasks in the Lending Team at Grab.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -706,205 +879,6 @@

  • -
  • -
    - -
    - - How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30% cover photo - -
    - -
    -
    -
    - - - -

    - How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30% -

    -
Read to find out how Grab's Performance Marketing team leveraged automation to improve conversion rates.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - One Small Step Closer to Containerising Service Binaries cover photo - -
    - -
    -
    -
    - - - -

    - One Small Step Closer to Containerising Service Binaries -

    -
    Learn how Grab is investigating and reducing service binary size for Golang projects.
    -
    -
    - -
    -
    - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/11/index.html b/blog/11/index.html index f687f74b..61c1873e 100644 --- a/blog/11/index.html +++ b/blog/11/index.html @@ -148,6 +148,205 @@ +
  • +
    + +
    + + How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30% cover photo + +
    + +
    +
    +
    + + + +

    + How Grab Leveraged Performance Marketing Automation to Improve Conversion Rates by 30% +

    +
Read to find out how Grab's Performance Marketing team leveraged automation to improve conversion rates.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + One Small Step Closer to Containerising Service Binaries cover photo + +
    + +
    +
    +
    + + + +

    + One Small Step Closer to Containerising Service Binaries +

    +
    Learn how Grab is investigating and reducing service binary size for Golang projects.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -740,140 +939,6 @@

  • -
  • -
    - -
    - - How Grab is Blazing Through the Superapp Bazel Migration cover photo - -
    - -
    -
    -
    - - - -

    - How Grab is Blazing Through the Superapp Bazel Migration -

    -
    Learn how we planned and started migrating our superapp to Bazel at Grab.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Democratising Fare Storage at Scale Using Event Sourcing cover photo - -
    - -
    -
    -
    - - - -

    - Democratising Fare Storage at Scale Using Event Sourcing -

    -
    Read how we built Grab's single source of truth for fare storage and management. In this post, we explain how we used the Event Sourcing pattern to build our fare data store.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/12/index.html b/blog/12/index.html index 03165f1d..411eb4a4 100644 --- a/blog/12/index.html +++ b/blog/12/index.html @@ -152,8 +152,8 @@
    - - Keeping 170 Libraries Up to Date on a Large Scale Android App cover photo + + How Grab is Blazing Through the Superapp Bazel Migration cover photo
    @@ -164,9 +164,9 @@ Engineering

    - Keeping 170 Libraries Up to Date on a Large Scale Android App + How Grab is Blazing Through the Superapp Bazel Migration

    -
    Learn how we maintain our libraries and prevent defect leaks in our Grab Passenger app.
    +
    Learn how we planned and started migrating our superapp to Bazel at Grab.
    @@ -176,7 +176,7 @@

    - +
    @@ -185,11 +185,11 @@

    - Lucas Nelaupe + Sergii Grechukha - +

    @@ -200,9 +200,15 @@

    Android - Engineering + Bazel - Mobile + Build Time + + Gradle + + iOS + + Xcode @@ -216,8 +222,8 @@

    @@ -228,9 +234,9 @@

    - Optimally Scaling Kafka Consumer Applications + Democratising Fare Storage at Scale Using Event Sourcing

    -
    Read this deep dive on our Kubernetes infrastructure setup for Grab's stream processing framework.
    +
    Read how we built Grab's single source of truth for fare storage and management. In this post, we explain how we used the Event Sourcing pattern to build our fare data store.
    @@ -240,7 +246,7 @@

    - +
    @@ -249,11 +255,11 @@

    - Shubham Badkur + Sourabh Suman - +

    @@ -262,17 +268,11 @@

    - Back End - Event Sourcing - Go - - Kubernetes - - Platform + Fare Storage - Stream Processing + Pricing
    @@ -286,8 +286,8 @@

    @@ -298,9 +298,9 @@

    - Our Journey to Continuous Delivery at Grab (Part 1) + Keeping 170 Libraries Up to Date on a Large Scale Android App

    -
Continuous Delivery is the principle of delivering software often, every day. Read more to find out how we implemented continuous delivery at Grab.
    +
    Learn how we maintain our libraries and prevent defect leaks in our Grab Passenger app.
    @@ -310,7 +310,7 @@

    - +
    @@ -319,11 +319,11 @@

    - Sylvain Bougerel + Lucas Nelaupe - +

    @@ -332,23 +332,11 @@

    - CI - - Cloud Agnostic - - Continuous Delivery - - Continuous Deployment - - Continuous Integration - - Deployment - - Deployment Process + Android - Multi Cloud + Engineering - Spinnaker + Mobile
    @@ -362,8 +350,8 @@

    @@ -374,9 +362,9 @@

    - Uncovering the Truth Behind Lua and Redis Data Consistency + Optimally Scaling Kafka Consumer Applications

    -
    Redis does not guarantee the consistency between master and its replica nodes when Lua scripts are used. Read more to find out why and how to guarantee data consistency.
    +
    Read this deep dive on our Kubernetes infrastructure setup for Grab's stream processing framework.
    @@ -386,7 +374,7 @@

    - +
    @@ -395,11 +383,11 @@

    - Allen Wang + Shubham Badkur - +

    @@ -408,13 +396,17 @@

    - Data Consistency + Back End - High CPU Usage + Event Sourcing - Lua Scripts + Go - Redis + Kubernetes + + Platform + + Stream Processing
    @@ -428,8 +420,8 @@

    @@ -439,14 +431,10 @@

    - - · - -

    - Securing and Managing Multi-cloud Presto Clusters with Grab’s DataGateway + Our Journey to Continuous Delivery at Grab (Part 1)

    -
    This blog post discusses how Grab's DataGateway plays a key role in supporting hundreds of users in our entire Presto ecosystem - from managing user access, cluster selection, workload distribution, and many more.
    +
Continuous Delivery is the principle of delivering software often, every day. Read more to find out how we implemented continuous delivery at Grab.
    @@ -456,7 +444,7 @@

    - +
    @@ -465,11 +453,11 @@

    - Vinnson Lee + Sylvain Bougerel - +

    @@ -478,19 +466,23 @@

    - Access Control + CI - Cluster + Cloud Agnostic - Data + Continuous Delivery - Data Pipeline + Continuous Deployment - Engineering + Continuous Integration - Presto + Deployment - Workload Distribution + Deployment Process + + Multi Cloud + + Spinnaker
    @@ -504,8 +496,8 @@

    @@ -516,9 +508,9 @@

    - Go Modules- A Guide for monorepos (Part 2) + Uncovering the Truth Behind Lua and Redis Data Consistency

    -
    This is the second post on the Go module series, which highlights Grab’s experience working with Go modules in a multi-module monorepo. Here, we discuss the additional solutions for addressing dependency issues, as well as cover automatic upgrades.
    +
    Redis does not guarantee the consistency between master and its replica nodes when Lua scripts are used. Read more to find out why and how to guarantee data consistency.
    @@ -528,7 +520,7 @@

    - +
    @@ -537,11 +529,11 @@

    - Michael Cartmell + Allen Wang - +

    @@ -550,15 +542,13 @@

    - Go - - Libraries + Data Consistency - Monorepo + High CPU Usage - Vendoring + Lua Scripts - Vendors + Redis
    @@ -572,8 +562,8 @@

    @@ -588,9 +578,9 @@

    - The Journey of Deploying Apache Airflow at Grab + Securing and Managing Multi-cloud Presto Clusters with Grab’s DataGateway

    -
    This blog post shares how we designed and implemented an Apache Airflow-based scheduling and orchestration platform for teams across Grab.
    +
    This blog post discusses how Grab's DataGateway plays a key role in supporting hundreds of users in our entire Presto ecosystem - from managing user access, cluster selection, workload distribution, and many more.
    @@ -600,7 +590,7 @@

    - +
    @@ -609,11 +599,11 @@

    - Chandulal Kavar + Vinnson Lee - +

    @@ -622,17 +612,19 @@

    - Airflow + Access Control + + Cluster + + Data Data Pipeline Engineering - Kubernetes - - Platform + Presto - Scheduling + Workload Distribution
    @@ -646,8 +638,8 @@

    @@ -658,9 +650,9 @@

    - How We Built Our In-house Chat Platform for the Web + Go Modules- A Guide for monorepos (Part 2)

    -
    This blog post shares our learnings from building our very own chat platform for the web.
    +
    This is the second post on the Go module series, which highlights Grab’s experience working with Go modules in a multi-module monorepo. Here, we discuss the additional solutions for addressing dependency issues, as well as cover automatic upgrades.
    @@ -670,7 +662,7 @@

    - +
    @@ -679,11 +671,11 @@

    - Vasudevan K. + Michael Cartmell - +

    @@ -692,13 +684,15 @@

    - Chat + Go - Customer Support + Libraries - Engineering + Monorepo - Web + Vendoring + + Vendors
    diff --git a/blog/13/index.html b/blog/13/index.html index 363fa3ad..cf652aa7 100644 --- a/blog/13/index.html +++ b/blog/13/index.html @@ -148,6 +148,146 @@ +
  • +
    + +
    + + The Journey of Deploying Apache Airflow at Grab cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + The Journey of Deploying Apache Airflow at Grab +

    +
    This blog post shares how we designed and implemented an Apache Airflow-based scheduling and orchestration platform for teams across Grab.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + How We Built Our In-house Chat Platform for the Web cover photo + +
    + +
    +
    +
    + + + +

    + How We Built Our In-house Chat Platform for the Web +

    +
    This blog post shares our learnings from building our very own chat platform for the web.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -633,171 +773,6 @@

  • -
  • -
    - -
    - - Grab-Posisi - Southeast Asia’s First Comprehensive GPS Trajectory Dataset cover photo - -
    - -
    -
    -
    - - - -

    - Grab-Posisi - Southeast Asia’s First Comprehensive GPS Trajectory Dataset -

    -
    This blog highlights Grab's latest GPS trajectory dataset - its content, format, applications, and how you can access the dataset for your research purpose.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - How We Prevented App Performance Degradation from Sudden Ride Demand Spikes cover photo - -
    - -
    -
    -
    - - - -

    - How We Prevented App Performance Degradation from Sudden Ride Demand Spikes -

    -
This blog addresses how engineers overcame the challenges Grab faced during the initial days due to a sudden spike in ride demand.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/14/index.html b/blog/14/index.html index 36b250a7..54a3a19c 100644 --- a/blog/14/index.html +++ b/blog/14/index.html @@ -148,6 +148,171 @@ +
  • +
    + +
    + + Grab-Posisi - Southeast Asia’s First Comprehensive GPS Trajectory Dataset cover photo + +
    + +
    +
    +
    + + + +

    + Grab-Posisi - Southeast Asia’s First Comprehensive GPS Trajectory Dataset +

    +
    This blog highlights Grab's latest GPS trajectory dataset - its content, format, applications, and how you can access the dataset for your research purpose.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + How We Prevented App Performance Degradation from Sudden Ride Demand Spikes cover photo + +
    + +
    +
    +
    + + + +

    + How We Prevented App Performance Degradation from Sudden Ride Demand Spikes +

    +
This blog addresses how engineers overcame the challenges Grab faced during the initial days due to a sudden spike in ride demand.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -629,177 +794,6 @@

  • -
  • -
    - -
    - - Using Grab’s Trust Counter Service to Detect Fraud Successfully cover photo - -
    - -
    -
    -
    - - - -

    - Using Grab’s Trust Counter Service to Detect Fraud Successfully -

    -
    This blog introduces Grab’s Trust Counter service for detecting fraud. It explains how the solution was designed so that different stakeholders like data analysts and data scientists can use the Counter service without any manual intervention from engineers. The Counter service provides a reliable data feed to the data science world.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Being a Principal Engineer at Grab cover photo - -
    - -
    -
    -
    - - - -

    - Being a Principal Engineer at Grab -

    -
    Curious about what a Principal Engineer role at Grab entails? Our Principal Engineers' responsibilities range from solving complex problems, taking care of the system-level architecture, collaborating with cross-functional teams, providing mentorship, and more.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/15/index.html b/blog/15/index.html index 0ed68090..352ccccc 100644 --- a/blog/15/index.html +++ b/blog/15/index.html @@ -148,6 +148,177 @@ +
  • +
    + +
    + + Using Grab’s Trust Counter Service to Detect Fraud Successfully cover photo + +
    + +
    +
    +
    + + + +

    + Using Grab’s Trust Counter Service to Detect Fraud Successfully +

    +
    This blog introduces Grab’s Trust Counter service for detecting fraud. It explains how the solution was designed so that different stakeholders like data analysts and data scientists can use the Counter service without any manual intervention from engineers. The Counter service provides a reliable data feed to the data science world.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + Being a Principal Engineer at Grab cover photo + +
    + +
    +
    +
    + + + +

    + Being a Principal Engineer at Grab +

    +
    Curious about what a Principal Engineer role at Grab entails? Our Principal Engineers' responsibilities range from solving complex problems, taking care of the system-level architecture, collaborating with cross-functional teams, providing mentorship, and more.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -709,160 +880,6 @@

  • -
  • -
    - -
    - - React Native in GrabPay cover photo - -
    - -
    -
    -
    - - - -

    - React Native in GrabPay -

    -
    This blog post describes how we used React Native to optimize the Grab PAX app.
    -
    -
    - -
    -
    - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Connecting the Invisibles to Design Seamless Experiences cover photo - -
    - -
    -
    -
    - - - -

    - Connecting the Invisibles to Design Seamless Experiences -

    -
    Much of the work done by the service design team at Grab revolves around integrating people, processes, and systems to deliver seamless user experiences. In this blog post, we present an overview on how Grab's service design team goes about doing that.
    -
    -
    - -
    -
    - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/16/index.html b/blog/16/index.html index c97e77f9..01e944e0 100644 --- a/blog/16/index.html +++ b/blog/16/index.html @@ -148,6 +148,160 @@ +
  • +
    + +
    + + React Native in GrabPay cover photo + +
    + +
    +
    +
    + + + +

    + React Native in GrabPay +

    +
    This blog post describes how we used React Native to optimize the Grab PAX app.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + Connecting the Invisibles to Design Seamless Experiences cover photo + +
    + +
    +
    +
    + + + +

    + Connecting the Invisibles to Design Seamless Experiences +

    +
    Much of the work done by the service design team at Grab revolves around integrating people, processes, and systems to deliver seamless user experiences. In this blog post, we present an overview on how Grab's service design team goes about doing that.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -683,130 +837,6 @@

  • -
  • -
    - -
    - - How We Harnessed the Wisdom of Crowds to Improve Restaurant Location Accuracy cover photo - -
    - -
    -
    -
    - - - -

    - How We Harnessed the Wisdom of Crowds to Improve Restaurant Location Accuracy -

    -
    We questioned some of the estimates that our algorithm for calculating restaurant wait times was making, and found that the "errors" were actually useful to discover restaurants whose locations had been incorrectly registered in our system. By combining such error signals across multiple orders, we were able to identify correct restaurant locations and amend them to improve the experience for our consumers.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Designing Resilient Systems Beyond Retries (Part 3): Architecture Patterns and Chaos Engineering cover photo - -
    - -
    -
    -
    - - - -

    - Designing Resilient Systems Beyond Retries (Part 3): Architecture Patterns and Chaos Engineering -

    -
This post is the third of a three-part series on going beyond retries and circuit breakers to improve system resiliency. This whole series covers techniques and architectures that can be used as part of a strategy to improve resiliency. In this article, we will focus on architecture patterns and chaos engineering to prevent failures and test resiliency.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/17/index.html b/blog/17/index.html index 7d57c22e..63df584b 100644 --- a/blog/17/index.html +++ b/blog/17/index.html @@ -148,6 +148,130 @@ +
  • +
    + +
    + + How We Harnessed the Wisdom of Crowds to Improve Restaurant Location Accuracy cover photo + +
    + +
    +
    +
    + + + +

    + How We Harnessed the Wisdom of Crowds to Improve Restaurant Location Accuracy +

    +
    We questioned some of the estimates that our algorithm for calculating restaurant wait times was making, and found that the "errors" were actually useful to discover restaurants whose locations had been incorrectly registered in our system. By combining such error signals across multiple orders, we were able to identify correct restaurant locations and amend them to improve the experience for our consumers.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + Designing Resilient Systems Beyond Retries (Part 3): Architecture Patterns and Chaos Engineering cover photo + +
    + +
    +
    +
    + + + +

    + Designing Resilient Systems Beyond Retries (Part 3): Architecture Patterns and Chaos Engineering +

    +
This post is the third of a three-part series on going beyond retries and circuit breakers to improve system resiliency. This whole series covers techniques and architectures that can be used as part of a strategy to improve resiliency. In this article, we will focus on architecture patterns and chaos engineering to prevent failures and test resiliency.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -657,177 +781,6 @@

  • -
  • -
    - -
    - - Understanding Supply & Demand in Ride-hailing Through the Lens of Data cover photo - -
    - -
    -
    -
    - - - -

    - Understanding Supply & Demand in Ride-hailing Through the Lens of Data -

    -
Grab aims to ensure that our passengers can get a ride conveniently while providing our drivers a better livelihood. To achieve this, balancing demand and supply is crucial. This article gives you a glimpse of one of our analytics initiatives - how to measure the supply and demand ratio at any given area and time.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - A Lean and Scalable Data Pipeline to Capture Large Scale Events and Support Experimentation Platform cover photo - -
    - -
    -
    -
    - - - -

    - A Lean and Scalable Data Pipeline to Capture Large Scale Events and Support Experimentation Platform -

    -
    This blog post focuses on the lessons we learned while building our batch data pipeline.
    -
    -
    - -
    -
    - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/18/index.html b/blog/18/index.html index 88beeaae..8396a0e9 100644 --- a/blog/18/index.html +++ b/blog/18/index.html @@ -152,8 +152,8 @@
    @@ -162,11 +162,11 @@
    - +

    - Designing Resilient Systems: Circuit Breakers or Retries? (Part 2) + Understanding Supply & Demand in Ride-hailing Through the Lens of Data

    -
Grab designs fault-tolerant systems that can withstand failures, allowing us to continuously provide our consumers with the many services they expect from us.
    +
Grab aims to ensure that our passengers can get a ride conveniently while providing our drivers a better livelihood. To achieve this, balancing demand and supply is crucial. This article gives you a glimpse of one of our analytics initiatives - how to measure the supply and demand ratio at any given area and time.
    @@ -176,7 +176,17 @@

    - + + + + + + + + + + +
    @@ -185,11 +195,27 @@

    - Corey Scott + Aayush Garg + + + + + + · + + Lara PuReum Yim + + + + + + · + + ChunKai Phang - +

    @@ -198,9 +224,15 @@

    - Circuit Breakers + Analytics - Resiliency + Data + + Data Analytics + + Data Storytelling + + Data Visualisation
    @@ -214,8 +246,8 @@

    @@ -226,9 +258,9 @@

    - Querying Big Data in Real-time with Presto & Grab's TalariaDB + A Lean and Scalable Data Pipeline to Capture Large Scale Events and Support Experimentation Platform

    -
    In this article, we focus on TalariaDB, a distributed, highly available, and low latency time-series database that stores real-time data. For example, logs, metrics, and click streams generated by mobile apps and backend services that use Grab's Experimentation Platform SDK. It "stalks" the real-time data feed and only keeps the last one hour of data.
    +
    This blog post focuses on the lessons we learned while building our batch data pipeline.
    @@ -238,12 +270,12 @@

    - + - +
    @@ -252,7 +284,7 @@

    - Roman Atachiants + Oscar Cassetti @@ -260,11 +292,11 @@

    · - Oscar Cassetti + Roman Atachiants - +

    @@ -275,13 +307,9 @@

    Big Data - Database - - Presto - - Real-Time + Data Pipeline - TalariaDB + Experiment @@ -295,8 +323,8 @@

    @@ -307,7 +335,7 @@

    - Designing Resilient Systems: Circuit Breakers or Retries? (Part 1) + Designing Resilient Systems: Circuit Breakers or Retries? (Part 2)

Grab designs fault-tolerant systems that can withstand failures, allowing us to continuously provide our consumers with the many services they expect from us.
    @@ -332,7 +360,7 @@

    - 21 Dec 2018 | 17 min read + 8 Jan 2019 | 14 min read @@ -357,8 +385,8 @@

    @@ -369,9 +397,9 @@

    - Orchestrating Chaos Using Grab's Experimentation Platform + Querying Big Data in Real-time with Presto & Grab's TalariaDB

    -
At Grab, we practice chaos engineering by intentionally introducing failures in a service or component in the overall business flow. But the ‘failed’ service is not the experiment’s focus. We’re interested in testing the services dependent on that failed service.
    +
    In this article, we focus on TalariaDB, a distributed, highly available, and low latency time-series database that stores real-time data. For example, logs, metrics, and click streams generated by mobile apps and backend services that use Grab's Experimentation Platform SDK. It "stalks" the real-time data feed and only keeps the last one hour of data.
    @@ -386,12 +414,7 @@

    - - - - - - +
    @@ -408,19 +431,11 @@

    · - Tharaka Wijebandara - - - - - - · - - Abeesh Thomas + Oscar Cassetti - +

    @@ -429,11 +444,15 @@

    - Chaos Engineering + Big Data - Microservice + Database - Resiliency + Presto + + Real-Time + + TalariaDB
    @@ -447,8 +466,8 @@

    @@ -459,9 +478,9 @@

    - Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab + Designing Resilient Systems: Circuit Breakers or Retries? (Part 1)

    -
    Grab’s feature toggle SDK provides a dynamic feature toggle capability to our engineering, data, product, and even business teams. Feature toggles also let teams modify system behaviour without changing code. Developers use the feature flags to keep new features hidden until product and marketing teams are ready to share and to run experiments (A/B tests) by dynamically changing feature toggles for specific users, rides, etc.
    +
Grab designs fault-tolerant systems that can withstand failures, allowing us to continuously provide our consumers with the many services they expect from us.
    @@ -471,7 +490,7 @@

    - +
    @@ -480,11 +499,11 @@

    - Roman Atachiants + Corey Scott - +

    @@ -493,15 +512,9 @@

    - A/B Testing - - Back End - - Experiment - - Feature Toggle + Circuit Breakers - Front End + Resiliency
    @@ -515,8 +528,8 @@

    @@ -527,9 +540,9 @@

    - Mockers - Overcoming Testing Challenges at Grab + Orchestrating Chaos Using Grab's Experimentation Platform

    -
Sustaining quality in fast-paced development is a challenge. At Grab, we use Mockers - a tool to expand the scope of local box testing. It helps us overcome testing challenges in a microservice architecture.
    +
At Grab, we practice chaos engineering by intentionally introducing failures in a service or component in the overall business flow. But the ‘failed’ service is not the experiment’s focus. We’re interested in testing the services dependent on that failed service.
    @@ -539,27 +552,17 @@

    - - - - - - - - - - - + - + - +
    @@ -568,23 +571,7 @@

    - Mayank Gupta - - - - - - · - - Vineet Nair - - - - - - · - - Shivkumar Krishnan + Roman Atachiants @@ -592,7 +579,7 @@

    · - Thuy Nguyen + Tharaka Wijebandara @@ -600,11 +587,11 @@

    · - Vishal Prakash + Abeesh Thomas - +

    @@ -613,11 +600,11 @@

    - Back End + Chaos Engineering - Service + Microservice - Testing + Resiliency
    @@ -631,8 +618,8 @@

    @@ -641,11 +628,11 @@

    - +

    - Journey of a Tourist via Grab + Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab

    -
Grab's services to tourists are an integral part of connecting tourists to various destinations and attractions. Do tourists travel on Grab to outlandishly fancy places like those you see in the movie "Crazy Rich Asians"? What are their favourite local places? Did you know that Grab's data reveals that medical tourism is growing in Singapore? Here are some exciting travel patterns that we found in our tourists' Grab rides in Singapore!
    +
    Grab’s feature toggle SDK provides a dynamic feature toggle capability to our engineering, data, product, and even business teams. Feature toggles also let teams modify system behaviour without changing code. Developers use the feature flags to keep new features hidden until product and marketing teams are ready to share and to run experiments (A/B tests) by dynamically changing feature toggles for specific users, rides, etc.

    @@ -655,7 +642,7 @@

    - +
    @@ -664,11 +651,11 @@

    - Lara PuReum Yim + Roman Atachiants - +

    @@ -677,15 +664,15 @@

    - Analytics + A/B Testing - Data + Back End - Data Analytics + Experiment - Tourism + Feature Toggle - Tourists + Front End
    @@ -699,8 +686,8 @@

    @@ -711,9 +698,9 @@

    - How We Designed the Quotas Microservice to Prevent Resource Abuse + Mockers - Overcoming Testing Challenges at Grab

    -
Reliable, scalable, and high-performing solutions for common system-level issues are essential for microservice success, and there is a Grab-wide initiative to provide those common solutions. As an important component of the initiative, we wrote a microservice called Quotas, a highly scalable API request rate limiting solution to mitigate the problems of service abuse and cascading service failures.
    +
Sustaining quality in fast-paced development is a challenge. At Grab, we use Mockers - a tool to expand the scope of local box testing. It helps us overcome testing challenges in a microservice architecture.
    @@ -723,12 +710,27 @@

    - + + + + + + + + + + + - + + + + + +
    @@ -737,7 +739,15 @@

    - Jim Zhan + Mayank Gupta + + + + + + · + + Vineet Nair @@ -745,11 +755,27 @@

    · - Gao Chao + Shivkumar Krishnan + + + + + + · + + Thuy Nguyen + + + + + + · + + Vishal Prakash - +

    @@ -760,10 +786,10 @@

    Back End - Quota - Service + Testing + diff --git a/blog/19/index.html b/blog/19/index.html index b12258f9..3eb58a0b 100644 --- a/blog/19/index.html +++ b/blog/19/index.html @@ -148,6 +148,151 @@ +
  • +
    + +
    + + Journey of a Tourist via Grab cover photo + +
    + +
    +
    +
    + + + +

    + Journey of a Tourist via Grab +

    +
Grab's services to tourists are an integral part of connecting tourists to various destinations and attractions. Do tourists travel on Grab to outlandishly fancy places like those you see in the movie "Crazy Rich Asians"? What are their favourite local places? Did you know that Grab's data reveals that medical tourism is growing in Singapore? Here are some exciting travel patterns that we found in our tourists' Grab rides in Singapore!
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + How We Designed the Quotas Microservice to Prevent Resource Abuse cover photo + +
    + +
    +
    +
    + + + +

    + How We Designed the Quotas Microservice to Prevent Resource Abuse +

    +
    Reliable, scalable, and high-performing solutions for common system-level issues are essential for microservice success, and there is a Grab-wide initiative to provide those common solutions. As an important component of the initiative, we wrote a microservice called Quotas, a highly scalable API request rate limiting solution to mitigate the problems of service abuse and cascading service failures.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -597,154 +742,6 @@

  • -
  • -
    - -
    - - GrabShare at the Intelligent Transportation Engineering Conference cover photo - -
    - -
    -
    -
    - - - -

    - GrabShare at the Intelligent Transportation Engineering Conference -

    -
    We're excited to share the publication of our paper GrabShare: The Construction of a Realtime Ridesharing Service, which was Grab's contribution to the Intelligent Transportation Engineering Conference in Singapore last month.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Grabbing Growth: A Growth Hacking Story cover photo - -
    - -
    -
    -
    - - - -

    - Grabbing Growth: A Growth Hacking Story -

    -
    Disrupt or be disrupted - that was exactly the spirit in which the Growth Hacking team was created this year. This was a deliberate decision to nurture our scrappy DNA, and ensure that we had a dedicated space to experiment and enable intelligent risk-taking.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/2/index.html b/blog/2/index.html index 712fdf80..f135a22f 100644 --- a/blog/2/index.html +++ b/blog/2/index.html @@ -148,6 +148,203 @@ +
  • +
    + +
    + + How we evaluated the business impact of marketing campaigns cover photo + +
    + +
    +
    +
    + + + +

    + How we evaluated the business impact of marketing campaigns +

    +
    Discover how Grab assesses marketing effectiveness using advanced attribution models and strategic testing to improve campaign precision and impact.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + No version left behind: Our epic journey of GitLab upgrades cover photo + +
    + +
    +
    +
    + + + +

    + No version left behind: Our epic journey of GitLab upgrades +

    +
    Join us as we share our experience in developing and implementing a consistent upgrade routine. This process underscored the significance of adaptability, comprehensive preparation, efficient communication, and ongoing learning.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -669,164 +866,6 @@

  • -
  • -
    - -
    - - Managing dynamic marketplace content at scale: Grab's approach to content moderation cover photo - -
    - -
    -
    -
    - - - -

    - Managing dynamic marketplace content at scale: Grab's approach to content moderation -

    -
    Understand how Grab employs a combination of automated and manual content moderation to manage its dynamic marketplace content efficiently, while also collaborating with Google to ensure marketplace safety.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Rethinking Stream Processing: Data Exploration cover photo - -
    - -
    -
    -
    - - - -

    - Rethinking Stream Processing: Data Exploration -

    -
    As Grab matures along the digitalisation journey, it is collecting and streaming event data generated by the end users of its superapp at a larger scale than before. Coban, Grab’s data-streaming platform team, is looking to help unlock the value of streaming data at an earlier stage of the data journey, before this data is typically stored in a central location (“Data Lake”). This allows Grab to serve its superapp users more efficiently.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/20/index.html b/blog/20/index.html index 361dcae3..54c6a16c 100644 --- a/blog/20/index.html +++ b/blog/20/index.html @@ -152,8 +152,8 @@
    @@ -164,9 +164,9 @@

    - The Data and Science Behind GrabShare Part I: Verifying Potential and Developing the Algorithm
    + GrabShare at the Intelligent Transportation Engineering Conference

    -
    Launching GrabShare was no easy feat. After reviewing the academic literature, we decided to take a different approach and build a new matching algorithm from the ground up.
    +
    We're excited to share the publication of our paper GrabShare: The Construction of a Realtime Ridesharing Service, which was Grab's contribution to the Intelligent Transportation Engineering Conference in Singapore last month.
    @@ -176,7 +176,7 @@

    - +
    @@ -185,11 +185,11 @@

    - Tang Muchen + Dominic Widdows - +

    @@ -214,8 +214,8 @@

    @@ -224,11 +224,11 @@

    - +

    - The Art of Hiring Good Engineers
    + Grabbing Growth: A Growth Hacking Story

    -
    Hiring the first five good engineers in your team requires a different approach to hiring the first twenty good engineers. The approach to designing this process will be even more different when you want to scale up to 100 engineers... or even to 300.
    +
    Disrupt or be disrupted - that was exactly the spirit in which the Growth Hacking team was created this year. This was a deliberate decision to nurture our scrappy DNA, and ensure that we had a dedicated space to experiment and enable intelligent risk-taking.

    @@ -238,7 +238,17 @@

    - + + + + + + + + + + +
    @@ -247,11 +257,27 @@

    - Rachel Lee + Gaurav Sachdeva + + + + + + · + + Huan Yang + + + + + + · + + Jiaying Lim - +

    @@ -260,7 +286,7 @@

    - Hiring
    + Growth Hacking
    @@ -274,8 +300,8 @@

    @@ -284,11 +310,11 @@

    - +

    - Migrating Existing Datastores
    + The Data and Science Behind GrabShare Part I: Verifying Potential and Developing the Algorithm

    -
    At Grab, we take pride in creating solutions that impact millions of people in Southeast Asia, and as they say, with great power comes great responsibility. With 55 million app downloads and 1.2 million drivers, it's our responsibility to keep our systems up and running. Any downtime causes drivers to miss earnings and passengers to miss their appointments.
    +
    Launching GrabShare was no easy feat. After reviewing the academic literature, we decided to take a different approach and build a new matching algorithm from the ground up.

    @@ -298,7 +324,7 @@

    - +
    @@ -307,11 +333,11 @@

    - Nishant Gupta + Tang Muchen - +

    @@ -320,9 +346,9 @@

    - Back End
    + Data Science
    - Redis
    + GrabShare
    @@ -336,8 +362,8 @@

    @@ -348,9 +374,9 @@

    - So You Need to Hire Good Engineers
    + The Art of Hiring Good Engineers

    -
    If you are in a fast-growing tech startup, you're probably actively interviewing and hiring engineers to scale teams. My question to you is, what hiring strategy are you using when interviewing engineering warriors?
    +
    Hiring the first five good engineers in your team requires a different approach to hiring the first twenty good engineers. The approach to designing this process will be even more different when you want to scale up to 100 engineers... or even to 300.
    @@ -373,7 +399,7 @@

    - 24 Jul 2017 | 6 min read
    + 4 Oct 2017 | 8 min read

    @@ -408,9 +434,9 @@

    - Come and #hackallthethings at Grab
    + Migrating Existing Datastores

    -
    For the longest time, security has been at the centre of our priorities. Nothing is more self-evident than the trust our millions of driving partners and consumers put in Grab. We strive every day to build the best tools available to ensure their data stays secure.
    +
    At Grab, we take pride in creating solutions that impact millions of people in Southeast Asia, and as they say, with great power comes great responsibility. With 55 million app downloads and 1.2 million drivers, it's our responsibility to keep our systems up and running. Any downtime causes drivers to miss earnings and passengers to miss their appointments.
    @@ -420,7 +446,7 @@

    - +
    @@ -429,11 +455,11 @@

    - Grab Engineering + Nishant Gupta - +

    @@ -442,7 +468,9 @@

    - Security + Back End + + Redis
    @@ -455,16 +483,22 @@

  • -
    +
    + + So You Need to Hire Good Engineers cover photo + +
    + +

    - How We Scaled Our Cache and Got a Good Night's Sleep
    + So You Need to Hire Good Engineers

    -
    Caching is arguably the most important and widely used technique in the computer industry; from CPUs to Facebook live videos, caches are everywhere.
    +
    If you are in a fast-growing tech startup, you're probably actively interviewing and hiring engineers to scale teams. My question to you is, what hiring strategy are you using when interviewing engineering warriors?
    @@ -474,7 +508,7 @@

    - +

    @@ -496,9 +530,7 @@

    @@ -512,8 +544,8 @@

    @@ -524,10 +556,9 @@

    - Grab's Front End Study Guide
    + Come and #hackallthethings at Grab

    -
    Grab is Southeast Asia (SEA)’s leading transportation platform and our mission is to drive SEA forward, leveraging on the latest technology and the talented people we have in the company. As of May 2017, we handle 2.3 million rides daily and we are growing and hiring at a rapid scale. -To keep up with Grab’s phenomenal growth, our web team and web platforms have to grow as well. Fortunately, or unfortunately, at Grab, the web team has been keeping up with the latest best practices and has incorporated the modern JavaScript ecosystem in our web apps.
    +
    For the longest time, security has been at the centre of our priorities. Nothing is more self-evident than the trust our millions of driving partners and consumers put in Grab. We strive every day to build the best tools available to ensure their data stays secure.

    @@ -537,7 +568,7 @@

    - + @@ -559,11 +590,7 @@

    @@ -583,9 +610,9 @@

    - DNS Resolution in Go and Cgo
    + How We Scaled Our Cache and Got a Good Night's Sleep

    -
    This article is part two of a two-part series. We will talk about RFC 6724 (3484), how DNS resolution works in Go and Cgo, and finally explain why disabling IPv6 also disables the sorting of IP addresses.
    +
    Caching is arguably the most important and widely used technique in the computer industry; from CPUs to Facebook live videos, caches are everywhere.
    @@ -595,7 +622,7 @@

    - + @@ -617,9 +644,9 @@

    diff --git a/blog/21/index.html b/blog/21/index.html index 36deb0bb..d7a5f6c1 100644 --- a/blog/21/index.html +++ b/blog/21/index.html @@ -148,6 +148,127 @@ +
  • +
    + +
    + + Grab's Front End Study Guide cover photo + +
    + +
    +
    +
    + + + +

    + Grab's Front End Study Guide +

    +
    Grab is Southeast Asia (SEA)’s leading transportation platform and our mission is to drive SEA forward, leveraging on the latest technology and the talented people we have in the company. As of May 2017, we handle 2.3 million rides daily and we are growing and hiring at a rapid scale. +To keep up with Grab’s phenomenal growth, our web team and web platforms have to grow as well. Fortunately, or unfortunately, at Grab, the web team has been keeping up with the latest best practices and has incorporated the modern JavaScript ecosystem in our web apps.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    +
    +
    + + + +

    + DNS Resolution in Go and Cgo +

    +
    This article is part two of a two-part series. We will talk about RFC 6724 (3484), how DNS resolution works in Go and Cgo, and finally explain why disabling IPv6 also disables the sorting of IP addresses.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -555,123 +676,6 @@

  • -
  • -
    - -
    -
    -
    - - - -

    - A Key Expired in Redis, You Won't Believe What Happened Next -

    -
    One of Grab's more popular caching solutions is Redis (often in the flavour of the misleadingly named ElastiCache), and for most cases, it works. Except for that time it didn't. Follow our story as we investigate how Redis deals with consistency on key expiration.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - How Grab Hires Engineers in Singapore cover photo - -
    - -
    -
    -
    - - - -

    - How Grab Hires Engineers in Singapore -

    -
    Working at Grab will be the “most challenging yet rewarding opportunity” any employee will ever encounter. -
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/22/index.html b/blog/22/index.html index c9ef2c58..7bec4fe6 100644 --- a/blog/22/index.html +++ b/blog/22/index.html @@ -151,23 +151,16 @@
  • -
    - - Battling with Tech Giants for the World's Best Talent cover photo - -
    - -
    +

    - Battling with Tech Giants for the World's Best Talent
    + A Key Expired in Redis, You Won't Believe What Happened Next

    -
    Grab steadily attracts a diverse set of engineers from around the world in its three R&D centres: Singapore, Seattle, and Beijing. Right now, half of Grab’s top leadership team is made up of women and we have attracted people from five continents to work together on solving the biggest challenges for Southeast Asia. -
    +
    One of Grab's more popular caching solutions is Redis (often in the flavour of the misleadingly named ElastiCache), and for most cases, it works. Except for that time it didn't. Follow our story as we investigate how Redis deals with consistency on key expiration.
    @@ -177,7 +170,7 @@

    - +

    @@ -199,7 +192,9 @@

    @@ -213,8 +208,8 @@

    @@ -225,9 +220,9 @@

    - This Rocket Ain't Stopping - Achieving Zero Downtime for Rails to Golang API Migration
    + How Grab Hires Engineers in Singapore

    -
    Grab has been transitioning from a Rails + NodeJS stack to a full Golang Service Oriented Architecture. To contribute to a single common code base, we wanted to transfer engineers working on the Rails server powering our passenger app APIs to other Go teams. +
    Working at Grab will be the “most challenging yet rewarding opportunity” any employee will ever encounter.
    @@ -238,7 +233,7 @@

    - +

    @@ -260,11 +255,7 @@

    @@ -278,8 +269,8 @@

    @@ -287,10 +278,13 @@

    + +

    - Grab Vietnam Careers Week
    + Battling with Tech Giants for the World's Best Talent

    -
    Grab is organising our first ever Grab Vietnam Careers Week in Ho Chi Minh City, Vietnam, from 22 to 26 October 2016. We are eager to have more engineers join our ranks to make a difference by improving transportation and reducing congestion in Southeast Asia. We are now on 23 million mobile devices supported by 460,000 drivers in the region, but we've only just started and have much more to achieve! To find out more about Grab, take a look at our corporate profile at the end of this post.
    +
    Grab steadily attracts a diverse set of engineers from around the world in its three R&D centres: Singapore, Seattle, and Beijing. Right now, half of Grab’s top leadership team is made up of women and we have attracted people from five continents to work together on solving the biggest challenges for Southeast Asia. +
    @@ -313,7 +307,7 @@

    - +

    @@ -335,14 +329,23 @@

  • -
    +
    + + This Rocket Ain't Stopping - Achieving Zero Downtime for Rails to Golang API Migration cover photo + +
    + +
    + +

    - GrabPay Wins Best Fraud Prevention Innovation at the Florin Awards
    + This Rocket Ain't Stopping - Achieving Zero Downtime for Rails to Golang API Migration

    -
    I am honoured to receive the Best Fraud Prevention Innovation (Community Votes) Award at the 2016 Florin Awards on behalf of Grab. For those of you who voted for Grab, we thank you for your support that made this award possible.
    +
    Grab has been transitioning from a Rails + NodeJS stack to a full Golang Service Oriented Architecture. To contribute to a single common code base, we wanted to transfer engineers working on the Rails server powering our passenger app APIs to other Go teams. +
    @@ -352,7 +355,7 @@

    - +

    @@ -374,7 +377,11 @@

    @@ -387,16 +394,20 @@

  • -
    +
    + + Grab Vietnam Careers Week cover photo + +
    + +
    - -

    - Round-robin in Distributed Systems
    + Grab Vietnam Careers Week

    -
    While working on Grab's Common Data Service (CDS), I needed to implement client-side load balancing between CDS clients and servers. However, I kept encountering persistent connection issues with Elastic Load Balancing (ELB).
    +
    Grab is organising our first ever Grab Vietnam Careers Week in Ho Chi Minh City, Vietnam, from 22 to 26 October 2016. We are eager to have more engineers join our ranks to make a difference by improving transportation and reducing congestion in Southeast Asia. We are now on 23 million mobile devices supported by 460,000 drivers in the region, but we've only just started and have much more to achieve! To find out more about Grab, take a look at our corporate profile at the end of this post.
    @@ -406,7 +417,7 @@

    - +

    @@ -428,15 +439,7 @@

    @@ -453,12 +456,10 @@

    - -

    - Why Test the Design with Only 5 Users
    + GrabPay Wins Best Fraud Prevention Innovation at the Florin Awards

    -
    The reasoning behind small sample sizes in qualitative usability research.
    +
    I am honoured to receive the Best Fraud Prevention Innovation (Community Votes) Award at the 2016 Florin Awards on behalf of Grab. For those of you who voted for Grab, we thank you for your support that made this award possible.
    @@ -468,7 +469,7 @@

    - +

  • @@ -490,9 +491,7 @@

    @@ -512,11 +511,9 @@

    - Programmers Beware - UX is Not Just for Designers
    + Round-robin in Distributed Systems

    -
    Perhaps one of the biggest missed opportunities in Tech in recent history is UX. -Somehow, UX became the domain of Product Designers and User Interface Designers. -While they definitely are the right people to be thinking about web pages, mobile app screens and so on, we've missed a huge part of what we engineers work on every day: SDKs and APIs.
    +
    While working on Grab's Common Data Service (CDS), I needed to implement client-side load balancing between CDS clients and servers. However, I kept encountering persistent connection issues with Elastic Load Balancing (ELB).
    @@ -526,7 +523,7 @@

    - + @@ -548,9 +545,15 @@

    @@ -568,12 +571,11 @@

    - +

    - Grab You Some Post-Mortem Reports
    + Why Test the Design with Only 5 Users

    -
    Grab adopts a Service-Oriented Architecture (SOA) to rapidly develop and deploy new feature services. One of the drawbacks of such a design is that team members find it hard to help with debugging production issues that inevitably arise in services belonging to other stakeholders. -
    +
    The reasoning behind small sample sizes in qualitative usability research.
    @@ -583,7 +585,7 @@

    - + @@ -605,7 +607,9 @@

    diff --git a/blog/23/index.html b/blog/23/index.html index ca045c8d..01752901 100644 --- a/blog/23/index.html +++ b/blog/23/index.html @@ -148,6 +148,119 @@ +
  • +
    + +
    +
    +
    + + + +

    + Programmers Beware - UX is Not Just for Designers +

    +
    Perhaps one of the biggest missed opportunities in Tech in recent history is UX. +Somehow, UX became the domain of Product Designers and User Interface Designers. +While they definitely are the right people to be thinking about web pages, mobile app screens and so on, we've missed a huge part of what we engineers work on every day: SDKs and APIs.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    +
    +
    + + + +

    + Grab You Some Post-Mortem Reports +

    +
    Grab adopts a Service-Oriented Architecture (SOA) to rapidly develop and deploy new feature services. One of the drawbacks of such a design is that team members find it hard to help with debugging production issues that inevitably arise in services belonging to other stakeholders. +
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • diff --git a/blog/3/index.html b/blog/3/index.html index 95ed6a61..c3002558 100644 --- a/blog/3/index.html +++ b/blog/3/index.html @@ -148,6 +148,164 @@ +
  • +
    + +
    + + Managing dynamic marketplace content at scale: Grab's approach to content moderation cover photo + +
    + +
    +
    +
    + + + +

    + Managing dynamic marketplace content at scale: Grab's approach to content moderation +

    +
    Understand how Grab employs a combination of automated and manual content moderation to manage its dynamic marketplace content efficiently, while also collaborating with Google to ensure marketplace safety.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + Rethinking Stream Processing: Data Exploration cover photo + +
    + +
    +
    +
    + + + +

    + Rethinking Stream Processing: Data Exploration +

    +
    As Grab matures along the digitalisation journey, it is collecting and streaming event data generated by the end users of its superapp at a larger scale than before. Coban, Grab’s data-streaming platform team, is looking to help unlock the value of streaming data at an earlier stage of the data journey, before this data is typically stored in a central location (“Data Lake”). This allows Grab to serve its superapp users more efficiently.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -695,152 +853,6 @@

  • -
  • -
    - -
    - - Scaling marketing for merchants with targeted and intelligent promos cover photo - -
    - -
    -
    -
    - - - - - · - - -

    - Scaling marketing for merchants with targeted and intelligent promos -

    -
    Apart from ensuring advertisements reach the right audience, it is also important to make promos by merchants more targeted and intelligent to help scale their marketing. With Grab’s innovative AI tool, merchants can boost sales while cutting costs. Dive into this game-changing tool that’s reshaping the future of marketing and find out how the Data Science team at Grab used automation and made promo assignments a more seamless and intelligent process.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - Stepping up marketing for advertisers: Scalable lookalike audience cover photo - -
    - -
    -
    -
    - - - - - · - - -

    - Stepping up marketing for advertisers: Scalable lookalike audience -

    -
    A key challenge in advertising is reaching the right audience who are most likely to use your product. Read this article to find out how the Data Science team improved advertising effectiveness by using lookalike audiences to identify individuals who share similar characteristics with an existing consumer base.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/4/index.html b/blog/4/index.html index 3ea6e917..94812227 100644 --- a/blog/4/index.html +++ b/blog/4/index.html @@ -148,6 +148,152 @@ +
  • +
    + +
    + + Scaling marketing for merchants with targeted and intelligent promos cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + Scaling marketing for merchants with targeted and intelligent promos +

    +
    Apart from ensuring advertisements reach the right audience, it is also important to make promos by merchants more targeted and intelligent to help scale their marketing. With Grab’s innovative AI tool, merchants can boost sales while cutting costs. Dive into this game-changing tool that’s reshaping the future of marketing and find out how the Data Science team at Grab used automation and made promo assignments a more seamless and intelligent process.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + Stepping up marketing for advertisers: Scalable lookalike audience cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + Stepping up marketing for advertisers: Scalable lookalike audience +

    +
    A key challenge in advertising is reaching the right audience who are most likely to use your product. Read this article to find out how the Data Science team improved advertising effectiveness by using lookalike audiences to identify individuals who share similar characteristics with an existing consumer base.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -607,171 +753,6 @@

  • -
  • -
    - -
    - - Performance bottlenecks of Go application on Kubernetes with non-integer (floating) CPU allocation cover photo - -
    - -
    -
    -
    - - - -

    - Performance bottlenecks of Go application on Kubernetes with non-integer (floating) CPU allocation -

    -
    At Grab, we have been running our Go-based stream processing framework (SPF) on Kubernetes for several years. But as the number of SPF pipelines increased, we noticed some performance bottlenecks and other issues. Read to find out how this issue came about and how the Coban team resolved it with non-integer CPU allocation.
    -
    -
    - -
    -
    - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - How we improved our iOS CI infrastructure with observability tools cover photo - -
    - -
    -
    -
    - - - -

    - How we improved our iOS CI infrastructure with observability tools -

    -
    After upgrading to Xcode 13.1, we noticed a few issues such as instability of the CI tests and high CPU utilisation. Read to find out how the Test Automation - Mobile team investigated these issues and resolved them by integrating observability tools into our iOS CI development process.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/5/index.html b/blog/5/index.html index 610cdb67..c155d947 100644 --- a/blog/5/index.html +++ b/blog/5/index.html @@ -148,6 +148,171 @@ +
  • +
    + +
    + + Performance bottlenecks of Go application on Kubernetes with non-integer (floating) CPU allocation cover photo + +
    + +
    +
    +
    + + + +

    + Performance bottlenecks of Go application on Kubernetes with non-integer (floating) CPU allocation +

    +
    At Grab, we have been running our Go-based stream processing framework (SPF) on Kubernetes for several years. But as the number of SPF pipelines increased, we noticed some performance bottlenecks and other issues. Read to find out how this issue came about and how the Coban team resolved it with non-integer CPU allocation.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + How we improved our iOS CI infrastructure with observability tools cover photo + +
    + +
    +
    +
    + + + +

    + How we improved our iOS CI infrastructure with observability tools +

    +
    After upgrading to Xcode 13.1, we noticed a few issues such as instability of the CI tests and high CPU utilisation. Read to find out how the Test Automation - Mobile team investigated these issues and resolved them by integrating observability tools into our iOS CI development process.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -634,146 +799,6 @@

  • -
  • -
    - -
    - - Securing GitOps pipelines cover photo - -
    - -
    -
    -
    - - - -

    - Securing GitOps pipelines -

    -
    This article illustrates how Grab’s real-time data platform team secured GitOps pipelines at scale with our in-house GitOps implementation.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    - -
    -
    -
  • - -
  • -
    - -
    - - New zoom freezing feature for Geohash plugin cover photo - -
    - -
    -
    -
    - - - - - · - - -

    - New zoom freezing feature for Geohash plugin -

    -
    Built by Grab, the Geohash Java OpenStreetMap Editor (JOSM) plugin is widely used in map-making, but a common pain point is the inability to zoom in to a specific region without displaying new geohashes. Read to find out more about the issue and how the latest update addresses it.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/6/index.html b/blog/6/index.html index 48bb236a..223f2368 100644 --- a/blog/6/index.html +++ b/blog/6/index.html @@ -148,6 +148,146 @@ +
  • +
    + +
    + + Securing GitOps pipelines cover photo + +
    + +
    +
    +
    + + + +

    + Securing GitOps pipelines +

    +
    This article illustrates how Grab’s real-time data platform team secured GitOps pipelines at scale with our in-house GitOps implementation.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    + +
    +
    +
  • + +
  • +
    + +
    + + New zoom freezing feature for Geohash plugin cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + New zoom freezing feature for Geohash plugin +

    +
    Built by Grab, the Geohash Java OpenStreetMap Editor (JOSM) plugin is widely used in map-making, but a common pain point is the inability to zoom in to a specific region without displaying new geohashes. Read to find out more about the issue and how the latest update addresses it.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -832,198 +972,6 @@

  • -
  • -
    - -
    - - Automatic rule backtesting with large quantities of data cover photo - -
    - -
    -
    -
    - - - - - · - - -

    - Automatic rule backtesting with large quantities of data -

    -
    At Grab, real-time fraud detection is built on a rule engine. As data scientists and analysts, we need to analyse and simulate a rule on historical data to check its performance and accuracy. Backtesting, also known as Replay, enables analysts to run simulations of newly invented rules or evaluate the performance of existing rules using past events ranging from days to months, significantly improving rule creation efficiency.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - How we store and process millions of orders daily cover photo - -
    - -
    -
    -
    - - - - - · - - -

    - How we store and process millions of orders daily -

    -
    The Grab Order Platform is a distributed system that processes millions of GrabFood or GrabMart orders every day. Learn about how the Grab order platform stores food order data to serve transactional (OLTP) and analytical (OLAP) queries.
    -
    -
    - -
    -
    - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/7/index.html b/blog/7/index.html index 921ff9fa..8cd3d07e 100644 --- a/blog/7/index.html +++ b/blog/7/index.html @@ -148,6 +148,198 @@ +
  • +
    + +
    + + Automatic rule backtesting with large quantities of data cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + Automatic rule backtesting with large quantities of data +

    +
    At Grab, real-time fraud detection is built on a rule engine. As data scientists and analysts, we need to analyse and simulate a rule on historical data to check the performance and accuracy of the rule. Backtesting, also known as Replay, enables analysts to run simulations of either newly-invented rules, or evaluate the performance of existing rules using past events ranging from days to months, and significantly improve rule creation efficiency.
    +
    +
    + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • + +
  • +
    + +
    + + How we store and process millions of orders daily cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + How we store and process millions of orders daily +

    +
    The Grab Order Platform is a distributed system that processes millions of GrabFood or GrabMart orders every day. Learn about how the Grab order platform stores food order data to serve transactional (OLTP) and analytical (OLAP) queries.
    +
    +
    + +
    +
    + + + + + + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -712,142 +904,6 @@

  • -
  • -
    - -
    - - Embracing a Docs-as-Code approach cover photo - -
    - -
    -
    -
    - - - -

    - Embracing a Docs-as-Code approach -

    -
    Read to find out how Grab is using the Docs-as-Code approach to improve technical documentation.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    - -
    -
    -
  • - -
  • -
    - -
    - - Graph Networks - Striking fraud syndicates in the dark cover photo - -
    - -
    -
    -
    - - - - - · - - -

    - Graph Networks - Striking fraud syndicates in the dark -

    -
    As fraudulent entities evolve and get smarter, Grab needs to continuously enhance our defences to protect our consumers. Read to find out how Graph Networks help the Integrity team advance fraud detection at Grab.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/8/index.html b/blog/8/index.html index 39a6414c..bf168ebb 100644 --- a/blog/8/index.html +++ b/blog/8/index.html @@ -148,6 +148,142 @@ +
  • +
    + +
    + + Embracing a Docs-as-Code approach cover photo + +
    + +
    +
    +
    + + + +

    + Embracing a Docs-as-Code approach +

    +
    Read to find out how Grab is using the Docs-as-Code approach to improve technical documentation.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    + +
    +
    +
  • + +
  • +
    + +
    + + Graph Networks - Striking fraud syndicates in the dark cover photo + +
    + +
    +
    +
    + + + + + · + + +

    + Graph Networks - Striking fraud syndicates in the dark +

    +
    As fraudulent entities evolve and get smarter, Grab needs to continuously enhance our defences to protect our consumers. Read to find out how Graph Networks help the Integrity team advance fraud detection at Grab.
    +
    +
    + +
    +
    + + + + + + + + +
    +
    +
    + + + + +
    +
    +
    +
  • +
  • @@ -647,212 +783,6 @@

  • -
  • -
    - -
    - - Exposing a Kafka Cluster via a VPC Endpoint Service cover photo - -
    - -
    -
    -
    - - - -

    - Exposing a Kafka Cluster via a VPC Endpoint Service -

    -
    Establishing communications between cloud resources that are hosted on different Virtual Private Clouds (VPC) can be complex and costly. Find out how the Coban team used a VPC Endpoint Service to expose an Apache Kafka cluster across multiple Availability Zones to a different VPC.
    -
    -
    - -
    -
    - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - -
  • -
    - -
    - - How Grab built a scalable, high-performance ad server cover photo - -
    - -
    -
    -
    - - - -

    - How Grab built a scalable, high-performance ad server -

    -
    Like many businesses, Grab leverages ads to create awareness and increase engagement with our consumers. Read to find out how the GrabAds team built an ad server that could scale according to our growing consumer base.
    -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -
    -
    - - - - -
    -
    -
    -
  • - diff --git a/blog/9/index.html b/blog/9/index.html index 1eece4c4..26653748 100644 --- a/blog/9/index.html +++ b/blog/9/index.html @@ -152,8 +152,8 @@
    @@ -162,11 +162,11 @@
    - +

    - Biometric authentication - Why do we need it? + Exposing a Kafka Cluster via a VPC Endpoint Service

    -
    As cyberattacks get more advanced, authentication methods like one-time passwords (OTPs) and personal identification numbers (PINs) are no longer enough to protect your users. Find out how biometric authentication can help enhance security.
    +
    Establishing communications between cloud resources that are hosted on different Virtual Private Clouds (VPC) can be complex and costly. Find out how the Coban team used a VPC Endpoint Service to expose an Apache Kafka cluster across multiple Availability Zones to a different VPC.
    @@ -176,12 +176,7 @@

    - - - - - - + @@ -211,9 +198,11 @@

    @@ -227,8 +216,8 @@

    @@ -237,11 +226,11 @@

    - +

    - Using real-world patterns to improve matching in theory and practice + How Grab built a scalable, high-performance ad server

    -
    Find out how real-world patterns can be used to improve algorithm performance when performing bipartite matching for passengers and driver-partners.
    +
    Like many businesses, Grab leverages ads to create awareness and increase engagement with our consumers. Read to find out how the GrabAds team built an ad server that could scale according to our growing consumer base.

    @@ -251,12 +240,37 @@

    - + - + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -286,9 +340,11 @@

    @@ -302,8 +358,8 @@

    @@ -312,11 +368,11 @@

    - +

    - Designing products and services based on Jobs to be Done + Biometric authentication - Why do we need it?

    -
    In this post, we explain how the Jobs to be Done (JTBD) framework helps uncover the JTBD for consumers, as well as how we uncovered the core needs of GrabFood consumers and aligned the product with these needs.
    +
    As cyberattacks get more advanced, authentication methods like one-time passwords (OTPs) and personal identification numbers (PINs) are no longer enough to protect your users. Find out how biometric authentication can help enhance security.

    @@ -326,17 +382,12 @@

    - - - - - - + - + @@ -374,13 +417,9 @@

    @@ -394,8 +433,8 @@

    @@ -404,11 +443,11 @@

    - +

    - Search indexing optimisation + Using real-world patterns to improve matching in theory and practice

    -
    Learn about the different optimisation techniques when building a search index.
    +
    Find out how real-world patterns can be used to improve algorithm performance when performing bipartite matching for passengers and driver-partners.

    @@ -418,12 +457,12 @@

    - + - + @@ -453,13 +492,9 @@

    @@ -473,8 +508,8 @@

    @@ -483,11 +518,11 @@

    - +

    - Automating Multi-Armed Bandit testing during feature rollout + Designing products and services based on Jobs to be Done

    -
    Find out how you can run an automated test and simultaneously roll out a new feature.
    +
    In this post, we explain how the Jobs to be Done (JTBD) framework helps uncover the JTBD for consumers, as well as how we uncovered the core needs of GrabFood consumers and aligned the product with these needs.

    @@ -497,22 +532,17 @@

    - - - - - - + - + - + @@ -558,11 +580,13 @@

    @@ -576,8 +600,8 @@

    @@ -586,11 +610,11 @@

    - +

    - How We Cut GrabFood.com’s Page JavaScript Asset Sizes by 3x + Search indexing optimisation

    -
    Find out how the GrabFood team cut their bundle size by 3 times with these 7 webpack bundle optimisation strategies.
    +
    Learn about the different optimisation techniques when building a search index.

    @@ -600,7 +624,12 @@

    - + + + + + + @@ -622,13 +659,13 @@

    @@ -642,8 +679,8 @@

    @@ -654,9 +691,9 @@

    - Protecting Personal Data in Grab's Imagery + Automating Multi-Armed Bandit testing during feature rollout

    -
    Learn how Grab improves privacy protection to cater to various geographical locations.
    +
    Find out how you can run an automated test and simultaneously roll out a new feature.
    @@ -666,22 +703,22 @@

    - + - + - + - + @@ -727,15 +764,11 @@

    @@ -749,8 +782,8 @@

    @@ -759,11 +792,11 @@

    - +

    - Processing ETL tasks with Ratchet + How We Cut GrabFood.com’s Page JavaScript Asset Sizes by 3x

    -
    Read about what Data and ETL pipelines are and how they are used for processing multiple tasks in the Lending Team at Grab.
    +
    Find out how the GrabFood team cut their bundle size by 3 times with these 7 webpack bundle optimisation strategies.

    @@ -773,7 +806,7 @@

    - + @@ -795,13 +828,13 @@

    diff --git a/categories/data-science/index.html b/categories/data-science/index.html index 2709b945..08ab3e78 100644 --- a/categories/data-science/index.html +++ b/categories/data-science/index.html @@ -146,6 +146,134 @@

    Data Science

      +
    • +
      + +
      + + Evolution of Catwalk: Model serving platform at Grab cover photo + +
      + +
      +
      +
      + + + + + + · + + + +

      + Evolution of Catwalk: Model serving platform at Grab +

      +
      Read about the evolution of Catwalk, Grab's model serving platform, from its inception to its current state. Discover how it has evolved to meet the needs of Grab's growing machine learning model serving requirements.
      +
      +
      + +
      +
      + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
      +
      +
      + + + + +
      +
      +
      +
    • +
    • diff --git a/categories/engineering/index.html b/categories/engineering/index.html index 80b27d27..aeb924f8 100644 --- a/categories/engineering/index.html +++ b/categories/engineering/index.html @@ -146,6 +146,268 @@

      Engineering

        +
      • +
        + +
        + + Evolution of Catwalk: Model serving platform at Grab cover photo + +
        + +
        +
        +
        + + + + + + · + + + +

        + Evolution of Catwalk: Model serving platform at Grab +

        +
        Read about the evolution of Catwalk, Grab's model serving platform, from its inception to its current state. Discover how it has evolved to meet the needs of Grab's growing machine learning model serving requirements.
        +
        +
        + +
        +
        + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        +
        +
        + + + + +
        +
        +
        +
      • + +
      • +
        + +
        + + Enabling conversational data discovery with LLMs at Grab cover photo + +
        + +
        +
        +
        + + + + +

        + Enabling conversational data discovery with LLMs at Grab +

        +
        Discover how Grab is revolutionising data discovery with the power of AI and LLMs. Dive into our journey as we overcome challenges, build groundbreaking tools like HubbleIQ, and transform the way our employees find and access data. Get ready to be inspired by our innovative approach and learn how you can harness the potential of AI to unlock the full value of your organisation's data.
        +
        +
        + +
        +
        + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        +
        +
        + + + + +
        +
        +
        +
      • +
      • diff --git a/catwalk-evolution.html b/catwalk-evolution.html new file mode 100644 index 00000000..300a943d --- /dev/null +++ b/catwalk-evolution.html @@ -0,0 +1,575 @@ + + + + + Evolution of Catwalk: Model serving platform at Grab + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        + +
        +
        +
        + Evolution of Catwalk: Model serving platform at Grab cover photo + + + + + + · + + + + +

        Evolution of Catwalk: Model serving platform at Grab

        + + + +
        +
        +
        +

        Introduction

        + +

        As Southeast Asia’s leading super app, Grab serves millions of users across multiple countries every day. Our services range from ride-hailing and food delivery to digital payments and much more. The backbone of our operations? Machine Learning (ML) models. They power our real-time decision-making capabilities, enabling us to provide a seamless and personalised experience to our users. Whether it’s determining the most efficient route for a ride, suggesting a food outlet based on a user’s preference, or detecting fraudulent transactions, ML models are at the forefront.

        + +

        However, serving these ML models at Grab’s scale is no small feat. It requires a robust, efficient, and scalable model serving platform, which is where our ML model serving platform, Catwalk, comes in.

        + +

        Catwalk has evolved over time, adapting to the growing needs of our business and the ever-changing tech landscape. It has been a journey of continuous learning and improvement, with each step bringing new challenges and opportunities.

        + +

        Evolution of the platform

        + +

        Phase 0: The need for a model serving platform

        + +

        Before Catwalk’s debut as our dedicated model serving platform, data scientists across the company employed various ad-hoc approaches to serve ML models. These included:

        + +
          +
        • Shipping models online using custom solutions.
        • +
        • Relying on backend engineering teams to deploy and manage trained ML models.
        • +
        • Embedding ML logic within Go backend services.
        • +
        + +

        These methods, however, led to several challenges, underscoring the need for a unified, company-wide platform for serving machine learning models:

        + +
          +
        • Operational overhead: Data scientists often lacked the necessary expertise to handle the operational aspects of their models, leading to service outages.
        • +
        • Resource wastage: There was frequently low resource utilisation (e.g., 1%) for data science services, leading to inefficient use of resources.
        • +
        • Friction with engineering teams: Differences in release cycles and unclear ownership when code was embedded into backend systems resulted in tension between data scientists and engineers.
        • +
        • Reinventing the wheel: Multiple teams independently attempted to solve model serving problems, leading to a duplication of effort.
        • +
        + +

        These challenges highlighted the need for a company-wide, centralised platform for serving machine learning models.

        + +

        Phase 1: No-code, managed platform for TensorFlow Serving models

        + +

        Our initial foray into model serving was centred around creating a managed platform for deploying TensorFlow Serving models. The process involved data scientists submitting their models to the platform’s engineering admin, who could then deploy the model with an endpoint. Infrastructure and networking were managed using Amazon Elastic Kubernetes Service (EKS) and Helm Charts as illustrated below.

        + +
        + +
        +
        + +

        This phase of our platform, which we also detailed in our previous article, was beneficial for some users. However, we quickly encountered scalability challenges:

        + +
          +
        • Codebase maintenance: Applying changes to every TensorFlow Serving (TFS) version was cumbersome and difficult to maintain.
        • +
        • Limited scalability: The fully managed nature of the platform made it difficult to scale.
        • +
        • Admin bottleneck: The engineering admin’s limited bandwidth became a bottleneck for onboarding new models.
        • +
        • Limited serving types: The platform only supported TensorFlow, limiting its usefulness for data scientists using other frameworks like LightGBM, XGBoost, or PyTorch.
        • +
        + +

        After a year of operation, only eight models were onboarded to the platform, highlighting the need for a more scalable and flexible solution.

        + +

        Phase 2: From models to model serving applications

        + +

        To address the limitations of Phase 1, we transitioned from deploying individual models to self-contained model serving applications. This “low-code, self-serving” strategy introduced several new components and changes as illustrated in the points and diagram below:

        + +
          +
        • Support for multiple serving types: Users gained the ability to deploy models trained with a variety of frameworks like Open Neural Network Exchange (ONNX), PyTorch, and TensorFlow.
        • +
        • Self-served platform through CI/CD pipelines: Data scientists could self-serve and independently manage their model serving applications through CI/CD pipelines.
        • +
        • New components: We introduced these new components to support the self-serving approach: +
            +
          • Catwalk proxy, a managed reverse proxy to various serving types.
          • +
          • Catwalk transformer, a low-code component to transform input and output data.
          • +
          • Amphawa, a feature fetching component to augment model inputs.
          • +
          +
        • +
        + +
        + +
        +
        + +

        API request flow

        + +

        The Catwalk proxy acts as the orchestration layer. Clients send requests to the Catwalk proxy, which then orchestrates calls to components like transformers, the feature store, and so on. A typical end-to-end request flow is illustrated below.

        + +
        + +
        +
        + +

        Within a year of implementing these changes, the number of models on the platform increased from 8 to 300, demonstrating the success of this approach. However, new challenges emerged:

        + +
          +
        • Complexity of maintaining the Helm chart: As the platform continued to grow with new components and functionalities, maintaining the Helm chart became increasingly complex. Readability and flow control became more challenging, making Helm chart updates error-prone.
        • +
        • Process-level mistakes: The self-serving approach led to errors such as pushing empty or incompatible models to production, setting too few replicas, or allocating insufficient resources, which resulted in service crashes.
        • +
        + +

        We knew that our work was nowhere near done. We had to keep iterating and explore ways to address the new challenges.

        + +

        Phase 3: Replacing Helm charts with Kubernetes CRDs

        + +

        To tackle the deployment challenges and gain more control, we made the significant decision to replace Helm charts with Kubernetes Custom Resource Definitions (CRDs). This required substantial engineering effort, but the outcomes have been rewarding. This transition gave us improved control over deployment pipelines, enabling customisations such as:

        + +
          +
        • Smart defaults for AutoML
        • +
        • Blue-green deployments
        • +
        • Capacity management
        • +
        • Advanced scaling
        • +
        • Application set groupings
        • +
        + +

        Below is an example of a simple model serving CRD manifest:

        + +
        apiVersion: ml.catwalk.kubebuilder.io/v1
        +kind: ModelServing
        +spec:
        +  hpa:
        +    desired: 1
        +    max: 1
        +    min: 1
        +  modelMeta:
        +    modelName: "my-model"
        +    modelOwner: john.doe
        +  proxyLayer:
        +    enableLogging: true
        +    logHTTPBody: true
        +  servingLayer:
        +    servingType: "tensorflow-serving"
        +    version: "20"
        +
        + +

        Model serving CRD deployment state machine

        + +

        Every model serving CRD submission follows a sequence of steps. If there are failures at any step, the controller keeps retrying after small intervals. The major steps in the deployment cycle are described below:

        + +
          +
        1. Validate whether the new CRD specs are acceptable. Along with sanity checks, we also enforce a lot of platform constraints through this step.
        2. +
        3. Clean up previous non-ready deployment resources. Sometimes a deployment submission might keep crashing and hence it doesn’t proceed to a ready state. On every submission, it’s important to check and clean up such previous deployments.
        4. +
        5. Create resources for the new deployment and ensure that the new deployment is ready.
        6. +
        7. Switch traffic from old deployment to the new deployment.
        8. +
        9. Clean up resources for old deployment. At this point, traffic is already being served by the new deployment resources. So, we can clean up the old deployment.
        10. +
        + +
        + +
        +
        + +
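The state machine above can be sketched as a reconcile loop. This is a toy illustration of the sequence of steps, with in-memory structs standing in for Kubernetes resources — not the actual CRD controller:

```go
package main

import "errors"

// In-memory stand-ins for Kubernetes resources; the real controller
// manipulates Deployments, Services, etc. via the API server.
type deployment struct{ ready bool }

type cluster struct {
	live    *deployment // deployment currently serving traffic
	pending *deployment // deployment being rolled out
}

// reconcile walks the deployment steps described above. On failure it
// returns an error and the controller retries after a small interval.
func reconcile(c *cluster, specValid bool) error {
	// 1. Validate the new CRD spec (sanity checks + platform constraints).
	if !specValid {
		return errors.New("spec validation failed")
	}
	// 2. Clean up a previous non-ready deployment left by a crashed rollout.
	if c.pending != nil && !c.pending.ready {
		c.pending = nil
	}
	// 3. Create resources for the new deployment and wait until ready.
	c.pending = &deployment{ready: true}
	// 4. Switch traffic from the old deployment to the new one.
	old := c.live
	c.live, c.pending = c.pending, nil
	// 5. Clean up the old deployment; traffic is already on the new one.
	_ = old // resources for the old deployment would be deleted here
	return nil
}
```

Because every step is idempotent, rerunning the whole sequence after a partial failure converges to the same end state — which is what lets the controller simply retry.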

        Phase 4: Transition to a high-code, self-served, process-managed platform

        + +

        As the number of model serving applications and use cases multiplied, clients sought greater control over orchestration between different models, experiment execution, traffic shadowing, and response archiving. To cater to these needs, we introduced several changes and components, with the Catwalk Orchestrator, a high-code orchestration solution, leading the pack.

        + +

        Catwalk orchestrator

        + +

        The Catwalk Orchestrator is a highly abstracted framework for building ML applications that replaced the catwalk-proxy from previous phases. The key difference is that users can now write their own business/orchestration logic. The orchestrator offers a range of utilities, reducing the need for users to write extensive boilerplate code. Key components of the Catwalk Orchestrator include an HTTP server, a gRPC server, clients for different model serving flavours (TensorFlow, ONNX, PyTorch, etc.), a client for fetching features from the feature bank, and utilities for logging, metrics, and data lake ingestion.

        + +

        The Catwalk Orchestrator is designed to streamline the deployment of machine learning models. Here’s a typical user journey:

        + +
          +
        1. Scaffold a model serving application: Users begin by scaffolding a model serving application using a command-line tool.
        2. +
        3. Write business logic: Users then write the business logic for the application.
        4. +
        5. Deploy to staging: The application is then deployed to a staging environment for testing.
        6. +
        7. Complete load testing: Users test the application in the staging environment and complete load testing to ensure it can handle the expected traffic.
        8. +
        9. Deploy to production: Once testing is completed, the application is deployed to the production environment.
        10. +
        + +
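As a sketch of the kind of business logic a user might write in step 2 (the client names and signatures here are hypothetical, not the orchestrator's actual API): serve a primary model while shadowing the same request to a challenger model, without letting the shadow call affect the response.

```go
package main

import "sync"

// Hypothetical model client, standing in for the orchestrator's
// model serving clients (TensorFlow, ONNX, PyTorch, ...).
type modelClient func(features map[string]float64) float64

// Handler is user-written orchestration logic: respond with the primary
// model's score while shadowing traffic to a challenger model.
type Handler struct {
	primary, challenger modelClient
	shadowed            sync.WaitGroup // lets callers wait for the shadow call
	lastShadowScore     float64        // in practice this would be logged/archived
}

func (h *Handler) Serve(features map[string]float64) float64 {
	h.shadowed.Add(1)
	go func() { // shadow call: fire-and-forget, never blocks the response
		defer h.shadowed.Done()
		h.lastShadowScore = h.challenger(features)
	}()
	return h.primary(features)
}
```

The point of the high-code approach is exactly this: the shadowing, A/B, or multi-model fan-out logic lives in user code rather than in platform configuration.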

        Bundled deployments

        + +

        To support multiple ML models as part of a single model serving application, we introduced the concept of bundled deployments. Multiple Kubernetes deployments are bundled together as a single model serving application deployment, allowing each component (e.g., models, the Catwalk Orchestrator) to have its own Kubernetes deployment and to scale independently.

        + +
        + +
        +
        + +
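A bundled deployment could be expressed in a manifest along these lines. The `components` field is illustrative only — an assumption about how bundling might look, extending the earlier `ModelServing` example — with each entry mapping to its own Kubernetes deployment and autoscaling policy:

```yaml
apiVersion: ml.catwalk.kubebuilder.io/v1
kind: ModelServing
spec:
  modelMeta:
    modelName: "my-ranking-app"
    modelOwner: john.doe
  components:                        # hypothetical: one k8s deployment each
    - name: orchestrator
      servingType: "catwalk-orchestrator"
      hpa: { min: 2, max: 10, desired: 2 }
    - name: retrieval-model
      servingType: "tensorflow-serving"
      hpa: { min: 1, max: 4, desired: 1 }
    - name: ranking-model
      servingType: "pytorch"
      hpa: { min: 1, max: 6, desired: 2 }
```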

        In addition to the major developments, we implemented other changes to enhance our platform’s efficiency. We made load testing mandatory for all ML application updates to ensure robust performance. This testing process was streamlined with a single command that runs the load test in the staging environment, with the results directly shared with the user.

        + +

        Furthermore, we boosted deployment transparency by sharing deployment details through Slack and Datadog. This empowered users to diagnose issues independently, reducing the dependency on on-call support. This transparency not only improved our issue resolution times but also enhanced user confidence in our platform.

        + +

        The results of these changes speak for themselves. The Catwalk Orchestrator has evolved into our flagship product. In just two years, we have deployed 200 Catwalk Orchestrators serving approximately 1,400 ML models.

        + +

        What’s next?

        + +

        As we continue to innovate and enhance our model serving platform, we are venturing into new territories:

        + +
          +
        • Catwalk serverless: We aim to further abstract the model serving experience, making it even more user-friendly and efficient.
        • +
        • Catwalk data serving: We are looking to extend Catwalk’s capabilities to serve data online, providing a more comprehensive service.
        • +
        • LLM serving: In line with the trend towards generative AI and large language models (LLMs), we’re pivoting Catwalk to support these developments, ensuring we stay at the forefront of the AI and machine learning field.
        • +
        + +

        Stay tuned as we continue to advance our technology and bring these exciting developments to life.

        + +

        Join us

        + +

        Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.

        + +

        Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

        + +
        +
        + + + + +
        +
        + + + +
        + +
        + + +
        +
        +
        + +
        +
        + + + + +
        +
        +
        +
        +
        + + + +
        + + +
        +
        +
        +
        + +

        + Want to join us in our mission to revolutionize transportation? +

        + View open positions + +
        +
        + + + + + + + + + + + diff --git a/feed.json b/feed.json index 093ece71..88bf275c 100644 --- a/feed.json +++ b/feed.json @@ -9,6 +9,72 @@ "items": [ + { + "id": "https://engineering.grab.com/catwalk-evolution", + "url": "https://engineering.grab.com/catwalk-evolution", + "title": "Evolution of Catwalk: Model serving platform at Grab", + "date_published": "Tue, 01 Oct 2024 00:00:50 +0000", + "authors": [ + + + "Vishal Sharma" + , + + + "Wen Bo Wei" + , + + + "Siddharth Pandey" + , + + + "Daniel Tai" + , + + + "Bjorn Jee" + + + ] + } + , + + { + "id": "https://engineering.grab.com/hubble-data-discovery", + "url": "https://engineering.grab.com/hubble-data-discovery", + "title": "Enabling conversational data discovery with LLMs at Grab", + "date_published": "Thu, 26 Sep 2024 00:00:40 +0000", + "authors": [ + + + "Shreyas Parbat" + , + + + "Amanda Ng" + , + + + "Yucheng Zeng" + , + + + "Vinnson Lee" + , + + + "Feng Cheng" + , + + + "Varun Torka" + + + ] + } + , + { "id": "https://engineering.grab.com/live-activity-2", "url": "https://engineering.grab.com/live-activity-2", @@ -223,48 +289,6 @@ ] } - , - - { - "id": "https://engineering.grab.com/data-observability", - "url": "https://engineering.grab.com/data-observability", - "title": "Ensuring data reliability and observability in risk systems", - "date_published": "Tue, 23 Apr 2024 00:15:10 +0000", - "authors": [ - - - "Yi Ni Ong" - , - - - "Kamesh Chandran" - , - - - "Jia Long Loh" - - - ] - } - , - - { - "id": "https://engineering.grab.com/grabx-decision-engine", - "url": "https://engineering.grab.com/grabx-decision-engine", - "title": "Grab Experiment Decision Engine - a Unified Toolkit for Experimentation", - "date_published": "Tue, 09 Apr 2024 02:22:10 +0000", - "authors": [ - - - "Ruike Zhang" - , - - - "Panos Mavrokonstantis" - - - ] - } ] diff --git a/feed.xml b/feed.xml index bf306601..859c0f05 100644 --- a/feed.xml +++ b/feed.xml @@ -6,10 +6,388 @@ https://engineering.grab.com/ - Tue, 01 Oct 2024 
07:05:05 +0000 - Tue, 01 Oct 2024 07:05:05 +0000 + Tue, 01 Oct 2024 08:32:32 +0000 + Tue, 01 Oct 2024 08:32:32 +0000 Jekyll v4.2.0 + + Evolution of Catwalk: Model serving platform at Grab + <h1 id="introduction">Introduction</h1> + +<p>As Southeast Asia’s leading super app, Grab serves millions of users across multiple countries every day. Our services range from ride-hailing and food delivery to digital payments and much more. The backbone of our operations? Machine Learning (ML) models. They power our real-time decision-making capabilities, enabling us to provide a seamless and personalised experience to our users. Whether it’s determining the most efficient route for a ride, suggesting a food outlet based on a user’s preference, or detecting fraudulent transactions, ML models are at the forefront.</p> + +<p>However, serving these ML models at Grab’s scale is no small feat. It requires a robust, efficient, and scalable model serving platform, which is where our ML model serving platform, Catwalk, comes in.</p> + +<p>Catwalk has evolved over time, adapting to the growing needs of our business and the ever-changing tech landscape. It has been a journey of continuous learning and improvement, with each step bringing new challenges and opportunities.</p> + +<h1 id="evolution-of-the-platform">Evolution of the platform</h1> + +<h2 id="phase-0-the-need-for-a-model-serving-platform">Phase 0: The need for a model serving platform</h2> + +<p>Before Catwalk’s debut as our dedicated model serving platform, data scientists across the company employed various ad-hoc approaches to serve ML models. 
These included:</p> + +<ul> + <li>Shipping models online using custom solutions.</li> + <li>Relying on backend engineering teams to deploy and manage trained ML models.</li> + <li>Embedding ML logic within Go backend services.</li> +</ul> + +<p>These methods, however, led to several challenges, undercovering the need for a unified, company-wide platform for serving machine learning models:</p> + +<ul> + <li><strong>Operational overhead</strong>: Data scientists often lacked the necessary expertise to handle the operational aspects of their models, leading to service outages.</li> + <li><strong>Resource wastage</strong>: There was frequently low resource utilisation (e.g., 1%) for data science services, leading to inefficient use of resources.</li> + <li><strong>Friction with engineering teams</strong>: Differences in release cycles and unclear ownership when code was embedded into backend systems resulted in tension between data scientists and engineers.</li> + <li><strong>Reinventing the wheel</strong>: Multiple teams independently attempted to solve model serving problems, leading to a duplication of effort.</li> +</ul> + +<p>​​These challenges highlighted the need for a company-wide, centralised platform for serving machine learning models.</p> + +<h2 id="phase-1-no-code-managed-platform-for-tensorflow-serving-models">Phase 1: No-code, managed platform for TensorFlow Serving models</h2> + +<p>Our initial foray into model serving was centred around creating a managed platform for deploying TensorFlow Serving models. The process involved data scientists submitting their models to the platform’s engineering admin, who could then deploy the model with an endpoint. 
Infrastructure and networking were managed using Amazon Elastic Kubernetes Service (EKS) and Helm Charts as illustrated below.</p> + +<div class="post-image-section"><figure> + <img src="/img/catwalk-evolution/phase1.png" alt="" style="width:80%" /> + </figure> +</div> + +<p>This phase of our platform, which we also detailed in our <a href="https://engineering.grab.com/catwalk-serving-machine-learning-models-at-scale">previous article</a>, was beneficial for some users. However, we quickly encountered scalability challenges:</p> + +<ul> + <li><strong>Codebase maintenance</strong>: Applying changes to every TensorFlow Serving (TFS) version was cumbersome and difficult to maintain.</li> + <li><strong>Limited scalability</strong>: The fully managed nature of the platform made it difficult to scale.</li> + <li><strong>Admin bottleneck</strong>: The engineering admin’s limited bandwidth became a bottleneck for onboarding new models.</li> + <li><strong>Limited serving types</strong>: The platform only supported TensorFlow, limiting its usefulness for data scientists using other frameworks like LightGBM, XGBoost, or PyTorch.</li> +</ul> + +<p>After a year of operation, only eight models were onboarded to the platform, highlighting the need for a more scalable and flexible solution.</p> + +<h2 id="phase-2-from-models-to-model-serving-applications">Phase 2: From models to model serving applications</h2> + +<p>To address the limitations of Phase 1, we transitioned from deploying individual models to self-contained model serving applications. 
This “low-code, self-serving” strategy introduced several new components and changes as illustrated in the points and diagram below:</p> + +<ul> + <li><strong>Support for multiple serving types</strong>: Users gained the ability to deploy models trained with a variety of frameworks like Open Neural Network Exchange (ONNX), PyTorch, and TensorFlow.</li> + <li><strong>Self-served platform through CI/CD pipelines</strong>: Data scientists could self-serve and independently manage their model serving applications through CI/CD pipelines.</li> + <li><strong>New components</strong>: We introduced these new components to support the self-serving approach: + <ul> + <li><strong>Catwalk proxy</strong>, a managed reverse proxy to various serving types.</li> + <li><strong>Catwalk transformer</strong>, a low-code component to transform input and output data.</li> + <li><strong>Amphawa</strong>, a feature fetching component to augment model inputs.</li> + </ul> + </li> +</ul> + +<div class="post-image-section"><figure> + <img src="/img/catwalk-evolution/phase2.png" alt="" style="width:80%" /> + </figure> +</div> + +<h4 id="api-request-flow">API request flow</h4> + +<p>The Catwalk proxy acts as the orchestration layer. Clients send requests to Catwalk proxy then it orchestrates calls to different components like transformers, feature-store, and so on. A typical end to end request flow is illustrated below.</p> + +<div class="post-image-section"><figure> + <img src="/img/catwalk-evolution/phase2-api-request-flow.png" alt="" style="width:80%" /> + </figure> +</div> + +<p>Within a year of implementing these changes, the number of models on the platform increased from 8 to 300, demonstrating the success of this approach. However, new challenges emerged:</p> + +<ul> + <li><strong>Complexity of maintaining Helm chart</strong>: As the platform continued to grow with new components and functionalities, maintaining the Helm chart became increasingly complex. 
The readability and flow control became more challenging, making the helm chart updating process prone to errors.</li> + <li><strong>Process-level mistakes</strong>: The self-serving approach led to errors such as pushing empty or incompatible models to production, setting too few replicas, or allocating insufficient resources, which resulted in service crashes.</li> +</ul> + +<p>We knew that our work was nowhere near done. We had to keep iterating and explore ways to address the new challenges.</p> + +<h2 id="phase-3-replacing-helm-charts-with-kubernetes-crds">Phase 3: Replacing Helm charts with Kubernetes CRDs</h2> + +<p>To tackle the deployment challenges and gain more control, we made the significant decision to replace Helm charts with Kubernetes Custom Resource Definitions (CRDs). This required substantial engineering effort, but the outcomes have been rewarding. This transition gave us improved control over deployment pipelines, enabling customisations such as:</p> + +<ul> + <li>Smart defaults for AutoML</li> + <li>Blue-green deployments</li> + <li>Capacity management</li> + <li>Advanced scaling</li> + <li>Application set groupings</li> +</ul> + +<p>Below is an example of a simple model serving CRD manifest:</p> + +<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: ml.catwalk.kubebuilder.io/v1 +kind: ModelServing +spec: + hpa: + desired: 1 + max: 1 + min: 1 + modelMeta: + modelName: "my-model" + modelOwner: john.doe + proxyLayer: + enableLogging: true + logHTTPBody: true + servingLayer: + servingType: "tensorflow-serving" + version: "20" +</code></pre></div></div> + +<h4 id="model-serving-crd-deployment-state-machine">Model serving CRD deployment state machine</h4> + +<p>Every model serving CRD submission follows a sequence of steps. If there are failures at any step, the controller keeps retrying after small intervals. 
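</p>

<p>In outline, this retry-until-ready behaviour is a simple loop over the deployment steps (an illustrative Python sketch; the actual controller is a Kubernetes operator built with kubebuilder, and a real controller re-queues indefinitely rather than capping attempts):</p>

```python
import time

# Step names below mirror the deployment cycle; they are illustrative,
# not the controller's real function names.
STEPS = [
    "validate_spec",
    "cleanup_non_ready_deployments",
    "create_new_deployment",
    "switch_traffic",
    "cleanup_old_deployment",
]

def reconcile(run_step, retry_interval=5.0, max_attempts=3):
    """Run each step in order, retrying a failed step after a small interval."""
    for step in STEPS:
        for _ in range(max_attempts):
            if run_step(step):
                break  # step succeeded; move on to the next one
            time.sleep(retry_interval)
        else:
            return False  # step kept failing; reconciliation is incomplete
    return True
```

<p>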
The major steps on the deployment cycle are described below:</p> + +<ol> + <li>Validate whether the new CRD specs are acceptable. Along with sanity checks, we also enforce a lot of platform constraints through this step.</li> + <li>Clean up previous non-ready deployment resources. Sometimes a deployment submission might keep crashing and hence it doesn’t proceed to a ready state. On every submission, it’s important to check and clean up such previous deployments.</li> + <li>Create resources for the new deployment and ensure that the new deployment is ready.</li> + <li>Switch traffic from old deployment to the new deployment.</li> + <li>Clean up resources for old deployment. At this point, traffic is already being served by the new deployment resources. So, we can clean up the old deployment.</li> +</ol> + +<div class="post-image-section"><figure> + <img src="/img/catwalk-evolution/phase3.png" alt="" style="width:80%" /> + </figure> +</div> + +<h2 id="phase-4-transition-to-a-high-code-self-served-process-managed-platform">Phase 4: Transition to a high-code, self-served, process-managed platform</h2> + +<p>As the number of model serving applications and use cases multiplied, clients sought greater control over orchestrations between different models, experiment executions, traffic shadowing, and responses archiving. To cater to these needs, we introduced several changes and components with the Catwalk Orchestrator, a high code orchestration solution, leading the pack.</p> + +<h4 id="catwalk-orchestrator">Catwalk orchestrator</h4> + +<p>The <strong>Catwalk Orchestrator</strong> is a highly abstracted framework for building ML applications that replaced the catwalk-proxy from previous phases. The key difference is that users can now write their own business/orchestration logic. The orchestrator offers a range of utilities, reducing the need for users to write extensive boilerplate code. 
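</p>

<p>For illustration, user-written orchestration logic on such a framework might look like the following (all names here are hypothetical, not Catwalk’s real API; they only show the division of labour between framework-provided utilities and user-owned business logic):</p>

```python
# Hypothetical sketch: the framework supplies model clients and a feature
# bank client; the user writes only the orchestration in `handle`.
class RankingApp:
    def __init__(self, models, feature_bank):
        self.models = models              # e.g. TensorFlow/ONNX model clients
        self.feature_bank = feature_bank  # feature-fetching client

    def handle(self, request):
        # Augment the request with features, fan out to two models,
        # and combine their scores -- pure business logic.
        features = self.feature_bank.fetch(request["user_id"])
        score_a = self.models["tensorflow"].predict(features)
        score_b = self.models["onnx"].predict(features)
        return {"score": 0.7 * score_a + 0.3 * score_b}
```

<p>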
Key components of the Catwalk Orchestrator include HTTP server, gRPC server, clients for different model serving flavours (TensorFlow, ONNX, PyTorch, etc), client for fetching features from the feature bank, and utilities for logging, metrics, and data lake ingestion.</p> + +<p>The Catwalk Orchestrator is designed to streamline the deployment of machine learning models. Here’s a typical user journey:</p> + +<ol> + <li><strong>Scaffold a model serving application</strong>: Users begin by scaffolding a model serving application using a command-line tool.</li> + <li><strong>Write business logic</strong>: Users then write the business logic for the application.</li> + <li><strong>Deploy to staging</strong>: The application is then deployed to a staging environment for testing.</li> + <li><strong>Complete load testing</strong>: Users test the application in the staging environment and complete load testing to ensure it can handle the expected traffic.</li> + <li><strong>Deploy to production</strong>: Once testing is completed, the application is deployed to the production environment.</li> +</ol> + +<h4 id="bundled-deployments">Bundled deployments</h4> + +<p>To support multiple ML models as part of a single model serving application, we introduced the concept of <strong>bundled deployments</strong>. Multiple Kubernetes deployments are bundled together as a single model serving application deployment, allowing each component (e.g., models, catwalk-orchestrator, etc) to have its own Kubernetes deployment and to scale independently.</p> + +<div class="post-image-section"><figure> + <img src="/img/catwalk-evolution/phase4.png" alt="" style="width:80%" /> + </figure> +</div> + +<p>In addition to the major developments, we implemented other changes to enhance our platform’s efficiency. We made <strong>load testing</strong> mandatory for all ML application updates to ensure robust performance. 
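</p>

<p>At its core, such a load test just fires concurrent requests at the serving endpoint and reports latency percentiles; a toy version of the measurement (the payload shape and defaults are made up):</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(predict, total_requests=100, concurrency=8):
    """Call `predict` concurrently and report simple latency percentiles."""
    def timed_call(_):
        start = time.perf_counter()
        predict({"instances": [[1.0, 2.0]]})  # illustrative payload
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))

    def percentile(q):
        return latencies[min(int(q * len(latencies)), len(latencies) - 1)]

    return {"p50": percentile(0.50), "p99": percentile(0.99), "max": latencies[-1]}
```

<p>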
This testing process was streamlined with a single command that runs the load test in the staging environment, with the results directly shared with the user.</p> + +<p>Furthermore, we boosted <strong>deployment transparency</strong> by sharing deployment details through Slack and Datadog. This empowered users to diagnose issues independently, reducing the dependency on on-call support. This transparency not only improved our issue resolution times but also enhanced user confidence in our platform.</p> + +<p>The results of these changes speak for themselves. The Catwalk Orchestrator has evolved into our flagship product. In just two years, we have deployed 200 Catwalk Orchestrators serving approximately 1,400 ML models.</p> + +<h1 id="whats-next">What’s next?</h1> + +<p>As we continue to innovate and enhance our model serving platform, we are venturing into new territories:</p> + +<ul> + <li><strong>Catwalk serverless</strong>: We aim to further abstract the model serving experience, making it even more user-friendly and efficient.</li> + <li><strong>Catwalk data serving</strong>: We are looking to extend Catwalk’s capabilities to serve data online, providing a more comprehensive service.</li> + <li><strong>LLM serving</strong>: In line with the trend towards generative AI and large language models (LLMs), we’re pivoting Catwalk to support these developments, ensuring we stay at the forefront of the AI and machine learning field.</li> +</ul> + +<p>Stay tuned as we continue to advance our technology and bring these exciting developments to life.</p> + +<h1 id="join-us">Join us</h1> + +<p>Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. 
More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.</p> + +<p>Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, <a href="https://grab.careers/">join our team</a> today!</p> + + Tue, 01 Oct 2024 00:00:50 +0000 + https://engineering.grab.com/catwalk-evolution + https://engineering.grab.com/catwalk-evolution + + Machine Learning + + Models + + Data Science + + TensorFlow + + Kubernetes + + Docker + + + Engineering + + Data Science + + + + + Enabling conversational data discovery with LLMs at Grab + <p>Imagine a world where finding the right data is like searching for a needle in a haystack. In today’s data-driven landscape, companies are drowning in a sea of information, struggling to navigate through countless datasets to uncover valuable insights. At Grab, we faced a similar challenge. With over 200,000 tables in our data lake, along with numerous Kafka streams, production databases, and ML features, locating the most suitable dataset for our Grabber’s use cases promptly has historically been a significant hurdle.</p> + +<h2 id="problem-space">Problem Space</h2> + +<p>Our internal data discovery tool, Hubble, built on top of the popular open-source platform Datahub, was primarily used as a reference tool. While it excelled at providing metadata for known datasets, it struggled with true data discovery due to its reliance on Elasticsearch, which performs well for keyword searches but cannot accept and use user-provided context (i.e., it can’t perform semantic search, at least in its vanilla form). 
The Elasticsearch parameters provided by Datahub out of the box also had limitations: our monthly average click-through rate was only 82%, meaning that in 18% of sessions, users abandoned their searches without clicking on any dataset. This suggested that the search results were not meeting their needs.</p> + +<p>Another indispensable requirement for efficient data discovery that was missing at Grab was documentation. Documentation coverage for our data lake tables was low, with only 20% of the most frequently queried tables (colloquially referred to as P80 tables) having existing documentation. This made it difficult for users to understand the purpose and contents of different tables, even when browsing through them on the Hubble UI.</p> + +<p>Consequently, data consumers heavily relied on tribal knowledge, often turning to their colleagues via Slack to find the datasets they needed. A survey conducted last year revealed that 51% of data consumers at Grab took multiple days to find the dataset they required, highlighting the inefficiencies in our data discovery process.</p> + +<p>To address these challenges and align with Grab’s ongoing journey towards a data mesh architecture, the Hubble team recognised the importance of improving data discovery. We embarked on a journey to revolutionise the way our employees find and access the data they need, leveraging the power of AI and Large Language Models (LLMs).</p> + +<h2 id="vision">Vision</h2> + +<p>Given the historical context, our vision was clear: to remove humans in the data discovery loop by automating the entire process using LLM-powered products. 
We aimed to reduce the time taken for data discovery from multiple days to mere seconds, eliminating the need for anyone to ask their colleagues data discovery questions ever again.</p> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image2.png" alt="" style="width:80%" /> + </figure> +</div> + +<h2 id="goals">Goals</h2> + +<p>To achieve our vision, we set the following goals for ourselves for the first half of 2024:</p> + +<ul> + <li><strong>Build HubbleIQ:</strong> An LLM-based chatbot that could serve as the equivalent of a Lead Data Analyst for data discovery. Just as a lead is an expert in their domain and can guide data consumers to the right dataset, we wanted HubbleIQ to do the same across all domains at Grab. We also wanted HubbleIQ to be accessible where data consumers hang out the most: Slack.</li> + <li><strong>Improve documentation coverage:</strong> A new Lead Analyst joining the team would require extensive documentation coverage of very high quality. Without this, they wouldn’t know what data exists and where. 
Thus, it was important for us to improve documentation coverage.</li> + <li><strong>Enhance Elasticsearch:</strong> We aimed to tune our Elasticsearch implementation to better meet the requirements of Grab’s data consumers.</li> +</ul> + +<h2 id="a-systematic-path-to-success">A Systematic Path to Success</h2> + +<h3 id="step-1-enhance-elasticsearch">Step 1: Enhance Elasticsearch</h3> + +<p>Through clickstream analysis and user interviews, the Hubble team identified four categories of data search queries that were seen either on the Hubble UI or in Slack channels:</p> + +<ul> + <li><strong>Exact search:</strong> Queries belonging to this category were a substring of an existing dataset’s name at Grab, with the query being at least 40% of the length of the dataset’s name.</li> + <li><strong>Partial search:</strong> The Levenshtein similarity ratio between a query in this category and an existing dataset’s name was greater than 80. This category usually comprised queries that closely resembled an existing dataset name but likely contained spelling mistakes or were shorter than the actual name.</li> +</ul> + +<p>Exact and partial searches accounted for 75% of searches on Hubble (and were non-existent on Slack: as a human, receiving a message that just had the name of a dataset would feel rather odd). Given the effectiveness of vanilla Elasticsearch for these categories, the average click rank was close to 0, i.e. users clicked the topmost results.</p> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image8.png" alt="" style="width:80%" /> + </figure> +</div> + +<ul> + <li><strong>Inexact search:</strong> This category comprised queries that were usually colloquial keywords or phrases that may be semantically related to a given table, column, or piece of documentation (e.g., “city” or “taxi type”). Inexact searches accounted for the remaining 25% of searches on Hubble.
Vanilla Elasticsearch did not perform well in this category since it relied on pure keyword matching and did not consider any additional context.</li> +</ul> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image1.png" alt="" style="width:80%" /> + </figure> +</div> + +<ul> + <li><strong>Semantic search:</strong> These were free text queries with abundant contextual information supplied by the user. Hubble did not see any such queries as users rightly expected that Hubble would not be able to fulfil their search needs. Instead, these queries were sent by data consumers to data producers via Slack. Such queries were numerous, but usually resulted in data hunting journeys that spanned multiple days - the root of frustration amongst data consumers.</li> +</ul> + +<p>The first two search types can be seen as “reference” queries, where the data consumer already knows what they are looking for. Inexact and contextual searches are considered “discovery” queries. The Hubble team noticed drop-offs in inexact searches because users learned that Hubble could not fulfil their discovery needs, forcing them to search for alternatives.</p> + +<p>Through user interviews, the team discovered how Elasticsearch should be tuned to better fit the Grab context. 
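</p>

<p>Boosting and deboosting of this kind can be expressed with Elasticsearch’s standard function_score query. The sketch below uses illustrative field names and weights, not Hubble’s actual index schema:</p>

```python
# Score multipliers: lift P80 and certified tables, bury deprecated ones.
search_body = {
    "query": {
        "function_score": {
            "query": {"multi_match": {
                "query": "taxi bookings",
                "fields": ["name^3", "description"],  # weight name matches higher
            }},
            "functions": [
                {"filter": {"term": {"tags": "p80"}}, "weight": 2.0},
                {"filter": {"term": {"tags": "certified"}}, "weight": 1.5},
                {"filter": {"term": {"deprecated": True}}, "weight": 0.1},
            ],
            "score_mode": "multiply",
        }
    }
}
```

<p>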
They implemented the following optimisations:</p> + +<ul> + <li>Tagging and boosting P80 tables</li> + <li>Boosting the most relevant schemas</li> + <li>Hiding irrelevant datasets like PowerBI dataset tables</li> + <li>Deboosting deprecated tables</li> + <li>Improving the search UI by simplifying and reducing clutter</li> + <li>Adding relevant tags</li> + <li>Boosting certified tables</li> +</ul> + +<p>As a result of these enhancements, the click-through rate rose steadily over the course of the half to 94%, a 12 percentage point increase.</p> + +<p>While this helped us make significant improvements to the first three search categories, we knew we had to build HubbleIQ to truly automate the last category - semantic search.</p> + +<h3 id="step-2-build-a-context-store-for-hubbleiq">Step 2: Build a Context Store for HubbleIQ</h3> + +<p>To support HubbleIQ, we built a documentation generation engine that used GPT-4 to generate documentation based on table schemas and sample data. We refined the prompt through multiple iterations of feedback from data producers.</p> + +<p>We added a “generate” button on the Hubble UI, allowing data producers to easily generate documentation for their tables. This feature also supported the ongoing Grab-wide initiative to certify tables.</p> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image7.png" alt="" /> + </figure> +</div> + +<p>In conjunction, we took the initiative to pre-populate docs for the most critical tables, while notifying data producers to review the generated documentation. Such docs were visible to data consumers with an “AI-generated” tag as a precaution. When data producers accepted or edited the documentation, the tag was removed.</p> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image3.png" alt="" /> + </figure> +</div> + +<p>As a result, documentation coverage for P80 tables increased by 70 percentage points to ~90%. 
User feedback showed that ~95% of users found the generated docs useful.</p> + +<h3 id="step-3-build-and-launch-hubbleiq">Step 3: Build and Launch HubbleIQ</h3> + +<p>With high documentation coverage in place, we were ready to harness the power of LLMs for data discovery. To speed up go-to-market, we decided to leverage <a href="https://www.glean.com/">Glean</a>, an enterprise search tool used by Grab.</p> + +<p>First, we integrated Hubble with Glean, making all data lake tables with documentation available on the Glean platform. Next, we used <a href="https://www.glean.com/product/apps">Glean Apps</a> to create the HubbleIQ bot, which was essentially an LLM with a custom system prompt that could access all Hubble datasets that were catalogued on Glean. Finally, we integrated this bot into Hubble search, such that for any search that is likely to be a semantic search, HubbleIQ results are shown on top, followed by regular search results.</p> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image5.png" alt="" /> + </figure> +</div> + +<p>Recently, we integrated HubbleIQ with Slack, allowing data consumers to discover datasets without breaking their flow. Currently, we are working with analytics teams to add the bot to their “ask” channels (where data consumers come to ask contextual search queries for their domains). After integration, HubbleIQ will act as the first line of defence for answering questions in these channels, reducing the need for human intervention.</p> + +<div class="post-image-section"><figure> + <img src="/img/hubble-data-discovery/image4.png" alt="" style="width:80%" /> + </figure> +</div> + +<p>The impact of these improvements was significant. A follow-up survey revealed that 73% of respondents found it easy to discover datasets, marking a substantial 17 percentage point increase from the previous survey. 
Moreover, Hubble reached an all-time high in monthly active users, demonstrating the effectiveness of the enhancements made to the platform.</p> + +<h2 id="next-steps">Next Steps</h2> + +<p>We’ve made significant progress towards our vision, but there’s still work to be done. Looking ahead, we have several exciting initiatives planned to further enhance data discovery at Grab.</p> + +<p>On the documentation generation front, we aim to enrich the generator with more context, enabling it to produce even more accurate and relevant documentation. We also plan to streamline the process by allowing analysts to auto-update data docs based on Slack threads directly from Slack. To ensure the highest quality of documentation, we will develop an evaluator model that leverages LLMs to assess the quality of both human and AI-written docs. Additionally, we will implement Reflexion, an agentic workflow that utilises the outputs from the doc evaluator to iteratively regenerate docs until a quality benchmark is met or a maximum try-limit is reached.</p> + +<p>As for HubbleIQ, our focus will be on continuous improvement. We’ve already added support for metric datasets and are actively working on incorporating other types of datasets as well. To provide a more seamless user experience, we will enable users to ask follow-up questions to HubbleIQ directly on the HubbleUI, with the system intelligently pulling additional metadata when a user mentions a specific dataset.</p> + +<h2 id="conclusion">Conclusion</h2> + +<p>By harnessing the power of AI and LLMs, the Hubble team has made significant strides in improving documentation coverage, enhancing search capabilities, and drastically reducing the time taken for data discovery. While our efforts so far have been successful, there are still steps to be taken before we fully achieve our vision of completely replacing the reliance on data producers for data discovery. 
Nonetheless, with our upcoming initiatives and the groundwork we have laid, we are confident that we will continue to make substantial progress in the right direction over the next few production cycles.</p> + +<p>As we forge ahead, we remain dedicated to refining and expanding our AI-powered data discovery tools, ensuring that Grabbers have every dataset they need to drive Grab’s success at their fingertips. The future of data discovery at Grab is brimming with possibilities, and the Hubble team is thrilled to be at the forefront of this exciting journey.</p> + +<p>To our readers, we hope that our journey has inspired you to explore how you can leverage the power of AI to transform data discovery within your own organisations. The challenges you face may be unique, but the principles and strategies we have shared can serve as a foundation for your own data discovery revolution. By embracing innovation, focusing on user needs, and harnessing the potential of cutting-edge technologies, you too can unlock the full potential of your data and propel your organisation to new heights. The future of data-driven innovation is here, and we invite you to join us on this exhilarating journey.</p> + +<h1 id="join-us">Join us</h1> + +<p>Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.</p> + +<p>Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. 
If this mission speaks to you, <a href="https://grab.careers/">join our team</a> today!</p> + + Thu, 26 Sep 2024 00:00:40 +0000 + https://engineering.grab.com/hubble-data-discovery + https://engineering.grab.com/hubble-data-discovery + + Data Discovery + + AI + + LLM + + Documentation + + Elasticsearch + + + Engineering + + + Bringing Grab’s Live Activity to Android: Enhancing user experience through custom notifications <p>In May 2023, Grab unveiled the Live Activity feature for iOS, which received positive feedback from users. Live Activity is a feature that enhances user experience by displaying a user interface (UI) outside of the app, delivering real-time updates and interactive content. At Grab, we leverage this feature to keep users informed about their order updates without requiring them to manually open the app.</p> @@ -1641,403 +2019,5 @@ By factoring in both relevance and recency, MAM avoids crediting the same touchp - - Ensuring data reliability and observability in risk systems - <p>Grab has an in-house Risk Management platform called <a href="https://www.grab.com/sg/business/defence/">GrabDefence</a> which relies on ingesting large amounts of data gathered from upstream services to power our heuristic risk rules and data science models in real time.</p> - -<div class="post-image-section"><figure> - <img src="/img/data-observability/image4.png" alt="" style="width:80%" /><figcaption align="middle">Fig 1. GrabDefence aggregates data from different upstream services</figcaption> - </figure> -</div> - -<p>As Grab’s business grows, so does the amount of data. 
It becomes imperative that the data which fuels our risk systems is of reliable quality as any data discrepancy or missing data could impact fraud detection and prevention capabilities.</p> - -<p>We need to quickly detect any data anomalies, which is where data observability comes in.</p> - -<h2 id="data-observability-as-a-solution">Data observability as a solution</h2> - -<p>Data observability is a type of data operation (DataOps; similar to DevOps) where teams build visibility over the health and quality of their data pipelines. This enables teams to be notified of data quality issues, and allows teams to investigate and resolve these issues faster.</p> - -<p>We needed a solution that addresses the following issues:</p> - -<ol> - <li>Alerts for any data quality issues as soon as possible - so this means the observability tool had to work in real time.</li> - <li>With hundreds of data points to observe, we needed a neat and scalable solution which allows users to quickly pinpoint which data points were having issues.</li> - <li>A consistent way to compare, analyse, and compute data that might have different formats.</li> -</ol> - -<p>Hence, we decided to use Flink to standardise data transformations, compute, and observe data trends quickly (in real time) and scalably.</p> - -<h2 id="utilising-flink-for-real-time-computations-at-scale">Utilising Flink for real-time computations at scale</h2> - -<h3 id="what-is-flink">What is Flink?</h3> - -<p>Flink SQL is a powerful, flexible tool for performing real-time analytics on streaming data. 
It allows users to query continuous data streams using standard SQL syntax, enabling complex event processing and data transformation within the Apache Flink ecosystem, which is particularly useful for scenarios requiring low-latency insights and decisions.</p> - -<h3 id="how-we-used-flink-to-compute-data-output">How we used Flink to compute data output</h3> - -<p>In Grab, data comes from multiple sources and while most of the data is in JSON format, the actual JSON structure differs between services. Because of JSON’s nested and dynamic data structure, it is difficult to consistently analyse the data – posing a significant challenge for real-time analysis.</p> - -<p>To help address this issue, Apache Flink SQL has the capability to manage such intricacies with ease. It offers specialised functions tailored for parsing and querying JSON data, ensuring efficient processing.</p> - -<p>Another standout feature of Flink SQL is the use of custom table functions, such as JSONEXPLOAD, which serves to deconstruct and flatten nested JSON structures into tabular rows. This transformation is crucial as it enables subsequent aggregation operations. By implementing a 5-minute tumbling window, Flink SQL can easily aggregate these now-flattened data streams. This technique is pivotal for monitoring, observing, and analysing data patterns and metrics in near real-time.</p> - -<p>Now that data is aggregated by Flink for easy analysis, we still needed a way to incorporate comprehensive monitoring so that teams could be notified of any data anomalies or discrepancies in real time.</p> - -<h3 id="how-we-interfaced-the-output-with-datadog">How we interfaced the output with Datadog </h3> - -<p>Datadog is the observability tool of choice in Grab, with many teams using Datadog for their service reliability observations and alerts. By aggregating data from Apache Flink and integrating it with Datadog, we can harness the synergy of real-time analytics and comprehensive monitoring. 
Flink excels in processing and aggregating data streams, which, when pushed to Datadog, can be further analysed and visualised. Datadog also provides seamless integration with collaboration tools like Slack, which enables teams to receive instant notifications and alerts.</p> - -<p>With Datadog’s out-of-the-box features such as anomaly detection, teams can identify and be alerted to unusual patterns or outliers in their data streams. Taking a proactive approach to monitoring is crucial in maintaining system health and performance as teams can be alerted, then collaborate quickly to diagnose and address anomalies.</p> - -<p>This integrated pipeline—from Flink’s real-time data aggregation to Datadog’s monitoring and Slack’s communication capabilities—creates a robust framework for real-time data operations. It ensures that any potential issues are quickly traced and brought to the team’s attention, facilitating a rapid response. Such an ecosystem empowers organisations to maintain high levels of system reliability and performance, ultimately enhancing the overall user experience.</p> - -<h2 id="organising-monitors-and-alerts-using-out-of-the-box-solutions-from-datadog">Organising monitors and alerts using out-of-the-box solutions from Datadog</h2> - -<p>Once we integrated Flink data into Datadog, we realised that it could become unwieldy to try to identify the data point with issues from hundreds of other counters.</p> - -<div class="post-image-section"><figure> - <img src="/img/data-observability/image3.png" alt="" style="width:80%" /><figcaption align="middle">Fig 2. Hundreds of data points on a graph make it hard to decipher which ones have issues</figcaption> - </figure> -</div> - -<p>We decided to organise the counters according to the service stream it was coming from, and create individual monitors for each service stream. 
We used Datadog’s Monitor Summary tool to help visualise the total number of service streams we are reading from and the number of underlying data points within each stream.  </p> - -<div class="post-image-section"><figure> - <img src="/img/data-observability/image2.png" alt="" style="width:80%" /><figcaption align="middle">Fig 3. Data is grouped according to their source stream</figcaption> - </figure> -</div> - -<p>Within each individual stream, we used Datadog’s <a href="https://docs.datadoghq.com/monitors/types/anomaly/">Anomaly Detection</a> feature to create an alert whenever a data point from the stream exceeds a predefined threshold. This can be configured by the service teams on Datadog.</p> - -<div class="post-image-section"><figure> - <img src="/img/data-observability/image1.png" alt="" style="width:80%" /><figcaption align="middle">Fig 4. Datadog’s built-in Anomaly Detection function triggers alerts whenever a data point exceeds a threshold</figcaption> - </figure> -</div> - -<p>These alerts are then sent to a Slack channel where the Data team is informed when a data point of interest starts throwing anomalous values.</p> - -<div class="post-image-section"><figure> - <img src="/img/data-observability/image5.png" alt="" style="width:80%" /><figcaption align="middle">Fig 5. Datadog integration with Slack to help alert users</figcaption> - </figure> -</div> - -<h2 id="impact">Impact</h2> - -<p>Since the deployment of this data observability tool, we have seen significant improvement in the detection of anomalous values. 
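Datadog's anomaly detection algorithms are proprietary, but the underlying idea of the alerts described above can be sketched as a rolling z-score check: flag any data point that deviates from its trailing baseline by more than a threshold number of standard deviations. The function and threshold below are illustrative, not Datadog's implementation.

```python
import statistics


def anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing window's
    mean by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.stdev(baseline)
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged


# A steady counter with one sudden spike -- the kind of anomalous
# data point the monitors are designed to surface.
stream = [100, 101, 99, 100, 102, 101, 100, 500, 101, 99]
print(anomalies(stream))  # [7]
```

In the real setup, the flagged point triggers a Datadog monitor, which in turn posts the alert to the team's Slack channel.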
If there are any anomalies or issues, we now get alerts within the same day (or hour) instead of days to weeks later.</p> - -<p>Organising the alerts according to source streams has also helped simplify the monitoring load and allows users to quickly narrow down and identify which pipeline has failed.</p> - -<h2 id="whats-next">What’s next?</h2> - -<p>At the moment, this data observability tool is only implemented on selected checkpoints in GrabDefence. We plan to expand the observability tool’s coverage to include more checkpoints, and continue to refine the workflows to detect and resolve these data issues.</p> - -<h1 id="join-us">Join us</h1> - -<p>Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.</p> - -<p>Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, <a href="https://grab.careers/">join our team</a> today!</p> - - Tue, 23 Apr 2024 00:15:10 +0000 - https://engineering.grab.com/data-observability - https://engineering.grab.com/data-observability - - Data Science - - Security - - Risk - - Data observability - - Data reliability - - - Data Science - - Engineering - - Security - - - - - Grab Experiment Decision Engine - a Unified Toolkit for Experimentation - <h2 id="introduction">Introduction</h2> - -<p>This article introduces the GrabX Decision Engine, an internal open-source package that offers a comprehensive framework for designing and analysing experiments conducted on online experiment platforms. 
The package encompasses a wide range of functionalities, including a pre-experiment advisor, a post-experiment analysis toolbox, and other advanced tools. In this article, we explore the motivation behind the development of these functionalities, their integration into the unique ecosystem of Grab’s multi-sided marketplace, and how these solutions strengthen the culture and calibre of experimentation at Grab.</p> - -<h2 id="background">Background</h2> - -<p>Today, <a href="/building-grab-s-experimentation-platform">Grab’s Experimentation (GrabX) platform</a> orchestrates the testing of thousands of experimental variants each week. As the platform continues to expand and manage a growing volume of experiments, the need for dependable, scalable, and trustworthy experimentation tools becomes increasingly critical for data-driven and evidence-based -decision-making.</p> - -<p>In our previous article, we presented the <a href="https://engineering.grab.com/automated-experiment-analysis">Automated Experiment Analysis</a> application, a tool designed to automate data pipelines for analyses. However, during the development of this application for Grab’s experimenter community, we noticed a prevailing trend: experiments were predominantly analysed on a one-by-one, manual basis. While such a federated approach may be needed in a few cases, it presents numerous challenges at -the organisational level:</p> - -<ul> - <li><strong>Lack of a contextual toolkit</strong>: GrabX facilitates executing a diverse range of experimentation designs, catering to the varied needs and contexts of different tech teams across the organisation. However, experimenters may often rely on generic online tools for experiment configurations (e.g. sample size calculations), which were not specifically designed to cater to the nuances of GrabX experiments or the recommended evaluation method, given the design. 
This is exacerbated by the fact -that most online tutorials or courses on experimental design do not typically address the nuances of multi-sided marketplaces, and cannot consider the nature or constraints of specific experiments.</li> - <li><strong>Lack of standards</strong>: In this federated model, the absence of standardised and vetted practices can lead to reliability issues. In some cases, these can include poorly designed experiments, inappropriate evaluation methods, suboptimal testing choices, and unreliable inferences, all of which are difficult to monitor and rectify.</li> - <li><strong>Lack of scalability and efficiency</strong>: Experimenters, coming from varied backgrounds and possessing distinct skill sets, may adopt significantly different approaches to experimentation and inference. This diversity, while valuable, often impedes the transferability and sharing of methods, hindering a cohesive and scalable experimentation framework. Additionally, this variance in methods can extend the lifecycle of experiment analysis, as disagreements over approaches may give rise to -repeated requests for review or modification.</li> -</ul> - -<h2 id="solution">Solution</h2> - -<p>To address these challenges, we developed the GrabX Decision Engine, a Python package open-sourced internally across all of Grab’s development platforms. Its central objective is to institutionalise best practices in experiment efficiency and analytics, thereby ensuring the derivation of precise and reliable conclusions from each experiment.</p> - -<p>In particular, this unified toolkit significantly enhances our end-to-end experimentation processes by:</p> - -<ul> - <li><strong>Ensuring compatibility with GrabX and Automated Experiment Analysis</strong>: The package is fully integrated with the <a href="https://engineering.grab.com/automated-experiment-analysis">Automated Experiment Analysis</a> app, and provides analytics and test results tailored to the designs supported by GrabX. 
The outcomes can be further used for other downstream jobs, e.g. market modelling, simulation-based calibrations, or auto-adaptive configuration tuning.</li> - <li><strong>Standardising experiment analytics</strong>: By providing a unified framework, the package ensures that the rationale behind experiment design and the interpretation of analysis results adhere to a company-wide standard, promoting consistency and ease of review across different teams.</li> - <li><strong>Enhancing collaboration and quality</strong>: As an open-source package, it not only fosters a collaborative culture but also upholds quality through peer reviews. It invites users to tap into a rich pool of features while encouraging contributions that refine and expand the toolkit’s capabilities.</li> -</ul> - -<p>The package is designed for everyone involved in the experimentation process, with data scientists and product analysts being the primary users. Referred to as experimenters in this article, these key stakeholders can not only leverage the existing capabilities of the package to support their projects, but can also contribute their own innovations. Eventually, the experiment results and insights generated from the package via the <a href="https://engineering.grab.com/automated-experiment-analysis">Automated Experiment Analysis</a> app have an even wider reach to stakeholders across all functions.</p> - -<p>In the following section, we go deeper into the key functionalities of the package.</p> - -<h2 id="feature-details">Feature details</h2> - -<p>The package comprises three key components:</p> - -<ul> - <li>An experimentation trusted advisor</li> - <li>A comprehensive post-experiment analysis toolbox</li> - <li>Advanced tools</li> -</ul> - -<p>These have been built taking into account the type of experiments we typically run at Grab. 
To understand their functionality, it’s useful to first discuss the key experimental designs supported by GrabX.</p> - -<h3 id="a-note-on-experimental-designs">A note on experimental designs</h3> - -<p>While there is a wide variety of specific experimental designs implemented, they can be bucketed into two main categories: a <strong>between-subject</strong> design and a <strong>within-subject</strong> design.</p> - -<p>In a between-subject design, participants — like our app users, driver-partners, and merchant-partners — are split into experimental groups, and each group gets exposed to a distinct condition throughout the experiment. One challenge in this design is that each participant may provide multiple observations to our experimental analysis sample, causing a high within-subject correlation among observations and deviations between the randomisation and session unit. This can affect the accuracy of -pre-experiment power analysis, and post-experiment inference, since it necessitates adjustments, e.g. clustering of standard errors when conducting hypothesis testing.</p> - -<p>Conversely, a within-subject design involves every participant experiencing all conditions. Marketplace-level switchback experiments are a common GrabX use case, where a timeslice becomes the experimental unit. This design not only faces the aforementioned challenges, but also creates other complications that need to be accounted for, such as spillover effects across timeslices.</p> - -<p>Designing and analysing the results of both experimental approaches requires careful nuanced statistical tools. 
Ensuring a proper duration and sample size, controlling for confounders, and addressing potential biases are important considerations to enhance the validity of the results.</p> - -<h3 id="trusted-advisor">Trusted Advisor</h3> - -<p>The first key component of the Decision Engine is the Trusted Advisor, which provides a recommendation to the experimenter on key experiment attributes to be considered when preparing the experiment. This is dependent on the design; at a minimum, the experimenter needs to define whether the experiment design is between- or within-subject.</p> - -<p><strong>The between-subject design</strong>: We strongly recommend that experimenters utilise the “Trusted Advisor” feature in the Decision Engine for estimating their required sample size. This is designed to account for the multiple observations per user the experiment is expected to generate and adjusts for the presence of clustered errors (Moffatt, 2020; List, Sadoff, &amp; Wagner, 2011). This feature allows users to input their data, either as a PySpark or Pandas dataframe. Alternatively, a function is provided to extract summary statistics from their data, which can then be input into the Trusted Advisor. Obtaining the data beforehand is not mandatory; users have the option to directly query the recommended sample size based on common metrics derived from a regular data pipeline job. These functionalities are illustrated in the flowchart below.</p> - -<div class="post-image-section"><figure> - <img src="/img/grabx-decision-engine/image1.png" alt="" style="width:80%" /><figcaption align="middle">Trusted Advisor functionalities</figcaption> - </figure> -</div> - -<p>Furthermore, the Trusted Advisor feature can identify the underlying characteristics of the data, whether it’s passed directly, or queried from our common metrics database. This enables it to determine the appropriate power analysis for the experiment, without further guidance. 
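The package's internal API is not shown here, but the kind of clustering adjustment the Trusted Advisor automates can be illustrated with a textbook two-sample sample-size formula inflated by a design effect of 1 + (m − 1)·ICC for repeated observations per user. The function name, and the choice of this particular formula, are the author's illustrative assumptions, not the package's actual method.

```python
import math
from statistics import NormalDist


def sample_size_clustered(mde, sd, obs_per_user, icc, alpha=0.05, power=0.8):
    """Users needed per arm to detect an absolute effect `mde`, inflating
    the i.i.d. requirement by the design effect 1 + (m - 1) * ICC."""
    z = NormalDist().inv_cdf
    # Standard two-sample formula for independent observations.
    n_iid = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / mde) ** 2
    # Clustering inflation: repeated observations per user are correlated.
    deff = 1 + (obs_per_user - 1) * icc
    return math.ceil(n_iid * deff / obs_per_user)


# Ignoring within-user correlation (ICC = 0) understates the requirement:
print(sample_size_clustered(mde=0.1, sd=1.0, obs_per_user=5, icc=0.0))  # 314
print(sample_size_clustered(mde=0.1, sd=1.0, obs_per_user=5, icc=0.2))  # 566
```

At ICC = 1 the design effect equals the number of observations per user, so extra observations from the same user add no information, which matches the intuition behind clustering the errors at the participant level.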
For instance, it can detect if the target metric is a binary decision variable, and will adapt the power analysis to the correct context.</p> - -<p><strong>The within-subject design</strong>: In this case, we instead provide a best practices guideline to follow. Through our experience supporting various Tech Families running switchback experiments, we have observed various challenges highly dependent on the use case. This makes it difficult to create a one-size-fits-all solution.</p> - -<p>For instance, an important factor affecting the final sample size requirement is how frequently treatments switch, which is also tied to what data granularity is appropriate to use in the post-experiment analysis. These considerations are dependent on, among other factors, how quickly a given treatment is expected to cause an effect. Some treatments may take effect relatively quickly (near-instantly, e.g. if applied to price checks), while others may take significantly longer (e.g. 15-30 minutes because they may require a trip to be completed). This has further consequences, e.g. autocorrelation between observations within a treatment window, spillover effects between different treatment windows, requirements for cool-down windows when treatments switch, etc.</p> - -<p>Another issue we have identified from analysing the history of experiments on our platform is that a significant portion is prone to issues related to sample ratio mismatch (SRM). We therefore also heavily emphasise the post-experiment analysis corrections and robustness checks that are needed in switchback experiments, and do not simply rely on pre-experiment guidance such as power analysis.</p> - -<h3 id="post-experiment-analysis">Post-experiment analysis</h3> - -<p>Upon completion of the experiment, a comprehensive toolbox for post-experiment analysis is available. This toolbox consists of a wide range of statistical tests, ranging from normality tests to non-parametric and parametric tests. 
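Several of these tests are easy to state precisely. As an illustration (not the package's actual code), Welch's t-test, which allows unequal variances between Treatment and Control, can be computed with the standard library alone:

```python
import math
from statistics import fmean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (fmean(sample_a) - fmean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy metric values, not Grab data.
control = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
treatment = [10.6, 10.9, 10.4, 10.8, 10.7, 11.0]
t, df = welch_t(treatment, control)
print(round(t, 2), round(df, 1))  # 5.86 9.8
```

The t statistic is then compared against the t distribution with the computed degrees of freedom to obtain a p-value.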
Here is an overview of the different types of tests included in the toolbox for different experiment setups:</p> - -<div class="post-image-section"><figure> - <img src="/img/grabx-decision-engine/image2.png" /><figcaption align="middle">Tests supported by the post-experiment analysis component</figcaption> - </figure> -</div> - -<p>Though we make all the relevant tests available, the package sets a default list of output. With just two lines of code specifying the desired experiment design, experimenters can easily retrieve the recommended results, as summarised in the following table.</p> - -<table class="table"> -<thead> - <tr> - <th>Types</th> - <th>Details</th> - </tr> -</thead> -<tbody> - <tr> - <td>Basic statistics</td> - <td>The mean, variance, and sample size of Treatment and Control </td> - </tr> - <tr> - <td>Uplift tests</td> - <td>Welch's t-test;<br />Non-parametric tests, such as Wilcoxon signed-rank test and Mann-Whitney U Test</td> - </tr> - <tr> - <td>Misc tests</td> - <td>Normality tests such as the Shapiro-Wilk test, Anderson-Darling test, and Kolmogorov-Smirnov test;<br />Levene test which assesses the equality of variances between groups</td> - </tr> - <tr> - <td>Regression models</td> - <td>A standard OLS/Logit model to estimate the treatment uplift;<br /><b>Recommended regression models</b> </td> - </tr> - <tr> - <td>Warning</td> - <td>Provides a warning or notification related to the statistical analysis or results, for example:<br />- Lack of variation in the variables<br />- Sample size is too small<br />- Too few randomisation units which will lead to under-estimated standard errors</td> - </tr> -</tbody> -</table> - -<h3 id="recommended-regression-models">Recommended regression models</h3> - -<p>Besides reporting relevant statistical test results, we adopt regression models to leverage their flexibility in controlling for confounders, fixed effects and heteroskedasticity, as is commonly observed in our experiments. 
As mentioned in the section “A note on experimental design”, each approach has different implications on the achieved randomisation, and hence requires its own customised regression models.</p> - -<p><strong>Between-subject design</strong>: the observations are not independent and identically distributed (i.i.d) but clustered due to repeated observations of the same experimental units. Therefore, we set the default clustering level at the participant level in our regression models, considering that most of our between-subject experiments only take a small portion of the population (Abadie et al., 2022).</p> - -<p><strong>Within-subject design</strong>: this has further challenges, including spillover effects and randomisation imbalances. As a result, they often require better control of confounding factors. We adopt panel data methods and impose time fixed effects, with no option to remove them. Though users have the flexibility to define these themselves, we use hourly fixed effects as our default as we have found that these match the typical seasonality we observe in marketplace metrics. Similar to between-subject -designs, we use standard error corrections for clustered errors, and small number of clusters, as the default. Our API is flexible for users to include further controls, as well as further fixed effects to adapt the estimator to geo-timeslice designs.</p> - -<h3 id="advanced-tools">Advanced tools</h3> - -<p>Apart from the pre-experiment Trusted Advisor and the post-experiment Analysis Toolbox, we have enriched this package by providing more advanced tools. 
Some of them are set as a default feature in the previous two components, while others are ad-hoc capabilities which the users can utilise via calling the functions directly.</p> - -<h4 id="variance-reduction">Variance reduction</h4> - -<p>We bring in multiple methods to reduce variance and improve the power and sensitivity of experiments:</p> - -<ul> - <li>Stratified sampling: recognised for reducing variance during assignment</li> - <li>Post stratification: a post-assignment variance reduction technique</li> - <li><a href="https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf">CUPED</a>: utilises ANCOVA to decrease variances</li> - <li><a href="https://arxiv.org/pdf/2106.07263.pdf">MLRATE</a>: an extension of CUPED that allows for the use of non-linear / machine learning models</li> -</ul> - -<p>These approaches offer valuable ways to mitigate variance and improve the overall effectiveness of experiments. The experimenters can directly access these ad hoc capabilities via the package.</p> - -<h4 id="multiple-comparisons-problem">Multiple comparisons problem</h4> - -<p>A multiple comparisons problem occurs when multiple hypotheses are simultaneously tested, leading to a higher likelihood of false positives. To address this, we implement various statistical correction techniques in this package, as illustrated below.</p> - -<div class="post-image-section"><figure> - <img src="/img/grabx-decision-engine/image3.png" alt="" style="width:80%" /><figcaption align="middle">Statistical correction techniques</figcaption> - </figure> -</div> - -<p>Experimenters can specify if they have concerns about the dependency of the tests and whether the test results are expected to be negatively related. 
This capability will adopt the following procedures and choose the relevant tests to mitigate the risk of false positives accordingly:</p> - -<ul> - <li>False Discovery Rate (FDR) procedures, which control the expected rate of false discoveries.</li> - <li>Family-wise Error Rate (FWER) procedures, which control the probability of making at least one false discovery within a set of related tests referred to as a family.</li> -</ul> - -<h4 id="multiple-treatments-and-unequal-treatment-sizes">Multiple treatments and unequal treatment sizes</h4> - -<p>We developed a capability to deal with experiments where there are multiple treatments. This capability employs a conservative approach to ensure that the size reaches a minimum level where any pairwise comparison between the control and treatment groups has a sufficient sample size.</p> - -<h4 id="heterogeneous-treatment-effects">Heterogeneous treatment effects</h4> - -<p>Heterogeneous treatment effects refer to a situation where the treatment effect varies across different groups or subpopulations within a larger population. For instance, it may be of interest to examine treatment effects specifically on rainy days compared to non-rainy days. We have incorporated this functionality into the tests for both experiment designs. By enabling this feature, we facilitate a more nuanced analysis that accounts for potential variations in treatment effects based on different factors or contexts.</p> - -<h2 id="maintenance-and-support">Maintenance and support</h2> - -<p>The package is available across all internal DS/Machine Learning platforms and individual local development environments within Grab. 
Its source code is openly accessible to all developers within Grab and its release adheres to a semantic release standard.</p> - -<p>In addition to the technical maintenance efforts, we have introduced a dedicated committee and a workspace to address issues that may extend beyond the scope of the package’s current capabilities.</p> - -<h3 id="experiment-council">Experiment Council</h3> - -<p>Within Grab, there is a dedicated committee known as the ‘Experiment Council’. This committee includes data scientists, analysts, and economists from various functions. One of their responsibilities is to collaborate to enhance and maintain the package, as well as guide users in effectively utilising its functionalities. The Experiment Council plays a crucial role in enhancing the overall operational excellence of conducting experiments and deriving meaningful insights from them.</p> - -<h3 id="grabcausal-methodology-bank">GrabCausal Methodology Bank</h3> - -<p>Experimenters frequently encounter challenges regarding the feasibility of conducting experiments for causal problems. To address this concern, we have introduced an alternative workspace called GrabCausal Methodology Bank. Similar to the internal open-source nature of this project, the GrabCausal Methodology bank is open to contributions from all users within Grab. It provides a collaborative space where users can readily share their code, case studies, guidelines, and suggestions related to -causal methodologies. By fostering an open and inclusive environment, this workspace encourages knowledge sharing and promotes the advancement of causal research methods.</p> - -<p>The workspace functions as a platform, which now exhibits a wide range of commonly used methods, including Diff-in-Diff, Event studies, Regression Discontinuity Designs (RDD), Instrumental Variables (IV), Bayesian structural time series, and Bunching. 
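As a flavour of the simplest entry in that list, a two-period difference-in-differences estimate compares the treated group's change against the control group's change, netting out the common time trend. The numbers below are toy values, not Grab data.

```python
from statistics import fmean


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Two-period DiD: treated group's pre/post change minus the
    control group's pre/post change."""
    return (fmean(treat_post) - fmean(treat_pre)) - (
        fmean(ctrl_post) - fmean(ctrl_pre)
    )


# Both groups drift up by ~2 over time; treatment adds ~3 on top.
print(diff_in_diff(
    treat_pre=[10, 11, 9], treat_post=[15, 16, 14],
    ctrl_pre=[10, 10, 10], ctrl_post=[12, 12, 12],
))  # 3.0
```

The identifying assumption is parallel trends: absent treatment, both groups would have moved by the same amount.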
Additionally, we are dedicated to incorporating more, such as Synthetic control, Double ML (Chernozhukov et al. 2018), DAG discovery/validation, etc., to further enhance our offerings in this space.</p> - -<h2 id="learnings">Learnings</h2> - -<p>Over the past few years, we have invested in developing and expanding this package. Our initial motivation was humble yet motivating - to contribute to improving the quality of experimentation at Grab, helping it develop from its initial start-up modus operandi to a more consolidated, rigorous, and guided approach.</p> - -<p>Throughout this journey, we have learned that prioritisation holds the utmost significance in open-source projects of this nature; the majority of user demands can be met through relatively small yet pivotal efforts. By focusing on these core capabilities, we avoid spreading resources too thinly across all areas at the initial stage of planning and development.</p> - -<p>Meanwhile, we acknowledge that there is still a significant journey ahead. While the package now focuses solely on individual experiments, an inherent challenge in online-controlled experimentation platforms is the interference between experiments (Gupta, et al, 2019). A recent development in the field is to embrace simultaneous tests (<a href="https://exp-platform.com/Documents/2013%2520controlledExperimentsAtScale.pdf">Microsoft</a>, <a href="https://medium.datadriveninvestor.com/how-google-conducts-more-better-faster-experiments-3b91446cd3b5">Google</a>, <a href="https://www.infoq.com/news/2016/12/large-experimentation-spotify/">Spotify</a> and <a href="https://cxl.com/blog/can-you-run-multiple-ab-tests-at-the-same-time/">booking.com and Optimizely</a>), and to carefully consider the tradeoff between accuracy and velocity.</p> - -<p>The key to overcoming this challenge will be a close collaboration between the community of experimenters, the teams developing this unified toolkit, and the GrabX platform engineers. 
In particular, the platform developers will continue to enrich the experimentation SDK by providing diverse assignment strategies, sampling mechanisms, and user interfaces to manage potential inference risks better. Simultaneously, the community of experimenters can coordinate among themselves effectively to -avoid severe interference, which will also be monitored by GrabX. Last but not least, the development of this unified toolkit will also focus on monitoring, evaluating, and managing inter-experiment interference.</p> - -<p>In addition, we are committed to keeping this package in sync with industry advancements. Many existing tools in this package, despite being labelled as “advanced” in the earlier discussions, are still relatively simplified. For instance,</p> - -<ul> - <li>Incorporating standard errors clustering based on the diverse assignment and sampling strategies requires attention (Abadie, et al, 2023).</li> - <li>Sequential testing will play a vital role in detecting uplifts earlier and safely, avoiding p-hacking. One recent innovation is the “always valid inference” (Johari, et al., 2022)</li> - <li>The advancements in investigating heterogeneous effects, such as Causal Forest (Athey and Wager, 2019), have extended beyond linear approaches, now incorporating nonlinear and more granular analyses.</li> - <li>Estimating the long-term treatment effects observed from short-term follow-ups is also a long-term objective, and one approach is using a Surrogate Index (Athey, et al 2019).</li> - <li>Continuous effort is required to stay updated and informed about the latest advancements in statistical testing methodologies, to ensure accuracy and effectiveness.</li> -</ul> - -<p>This article marks the beginning of our journey towards automating the experimentation and product decision-making process among the data scientist community. We are excited about the prospect of expanding the toolkit further in these directions. 
Stay tuned for more updates and posts.</p> - -<h2 id="references">References</h2> - -<ul> - <li> - <p>Abadie, Alberto, et al. “When should you adjust standard errors for clustering?.” The Quarterly Journal of Economics 138.1 (2023): 1-35.</p> - </li> - <li> - <p>Athey, Susan, et al. “The surrogate index: Combining short-term proxies to estimate long-term treatment effects more rapidly and precisely.” No. w26463. National Bureau of Economic Research, 2019.</p> - </li> - <li> - <p>Athey, Susan, and Stefan Wager. “Estimating treatment effects with causal forests: An application.” Observational studies 5.2 (2019): 37-51.</p> - </li> - <li> - <p>Chernozhukov, Victor, et al. “Double/debiased machine learning for treatment and structural parameters.” (2018): C1-C68.</p> - </li> - <li> - <p>Facure, Matheus. Causal Inference in Python. O’Reilly Media, Inc., 2023.</p> - </li> - <li> - <p>Gupta, Somit, et al. “Top challenges from the first practical online controlled experiments summit.” ACM SIGKDD Explorations Newsletter 21.1 (2019): 20-35.</p> - </li> - <li> - <p>Huntington-Klein, Nick. The Effect: An Introduction to Research Design and Causality. CRC Press, 2021.</p> - </li> - <li> - <p>Imbens, Guido W. and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press, 2015.</p> - </li> - <li> - <p>Johari, Ramesh, et al. “Always valid inference: Continuous monitoring of a/b tests.” Operations Research 70.3 (2022): 1806-1821.</p> - </li> - <li> - <p>List, John A., Sally Sadoff, and Mathis Wagner. “So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design.” Experimental Economics 14 (2011): 439-457.</p> - </li> - <li> - <p>Moffatt, Peter. Experimetrics: Econometrics for Experimental Economics. 
Stay tuned for more updates and posts.</p> - -<h2 id="references">References</h2> - -<ul> - <li> - <p>Abadie, Alberto, et al. “When should you adjust standard errors for clustering?” The Quarterly Journal of Economics 138.1 (2023): 1-35.</p> - </li>
Bloomsbury Publishing, 2020.</p> - </li> -</ul> - -<h1 id="join-us">Join us</h1> - -<p>Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.</p> - -<p>Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, <a href="https://grab.careers/">join our team</a> today!</p> - - Tue, 09 Apr 2024 02:22:10 +0000 - https://engineering.grab.com/grabx-decision-engine - https://engineering.grab.com/grabx-decision-engine - - Data Science - - Experiment - - Statistics - - Econometrics - - Python Package - - - Engineering - - Data Science - - - diff --git a/hubble-data-discovery.html b/hubble-data-discovery.html new file mode 100644 index 00000000..b08586b6 --- /dev/null +++ b/hubble-data-discovery.html @@ -0,0 +1,533 @@ + + + + + Enabling conversational data discovery with LLMs at Grab + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        + +
        +
        +
        + Enabling conversational data discovery with LLMs at Grab cover photo + + + + + +

        Enabling conversational data discovery with LLMs at Grab

        + + + +
        +
        +
        +

Imagine a world where finding the right data is like searching for a needle in a haystack. In today’s data-driven landscape, companies are drowning in a sea of information, struggling to navigate through countless datasets to uncover valuable insights. At Grab, we faced a similar challenge. With over 200,000 tables in our data lake, along with numerous Kafka streams, production databases, and ML features, promptly locating the most suitable dataset for our Grabbers’ use cases has historically been a significant hurdle.

        + +

        Problem Space

        + +

        Our internal data discovery tool, Hubble, built on top of the popular open-source platform Datahub, was primarily used as a reference tool. While it excelled at providing metadata for known datasets, it struggled with true data discovery due to its reliance on Elasticsearch, which performs well for keyword searches but cannot accept and use user-provided context (i.e., it can’t perform semantic search, at least in its vanilla form). The Elasticsearch parameters provided by Datahub out of the box also had limitations: our monthly average click-through rate was only 82%, meaning that in 18% of sessions, users abandoned their searches without clicking on any dataset. This suggested that the search results were not meeting their needs.
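The gap between keyword and semantic search can be made concrete with a toy sketch. The table names and three-dimensional "embeddings" below are invented for illustration; in a real system the vectors would come from an embedding model, and the index would hold hundreds of thousands of entries.

```python
import math

# Toy 3-d "embeddings" standing in for real model vectors.
TABLE_EMBEDDINGS = {
    "food_orders_daily":   [0.9, 0.1, 0.0],
    "driver_trips_hourly": [0.1, 0.9, 0.1],
    "payments_ledger":     [0.0, 0.2, 0.9],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_embedding, k=1):
    """Rank tables by cosine similarity to the query embedding --
    the user's intent matters, not exact keyword overlap."""
    ranked = sorted(
        TABLE_EMBEDDINGS,
        key=lambda t: cosine(query_embedding, TABLE_EMBEDDINGS[t]),
        reverse=True,
    )
    return ranked[:k]


# A query like "meal deliveries per day" shares no keywords with
# "food_orders_daily", but a nearby embedding still retrieves it.
print(semantic_search([0.8, 0.2, 0.1]))  # ['food_orders_daily']
```

A keyword engine would score this query zero against every table name above; similarity in embedding space is what lets a search accept and use the user's context.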

        + +

        Another indispensable requirement for efficient data discovery that was missing at Grab was documentation. Documentation coverage for our data lake tables was low, with only 20% of the most frequently queried tables (colloquially referred to as P80 tables) having existing documentation. This made it difficult for users to understand the purpose and contents of different tables, even when browsing through them on the Hubble UI.

        + +

        Consequently, data consumers heavily relied on tribal knowledge, often turning to their colleagues via Slack to find the datasets they needed. A survey conducted last year revealed that 51% of data consumers at Grab took multiple days to find the dataset they required, highlighting the inefficiencies in our data discovery process.

        + +

        To address these challenges and align with Grab’s ongoing journey towards a data mesh architecture, the Hubble team recognised the importance of improving data discovery. We embarked on a journey to revolutionise the way our employees find and access the data they need, leveraging the power of AI and Large Language Models (LLMs).

        + +

        Vision

        + +

        Given the historical context, our vision was clear: to remove humans from the data discovery loop by automating the entire process with LLM-powered products. We aimed to reduce the time taken for data discovery from multiple days to mere seconds, eliminating the need for anyone to ask their colleagues data discovery questions ever again.

        + +
        + +
        +
        + +

        Goals

        + +

        To achieve our vision, we set the following goals for ourselves for the first half of 2024:

        + +
          +
        • Build HubbleIQ: An LLM-based chatbot that could serve as the equivalent of a Lead Data Analyst for data discovery. Just as a lead is an expert in their domain and can guide data consumers to the right dataset, we wanted HubbleIQ to do the same across all domains at Grab. We also wanted HubbleIQ to be accessible where data consumers hang out the most: Slack.
        • +
        • Improve documentation coverage: A new Lead Analyst joining the team would require extensive documentation coverage of very high quality. Without this, they wouldn’t know what data exists and where. Thus, it was important for us to improve documentation coverage.
        • +
        • Enhance Elasticsearch: We aimed to tune our Elasticsearch implementation to better meet the requirements of Grab’s data consumers.
        • +
        + +

        A Systematic Path to Success

        + +

        Step 1: Enhance Elasticsearch

        + +

        Through clickstream analysis and user interviews, the Hubble team identified four categories of data search queries that were seen either on the Hubble UI or in Slack channels:

        + +
          +
        • Exact search: Queries belonging to this category were a substring of an existing dataset’s name at Grab, with the query length being at least 40% of the dataset’s name.
        • +
        • Partial search: The Levenshtein similarity score between a query in this category and an existing dataset’s name was greater than 80 (on a 0–100 scale). This category usually comprised queries that closely resembled an existing dataset name but likely contained spelling mistakes or were shorter than the actual name.
        • +
        + +

        Exact and partial searches accounted for 75% of searches on Hubble (and were virtually non-existent on Slack: messaging a colleague with nothing but the name of a dataset would feel rather odd). Given the effectiveness of vanilla Elasticsearch for these categories, the average click rank was close to 0, meaning users typically clicked the topmost result.

        + +
        + +
        +
        + +
          +
        • Inexact search: This category comprised queries that were usually colloquial keywords or phrases that may be semantically related to a given table, column, or piece of documentation (e.g., “city” or “taxi type”). Inexact searches accounted for the remaining 25% of searches on Hubble. Vanilla Elasticsearch did not perform well in this category since it relied on pure keyword matching and did not consider any additional context.
        • +
        + +
        + +
        +
        + +
          +
        • Semantic search: These were free-text queries with abundant contextual information supplied by the user. Hubble saw hardly any such queries, as users rightly expected that it would not be able to fulfil these search needs. Instead, data consumers sent such queries to data producers via Slack. These queries were numerous, but usually resulted in data-hunting journeys that spanned multiple days: the root of the frustration amongst data consumers.
        • +
        + +

        The first two search types can be seen as “reference” queries, where the data consumer already knows what they are looking for. Inexact and semantic searches are considered “discovery” queries. The Hubble team noticed drop-offs in inexact searches because users had learned that Hubble could not fulfil their discovery needs, forcing them to look for alternatives.
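        The heuristics behind the first two categories can be sketched in a few lines. This is a hypothetical reconstruction: the thresholds mirror the ones described above, and Python’s difflib stands in for a Levenshtein-style similarity score.

```python
from difflib import SequenceMatcher

def classify_query(query: str, dataset_names: list[str]) -> str:
    """Classify a search query into the categories described above.
    Illustrative reconstruction; thresholds match the text, details assumed."""
    q = query.lower().strip()
    for name in dataset_names:
        n = name.lower()
        # Exact search: query is a substring of a dataset name and
        # covers at least 40% of that name's length.
        if q in n and len(q) >= 0.4 * len(n):
            return "exact"
    for name in dataset_names:
        # Partial search: similarity score above 80 on a 0-100 scale.
        # difflib stands in for a Levenshtein-based ratio here.
        score = SequenceMatcher(None, q, name.lower()).ratio() * 100
        if score > 80:
            return "partial"
    # Everything else is a discovery-style query; separating inexact
    # from semantic would need more context (e.g. query length, phrasing).
    return "inexact_or_semantic"

tables = ["city_dim", "taxi_type_dim", "bookings_fact"]
print(classify_query("taxi_type", tables))       # substring of a name: "exact"
print(classify_query("taxi_typ_dim", tables))    # close misspelling: "partial"
print(classify_query("which table has driver incentives?", tables))
```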

        + +

        Through user interviews, the team discovered how Elasticsearch should be tuned to better fit the Grab context. They implemented the following optimisations:

        + +
          +
        • Tagging and boosting P80 tables
        • +
        • Boosting the most relevant schemas
        • +
        • Hiding irrelevant datasets like PowerBI dataset tables
        • +
        • Deboosting deprecated tables
        • +
        • Improving the search UI by simplifying and reducing clutter
        • +
        • Adding relevant tags
        • +
        • Boosting certified tables
        • +
        + +
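        As an illustration, relevance tuning of this kind can be expressed through Elasticsearch’s function_score query. The field names below (is_p80, is_certified, is_deprecated, platform) are hypothetical; the actual index mapping and weights are internal.

```python
def build_search_body(query: str) -> dict:
    """Sketch of a boosted Elasticsearch query body (field names assumed)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [{"match": {"name": query}}],
                        "must_not": [
                            # Hide irrelevant datasets such as PowerBI tables.
                            {"term": {"platform": "powerbi"}}
                        ],
                    }
                },
                "functions": [
                    # Boost frequently queried (P80) and certified tables.
                    {"filter": {"term": {"is_p80": True}}, "weight": 3.0},
                    {"filter": {"term": {"is_certified": True}}, "weight": 2.0},
                    # Deboost deprecated tables instead of hiding them outright.
                    {"filter": {"term": {"is_deprecated": True}}, "weight": 0.2},
                ],
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }

body = build_search_body("bookings")
```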

        As a result of these enhancements, the click-through rate rose steadily over the course of the half to 94%, a 12 percentage point increase.

        + +

        While this helped us make significant improvements to the first three search categories, we knew we had to build HubbleIQ to truly automate the last category: semantic search.

        + +

        Step 2: Build a Context Store for HubbleIQ

        + +

        To support HubbleIQ, we built a documentation generation engine that used GPT-4 to generate documentation based on table schemas and sample data. We refined the prompt through multiple iterations of feedback from data producers.
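        A minimal sketch of what such a generation step might look like. The real engine and its refined prompt are internal, so the prompt text, function name, and schema shape here are all assumptions.

```python
def build_doc_prompt(table_name: str, schema: list[dict], sample_rows: list[dict]) -> str:
    """Assemble a documentation-generation prompt from a table's schema
    and a handful of sample rows (illustrative only)."""
    cols = "\n".join(f"- {c['name']} ({c['type']})" for c in schema)
    samples = "\n".join(str(r) for r in sample_rows[:5])  # cap the sample size
    return (
        f"You are a data analyst documenting the table `{table_name}`.\n"
        f"Columns:\n{cols}\n"
        f"Sample rows:\n{samples}\n"
        "Write a concise description of the table's purpose and of each column."
    )

# The resulting string would be sent to the LLM (e.g. GPT-4) as the prompt.
prompt = build_doc_prompt(
    "bookings_fact",
    [{"name": "booking_id", "type": "string"}, {"name": "fare", "type": "double"}],
    [{"booking_id": "b1", "fare": 12.5}],
)
```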

        + +

        We added a “generate” button on the Hubble UI, allowing data producers to easily generate documentation for their tables. This feature also supported the ongoing Grab-wide initiative to certify tables.

        + +
        + +
        +
        + +

        In parallel, we pre-populated docs for the most critical tables and notified data producers to review the generated documentation. As a precaution, such docs were shown to data consumers with an “AI-generated” tag, which was removed once a data producer accepted or edited the documentation.

        + +
        + +
        +
        + +

        As a result, documentation coverage for P80 tables increased by 70 percentage points to ~90%. User feedback showed that ~95% of users found the generated docs useful.

        + +

        Step 3: Build and Launch HubbleIQ

        + +

        With high documentation coverage in place, we were ready to harness the power of LLMs for data discovery. To speed up go-to-market, we decided to leverage Glean, an enterprise search tool used by Grab.

        + +

        First, we integrated Hubble with Glean, making all data lake tables with documentation available on the Glean platform. Next, we used Glean Apps to create the HubbleIQ bot, essentially an LLM with a custom system prompt that could access all Hubble datasets catalogued on Glean. Finally, we integrated the bot into Hubble search, so that for any query likely to be a semantic search, HubbleIQ results are shown first, followed by regular search results.
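        The routing step, deciding whether a query should be answered by HubbleIQ first, can be approximated with a simple heuristic. The real classifier is internal; the signals below (query length, question words) are assumptions for illustration.

```python
def is_likely_semantic(query: str) -> bool:
    """Heuristic stand-in for the routing classifier: long, natural-language
    queries are treated as semantic; short keyword-like ones are not."""
    tokens = query.split()
    question_words = {"what", "which", "where", "how", "who", "find"}
    return (
        len(tokens) >= 4
        or query.strip().endswith("?")
        or (bool(tokens) and tokens[0].lower() in question_words)
    )

def route(query: str) -> str:
    """Show HubbleIQ results on top for semantic queries, else plain search."""
    return "hubbleiq_first" if is_likely_semantic(query) else "elasticsearch_only"
```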

        + +
        + +
        +
        + +

        Recently, we integrated HubbleIQ with Slack, allowing data consumers to discover datasets without breaking their flow. Currently, we are working with analytics teams to add the bot to their “ask” channels (where data consumers come to ask contextual search queries for their domains). After integration, HubbleIQ will act as the first line of defence for answering questions in these channels, reducing the need for human intervention.

        + +
        + +
        +
        + +

        The impact of these improvements was significant. A follow-up survey revealed that 73% of respondents found it easy to discover datasets, marking a substantial 17 percentage point increase from the previous survey. Moreover, Hubble reached an all-time high in monthly active users, demonstrating the effectiveness of the enhancements made to the platform.

        + +

        Next Steps

        + +

        We’ve made significant progress towards our vision, but there’s still work to be done. Looking ahead, we have several exciting initiatives planned to further enhance data discovery at Grab.

        + +

        On the documentation generation front, we aim to enrich the generator with more context, enabling it to produce even more accurate and relevant documentation. We also plan to streamline the process by allowing analysts to auto-update data docs based on Slack threads, directly from Slack. To ensure the highest quality of documentation, we will develop an evaluator model that leverages LLMs to assess the quality of both human-written and AI-written docs. Additionally, we will implement Reflexion, an agentic workflow that uses the outputs of the doc evaluator to iteratively regenerate docs until a quality benchmark is met or a maximum retry limit is reached.
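        The planned Reflexion workflow boils down to a generate-and-evaluate loop. This sketch assumes the two LLM calls are injected as plain functions; the threshold and retry limit are illustrative, not the production values.

```python
def reflexion_docs(generate_doc, evaluate_doc, threshold=0.8, max_tries=3):
    """Regenerate a doc, feeding the evaluator's critique back into the
    generator, until the quality benchmark is met or the try limit is hit.
    generate_doc(feedback) -> doc; evaluate_doc(doc) -> (score, critique)."""
    feedback = None
    best_doc, best_score = None, -1.0
    for _ in range(max_tries):
        doc = generate_doc(feedback)
        score, feedback = evaluate_doc(doc)  # score in [0, 1] plus a critique
        if score > best_score:
            best_doc, best_score = doc, score
        if score >= threshold:
            break  # quality benchmark met
    return best_doc, best_score
```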

        + +

        As for HubbleIQ, our focus will be on continuous improvement. We have already added support for metric datasets and are actively working on incorporating other types of datasets as well. To provide a more seamless user experience, we will enable users to ask follow-up questions to HubbleIQ directly on the Hubble UI, with the system intelligently pulling additional metadata when a user mentions a specific dataset.

        + +

        Conclusion

        + +

        By harnessing the power of AI and LLMs, the Hubble team has made significant strides in improving documentation coverage, enhancing search capabilities, and drastically reducing the time taken for data discovery. While our efforts so far have been successful, there are still steps to be taken before we fully achieve our vision of completely replacing the reliance on data producers for data discovery. Nonetheless, with our upcoming initiatives and the groundwork we have laid, we are confident that we will continue to make substantial progress in the right direction over the next few production cycles.

        + +

        As we forge ahead, we remain dedicated to refining and expanding our AI-powered data discovery tools, ensuring that Grabbers have every dataset they need to drive Grab’s success at their fingertips. The future of data discovery at Grab is brimming with possibilities, and the Hubble team is thrilled to be at the forefront of this exciting journey.

        + +

        To our readers, we hope that our journey has inspired you to explore how you can leverage the power of AI to transform data discovery within your own organisations. The challenges you face may be unique, but the principles and strategies we have shared can serve as a foundation for your own data discovery revolution. By embracing innovation, focusing on user needs, and harnessing the potential of cutting-edge technologies, you too can unlock the full potential of your data and propel your organisation to new heights. The future of data-driven innovation is here, and we invite you to join us on this exhilarating journey.

        + +

        Join us

        + +

        Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 700 cities in eight countries.

        + +

        Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

        + +
        +
        + + + + +
        +
        + + + +
        + +
        + + +
        +
        +
        + +
        +
        + + + + +
        +
        +
        +
        +
        + + + +
        + + +
        +
        +
        +
        + +

        + Want to join us in our mission to revolutionize transportation? +

        + View open positions + +
        +
        + + + + + + + + + + + diff --git a/img/authors/bjorn-jee.jpg b/img/authors/bjorn-jee.jpg new file mode 100644 index 00000000..50181d86 Binary files /dev/null and b/img/authors/bjorn-jee.jpg differ diff --git a/img/authors/daniel-tai.jpg b/img/authors/daniel-tai.jpg new file mode 100644 index 00000000..a0dc77e8 Binary files /dev/null and b/img/authors/daniel-tai.jpg differ diff --git a/img/authors/siddhart-pandey.jpg b/img/authors/siddhart-pandey.jpg new file mode 100644 index 00000000..1383a4b9 Binary files /dev/null and b/img/authors/siddhart-pandey.jpg differ diff --git a/img/catwalk-evolution/cover.png b/img/catwalk-evolution/cover.png new file mode 100644 index 00000000..c24a7d6d Binary files /dev/null and b/img/catwalk-evolution/cover.png differ diff --git a/img/catwalk-evolution/phase1.png b/img/catwalk-evolution/phase1.png new file mode 100644 index 00000000..27df9b86 Binary files /dev/null and b/img/catwalk-evolution/phase1.png differ diff --git a/img/catwalk-evolution/phase2-api-request-flow.png b/img/catwalk-evolution/phase2-api-request-flow.png new file mode 100644 index 00000000..7a17c45b Binary files /dev/null and b/img/catwalk-evolution/phase2-api-request-flow.png differ diff --git a/img/catwalk-evolution/phase2.png b/img/catwalk-evolution/phase2.png new file mode 100644 index 00000000..fa7a3fbb Binary files /dev/null and b/img/catwalk-evolution/phase2.png differ diff --git a/img/catwalk-evolution/phase3.png b/img/catwalk-evolution/phase3.png new file mode 100644 index 00000000..1fa65a1d Binary files /dev/null and b/img/catwalk-evolution/phase3.png differ diff --git a/img/catwalk-evolution/phase4.png b/img/catwalk-evolution/phase4.png new file mode 100644 index 00000000..29a8c2b8 Binary files /dev/null and b/img/catwalk-evolution/phase4.png differ diff --git a/index.html b/index.html index 638e0322..0c599f81 100644 --- a/index.html +++ b/index.html @@ -148,20 +148,282 @@
        - - Bringing Grab’s Live Activity to Android: Enhancing user experience through custom notifications cover photo + + Evolution of Catwalk: Model serving platform at Grab cover photo + + · + + + +

        + Evolution of Catwalk: Model serving platform at Grab +

        +
        Read about the evolution of Catwalk, Grab's model serving platform, from its inception to its current state. Discover how it has evolved to meet the needs of Grab's growing machine learning model serving requirements.
        +
        +
        +
        +
        + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        +
        +
        + + + + +
        + + +
        + + + +
      • +
        + +
        + + Enabling conversational data discovery with LLMs at Grab cover photo + +
        + +
        +
        +
        + + + +

        + Enabling conversational data discovery with LLMs at Grab +

        +
        Discover how Grab is revolutionising data discovery with the power of AI and LLMs. Dive into our journey as we overcome challenges, build groundbreaking tools like HubbleIQ, and transform the way our employees find and access data. Get ready to be inspired by our innovative approach and learn how you can harness the potential of AI to unlock the full value of your organisation's data.
        +
        +
        + +
        +
        + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        +
        +
        + + + + +
        +
        +
        +
      • + +
      • +
        + +
        + + Bringing Grab’s Live Activity to Android: Enhancing user experience through custom notifications cover photo + +
        + +
        +
        +
        + + +

        - Bringing Grab’s Live Activity to Android: Enhancing user experience through custom notifications + Bringing Grab’s Live Activity to Android: Enhancing user experience through custom notifications

        Unleashing Live Activity feature for iOS. Live Activity is a feature that enhances user experience by displaying a user interface (UI) outside of the app, delivering real-time updates and interactive content. Discover how its was solutionised at Grab.
        +
        @@ -200,11 +462,9 @@

        - - -
        - - +
        +
        +
      • @@ -809,203 +1069,6 @@

      • -
      • -
        - -
        - - How we evaluated the business impact of marketing campaigns cover photo - -
        - -
        -
        -
        - - - -

        - How we evaluated the business impact of marketing campaigns -

        -
        Discover how Grab assesses marketing effectiveness using advanced attribution models and strategic testing to improve campaign precision and impact.
        -
        -
        - -
        -
        - - - - - - - - - - - - - -
        -
        -
        - - - - -
        -
        -
        -
      • - -
      • -
        - -
        - - No version left behind: Our epic journey of GitLab upgrades cover photo - -
        - -
        -
        -
        - - - -

        - No version left behind: Our epic journey of GitLab upgrades -

        -
        Join us as we share our experience in developing and implementing a consistent upgrade routine. This process underscored the significance of adaptability, comprehensive preparation, efficient communication, and ongoing learning.
        -
        -
        - -
        -
        - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        -
        -
        - - - - -
        -
        -
        -
      • -
      diff --git a/search.html b/search.html index f1b69b9c..96a7ba16 100644 --- a/search.html +++ b/search.html @@ -143,6 +143,26 @@

      Search Results