Leading Advertising Agency

jidabyte helps implement multimodal website classification

Overview

A leading advertising agency sought to develop a tool to classify websites as MFA (Made for Advertising) or Non-MFA. The goal was to leverage various website features like screenshots, text, HTML code, and metadata to determine if a website was primarily designed for ads or had other primary objectives.

Challenge

The agency faced several key challenges while implementing this classification system:

  • Multimodal Data Integration – Needed to combine diverse data types such as text, images, and HTML to make an accurate classification.
  • Effective Classification – Required a method that could reliably distinguish MFA from Non-MFA websites using features drawn from every data type.
  • Scalability and Performance – Needed to process a large number of websites quickly and accurately without compromising on performance.

Solution

The following solutions were proposed and implemented to address the challenges:

  • Multimodal Retrieval-Augmented Generation (RAG) – Combined multiple data sources, including website screenshots, text, and metadata, in a RAG pipeline to generate predictions.
  • Traditional Machine Learning Model – Used XGBoost in combination with the RAG design to analyze features and predict whether a website was MFA or Non-MFA.
  • AI Integration with Claude-3.5 Sonnet – Leveraged Claude-3.5 Sonnet for better feature extraction and classification from various website components.
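
To make the combination of HTML features and retrieval more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the agency's actual pipeline: the ad-slot heuristics, the feature choices, and the names `PageStats`, `html_features`, `retrieve`, and `build_row` are all hypothetical. It derives two structural features from raw HTML, retrieves the most similar labeled examples by cosine similarity, and appends the share of MFA neighbors to the feature row.

```python
import math
from html.parser import HTMLParser


def looks_like_ad(token):
    # Hypothetical heuristic: class/id tokens that suggest an ad slot.
    prefixes = ("ad-", "ad_", "ads-", "advert", "sponsor")
    return token in ("ad", "ads") or token.startswith(prefixes)


class PageStats(HTMLParser):
    """Collects simple structural counts from raw HTML."""

    def __init__(self):
        super().__init__()
        self.tags = 0
        self.ad_slots = 0
        self.text_chars = 0
        self._skip = 0  # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        self.tags += 1
        if tag in ("script", "style"):
            self._skip += 1
        marker = " ".join(v or "" for k, v in attrs if k in ("id", "class"))
        tokens = marker.lower().split()
        if tag == "iframe" or any(looks_like_ad(t) for t in tokens):
            self.ad_slots += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_chars += len(data.strip())


def html_features(html):
    """Return [ad slots per tag, visible text characters per tag]."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    tags = max(stats.tags, 1)
    return [stats.ad_slots / tags, stats.text_chars / tags]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, labeled, k=2):
    """Return the k labeled (vector, label) pairs most similar to query."""
    ranked = sorted(labeled, key=lambda ex: cosine(query, ex[0]), reverse=True)
    return ranked[:k]


def build_row(html, labeled, k=2):
    """HTML features plus the fraction of retrieved neighbors labeled MFA."""
    feats = html_features(html)
    neighbors = retrieve(feats, labeled, k)
    mfa_share = sum(label == "MFA" for _, label in neighbors) / max(len(neighbors), 1)
    return feats + [mfa_share]
```

Rows built this way could then be used to train a gradient-boosted classifier such as XGBoost. In a fuller system, the retrieved neighbors might also be passed as context to a multimodal model alongside the screenshot and page text; that step is omitted here because it depends on an external API.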

Outcome

The proposed solution resulted in the following key outcomes for the agency:

  • Accurate Website Classification – The system classified websites as MFA or Non-MFA with high accuracy, improving ad targeting.
  • Improved Data Analysis – By combining multimodal features and machine learning, the agency was able to better understand website structures and marketing strategies.
  • Scalable and Efficient System – The AI-powered tool processed large volumes of websites quickly, enabling fast decision-making and data-driven strategies for ad placement.