Geographic Scoring: Leverage local differences in Google Ads with the help of Machine Learning

If your Google Ads campaign has an average conversion rate in Spain of 1%, and in Avilés there have been 0 sales after 60 visits, should we invest less in Avilés? Is this locality worse than the average or have we just been unlucky? And if in Trujillo there has been 1 sale after 40 visits, should we invest more? Have we been lucky or is Trujillo better than the average?
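Before changing the investment, it is worth checking how likely those outcomes are under the country-wide average rate. As a quick sanity check (a generic binomial calculation, not one of the models described below):

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer sales in n visits."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

p = 0.01  # country-wide average conversion rate

# Avilés: probability of 0 sales in 60 visits if the true rate is 1%
print(round(prob_at_most(0, 60, p), 3))      # ≈ 0.547: perfectly compatible with the average

# Trujillo: probability of at least 1 sale in 40 visits if the true rate is 1%
print(round(1 - prob_at_most(0, 40, p), 3))  # ≈ 0.331: luck alone can explain it
```

Neither observation is statistically surprising, which is precisely why the raw local rates cannot be trusted on their own.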

SUMMARY

Proper segmentation of Google Ads investment based on geographic differences in conversion rates helps maximize the return on investment (ROI) of campaigns. But when there is not enough data for a locality, instead of the observed conversion rate, we must use an estimated rate. In these cases, it is common to use the average conversion rate observed at the country level (or at the regional or provincial level) as an estimate. However, this tends to assimilate the behavior of rural areas and small towns to that of larger urban centers, even though their behaviors can be very different, and the estimation error is significant.
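A common middle ground between the raw local rate and the country average, shown here purely as a generic illustration (this is not the Fáktica method described below), is to blend the two, weighting by how much local data exists:

```python
def shrunk_rate(conversions, visits, prior_rate, prior_strength):
    """Empirical-Bayes style estimate: blend the observed local rate with the
    country average, where prior_strength is the number of 'virtual visits'
    the country average is worth (an illustrative assumption)."""
    a = prior_rate * prior_strength          # virtual conversions from the prior
    return (conversions + a) / (visits + prior_strength)

# With a prior worth 200 visits at the 1% country average:
print(shrunk_rate(0, 60, 0.01, 200))   # Avilés: pulled up from 0% toward 1%
print(shrunk_rate(1, 40, 0.01, 200))   # Trujillo: pulled down from 2.5% toward 1%
```

The more visits a locality accumulates, the more the estimate trusts its own data; with no visits at all, it simply returns the country average.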

At Fáktica, we have developed three machine learning models to produce conversion rate estimates that are more accurate than simply assuming the same conversion rate for the whole province, region, or country. These models can reduce the estimation error to at least one-tenth, and in some cases to one-thousandth, of the error made by traditional methods, allowing for better resource allocation in each locality and significantly improving the overall ROI achieved by the campaigns.

THE CHALLENGE

One of the greatest challenges in improving lead generation and scoring is assigning part of their value based on their geographic origin.

The most commonly used options to address this issue in Google Ads campaigns are the “flat rate” approach (that is, assuming homogeneous behavior at the country, region, or provincial level) or relying on Google’s Smart Bidding. However, when data is scarce, Smart Bidding is less smart than we would like, as it also tends to treat all localities within the same geography equally.

This means that, for example, in a region like Madrid, where the vast majority of the population lives in the capital, the behavior of all localities is associated with the capital. In reality, though, a small municipality like Villaconejos is expected to behave more similarly to Borox (Toledo), which is only a few kilometers away and has a very similar size and socioeconomic profile. At Fáktica, we have sought a better solution by leveraging Machine Learning (ML).

The performance improvement that can be achieved through proper geographic scoring is substantial: identifying the most and least profitable geographic areas allows us to adjust our investment and bidding strategies, prioritizing the territories where we expect better results, and significantly increasing the profits generated by the campaigns.

The challenge arises in those territories where available data is limited due to their small size or the short duration of the campaign. In these cases (which in practice occur in most accounts), directly calculating performance indicators is not valid due to a lack of statistical relevance. Therefore, we set out to improve the forecasting of these indicators for such territories using machine learning. The basic idea is shown in this figure:


Conceptual illustration of the adjustment process

Assuming that the indicator depends on a particular characteristic of the territory (population, average age, latitude, per capita income, distance to a large urban center, etc.), learning from the available data—both from statistically relevant territories and non-relevant ones—allows us to uncover the hidden relationship between the indicator and the characteristic, providing us with an estimate of the true value of the indicator that is expected to be more reliable than the direct data.
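This idea can be sketched in a few lines. In the hypothetical simulation below, each locality's true rate depends on a single characteristic (population), every locality has far too few visits to be statistically relevant on its own, and a locality's rate is estimated by pooling the localities most similar to it:

```python
import math
import random

random.seed(0)

def true_rate(pop):
    # Hypothetical hidden relationship between the indicator and the characteristic
    return 0.005 + 0.003 * math.log10(pop / 1000)

# Simulate 200 localities with only 60 visits each (none statistically relevant)
pops = [random.choice([1, 2, 5, 10, 20, 50]) * 1000 for _ in range(200)]
data = [(p, 60, sum(random.random() < true_rate(p) for _ in range(60))) for p in pops]

def pooled_estimate(pop, k=50):
    # Learn from the k localities most similar in the chosen characteristic
    nearest = sorted(data, key=lambda d: abs(d[0] - pop))[:k]
    return sum(d[2] for d in nearest) / sum(d[1] for d in nearest)
```

Pooling 50 similar localities gives roughly 3,000 visits behind each estimate, so the hidden curve emerges even though no single locality's 60-visit sample is trustworthy.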

THE MODELS

To achieve our goal—a better forecast of each locality’s performance—we have developed four competing models: a statistical baseline and three ML models:

  • Best Naive: An estimation method based on relatively simple statistical techniques developed by Fáktica, which do not require the use of ML. It is based on “filling in” data for localities whose results are not statistically significant, using data extracted from higher administrative levels, following a Russian doll scheme.
  • Neural Network: Multilayer perceptron. This is a supervised machine learning algorithm that learns a function by training on a dataset. Given a set of features and a target, it can generate a non-linear function approximator for classification or regression. Its main advantages are: (1) the ability to learn non-linear functions, and (2) good generalization in noisy environments, making it suitable for cases with complex feature spaces. Disadvantages include: (1) a non-convex loss function, so different random weight initializations may lead to different validation accuracies, (2) it requires tuning several hyperparameters like the number of hidden neurons, layers, and iterations, and (3) it is sensitive to feature scaling. The model implementation is based on the open-source scikit-learn package.
  • XGBoost: An optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost sequentially builds decision trees to correct errors made by previous trees, improving predictions by capturing complex patterns in the data. It incorporates regularization techniques to prevent overfitting, provides feature importance insights, automatically handles missing data, and supports parallel and distributed computing for scalability.
    Widely used across various domains, XGBoost is valued for its versatility, high performance, and ease of use.
    It offers some advantages over neural networks for problems with many degrees of freedom: it is often more interpretable, generally faster to train on small datasets, more robust to overfitting, and uses fewer hyperparameters.
    On the other hand, its ability to learn non-linear relationships is lower compared to neural networks.
  • Clustering: ML method fully developed by Fáktica. Cluster analysis methods group data points into clusters based on the similarity of their characteristics. These methods are used in unsupervised learning, where the goal is to discover structures or patterns in unlabeled data. Cluster analysis is useful in data exploration, pattern recognition, and segmentation. The prediction of unavailable data points consists of assigning them to the most similar cluster based on the same criteria used to build the clusters.

In our case, we have developed a method inspired by conventional cluster analysis. We build a cluster for each territory, adding the most similar territories until the cluster has enough data to achieve statistical relevance for the parameter to be predicted. The performance indicator value of the cluster is taken as the model’s estimate for that territory.
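As a rough sketch of these two ideas (our own illustrative code with an assumed relevance threshold, not Fáktica's production implementation), the "Russian doll" fallback of the Best Naive method and the cluster-growing estimate could look like this:

```python
MIN_VISITS = 500  # illustrative statistical-relevance threshold (an assumption)

def best_naive_rate(stats, hierarchy):
    """'Russian doll' fallback: walk up the administrative hierarchy until a
    level has enough visits, then use that level's conversion rate."""
    for territory in hierarchy:          # e.g. locality -> province -> country
        visits, conversions = stats[territory]
        if visits >= MIN_VISITS:
            break
    return conversions / visits          # country level as a last resort

def cluster_estimate(target, territories, min_visits=MIN_VISITS):
    """Grow a cluster around `target` (a feature vector) by adding the most
    similar territories until the pooled data is statistically relevant."""
    def dist(features):                  # similarity criterion on the features
        return sum((a - b) ** 2 for a, b in zip(features, target)) ** 0.5
    visits = conversions = 0
    for features, v, c in sorted(territories, key=lambda t: dist(t[0])):
        visits += v
        conversions += c
        if visits >= min_visits:
            break
    return conversions / visits

stats = {"Avilés": (60, 0), "Asturias": (900, 8), "Spain": (120000, 1200)}
print(best_naive_rate(stats, ["Avilés", "Asturias", "Spain"]))  # ≈ 0.0089, the first relevant level
```

The feature vector for the cluster method could contain, for instance, log-population, per capita income, and latitude; the key design choice is that each territory gets its own tailor-made cluster instead of a fixed partition of the map.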

RESULTS ACHIEVED

We conducted a comparative study across several of our clients’ accounts. Whether in the presence of strong or moderate patterns or in highly noisy environments, the three ML methods deliver reductions in estimation error by one to three orders of magnitude compared to more conventional methods, such as assuming a uniform conversion rate across the entire country or province.


COMPARATIVE RESULTS: Squared error made by each method for five different cases

CONCLUSION

The four methods developed offer significantly better predictions than assuming the same conversion rates at the national or provincial level.

Among the three tested ML methods, XGBoost (XGB) consistently delivers the best performance across all types of contexts.

Artificial Neural Networks (ANN) are more fragile: while they perform very well in the presence of strong signals, they struggle more when patterns are faint.

CLUSTERING, the method developed by Fáktica, is less accurate in ideal, noise-free scenarios, but has proven to be robust and less sensitive to noise. This makes it competitive with ANN and XGB in noisy contexts, while also offering better interpretability of the results.

Finally, a surprising finding is the strong performance of the “best naive” statistical methods in highly noisy scenarios. These relatively simple models, which do not rely on ML, can compete head-to-head with more complex approaches under certain conditions.

CONVERSION PREDICTION IN EACH LOCALITY: Comparison of 6 methods

Are you interested in using geographic scoring in your digital marketing campaigns? Contact us with no obligation—we’d be happy to help you.

By the way, this technology was initially developed for Google Ads, but in later versions we have adapted it to other channels (Microsoft Advertising, Meta Ads, LinkedIn Ads, and TikTok Ads), achieving similar results. No matter which channel you use, we adapt to your needs.
