Ad Tech Insights Methodology
What is Ad Tech Insights?
Ad Tech Insights is a suite of reports detailing Ad Tech industry trends. Currently it hosts three reports:
1. Header Bidding Industry Index (HBIX): Previously hosted by ServerBid (which Adzerk founded), the HBIX has moved to Adzerk's Ad Tech Insights portal. It tracks Header Bidding adoption and vendor usage across the same 1,000 US sites over time.
2. Consent Management Platform (CMP) Tracker: GDPR has prompted many publishers to use Consent Management Platforms for storing consent and passing it to programmatic partners. This tracker analyzes how many of the Top 10K US and UK sites use either IAB-registered CMPs or other 3rd-party consent tools.
3. Ads.txt Tracker: Ads.txt is an IAB initiative to help fight ad fraud. It's a text file that publishers host on their servers that lists which companies are authorized to sell or resell their inventory.
What is Ad Tech Insights Pro?
While the aggregated data is free, companies looking for a site-level breakdown of the data have the opportunity to access it with Ad Tech Insights Pro. Learn more about that here.
For the CMP and Ads.txt trackers, when you filter by 'publishers only', what does that mean?
For the CMP and Ads.txt trackers, we broke down the adoption graph into two buckets: all sites and just sites that show programmatic ads. Why? Because some sites would likely never show programmatic ads or care about collecting consent (say, Wikipedia.org). Therefore, it's more interesting to look at the adoption rates for just programmatic publishers.
This list is manually compiled by identifying sites that make an ad ping to an exchange/network, do header bidding, use an IAB 3rd-party CMP, and/or have an ads.txt file.
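The "publishers only" rule above amounts to a logical OR over four signals. Here is a minimal sketch, assuming a hypothetical `SiteSignals` record; the class and field names are illustrative and are not taken from the tracker itself:

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    """Hypothetical per-site observations; field names are illustrative."""
    pings_exchange_or_network: bool = False
    does_header_bidding: bool = False
    uses_iab_cmp: bool = False
    has_ads_txt: bool = False


def is_programmatic_publisher(site: SiteSignals) -> bool:
    # A site qualifies if ANY one of the four signals is present ("and/or").
    return any((
        site.pings_exchange_or_network,
        site.does_header_bidding,
        site.uses_iab_cmp,
        site.has_ads_txt,
    ))


no_ads_site = SiteSignals()                  # e.g. a site with no ad signals at all
news_site = SiteSignals(has_ads_txt=True)    # a single signal is enough
```

With this rule, `no_ads_site` would fall only into the "all sites" bucket, while `news_site` would also count toward "publishers only".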
Header Bidding Industry Index (HBIX) Methodology
The Header Bidding Industry Index tracks header bidding adoption - as well as what vendors publishers are using - across the most-trafficked US sites that do programmatic advertising. It follows the same 1,000 sites over time, much like how the S&P 500 works.
We compiled the list with the following method:
1. We first pulled the Top 5K US sites according to Alexa in August 2017
2. Next, we removed any sites that didn't show ads, used only an in-house ad platform (like Facebook/Yelp), or required a log-in to access a page with ads. We then whittled the list down to just the Top 1K sites (by monthly visitors)
3. Additionally, we manually compiled a list of 700+ endpoints that indicated a site was making a header bidding call (and to which vendor)
4. Every month we look at each site in the list and record what endpoints they are pinging
5. Periodically we will update this list to account for ranking fluctuations. The last time we did was November 2018
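The endpoint-matching step can be sketched roughly as follows. This is an illustration only: the hostnames and vendor labels are placeholders rather than entries from the real 700+ endpoint list, and matching on host suffix is an assumption about how such a list might be applied.

```python
from urllib.parse import urlparse

# Hypothetical excerpt of the endpoint list: endpoint host -> vendor label.
# These hostnames and labels are placeholders, not real vendor endpoints.
ENDPOINT_VENDORS = {
    "bid.vendor-a.example": "Vendor A",
    "hb.vendor-b.example": "Vendor B",
}


def vendors_pinged(request_urls):
    """Return the set of vendors whose endpoints appear in a site's requests."""
    found = set()
    for url in request_urls:
        host = urlparse(url).hostname or ""
        for endpoint, vendor in ENDPOINT_VENDORS.items():
            # Match the endpoint host itself or any subdomain of it.
            if host == endpoint or host.endswith("." + endpoint):
                found.add(vendor)
    return found


# Requests observed while loading one site (made-up URLs).
observed = [
    "https://bid.vendor-a.example/auction?x=1",
    "https://eu.hb.vendor-b.example/bid",
    "https://cdn.unrelated.example/script.js",
]
```

Calling `vendors_pinged(observed)` yields the vendors recorded for that site in a given month; requests to hosts outside the list are ignored.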
1. We analyze more than just each URL's homepage (such as an article sub-page). Not doing this would underreport adoption by about 5%
2. We run the tool about seven times over the span of a couple of days using a US IP. We also mix up the user agent, so we can track both mobile and desktop pings (but not in-app)
3. We also check the source content of JS files to look for open-source code
4. We curated the list so that all duplicate sites are thrown out. For instance, we would not include both 'Twitter.com' and 'Twitter.co'
5. Unless otherwise noted, for the bidder and wrapper breakdown, the percentage denominator is all sites in the HBIX (1,000). So when we say 15% of sites use a certain adapter, that's 150 of the 1,000 sites, not 150 of the ~750 that do header bidding
6. For the "average bidders per site" data, the denominator includes only companies that do header bidding
7. There will be a discrepancy between total number of wrappers and # of sites doing header bidding. This is because some sites use multiple wrappers (including multiple client-side and server-side wrappers)
8. We considered a site to do header bidding if it made a call to one or more header bidding partners. We did not count sites that have header bidding code but were not actively making calls
9. Some sites only "do" header bidding via a network's JS code (networks include Disqus, 33Across, and SRAX). We decided to exclude these sites, since the publisher may not even know they are doing header bidding
10. We ignored Google/DFP in the analysis
11. A client-side wrapper is a container that holds the codes for 2 or more header bidding adapters
12. Individual HB tags - like Criteo's - are not considered wrappers, since they contain only their codes. These are included in the bidder breakdown, but not the wrapper breakdown
13. When we say "Proprietary" or "Custom" wrapper, we are referring to home-grown solutions not based on Prebid.js
14. A server-side endpoint is a header bidding endpoint that pings additional exchanges server-side
15. We are not able to track what exchanges those solutions then ping, however
16. Identifying this 100% correctly is tricky, as it's not always possible to know if a given exchange's endpoint is hitting just the exchange's demand, or their demand PLUS server-side integrations
17. Some header bidding vendors - such as DistrictM and bRealTime - are AppNexus aliases. This makes them tougher to identify, because we can't differentiate between a standard AppNexus call and one of theirs. As a result, AppNexus aliases are likely undercounted in our report.
18. As mentioned in #9, we remove sites that only "do" header bidding via a network's JS code (which we call a "network wrapper"). Some sites use both an actual wrapper and a network wrapper, though. For instance, Site A may use Index Exchange's wrapper but also use the Disqus widget (which does header bidding) in their comment sections. In these cases, we have removed the network wrapper from the report, but did not remove any bidders. This is because it's not feasible to separate out which bidders live in each wrapper.
19. Followers of HBIX when it was on ServerBid may notice the absence of our server-side/hybrid breakdown. This is because Amazon recently updated their endpoints, making it impossible to differentiate between a standard call and a TAM call. Because of this we have deprecated any graph that relies on Amazon TAM, which includes the S2S wrapper breakdown and the breakdown by header bidding type (client-side vs server-side vs hybrid).
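The two denominators described in notes 5 and 6 are easy to mix up, so here is a small worked example. The 1,000, ~750 and 150 figures come from the notes above; the per-site bidder counts are made up for illustration.

```python
TOTAL_HBIX_SITES = 1000   # fixed size of the HBIX list (from the text)
HB_SITES = 750            # approximate number of sites doing header bidding (from the text)
sites_using_adapter = 150

# Bidder/wrapper breakdowns divide by ALL HBIX sites (note 5).
share_of_all_sites = sites_using_adapter / TOTAL_HBIX_SITES   # 0.15, reported as 15%

# Dividing by header bidding sites only would give a larger share.
share_of_hb_sites = sites_using_adapter / HB_SITES            # 0.20, NOT what the report shows

# "Average bidders per site" uses only sites that do header bidding (note 6).
# Hypothetical bidder counts per site; 0 means no header bidding was seen.
bidders_per_site = {"site-a.example": 4, "site-b.example": 6, "site-c.example": 0}
hb_only = [count for count in bidders_per_site.values() if count > 0]
average_bidders = sum(hb_only) / len(hb_only)                 # (4 + 6) / 2 = 5.0
```

Including `site-c.example` in the average would pull it down to about 3.3, which is why the text specifies the narrower denominator for that metric.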
Consent Management Platform (CMP) Tracker Methodology
CMPs are a relatively new ad tech term and have arisen thanks to the General Data Protection Regulation. They are a way to track consent and show programmatic ads in a GDPR-compliant manner. You can read more about them in our blog post "CMPs: The Definitive Guide".
While CMPs will differ by company, the more robust ones share similar qualities, including:
1. Detect the user's location and show or not show a consent prompt
2. Track whether the user has consented or not
3. Track what type of data the user has approved
4. Track what vendors the user has given permission to share data with
5. Based on 2-4, integrate with ad server/programmatic partners to determine which ones to source ads from
6. Allow for enablement of data rights (such as the right to have data deleted)
7. Provide analytics on all of the above
How we built the CMP Tracker
1. We first manually built a list of URL endpoints that signify the publisher is using a CMP and which one. This list includes over 500 expressions, including the IAB URL formatting, open-source code from AppNexus and Axel Springer, WordPress plug-ins, and miscellaneous other vendors
2. Next, we pull the Top 10K US and UK sites using Amazon Alexa's API. This list is updated every 2-3 months to account for traffic fluctuations
3. Finally, we look at every site in the list using multiple geo IPs (France, Spain) to see if they are pinging any of the CMP endpoints and which ones
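As a rough illustration of step 3, the sketch below checks a site's outgoing requests against CMP expressions. The `<name>.mgr.consensu.org` host pattern follows the IAB URL format cited in this methodology ('quantcast.mgr.consensu.org'); the non-IAB expression and its label are hypothetical stand-ins for the 500+ real entries.

```python
import re
from urllib.parse import urlparse

# IAB-format hosts look like "<cmp-name>.mgr.consensu.org".
IAB_HOST = re.compile(r"^(?P<cmp>[a-z0-9-]+)\.mgr\.consensu\.org$")

# Hypothetical non-IAB expressions: (pattern, label). Placeholders only.
OTHER_EXPRESSIONS = [
    (re.compile(r"cookie-banner\.example/.*\.js"), "Example Cookie Banner"),
]


def detect_cmps(request_urls):
    """Return a set of (kind, name) pairs for every CMP expression matched."""
    found = set()
    for url in request_urls:
        host = (urlparse(url).hostname or "").lower()
        match = IAB_HOST.match(host)
        if match:
            found.add(("iab", match.group("cmp")))
            continue
        for pattern, label in OTHER_EXPRESSIONS:
            if pattern.search(url):
                found.add(("other", label))
    return found


# Requests seen when loading a site from an EU IP (made-up URLs).
pings = [
    "https://quantcast.mgr.consensu.org/cmp.js",
    "https://cdn.cookie-banner.example/v1/banner.js",
    "https://static.site.example/app.js",
]
```

Running `detect_cmps(pings)` reports one IAB-format CMP and one other 3rd-party tool, and ignores the unrelated script.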
While doing the research, we identified five main types of consent collection tools:
1. IAB-Registered Consent Management Platforms: these integrate with the IAB-list of vendors, enable company-level consent, and in general offer more complexity than other solutions. Most are 3rd-party vendors, but some are individual publishers/media groups that wanted to certify their in-house solution
2. Other 3rd-Party Consent Tools: these are consent collection tools not registered with the IAB. They vary in complexity, with some enabling company-level consent, while others are just basic cookie notification banners, such as WordPress plugins
3. In-House Code Using an Open-Source Solution: these are pubs or media companies that built their own consent tools using open-source code, such as that released by AppNexus or Axel Springer
4. In-House Code Using the IAB 'vendorlist' File: these are pubs or media companies that built their own consent tools using the IAB 'vendorlist' file, effectively building their own CMP using the IAB framework
In the report, we are tracking #1-#4. We exclude #5 (fully in-house consent code that uses neither open-source code nor the IAB framework) because the goal of this report is to track 3rd-party CMP adoption, not whether sites are asking for consent at all (nearly all are). This methodology does mean that some 1st-party solutions will be included if (1) they use open-source code or (2) they are registered with the IAB / use the IAB 'vendorlist' file. These two buckets account for less than 5% of all CMP usage, though.
Since adoption isn't at 100%, does this mean other sites aren't tracking consent?
Not at all. We are tracking 3rd-party usage, and many publishers have written their own consent-collection code. Therefore, if we say 10% of UK sites use a CMP, we aren't necessarily saying that only 10% of sites ask for cookie tracking consent; just that only 10% of sites have chosen to use a 3rd-party tool.
1. How we pull data: We scrape just the desktop homepages of the sites on the list. We run the tool multiple times, using a mixture of EU IP addresses
2. Multiple CMP codes: Registered IAB vendors use a specific endpoint URL like 'quantcast.mgr.consensu.org'. However, in doing our research, we found that some of these IAB vendors had other CMP codes too (likely due to building their CMP before registering with the IAB). In compiling the data, we decided to group by vendor, not by endpoint, meaning that when we say a vendor is IAB-registered, some of their instances may come from endpoints that are not in the IAB format
3. Multiple domains: Some sites in our list may redirect to the same place (such as Twitter.co and Twitter.com). Other sites may have different domains for different countries (like CNN.com and CNN.gr). Due to the complexity of identifying duplicates, as well as the fact that a site with multiple country versions may use different products on different domains, the data does not de-dupe based on publisher name. Instead, we analyze adoption rate by URL and thus treat CNN.com and CNN.gr as two different sites
4. Publishers with multiple CMPs: In a few cases (< 3%), some sites had multiple CMP codes. This means the total number of CMP users will be lower than the total instances of CMPs seen
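Notes 2 and 4 can be illustrated with a small counting sketch: endpoints are first collapsed to vendors, and then sites and CMP instances are counted separately. All endpoint names, vendor labels and scan results below are hypothetical.

```python
from collections import Counter

# Hypothetical endpoint -> vendor mapping; one vendor can own several endpoints.
ENDPOINT_TO_VENDOR = {
    "vendor-a.mgr.consensu.org": "Vendor A",   # IAB-format endpoint
    "legacy.vendor-a.example": "Vendor A",     # older, non-IAB-format endpoint
    "cmp.vendor-b.example": "Vendor B",
}

# Hypothetical scan results: site URL -> CMP endpoints seen on its homepage.
scan = {
    "site-one.example": ["vendor-a.mgr.consensu.org", "legacy.vendor-a.example"],
    "site-two.example": ["legacy.vendor-a.example", "cmp.vendor-b.example"],
    "site-three.example": [],
}

# Group by vendor, not endpoint: two Vendor A endpoints on one site count once.
vendors_by_site = {
    site: {ENDPOINT_TO_VENDOR[endpoint] for endpoint in endpoints}
    for site, endpoints in scan.items()
}

instances = Counter(v for vendors in vendors_by_site.values() for v in vendors)
sites_with_cmp = sum(1 for vendors in vendors_by_site.values() if vendors)
total_instances = sum(instances.values())
```

Here two sites use a CMP but three CMP instances are seen, because `site-two.example` runs two vendors. That is the gap between users and instances that note 4 describes.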
Ads.txt Tracker Methodology
Unlike our CMP and HBIX trackers, this one is pretty basic - we scrape the Ads.txt file (https://www.site.com/ads.txt) of the domains in the Top 10K US and UK site list, and then parse the results. These lists were built using Amazon Alexa's API and are updated every 2-3 months.
Some misc methodology notes include:
1. De-dupes: For a given domain, if the same vendor/seller type combo appears in multiple records, we de-dupe them
2. Geo breakdowns: Some Ads.txt files include vendor breakdown by location. This tracker doesn't break that down
3. Publishers with different domains: Some sites in our list may redirect to the same place (such as Twitter.co and Twitter.com). Other sites may have different domains for different countries (like CNN.com and CNN.gr). Due to the complexity of identifying duplicates, as well as the fact that a site with multiple country versions may use different products on different domains, the data does not de-dupe based on publisher name. Instead, we analyze adoption rate by URL and thus treat CNN.com and CNN.gr as two different sites.
4. Aggregated by Vendor: Some sellers may have more than one domain - or the domains were incorrectly written by the site - so we have aggregated by company brand name, not domain, in the vendor breakdown.
5. Ads.txt File, No Rows: If a site has an Ads.txt file but it's empty, we have NOT included them in the analysis.
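The parsing and de-duping described above might look something like the sketch below. It assumes the standard ads.txt record layout (seller domain, account ID, DIRECT/RESELLER relationship, optional certification authority ID) and reads "vendor/seller type combo" as the (seller domain, relationship) pair; both the function name and the sample file are illustrative.

```python
def parse_ads_txt(text):
    """Return de-duplicated (seller domain, relationship) pairs from an ads.txt body."""
    records = set()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()        # drop comments
        if not line or "=" in line.split(",", 1)[0]:
            continue                                    # skip blanks and variable lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue                                    # skip malformed records
        seller_domain, relationship = fields[0], fields[2]
        # A set collapses repeated vendor/seller-type combos into one entry.
        records.add((seller_domain.lower(), relationship.upper()))
    return records


# Made-up ads.txt content for illustration.
sample = """
# ads.txt for site.example
exchange-a.example, 12345, DIRECT, abc123
exchange-a.example, 67890, DIRECT
Exchange-A.example, 555, reseller
subdomain=news.site.example
"""

parsed = parse_ads_txt(sample)
has_rows = bool(parsed)   # a file with no rows would be left out of the analysis
```

The two DIRECT records for the same seller collapse into one, so `parsed` holds one DIRECT and one RESELLER entry. An empty file parses to an empty set, matching the "No Rows" exclusion.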
Header Bidding Raw Data (Beyond HBIX)
If you purchase the raw header bidding data beyond the HBIX, there are some minor additional notes about our methodology:
1. We did not remove sites that did header bidding only via a network widget. We removed them from the HBIX since we feel excluding them is more telling about actual HB adoption. But for the raw data, we want to err on the side of more data, and thus have kept them.
2. We are not looking at sub-pages for the Top 10K US sites (unless those sites overlap with the HBIX). We expect this will under-report header bidding usage by about 5%, since some sites won't have ads on the homepage, but will on a sub-page.
3. A small percentage of sites (<5%) ignore our tool and don't show it ads. For the HBIX we manually visit these sites to make sure month-over-month data is accurate; however, for the much larger list, we are not doing this.
4. We continue to check pings using a US IP and with multiple user agents (both mobile and desktop)
5. We check sites using the format "http://www.site". In <1% of cases, a site that lives at "http://site" has no "www" redirect. As a result, these sites would not appear in the report
6. We will occasionally miss a wrapper for a site if it uses multiple client-side wrappers. This is because it's not feasible to check every site individually, so if we identify that a site is using a wrapper, we won't double-check it (by contrast, if the tool finds a site that has bidders but not a wrapper, we manually check it). As a result, we may under-report wrapper usage slightly.
So, Your Data Is Perfect, Right?
We wish! Some reasons the reports will not be 100% accurate are:
1. We may have missed some uncommon expressions
2. There may be false positives
3. We are not immune to typos or mistakes
4. Companies will occasionally update their endpoints/codes, which means the data could be missing until we identify the change
5. We may have checked a site when it was doing A/B testing, had loading issues, etc., which would impact whether the ping happens
6. For the CMP tracker, we are looking at just the desktop homepage for the consent prompt. If the site has the prompt appear only on, say, a mobile or sub-page, then we would not find it (the HBIX tracker, on the other hand, does look at sub-pages)
7. For the Header Bidding raw data beyond the HBIX, see above for various reasons there will be a small number of missing pings
Ultimately, these inaccuracies should be minor and not skew the aggregated graphs. If you're scouring the raw data, though, you may notice something off. If you have any feedback on how to improve our data, please reach out to firstname.lastname@example.org.