CommunityMod_WM
Community Manager

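In this article, we will delve into the metrics available during and after the A/B test, the meaning of the outcome produced, and how to interpret the high or low statistical probability associated with the outcome. The following topics are covered:

  • The Metrics Table and Conversion Rate Visualization
  • Exporting Data for Analysis
  • The Experiment Outcome
  • High Statistical Probability Calculated
  • Lower Statistical Probability Calculated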

 

The Metrics Table and Conversion Rate Visualization

After each day that shopper visits are qualified into the A/B Test, the experiment data is updated and available to view and analyze within the A/B Test Details page. As with all enhanced content data collected by Syndigo, this information is preserved and available at any point after the test concludes, as well.

While the test remains active, only the Metrics Table is visible on the A/B Test Details page (although raw experiment data may always be exported from this interface). This table provides the following KPIs of importance to enhanced content subscribers:

  • Unique Visits – The count of distinct shoppers who were qualified into either cohort A or B.
  • Sessions – The count of page loads performed by shoppers, with a maximum of 1 session per 1 hour.
  • Conversions – The count of add-to-cart events or purchases performed by shoppers.
  • Conversion Rate – The frequency by which shoppers either add product to cart from the product page or complete orders, calculated as Conversions / Sessions.
  • Units – The number of products added to cart from the product page.
  • Price – The product price at the time of conversion.
  • Revenue – The total value of all sales during the experiment, calculated as Units * Price.
  • Average Order Value – The average value of all conversions, calculated as Revenue / Conversions.
  • Viewable Impressions – The count of sessions resulting in shoppers viewing the enhanced content.
  • Clicks – The total count of shopper clicks on enhanced content.
  • Interaction Rate – The frequency by which shoppers click at least one time on content during their visits, calculated as Clicks / Sessions.
  • Total Time on Page – The total amount of time all shoppers are active on product pages, in seconds.
  • Average Time on Page – The average amount of time shoppers are active on product pages, in seconds, calculated as Total Time on Page / Sessions.

Each metric is captured with its unique value for the Content A and Content B cohorts, and the difference between these two values is presented in the last column of the table. Each metric may also be further expanded to reveal the breakdown at each retailer website from which experiment data was collected.
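The calculated KPIs above can be reproduced from the raw counts. The following is a minimal sketch, not Syndigo's implementation: the `CohortTotals` class and the function names are hypothetical, a single price per cohort is a simplification (the article records price at the time of each conversion), and the sign of the difference (B minus A) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CohortTotals:
    """Raw totals for one cohort (Content A or Content B). Hypothetical container."""
    sessions: int
    conversions: int
    units: int
    price: float               # simplification: one price for the whole cohort
    clicks: int
    total_time_on_page: float  # seconds


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_kpis(c: CohortTotals) -> dict:
    """Apply the formulas listed in the Metrics Table description."""
    revenue = c.units * c.price
    return {
        "conversion_rate": safe_div(c.conversions, c.sessions),
        "revenue": revenue,
        "average_order_value": safe_div(revenue, c.conversions),
        "interaction_rate": safe_div(c.clicks, c.sessions),
        "average_time_on_page": safe_div(c.total_time_on_page, c.sessions),
    }


def kpi_difference(a: CohortTotals, b: CohortTotals) -> dict:
    """Difference per KPI, computed here as B minus A (an assumption)."""
    kpis_a, kpis_b = derived_kpis(a), derived_kpis(b)
    return {name: kpis_b[name] - kpis_a[name] for name in kpis_a}
```

Guarding each division against a zero denominator keeps the sketch usable early in a test, when a cohort may not yet have any sessions or conversions.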

After the system produces the outcome of an A/B test, an additional visualization is provided on the test details page: a chart that represents the all-important conversion rate KPI from two perspectives – over time and by retailer website.

  • The "Time" view is a line graph where the conversion rates of content A and content B, each on their own distinct line, are plotted over each day that experiment data was collected. This enables A/B testers to potentially identify trends that emerged during the experiment or even spot when shopper conversion behavior may have been influenced by external factors.
  • The "Website" view is a bar chart that displays the average conversion rate of A and B per retailer website from which experiment data was collected. This visual can be utilized to quickly spot where the winning content version may differ across retailers, information that may be crucial when developing plans to optimize content targeted to specific retailers.

 

Exporting Data for Analysis

Within a specific A/B test, the raw underlying data can be exported both during testing and at any time after the test concludes. As with all Syndigo enhanced content, experiment data is aggregated once daily (overnight), so data for a given day becomes available once that night's processing completes.

To export A/B test data:

  1. Navigate to the A/B test details page in the Content Experience Suite and click the "Export" button in the upper right corner.
  2. A modal is presented that provides three options to select an export type: URL Level, Widget Level, and Asset Level.
  3. Select the desired aggregation level and click the button within the modal to perform the export. A direct download of the associated CSV file will be initiated.

The export types: URL, Widget, or Asset – These options refer to the aggregation level of the enhanced content metrics.

The highest level in the data hierarchy is the URL Level. Metrics such as visits, impressions, and clicks are rolled up to represent a maximum of one per page load and there is no further breakdown across multiple layouts/sections, individual widgets, or assets on the product pages.

Format of data at the URL Level:

  • Each record (row) in the exported data represents each distinct combination of Date + URL + Variant (A or B) found in the experiment data.
  • The fields (columns) are as follows: Content Version, Site Name, Page ID, URL, Unique Visits, Sessions, Carts, Units, Revenue, Total Time On Page, Price, Cart Rate, Average Time On Page, Average Order Value, Clicks, Viewable Impressions, Date
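Because the URL Level export has one row per Date + URL + Variant, it usually needs to be rolled up before the two content versions can be compared. The sketch below is illustrative only. It assumes the CSV header names match the column names listed above exactly, that `Content Version` holds the values `A` and `B`, and that `Sessions` and `Carts` are plain integers. The function name, the sample rows, and the date format are hypothetical.

```python
import csv
import io
from collections import defaultdict


def cart_rate_by_version(csv_file) -> dict:
    """Sum Sessions and Carts per Content Version, then return Carts / Sessions."""
    totals = defaultdict(lambda: {"Sessions": 0, "Carts": 0})
    for row in csv.DictReader(csv_file):
        version = row["Content Version"]
        totals[version]["Sessions"] += int(row["Sessions"])
        totals[version]["Carts"] += int(row["Carts"])
    return {
        version: (t["Carts"] / t["Sessions"] if t["Sessions"] else 0.0)
        for version, t in totals.items()
    }


# Hypothetical sample rows; a real export contains every column listed above.
sample = io.StringIO(
    "Content Version,Site Name,URL,Sessions,Carts,Date\n"
    "A,example-retailer,https://example.com/p/1,200,6,2024-01-01\n"
    "B,example-retailer,https://example.com/p/1,210,9,2024-01-01\n"
    "A,example-retailer,https://example.com/p/1,300,9,2024-01-02\n"
    "B,example-retailer,https://example.com/p/1,290,11,2024-01-02\n"
)
rates = cart_rate_by_version(sample)
print(rates)
```

Summing the counts before dividing weights each row by its traffic, whereas averaging the per-row Cart Rate column would give low-traffic days the same weight as high-traffic days. For a downloaded file, pass `open(path, newline="")` in place of the in-memory sample.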

Widget Level provides the same metrics broken down to each individual widget or module in the enhanced content. This data set includes more detail about which widgets specifically shoppers viewed and clicked.

Format of data at the Widget Level:

  • Each row represents each distinct Date + URL + Variant + Widget ID.
  • The columns are as follows: Content Version, Site Name, Page ID, URL, Experience, Widget Index (number representing the order that the widget appears in the layout), Widget Type, Widget Views, Widget Clicks, Video Plays, Video Play Rate, Video Completions, Video Completion Rate, Date.

Asset Level is the last option and represents the deepest enhanced content data available. Within each page there may be multiple widgets, and within each widget there may be multiple assets. Choose Asset Level to see the metrics broken down to specific images, video files, etc.

Format of data at the Asset Level:

  • Each row represents each distinct Date + URL + Variant + Widget ID + Asset ID.
  • The columns are as follows: Content Version, Site Name, Page ID, URL, Experience, Widget Index (number representing the order that the widget appears in the layout), Widget Type, Asset Index, Asset Type, Asset Views, Asset Clicks, Video Asset Plays, Video Asset Play Rate, Video Asset Completions, Video Asset Completion Rate, Date.

 

The Experiment Outcome

When reviewing the A/B test results at the conclusion of an experiment, the first item you may notice is the prominent statement on the details page. This statement captures the overall outcome of the experiment, and will appear as one of the following:

  • "Content (A or B) has a higher conversion rate."
  • "Content A and Content B have similar conversion rates."

Note: If the primary message states that no data is available, please reference the following Help Center article: Troubleshooting: A/B Test availability and data collection.

Directly beneath this first statement, the conversion rates corresponding to each version are displayed. One content's conversion rate is deemed higher than the other when the calculated difference between the two contents' conversion rates is greater than 0.5%. If the difference is less than 0.5%, this results in the outcome that they have similar conversion rates.
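The threshold rule above can be sketched as a small function. This is an illustration rather than the system's actual logic. It assumes the 0.5% threshold is an absolute difference of 0.5 percentage points between rates expressed as fractions, and it treats a difference of exactly 0.5% as "similar", a case the article does not specify. The function name is hypothetical.

```python
def experiment_outcome(rate_a: float, rate_b: float, threshold: float = 0.005) -> str:
    """Return the outcome statement for two conversion rates given as fractions.

    threshold=0.005 is read as 0.5 percentage points (an assumption).
    """
    diff = rate_b - rate_a
    if abs(diff) > threshold:
        winner = "B" if diff > 0 else "A"
        return f"Content {winner} has a higher conversion rate."
    return "Content A and Content B have similar conversion rates."


print(experiment_outcome(0.035, 0.042))  # difference of 0.7 points
print(experiment_outcome(0.035, 0.037))  # difference of 0.2 points
```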

This outcome statement provides immediate insight into what occurred during the experiment: whether there was an observed and notable difference in the frequency by which the product was added to cart or purchased by shoppers who viewed content A versus those who viewed content B. As the general goal of conducting A/B testing is to validate which assets and layouts resulted in a higher conversion rate during the course of the experiment, this declaration of the winning content version may serve as the sole insight you derive from your analysis.

However, the other information Syndigo provides in addition to this statement may be crucial in helping your organization interpret the outcome. The remainder of this article will address the concept of statistical probability, which plays a vital role in determining the validity and reliability of the results. Whether the A/B test concluded with a difference in conversion rates or a declaration of similar conversion rates across the two content versions, the Syndigo system presents the calculated likelihood that the outcome is accurate and will be consistent with continuous testing.

Beneath the conversion rates, an additional statement is provided in the A/B test results. This secondary statement is either preceded by a green checkmark or a yellow warning indicator, followed by the probability (captured as a percentage) that the winning content will result in more conversions than the losing content. If the A/B test shows a green checkmark, this means high statistical probability was calculated. If the yellow warning icon is displayed, this indicates a low statistical probability. Based on whether the probability is high or low, please navigate to the appropriate section below to learn more about what this means and the recommended actions to take.
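This article does not describe how Syndigo calculates the probability. For intuition only, the sketch below uses one common textbook approach, a normal approximation to the difference between two proportions, to estimate the chance that Content B's true conversion rate exceeds Content A's. The function name and inputs are hypothetical, and the values it produces should not be expected to match those shown in the product.

```python
import math


def probability_b_beats_a(conv_a: int, sessions_a: int,
                          conv_b: int, sessions_b: int) -> float:
    """Approximate P(rate_B > rate_A) with a normal approximation.

    Illustrative only; this is not Syndigo's documented method.
    """
    p_a = conv_a / sessions_a
    p_b = conv_b / sessions_b
    # Standard error of the difference between the two observed rates.
    se = math.sqrt(p_a * (1 - p_a) / sessions_a + p_b * (1 - p_b) / sessions_b)
    if se == 0:
        # Degenerate case: no variation observed in either cohort.
        return 0.5 if p_a == p_b else float(p_b > p_a)
    z = (p_b - p_a) / se
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


small = probability_b_beats_a(35, 1_000, 42, 1_000)
large = probability_b_beats_a(350, 10_000, 420, 10_000)
print(f"1,000 sessions per cohort:  {small:.3f}")
print(f"10,000 sessions per cohort: {large:.3f}")
```

With the same observed rates (3.5% versus 4.2%), the estimate stays below 0.9 at 1,000 sessions per cohort and rises above 0.9 at 10,000 sessions per cohort, which illustrates why sample size matters so much to the reported probability.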

 

High Statistical Probability Calculated (90% or greater)

When the Syndigo system presents a green icon with a checkmark alongside the secondary statement that includes probability information, this serves as confirmation that the experiment meets the criteria to be considered statistically significant. One of the requirements to reach statistical significance is that the calculated probability, or confidence level, is equal to or greater than 90%.

High statistical probability means the results of an A/B test exhibit a clear and substantial difference between the two variations being tested. In other words, when the observed data strongly suggests that one variation outperforms the other with a high degree of confidence, it is said to have a high statistical probability. This may also be thought of as the likelihood that the outcome of the A/B test will be the same if the test is run again any number of times.

If Content B has the higher conversion rate and the calculated probability is 90%, this means that there is a 90% likelihood that running the experiment again will result in Content B having the higher conversion rate.

Next steps: When there is a conversion rate difference paired with high probability, confidently choose the winning variation and implement it, knowing that the observed effects are likely genuine and not due to chance.

  • When marking the test confirmed and complete, select the winning content version to be applied and published across all languages and locales targeted by the collection.
  • Content optimization strategy: Expand the design approach of the winning content to a broader set of products.
  • Allocate resources and budget more effectively by focusing on the templates and creative assets that have a high likelihood of success based on the A/B test results.

If the conversion rates of A and B are deemed similar and there is a high probability calculated:

  • Do not assume the tested widgets or assets have no potential to influence conversions. It may be necessary to make more substantial or refined changes and retest to achieve meaningful results. Start by analyzing the content, retailer implementation, and the data to see if the design of the A/B test itself can benefit from further optimization.
    • Navigate to most, if not all, URLs where the A/B test was executed. Follow this step across multiple devices if possible, e.g. desktop and mobile. If the difference between Content A and B cannot easily be recognized within seconds of page load, this points to the shopping experience being too similar to influence conversion rates. It is also possible that this was further impacted by the unique way each website presents enhanced content. Reviewing the experiment data collected – particularly the interaction rate and views of specific widgets and assets – will add further detail to the story of the A/B test.
  • After performing the prior step, it may become evident that the outcome of the experiment could differ if an adjustment is made to either the test configuration (perhaps to alter coverage across specific retailer websites) or the content itself (to make the delta between the experiences more impactful). In this case, plan to run a new A/B test for this content, and then conclude the current A/B test by marking it confirmed and complete – selecting whichever content version serves as the best starting point to build the next A/B test. With as little delay as possible, deploy another A/B test with the refinements to the configuration and/or content.
  • If it is not possible to redesign the A/B test, analyze the experiment data to identify which content version is more desirable based on metrics other than conversion. For instance, if content that encourages engagement (clicks) differs between Content A and B, the version with more interactions on specific widgets may point to the better choice. A higher video playback or completion rate may also help measure the impact of the video assets tested. Mark the test confirmed and complete, selecting the content version that performed better in the non-conversion metrics.

In either case where high probability is calculated:

Store the results to utilize in future analysis activities and generate valuable insights about shopper preferences, behaviors, and expectations. A high statistical probability indicates this information should be used to shape and refine future design and marketing strategies. Keep all raw experiment data, which can be exported from the A/B test details page, in an organized repository for advanced analysis at a later date. Potential insights include but are not limited to:

  • The delta between the two different conversion rates can be used to project or estimate the potential impact of executing a content optimization strategy based on the winning variation.
  • The impact to the frequency by which shoppers add-to-cart or purchase can be projected by utilizing the same formula for calculating Conversion Rate Lift in the Enhanced Conversion Lift report. (The underlying data for this KPI is, after all, derived from an A/B test Syndigo runs behind the scenes.) The formula is as follows: (Treatment Conversion Rate – Control Conversion Rate) / Control Conversion Rate
    • Example: Content A is the same content that has been available on the retailer websites for many months prior to the A/B test, which makes A the Control in the experiment. Content B introduces notable changes with net-new creative media such as a 360-degree view, making it the Treatment. If Content A's conversion rate is 3.5% and Content B's conversion rate is 4.2%, we can calculate the conversion rate lift associated with Content B as: (0.042 – 0.035) / 0.035 = 0.2. This means that shoppers who see the enhanced content inclusive of the net-new assets convert 20% more often than the shoppers who see the original content (Content A).
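The lift formula and the worked example above can be expressed as a short function. This is a minimal sketch. The function name is hypothetical, and rates are passed as fractions (0.035 for 3.5%).

```python
def conversion_rate_lift(control_rate: float, treatment_rate: float) -> float:
    """(Treatment Conversion Rate - Control Conversion Rate) / Control Conversion Rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be greater than zero")
    return (treatment_rate - control_rate) / control_rate


# Content A (Control) converts at 3.5%, Content B (Treatment) at 4.2%.
lift = conversion_rate_lift(0.035, 0.042)
print(f"{lift:.0%}")  # prints 20%
```

Because the lift is relative to the Control, the same absolute difference in conversion rate yields a larger lift when the Control rate is low, so it is worth reporting both the raw rates and the lift together.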

 

Lower Statistical Probability Calculated (less than 90%)

Lower statistical probability means the experiment does not meet the criteria to be considered statistically significant. There is a higher degree of uncertainty, and the observed effects may not hold true in real-world scenarios. In such cases, the observed conversion rates could easily be attributed to random chance, and there is a lack of confidence in the validity of the results.

However, do not assume that a probability less than 90% means the A/B test is a failure. The standards for reaching statistical significance are quite high, and they are primarily the basis for experimentation in academic or scientific fields, where outcomes must be backed by over 95% statistical probability. Very few organizations employ a strictly scientific approach to digital product marketing, where there is no risk to the audience's safety and wellbeing. Rather, there is another, simpler way to interpret the outcome of an A/B test:

The A/B test results show there is an n% likelihood that running the experiment again will result in this same outcome.

For example, in the scenario where Content B's conversion rate is higher than Content A and the calculated probability is 84%, the outcome of this experiment can be interpreted as follows: "There is an 84% likelihood that running the experiment again will result in Content B shoppers converting at a higher rate than Content A shoppers. That means there's a 16% likelihood the results will be different."

Or if the Content A and B conversion rates are similar, and the calculated probability is 56%, then this scenario can be summarized as: "There is a 56% likelihood that running this experiment again will result in Content A and B shoppers converting at a similar rate, and a 44% likelihood the results will be different."

In the first scenario, most eCommerce strategists would agree that 84% is more than enough likelihood to justify concluding that Content B is essentially the winner of the A/B experiment. There is only a 16% chance that running the experiment across a larger sample size of shoppers will result in a different outcome. However, in the second scenario, the result is close to a coin toss: running the experiment again is nearly as likely to produce a different outcome as the same one. Depending on perspective and an organization's appetite for risk, the probability – while not over 90% – can still guide the next steps to take.

The primary reason tests encounter low statistical probability is that the experiment did not collect enough data. More specifically, too little data is usually the culprit behind the following characteristics of low probability results:

  • Small sample size: Fewer than 2,000 to 3,000 unique shoppers is generally not sufficient to calculate high confidence. To qualify visit data into either the A or B cohort, the shopper must be recognized as a new visitor to the product page. Visitors returning after having seen content before the test started are automatically excluded, as their behavior would skew the experiment results.
  • Small effect size: Low statistical probability often corresponds to a small effect size, meaning the difference between shopper behavior across both variations is negligible or inconclusive.
  • Inconsistency in shopper behavior: The results may vary significantly across different days and URLs, making it difficult to draw meaningful conclusions about whether content variations are influential.

Designing the A/B test to collect as much data as possible is the best way to avoid encountering low statistical significance.

Next steps:

Do not jump to the conclusion that a difference in conversion rate, when paired with low statistical probability, points accurately to the superior and inferior content variation. Rather, this underscores the need for iterative testing and optimization.

  • In most cases, extending the current test to run for a longer period of time and collect more data is recommended. A larger sample of shoppers is not guaranteed to produce a higher probability or confidence level, but the most common cause of low statistical probability is too small a volume of data points. More shoppers means more experiment data that can potentially tip the scales to one content version or the other. Click the "Extend Test" button on the test details page and select a future date by which the experiment is most likely to have collected at least 2,000 more unique shoppers.
  • A probability considered extremely low is almost always caused by too little traffic to the URLs where the A/B test is available. Extending the test may not be sufficient by itself to collect the sample size required for higher confidence.
    • Check that all relevant URLs have been added to the product and are not in an error state. Add more URLs or work with Syndigo support to resolve any errors found.
    • When there are no additional URLs to add, conclude the current test by marking it confirmed and complete and then immediately deploy another A/B test that targets more retailer websites. Reach out to a Syndigo representative to identify which websites are driving the most traffic to similar products across the extensive retailer network.
  • If extending the test or deploying a new A/B test is not possible, utilize professional judgement combined with organizational guidelines to:
    • Determine whether there is enough information to conclude a winning content version has emerged. Consider establishing a threshold aligned with your eCommerce strategy that enables confident decision making: Perhaps a probability greater than 75% is sufficient to impact content optimization strategy.
    • Review general best practices for enhanced content and the hypothesis going into the experiment to guide which content to keep. It is well accepted that longer sessions are associated with higher conversions and attributable to the presence of engaging and interactive content. Preserve the content version that contains the optimal, best-in-class widget types.
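When extending a test, the date by which roughly 2,000 more unique shoppers are likely to be collected can be estimated from recent traffic. The sketch below is simple arithmetic under the assumption that daily unique visits stay near their recent average. The function name and parameters are hypothetical and are not part of the Content Experience Suite.

```python
import math
from datetime import date, timedelta


def estimate_extension_end(start: date,
                           avg_daily_unique_visits: float,
                           additional_needed: int = 2000) -> date:
    """Estimate the date by which `additional_needed` more unique shoppers are collected."""
    if avg_daily_unique_visits <= 0:
        raise ValueError("avg_daily_unique_visits must be greater than zero")
    # Round up so that a partial day of traffic counts as a full day.
    days = math.ceil(additional_needed / avg_daily_unique_visits)
    return start + timedelta(days=days)


# Example: about 150 new unique shoppers per day across both cohorts.
print(estimate_extension_end(date(2024, 1, 1), 150))  # prints 2024-01-15
```

If the estimate lands months in the future, that is a signal that extending the test alone may not be enough and that adding URLs or retailer websites should be considered.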

In all cases: Low statistical probability results should prompt organizations to delve deeper into the underlying factors affecting user behavior. Further research may be needed to uncover insights that were not initially apparent.

 

In conclusion, high and low statistical probability are essential concepts in the realm of A/B testing. High statistical probability signifies strong confidence in the observed effects and enables confident decision-making, while low statistical probability calls for further investigation. Understanding these concepts empowers organizations to make informed choices and optimize their strategies based on reliable data analysis. As technology and methodologies continue to evolve, mastering the art of interpreting statistical probability will remain a critical skill for data-driven decision-makers in various industries.

Version history
Revision #:
2 of 2
Last update:
‎04-09-2026 04:06 PM