(From the March 3, 2007 issue of Thoroughbred Times)
Like most disciplines, the science of breeding thoroughbred racehorses has its own language of descriptive statistics aimed at analyzing and predicting performance and, ultimately, value. Though a full understanding of this language may seem daunting, a core group of statistics, once understood, can shed light on potential pitfalls, help breeders sift through the hype, and improve the performance of their bloodstock portfolios.
Perhaps the most important thing to understand about statistics in the thoroughbred industry is that numbers are just that: numbers. Mark Twain, repeating a line he attributed to Benjamin Disraeli, put it best: “There are three kinds of lies: lies, damned lies, and statistics.” Whatever the statistics may or may not imply, in the end the owner with the best individuals will have more success than the owner whose poor individuals have only statistics for credentials. Statistics are best used as supporting evidence of quality individuals, rather than as a starting point for building a successful bloodstock portfolio.
One of the most commonly used descriptive statistics is a sire’s average earnings per starter. Unfortunately, it is also one of the least discriminating and most easily skewed of all the breeding statistics. It is a simple computation: a sire’s total progeny earnings divided by his number of starters.
Mildly effective as a starting point for evaluating stallions, average earnings per starter gives mare owners a general idea of how a sire compares with his counterparts. But because such a large portion of the stallion population falls into the $25,000 to $40,000 range, the figure often says little about the class of a sire’s progeny. The statistic is also easily skewed by a sire’s top earner, as the case of Skip Trial illustrates: Skip Away accounts for nearly 30% of his total progeny earnings. Knowing this, Skip Trial’s average earnings per starter of $91,704 can hardly be taken as an accurate indicator of his runners’ overall quality.
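To make the skewing effect concrete, here is a minimal Python sketch of the calculation described above. The sire and every earnings figure in it are hypothetical, invented for illustration; the functions simply divide total earnings by the number of starters and measure how much of the total the single best runner contributes.

```python
def average_earnings_per_starter(earnings):
    """Total progeny earnings divided by the number of starters."""
    if not earnings:
        raise ValueError("sire has no starters")
    return sum(earnings) / len(earnings)


def top_earner_share(earnings):
    """Fraction of total progeny earnings won by the single best earner."""
    total = sum(earnings)
    return max(earnings) / total if total else 0.0


# Hypothetical sire: nine modest runners, then the same group plus one standout.
modest = [30_000, 25_000, 22_000, 18_000, 15_000, 12_000, 10_000, 8_000, 5_000]
with_star = modest + [200_000]

print(f"{average_earnings_per_starter(modest):,.0f}")     # 16,111
print(f"{average_earnings_per_starter(with_star):,.0f}")  # 34,500
print(f"{top_earner_share(with_star):.0%}")               # 58%
```

One runner more than doubles this hypothetical sire’s average, lifting it into the crowded $25,000 to $40,000 band even though nine of his ten starters earned $30,000 or less.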
For breeders who want to assess the quality of a sire’s progeny from top to bottom, rather than just the runners in the headlines, median earnings per starter is a useful number. It is the amount that half of a sire’s starters have earned more than and half have earned less than. To see how it works, imagine a hypothetical sire with only 11 starters, who have individually earned the following amounts:
| Starter | Earnings |
|---|---|
| #1 | $100,000 |
| #2 | $85,000 |
| #3 | $80,000 |
| #4 | $79,000 |
| #5 | $68,000 |
| #6 | $40,000 |
| #7 | $21,000 |
| #8 | $18,000 |
| #9 | $7,000 |
| #10 | $6,000 |
| #11 | $3,000 |
In this scenario, the median earnings per starter is $40,000: five of his starters have earned more than $40,000, five have earned less, and the sixth sits exactly at the midpoint. Given a sample size much larger than this example, a sire’s median earnings is an effective way to identify a sire who gets a large number of poor individuals. If we’re researching a sire and discover his median earnings to be just $7,500, we know that at least half of his starters have earned $7,500 or less and so fail to pay their way, a strong indication that investors should look elsewhere.
The obvious benefit is that median earnings cannot be heavily skewed by a single runner. The one shortcoming is that a sire lacking racing class in his progeny can post inflated median earnings if he sires durable runners. Though they’re not fast enough to possess class, they also aren’t fast enough to hurt themselves, which leads to longer careers that inflate the sire’s median earnings and convince some observers that his foals have more ability than they actually do.
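The same arithmetic can be checked with Python’s standard `statistics` module, using the eleven hypothetical starters from the table above. This is only a sketch; it also computes the mean so the two measures can be compared side by side.

```python
from statistics import mean, median

# Earnings of the 11 hypothetical starters in the table above.
earnings = [100_000, 85_000, 80_000, 79_000, 68_000, 40_000,
            21_000, 18_000, 7_000, 6_000, 3_000]

med = median(earnings)  # middle value once sorted: 40,000
avg = mean(earnings)    # 507,000 / 11, about 46,091

# Count starters strictly above and strictly below the median.
above = sum(1 for e in earnings if e > med)
below = sum(1 for e in earnings if e < med)

print(med, round(avg), above, below)  # prints: 40000 46091 5 5
```

The mean sits above the median because the top five earners pull it upward, which is exactly the kind of skew the median resists.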
One of the most commonly used indexes for measuring the earning power of a sire’s progeny relative to the progeny of other stallions, the Average Earnings Index (AEI) compares the average earnings of a sire’s runners during a calendar year with the average for the breed, which is set at 1.00. Like the SSI, the AEI allows comparisons of stallions from different time periods, but it can be skewed by a leading runner. The AEI also favors sires who throw durable types able to make more starts during the year, even if they are competing on weaker circuits.
An attempt to measure the quality of mares bred to a particular stallion, the CI (Comparable Index) expresses, on the same scale as the AEI, the calendar-year earnings of foals out of the same mares but sired by different stallions. For instance, if the mares sent to a first-year stallion had previously produced foals with an AEI of 1.50, the new sire’s CI would be 1.50. The idea is to assess the quality of the mares sent to a given sire, so breeders can later judge whether he is improving on his opportunities or simply riding the coattails of his mares.
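As a rough sketch, the AEI can be approximated as a sire’s average earnings per runner for the year divided by the breed’s average earnings per runner for the same year. That simplification is an assumption: it ignores the sex and year-of-birth categories the published index uses, and the breed average and runner earnings below are hypothetical.

```python
def average_earnings_index(sire_year_earnings, breed_avg_per_runner):
    """Simplified, hypothetical AEI for one calendar year.

    sire_year_earnings: earnings of each of the sire's runners that year.
    breed_avg_per_runner: assumed average earnings per runner for the breed.
    1.00 means the sire's runners earned exactly the breed average.
    """
    sire_avg = sum(sire_year_earnings) / len(sire_year_earnings)
    return sire_avg / breed_avg_per_runner


BREED_AVG = 20_000  # hypothetical breed average per runner

runners = [60_000, 40_000, 30_000, 20_000, 10_000]
star = [500_000]  # one hypothetical leading runner

print(f"{average_earnings_index(runners, BREED_AVG):.2f}")         # 1.60
print(f"{average_earnings_index(runners + star, BREED_AVG):.2f}")  # 5.50
```

Adding a single leading runner more than triples this hypothetical index, the same weakness that affects average earnings per starter.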
The primary shortcoming of the CI is that it does not account for the quality of the sires previously bred to a group of mares. That is likely the case with highly touted stallion prospects such as Mineshaft. Many of the mares in his first two books had previously been bred to the likes of Storm Cat and A.P. Indy. Those matings are sure to raise the earning power of the resulting foals, producing a disproportionately high CI that may create the illusion that Mineshaft is dragging his mares down. But the fact that he can’t improve his mares to the extent that a Storm Cat or an A.P. Indy can shouldn’t be held against him.
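Here is a minimal sketch of how a breeder might set a sire’s AEI against his CI, assuming both are computed with the same simplified ratio (average earnings per runner divided by a hypothetical breed average). The foal earnings are invented. Note what the sketch cannot see: nothing in `other_foals` records which stallions sired those runners, which is precisely the CI’s blind spot.

```python
def earnings_index(earnings, breed_avg_per_runner):
    """Average earnings per runner relative to the breed (1.00 = average)."""
    return (sum(earnings) / len(earnings)) / breed_avg_per_runner


BREED_AVG = 20_000  # hypothetical breed average per runner

# Hypothetical earnings of the stallion's own foals out of a group of mares.
sire_foals = [50_000, 30_000, 25_000, 15_000]
# Hypothetical earnings of the same mares' foals by other (unknown) stallions.
other_foals = [70_000, 45_000, 30_000, 15_000]

aei = earnings_index(sire_foals, BREED_AVG)
ci = earnings_index(other_foals, BREED_AVG)

verdict = "above" if aei > ci else "below"
print(f"AEI {aei:.2f} vs CI {ci:.2f}: {verdict} his mares' other foals")
# prints: AEI 1.50 vs CI 2.00: below his mares' other foals
```

A result like this looks unflattering, but if those other foals were by elite sires, an AEI of 1.50 may still represent a solid stallion.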
The SI (Sire Index) is similar to the AEI except that it measures earning power based on average earnings per start rather than per calendar year. Like the AEI, the SI categorizes runners by sex and year of birth, but it does not allow a group of cheaper, durable types to skew the figure.
One of the best illustrations of how the SI differs from the AEI is Airdrie Stud’s Indian Charlie, a known source of unsound individuals. Indian Charlie’s SI is 2.29, more than double the breed average, indicating that his progeny earn well relative to other runners each time they start. But when we use the AEI, which measures his progeny’s relative earning power over a calendar year, the figure drops to 1.91, because unsound runners make fewer starts per year and so collect less in total. Not surprisingly, Indian Charlie’s sire, In Excess (Ire), shows a very similar pattern.
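The contrast can be sketched in a few lines of Python. Both indexes here are simplified assumptions (total earnings divided by runners or by starts, then by a hypothetical breed average), and both sires and all figures are invented; the point is only that the denominator decides which kind of sire looks better.

```python
BREED_AVG_PER_RUNNER = 20_000  # hypothetical annual earnings per runner
BREED_AVG_PER_START = 3_000    # hypothetical earnings per start


def aei(runners):
    """Simplified AEI: average annual earnings per runner vs the breed."""
    total = sum(earned for earned, _ in runners)
    return total / len(runners) / BREED_AVG_PER_RUNNER


def si(runners):
    """Simplified SI: average earnings per start vs the breed."""
    total = sum(earned for earned, _ in runners)
    starts = sum(n for _, n in runners)
    return total / starts / BREED_AVG_PER_START


# (earnings this year, starts this year) for each runner; hypothetical data.
lightly_raced = [(40_000, 4), (24_000, 3), (16_000, 3)]
durable = [(30_000, 12), (24_000, 10), (18_000, 14)]

for name, runners in [("lightly raced", lightly_raced), ("durable", durable)]:
    print(f"{name}: AEI {aei(runners):.2f}, SI {si(runners):.2f}")
# lightly raced: AEI 1.33, SI 2.67
# durable: AEI 1.20, SI 0.67
```

The lightly raced group mirrors the Indian Charlie pattern (SI above AEI), while the durable group shows how frequent starts can prop up an AEI that per-start earnings don’t support.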
Used in conjunction with a sire’s SI to measure the quality of mares sent to a stallion, the ComSI (Comparative Sire Index) is based on average earnings per start of foals out of the mares bred to a particular stallion, but sired by different stallions. Like the CI, it is intended to help breeders decide whether a stallion is improving upon his opportunities or simply riding the coattails of his mares. The ComSI also shares the CI’s main flaw: we don’t know what caliber of stallions sired those other foals. The primary difference, as with the AEI and SI, is that the ComSI is based on average earnings per start, not on a calendar year.
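On the same assumptions as a simplified SI (earnings per start divided by a hypothetical breed average per start), the SI-versus-ComSI comparison might be sketched as follows. Every foal record is invented, and the helper name `per_start_index` is purely illustrative.

```python
def per_start_index(runners, breed_avg_per_start):
    """Earnings per start relative to the breed (1.00 = average); hypothetical."""
    total = sum(earned for earned, _ in runners)
    starts = sum(n for _, n in runners)
    return total / starts / breed_avg_per_start


BREED_AVG_PER_START = 3_000  # hypothetical

# (earnings, starts) for each foal; all figures hypothetical.
sire_foals = [(36_000, 6), (18_000, 6), (6_000, 4)]    # by the stallion under review
other_foals = [(30_000, 10), (15_000, 6), (9_000, 8)]  # same mares, other stallions

si = per_start_index(sire_foals, BREED_AVG_PER_START)
comsi = per_start_index(other_foals, BREED_AVG_PER_START)
print(f"SI {si:.2f} vs ComSI {comsi:.2f}")  # prints: SI 1.25 vs ComSI 0.75
```

An SI above the ComSI suggests the stallion is improving on his mares, subject to the same caveat as the CI: the sketch knows nothing about who sired `other_foals`.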
Handicapping tools and vocabulary are slowly permeating the psyche of breeders, consignors, and buyers. Sales catalog supplements are at the forefront of this trend, offering buyers standardized ratings for individuals in the immediate female family. After all, the same information that helps handicappers predict the outcome of a race should help breeders assess racing ability in potential breeding stock.
Beyer Speed Figures (named after their creator, Washington Post columnist Andrew Beyer) aim to express a horse’s performance as a function of class and track variant. Average winning times for a given class at a specified track ($40,000 open claimers at Del Mar, for example) are computed to establish a baseline, and horses running faster or slower than that baseline are assigned correspondingly higher or lower figures. Track variants are then added to the equation to account for days when a track was unusually fast or slow. Beyer Speed Figures range from the 120s for grade 1 caliber horses down to the 40s and 50s for horses at the lowest levels of American racing.
Ragozin Numbers are probably the least understood of all the statistics currently available to breeders, even though handicappers and trainers have been using them for decades under their more common name, The Sheets. Ragozin Numbers rate an individual horse’s effort in a particular race as a function of time, track condition, weight carried, wind, and traffic trouble (such as being boxed in or forced to run wide). Unlike most ratings, Ragozin Numbers run in reverse: lower numbers correspond to higher racing class. Numbers in the 20s are commonly associated with bottom-level claiming horses, while the best horses in this country usually post Ragozin Numbers in the low single digits. The average Ragozin Number for grade 1 races in this country falls between 1 and -1.
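A toy version of the par-plus-variant idea can be sketched in Python. This is not Andrew Beyer’s actual method or scale: the par time, par figure, and points-per-second constant below are all hypothetical, chosen only to show how a fast-track variant pulls a figure down.

```python
def speed_figure(final_time, par_time, par_figure, variant, points_per_second):
    """Toy speed figure (hypothetical constants, not Beyer's tables).

    final_time: the horse's final time in seconds.
    par_time: average winning time for the class at this track and distance.
    par_figure: figure assigned to a par-time performance.
    variant: seconds by which the track played fast (+) or slow (-) that day.
    points_per_second: figure points awarded per second faster than par.
    """
    # Strip out the day's track bias: on a fast track, add time back.
    adjusted_time = final_time + variant
    return round(par_figure + (par_time - adjusted_time) * points_per_second)


# Same hypothetical final time of 69.4 seconds against a 70.0-second par.
print(speed_figure(69.4, 70.0, 85, 0.4, 10))  # fast track: 87
print(speed_figure(69.4, 70.0, 85, 0.0, 10))  # normal track: 91
```

The identical clocking earns a lower figure on the fast day, because part of the apparent speed belonged to the racing surface rather than the horse.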
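Because the Ragozin scale is inverted, any tool that ranks horses by it has to treat the minimum as the best effort. The sketch below assumes a hypothetical set of figures for three invented horses; it does not compute Ragozin Numbers, which depend on inputs the article lists but does not quantify.

```python
# Hypothetical Ragozin Numbers from three horses' recent races.
sheets = {
    "Horse A": [8.5, 6.0, 7.25],
    "Horse B": [2.0, 0.5, 1.75],   # near the 1 to -1 grade 1 average
    "Horse C": [21.0, 23.5, 22.0], # bottom-level claiming range
}

# Lower is better, so each horse's top effort is its minimum figure,
# and sorting in ascending order puts the strongest horse first.
best = {name: min(figs) for name, figs in sheets.items()}
ranking = sorted(best, key=best.get)

print(ranking)  # ['Horse B', 'Horse A', 'Horse C']
```

The same code with `max` and a descending sort would suit Beyer figures, where higher is better.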
Two of the more commonly interchanged terms in the business are stakes winners and black-type winners. Black-type earners are just what the name suggests: runners who have won or placed in stakes events that qualify for bold type under current cataloguing standards. Since January 1, 2004, only races with a minimum purse of $40,000 have qualified as true black-type events.
Black-type should not be confused with the more loosely used terms stakes winner and stakes-placed. These terms do not adhere to domestic cataloguing standards and can apply to events run for as little as $5,000 on small regional circuits. Stakes statistics are among the most abused numbers in the industry: depending on the publication you’re reading, a sire’s stakes percentages may or may not be limited to true black-type races. If an industry newcomer learns just one thing early on, it should be what actually constitutes a black-type event, and how stakes production numbers are often inflated.
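The inflation is easy to demonstrate. This sketch assumes a hypothetical sire with 20 starters and applies only the $40,000 purse threshold cited above, assuming every race was run after January 1, 2004; real cataloguing standards involve more than purse size, and the sketch counts winners only, not placings.

```python
BLACK_TYPE_MIN_PURSE = 40_000  # threshold cited in the article (post-2004 races)

# Hypothetical sire with 20 starters: the richest stakes purse each starter
# has won, or None if it never won a stakes race.
best_stakes_purse = [None] * 14 + [5_000, 10_000, 15_000, 25_000, 50_000, 100_000]

stakes_winners = [p for p in best_stakes_purse if p is not None]
black_type_winners = [p for p in stakes_winners if p >= BLACK_TYPE_MIN_PURSE]

n = len(best_stakes_purse)
print(f"stakes winners:     {len(stakes_winners) / n:.0%}")      # 30%
print(f"black-type winners: {len(black_type_winners) / n:.0%}")  # 10%
```

Quoted as “30% stakes winners,” this hypothetical sire sounds three times better than his black-type record supports.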
As with all statistical inferences, it is the user’s responsibility to become familiar with the methodology and language behind the numbers, as well as their strengths and weaknesses in accurately describing a phenomenon. Only after breeders have familiarized themselves with the appropriate statistical language can they use it effectively to avoid poor bloodstock investments.