“Alternative data” has recently become a buzzword. Asset managers now actively use data originally intended for marketing, surveillance, supply chain management and similar purposes in their investment decision-making. Hedge funds already spend more than $170 million annually on alternative data, and this spending is expected to exceed $7 billion annually by 2020.
Quantitative asset managers were natural early adopters since they were already accustomed to a data-driven investment process. Big quant firms employ dedicated teams to source and analyze alternative data sets; some even build their own data evaluation platforms.
Despite the growing demand, alternative data is not necessarily an easy sell. For alternative data vendors and companies that are considering monetizing their data exhaust, it is important to understand how asset managers evaluate and use data. The value proposition for a proprietary quantitative trading firm may be very different from that for a discretionary long-term fundamental hedge fund manager.
For data vendors who are specifically looking to sell their data to quants, it is worth considering the following aspects.
Pre-processing data is not always a value-add. Commercialized data must be clean, consistent, accurate and well structured. But there is a side effect of “overcooking” a data product. Basic aggregation, such as sample averaging or time series averaging, may lead to losing critical insight quants are looking for. While trading teams with fewer internal resources may be interested in pre-calculated statistics, quant firms with in-house data engineering teams will probably prefer raw data. Ideally, data vendors should be able to offer both raw and pre-processed data.
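To illustrate how averaging can hide the signal, here is a minimal sketch. The store names and visit counts are made up for illustration; the point is only that a cross-sectional average can look flat while the underlying records diverge sharply.

```python
from statistics import mean

# Hypothetical raw records: daily visit counts per store over four days.
raw_visits = {
    "store_a": [100, 100, 100, 100],
    "store_b": [100, 80, 60, 40],
    "store_c": [100, 120, 140, 160],
}

def daily_average(visits_by_store):
    """Cross-sectional average per day: what a pre-aggregated feed might ship."""
    return [mean(day) for day in zip(*visits_by_store.values())]

def daily_spread(visits_by_store):
    """Max-minus-min range per day: only computable from the raw records."""
    return [max(day) - min(day) for day in zip(*visits_by_store.values())]

avg = daily_average(raw_visits)    # flat: every day averages to 100
spread = daily_spread(raw_visits)  # widening: the stores are diverging
print(avg, spread)
```

A client who receives only `avg` sees nothing happening; a client with the raw records can see that one store is losing traffic while another is gaining it.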
The more factors, the better. Vendors often pitch ready-to-use predictive signals and only deliver final composite scores based on their internal methodology. Meanwhile, there could be many ways to combine underlying factors, some more useful to the client than others. Providing as many interim metrics as possible can enhance the data set’s value for quants.
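As a rough sketch of why interim metrics matter, the example below ranks the same three names under two different weightings. The tickers, factor names, values and weights are all hypothetical; they do not represent any vendor's actual methodology.

```python
# Hypothetical interim factor values per ticker (names and numbers are made up).
factors = {
    "AAA": {"sentiment": 0.9, "volume": 0.1},
    "BBB": {"sentiment": 0.4, "volume": 0.8},
    "CCC": {"sentiment": 0.2, "volume": 0.5},
}

def rank_by_composite(factor_table, weights):
    """Rank tickers by a weighted sum of their factors, highest score first."""
    def score(ticker):
        return sum(weights[name] * value
                   for name, value in factor_table[ticker].items())
    return sorted(factor_table, key=score, reverse=True)

# Assumed vendor weighting versus an assumed client weighting.
vendor_ranking = rank_by_composite(factors, {"sentiment": 0.8, "volume": 0.2})
client_ranking = rank_by_composite(factors, {"sentiment": 0.2, "volume": 0.8})
print(vendor_ranking, client_ranking)
```

If only the vendor's composite score is delivered, the client cannot reconstruct the second ranking at all.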
Quants use historical time series to establish and test their investment hypotheses, so it is essential to provide at least 4–5 years of consistent historical data. However, history alone is useless for trading without daily updates, so the historical data should be available free of charge.
Beware of backfilling. In order to perform an accurate back test, the asset manager needs to know exactly when each data point became available. Backfilled data is inherently inferior to that collected in real time, so if a vendor backfills missing historical data, they should be very transparent about the process and assumptions.
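One way to make backfilling transparent is to stamp every record with the date it actually became available, so a back test can filter on it. The sketch below assumes a hypothetical `Observation` record with an `available_at` field; the field names and values are illustrative only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Observation:
    period: date        # the date the value describes
    available_at: date  # when the value was actually published to clients
    value: float

def visible_as_of(observations, decision_date):
    """Return only the observations a trader could have seen on decision_date."""
    return [o for o in observations if o.available_at <= decision_date]

history = [
    Observation(date(2015, 1, 31), date(2015, 2, 5), 1.0),  # collected in real time
    Observation(date(2015, 2, 28), date(2016, 6, 1), 2.0),  # backfilled much later
    Observation(date(2015, 3, 31), date(2015, 4, 3), 3.0),  # collected in real time
]

seen = visible_as_of(history, date(2015, 4, 30))
seen_values = [o.value for o in seen]
print(seen_values)
```

The February value describes an early-2015 period but was not published until 2016, so a back test run as of April 2015 must not use it.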
All data revisions must be recorded. For example, for earnings announcements, not only must changes to earnings per share values be recorded but also changes to expected or past announcement dates. Maintaining accurate version control and a record of methodology changes is critically important.
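A minimal sketch of such a record is an append-only revision log, where each change adds a new row instead of overwriting the old one. The company name, dates and values below are invented, and the tuple layout is just one assumed way to structure the log.

```python
from datetime import date

# Hypothetical append-only log: (company, recorded_on, eps, announcement_date).
# Nothing is overwritten; every revision is a new row.
revisions = [
    ("ACME", date(2016, 1, 10), None, date(2016, 2, 1)),  # expected date published
    ("ACME", date(2016, 1, 25), None, date(2016, 2, 8)),  # expected date pushed back
    ("ACME", date(2016, 2, 8), 1.20, date(2016, 2, 8)),   # EPS reported
    ("ACME", date(2016, 3, 15), 1.15, date(2016, 2, 8)),  # EPS later restated
]

def as_of(log, company, when):
    """Latest revision for company recorded on or before `when`, or None."""
    candidates = [row for row in log if row[0] == company and row[1] <= when]
    return max(candidates, key=lambda row: row[1]) if candidates else None

before_delay = as_of(revisions, "ACME", date(2016, 1, 20))
after_report = as_of(revisions, "ACME", date(2016, 2, 20))
after_restatement = as_of(revisions, "ACME", date(2016, 4, 1))
print(before_delay, after_report, after_restatement)
```

With this layout a client can reproduce exactly what was known on any given day, including the originally expected announcement date and the EPS figure before it was restated.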
Universe covered dictates target client segments. If a data set covers a dozen names, it will not be a perfect fit for a statistical arbitrage strategy trading a couple thousand names on a daily basis. A vendor whose data product has limited coverage may want to turn to sector-focused funds with more concentrated portfolios.
Accurate mapping to security identifiers is often the most important part of transferring data to the financial domain, yet some vendors tend to treat mapping as an afterthought. Exchange ticker symbols, often vendors’ first choice of identifier, change rather frequently to reflect corporate actions. Thus, they are usually less reliable historically than, for example, ISINs. Mapping data to universal identifiers other than ticker symbols can make a data set easier to work with.
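The sketch below shows one possible shape for a date-aware ticker map. The tickers and the ISIN-like strings are placeholders, not real identifiers, and the table layout is an assumption made for illustration.

```python
from datetime import date

# Hypothetical point-in-time map: (ticker, valid_from, valid_to_exclusive, isin).
# The identifiers are made-up placeholders, not valid ISINs.
TICKER_HISTORY = [
    ("OLDC", date(2010, 1, 1), date(2014, 6, 1), "XX0000000001"),
    ("NEWC", date(2014, 6, 1), date.max, "XX0000000001"),  # same company, renamed
    ("OLDC", date(2015, 1, 1), date.max, "XX0000000002"),  # ticker reused by another issuer
]

def isin_for(ticker, on):
    """Resolve a ticker to the identifier it referred to on a given date."""
    for symbol, start, end, isin in TICKER_HISTORY:
        if symbol == ticker and start <= on < end:
            return isin
    return None  # the ticker was not in use on that date

print(isin_for("OLDC", date(2012, 1, 1)))  # the original company
print(isin_for("OLDC", date(2016, 1, 1)))  # a different issuer entirely
```

A data set keyed only by ticker would silently merge the two issuers that used `OLDC` and split the renamed company into two unrelated histories.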
Not everyone is after untapped data sets. There is a perception that every quant wants to be the first to use a data set, and that limiting the number of users always creates scarcity value while overcrowded data sets are less interesting. In reality, tolerance for data set overcrowding depends on the manager’s investment horizon and the time it takes for a signal to be reflected in market prices. It took the market almost two decades to digest I/B/E/S, the most obviously pertinent quantamental data set, despite it being widely available. Of course, funds with certain strategies may be interested in undiscovered data, but when selling to quants, novelty and exclusivity are not always the biggest value-add. Some quants may actually prefer widely used data sets.
Marketing and distribution. Many quants are proactively sourcing alternative data sets, and they are easy to reach and start a dialogue with. However, data sourcing teams evaluate dozens of data products, looking for those that are additive or superior to the data they’re already using. A data product needs to have a clear value proposition, but it is equally important to make the product visible to potential clients. Making basic data features (such as length of history, coverage and update frequency) public will not be a big issue for a vendor from the intellectual property protection standpoint, but it will make the data set appear in prospective clients’ searches. Overly secretive data sets, gated behind distribution intermediaries and restrictive non-disclosure agreements, may simply end up at the back of the evaluation queue.
Quants are arguably the most sophisticated and demanding clients, and their feedback can provide great insight for data providers looking to improve and develop their product. Being upfront about the data’s weaknesses, keeping clients updated about material changes and listening to feedback are the way to build a high-quality product and maximize data monetization potential.