The Golden Middle of Alternative Data Productization

Olga Kane
3 min read · Apr 9, 2019

A big part of a data vendor’s value add comes down to productization, i.e., creating a data product that is usable and meets the end user’s needs.

But data vendors often have an inaccurate perception of what the end user is looking for, and tend to run to one of two extremes: they either sell raw, messy data with minimal (if any) pre-processing and major quality issues left to fix, or they attempt to create ready-to-use trading signals and composite scores.

Of course, over-processing data is risky, as it may inadvertently strip out the insights that traders are looking for. And yes, hedge funds that are serious about big data and machine learning have the capability to work with raw data. However, it takes far too much time and effort to transform the data into a digestible format just to evaluate the alpha-generating potential of the data set and make an informed decision on whether it is worth buying.

On the other hand, ready-to-use trading signals extracted and back-tested by the vendor require zero preparation. But there are two major problems: vendor-produced signals are commercially available and therefore prone to crowding, and, more importantly, building alpha-generating investment strategies may not be (and often clearly is not) the strongest skill of a data vendor. A good data vendor is a team of data experts and domain experts in the data category, but they are not (and are not supposed to be) good portfolio managers. Vendors who try to do a hedge fund manager's job end up missing the opportunity to structure the optimal value proposition.

Instead, a data vendor should focus on what they are good at and complete the data preparation work to make it easier for the data buyer to evaluate and use the dataset. It is the vendor's job to guarantee the quality of the data and to extract as many features and statistics describing the data set as possible. It is then the portfolio manager's job to decide which features are useful in the context of a particular portfolio and how they can be combined with other sources of information to create a trading signal.

It is by now well known in the industry that most asset managers who actively use alternative data spend around 80% of their time preparing data for analysis and only 20% on the analysis itself. This is because every data buyer has to reinvent the wheel and extract the same features from the same commercially available data set. Meanwhile, the number of features a data set can yield is usually finite, and extracting them requires domain knowledge, not investment expertise. A data vendor is therefore much better positioned to cover this part of the process than an asset manager.

The golden middle of alternative data productization is a combination of raw data with a comprehensive set of pre-selected features: not features that predict market moves and imply trading decisions, but features that merely describe the dataset itself, regardless of whether they can be used directly as trading signals or alphas.
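As a rough illustration of what such vendor-side productization might look like, the sketch below computes purely descriptive features from a hypothetical raw transaction panel. The file name, column names, and the use of pandas are all assumptions for the sake of the example, not a description of any particular vendor's pipeline, and the code deliberately stops short of building any trading signal.

```python
import pandas as pd

# Hypothetical raw panel: one row per transaction, mapped to a public ticker.
# Assumed columns for illustration: date, ticker, amount, customer_id
raw = pd.read_csv("transactions.csv", parse_dates=["date"])

# Descriptive features only: weekly statistics that characterize the dataset,
# not predictions of returns or buy/sell recommendations.
weekly = (
    raw.groupby(["ticker", pd.Grouper(key="date", freq="W")])
       .agg(
           txn_count=("amount", "size"),                  # observed transaction volume
           total_spend=("amount", "sum"),                 # aggregate spend per ticker-week
           avg_ticket=("amount", "mean"),                 # average transaction size
           unique_customers=("customer_id", "nunique"),   # breadth of the panel
       )
       .reset_index()
)

# Dataset-level coverage statistics a buyer would want up front.
coverage = {
    "tickers_covered": raw["ticker"].nunique(),
    "history_start": raw["date"].min(),
    "history_end": raw["date"].max(),
    "rows": len(raw),
}

# The vendor ships the raw panel plus these descriptive tables; turning them
# into an investment signal is left to the buyer's data team and portfolio managers.
print(weekly.head())
print(coverage)
```

In this kind of setup the vendor's deliverable is the raw panel together with the descriptive and coverage tables, which is enough for a buyer to evaluate the dataset quickly without prescribing how it should be traded.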

Combining, complementing, interpreting and overlaying those features to arrive at a meaningful investment signal is the work of the hedge fund's data team in collaboration with its portfolio managers. This part of the process is the secret sauce that allows two portfolio managers to come up with two very different signals and strategies from the same data. And that is exactly why alternative data still works, even as it becomes mainstream.
