Common data-related issues observed by Model Validators

Getting the data right is a must. Ideally, the independent validators should validate the data extraction process and the assumptions underlying the data before any model development begins, rather than only at the validation stage. This saves a lot of time and energy.

Common issues found are:

1) Lack of adequate segmentation: In stress testing models, the main input is an aggregate portfolio return series, and a mathematical relationship is developed between this series and macroeconomic (ME) variables.

If the aggregate return series is segmented further, the segment-level return series may show idiosyncratic trends and properties. In the aggregate, these trends and properties get diversified away. That may hold in normal scenarios, but it may not hold in stressed scenarios, when segments tend to move together and the diversification benefit shrinks.

If aggregation is done to simplify the model, because without it the model would be impractically complex, then the model developers should document the resulting limitations. They should also document what actions the business will take when it observes deviations from the expected behavior.
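The diversification argument in item 1 can be made concrete with the standard portfolio-variance formula. The sketch below assumes, purely for illustration, three equally weighted segments with 10% volatility each and a single common correlation between every pair of segments; the two correlation levels labelled "normal" and "stressed" are hypothetical, not estimates. It shows how much of the apparent smoothness of the aggregate series disappears if segment correlations rise under stress.

```python
import math

def portfolio_vol(weights, vols, rho):
    """Volatility of a weighted sum of segment returns, assuming every pair
    of segments has the same correlation rho (a simplifying assumption)."""
    n = len(weights)
    var = 0.0
    for i in range(n):
        for j in range(n):
            corr = 1.0 if i == j else rho
            var += weights[i] * weights[j] * vols[i] * vols[j] * corr
    return math.sqrt(var)

# Hypothetical segments: three equally weighted segments, 10% volatility each.
weights = [1 / 3, 1 / 3, 1 / 3]
vols = [0.10, 0.10, 0.10]

normal = portfolio_vol(weights, vols, rho=0.2)    # assumed "normal" correlation
stressed = portfolio_vol(weights, vols, rho=0.9)  # assumed "stressed" correlation
print(f"aggregate vol, normal: {normal:.4f}, stressed: {stressed:.4f}")
```

With these inputs the aggregate volatility rises from roughly 6.8% to roughly 9.7%, close to the 10% of each segment, so a model calibrated on the aggregate in calm periods can understate how the segments behave together under stress.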

2) Due to data limitations, it may not be possible to find a good independent variable to project the risk number. In such scenarios, model developers choose the best available independent variable.

Model owners should justify the use of that variable. They should provide confidence intervals for the coefficients using a bootstrapping approach and, given that the chosen independent variable is only the best of a weak set of candidates, they should also provide a monitoring framework.
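For item 2, a pairs bootstrap is one common way to get a confidence interval for a regression coefficient without relying on normally distributed residuals. The sketch below fits a single-variable OLS regression and takes the percentile interval of the slope across resamples. The synthetic data (a true slope of 0.5 plus noise) are a hypothetical stand-in for a risk measure regressed on the chosen macroeconomic variable; the resample count and seed are arbitrary choices.

```python
import random
import statistics

def ols_slope(xs, ys):
    """OLS slope of y on x in a regression with an intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope: resample (x, y) pairs with
    replacement, refit, and read off the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # all x identical: slope undefined, skip resample
            continue
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lo = slopes[int(alpha / 2 * len(slopes))]
    hi = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return lo, hi

# Synthetic stand-in data: risk measure = 0.5 * macro variable + noise.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + rng.gauss(0, 0.3) for x in xs]
lo, hi = bootstrap_slope_ci(xs, ys)
print(f"slope = {ols_slope(xs, ys):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

A wide interval, or one that straddles zero, is exactly the kind of evidence the justification and the monitoring framework should address.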

4) Absence of a data dictionary, as well as of any procedural description of how derived data are produced.
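For item 4, even a minimal machine-readable data dictionary lets validators check coverage automatically. The `FieldSpec` schema and the field names below are hypothetical; the check reports dataset columns that have no dictionary entry, and derived fields whose derivation is not described.

```python
from dataclasses import dataclass

@dataclass
class FieldSpec:
    """One data-dictionary entry (hypothetical schema)."""
    name: str
    description: str
    source: str
    derived: bool = False
    derivation: str = ""  # how a derived field is computed

def dictionary_gaps(columns, dictionary):
    """Return (columns with no dictionary entry, derived fields lacking a derivation)."""
    specs = {spec.name: spec for spec in dictionary}
    missing = [c for c in columns if c not in specs]
    no_derivation = [s.name for s in dictionary if s.derived and not s.derivation.strip()]
    return missing, no_derivation

dictionary = [
    FieldSpec("loan_id", "Unique loan identifier", "origination system"),
    FieldSpec("ltv", "Loan-to-value ratio", "computed in warehouse", derived=True),
]
missing, no_derivation = dictionary_gaps(["loan_id", "ltv", "balance"], dictionary)
print(missing, no_derivation)  # ['balance'] ['ltv']
```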

5) Use of index data: When data are limited, model developers use index data for modeling. Adequate justification should be provided for using that data, together with an explanation of where the behavior of the index and the behavior of the portfolio at hand are likely to diverge.

6) Data are often provided in Excel sheets. Ideally, proper documentation should describe the source of the data and how it was derived; often this is not provided.
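For the Excel handover problem in item 6, one lightweight way to pin down what was actually delivered is to record, for every file, its source, its extraction logic, and a content hash. The record layout and the example values below are hypothetical; only `hashlib`, `pathlib`, and `datetime` from the Python standard library are used.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def provenance_record(path, source_system, extraction_query, derivation_notes):
    """Provenance entry for a delivered data file (hypothetical layout).
    The SHA-256 digest lets validators confirm later that the file they
    tested is the same file the model was developed on."""
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "source_system": source_system,
        "extraction_query": extraction_query,
        "derivation_notes": derivation_notes,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

# Usage with a throwaway file standing in for the delivered spreadsheet.
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "portfolio_returns.xlsx"
    p.write_bytes(b"placeholder bytes")
    rec = provenance_record(
        p,
        source_system="hypothetical data warehouse",
        extraction_query="hypothetical query text",
        derivation_notes="monthly returns computed from month-end balances (example)",
    )
print(json.dumps(rec, indent=2))
```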

7) Model developers should provide a document on data completeness. For each variable, it should state the degree of completeness, the criticality of the variable, and its accuracy.
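The completeness document in item 7 can be backed by a simple per-variable calculation. In this sketch a value counts as missing only if it is `None` or an empty string (NaN handling and accuracy checks would have to be added separately), and the thresholds of 99%, 95%, and 80% by criticality level are assumptions, not a standard. The records are hypothetical.

```python
# Assumed minimum completeness by criticality level (not a regulatory standard).
THRESHOLDS = {"high": 0.99, "medium": 0.95, "low": 0.80}

def completeness_report(records, variables, criticality):
    """Per-variable completeness: share of records where the value is present.
    A value counts as missing if it is None or an empty string."""
    n = len(records)
    report = {}
    for var in variables:
        present = sum(1 for r in records if r.get(var) not in (None, ""))
        ratio = present / n if n else 0.0
        level = criticality.get(var, "low")  # unlisted variables default to "low"
        report[var] = {
            "completeness": ratio,
            "criticality": level,
            "below_threshold": ratio < THRESHOLDS[level],
        }
    return report

records = [
    {"loan_id": 1, "balance": 100.0, "appraisal": 150.0},
    {"loan_id": 2, "balance": None, "appraisal": 120.0},
    {"loan_id": 3, "balance": 90.0, "appraisal": ""},
    {"loan_id": 4, "balance": 80.0, "appraisal": 110.0},
]
criticality = {"loan_id": "high", "balance": "high", "appraisal": "medium"}
report = completeness_report(records, ["loan_id", "balance", "appraisal"], criticality)
for var, row in report.items():
    print(var, row)
```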

8) In the validation of market models (derivative instrument pricing and risk), the models are generally well defined. They are usually implemented on vendor or in-house systems and are robustly validated from a theoretical perspective. The challenge often arises when there are multiple systems and a given trade is priced in more than one of them. Prices can differ because of differences in curve construction, yield curve interpolation, and so on across the systems. These differences are usually minor. More serious mismatches arise when, for example, a cross-currency trade is booked in one currency in one system and in the other currency in another system, or when one system applies tenor adjustments and another does not.
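For the multi-system problem in item 8, a reconciliation typically converts each system's present value (PV) to a common currency, compares the two against a tolerance, and reports booking-currency differences separately from genuine price breaks. The `Valuation` record, the FX rates, and the tolerances (0.1% relative or 100 USD absolute, whichever is larger) below are hypothetical, and tenor-adjustment differences are not modelled.

```python
from dataclasses import dataclass

@dataclass
class Valuation:
    trade_id: str
    currency: str  # currency the trade is booked and valued in
    pv: float

def reconcile(vals_a, vals_b, fx_to_usd, rel_tol=1e-3, abs_tol=100.0):
    """Compare PVs of the same trades in two systems after converting to USD."""
    by_id_b = {v.trade_id: v for v in vals_b}
    issues = []
    for a in vals_a:
        b = by_id_b.get(a.trade_id)
        if b is None:
            issues.append((a.trade_id, "missing in second system"))
            continue
        if a.currency != b.currency:
            issues.append((a.trade_id, f"booked in {a.currency} vs {b.currency}"))
        pv_a = a.pv * fx_to_usd[a.currency]
        pv_b = b.pv * fx_to_usd[b.currency]
        diff = abs(pv_a - pv_b)
        if diff > max(abs_tol, rel_tol * max(abs(pv_a), abs(pv_b))):
            issues.append((a.trade_id, f"PV break of {diff:.2f} USD"))
    return issues

fx = {"USD": 1.0, "EUR": 1.10}  # hypothetical rates
system_a = [Valuation("T1", "USD", 1_000_000.0), Valuation("T2", "USD", 550_000.0),
            Valuation("T3", "USD", 200_000.0)]
system_b = [Valuation("T1", "USD", 1_000_050.0), Valuation("T2", "EUR", 500_000.0),
            Valuation("T3", "USD", 205_000.0)]
issues = reconcile(system_a, system_b, fx)
print(issues)
```

Here T1 differs only within tolerance, T2 agrees in value but is booked in different currencies, and T3 is a genuine price break.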

9) In the validation of market risk models, the models are implemented in software systems. The pricing models may use both direct and derived data to price a security, and this data is mostly current market data. Unfortunately, many aspects of this data are not stored in the software systems, which makes benchmarking and backtesting against historical data difficult.
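Item 9 is largely a storage problem: if the market data used on each day is kept, historical prices can be reproduced later. A minimal as-of store might look like the sketch below, assuming snapshots are saved once per business date as plain dictionaries; the class name, keys, and rates are hypothetical.

```python
import bisect
import copy
from datetime import date

class MarketDataSnapshots:
    """Append-only store of daily market-data snapshots (hypothetical design)."""

    def __init__(self):
        self._dates = []      # business dates, kept in increasing order
        self._snapshots = []  # snapshot stored for each date

    def save(self, as_of, snapshot):
        if self._dates and as_of <= self._dates[-1]:
            raise ValueError("snapshots must be saved in increasing date order")
        self._dates.append(as_of)
        # Deep copy so later changes to the caller's data cannot rewrite history.
        self._snapshots.append(copy.deepcopy(snapshot))

    def as_of(self, when):
        """Return a copy of the latest snapshot saved on or before `when`."""
        i = bisect.bisect_right(self._dates, when)
        if i == 0:
            raise KeyError(f"no snapshot on or before {when}")
        return copy.deepcopy(self._snapshots[i - 1])

store = MarketDataSnapshots()
store.save(date(2024, 1, 2), {"USD_10Y_swap_rate": 0.0395})
store.save(date(2024, 1, 3), {"USD_10Y_swap_rate": 0.0391})
print(store.as_of(date(2024, 1, 5)))  # falls back to the 3 Jan snapshot
```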

10) Stale data in pricing models: Even relatively simple models can require large amounts of data; a swaption model, for example, needs a volatility cube/surface. With so many inputs, there is a high likelihood that some of the data is stale, because of glitches in the data pipeline from collection and warehousing to the pricing systems. It is also observed that data is fine in one currency but stale in another.
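A first-pass staleness check for item 10 is to flag any series whose latest value has been repeated exactly for several consecutive observations, tracked separately per currency. The threshold of four repeats and the example series are assumptions; a genuinely quiet market can also produce repeats, so a flag is a prompt for investigation rather than proof of stale data.

```python
def trailing_run(values):
    """Number of consecutive observations at the end of the series equal to the last one."""
    if not values:
        return 0
    run = 1
    for v in reversed(values[:-1]):
        if v != values[-1]:  # exact equality: stale feeds repeat the identical value
            break
        run += 1
    return run

def flag_stale(history, max_repeats=4):
    """history maps (currency, instrument) to values in date order."""
    return sorted(key for key, values in history.items()
                  if trailing_run(values) >= max_repeats)

history = {
    ("USD", "swaption_vol_1y10y"): [0.251, 0.249, 0.252, 0.250, 0.253],
    ("EUR", "swaption_vol_1y10y"): [0.231, 0.228, 0.228, 0.228, 0.228],
}
print(flag_stale(history))  # [('EUR', 'swaption_vol_1y10y')]
```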

11) In option pricing models, the volatility surface/cube is an important input, but the data across currencies is often not uniform: the volatility surface for currency X may be much denser than that for currency Y. This raises serious questions about whether the model prices correctly in the currencies with sparser data.
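For item 11, the difference in surface density can be quantified before any pricing comparison by counting distinct quotes, expiries, and strikes per currency. The grids below for currencies X and Y, and the rule that flags a currency with fewer than half the quotes of the densest one, are illustrative assumptions.

```python
def surface_density(points):
    """Summarise one currency's vol surface from its (expiry, moneyness) quote points."""
    unique = set(points)
    if not unique:
        return {"quotes": 0, "expiries": 0, "strikes": 0, "fill_ratio": 0.0}
    expiries = {e for e, _ in unique}
    strikes = {k for _, k in unique}
    return {
        "quotes": len(unique),
        "expiries": len(expiries),
        "strikes": len(strikes),
        # Share of the expiry x strike grid that actually has a quote.
        "fill_ratio": len(unique) / (len(expiries) * len(strikes)),
    }

def sparse_currencies(surfaces, min_ratio=0.5):
    """Currencies with fewer than min_ratio times the quotes of the densest surface."""
    counts = {ccy: len(set(pts)) for ccy, pts in surfaces.items()}
    densest = max(counts.values())
    return sorted(ccy for ccy, c in counts.items() if c < min_ratio * densest)

surfaces = {
    "X": [(e, k) for e in [0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
          for k in [-0.02, -0.01, 0.0, 0.01, 0.02]],
    "Y": [(e, k) for e in [1.0, 5.0, 10.0] for k in [-0.01, 0.0, 0.01]],
}
for ccy, pts in surfaces.items():
    print(ccy, surface_density(pts))
print("sparse:", sparse_currencies(surfaces))
```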

12) Loan origination data often contains inaccuracies in origination dates, maturity dates, appraisal values, and so on. When the cleansing process is automated, these issues often go undetected. Even when data is cleansed manually, the sheer size of the data and the variety of variables mean that some inaccuracies slip through. When model validators detect such inaccuracies, in addition to pointing them out to the developers, they should analyze their potential impact.
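For item 12, the sketch below runs a few rule-based checks on origination data and then does a simple impact analysis by comparing a portfolio statistic (balance-weighted loan-to-value) with and without the flagged loans. The field names, rules, and loan records are hypothetical, and excluding flagged loans is only one way to size the impact; correcting them from source documents is another.

```python
from datetime import date

def loan_issues(loan, as_of):
    """Rule-based checks on one loan record (hypothetical field names)."""
    issues = []
    orig, mat = loan["origination_date"], loan["maturity_date"]
    if mat <= orig:
        issues.append("maturity not after origination")
    if orig > as_of:
        issues.append("origination date in the future")
    if loan["appraisal_value"] <= 0:
        issues.append("non-positive appraisal value")
    return issues

def weighted_ltv(loans):
    """Balance-weighted average LTV, skipping loans without a positive appraisal."""
    usable = [l for l in loans if l["appraisal_value"] > 0]
    total = sum(l["balance"] for l in usable)
    return sum(l["balance"] * l["balance"] / l["appraisal_value"] for l in usable) / total

as_of = date(2024, 6, 30)
loans = [
    {"id": "L1", "origination_date": date(2019, 3, 1), "maturity_date": date(2049, 3, 1),
     "balance": 200_000.0, "appraisal_value": 250_000.0},
    {"id": "L2", "origination_date": date(2021, 7, 15), "maturity_date": date(2016, 7, 15),
     "balance": 300_000.0, "appraisal_value": 100_000.0},  # maturity precedes origination
    {"id": "L3", "origination_date": date(2020, 1, 10), "maturity_date": date(2045, 1, 10),
     "balance": 150_000.0, "appraisal_value": 300_000.0},
]

flagged = {}
for loan in loans:
    found = loan_issues(loan, as_of)
    if found:
        flagged[loan["id"]] = found
clean = [l for l in loans if l["id"] not in flagged]

print(flagged)
print(f"weighted LTV, all loans: {weighted_ltv(loans):.3f}; "
      f"excluding flagged: {weighted_ltv(clean):.3f}")
```

In this toy portfolio a single suspect record moves the weighted LTV from about 0.67 to about 1.75, which is the kind of sensitivity the validators' impact analysis should report.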

This page will be updated regularly.
