Main Article Content
The objective of outlier detection is to identify rare events, i.e. observations in empirical data whose feature values or characteristics differ significantly from those of the remaining observations. Outlier detection in high-frequency financial data is therefore important for the quality of empirical analyses, as well as for risk management and model fitting. In this paper, we investigate the use of COPOD (Copula-Based Outlier Detection), a parameter-free and interpretable anomaly detection algorithm based on empirical copulas and the computation of the corresponding tail probabilities. Its performance is assessed on real-world data from the Warsaw Stock Exchange (GPW). We follow the theoretical framework of Li et al. [10] and restate its core formulas in a concise notation that is later adapted to the chosen financial setting. We provide a Python implementation of a skewness-corrected version of COPOD, closely related to the original algorithm, and apply it to an empirical dataset: a pre-processed feature set derived from GPW tick data and stored in the GPW20230403000tind_for_COPOD.csv file. The empirical results show that the skewness-corrected COPOD algorithm successfully identifies extreme price and return observations in the selected data while remaining computationally efficient and easy to interpret. We also discuss the practical implications of the research and outline possible extensions of the study towards anomaly identification in future work.
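The scoring idea summarized in the abstract can be illustrated with a short sketch. It follows the general description of skewness-corrected COPOD in Li et al. [10] (empirical left and right tail probabilities per feature, a tail chosen by the sign of the sample skewness, and negative log-probabilities summed over features). It is not the implementation provided in the paper: the function names `ecdf_values`, `sample_skewness` and `copod_scores` and the synthetic data are assumptions made for this example, and the GPW file is not read here.

```python
import numpy as np


def ecdf_values(x):
    """Empirical CDF of a 1-D sample at its own points: share of values <= x_i."""
    sorted_x = np.sort(x)
    return np.searchsorted(sorted_x, x, side="right") / x.size


def sample_skewness(x):
    """Moment-based sample skewness; returns 0.0 for a constant column."""
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    if m2 == 0:
        return 0.0
    return np.mean(centered ** 3) / m2 ** 1.5


def copod_scores(X):
    """Illustrative COPOD-style outlier scores (higher means more anomalous).

    Assumed layout: X has shape (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Negative log tail probabilities per feature.
    # The smallest ECDF value is 1/n, so the logarithm is always finite.
    left = np.empty((n, d))
    right = np.empty((n, d))
    for j in range(d):
        left[:, j] = -np.log(ecdf_values(X[:, j]))    # P(X_j <= x)
        right[:, j] = -np.log(ecdf_values(-X[:, j]))  # P(X_j >= x)

    # Skewness correction: left tail for negatively skewed features,
    # right tail otherwise.
    skew = np.array([sample_skewness(X[:, j]) for j in range(d)])
    corrected = np.where(skew < 0, left, right)

    # Final score: the largest of the three summed negative log-probabilities.
    return np.maximum.reduce(
        [left.sum(axis=1), right.sum(axis=1), corrected.sum(axis=1)]
    )


# Usage on synthetic data (an assumption for illustration, not GPW data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
X_demo[17] = [8.0, -8.0]  # planted point: extreme high in one feature, low in the other
demo_scores = copod_scores(X_demo)
print("index of the highest score:", int(np.argmax(demo_scores)))
```

In the synthetic example the planted point is extreme in opposite directions across the two features, so neither the purely left-tail nor the purely right-tail sum captures both extremes; the skewness-corrected sum does, which is the motivation for the correction.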
Article Details
Aggarwal C. C. (2017) Outlier Analysis. 2nd edn. Springer, Cham. https://doi.org/10.1007/978-3-319-47578-3
Aït-Sahalia Y., Jacod J. (2014) High-Frequency Financial Econometrics. Princeton University Press, Princeton. https://doi.org/10.1515/9781400850325
Breunig M. M., Kriegel H.-P., Ng R. T., Sander J. (2000) LOF: Identifying Density-Based Local Outliers. ACM SIGMOD Record, 29(2), 93-104.
Chandola V., Banerjee A., Kumar V. (2009) Anomaly Detection: A Survey. ACM Computing Surveys, 41(3), 15. https://doi.org/10.1145/1541880.1541882
Cont R. (2001) Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quantitative Finance, 1(2), 223-236. https://doi.org/10.1080/713665670
Deheuvels P. (1979) La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d'indépendance. Bulletins de l'Académie Royale de Belgique, 65, 274-292.
Embrechts P., Klüppelberg C., Mikosch T. (1997) Modelling Extremal Events for Insurance and Finance. Springer-Verlag.
Engle R. F. (2000) The Econometrics of Ultra-High-Frequency Data. Econometrica, 68(1), 1-22. https://doi.org/10.1111/1468-0262.00091
Hariri S., Carrasco Kind M., Brunner R. J. (2021) Extended Isolation Forest. IEEE Transactions on Knowledge and Data Engineering, 33(4), 1479-1491. https://doi.org/10.1109/TKDE.2019.2947676
Li Z., Zhao Y., Botta N., Ionescu C., Hu X. (2020) COPOD: Copula-Based Outlier Detection. [In:] Proceedings of the IEEE International Conference on Data Mining (ICDM), 1118-1123. https://doi.org/10.1109/ICDM50108.2020.00138
Liu F. T., Ting K. M., Zhou Z.-H. (2008) Isolation Forest. [In:] Proceedings of the IEEE International Conference on Data Mining (ICDM), 413-422.
Nagler T., Krüger D., Min A. (2022) Stationary Vine Copula Models for Multivariate Time Series. Journal of Econometrics, 227(4), 305-324.
Nelsen R. B. (2006) An Introduction to Copulas. 2nd edn. Springer, New York. https://doi.org/10.1007/0-387-28678-0
O’Hara M. (1998) Market Microstructure Theory. Wiley & Sons. https://asset.quant-wiki.com/pdf/Maureen%20O%27Hara%20-%20Market%20Microstructure%20Theory%20%20-Wiley%20%281998%29.pdf?ref=fxopen.com
Schölkopf B., Platt J. C., Shawe-Taylor J., Smola A. J., Williamson R. C. (2001) Estimating the Support of a High-Dimensional Distribution. Neural Computation, 13(7), 1443-1471.
Schweizer B., Sklar A. (1983) Probabilistic Metric Spaces. North-Holland, New York.
Sklar A. (1959) Fonctions de répartition à N dimensions et leurs marges. Annales de l’ISUP, VIII (3), 229-231.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.
Published articles are available under Open Access terms in accordance with the Creative Commons CC BY-NC license: for non-commercial purposes, the materials made available may be copied, printed and distributed. Authors pay a fee for the publication of an article.