Cryptomining Makes Noise: a Machine Learning Approach for Cryptojacking Detection

06.03.2025

A new cybersecurity attack, where an adversary illicitly runs crypto-mining software on the devices of unaware users, is emerging both in the literature and in the wild. This attack, known as cryptojacking, has proved to be very effective given the simplicity of running a crypto-client on a target device. Several countermeasures have recently been proposed, with different features and performance, but all characterized by a host-based architecture. Solutions of this kind, designed to protect the individual user, are not suitable for efficiently protecting a corporate network, especially against insiders.

In this paper, we propose a network-based approach to detect and identify the activities of crypto-clients by relying solely on the network traffic, even when encrypted. First, we provide a detailed analysis of the real network traces generated by three major cryptocurrencies, Bitcoin, Monero, and Bytecoin, considering both the normal traffic and the one shaped by a VPN. Then, we propose Crypto-Aegis, a Machine Learning (ML) based framework built over the results of our investigation, aimed at detecting cryptocurrency-related activities, e.g., pool mining, solo mining, and active full nodes. Our solution achieves an F1-score of 0.96 and an AUC of 0.99 for the ROC, while enjoying a few other properties, such as device and infrastructure independence. Given the extent and novelty of the addressed threat, we believe that our approach, supported by its excellent results, paves the way for further research in this area.

1. Introduction

Blockchain tries to solve the old problem of distributed consensus by exploiting solutions matured over decades of research [17]. The solution coming from Bitcoin’s blockchain is particularly interesting: entities participating in the “voting” process must prove that they have solved a moderately hard puzzle, the so-called Proof-of-Work (PoW) [38]. Indeed, for the vast majority of cryptocurrencies, in order to verify a transaction and have it added to the distributed ledger, participants are requested to compute a PoW. Computationally solving the PoW is referred to as mining. Over time, the complexity of puzzle solving (typically based on hashing, as in Bitcoin and several altcoins) has increased, leading to a rush to deploy more and more powerful systems that nowadays are able to compute more than $40 \cdot 10^{18}$ hashes per second (worldwide hash rate for Bitcoin at the time of writing this paper). ASIC architectures currently guarantee the best trade-off among power consumption, hash rate, size, cost, and lifetime. The recent adoption of ASIC architectures brings back the major issue of centralization [10]. Indeed, the huge gap between CPU/GPU and ASIC mining makes the latter the only viable way to participate in the network as a miner. Eventually, this causes centralization, since only ASIC-based crypto-miners can participate in the consensus process. In order to mitigate this trend, other digital currencies have been created that exploit different PoW strategies and are therefore ASIC-resistant. For instance, Monero is a cryptocurrency specifically designed to be mined also with CPU-based architectures. Indeed, Monero adopts the CryptoNight PoW algorithm, for which specialized architectures such as GPU, FPGA, or ASIC do not provide a gain significant enough to justify the adoption of such hardware. Therefore, mining can also be profitable when performed via CPU-based architectures. The CryptoNight algorithm works by filling a segment of cache with random data corresponding to memory addresses, then hashing the resulting block after reading and writing to those addresses [19].
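To make the notion of a hash-based puzzle concrete, the following is a minimal sketch of a toy Proof-of-Work. It is not Bitcoin's actual block format nor the CryptoNight algorithm: the function names, the use of a single SHA-256 pass, the 8-byte nonce encoding, and the difficulty of 12 bits are all assumptions chosen for illustration.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zero bits before the first set bit
        break
    return bits

def puzzle_hash(block_data: bytes, nonce: int) -> bytes:
    # Toy construction (assumption): hash the data followed by an 8-byte nonce.
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()

def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce whose hash has at least difficulty_bits leading zeros."""
    nonce = 0
    while leading_zero_bits(puzzle_hash(block_data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce

def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs a single hash, while mining costs many."""
    return leading_zero_bits(puzzle_hash(block_data, nonce)) >= difficulty_bits

nonce = mine(b"toy block", 12)  # about 2**12 attempts on average
print(nonce, verify(b"toy block", nonce, 12))
```

Each additional difficulty bit doubles the expected number of attempts, which is the asymmetry between solving and checking that the text refers to as a moderately hard puzzle.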

PoW is becoming a significant source of revenue for the entities participating in the consensus process. This phenomenon will grow even more with the increasing number of users joining the digital currency markets. However, while PoW computational requirements are fueling methodologies and techniques to achieve more and more computational power with less energy consumption, new malicious practices involve offloading the PoW to unaware users. Indeed, a very recent cybersecurity attack, i.e., cryptojacking [16], involves the illicit use of resources from unaware users to carry out the PoW. This attack mainly consists in the unauthorized mining of cryptocurrencies, allowing malicious parties to steal resources in terms of CPU, GPU, and memory from a target machine with the aim of effortlessly collecting crypto-wealth. This behaviour is gaining momentum for two main reasons: the ease of deployment of crypto-clients, and the difficulty of detecting those crypto-clients. While being a general threat, cryptojacking is becoming particularly critical in Corporate ICT, where the vast majority of laptops, desktops, and smartphones are distributed among the employees under limited (if any) supervision. Indeed, several unauthorized mining activities have already been discovered, to cite just a few: Russian nuclear scientists have been arrested for a “Bitcoin mining plot” [3]; the US government banned a Professor for secretly mining with National Science Foundation supercomputers [4]; a former Federal Reserve employee was sentenced to 12 months probation and a $5,000 fine after pleading guilty to installing unauthorized software that connected to an online Bitcoin network in order to earn units of the digital currency [27]; a Harvard student used a 14,000-core supercomputer to mine Dogecoin [8]; and the factory lines of Hoya, the leading manufacturer of optical products in Japan, were shut down for three days as hackers tried to set up an unauthorized cryptocurrency mining operation [26].

Contribution. The major contributions of this paper are listed below:

  • A new type of attack: we define a novel type of attack that subsumes the cryptojacking attack, i.e., the sponge-attack, where an adversary (either internal or external) secures a personal profit illicitly exploiting third party computing resources.
  • Network traffic analysis: we provide a detailed analysis of real network traffic generated by 3 major cryptocurrencies: Bitcoin, Monero, and Bytecoin.
  • Encrypted traffic: we investigate how VPN tunneling shapes the network traffic generated by Crypto-clients by considering two major VPN brands: NordVPN and ExpressVPN.
  • The Crypto-Aegis Framework: we propose Crypto-Aegis, a Machine Learning (ML) based framework built over the previous steps to detect crypto-mining activities.
  • Comparison: we compare our results against competing solutions in the literature.

Crypto-Aegis enjoys the following features: (i) Infrastructure independence. The analysis is performed at the exit points (edge) of the Corporate network, independently of network size, network layout, and even when multiple layers of encryption are set in place by the attacker, e.g., a VPN is in use; (ii) Device Independence. We do not require any modification to the already existing devices adopted by the Corporate employees; (iii) Multi-adversarial profiles support. Our solution detects the presence of illicit behaviours via network traffic analysis and independently of the adversarial profiles, i.e., be it an insider or an outsider; (iv) No clean state required. Our solution detects the presence of a miner independently of the time the miner started its activities; and, (v) Effectiveness. Our solution achieves an F1-score of 0.96 and the AUC of the ROC is greater than 0.99.

Roadmap. The paper is organized as follows. Section 2 summarizes the most important contributions related to both cryptojacking analysis and detection. Section 3 provides some background concepts related to the ML tools used in our solution, and presents the most important ML techniques for network traffic classification. Section 4 introduces the scenario and the adversary model. The details of the measurement setup are described in Section 5, while a thorough analysis of the collected network traffic traces is presented in Section 6. Section 7 presents a baseline example analysis, i.e., Bitcoin vs. standard software, introducing all the statistics that will be considered for the subsequent analysis. Sections 8 and 9 introduce the methodologies used by our Crypto-Aegis framework for the detection and identification of full nodes and miners, respectively. Finally, Section 10 tackles the general problem of detecting a Crypto-node in a Corporate network, while a detailed discussion of our results and a comparison with other solutions from the literature is presented in Section 11. Section 12 draws some concluding remarks.

2. Related Work

The computational power required to validate and add blocks to the Bitcoin blockchain has greatly limited the odds that individuals without specialized hardware can contribute to this process. Dedicating small devices (e.g., smartphones, laptops, desktops), or more powerful ones (e.g., workstations, servers), to the mining process would not be worth the cost of the electricity. This discourages users and leaves only a few players in the world with the opportunity to contribute and earn the resulting rewards. With the advent of other, CPU-based cryptocurrencies, this scenario has changed considerably. History repeats itself: in other ages, on seeing a mine populated by mechanical diggers, the gold digger with only a pick-axe on his shoulder would be forced to seek new promising shores. This return to the “gold rush” led to the rediscovery of numerous attacks that had lost meaning with Bitcoin. This type of attack is identified by the term cryptojacking. Hackers, as well as dishonest employees who would like to round off their earnings, “borrow” resources belonging to others to run the mining process. Hand in hand with the threats, some solutions have already been proposed, with the aim of implementing countermeasures to mitigate their effects.

2.1. Cryptojacking Analysis

In [41], the state of the art of crypto-mining attacks is investigated. By analyzing the malware code, as well as its behavior upon execution, the authors examine two common attacks: web browser-based crypto-mining and installable binary crypto-mining. Browser-based crypto-mining attacks exploit the JavaScript technology of web pages, leveraging two advancements in web technology: asm.js and WebAssembly [22]. Installable binary crypto-mining, instead, is possible by using modified versions of the XMrig software [20]. The paper analyzes the techniques adopted by cybercriminals to establish a persistence mechanism and avoid detection, and it introduces both static and dynamic analysis, useful to uncover the techniques employed by the malware to exploit potential victims.

In [18], the authors present an in-depth study of cryptojacking. The analysis of 853,936 popular web pages led to the identification of 2770 unique cryptojacking samples, 868 of which belong to websites in Alexa’s top 100k ranking. A similar study has been proposed by [30]. The authors propose an approach to identify mining scripts, conducting a large-scale study on the prevalence of cryptojacking in Alexa’s top 1 million websites. According to the analysis, on average 1 out of 500 websites hosts a mining script. Numerous works have followed the same direction.

In [16], the authors conduct measurements to establish the relevance and profitability of cryptojacking, questioning whether it should be classified as an attack or as a business opportunity. In [35], 138 million domains have been explored, of which 137 million are com/net/org domains and 1 million come from Alexa’s Top 1 million list. The analysis shows that the prevalence of browser mining is currently 0.08% of the analyzed set, a worrying number that should not be underestimated. Alexa’s Top 1 million websites have also been taken into account by [22], in which the authors studied the websites affected by drive-by mining to understand the techniques being used to evade detection. As a result, 20 active crypto-mining campaigns have been identified. In [30] the authors proposed a 3-phase analysis approach to investigate the cryptojacking phenomenon. They conducted a large-scale study on Alexa’s top 1 million websites, finding that approximately 1 out of 500 sites hosts a mining script that immediately starts mining activities when visited.

2.2. Cryptojacking Detection

One of the first methodologies used to identify cryptojacking was the analysis of static signatures, as typically done for other types of malware [1]. Several solutions, such as [15] and [28], implement static methods to detect mining activities and blacklist malicious web sites. This approach has proved ineffective against cryptojacking, because of the use of obfuscation techniques to evade detection [33]. A first step towards the application of Machine Learning techniques to cryptojacking detection has been made by [7]. The authors present an experimental study in which dynamic opcode analysis successfully detects browser-based crypto-mining. The proposed model can distinguish among crypto-mining sites, weaponized benign sites (i.e., benign sites into which crypto-mining code has been injected), de-weaponized crypto-mining sites (i.e., crypto-mining sites from which the start() call has been removed), and real-world benign sites. In [24], the authors presented a method to detect malicious mining behavior in the browser. Heap snapshot and stack features have been asynchronously extracted and automatically classified using Recurrent Neural Networks (RNNs). With 1159 malicious samples analyzed, the experimental results show that the proposed prototype recognizes the original mining samples with 98% accuracy if not encrypted, and 93% otherwise. In [18], after identifying a set of inherent characteristics of cryptojacking scripts, such as the repeated hash-based computations and the regular call stack, a behavior-based detector called CMTracker has been introduced. In [32] the authors proposed CapJack, a machine learning-based detection mechanism able to spot in-browser malicious cryptocurrency mining activities. This solution leverages CapsNet, a machine learning algorithm that mimics biological neural organization. CapJack makes use of system features such as CPU, memory, disk, and network utilization, implementing a host-based solution with a detection rate of 87%. Another approach has been used in [33], where the authors propose a browser extension, named CMBlock, able to detect and block mining scripts contained in web pages. The proposed solution combines two different methodologies: blacklisting and a mining behaviour detection technique.

All the solutions described in this section are host-based countermeasures designed to detect a mining script running on a single host. For this reason, these solutions are not able to effectively protect a corporate network from the cryptojacking threats, as described in Section 4. In fact, host-based solutions should be installed in every host belonging to the corporate network, with high installation and maintenance costs, not to mention privacy issues. Besides, these solutions use computational resources of the host that they must protect, subtracting them from the business tasks they should be dedicated to.

3. Background

This section provides the reader with background knowledge on the most important techniques used in this paper. The first part contains a description of the Machine Learning techniques that have been adopted in the proposed solution. Then, in the second part, we introduce related work that has laid the foundations for the classification of network traffic using Machine Learning algorithms.

3.1. Machine Learning Tools

Random Forest. Random Forest [6] is an ensemble supervised Machine Learning technique, built as a combination of tree predictors. As an ensemble learning technique, the classification is the result of a decision taken collectively by a large number of classifiers. The idea behind ensemble classification is based upon the premise that a set of classifiers can provide a more accurate and generalized (thus, less prone to overfitting) classification than a single classifier. With Random Forest, each classifier is a tree, and each tree depends on the values of a random vector independently sampled, but with the same distribution for all the trees in the forest. In detail, for the $i$-th tree, a random vector $\theta_i$ is generated, independent of the past random vectors $\theta_1, \ldots, \theta_{i-1}$, but with the same distribution. Each tree $i$ grows using the training set and $\theta_i$, resulting in a classifier $h(x, \theta_i)$, where $x$ is an input vector. After a sufficiently large number of trees is generated, each tree casts a unit vote for the most popular class at input $x$. To guarantee a degree of diversity among the base decision trees, a randomization approach is used, which works well both with bagging and random subspace methods [23]. To generate each single tree with the Random Forest algorithm, several steps are involved. Let $N$ be the number of records in the training set, and $M$ be the number of input variables. The training set of each tree (also known as bootstrap sample) is built by sampling $N$ records at random with replacement from the original data. At each node, $m$ variables (with $m \ll M$) are randomly selected out of $M$. The best split on these $m$ attributes is used to split the node. Once the forest is built, a new instance is run across all the trees in the forest. Each tree provides a classification for the new instance and issues a vote. The class receiving the majority of the votes is declared the result of the classification of the new instance.
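The steps above (bootstrap sampling of $N$ records, random selection of $m$ candidate variables, best split, majority vote) can be sketched in a few lines. This is a simplified illustration rather than the algorithm of [6]: each tree is reduced to a single split (a decision stump), the Gini impurity is assumed as the split criterion, and the toy data, labels, and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_stump(X, y, features):
    """Best single split (lowest weighted Gini) among the candidate features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between consecutive values
            left = [lab for row, lab in zip(X, y) if row[f] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:  # no split possible: always predict the majority class
        return (features[0], float("inf"), majority(y), majority(y))
    return best[1:]

def train_forest(X, y, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    N, M = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(N) for _ in range(N)]  # bootstrap: N draws with replacement
        features = rng.sample(range(M), m)          # m of the M variables
        forest.append(grow_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict(forest, x):
    """Each tree casts one vote; the most voted class wins."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature toy data, not taken from the paper's dataset.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 1.0], [3.2, 0.8], [2.9, 1.1]]
y = ["benign"] * 3 + ["miner"] * 3
forest = train_forest(X, y)
print(predict(forest, [1.1, 5.0]), predict(forest, [3.1, 0.9]))
```

A full implementation would grow each tree recursively, repeating the random selection of $m$ variables at every node; with only two variables the condition $m \ll M$ is not really met here, and the stump is used only to keep the voting mechanism visible.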

k-Fold Cross-Validation. k-Fold Cross-Validation [21] is an accuracy estimation method that makes it possible to evaluate how the results of a model will generalize to an independent dataset. The main objective of cross-validation methods is to estimate the generalization of a model, that is, to understand its accuracy in the classification of data that it has never seen before (i.e., to avoid the overfitting problem). The method consists in partitioning the dataset into subsets, some of which (e.g., the training set) are used to train the model, while the remaining ones are used for validation (e.g., the validation set) or for testing (e.g., the testing set). There are two types of cross-validation methods: exhaustive and non-exhaustive. The only difference is the number of splits that are performed: while exhaustive cross-validation methods use all possible splitting combinations for training and testing, non-exhaustive methods use only a subset of them. We can therefore say that the non-exhaustive methods are an approximation of the exhaustive ones. k-fold cross-validation is an instance of the non-exhaustive methods. In k-fold cross-validation, the dataset $D$ is randomly split into $k$ mutually exclusive subsets $D_1, D_2, \ldots, D_k$ of approximately equal size. The model is trained and tested $k$ times; each time $t \in \{1, 2, \ldots, k\}$, the model is trained on $D \setminus D_t$ and tested on $D_t$. The cross-validation estimate of accuracy is the overall number of correct classifications divided by the number of instances in the dataset [21].
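The procedure can be sketched as follows. The splitting and the accuracy estimate follow the description above, while the nearest-centroid classifier, the one-dimensional toy data, and all function names are stand-ins assumed only to make the example runnable.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, k, train_fn, predict_fn, seed=0):
    folds = k_fold_indices(len(X), k, seed)
    correct = 0
    for t in range(k):
        test_idx = folds[t]                        # D_t
        train_idx = [i for j, fold in enumerate(folds) if j != t for i in fold]
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
    # Correct classifications over all folds divided by the dataset size.
    return correct / len(X)

# Stand-in classifier (assumption): assign the class with the closest mean.
def train_centroids(X, y):
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_centroid(model, x):
    return min(model, key=lambda label: abs(model[label] - x))

X = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]   # hypothetical 1-D feature
y = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy = cross_val_accuracy(X, y, 4, train_centroids, predict_centroid)
print(accuracy)
```

Every instance is used for testing exactly once and for training $k-1$ times, which is why the estimate divides by the full dataset size.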

3.2. Machine Learning Techniques for Network Traffic Classification

Network traffic classification has gained more and more attention in recent years, having the potential to solve several problems in network management [25, 36] (e.g., building network profiles for proactive real-time network traffic monitoring and management), as well as in network security [39] (e.g., Machine Learning applications for intrusion detection systems or anomaly detection).

The classic approach combines the analysis of the packet headers with payload inspection. Despite the high accuracy of this methodology, the high volume of data to be processed, together with the user privacy issues implied by this approach, pushed the research community to explore different techniques. Moreover, payload inspection is not possible in the case of encrypted traffic, which nowadays is practically the norm. A promising research direction explores Machine Learning techniques for both real-time IP traffic classification and static offline analysis of previously captured traffic. One of the first works to show the effectiveness of Machine Learning algorithms for the classification of network traffic is [40]. The authors make use of unsupervised Machine Learning techniques to automatically classify traffic flows based on statistical flow characteristics. After studying and evaluating the influence of each feature (including Forward-Pkt-Len-Var, Backward-Pkt-Len-Var, Backward-Bytes, Forward-Pkt-Len-Mean, Forward-Bytes, Backward-Pkt-Len-Mean, Duration, and Forward-IAT-Mean), the authors use several traffic traces collected at different locations of the Internet to evaluate the efficiency of the adopted approach. In [31], the authors survey significant Machine Learning-based IP traffic classification solutions proposed in the literature. They highlight that the use of different Machine Learning algorithms for offline traffic analysis (e.g., AutoClass, Expectation Maximization, Decision Tree, Naive Bayes) provides high accuracy (up to 99%) for the traffic of different Internet applications. In [37], the authors evaluate different Machine Learning algorithms for flow-based network traffic classification, in terms of correctness and computational cost. 
In particular, they investigate the use of three supervised algorithms (i.e., Bayesian Networks, Decision Trees and Multilayer Perceptrons) considering six different classes: Peer-to-Peer (P2P), web (HTTP), content delivery (Akamai), bulk (FTP), service (DNS) and mail (SMTP). Their results show that Decision Trees have both a higher accuracy and a higher classification rate than Bayesian Networks. However, Decision Trees require a larger build time and are more susceptible in the case of incorrect or small amounts of training data. Moreover, they highlight that the amount of training data for a certain traffic class can affect the classification accuracy of both itself and other traffic classes. For this reason, they propose a systematic approach to construct specific training sets that feature the best accuracy results. Regarding the analysis of encrypted network traffic, an early solution is from [2]. The authors use different Machine Learning algorithms (i.e., Adaboost, Support Vector Machine, Naive Bayesian, RIPPER, and C4.5) to distinguish SSH traffic from non-SSH traffic in a given traffic trace. Their results show that the model generated by C4.5 algorithm outperforms the other ones using flow-based features only. Another contribution in this field is given by [13] and [12]. By using supervised Machine Learning techniques for Android encrypted network traffic analysis, the authors demonstrate that an external attacker can identify the specific actions that a user is performing on his mobile apps. Using a Random Forest classifier, they are able to infer not only the app used by the target user, but also the specific action he performed (e.g., sending an e-mail, posting a message, refreshing the home, and so on) for the most used Android applications, such as Gmail, Facebook, and Twitter, despite the use of SSL/TLS for traffic encryption.
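As an illustration of the statistical flow features mentioned above for [40], the sketch below computes them from a list of packets. The exact definitions used in [40] are not given here, so the choices made (population variance, inter-arrival time measured between consecutive forward packets, a tuple-based packet representation) are assumptions, as is the function name.

```python
from statistics import mean, pvariance

def flow_features(packets):
    """packets: list of (timestamp_s, direction, length_bytes),
    with direction either "fwd" or "bwd" (hypothetical representation)."""
    packets = sorted(packets)                      # order by timestamp
    fwd = [(t, n) for t, d, n in packets if d == "fwd"]
    bwd = [(t, n) for t, d, n in packets if d == "bwd"]
    fwd_len = [n for _, n in fwd]
    bwd_len = [n for _, n in bwd]
    # Inter-arrival times between consecutive forward packets.
    fwd_iat = [b[0] - a[0] for a, b in zip(fwd, fwd[1:])]
    return {
        "Forward-Pkt-Len-Mean": mean(fwd_len) if fwd_len else 0.0,
        "Forward-Pkt-Len-Var": pvariance(fwd_len) if fwd_len else 0.0,
        "Forward-Bytes": sum(fwd_len),
        "Forward-IAT-Mean": mean(fwd_iat) if fwd_iat else 0.0,
        "Backward-Pkt-Len-Mean": mean(bwd_len) if bwd_len else 0.0,
        "Backward-Pkt-Len-Var": pvariance(bwd_len) if bwd_len else 0.0,
        "Backward-Bytes": sum(bwd_len),
        "Duration": packets[-1][0] - packets[0][0] if packets else 0.0,
    }

# Hypothetical four-packet flow.
flow = [(0.0, "fwd", 100), (0.5, "bwd", 1500), (1.0, "fwd", 300), (2.0, "bwd", 500)]
features = flow_features(flow)
for name, value in features.items():
    print(name, value)
```

None of these quantities depends on the packet payload, which is why features of this kind remain available when the traffic is encrypted.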

4. Scenario, Assumptions, and Adversary Model

In the following, we describe our reference scenario, the assumptions we make as for the network infrastructure and network traffic classification and, finally, the adversary model.

4.1. Scenario

Figure 1 shows the details of our reference scenario. We consider a corporate network constituted by several interconnected devices, including one that is controlled by a malicious entity, willing to mine cryptocurrencies without being detected. Our solution should be deployed at the network edge and it involves only an Ethernet connection from the main Corporate Network switch to a server running our Machine Learning algorithm. We observe that our solution requires interventions neither on the employee devices, nor on the already existing network infrastructure. Moreover, our solution can be easily deployed even when there are multiple exit connections between the Corporate Network and the Internet: this can be easily achieved by deploying multiple Ethernet links to collect the data from the Corporate Network exit points. We observe that the above configuration is very conservative with respect to standard commercial solutions. Indeed, in the vast majority of cases, corporate solutions involve hardware for deep packet inspection deployed before the exit point, or even at multiple locations of the network.

With such commercial solutions, on the one hand the association between traffic and device is much easier, while on the other hand they require a significant cost in terms of hardware equipment and deployment. In our solution, we consider the traffic already aggregated, i.e., affected by IP masquerading/NAT, or even tunneled and re-encrypted by a Virtual Private Network (VPN).

4.2. Adversary model

We consider two adversary models with respect to the corporate network: (i) insider; and, (ii) outsider. We assume the insider has direct access to the hardware resources of the company and, therefore, has the opportunity to install new software on them. A typical example is the employee willing to accumulate crypto-wealth by exploiting corporate resources such as CPU, GPU, and network bandwidth. Moreover, we envisage an external adversary (outsider) able to infect one or more corporate devices with malicious software for performing unauthorized crypto-mining. Typical examples are both the increasing number of malware samples delivering crypto-mining software to unaware users, and websites running crypto-mining JavaScript code without the user’s consent. Our adversary model (as depicted in Fig. 1) takes into account a corporate device illicitly running crypto-mining-related activities. We stress that our model takes into account a malicious device that might be controlled either by a dishonest employee or by a remote hacker who took control of the device itself.

There are several strategies to mine without the company’s consent, an activity that today is really difficult to detect and prevent. This malicious behaviour might be implemented by either a full node or a miner.

  • Full node. It is a full-featured client of the cryptocurrency infrastructure. It locally stores the whole blockchain and participates in the consensus algorithm, being able to validate all the transactions. The mining activity performed by a full node is called solo mining because the process is carried out independently of other nodes.
  • Miner. It is lightweight software that implements a simple worker receiving jobs (i.e., hash computations useful for the PoW) from a third party (i.e., a mining pool). When the mining pool successfully mines a block, both the reward and the fees are divided among all the participants, proportionally to the computational power offered. This software does not participate in the cryptocurrency protocol and does not need to store a blockchain to work.
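The proportional division of reward and fees described for mining pools can be illustrated with a short arithmetic sketch. Real pools adopt a variety of payout schemes and usually retain a pool fee, none of which is modeled here; the function name, the worker names, and all the numbers are hypothetical.

```python
def split_payout(block_reward, fees, hash_rates):
    """Divide reward plus fees proportionally to each worker's hash rate."""
    total_rate = sum(hash_rates.values())
    payout = block_reward + fees
    return {worker: payout * rate / total_rate
            for worker, rate in hash_rates.items()}

# Two hypothetical workers contributing 30 and 10 units of hash rate.
shares = split_payout(6.0, 0.5, {"worker_a": 30.0, "worker_b": 10.0})
print(shares)
```

Under this simplified model a worker contributing 75% of the pool's computational power receives 75% of each mined block's payout, which is why even a device with limited resources yields a steady, if small, income to the adversary.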

In our scenario, we assume that the client (being either a full node or a miner) is already provided with the ledger, if needed, and does not require any warm-up operations. Indeed, Crypto-Aegis does not rely on application-specific transients and does not need to be deployed before the malicious device starts its illicit activity.

Definition. We define sponge-attack as the malicious behavior of exploiting third-party hardware and software resources to obtain a personal profit without the authorization of the infrastructure’s owner. The sponge-attack illicitly absorbs resources from the targeted infrastructure and makes a payoff out of them in favor of the attacker.

This definition is more general than that of cryptojacking, which only refers to unauthorized mining activities. The sponge-attack, instead, also includes any other activity performed with someone else’s resources without authorization. An example of sponge-attack could be a malicious full node installed on a corporate server to perform a DDoS attack against a cryptocurrency network by using the company’s network resources.

The sponge-attack can be implemented by deploying either a Full Node or a Miner.

Miner. The use of mining pool software makes it possible to carry out mining activities without installing the heavier full node software. An adversary can use this software to perform a faster and stealthier attack, since the targeted device does not need to store the ledger, which is usually very large. Furthermore, by joining a mining pool, profits are increased even if the available resources are limited.

Full node. Deploying a full node in a network without the administrator’s consent has significant advantages for the adversary. Firstly, the full node gives the adversary the capability to perform solo mining, if the victim’s resources are sufficiently powerful. Moreover, the full node could be used to attack the cryptocurrency’s network, by performing double-spending attacks, DDoS attacks, Sybil attacks, Eclipse attacks, and possibly others.

4.3. Terminology

In the following we refer to different actors and actions by using the following terminology:

  • Crypto-client: A software illicitly installed in a device belonging to the Corporate Network with the aim of performing the sponge-attack.
  • Standard software: A software legitimately installed in a device of the Corporate Network.
  • Reference device: A laptop used for running both the Standard software and the Crypto-clients used in this paper.

5. Measurement Setup and Preliminary Considerations

In this section we provide a description of our measurement setup and a preliminary statistical analysis of the collected traces.

Measurement setup. Our measurement setup is summarized in Fig. 2. We consider two scenarios: Scenario 1, where a VPN tunnel adds an encryption layer to the communication, and Scenario 2, where the client is directly connected to the Internet. In Scenario 1, the malicious device is connected to the Internet through an encrypted VPN tunnel. For our measurements, we used two different well-known VPN brands, i.e., NordVPN (v. 1.2.0) and ExpressVPN (v. 1.5.0). At the time of writing this paper, ExpressVPN features more than 2000 servers in 148 countries, while NordVPN features 5064 servers in 62 countries. We arbitrarily set the VPN exit node to France for all our measurements. Conversely, in Scenario 2, the malicious device is directly connected to the Internet without resorting to any additional encryption layer. The malicious device, which also acts as our reference device when not mining, is a Dell XPS15 laptop running Ubuntu 18.04 (64 bit). All the extracted features are publicly available at [14].

Definition. We define as ingoing flow all the network traffic from the Internet to our reference device. Moreover, we refer to the network traffic generated by the reference device and sent to the Internet as outgoing flow.
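A minimal sketch of this definition is given below: packets are assigned to the ingoing or outgoing flow depending on whether the reference device is their destination or their source. The dictionary-based packet representation, the function name, and the addresses (taken from the ranges reserved for documentation) are assumptions made for illustration; they do not describe the capture tooling actually used.

```python
def split_by_direction(packets, device_ip):
    """Partition packets into the ingoing and outgoing flows of device_ip.

    packets: iterable of dicts with hypothetical keys "src", "dst", "length".
    """
    ingoing = [p for p in packets if p["dst"] == device_ip]    # Internet -> device
    outgoing = [p for p in packets if p["src"] == device_ip]   # device -> Internet
    return ingoing, outgoing

REFERENCE_DEVICE = "192.0.2.10"  # illustrative address only

packets = [
    {"src": "192.0.2.10", "dst": "198.51.100.7", "length": 120},
    {"src": "198.51.100.7", "dst": "192.0.2.10", "length": 1400},
    {"src": "192.0.2.10", "dst": "198.51.100.7", "length": 90},
]
ingoing, outgoing = split_by_direction(packets, REFERENCE_DEVICE)
ingoing_bytes = sum(p["length"] for p in ingoing)
outgoing_bytes = sum(p["length"] for p in outgoing)
print(len(ingoing), ingoing_bytes, len(outgoing), outgoing_bytes)
```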

We collected network traffic from three different cryptocurrencies (Bitcoin, Bytecoin, and Monero) and three different applications (Skype, YouTube, and standard office applications mixed together) as follows:

  • Skype. We run an audio Skype-call and collected all the network traffic from/to the reference device.
  • YouTube. We collected the network traffic generated by a random YouTube video from/to the reference device.
  • Office network traffic. We logged the network traffic generated by the reference device while using it for standard office tasks, e.g., e-mail, web-browsing, download and upload of files, Microsoft Office365, etc.

The above applications have been selected as representative of three traffic patterns coming from three different application scenarios, namely audio calls, video streaming, and standard office network traffic. Such network traffic categories cover more than 87% of the 2018 global consumer internet traffic [9]. Since our idea is to infer the presence of a Crypto-client from the network traffic, we considered the network traffic from the above standard applications as the background “noise” hiding the traffic of the Crypto-client. Our goal is to discriminate the flows involving the Crypto-client from the other flows in the network.

It is known that Machine Learning for network traffic classification is affected by several parameters, e.g., features, type of traffic, trace length, and network state. One major concern is the consistency of the extracted features given the limited trace length. We therefore took particular care to capture packets from clients at steady state, after the initial sync period had completed. Indeed, Crypto-clients require a warm-up period to download the blockchain and validate it. This guarantees that our log excerpts represent a consistent snapshot of a steady-state client either syncing or mining for the blockchain network.

Table 1

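| Cryptocurrency | Type | Client | Version |
| --- | --- | --- | --- |
| Bitcoin | Full Node | Bitcoin Core | 0.17.0 |
| Bitcoin | Miner | bfgminer | 5.5.0 |
| Monero | Full Node | Lithium Luna | 0.12.3.0 |
| Bytecoin | Full Node | Bytecoin Wallet | 3.3.2 |
| Bytecoin/Monero | Miner | XMrig | 2.8.1 |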

Figure 2: Measurement setup: We consider 2 different scenarios. The malicious device is connected to the network through a VPN Tunnel (Scenario 1) and the malicious device is directly connected to the Internet (Scenario 2). We adopted one laptop for the mining activities (malicious device), one other laptop for collecting all the in-transit packets and finally, a switch featuring a monitoring port.

6. Network Traffic Analysis and Patterns

In this section, we start the analysis of the collected network traffic by considering the two network flows: ingoing and outgoing, as explained in the previous section. In order to guarantee a fair comparison between the various scenarios, we extracted the same number of consecutive samples for each network trace, i.e., 4576 samples. Table 2 shows the network traces we have collected considering the different application scenarios, i.e., Office, Skype, YouTube, Bytecoin, Monero, and Bitcoin. For each scenario, we report the trace duration equivalent to the extracted samples, the quantile 0.5 computed on the interarrival times, and finally the quantile 0.5 computed on the packet sizes. In order to ease the discussion, we refer to each trace by using a sequence of keywords as follows: [Application][Flow direction][VPN Type], where Application can be Office, Skype, YouTube, Bytecoin, Monero, or Bitcoin, Flow direction might be either Ingoing or Outgoing, while VPN Type might be empty (no VPN), Express, or Nord.
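As a rough illustration of how one row of Table 2 could be derived, the sketch below keeps the first 4576 consecutive samples of a trace and computes its duration, the median interarrival time, and the median packet size. The function names and the toy data are assumptions made for this example only.

```python
from statistics import median

N_SAMPLES = 4576  # number of consecutive samples kept per trace (from the text)


def trace_name(application, direction, vpn=""):
    """Build the [Application][Flow direction][VPN Type] label; empty VPN means no VPN."""
    return " ".join(part for part in (application, direction, vpn) if part)


def summarize(timestamps, sizes, n=N_SAMPLES):
    """Summarize the first `n` consecutive samples of a trace."""
    ts, sz = timestamps[:n], sizes[:n]
    # Interarrival times are the differences between consecutive arrival times.
    interarrival = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "duration": ts[-1] - ts[0],                   # seconds
        "median_interarrival": median(interarrival),  # seconds
        "median_size": median(sz),                    # bytes
    }


# Toy trace with made-up values (five packets only).
timestamps = [0.0, 0.1, 0.3, 0.6, 1.0]
sizes = [66, 1434, 66, 593, 66]
row = summarize(timestamps, sizes)
label = trace_name("Monero", "Ingoing", "Nord")
```

Fixing the number of samples rather than the capture time is what makes the trace durations in Table 2 differ so widely across applications.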

Firstly, we observe that the same number of samples corresponds to very different collection times depending on the application scenario, i.e., about 38.4 seconds for YouTube Ingoing with Express VPN versus about 4597 seconds for Bytecoin Ingoing. In the following we provide some insights from Table 2:

  • Bytecoin. Interarrival times are significantly affected by the use of a VPN, i.e., they shrink by a factor of 5 to 10. Packet sizes are affected as well, i.e., they grow by a factor of 2 to 3. It is worth noting that the reduction of the interarrival time, together with the increase of the packet size, entails a shorter trace for delivering the same amount of data.
  • Monero. VPN tunnelling affects the interarrival times of Monero differently depending on the flow. While ingoing flows experience a reduction of the interarrival time, outgoing flows slightly increase their values. Packet size follows the same pattern: the packet size of the ingoing flow ramps up from 66 Bytes (No VPN) to 1433 Bytes (Nord), while that of the outgoing flow moves in the opposite direction, decreasing from 1242 Bytes (No VPN) to 131 Bytes (Nord).
  • Bitcoin. Interarrival times are more homogeneous for Bitcoin. Indeed, the median values in Table 2 span between 180 $\mu$s and 600 $\mu$s. Nevertheless, we observe that VPN tunnelling affects the packet size: for both Nord VPN and Express VPN, the packet size becomes significantly larger.

Discussion. VPN tunnelling tends to squeeze the packets together in time and to increase the packet size. Bitcoin is a special case: VPN tunnelling affects its original traffic pattern much less, although there are some significant variations in the packet size. It is worth noting the differences among the cryptocurrencies when the traffic is collected without VPN tunnelling: interarrival times and packet sizes differ markedly among the currencies as well as between the ingoing and outgoing flows.
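The effect of the VPN can be made concrete with a short worked example on the Bytecoin ingoing rows of Table 2. The sketch below only divides the medians reported in the table; the dictionary layout and the `vpn_effect` helper are assumptions made for illustration.

```python
# (median interarrival time [s], median packet size [bytes]) for the
# Bytecoin ingoing traces, copied from Table 2.
BYTECOIN_INGOING = {
    "No VPN": (0.004673, 593),
    "Express": (0.000443, 706),
    "Nord": (0.000803, 1432),
}


def vpn_effect(trace):
    """Return, per VPN, (interarrival reduction factor, packet size growth factor)."""
    base_iat, base_size = trace["No VPN"]
    return {
        vpn: (base_iat / iat, size / base_size)
        for vpn, (iat, size) in trace.items()
        if vpn != "No VPN"
    }


effect = vpn_effect(BYTECOIN_INGOING)
for vpn, (iat_factor, size_factor) in effect.items():
    print(f"{vpn}: interarrival / {iat_factor:.1f}, packet size x {size_factor:.1f}")
```

For these rows the median interarrival time shrinks by roughly one order of magnitude with Express VPN and by a smaller factor with Nord VPN, while the median packet size grows in both cases.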

Given the above considerations, we analyze the flows in more depth in order to identify the features to be used for the Machine Learning process. Figures 3 and 4 show the 0.05, 0.5, and 0.95 quantiles, together with the minimum and maximum values, associated with each trace collected during our measurements. Outgoing flows present packet sizes very different from each other in the range between 100 and 1000 bytes. Only a few exceptions fall outside that range, while it is worth noting how quantile 0.5, i.e., the median, changes for each network trace. Moreover, we observe that raw traffic from Bytecoin, Monero, and Bitcoin presents almost the same values of quantile 0.05 and 0.95. Interestingly, such values get closer (being characterized by less variation) when the traffic is tunnelled through a VPN.

Ingoing flows behave differently from outgoing ones. Packet sizes span narrower ranges, i.e., quantile 0.05 and 0.95 are closer to each other than in the outgoing flows. Median values are randomly distributed and we do not observe any significant pattern in the VPN tunnelling of cryptocurrency clients.

We performed the same analysis for the interarrival times, obtained by differencing the absolute arrival times logged by Wireshark. The ingoing flows (Fig. 5) of cryptocurrencies are characterized by very similar values, i.e., almost the same quantile 0.05 and 0.95, although we observe that the median values span between $10^{-4}$ and $10^{-2}$ seconds. Similar observations can be drawn by looking at the outgoing flows as depicted in Fig. 6.
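The five values plotted for each trace (minimum, quantile 0.05, median, quantile 0.95, maximum) can be sketched as follows. This is only an illustration: the interpolation rule used for the quantiles (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the text does not state which rule was used, and the arrival times are made up.

```python
from itertools import accumulate
from statistics import median, quantiles


def interarrival_times(arrivals):
    """Difference the absolute arrival times to obtain interarrival times."""
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


def five_point_summary(values):
    """Minimum, quantile 0.05, median, quantile 0.95, and maximum of `values`."""
    # n=20 yields 19 cut points at 0.05, 0.10, ..., 0.95.
    cuts = quantiles(values, n=20, method="inclusive")
    return {
        "min": min(values),
        "q05": cuts[0],
        "median": median(values),
        "q95": cuts[-1],
        "max": max(values),
    }


# Toy arrival times in microseconds: consecutive gaps of 1, 2, ..., 21 us.
arrivals_us = list(accumulate(range(1, 22), initial=0))
summary = five_point_summary(interarrival_times(arrivals_us))
```

The same helper applies unchanged to packet sizes, which is how a per-trace comparison of both quantities could be produced.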

7. Traffic classification: a baseline example

We implemented all the traffic-classification related tasks in MATLAB (R2018a), adopting the Statistics and Machine Learning Toolbox©. Our Crypto-client detection algorithm involves the following steps:

  • Features extraction. Features identification and extraction are paramount activities to maximize the performance of the classifier. In this work, we consider several features starting from the very standard ones, i.e., interarrival time and packet size. We also consider other derived features with the aim of validating how they affect the final classifier performance.
  • k-Fold Cross Validation. Cross validation is a common practice to average the results of Machine Learning algorithms over different splits of the data. It is performed by randomly partitioning the $n$ observations into $k$ disjoint sub-samples (or folds) of roughly equal size; each fold is used in turn for testing, while the remaining $k-1$ folds are used for training. The default value of $k$ is 10.
  • Random Forest (RF). We adopted the TreeBagger MATLAB class to implement the RF algorithm. TreeBagger combines the results of many decision trees, which reduces the effects of overfitting and improves generalization. TreeBagger grows the decision trees in the ensemble using bootstrap samples of the data.
  • Statistics. This task involves the generation of statistics from the classifier results. Our statistics include (among others) True Negative (TN), False Positive (FP), False Negative (FN), and True Positive (TP) counts, the confusion matrix, etc.
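The k-fold partitioning step listed above can be sketched in a few lines. This is a plain Python stand-in written for illustration, not the MATLAB routine used in the paper; the function names and the fixed seed are assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition the indices 0..n-1 into k disjoint folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Taking every k-th shuffled index gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each fold is used exactly once for testing."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Two traces of 4576 samples each, as in the baseline example.
folds = k_fold_indices(2 * 4576, k=10)
fold_sizes = [len(fold) for fold in folds]
```

A classifier would then be trained on each `train` list and scored on the matching `test` list, with the results accumulated over the `k` rounds.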

Table 2

Collected traces: Duration, Median of Interarrival Times, and Median of Packet Sizes.

| Trace | Trace duration [seconds] | Int. Time Median [seconds] | Pkt. Size Median [bytes] |
| --- | --- | --- | --- |
| Office Ingoing | 50.998 | 0.000118 | 1434 |
| Office Outgoing | 59.502 | 0.000311 | 60 |
| Office Ingoing Express | 84.460 | 0.001003 | 874 |
| Office Outgoing Express | 147.281 | 0.013116 | 478 |
| Office Ingoing Nord | 104.700 | 0.000265 | 119 |
| Office Outgoing Nord | 105.479 | 0.000162 | 1433 |
| Skype Ingoing | 146.1 | 0.018730 | 136 |
| Skype Outgoing | 145.5 | 0.019988 | 130 |
| Skype Ingoing Express | 92.577 | 0.020065 | 518 |
| Skype Outgoing Express | 91.911 | 0.019944 | 535 |
| Skype Ingoing Nord | 87.117 | 0.020119 | 169 |
| Skype Outgoing Nord | 87.338 | 0.020734 | 196 |
| YouTube Ingoing | 140.35 | 0.000001 | 1434 |
| YouTube Outgoing | 896.04 | 0.001074 | 54 |
| YouTube Ingoing Express | 38.378 | 0.001848 | 927.5 |
| YouTube Outgoing Express | 816 | 0.022749 | 483 |
| YouTube Ingoing Nord | 168.48 | 0.004814 | 1432 |
| YouTube Outgoing Nord | 271.93 | 0.007282 | 119 |
| Bytecoin Ingoing | 4597.2 | 0.004673 | 593 |
| Bytecoin Outgoing | 3280.4 | 0.001130 | 66 |
| Bytecoin Ingoing Express | 729.67 | 0.000443 | 706 |
| Bytecoin Outgoing Express | 979.43 | 0.000858 | 134 |
| Bytecoin Ingoing Nord | 1579 | 0.000803 | 1432 |
| Bytecoin Outgoing Nord | 2011.1 | 0.000752 | 119 |
| Monero Ingoing | 822.55 | 0.000450 | 66 |
| Monero Outgoing | 790.58 | 0.000014 | 1242 |
| Monero Ingoing Express | 197.52 | 0.000044 | 890 |
| Monero Outgoing Express | 215.45 | 0.000090 | 820 |
| Monero Ingoing Nord | 445.29 | 0.000137 | 1433 |
| Monero Outgoing Nord | 404.01 | 0.000117 | 131 |
| Bitcoin Ingoing | 669.27 | 0.000600 | 90 |
| Bitcoin Outgoing | 659.45 | 0.000242 | 66 |
| Bitcoin Ingoing Express | 356.1 | 0.000383 | 146 |
| Bitcoin Outgoing Express | 389.88 | 0.000180 | 146 |
| Bitcoin Ingoing Nord | 692.44 | 0.000502 | 165 |
| Bitcoin Outgoing Nord | 822.29 | 0.000359 | 119 |

In this section, we introduce a simplified version of our methodology considering a binary decision problem. Our goal is to analyze the traffic of a network to determine whether a malicious mining activity is taking place. We recall that the traffic collected from the Bitcoin client is related only to the syncing process (being a Full Node), while the traffic related to the mining process will be considered later on. Moreover, as for the “noise” traffic, we adopted a laptop running Windows 10 PRO and performing standard office tasks as discussed in the previous section. We now consider only two network traces from Table 2: Office Outgoing and Bitcoin Outgoing. Moreover, we take as the positive hypothesis that the current network event has been generated by the Bitcoin client. We ran the 10-fold cross validation algorithm using the RF algorithm (with a default value of 20 trees) and only two features: interarrival time and packet size. Table 3 shows the confusion matrix associated with the classifier results, i.e., Bitcoin is correctly recognized 4314 times (True Positive, TP), while the class Office is correctly recognized 4321 times (True Negative, TN). The other values refer to False Positive (FP), i.e., 254 observations are wrongly classified as Bitcoin, and False Negative (FN), i.e., 261 observations are classified as not-Bitcoin (Office) while they actually are Bitcoin. Other interesting metrics, which will be used in the remainder of the paper, are the True Positive Rate (TPR) = 0.941, i.e., the number of True Positives normalized to the number of actual Bitcoin observations (TP / (TP + FN)), and the False Positive Rate (FPR) = 0.059, i.e., the number of False Positives normalized to the number of actual Office (not-Bitcoin) observations (FP / (FP + TN)).
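The two rates can be evaluated directly from the four counts of Table 3, as in the minimal sketch below. Computed this way, the values come out close to, though not identical to, the rounded figures quoted in the text; the helper names are assumptions made for this example.

```python
# Counts from Table 3 (Bitcoin is the positive class, Office the negative one).
TN, FP = 4321, 254
FN, TP = 261, 4314


def true_positive_rate(tp, fn):
    """TP normalized to the number of actual positives."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """FP normalized to the number of actual negatives."""
    return fp / (fp + tn)


tpr = true_positive_rate(TP, FN)
fpr = false_positive_rate(FP, TN)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```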

FPR and TPR can be used to highlight the classifier performance at different threshold values, when the system can tolerate different levels of false positives. Figure 7 shows the Receiver Operating Characteristic (ROC) curve, i.e., the True Positive Rate (TPR) as a function of the False Positive Rate (FPR). Another important metric directly connected to the ROC curve is the so-called Area Under the Curve (AUC), i.e., the area under the ROC curve, a value spanning between 0 (worst case) and 1 (best case). As for the ROC curve in Fig. 7, the AUC is about 0.971 for both classes, Bitcoin and Office.
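To make the relation between thresholds, the ROC curve, and the AUC concrete, the sketch below sweeps a decision threshold over per-observation scores and integrates the resulting curve with the trapezoidal rule. The scores are made up; in a Random Forest they could be, for instance, the fraction of trees voting for the positive class, which is an assumption here rather than a description of the paper's implementation.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Observations sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: 1 = Bitcoin (positive), 0 = Office (negative); scores are made up.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

In this toy case one negative observation outranks one positive, so the area is 8/9 rather than 1.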

Table 3

Baseline example: Bitcoin Vs Office scenario.

| Actual | Predicted No | Predicted Yes |
| --- | --- | --- |
| No | 4321 | 254 |
| Yes | 261 | 4314 |

Figure 7

Receiver operating characteristic (ROC) curve: True Positive Rate as a function of the False Positive Rate. The Area Under the Curve (AUC) is about 0.971 for both the application scenarios.

We now add more features to the current scenario and analyze the performance of the classifier. Let us define the basic features already introduced and the new ones, as follows:

  • Interarrival time ($\delta$): the time elapsed between two consecutive packets.
  • Packet size ($\gamma$): the size of each packet.
  • Moving mean of $\delta$ ($\mu_\delta(w)$): each mean value is calculated over a sliding window of length $w$ across neighboring elements of $\delta$.
  • Moving standard deviation of $\delta$ ($\sigma_\delta(w)$): each standard deviation is calculated over a sliding window of length $w$ across neighboring elements of $\delta$.
  • Moving mean of $\gamma$ ($\mu_\gamma(w)$): each mean value is calculated over a sliding window of length $w$ across neighboring elements of $\gamma$.
  • Moving standard deviation of $\gamma$ ($\sigma_\gamma(w)$): each standard deviation is calculated over a sliding window of length $w$ across neighboring elements of $\gamma$.
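The moving statistics listed above can be sketched as follows. The window placement is an assumption: the sketch uses a window centered on each element (odd $w$) that is truncated at the edges of the trace, together with the sample standard deviation, since the text does not specify these details. All values are made up.

```python
from statistics import mean, stdev


def sliding_windows(values, w):
    """Yield, for each element, the centered window of length w (truncated at the edges)."""
    half = w // 2
    for i in range(len(values)):
        yield values[max(0, i - half): i + half + 1]


def moving_mean(values, w):
    return [mean(window) for window in sliding_windows(values, w)]


def moving_std(values, w):
    # A single-element window has no spread; report 0.0 for it.
    return [stdev(window) if len(window) > 1 else 0.0
            for window in sliding_windows(values, w)]


def feature_rows(delta, gamma, w=5):
    """One six-feature row per packet: delta, gamma, and their moving mean and std."""
    return list(zip(delta, gamma,
                    moving_mean(delta, w), moving_std(delta, w),
                    moving_mean(gamma, w), moving_std(gamma, w)))


# Toy trace: interarrival times in seconds and packet sizes in bytes.
delta = [0.001, 0.002, 0.001, 0.004, 0.001, 0.002, 0.001]
gamma = [66, 66, 1434, 66, 66, 66, 66]
rows = feature_rows(delta, gamma, w=5)
```

Each row of `rows` is then one observation for the classifier, so the moving features let a single large packet influence the observations of its neighbors as well.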

In order to evaluate the impact of the features on the classification algorithm, we used, for each feature, the Mean Square Error (MSE) averaged over all the trees in the ensemble and divided by the standard deviation taken over the trees. The larger this value, the more important the feature is in the classification process. Figure 8 shows this measure as a function of the moving window size ($w$) and of the different types of features. Firstly, we observe that, for this scenario (Bitcoin Vs Office), the most important feature is $\gamma$, i.e., the packet size, represented by the red bar. The other features have about the same weight, while it turns out that $w = 5$ is a good trade-off for the window size of the moving mean and the moving standard deviation.
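The importance measure described above can be written as a mean-over-trees divided by a standard-deviation-over-trees. The sketch below assumes that the per-tree quantity is the increase in error observed when the values of one feature are perturbed, which is an interpretation not spelled out in the text; the per-tree numbers are made up.

```python
from statistics import mean, stdev


def importance(per_tree_error_increase):
    """Mean error increase over the trees, divided by its standard deviation over the trees."""
    # Note: this ratio is undefined if all trees report exactly the same value.
    return mean(per_tree_error_increase) / stdev(per_tree_error_increase)


# Hypothetical per-tree error increases for two features (four trees only).
per_tree = {
    "gamma": [0.30, 0.34, 0.28, 0.32],  # packet size
    "delta": [0.10, 0.14, 0.08, 0.12],  # interarrival time
}
ranking = sorted(per_tree, key=lambda name: importance(per_tree[name]), reverse=True)
```

Dividing by the spread over the trees rewards features whose contribution is both large and consistent across the ensemble.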

8. Crypto-Aegis: Detection and Identification of Full Nodes

In this section, we consider the traces of Table 2, splitting them into ingoing and outgoing flows. As previously discussed, we consider three main metrics: True Positive Rate (TPR), False Positive Rate (FPR), and the Area Under the Curve (AUC). Our RF classifier has been configured with 20 trees (the default), the 6 features introduced in the previous section, and a moving window of 5 observations. Moreover, we consider only Full Node clients; therefore, the observed network traffic is related to syncing and consensus operations.

Ingoing flows. Figure 9 shows TPR and FPR for all the cryptocurrencies we have considered in this work. Firstly, we observe that the overall results are quite satisfactory, i.e., the means of the TPR and FPR values are about 0.86 and 0.0088, respectively. The best detection performance is achieved for Bytecoin Express (TPR=0.92, FPR=0.0047) and Bitcoin Express (TPR=0.92, FPR=0.008). Conversely, the worst performance is observed for:

  • Bytecoin (TPR=0.81, FPR=0.012). Misclassifications are mainly due to Monero (332 cases – 7%), Bitcoin (106 cases – 2%), and Bytecoin Nord (97 cases – 2%).
  • Monero Express (TPR=0.80, FPR=0.014). False positives are mainly due to Office Express (381 cases – 8%), YouTube Express (148 cases – 3%), and Bytecoin Express (63 cases – 1%).
  • Bitcoin Nord (TPR=0.84, FPR=0.009). Classification errors mainly come from Monero Nord (251 cases – 5%), Bitcoin Express (217 cases – 4%), and Bytecoin Nord (70 cases – 1%).
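With more than two classes, the per-class TPR and FPR reported above can be obtained from the multi-class confusion matrix by treating each class in turn as the positive one and all the others as negative. The sketch below illustrates this one-vs-rest computation on a made-up three-class matrix; the counts are hypothetical and are not taken from the paper.

```python
def per_class_rates(confusion, labels):
    """Return {label: (TPR, FPR)} from a confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in confusion)
    rates = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                 # actual class i, predicted as another class
        fp = sum(row[i] for row in confusion) - tp  # other classes predicted as class i
        tn = total - tp - fn - fp
        rates[label] = (tp / (tp + fn), fp / (fp + tn))
    return rates


# Hypothetical confusion matrix with 100 observations per class.
labels = ["Bytecoin", "Monero", "Office"]
confusion = [
    [80, 15, 5],
    [10, 85, 5],
    [2, 3, 95],
]
rates = per_class_rates(confusion, labels)
mean_tpr = sum(tpr for tpr, _ in rates.values()) / len(rates)
mean_fpr = sum(fpr for _, fpr in rates.values()) / len(rates)
```

The off-diagonal entries of each row also show which classes absorb the misclassified observations, which is the kind of breakdown given in the bullets above.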

Our results show that ingoing flows (from the Internet to the device) can be used to effectively identify malicious miners inside local networks. In particular, traffic generated by cryptocurrency clients (without VPN) can be detected with high TPR values, i.e., 0.84, 0.87 and 0.90 for Bytecoin, Monero, and Bitcoin. The adoption of a VPN tunnel does not improve the privacy of the Crypto-client: TPR increases for Bytecoin when tunnelled through a VPN, while Monero and Bitcoin show diverging performance depending on the adopted VPN brand. Moreover, we observe that…

