An Empirical Study of Blockchain System Vulnerabilities: Modules, Types, and Patterns

05.03.2025
Blockchain, as a distributed ledger technology, has become increasingly popular, especially for enabling valuable cryptocurrencies and smart contracts. However, blockchain software systems inevitably have many bugs. Although bugs in smart contracts have been extensively investigated, security bugs of the underlying blockchain systems are much less explored. In this paper, we conduct an empirical study of blockchain system vulnerabilities in four representative blockchains: Bitcoin, Ethereum, Monero, and Stellar. Specifically, we first design a systematic filtering process to effectively identify 1,037 vulnerabilities and their 2,317 patches from 34,245 issues/PRs (pull requests) and 85,164 commits on GitHub. We thus build the first blockchain vulnerability dataset, which is available at https://github.com/VPRLab/BlkVulnDataset. We then perform unique analyses of this dataset at three levels, including (i) file-level vulnerable module categorization by identifying and correlating module paths across projects, (ii) text-level vulnerability type clustering by natural language processing and similarity-based sentence clustering, and (iii) code-level vulnerability pattern analysis by generating and clustering code change signatures that capture both syntactic and semantic information of patch code fragments.

Our analyses reveal three key findings: (i) some blockchain modules are more susceptible than the others; notably, each of the modules related to consensus, wallet, and networking has over 200 issues; (ii) about 70% of blockchain vulnerabilities are of traditional types, but we also identify four new types specific to blockchains; and (iii) we obtain 21 blockchain-specific vulnerability patterns that capture unique blockchain attributes and statuses, and demonstrate that they can be used to detect similar vulnerabilities in other popular blockchains, such as Dogecoin, Bitcoin SV, and Zcash.

CCS CONCEPTS

  • Security and privacy → Software and application security.

KEYWORDS Blockchain Security, System Vulnerability, Data Mining

1 INTRODUCTION

While blockchain was first invented as a transaction ledger of the Bitcoin cryptocurrency [68], it is now serving as a fundamental component of many cryptocurrencies, the total market capitalization of which is close to two trillion USD in February 2022 [40]. Smart contract platforms (e.g., Ethereum [35] and Hyperledger Fabric [30]) and decentralized computing platforms (e.g., Interplanetary File System [34] and Blockstack [29]) further evolved the blockchain technology into various decentralized applications, such as DeFi (Decentralized Finance) [80], smart contract oracles [83, 84], decentralized identities [65], decentralized IoT management [73], and decentralized app markets [37]. To protect the decentralization of these systems and secure those finance-critical cryptocurrencies, security is a top priority of many blockchains.

Prior research on blockchain security focused on smart contract vulnerability detection and network analysis. Many static program analysis tools, e.g., Oyente [64], Zeus [51], Securify [76], Gigahorse [47], and ETHBMC [42], have been proposed to detect vulnerable smart contracts via symbolic execution and model checking. Dynamic tools [38, 49, 70, 75] and learning-based tools [45, 60, 63] were also invented. Besides smart contract analysis, some works analyzed network traffic hijacking [31] and mining [44] attacks and performed transaction attack analysis [39, 53, 85, 87]. In contrast, blockchains’ system-level security issues are much less explored in academic research. To the best of our knowledge, there was only one study [77] in this direction. It analyzed 946 blockchain bugs, of which only 18 were security bugs and only four were analyzed in detail.

In this paper, we aim to systematically understand blockchain system vulnerabilities by conducting an empirical vulnerability study of the representative blockchains in four directions, including the classic Bitcoin [68], the smart contract platform Ethereum [35], the anonymous coin Monero [69], and the payment network Stellar [61]. They are not only popular in the cryptocurrency market but also backed up with solid technical papers.

As depicted in Figure 1, the first step and challenge of our study is to effectively collect vulnerable issues and their patches of those four blockchains. This is difficult because there is very little CVE information associated with blockchain projects (unlike other vulnerability mining studies [56, 81, 86]), and the large number (over 34K) of raw blockchain bugs in our crawled database makes manual vulnerability filtering ineffective. To address this, we propose a vulnerability filtering framework based on the intuition that vulnerabilities have unique characteristics from various aspects, and we can gradually identify candidate vulnerabilities by analyzing attributes of the code commits, files, labels, and keywords. Eventually, we obtain 1,037 vulnerabilities and their 2,317 patches as our blockchain vulnerability dataset.

Based on this unique dataset, we study three key yet unexplored aspects of blockchain vulnerabilities, including susceptible blockchain modules, common blockchain vulnerability types, and blockchain-specific patch code patterns. To this end, we perform the file-, text-, and code-level vulnerability analysis as follows.

Firstly, we conduct the module analysis by inspecting patched files. However, inspecting each individual file is time-consuming because there are 2,362 unique patch file paths. Therefore, we propose to identify the module path, i.e., the folder name that could summarize the module of enclosed files (e.g., the “rpc/” folder indicates the RPC module). We further correlate module paths across different blockchains by identifying a reference blockchain architecture and mapping different module paths into this architecture. This module categorization allows us to obtain a layered map of blockchain vulnerabilities in different modules and pinpoint susceptible blockchain modules. We find that some modules are more susceptible than the others, such as the highly susceptible ones related to consensus, wallet, and networking, each with over 200 vulnerabilities.

Secondly, we perform the type analysis by analyzing vulnerability text, more specifically, vulnerability titles. This is because a vulnerability type is typically captured by the title of an issue/PR (pull request), e.g., Bitcoin PR #17640 “wallet: Fix uninitialized read in bumpfee(…)”, where “uninitialized read” is the type. To eliminate noisy words and generate good-quality clusters about types, we leverage the part-of-speech analysis of NLP (natural language processing) to first extract type keywords before we conduct actual clustering. By extracting type keywords in various situations and identifying a suitable clustering algorithm (and its setting), we successfully map 75.8% of the vulnerabilities into the clusters of different types and analyze the top 20 types that affect at least ten vulnerabilities each. Among these types, we identify four new vulnerability types that are directly related to blockchain transaction, block, peer/node, and wallet key/password. We also show that traditional vulnerability types still hold 62%~78% of all the blockchain vulnerabilities. Furthermore, we analyze the type differences across different blockchain projects.

Thirdly, we conduct the pattern analysis by analyzing vulnerability patch code. In particular, we focus on blockchain-specific vulnerability types since the code patterns of traditional vulnerability types are well-known. To group similar patch code into the same cluster, we design and generate code change signatures that concisely capture both syntactic and semantic information of patch code fragments. By clustering 3,251 code fragments into 174 clusters of code change signatures, we identify 21 blockchain-specific vulnerability patterns that check unique blockchain attributes (e.g., the sender address, transaction order, block header, and gas limit) and validate various blockchain statuses during node synchronization, peer validation, wallet, and database operations. We further leverage these patterns to discover 20 similar vulnerabilities in other popular blockchains, notably, Dogecoin, Bitcoin SV, and Zcash, which have a collective market capitalization of over 25 billion USD as of January 2022. Most of our vulnerability reports have been confirmed and are being patched, with only two being rejected. This demonstrates the real-world impact of our vulnerability patterns. A thorough detection of blockchain system vulnerabilities based on the patterns extracted in this paper will be our future work.

To sum up, the main contributions of this paper are as follows:

  • We design a systematic filtering process to curate a unique vulnerability dataset and will release it to the research community. The link of the dataset is already available at https://github.com/VPRLab/BlkVulnDataset.
  • We develop a set of new methods to analyze blockchain vulnerabilities and build a knowledge base on previously unknown patterns of the vulnerabilities and their fixes.
  • We reveal three key findings about blockchain system vulnerabilities in terms of their susceptible modules, various vulnerability types, and specific vulnerability patterns. Moreover, we demonstrate the usage of these vulnerability patterns by detecting 20 similar vulnerabilities in other popular blockchains.

The rest of this paper is organized as follows. We first provide the background of studied blockchains and their bug-fixing process in §2 and describe our systematic data collection in §3. We then present our multi-level vulnerability analysis in §4, §5, and §6, respectively. §7 summarizes the related works. Finally, §8 concludes this study.

2 BACKGROUND

2.1 Four Representative Blockchains Studied

In this paper, we study the representative blockchains that are (i) popular in the cryptocurrency market, (ii) in different directions of blockchain usages, and (iii) backed up with solid technical papers. Under these three conditions, we select the classic Bitcoin [68], the smart contract platform Ethereum [35], the anonymous coin Monero [69], and the payment network Stellar [61]. Next, we present their basic information and the development status on GitHub.

Bitcoin introduces the concept of blockchain [68] and uses it as a distributed ledger to record transactions for public verification. As of January 2022, the Bitcoin cryptocurrency (or BTC) has the largest market capitalization, at more than 832 billion USD. The Bitcoin software was released in 2009, and it is actively maintained by over 850 contributors on GitHub in a repository called bitcoin/bitcoin. The primary programming language of Bitcoin is C++.

Ethereum is the first blockchain system with the capability of constructing Turing-complete smart contracts [35], which contain a set of pre-defined rules and regulations for self-execution. To maintain the operation of Ethereum, it creates a native cryptocurrency called Ether (or ETH), which is the second largest cryptocurrency with a market capitalization of more than 410 billion USD as of January 2022. The Ethereum software was released on GitHub in 2015, and its Go implementation is maintained by over 700 contributors in a repository called ethereum/go-ethereum.

Monero aims to mitigate the privacy leakage in blockchain systems, since each blockchain transaction is transparent and could leak some sensitive information. To do so, Monero uses an obfuscated ledger [69] to prevent the transaction details (e.g., transaction source, amount, and destination) from being revealed to outside observers. As of January 2022, the Monero coin (XMR) is ranked 47th with a market capitalization of over 3.8 billion USD. The Monero software was released on GitHub in 2014, and it is maintained by over 250 contributors in a repository called monero-project/monero. The primary language of Monero is C++.

Stellar is a blockchain-based payment network [61] that can perform cross-border money transfer in seconds. It uses a novel consensus protocol called Stellar Consensus Protocol (SCP) [61] for fast and secure transactions among untrusted participants. The native cryptocurrency of Stellar is called XLM, which is ranked 30th with a market capitalization of around 6.7 billion USD as of January 2022. The Stellar software was released on GitHub in 2015, and it is currently maintained by more than 80 contributors in a repository called stellar/stellar-core. Similar to Bitcoin and Monero, the primary language of Stellar is also C++.

2.2 Bug-fixing Process in Blockchain Projects

It is also necessary to understand the typical bug-fixing process of blockchain projects hosted as open-source projects on GitHub in order to collect and analyze their vulnerabilities and patches. A commit is a set of changes submitted by developers into a project repository; a commit can change anything, ranging from changing source code to modifying document files or merging multiple previous commits. A change consisting of a consecutive sequence of added/deleted lines is also known as a hunk. A patch is a collection of changes or commits that can be applied to a set of files via a patching tool. An issue is often a report on a project’s GitHub page; it may describe a potential bug or sometimes an enhancement or a question, and may come with fixes and solutions. A pull request (PR) is the proposed commit for a project from a separate clone of the project; it can be pulled from the project clone and accepted into the original project based on the review of managing developers. For simplicity, we do not explicitly distinguish an issue and a PR in this paper since the latter often contains a bug description too. Indeed, GitHub itself mixes up the usage of issue/PR numbers.
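The notion of a hunk defined above can be illustrated with a minimal Python sketch. It assumes the input is the text of a unified diff, and it follows this paper's definition (a consecutive run of added/deleted lines) rather than any particular tool's hunk format. The function name `split_hunks` and the sample diff content are hypothetical.

```python
from typing import List


def split_hunks(diff_lines: List[str]) -> List[List[str]]:
    """Group consecutive added/deleted lines of a unified diff into hunks."""
    hunks: List[List[str]] = []
    current: List[str] = []
    for line in diff_lines:
        # File headers ("--- a/..." and "+++ b/...") are not code changes.
        # Simplification: a changed line whose content itself starts with
        # "--" or "++" would be misread as a header.
        is_header = line.startswith("+++") or line.startswith("---")
        is_change = line.startswith(("+", "-")) and not is_header
        if is_change:
            current.append(line)
        elif current:
            # Any other line (context, "@@" marker, header) ends the run.
            hunks.append(current)
            current = []
    if current:
        hunks.append(current)
    return hunks


# Hypothetical diff, for illustration only.
example_diff = [
    "--- a/src/wallet/wallet.cpp",
    "+++ b/src/wallet/wallet.cpp",
    "@@ -10,4 +10,5 @@",
    " int fee = 0;",
    "-int rate;",
    "+int rate = 0;",
    " compute(fee, rate);",
    "+check(fee);",
]
hunks = split_hunks(example_diff)
```

Here the sample diff yields two hunks: the replaced declaration and the added call, separated by one unchanged context line.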

3 SYSTEMATIC DATA COLLECTION

As shown in Figure 1, the first and a critical step of our study is to collect a good-quality blockchain vulnerability dataset across multiple blockchain systems that satisfies two conditions: (i) cover as many vulnerabilities as possible in the studied blockchains (i.e., minimizing false negatives); and (ii) introduce as few non-vulnerability bugs as possible in the dataset (i.e., minimizing false positives).

Some other vulnerability studies [56, 81, 86] leverage the CVE (Common Vulnerabilities and Exposures) or Bulletin (i.e., bug bounty) information to collect vulnerability data. However, we found that there is very little CVE/Bulletin information about most blockchains because blockchain vulnerabilities are critical and often patched directly via the reports from bug bounty programs without releasing a CVE. For example, Ethereum (go-ethereum) had only four CVEs released before our data collection while Bitcoin had 33 CVEs.

We take a different approach: we directly analyze the blockchain projects’ issues and commits in their GitHub repositories and extract the vulnerable ones from them. We first crawl all blockchain bugs and organize them into a raw bug database (in §3.1). The major challenge is how to differentiate real vulnerabilities from a large number of regular bugs. To address the challenge, we propose a novel vulnerability filtering framework (in §3.2) that systematically and effectively filters out regular bugs and extracts blockchain vulnerabilities. We eventually obtain the first dataset of blockchain system vulnerabilities (in §3.3), comprising more than 1K vulnerabilities identified from over 34K issues. This could not be done via manual analysis or via prior training-based patch identification [74, 88] since (i) there is no ground-truth training set for blockchain vulnerabilities and (ii) the learning-based nature of those techniques tends to identify only bugs or vulnerabilities similar to those in the training set.


Footnote 2: In this paper, we adopt a broad definition of vulnerabilities that considers the bugs with security impact as vulnerabilities.

3.1 Crawling and Organizing Blockchain Bugs

As illustrated in Figure 1, our blockchain bug database is constructed from two data sources, the issues and commits, by leveraging GitHub APIs. For the issues, we collect all the information of each closed issue/PR, including the issue title, issue body, comments, events, and bug category labels. We consider only closed issues/PRs because open issues are not confirmed bugs yet and certainly have no patches. Note that even for closed issues, they may not be the real bugs and could have no patches (i.e., they were simply closed by developers). For the commits, we first crawl all the commits of a repository and then determine which commits are bug-related. For each commit, we collect its title, commit message, affected files, and id/URL; for some commits filtered according to §3.2, their actual code change hunks are also collected and processed according to §3.3 and used for code-level pattern analysis in §6. We have collected a total of 34,245 closed issues/PRs and 85,164 commits as the raw dataset at the end of February 2020. The detailed breakdown of these issues/PRs and commits across four blockchain projects is available in Table 2.

With the raw data collected, a non-trivial task is to organize and correlate the issues with their corresponding commits. Specifically, we need to determine all the relevant commits for a given issue/PR: if an issue/PR has no patch commits, it is not a real bug and will be filtered out. By summarizing the GitHub structure of issues/PRs and commits, we observe three kinds of information we can leverage for such correlation. First, we leverage the issue page’s event information (e.g., XXX mentioned this issue and YYY added a commit) and retrieve the commit URLs from those events. For example, in https://github.com/bitcoin/bitcoin/issues/595, we obtain the commit URL via the event of “laanwj added a commit that referenced this issue.” Second, for a PR like https://github.com/bitcoin/bitcoin/pull/9366, we can directly retrieve its commit list at its “Commits” tab page. Although these two kinds of information are useful for most issues/PRs, some commits may not appear in the events of issues or commit lists of PRs. Third, to cover such commits, our script analyzes the titles and messages of all the 85,164 commits and identifies issue/PR numbers from them. With these strategies, we successfully build the relationship between the issues and commits and finish constructing the raw bug database shown in Figure 1.
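The last strategy, identifying issue/PR numbers in commit titles and messages, can be sketched as follows. This is a minimal illustration, not the authors' script. It assumes that references appear in the common "#1234" form, and the commit identifiers and messages are hypothetical (only the issue/PR numbers 595 and 9366 come from the examples above).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: issue/PR references in commit text look like "#1234".
ISSUE_REF = re.compile(r"#(\d+)")


def link_commits_to_issues(
    commits: Iterable[Tuple[str, str]], known_issues: Set[int]
) -> Dict[int, List[str]]:
    """Map each known issue/PR number to the commits whose text mentions it."""
    links: Dict[int, List[str]] = defaultdict(list)
    for commit_id, text in commits:
        # De-duplicate numbers mentioned several times in the same commit.
        for number in sorted({int(n) for n in ISSUE_REF.findall(text)}):
            if number in known_issues:
                links[number].append(commit_id)
    return dict(links)


# Hypothetical commit ids and messages.
commits = [
    ("aaa111", "Fix crash on startup (#595)"),
    ("bbb222", "Merge #9366: wallet cleanup"),
    ("ccc333", "Follow-up to #9366 and #99999"),
    ("ddd444", "Update docs"),
]
links = link_commits_to_issues(commits, known_issues={595, 9366})
```

Numbers that do not correspond to a crawled issue/PR (such as #99999 here) are ignored, and an issue/PR that ends up with no linked commit would be dropped at the commit-based filtering step.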

3.2 A Vulnerability Filtering Framework

To evolve the raw bug database into the final vulnerability dataset, we design a systematic vulnerability filtering framework expressed as a seven-step process (i.e., S0~S4b in Table 1) to effectively differentiate vulnerabilities from regular bugs with minimal manual work. The intuition is that vulnerabilities have unique characteristics in various aspects, and we can gradually identify candidate vulnerabilities by analyzing attributes of the code commits, files, labels, and keywords. As shown in Table 1, we perform the filtering in the following four aspects:

Commit-based filtering. Firstly, in the step S0, we leverage the most straightforward characteristic that a closed vulnerability must associate with code commits. In other words, an issue/PR without any commit could be excluded directly. Since we have already built the relationship between issues/PRs and commits in §3.1, we easily exclude 10,101 issues/PRs out of the entire 34,245 issues/PRs.

File-based filtering. Secondly, we leverage two characteristics of patch files to filter out the bugs that are certainly not vulnerabilities. The basic idea is that the patch of a vulnerable issue/PR must make some real code changes, i.e., it must change files with actual source code and must not contain only test code. Specifically, in the step S1, we determine the file types with actual source code (by their file suffixes) for the four blockchains. An issue/PR whose commits do not modify any file of these types is excluded. For example, there are 152 different file types for Bitcoin’s commits, but only these seven file types, [‘.cpp’, ‘.h’, ‘.py’, ‘.sh’, ‘.cc’, ‘.c’, ‘.java’], contain actual source code, whereas other file types like ‘.yml’ and ‘.mk’ are unlikely to be related to vulnerabilities. This step filters out 3,798 more issues/PRs, and the remaining 20,346 are further filtered by the step S2, which excludes the test-only commits and their associated issues/PRs (1,522 more, as listed in Table 1). With the file-based filtering, we exclude 22% (5,320/24,144) of the issues/PRs.
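A minimal sketch of the two file-based checks, under stated assumptions: the suffix list is the Bitcoin one quoted above, while the rule for recognizing test code (any path component containing "test") is a hypothetical heuristic, since the text does not specify how test-only commits are detected. The sketch also applies both checks to all files of an issue/PR at once, which simplifies the per-commit handling of S2.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Source-code suffixes listed in the text for Bitcoin.
SOURCE_SUFFIXES = {".cpp", ".h", ".py", ".sh", ".cc", ".c", ".java"}


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic: any path component containing 'test'."""
    return any("test" in part.lower() for part in PurePosixPath(path).parts)


def passes_file_filter(changed_paths: Iterable[str]) -> bool:
    """Return True if an issue/PR survives the S1 and S2 checks."""
    source_files = [
        p for p in changed_paths if PurePosixPath(p).suffix in SOURCE_SUFFIXES
    ]
    if not source_files:
        return False  # S1: no file with actual source code was modified
    # S2: at least one modified source file must not be test code
    return any(not is_test_file(p) for p in source_files)


# Illustrative paths only.
keep = passes_file_filter(["src/wallet/wallet.cpp", "doc/README.md"])
drop_s1 = passes_file_filter([".travis.yml", "build/Makefile.mk"])
drop_s2 = passes_file_filter(["src/test/util_tests.cpp"])
```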

Label-based filtering. Thirdly, we leverage the characteristic of the labels of issues/PRs: certain words in the labels could indicate whether an issue/PR is related to a vulnerability or not. For example, the ‘Privacy’ label marks privacy-related bugs in the Bitcoin project and the ‘obsolete:vuln’ label indicates the early-stage vulnerabilities of Ethereum. To avoid false positives, we are conservative in specifying vulnerability labels: we assign only three labels (i.e., ‘Privacy’, ‘obsolete:vuln’, and the special label ‘SEC-XXX’ that appears at the beginning of issue/PR titles) and mark their corresponding 56 issues/PRs explicitly as vulnerabilities in the step S3a. In contrast, there are many more labels clearly indicating non-vulnerability issues/PRs. Specifically, out of the entire 87 labels from the four blockchain projects, we manually determine that 48 of them are not related to vulnerabilities, such as ‘Refactoring’, ‘Docs’, and ‘type:feature’. With these labels, we filter out their associated 4,400 issues/PRs in the step S3b. After this step, we have narrowed the filtering scope from 34,245 to 14,368 issues/PRs, a reduction of 58%.

Keyword-based filtering. Lastly, we directly check the text of issues/PRs based on the characteristic that some keywords could indicate that an issue/PR is vulnerable, whereas others could imply that an issue/PR is not related to vulnerabilities. To this end, we first perform a word count analysis on the words in issue/PR titles and bodies, sort these words by their frequency of appearance, and exclude the words that appear only once. We then group the words by their semantic similarity using the spaCy [25] NLP library. Since similar words are grouped together, we manually go through all the clusters to obtain a set of vulnerability-related words (Step S4a) and a set of non-vulnerability words (Step S4b). Specifically, we obtain 62 clusters of vulnerability-related words and 79 clusters of non-vulnerability words, which allow us to automatically identify 1,227 vulnerable issues/PRs and exclude 6,330 irrelevant issues/PRs in the steps S4a and S4b, respectively.
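The word count analysis that precedes the clustering can be sketched as below. This is only an illustration of the counting, sorting, and drop-singletons step; the subsequent grouping by semantic similarity with spaCy and the manual review of clusters are not reproduced. The tokenization rule (lowercase alphabetic runs) and the sample titles are assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def candidate_words(texts: Iterable[str]) -> List[Tuple[str, int]]:
    """Count words over issue/PR text and keep those appearing more than once."""
    counts: Counter = Counter()
    for text in texts:
        # Assumption: a word is a run of ASCII letters, case-insensitive.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    kept = [(word, n) for word, n in counts.items() if n > 1]
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical issue titles.
titles = [
    "Fix overflow in fee calculation",
    "Integer overflow in block parser",
    "Update fee docs",
]
words = candidate_words(titles)
```

The surviving words would then be clustered and manually labeled as vulnerability-related or not.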

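Table 1: Intermediate results of the filtering in each step.

| Step | Aspect  | Action (issues/PRs)           | Remaining issues/PRs |
|------|---------|-------------------------------|----------------------|
| S0   | Commit  | -10,101 excluded              | 24,144               |
| S1   | File    | -3,798 excluded               | 20,346               |
| S2   | File    | -1,522 excluded               | 18,824               |
| S3a  | Label   | 56 marked as vulnerabilities  | 18,768               |
| S3b  | Label   | -4,400 excluded               | 14,368               |
| S4a  | Keyword | 1,227 marked as vulnerable    | 13,141               |
| S4b  | Keyword | -6,330 excluded               | 6,811                |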

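Table 2: Metadata of the raw and vulnerability datasets.

| Repository | Raw Bug Database: Closed Issues/PRs | Raw Bug Database: Commits | Vulnerability Dataset: Issues/PRs | Vulnerability Dataset: Commits |
|------------|-------------------------------------|---------------------------|-----------------------------------|--------------------------------|
| Bitcoin    | 16,731                              | 41,706                    |                                   |                                |
| Ethereum   | 9,321                               | 23,764                    |                                   |                                |
| Monero     | 5,918                               | 12,656                    | 178                               |                                |
| Stellar    | 2,275                               | 7,038                     | 56                                |                                |
| Total      | 34,245                              | 85,164                    | 1,037                             | 2,317                          |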

Eventually, our filtering framework extracted 1,283 (=1,227+56) suspicious issues/PRs (in the steps S3a and S4a) from the entire 34,245 issues/PRs. We manually examined all these candidates and confirmed that 1,059 of them were actually vulnerability-related. This suggests that our filtering achieves a precision of 82.5% (1,059/1,283) in identifying true vulnerabilities. Although there is no ground truth for an exact measurement, our filtering framework also potentially has a high recall in identifying the patched vulnerabilities in these projects, since it explicitly handles at least 80.1% (27,434/34,245) of all the issues/PRs. The remaining 6,811 issues/PRs after the step S4b are discarded, but we believe that they have a low chance of being vulnerabilities because they contain no relevant keywords.
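The arithmetic behind these figures can be rechecked with a short Python sketch. The per-step counts are the ones reported in the text and Table 1; treating S3a and S4a as removing the marked issues/PRs from the remaining pool is how the running totals in Table 1 are interpreted here.

```python
# Per-step change in the pool of unclassified issues/PRs (from the text and Table 1).
TOTAL = 34_245
STEPS = [
    ("S0", 10_101),   # no associated commit
    ("S1", 3_798),    # no source-code file modified
    ("S2", 1_522),    # test-only changes
    ("S3a", 56),      # marked as vulnerabilities by label
    ("S3b", 4_400),   # non-vulnerability labels
    ("S4a", 1_227),   # marked as vulnerable by keyword
    ("S4b", 6_330),   # non-vulnerability keywords
]

remaining = TOTAL
trace = {}
for step, count in STEPS:
    remaining -= count
    trace[step] = remaining  # pool size after this step

candidates = 56 + 1_227            # S3a + S4a
confirmed = 1_059                  # confirmed by manual examination
precision = confirmed / candidates
handled = (TOTAL - trace["S4b"]) / TOTAL

print(trace)
print(f"precision = {precision:.1%}, handled = {handled:.1%}")
```

The running totals reproduce the intermediate values 24,144, 20,346, 18,824, 18,768, 14,368, 13,141, and 6,811, and the two ratios round to 82.5% and 80.1%.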

3.3 The Vulnerability Dataset and Its Metadata

We then retrieve the actual code hunks for the identified 1,059 issues/PRs from their corresponding 2,933 commits. Through this code hunk analysis, we further exclude 22 issues/PRs because they are associated with “invalid” code commits. Specifically, we identified 586 duplicate code commits whose code hunks were the same (e.g., https://github.com/bitcoin/bitcoin/commit/d478a1c6 and https://github.com/bitcoin/bitcoin/commit/8a445c56), for which we kept just one code commit for each duplicate pair. We also found 30 empty code commits whose code hunks we were not able to obtain, either because the commit had disappeared (e.g., https://github.com/bitcoin/bitcoin/commit/7e193ff6) or because the diff was too large (e.g., https://github.com/ethereum/go-ethereum/commit/34dde3e2). As a result, our final vulnerability dataset consists of 1,037 vulnerability-related issues/PRs and their 2,317 commits, as shown in Table 2. It is worth noting that while the items in our dataset are all security patches, some of them are not conventional technical vulnerabilities but more like security enhancements, such as upgrading weak crypto algorithms to strong ones. In this paper, we do not distinguish them.
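One way to detect duplicate and empty code commits is to fingerprint the hunk content of each commit. The sketch below is an assumption about how such a check could be done (hashing the joined hunk lines with SHA-256), not a description of the authors' implementation. The commit ids and hunk lines are hypothetical.

```python
import hashlib
from typing import Dict, List


def dedupe_commits(commit_hunks: Dict[str, List[str]]) -> List[str]:
    """Keep the first commit for each distinct hunk content; skip empty ones."""
    seen = set()
    kept: List[str] = []
    for commit_id, hunk_lines in commit_hunks.items():
        if not hunk_lines:
            continue  # empty code commit: no hunks could be obtained
        digest = hashlib.sha256("\n".join(hunk_lines).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a commit already kept
        seen.add(digest)
        kept.append(commit_id)
    return kept


# Hypothetical commits: c2 duplicates c1, c3 has no retrievable hunks.
commit_hunks = {
    "c1": ["-int rate;", "+int rate = 0;"],
    "c2": ["-int rate;", "+int rate = 0;"],
    "c3": [],
    "c4": ["+check(fee);"],
}
kept = dedupe_commits(commit_hunks)
```

With the counts reported above, removing 586 duplicates and 30 empty commits from 2,933 leaves the 2,317 commits of the final dataset.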

In Table 2, we also list the metadata of each blockchain project. We can see that Bitcoin and Ethereum contribute 77.8% of the vulnerabilities in our dataset, whereas the percentages of Monero and Stellar vulnerabilities are relatively low. This is mainly because Bitcoin and Ethereum have many more code commits than the other two blockchains, holding a similar percentage (76.9%) of the entire 85,164 commits. Additionally, we notice that Stellar has around the same number of patches as Monero, whereas its number of issues/PRs is about three times lower (56 vs. 178). The main reason is that Stellar developers tended to use one PR to cover multiple bug fixes at the early stage of Stellar development.

Based on this unique dataset, we perform a comprehensive vulnerability analysis at three different levels in §4, §5, and §6.

4 FILE-LEVEL MODULE CATEGORIZATION

At the first level of our study, we perform the module analysis of patched files. We first propose a lightweight method for categorizing vulnerable modules in §4.1, and then present the categorization result and its implication in §4.2.

4.1 Identifying and Correlating Module Paths for Vulnerable Module Categorization

We found that the 1,037 vulnerable issues/PRs (or more precisely, their 2,317 patch commits) touch a total of 2,362 unique file paths (544 in Bitcoin, 1,376 in Ethereum, 251 in Monero, and 191 in Stellar), which makes inspecting each individual file time-consuming. Therefore, we propose to identify the module path, i.e., the folder name that could summarize the module of enclosed files (e.g., the “rpc/” folder indicates the RPC module). For paths with generic names (e.g., the “src/” folder), we consider their sub-folders as module paths. Since Ethereum’s folder structure is more complicated than that of the other three projects, we also consider three additional folders (the “core/”, “swarm/”, and “eth/” folders) as generic, and consider their sub-folders as module paths. Eventually, we obtain a total of 146 module paths (28 in Bitcoin, 71 in Ethereum, 26 in Monero, and 21 in Stellar) from 2,317 patch commits in the four studied blockchains.
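The module path rule can be sketched in a few lines of Python. The sets of generic folders come from the description above; the function name `module_path` and the example file names are illustrative assumptions.

```python
from typing import Optional, Set

# Generic folders named in the text; their sub-folders act as module paths.
GENERIC_DEFAULT = {"src"}
GENERIC_ETHEREUM = {"src", "core", "swarm", "eth"}


def module_path(file_path: str, generic: Set[str]) -> Optional[str]:
    """Return the shortest folder prefix that ends in a non-generic folder."""
    folders = file_path.split("/")[:-1]  # drop the file name
    prefix = []
    for folder in folders:
        prefix.append(folder)
        if folder not in generic:
            return "/".join(prefix) + "/"
    # The file sits directly under a generic folder (or the repository root).
    return None


# Illustrative paths only.
p1 = module_path("src/wallet/wallet.cpp", GENERIC_DEFAULT)
p2 = module_path("rpc/server.go", GENERIC_ETHEREUM)
p3 = module_path("core/vm/evm.go", GENERIC_ETHEREUM)
p4 = module_path("src/init.cpp", GENERIC_DEFAULT)
```

A `None` result corresponds to files that have no module path and therefore need to be inspected individually.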

Further, since different blockchains have different path names for the same module (e.g., the Consensus module of Bitcoin/Ethereum is in “consensus/” while that of Stellar is in “src/scp/”), we need to correlate those module paths across projects. Our solution is to identify a reference blockchain architecture and map different module paths into this architecture. Since many blockchains are based on Bitcoin, we use Bitcoin Core’s architecture [22] as our reference. For easier understanding, we separate the entire architecture into four layers [18], as shown in Figure 2, and unify the traditional Miner, Mempool, and Validation Engine components into the Consensus module. We then manually map those 146 module paths into our blockchain architecture one by one.

It is worth noting that a vulnerable issue/PR may affect multiple modules, so the sum of the numbers of vulnerabilities of all the modules is larger than 1,037. Also, some patch commits change only the files directly under the generic “src/” folder and do not have module paths. We inspect all such patch files (107 in Bitcoin, 31 in Ethereum, 6 in Monero, and 4 in Stellar) and map their corresponding vulnerabilities into the modules in Figure 2 based on the patch file names.
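A minimal sketch of the mapping and counting step follows. The mapping table is a small excerpt built from module paths named in this paper; the issue identifiers and their path lists are hypothetical. It also shows why per-module counts can add up to more than the number of issues/PRs.

```python
from collections import Counter
from typing import Dict, List

# Excerpt of a manual mapping from module paths to reference-architecture modules.
MODULE_OF_PATH = {
    "consensus/": "Consensus",
    "miner/": "Consensus",
    "src/scp/": "Consensus",
    "src/wallet/": "Wallet",
    "accounts/": "Wallet",
    "ethdb/": "Storage",
    "src/leveldb/": "Storage",
    "src/qt/": "GUI/CMD",
    "cmd/": "GUI/CMD",
}


def count_by_module(issue_paths: Dict[str, List[str]]) -> Counter:
    """Count each issue/PR once per module that its patched paths map to."""
    counts: Counter = Counter()
    for paths in issue_paths.values():
        modules = {MODULE_OF_PATH[p] for p in paths if p in MODULE_OF_PATH}
        counts.update(modules)
    return counts


# Hypothetical issues and the module paths their patches touch.
issue_paths = {
    "issue-1": ["src/wallet/", "src/qt/"],
    "issue-2": ["consensus/", "miner/"],
    "issue-3": ["src/wallet/"],
}
counts = count_by_module(issue_paths)
```

In this example three issues produce four module-level counts, because the first issue affects both the Wallet and GUI/CMD modules.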

4.2 Susceptible Blockchain Modules

Figure 2 shows the result of our module categorization in a layered map of blockchain modules and the numbers of vulnerabilities in those modules. We can see that modules in the Policy, Peer, and Network layers each introduce around one-fourth of the vulnerabilities, while the UI modules and other uncategorized modules contribute the remaining vulnerabilities (around 30%). Among all modules, we find that some modules are more susceptible than the others. Notably, the modules related to Consensus, Wallet, and NetConn contain over 200 issues each. Other modules about RPC, GUI/CMD, and Storage are also susceptible, affecting around 100 issues each. We observe that:

Figure 2: A layered map of blockchain vulnerabilities in different modules.

  • The Consensus module covers the consensus (e.g., the Proof-of-Work mechanism [68]), miner, block/transaction related components. Unfortunately, it was affected by 265 vulnerabilities, with the major module path from the “consensus/” folder. Other module paths include “miner/”, “ethchain/”, “src/cryptonote_core/”, “src/scp/”, and “src/ledger/”.
  • In the Peer layer, the Wallet module handles transactions for each peer and the Storage module manages the storage of those transactions. As shown in Figure 2, the Wallet module was affected by 214 vulnerabilities, which are mainly from the “src/wallet/” and “accounts/” module paths. In contrast, the Storage was affected by 93 vulnerabilities, all of which are from database-related module paths, such as “src/blockchain_db/”, “src/leveldb/”, and “ethdb/”.
  • The NetConn and RPC modules collectively incurred the most blockchain vulnerabilities in our dataset. As distributed systems by nature, blockchain systems heavily rely on network synchronization and RPC (Remote Procedure Call). Since these modules deal with complex network communication among different peers, multiple security issues could occur, such as data race, deadlock, resource leak, and denial-of-service.
  • Surprisingly, the GUI/CMD module is also a major source, with 141 vulnerabilities from the module paths like “src/qt/”, “ethereum/ui/”, “src/daemon/”, and “cmd/”. The underlying faults vary, but segfault and deadlock are typical bugs.

5 TEXT-LEVEL TYPE CLUSTERING

At the second level of our study, we conduct the type analysis by analyzing vulnerability text. In this section, we first present an NLP-based approach for clustering vulnerability types in §5.1, and then summarize the clustering results and showcase common blockchain vulnerability types in §5.2, including the ones not known before.

5.1 NLP-based Analysis of Vulnerability Titles for Type Clustering

We find that a vulnerability type is typically captured by the title of an issue/PR page, e.g., Bitcoin PR #17640 “wallet: Fix uninitialized read in bumpfee(…)”, where “uninitialized read” is the type. However, simply clustering issue/PR titles does not generate good-quality clusters about vulnerability types because each title could contain some noise. For instance, in the earlier example, “wallet” and “bumpfee” would affect the clustering quality. To address this problem, we propose a novel NLP-based method to first extract type keywords before we conduct actual clustering. This method is based on a grammatical pattern of vulnerability titles we observed, namely that a type is often a noun phrase located between a verb (e.g., “fix”) and a preposition (e.g., “in”). Figure 3 shows an intuitive illustration. Overall, our approach consists of two major steps: NLP-based keyword extraction and clustering of the obtained type keywords. Before these two steps, we also need to perform some pre-processing.

Pre-processing. In this step, we remove useless words and normalize the remaining words in the vulnerability titles. Specifically, the useless words include (i) the module/version information (e.g., the word before “:”, such as the “wallet” above, or the word inside “[]”, such as “[rpc]” or “[RELEASE]”), (ii) special words (e.g., “SEC-*” for Ethereum and one-character words like “a”; note that numbers and symbols like “–” or “(…)” are automatically handled by tokenizing), and (iii) noun-like adjective words (e.g., “possibility of” and “use of”). After cleaning useless words, we further normalize the remaining words by converting them to lower case and tokenizing them via the nltk [24] library’s RegexpTokenizer. During this process, we also unify a few words (e.g., replacing all “tx”/“txs”/“txns” with “transaction”). In Table 3, we list several example titles that our script automatically cleaned.
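
The cleaning steps above can be sketched in Python as follows. This is an illustrative approximation, not the script used in the study: the helper name clean_title, the regular expressions, and the phrase list are assumptions, and the standard-library re.findall stands in for nltk's RegexpTokenizer.

```python
import re

# Variants unified to a single word, as described in the text.
UNIFY = {"tx": "transaction", "txs": "transaction", "txns": "transaction"}


def clean_title(raw_title):
    """Return the cleaned token list for one issue/PR title (illustrative)."""
    # (i) Drop "[rpc]"-style tags and a leading "module:" prefix.
    text = re.sub(r"\[[^\]]*\]", " ", raw_title)
    text = re.sub(r"^\s*[\w./-]+:\s*", "", text)
    text = text.lower()
    # (ii) Drop Ethereum "SEC-*" labels (assumed here to be "sec-<digits>").
    text = re.sub(r"\bsec-\d+\b", " ", text)
    # (iii) Drop noun-like phrases; only the two examples from the text are listed.
    text = re.sub(r"\b(?:possibility|use) of\b", " ", text)
    # Tokenize on alphabetic runs, which also discards numbers and symbols.
    tokens = re.findall(r"[a-z]+", text)
    # Drop one-character words, then unify transaction variants.
    tokens = [t for t in tokens if len(t) > 1]
    return [UNIFY.get(t, t) for t in tokens]


if __name__ == "__main__":
    print(clean_title("blockchain_db: sanity check on tx/hash vector sizes"))
```

Applied to the raw titles E1 to E5 in Table 3, this sketch yields the cleaned titles listed there; titles with other prefix or tag conventions would need additional rules.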

NLP-based keyword extraction. According to the grammatical pattern shown in Figure 3, our objective is to find the target verb and preposition that determine the range of type words. However, one vulnerability title may contain multiple verbs or prepositions. Moreover, some verbs mainly act as nouns in our context, such as “check” and “leak”. For these two reasons, we do not directly use the nltk [24] library’s pos_tag() for a real-time part-of-speech analysis. Instead, we perform a pre-analysis of words’ parts of speech in our cleaned vulnerability titles, build a vocabulary of verbs and prepositions, and count their frequencies in our dataset. Eventually, we obtain a list of 33 verbs and 21 prepositions and rank them by frequency. Table 4 shows the top 10 most frequently used verbs and prepositions in our dataset.

Based on our vocabulary of verbs and prepositions and their frequencies, we are able to automatically locate the target verb and preposition for a cleaned vulnerability title in various situations using the following rules:

  • If only one verb and one preposition exist and the preposition appears after the verb (with one or more words in between), this verb and preposition are the target words, e.g., the words fix and in in example E1 of Table 3.
  • If there is no verb but a preposition exists (e.g., example E2), or there is no preposition but a verb exists (e.g., example E3), the preposition or the verb is the target.
  • If multiple verbs appear in a title, the one with the highest frequency is regarded as the target verb. For example, in Figure 3 (or example E4), since the word fix has a higher frequency than the word read in our vocabulary, fix is used as the target verb.
  • If multiple prepositions appear in a title, the first one appearing after the target verb (with one or more words in between) is the target preposition. For instance, in example E5 of Table 3, both on and in are prepositions, but since on appears before in, on is the target preposition.

Table 3: Examples of the cleaned issue/PR titles and their corresponding type keywords extracted.

| ID | Raw Title | Cleaned Title | Type Keywords |
|----|-----------|---------------|---------------|
| E1 | accounts: fix two races in the account manager | [‘fix’, ‘two’, ‘races’, ‘in’, ‘the’, ‘account’, ‘manager’] | [‘two’, ‘races’] |
| E2 | blockchain_db: sanity check on tx/hash vector sizes | [‘sanity’, ‘check’, ‘on’, ‘transaction’, ‘hash’, ‘vector’, ‘sizes’] | [‘sanity’, ‘check’] |
| E3 | [net] Avoid possibility of NULL pointer dereference | [‘avoid’, ‘null’, ‘pointer’, ‘dereference’] | [‘null’, ‘pointer’, ‘dereference’] |
| E4 | wallet: Fix uninitialized read in bumpfee(…) | [‘fix’, ‘uninitialized’, ‘read’, ‘in’, ‘bumpfee’] | [‘uninitialized’, ‘read’] |
| E5 | Prevent DOS attacks on in-flight data structures | [‘prevent’, ‘dos’, ‘attacks’, ‘on’, ‘in’, ‘flight’, ‘data’, ‘structures’] | [‘dos’, ‘attacks’] |

Figure 3: An example issue/PR title to illustrate the grammatical pattern of vulnerability titles we observed.

Table 4: The top 10 frequently used verbs and prepositions.

| Verb | add | remove | fix | make | fixed | set | avoid | improve | handling | added |
| Preposition | in | for | on | of | with | from | by | before | if | after |
  • If none of the above applies to a vulnerability title, we conclude that it has no target word.

After recognizing the target verb and preposition for each vulnerability title, the keywords between the two target words are extracted as the type of the vulnerability. However, as listed above, some cleaned titles may end up with only one target word or none at all. We handle those special titles as follows:

  • If only the target verb exists, all words after the target verb are regarded as the type keywords.
  • If only the target preposition exists, all words before the target preposition are treated as the type keywords.
  • If no target word exists, the entire cleaned title becomes the type keywords.
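
The target-word rules and the three fallback cases can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the verb list contains only the top 10 verbs of Table 4 plus “prevent” and “read”, which the examples imply are in the 33-verb vocabulary but whose ranks are assumed here, and the function names are hypothetical. For titles without a verb, the sketch takes the first preposition that has at least one word before it, which is also an assumption.

```python
# Frequency-ranked vocabulary: earlier entries are more frequent (Table 4).
# "prevent" and "read" are appended at assumed ranks; the text only states
# that "read" is less frequent than "fix".
VERBS = ["add", "remove", "fix", "make", "fixed", "set", "avoid",
         "improve", "handling", "added", "prevent", "read"]
PREPOSITIONS = {"in", "for", "on", "of", "with", "from", "by",
                "before", "if", "after"}
VERB_RANK = {verb: rank for rank, verb in enumerate(VERBS)}


def find_targets(tokens):
    """Return (verb_index, preposition_index); either may be None."""
    verb_positions = [i for i, t in enumerate(tokens) if t in VERB_RANK]
    # Multiple verbs: keep the most frequent one (lowest rank).
    verb_pos = (min(verb_positions, key=lambda i: VERB_RANK[tokens[i]])
                if verb_positions else None)
    # The preposition must leave at least one word between it and the verb
    # (or, with no verb, at least one word before it).
    start = verb_pos + 2 if verb_pos is not None else 1
    prep_pos = next((i for i in range(start, len(tokens))
                     if tokens[i] in PREPOSITIONS), None)
    return verb_pos, prep_pos


def extract_type_keywords(tokens):
    """Apply the extraction and fallback rules to one cleaned title."""
    verb_pos, prep_pos = find_targets(tokens)
    if verb_pos is not None and prep_pos is not None:
        return tokens[verb_pos + 1:prep_pos]   # words between the targets
    if verb_pos is not None:
        return tokens[verb_pos + 1:]           # only a verb: words after it
    if prep_pos is not None:
        return tokens[:prep_pos]               # only a preposition: words before it
    return list(tokens)                        # no target word: whole title


if __name__ == "__main__":
    print(extract_type_keywords(["fix", "uninitialized", "read", "in", "bumpfee"]))
```

With this vocabulary, the cleaned titles E1 to E5 in Table 3 map to the type keywords listed in that table.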

Clustering type keywords. With the extracted type keywords, we aim to cluster them based on their semantic meaning rather than their appearance as a string of letters. Thus, after embedding all the keywords into the vector space using word2vec [66], we choose the Word Mover’s Distance (WMD) [55] as the similarity metric. Another reason for applying WMD is that it performs well on short sentences like our type keywords. Then, we calculate their pairwise similarity with WMD and generate a large similarity matrix.
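
To illustrate the pairwise-distance step without external libraries, the sketch below uses the relaxed WMD lower bound, a cheaper approximation of WMD in which each word sends all of its weight to the nearest word of the other phrase. This is a swapped-in simplification rather than the exact WMD used in the study, and the two-dimensional vectors are made up for illustration; a real pipeline would load word2vec embeddings instead.

```python
import math
from collections import Counter

# Hypothetical 2-d embeddings standing in for word2vec vectors.
TOY_VECTORS = {
    "race": (1.0, 0.0),
    "races": (0.9, 0.1),
    "condition": (0.8, 0.3),
    "null": (0.0, 1.0),
    "pointer": (0.1, 0.9),
    "dereference": (0.2, 0.8),
}


def _one_sided_cost(source, target):
    """Move each source word's normalized weight to its nearest target word."""
    weights = Counter(source)
    total = sum(weights.values())
    return sum(
        (count / total)
        * min(math.dist(TOY_VECTORS[word], TOY_VECTORS[other]) for other in target)
        for word, count in weights.items()
    )


def relaxed_wmd(phrase_a, phrase_b):
    """Relaxed WMD: the larger of the two one-sided transport costs."""
    return max(_one_sided_cost(phrase_a, phrase_b),
               _one_sided_cost(phrase_b, phrase_a))


def distance_matrix(phrases):
    """Pairwise relaxed-WMD distances between keyword phrases."""
    return [[relaxed_wmd(p, q) for q in phrases] for p in phrases]


if __name__ == "__main__":
    phrases = [["race", "condition"], ["races"], ["null", "pointer", "dereference"]]
    for row in distance_matrix(phrases):
        print([round(value, 3) for value in row])
```

In this toy setting, the two race-related phrases are much closer to each other than either is to the null-pointer phrase, which is the property the clustering step relies on.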

The last step is to cluster the type keywords based on the similarity matrix. To reach an optimal clustering result, we tested four clustering algorithms: K-means [32], Gaussian Mixture [27], Agglomerative Clustering [26], and Affinity Propagation (AP) [43]. The first three algorithms require a pre-defined number of clusters as the key parameter, while AP needs a damping factor. For the first three algorithms, we tried a wide range of cluster numbers from 25 to 225 with an interval of 2. For AP, we tried the damping factor from 0.5 to 1 with an interval of 0.01. We kept other parameters unchanged as default. After clustering with the given parameters, we computed the Silhouette Coefficient score [71] to determine the performance of the corresponding combination. As a result, Agglomerative clustering with 125 clusters was the best setting for our similarity matrix, which reached a coefficient score of 0.66.
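
The parameter sweep can be illustrated with a small pure-Python sketch that clusters a precomputed distance matrix for several cluster counts and keeps the count with the best Silhouette Coefficient. It is a stand-in for a library implementation, not the code used in the study: average linkage is an assumption (the text does not state the linkage), the function names are hypothetical, and the one-dimensional toy data replaces the WMD-based matrix.

```python
def average(values):
    values = list(values)
    return sum(values) / len(values)


def agglomerative_labels(dist, k):
    """Average-linkage agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Merge the two clusters with the smallest mean pairwise distance.
        _, x, y = min(
            (average(dist[i][j] for i in clusters[x] for j in clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x].extend(clusters.pop(y))
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette_score(dist, labels):
    """Mean silhouette over all samples; requires at least two clusters."""
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in groups[label] if j != i]
        if not own:  # singleton clusters contribute a score of 0
            scores.append(0.0)
            continue
        a = average(dist[i][j] for j in own)
        b = min(average(dist[i][j] for j in members)
                for other, members in groups.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return average(scores)


def best_cluster_count(dist, candidates):
    """Return (score, k) for the candidate k with the highest silhouette."""
    return max((silhouette_score(dist, agglomerative_labels(dist, k)), k)
               for k in candidates)


if __name__ == "__main__":
    # Toy data with three obvious groups; the sweep described in the text
    # would correspond to candidates = range(25, 226, 2).
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.1, 9.2]
    dist = [[abs(p - q) for q in points] for p in points]
    print(best_cluster_count(dist, range(2, len(points))))
```

On the toy data the sweep selects three clusters, matching the three visible groups; the same loop structure applies to the other algorithms by swapping the labeling function.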

5.2 Common Blockchain Vulnerability Types

According to Table 5, we obtain not only traditional vulnerabilities, such as race condition and sanity check, but also blockchain-specific vulnerabilities. Among the top 20 vulnerability types, we find that seven are related to blockchains’ characteristics. In particular, the 130 (22.1%) vulnerabilities from four types (T4, T7, T9, and T12) are blockchain-specific, relating to blockchains’ transactions, blocks, peers/nodes, and wallet keys/passwords. Additionally, three more vulnerability types, T2, T14, and T20, have some portion of their vulnerabilities related to blockchains’ features. The remaining 366 (62.4%) vulnerabilities are solely traditional vulnerabilities, not specific to blockchains.

Next, we explain the three categories of these vulnerability types: blockchain-specific, partially blockchain-specific, and traditional. The patterns of blockchain-specific vulnerabilities are presented in §6.2.

Blockchain-specific vulnerability types. Since transactions, blocks, and gas fees are unique characteristics of blockchain systems, types T4 and T7 record a large number of such new vulnerabilities. Examples are Bitcoin PR #8312 “Fix mempool DoS vulnerability from malleated transactions” and Ethereum PR #1354 “gpo nonexistent block checks”. Moreover, as peer-to-peer software by nature, blockchains can suffer from peer/node vulnerabilities. By inspecting the 28 such vulnerabilities in type T9, we find that they are mainly related to the unique P2P features of blockchains, such as header sync and block validation. Examples include Bitcoin PR #10345 “timeout for headers sync” and Ethereum issue #604 “SEC-41 Peer TD in NewBlockMsg not verified”. Lastly, blockchain systems often provide wallets to end users, which gives rise to the new vulnerabilities related to wallet keys and passwords in type T12. For example, Bitcoin PR #10308 describes the vulnerability patch “[wallet] securely erase potentially sensitive keys/values”.

Partially blockchain-specific vulnerability types. We also observe three vulnerability types partially specific to blockchains, i.e., T2, T14, and T20. Specifically, the 64 vulnerabilities in type T2 concern various checks, e.g., error and length checks, and some of them check blockchain-related properties. Examples are Bitcoin issue #1167 “check for duplicate transactions earlier” for DoS prevention, and Ethereum PR #20546 “check propagated block malformation on reception”. In contrast, types T14 and T20 fix more traditional vulnerabilities related to RPC calls and database corruption (due to exceptional closing), with a few vulnerabilities directly related to blockchains. Examples of blockchain-related vulnerabilities are Ethereum PR #19401 “implement cli-configurable global gas cap for RPC calls” and Monero issue #706 “DB corruption” due to unfinished blockchain tasks.

Table 5: The top 20 blockchain vulnerability types that affect at least ten vulnerabilities in our dataset.

| ID | Type | # Vulnerability Issues/PRs (All) | # Vulnerability Issues/PRs (B) | Specific* |
|----|------|-----|-----|-----|
| T1 | Race Condition | 77 | 14 | |
| T2 | Check/Validation | 64 | 36 | ⋄ |
| T3 | Resource Leak | 47 | 24 | |
| T4 | Transaction Related | 43 | 24 | ✔ |
| T5 | Deadlock | 36 | 16 | |
| T6 | Go Panic | 36 | 0 | |
| T7 | Block Related | 34 | 9 | ✔ |
| T8 | Denial-of-Service | 31 | 17 | |
| T9 | Peer/Node Related | 28 | 12 | ✔ |
| T10 | Sanity Check | 28 | 11 | |
| T11 | Overflow | 27 | 11 | |
| T12 | Wallet Key/Password | 25 | 12 | ✔ |
| T13 | Uninitialized Read | 19 | 14 | |
| T14 | RPC Related | 16 | 9 | ⋄ |
| T15 | Out-of-Bound | 14 | 9 | |
| T16 | Off-by-One | 14 | 5 | |
| T17 | Segfault | 13 | 13 | |
| T18 | Memory Pool | 12 | 10 | |
| T19 | Nil Pointer Deref | 12 | 6 | |
| T20 | Database Corruption | 11 | 4 | |
| Sum | | 587 | 256 | |

  • * ✔ means most in this type are blockchain-specific and ⋄ means some are specific.
  • B, E, M, and S represent Bitcoin, Ethereum, Monero, and Stellar, respectively.

Traditional vulnerability types in blockchains. Besides blockchain-specific vulnerabilities, Table 5 also shows that 366 vulnerabilities are solely from the 13 traditional vulnerability types. The top types, such as race condition, deadlock, and denial-of-service, are more frequent probably because it is difficult for blockchain systems to avoid them due to the sync among distributed nodes.

Further analysis. According to the detailed distribution of vulnerability types across different blockchain projects in Table 5, we make three observations. First, Ethereum has more than half of the T1 (Race) vulnerabilities, far more than the other three. After investigating all the race-related vulnerability issues/PRs, we identify the Swarm [28] subsystem as the major cause. Specifically, Swarm is only available in Ethereum and is used for distributed storage and content distribution. Second, we notice that T6 (Go Panic) appears only in Ethereum because only Ethereum is implemented in Go. Moreover, since Go is a memory-safe language, Ethereum has fewer memory-related (T13, T18) vulnerability issues/PRs than Bitcoin. Third, we find that Monero has the largest number of T10 (Sanity Check) and T16 (Off-by-One) vulnerabilities, while Stellar has the fewest vulnerability types since it is relatively new.

6 CODE-LEVEL PATTERN ANALYSIS

At the third level of our study, we perform the pattern analysis by analyzing vulnerability patch code. In particular, we focus on blockchain-specific vulnerability types (i.e., the seven types mentioned in §5.2) since the code patterns of traditional vulnerability types like race condition, deadlock, overflow, and uninitialized read are well known (e.g., [36, 62, 79, 82]). In this section, we first propose our approach to summarizing patch code patterns in §6.1, and then present blockchain-specific code patterns in §6.2.

6.1 Generating and Clustering Code Change Signatures for Vulnerability Patterns

To obtain vulnerability code-level patterns, our objective is to put similar patch code changes into the same cluster so that analysts can summarize patterns from each cluster. To this end, we need an effective representation of code changes so that it keeps important semantic information yet ignores unimportant or noisy information. We call this representation the code change signature. Table 6 illustrates the evolution process from raw code hunks to their code fragments (i.e., contiguous lines of code) and the corresponding code change signatures using three examples. Taking the code in Table 6b and 6c as an example, both patches check whether the sender of a transaction is valid. However, if the variable name senderAddr is different, the similarity between their raw code fragment change (i.e., the syntactic changes indicated by F2 and F3) would be low. To capture the essential changes in patch code, we do not use the syntactic changes but their code change signatures like S2 and S3, the details of which will be illustrated during their generation.

Next, we introduce our approach to generating code change signatures and clustering them. Before these two major steps, we first clean up code hunks and turn them into fragments, and then align the changed lines of code in each fragment.

Aligning changed lines of code in each fragment. Before we generate each code fragment’s change signature based on its deleted and added lines, we first need to pair up the changed lines of code, since only some code fragments have a one-to-one line change (i.e., at most one ‘-’ line and one ‘+’ line). For example, in Table 6, only the fragments F1-2 and F1-3 have a one-to-one line change. For a multiple-line change in other fragments, we measure the edit distance similarity between each ‘-’ line and all ‘+’ lines and pair it with the ‘+’ line of highest similarity. For instance, line 3 in Table 6c is paired with line 8 since it has the highest similarity with line 8 compared with all the other lines. However, some lines could be simply deleted or added, causing their similarity with all other lines to be low. We handle this by not pairing a line whose highest similarity is less than 0.5. As a result, line 3 in Table 6a will not be paired with line 5 due to the low similarity.
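
The pairing rule can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the similarity is assumed to be the Levenshtein distance normalized by the longer line's length (the text does not give the exact formula), the function names are hypothetical, and the example lines are made up rather than taken from Table 6.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit-distance similarity in [0, 1]; the normalization is an assumption."""
    a, b = a.strip(), b.strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def pair_changed_lines(deleted, added, threshold=0.5):
    """Pair each '-' line with its most similar '+' line, or None if below threshold."""
    pairs = []
    for old in deleted:
        best = max(added, key=lambda new: similarity(old, new), default=None)
        if best is not None and similarity(old, best) >= threshold:
            pairs.append((old, best))
        else:
            pairs.append((old, None))   # treated as a pure deletion
    return pairs


if __name__ == "__main__":
    # Hypothetical changed lines for illustration only.
    deleted = ["if (tx.sender == null) return false;",
               'log.debug("checking");']
    added = ["if (tx.sender == null || !isValid(tx.sender)) return false;",
             "metrics.increment();"]
    for old, new in pair_changed_lines(deleted, added):
        print(repr(old), "->", repr(new))
```

In this example the modified condition is paired with its extended version, while the removed logging line stays unpaired because no added line reaches the 0.5 threshold.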

