Document Type : Research Paper
Authors
1 Master of Qur’anic sciences, Interdisciplinary Qur’anic Studies Research Institute, Shahid Beheshti University, Tehran, Iran
2 Assistant professor, Interdisciplinary Qur’anic Studies Research Institute, Shahid Beheshti University, Tehran, Iran
3 Associate Professor, Computer Science and Engineering Department, Shahid Beheshti University, Tehran, Iran
The Qur’an contains numerous references to natural phenomena and to certain of their properties (e.g. Q. 45:5; 88:17; 10:5; 2:164; 13:3–4). As a result, many scholars have been inclined to employ findings from modern science to better understand the Qur’an and to demonstrate that this revelation originates from the One who knows the secrets of the natural world (Maʿrifat 1997, 2: 443). Although the primary purpose of divine references to nature is to direct humankind toward sound epistemological and theological foundations, in some cases the Qur’an also provides remarkably precise descriptions of natural phenomena which cannot be overlooked.
Scientific interpretation is a hermeneutical approach aimed at uncovering relationships between Qur’anic verses and empirical findings, using the natural sciences to provide a clearer understanding of the scripture (Rezaei Esfahani 2007). In contrast to most exegetical methods, this approach has faced serious objections. The central critique of opponents concerns the language of the Qur’an, which differs fundamentally from scientific discourse. Scientific language typically excludes figurative expressions, metaphor, and hyperbole, because its primary purpose is to convey concepts with precision and clarity; such linguistic devices would compromise that aim. The Qur’an, however, frequently employs these rhetorical strategies. Consequently, opponents of scientific exegesis argue that the Qur’an does not utilize scientific language when addressing natural phenomena, and thus cannot be considered to carry either positive or negative implications for the natural or life sciences (Damanpak Moghadam 2001).
It is correct that Qur’anic discourse differs from the idiom of science. No scholar has claimed that the Qur’an is couched in scientific language. Nevertheless, it is undeniable that the Qur’an repeatedly mentions natural phenomena and highlights many of their features. This leaves open the question of whether the characteristics mentioned align with empirical reality. Proponents of scientific exegesis argue that divine speech about natural phenomena does indeed correspond to reality, though expressed in the Qur’an’s own linguistic register rather than in the specialized idiom of contemporary science (Mazaheri Tehrani et al. 2017).
The guiding question of this article, therefore, is whether it is possible to demonstrate through a methodological analysis that the Qur’an employs a distinctive style and register when describing natural phenomena, different from the one it uses in other thematic domains. If one can extract and present the Qur’an’s stylistic features in this regard, the central challenge in the Qur’an and science debate, namely the gap in language and style, can first be explained in detail and then potentially resolved (Darzi 2022).
The primary aim of this study is to uncover the Qur’an’s unique language in presenting falsifiable claims concerning natural phenomena. This requires comparing various lexical, structural, and semantic features across distinct categories of verses. Accordingly, three categories were defined: scientific verses, which contain falsifiable propositions about natural phenomena; natural verses; and supernatural verses.
These categories form the comparative framework of this study.
The study of Qur’anic language and style has long been one of the most debated issues in Qur’anic scholarship and has generated a wide range of perspectives. A number of foundational works have explored this subject in depth. Among them are Izutsu’s (1964) God and Man in the Qur’an, Saeedi Roshan’s (2004) Analysis of the Qur’an’s Language and Methodology of Understanding It, Farasatkhah’s (1997) Language of the Qur’an, and Sajjadi’s (2001) Language of Religion and the Qur’an. These works reflect different approaches to understanding Qur’anic discourse, ranging from the view that the Qur’an employs the innate, natural language of humankind, to interpretations emphasizing either the general or specialized linguistic conventions in which the Qur’an operates.
Alongside these monographs, various articles have sought to clarify the stylistic dimensions of Qur’anic language. For example, Shakerin (2012) argued that the Qur’an possesses a unique fourfold comprehensiveness in its linguistic structure, while Naqib and Seyyedi (2020) examined the Qur’an’s language in ethical propositions and identified distinctive stylistic features in its rhetorical mode. These studies demonstrate that, although there is no single consensus on the Qur’an’s linguistic identity, scholars generally acknowledge its complexity and the need for specialized approaches to its interpretation.
A second line of inquiry in the literature has concerned the relationship between the Qur’an and science. Important contributions in this field include Al-Jawāhir fī Tafsīr al-Qurʾān al-Karīm by al-Ṭanṭāwī al-Jawharī (2004), a monumental 26-volume work integrating scientific discussions into Qur’anic exegesis; Maurice Bucaille’s (1976) The Bible, the Qur’an and Science; Rezaei Esfahani’s Interaction between the Qur’an and the Science (2005) and The Qur’an and the Natural and Human Sciences (2010); and Golshani’s (1998) The Qur’an and the Natural Sciences. Each of these works, in its own way, explores how Qur’anic descriptions of the natural world may be understood in light of scientific discovery and empirical investigation.
In addition to these books, a growing body of articles has directly engaged with the problem of scientific exegesis. Faghfoor Maghrebi (2007; 2008) analyzed the relationship between Qur’anic and scientific propositions and discussed the aims of scientific statements in the Qur’an. Dallal (2010) considered the broader historical relationship between science and Islamic thought, while Rezaei Esfahani (2007) proposed strategies to resolve apparent contradictions between scriptural and scientific knowledge.
More recently, several case-based studies have provided detailed examples of scientific interpretation. Barati and Paymard (2022) analyzed Qur’anic references to hail in Q. 24:43, offering a meteorological explanation; Moradi (2022) investigated the ṣayḥah as a divinely inflicted punishment in the Qur’an and sought to frame it within a scientific account; Koutb (2022) addressed the Qur’an’s reference to the breaking of water in photosynthesis and transpiration as a possible scientific miracle; Ayat (2016) reflected on the physiological features of the camel from both Qur’anic and scientific perspectives; and Mazaheri Tehrani et al. (2017) proposed a typology of scientific verses, thereby classifying Qur’anic references to natural phenomena according to their interpretive potential.
From this survey, it becomes evident that the majority of prior research has been preoccupied with identifying either extreme or moderate positions toward scientific exegesis, tracing its historical development, evaluating arguments for and against it, and offering illustrative examples. While this scholarship has made valuable contributions, it has not yet produced a systematic attempt to extract the linguistic features of Qur’anic verses that describe natural phenomena. This gap highlights the importance of the present study, which not only seeks to articulate such features but also introduces a novel methodological approach by integrating data-mining techniques with classical stylistic and exegetical analysis. By employing the decision-tree algorithm to detect recurring linguistic patterns, this study provides a replicable framework capable of uncovering latent structures in Qur’anic discourse that have not been addressed in earlier works.
The present study was designed to identify the linguistic features of Qur’anic verses addressing natural phenomena that contain falsifiable propositions. To this end, representative samples of verses were drawn from three categories—scientific, natural, and supernatural—and then analyzed through computational methods. Specifically, a decision-tree algorithm was employed to determine the features most predictive of a verse belonging to the scientific category. This section describes the methodology in detail.
Decision trees constitute a widely used method for classification and prediction (Ahmad & Dey 2007). Decision analysis is employed to identify the strategy that is most likely to achieve a given goal, while decision trees can also be used to represent conditional probabilities. They require a training dataset in which all instances are characterized by a fixed set of features and a target class label. Decision trees are valued for their ease of implementation and their ability to generate transparent “if–then” classification rules (Tseng et al. 2015; Rokach 2023). Because classification methods rely on pre-specified class labels, the categories must be determined in advance. In this study, the three predetermined categories were scientific verses, natural verses, and supernatural verses (Ghazanfari et al. 2008).
All decision-tree algorithms operate in a top-down manner. The algorithm begins with a dataset whose category labels are already known and seeks to construct a tree capable of predicting the class label of new instances. At each stage, the algorithm poses a series of questions about the dataset; depending on the answers, new branches are created. If the questions are well chosen, the classification of new instances can be achieved using only a small number of questions.
Visually, the tree consists of nodes and branches, where internal nodes represent features, branches represent conditions on those features, and terminal nodes (leaves) represent predicted class labels. The central task in building a decision tree is to identify which feature to use at each node. The optimal split occurs when the resulting subsets are as homogeneous as possible with respect to class membership. This is typically measured by “diversity” (or impurity): high diversity indicates that a subset contains many different classes, whereas low diversity indicates that a subset is dominated by a single class. The algorithm evaluates all candidate features using a diversity metric (such as entropy or Gini index) and selects the feature that minimizes diversity. This process continues recursively until further splitting does not reduce diversity, at which point a terminal leaf is created (Ghazanfari et al. 2008).
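The split-selection step described above can be illustrated with a minimal sketch. It is not the implementation used by the Orange software; the function names, the toy rows, and the feature names `oath` and `reflective` are invented for illustration only. The sketch computes the Gini index and entropy of a set of labels and picks the feature whose split leaves the lowest weighted diversity.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_impurity(rows, feature, impurity=gini):
    """Size-weighted average impurity of the subsets created by splitting on `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row["label"])
    n = len(rows)
    return sum(len(group) / n * impurity(group) for group in groups.values())


def best_feature(rows, features, impurity=gini):
    """Return the candidate feature whose split minimizes diversity."""
    return min(features, key=lambda f: weighted_impurity(rows, f, impurity))


# Toy rows, invented for illustration; they are NOT taken from the study's dataset.
rows = [
    {"oath": 0, "reflective": 1, "label": "scientific"},
    {"oath": 1, "reflective": 1, "label": "scientific"},
    {"oath": 0, "reflective": 0, "label": "natural"},
    {"oath": 1, "reflective": 0, "label": "supernatural"},
]

print(best_feature(rows, ["oath", "reflective"]))
```

In this toy case, splitting on `reflective` isolates the two "scientific" rows in a pure subset, so its weighted Gini index (0.25) is lower than that of `oath` (0.5) and it would be chosen for the node.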
Cross-validation is a method of model evaluation that tests how well the results of a statistical analysis generalize to unseen data. It is particularly useful in predictive modeling, as it provides an estimate of a model’s real-world performance. In this method, the dataset is divided into complementary subsets, with one subset used for training and the other for testing. To reduce variance, the process is repeated multiple times with different partitions, and the results are averaged.
In k-fold cross-validation, the dataset is partitioned into k subsets. Each subset is used once as a validation set, while the remaining k – 1 subsets are used for training. This process is repeated k times, ensuring that each observation is used both for training and for validation. The average performance across the k trials serves as the final estimate of the model’s accuracy. Typically, 5-fold or 10-fold cross-validation is used. In this study, 4-fold cross-validation was employed: the dataset was randomly partitioned into four subsets, with approximately 75% of the data used for training and 25% for testing in each fold. This ensured that all 150 verses in the dataset were used in both training and testing, thereby making the accuracy estimate more robust.
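The partitioning can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure as executed in Orange; the function name `k_fold_indices` and the fixed random seed are assumptions made for the example.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # randomize order before partitioning
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 150 items and 4 folds, matching the corpus size and fold count reported in the text.
splits = list(k_fold_indices(150, 4))
for train, test in splits:
    print(len(train), len(test))
```

Because 150 is not divisible by 4, the test folds in this sketch contain 37 or 38 items, so the 75/25 proportion holds only approximately in each fold. Every index appears in exactly one test fold.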
Knowledge represented in a decision tree can be expressed as classification rules in “if–then” form. Each path from the root to a leaf corresponds to one rule, with the conditions encountered along the path forming the antecedent (“if” clause) and the class label at the leaf forming the consequent (“then” clause). Features closer to the root generally have greater predictive importance. This property can be leveraged to identify which features are most characteristic of the scientific verses.
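The path-to-rule correspondence can be made concrete with a short sketch. The nested-dictionary tree format, the function `extract_rules`, and the placeholder names `feature_a` and `feature_b` are all hypothetical; the toy tree is not the tree obtained in this study.

```python
def extract_rules(node, conditions=()):
    """Return one "if ... then ..." rule per root-to-leaf path."""
    if isinstance(node, str):  # a leaf holds the predicted class label
        antecedent = " and ".join(f"{feat} = {val}" for feat, val in conditions)
        return [f"if {antecedent or 'always'} then {node}"]
    rules = []
    for value, child in node["branches"].items():
        # Extend the antecedent with the condition on this branch.
        rules.extend(extract_rules(child, conditions + ((node["feature"], value),)))
    return rules


# Hypothetical toy tree with placeholder feature names (not the study's result).
toy_tree = {
    "feature": "feature_a",
    "branches": {
        "yes": {
            "feature": "feature_b",
            "branches": {"yes": "scientific", "no": "natural"},
        },
        "no": "supernatural",
    },
}

for rule in extract_rules(toy_tree):
    print(rule)
```

The feature at the root appears in the antecedent of every rule, which is one informal way to see why features nearer the root tend to carry more predictive weight.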
To build the corpus, seventy verses widely recognized as “scientific,” thirty “natural” verses, and fifty “supernatural” verses were first selected. Subsequently, a wide range of linguistic, structural, and semantic features was identified from the literature on scientific verses. Additional general features were also included, yielding a total of 23 features. Each verse in the corpus was annotated with values for these features, and the dataset was analyzed using the Orange data-mining software.[1]
It is important to note that while many of the selected features were motivated by prior discussions of scientific verses, their inclusion was not limited to such cases. Instead, the expectation was that the computational method would determine which features were statistically most significant in distinguishing the scientific category. Together, these 23 features provided the basis for computational analysis. Each verse was coded according to these criteria, and the decision-tree algorithm was then applied to identify which features most strongly predicted membership in the scientific category. Below, the 23 features are defined along with the sources from which they were derived.
Polysemy occurs when a single word or phrase conveys multiple simultaneous meanings, all of which may be intended (Tayyeb Hosseini 2008). This feature was identified in Qur’anic studies through works such as Qāmūs al-Qurʾān (Qurashi 1992) and Mufradāt alfāẓ al-Qurʾān (al-Rāghib al-Iṣfahānī 1992).
At times, the antecedent of a pronoun may be open to several possibilities, leading to interpretive indeterminacy (al-Suyūṭī 2001). This feature was noted in major exegeses including al-Mīzān (Tabataba'i 1996), Tafsīr Nemūneh (Makarem Shirazi et al. 1992), Mafātīḥ al-ghayb (al-Rāzī 1999), and al-Kashshāf (al-Zamakhsharī 1987).
Verses containing falsifiable descriptions often attract diverse exegetical readings across time. This feature is especially highlighted in Al-Mīzān (Tabataba'i 1996), Tafsīr Nemūneh (Makarem Shirazi et al. 1992), Al-Kashshāf (al-Zamakhsharī 1987), and Al-Tafsīr al-Kabīr (al-Rāzī 1999), as well as in Treatise on Scientific Miracles in the Qur’an (Talebpour et al. 2024).
Certain verses allow for multiple grammatical parses, a phenomenon frequently addressed in classical grammar and interpretations (al-Suyūṭī 2001). The feature was identified through exegetical works such as Al-Mīzān (Tabataba'i 1996), Tafsīr Nemūneh (Makarem Shirazi et al. 1992), Al-Kashshāf (al-Zamakhsharī 1987), and Mafātīḥ al-ghayb (al-Rāzī 1999).
The Qur’an frequently employs rhetorical figures such as simile, metaphor, allegory, irony, and ambiguity (Shaker 2003; Tayyeb Hosseini 2008). These were tracked across exegeses—Al-Mīzān, Tafsīr Nemūneh, Al-Kashshāf, and Al-Tafsīr al-Kabīr—as well as the Qur’anic Comprehensive Website (quran.inoor.ir).
Expressions that explicitly call the reader to reflection, such as a-lam tara and a-wa-lam yara (Did you not see?), mark certain verses as cognitively engaging. This feature was directly extracted from the Qur’an.
Verses may convey either a tone of warning (indhār) or encouragement (tabshīr). Classification of tone was made by analyzing the Qur’an contextually.
The Qur’an often introduces natural phenomena by swearing oaths upon them. The presence of oaths was identified directly in the Qur’an.
Occurrences of the terms āyah or āyāt were directly counted in the Muṣḥaf.
This feature measures the presence of emphatic forms such as inna, la- of emphasis, repetition, and intensified structures. The coding followed Rabbani’s (2003) classification of Qur’anic emphatic devices.
Verses were categorized according to whether they were revealed in Mecca or Medina, based on Maʿrifat’s (2008) Al-tamhīd fī ʿulūm al-Qurʾān.
The presence of passive verbs, causative voice, or figurative assignment of agency was analyzed with reference to Aisha Abd al-Rahman's (1997) Al-Iʿjāz al-Bayānī lil-Qurʾān.
Expressions like li-qawmin yaʿqilūn (for people who think) or li-qawmin yaʿlamūn (for people who know) indicate direct appeals to rational reflection. These were identified directly in the Muṣḥaf.
Each verse’s position (early, middle, or late in a surah) was coded by dividing each surah into equal parts and recording the verse’s location in the Muṣḥaf.
The number of prepositions (ḥurūf al-jarr) in each verse was measured using the Qur’anic Comprehensive website.
Morphological augmentation in triliteral roots was analyzed to detect semantic intensification, following Maʿrifat’s (2008) discussions in Al-Tamhīd fī ʿulūm al-Qurʾān.
Instances where verbal nouns replace participles (for emphasis or intensification) were identified, with guidance from Ibn ʿĀshūr (2000).
Occurrences of particles like inna and its sisters were recorded for each verse.
The total count of finite verbs per verse was extracted using the Muṣḥaf and digital parsing tools.
Lexical derivations (from triliteral or quadriliteral roots) were counted for each verse.
Word counts were calculated from the Muṣḥaf.
The range of different Arabic letters used within each verse was measured using the Qur’anic Comprehensive Website.
The total character count of each verse was computed with the Noor Comprehensive Tafsīr Software.
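Several of the features above are simple counts that can be computed mechanically. The following sketch shows one possible way to code verse position, word count, character count, and variety of letters. It is not the procedure of the tools named above: the function names are invented, the split into exactly three equal parts and its boundary handling are assumptions, and so is the decision to count only base letters (Unicode category "Lo") while ignoring diacritics and spaces.

```python
import unicodedata


def verse_position(verse_number, surah_length):
    """Code a verse as early, middle, or late by splitting the surah into three equal parts."""
    if not 1 <= verse_number <= surah_length:
        raise ValueError("verse_number must lie within the surah")
    labels = ("early", "middle", "late")
    return labels[(verse_number - 1) * len(labels) // surah_length]


def surface_counts(verse_text):
    """Count words, base letters, and distinct base letters in a verse."""
    words = verse_text.split()
    # Keep base letters only; vowel marks are combining characters (category "Mn").
    letters = [ch for ch in verse_text if unicodedata.category(ch) == "Lo"]
    return {
        "words": len(words),
        "characters": len(letters),
        "distinct_letters": len(set(letters)),
    }


# Q. 2:164 lies in a surah of 286 verses.
print(verse_position(164, 286))

# Unvocalized sample text, used only to exercise the function.
print(surface_counts("بسم الله الرحمن الرحيم"))
```

Under these assumptions, different spellings of hamza and alif count as different letters; a study that treats them as one letter would need an additional normalization step.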
After annotating the 150 verses with values for the 23 features described above, the dataset was subjected to computational analysis. The central objective was to identify which features most reliably distinguished the scientific verses (i.e., those containing falsifiable propositions about natural phenomena) from natural and supernatural verses.
To extract the stylistic features of scientific verses, a decision tree was constructed with a maximum depth of five levels. This ensured that the resulting model would be sufficiently detailed to capture complex interactions among features, while remaining interpretable.
The input consisted of seventy scientific verses, thirty natural verses, and fifty supernatural verses. Prior to training, the dataset was randomized to avoid order effects, ensuring that the model did not inadvertently learn from the sequence of the verses. As described earlier, four-fold cross-validation was used, with 75 percent of the verses allocated to the training set and 25 percent to the test set in each fold.
The procedure yielded four decision trees with varying accuracy levels. The model achieving the highest accuracy (75 percent) was selected as the final tree for interpretation. Figure 1 represents the simplified structure of the decision tree. Internal nodes indicate features, and branches represent conditions on those features. Terminal leaves correspond to the predicted category (scientific, natural, supernatural).
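As a point of reference for reading an accuracy figure, the sketch below shows how accuracy is computed and what a trivial classifier that always predicts the largest class would score on a corpus with the class sizes reported above. The function name and the toy label lists are invented for illustration; the per-fold predictions of the study are not reproduced here.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted label matches the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Class sizes as reported in the text.
class_sizes = {"scientific": 70, "natural": 30, "supernatural": 50}

# A classifier that always predicts the largest class.
majority_class = max(class_sizes, key=class_sizes.get)
majority_baseline = class_sizes[majority_class] / sum(class_sizes.values())
print(majority_class, round(majority_baseline, 3))  # scientific 0.467

# Toy labels, invented only to exercise the accuracy function.
toy_actual = ["scientific", "scientific", "natural", "supernatural", "scientific"]
toy_predicted = ["scientific", "natural", "natural", "supernatural", "scientific"]
print(accuracy(toy_predicted, toy_actual))  # 0.8
```

Since the whole corpus is used as the reference here, the baseline within any single test fold may differ slightly from this value.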
Figure 1. Decision Tree with 75% Accuracy
From the decision tree, a series of “if–then” rules was derived. These rules were classified according to their strength and clarity into five tiers: Golden, Silver, Bronze, Brass, and Copper.
These rules show that no single feature is sufficient to classify a verse as scientific; rather, combinations of features are predictive.
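The decision tree identified the following features as the most significant predictors of scientific verses, in descending order of importance: syntactic multiplicity, verse length, the presence of thought-provoking expressions, augmented verb forms, variety of letters, number of derivational words, and polysemy.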
Taken together, these features strongly suggest that verses describing natural phenomena in falsifiable terms are stylistically distinct.
An important outcome of this analysis is the recognition that such features impose interpretive demands on exegetes. For example, syntactic multiplicity requires advanced grammatical analysis; polysemy necessitates consideration of multiple semantic possibilities; and the presence of thought-provoking expressions directly calls upon the audience to reflect and engage in intellectual effort. The Qur’an’s stylistic strategy, therefore, not only conveys information about natural phenomena but also deliberately constructs a discourse that invites continuous inquiry.
This study set out to identify the distinctive linguistic features of Qur’anic verses that describe natural phenomena in falsifiable terms, verses that some exegetes have labeled as “scientific.” By analyzing a corpus of 150 verses (70 scientific, 30 natural, and 50 supernatural) across 23 linguistic, structural, and semantic features, and by applying a decision-tree algorithm with four-fold cross-validation, the study uncovered systematic stylistic patterns.
The analysis demonstrated that the most significant features characterizing scientific verses, in descending order of predictive power, are: syntactic multiplicity (the presence of multiple possible grammatical analyses); verse length (with longer verses being more likely to belong to the scientific category); the presence of thought-provoking expressions; augmented verb forms (al-abwāb al-mazīd), which add semantic intensity; variety of letters, reflecting greater lexical diversity; number of derivational words, indicating morphological richness; and polysemy, where a single expression carries multiple simultaneous meanings. These features, when combined, distinguish scientific verses from natural and supernatural verses. Importantly, the study showed that no single feature is decisive on its own; rather, it is the combination of features that signals the distinctiveness of the scientific category. The derived classification rules (Golden, Silver, Bronze, Brass, and Copper) illustrated how different pathways of features could converge on the identification of scientific verses.
The interpretive implications of these findings are significant. The presence of multiple syntactic and semantic possibilities, along with thought-provoking language, demonstrates that scientific verses demand sustained intellectual engagement and exegetical effort. In other words, the Qur’an’s discourse on natural phenomena is deliberately constructed to stimulate reflection, inquiry, and reinterpretation as human knowledge advances.
This study also highlights the value of integrating computational methods with traditional Qur’anic scholarship. By employing decision-tree analysis, it became possible to uncover latent structures in the Qur’an’s stylistic patterns that might otherwise remain unnoticed. The findings thus contribute not only to the field of Qur’anic stylistics but also to the ongoing debate over the relationship between the Qur’an and science.
Future research could expand this approach by enlarging the corpus to include additional categories of verses, applying alternative machine-learning algorithms, or conducting comparative studies with other religious texts. Such efforts would further illuminate how sacred discourse encodes complex layers of meaning, and how computational linguistics can enrich the hermeneutics of scripture.
In conclusion, the study confirms that Qur’anic verses addressing natural phenomena with falsifiable descriptions possess a distinctive linguistic style that sets them apart from other categories. These features, far from being incidental, represent a deliberate rhetorical strategy aimed at inviting human beings into deeper intellectual and spiritual engagement with the signs of God in creation.
[1] The dataset prepared for this research is available on the website of the Interdisciplinary Qur’anic Studies Research Institute of Shahid Beheshti University: quran.sbu.ac.ir/peykare