Do All Eagles Fly High? The Generic Overgeneralization Effect: The Impact of Fillers in Truth Value Judgment Tasks

Edyta Wajda, Daniel Karczewski

Abstract
The generic overgeneralization effect is an attested tendency to accept false universal generalizations such as "all eagles fly" or "all snakes lay eggs" as true. In this paper, we discuss the generic overgeneralization effect as demonstrated by adult native speakers of Polish. We asked 313 native speakers of Polish to judge universal quantified generalizations such as "all eagles fly" or "all snakes lay eggs" as true or false. A control group of 107 respondents provided acceptance rates for the corresponding generic generalizations such as "eagles fly" or "snakes lay eggs". By determining the impact of test fillers on participants' acceptance rates, the study aimed to identify the scope of the generic overgeneralization effect. We manipulated four conditions: universal negative, universal positive, universal neutral and generic control. The results showed a significant difference between the negative and positive conditions, but neither of these differed significantly from the neutral condition. The acceptance rates of universal statements were 63% in the negative condition, 49% in the positive condition and 55% in the neutral condition, compared with 90% for generic statements in the control group. Overall, participants accepted universal quantified statements at high rates even when they were prompted to reject them. The results may be interpreted as further evidence in support of the generic overgeneralization effect.


Introduction
Generic generalizations (henceforth generics) such as eagles fly high or snakes lay eggs convey information about categories, kinds or classes. In contrast, generalizations such as this eagle can't fly or the snake laid some eggs are not generic, as they apply only to particular individuals. Even though some members of a category lack the property in question (there are albino tigers; male ducks do not lay eggs), generics are generally accepted as true. From a logical standpoint, universal quantified generalizations such as all eagles fly high or all snakes lay eggs should be judged false, because the quantifier all does not allow exceptions. Empirical data, however, show that both children and adults at times treat quantified generalizations with all as if they were generic statements (e.g., Hollander, Gelman, & Star, 2002; Khemlani, Leslie, Glucksberg, & Fernandez, 2007). Leslie, Khemlani, & Glucksberg (2011) term this phenomenon "the generic overgeneralization (GOG) effect" and attribute it to a cognitive tendency to overgeneralize from the truth of a generic to the truth of the corresponding universal statement. The hypothesized GOG effect is therefore claimed to provide evidence for the generics-as-default hypothesis (Leslie et al., 2011), which maintains that generic generalizations are produced by a basic cognitive mechanism that develops early in language acquisition.
The aim of the paper is to identify the scope of the GOG effect by investigating the impact of test fillers on the acceptance rates of universal quantified generalizations. The article begins by discussing the concept of genericity and some of the features of generics. The following section focuses on the conceptually based approach to generics, with special attention given to the GOG effect. The analytical part discusses the extent to which the internal test conditions affect participants' responses and suggests that the results may be interpreted as evidence for the GOG effect.

The concept of genericity
Genericity is an important element in the study of human cognition because it reflects our propensity to organize our experience of the world into categories, kinds or classes (Lazaridou-Chatzigoga, 2019; Mari, Beyssade, & del Prete, 2013). Even though there are no dedicated words or morphemes encoding genericity, it can manifest itself linguistically in a number of ways, including bare plural noun phrases (eagles), indefinite singular noun phrases (an eagle), definite singular noun phrases (the eagle) and whole sentences (eagles fly high). Genericity has attracted scholars from different disciplines, including philosophers, linguists and psychologists, and has been researched under various assumptions. Theoretical models of genericity fall into two broad categories: quantificational and non-quantificational. Quantificational approaches, unlike non-quantificational ones, frequently assume the existence of GEN, a covert operator signalling generic meaning (Nickel, 2008; Pelletier & Asher, 1997). The quantificational approaches also assume that generics are insensitive to contextual restriction (Krifka et al., 1995).
Generics are the linguistic manifestations of genericity that are of interest to this research. These are statements such as eagles fly high or snakes lay eggs that express general or essentialist (Gelman, 2003) claims about kinds of entities or phenomena. They abound in speech and are acquired early by children (Gelman, 2004; Gelman, Goetz, Sarnecka, & Flukes, 2008). They exhibit a number of properties: (1) temporal unboundedness, (2) a law-like, nomic character, (3) association with dispositions, (4) resistance to contextual restriction and (5) exception tolerance (for a discussion, see Lazaridou-Chatzigoga, 2019). The last feature is probably the most striking one: generics are perceived as true even though there are albino tigers, only mature female ducks lay eggs and only 1% of mosquitoes carry the West Nile virus. Moreover, generics can serve a variety of purposes, for example, to express knowledge about the world (water boils at 100 °C), to transmit norms or stereotypes (boys are good at math) or to manipulate the perception of reality (membrane-based filters remove bacteria). The following section offers a quick survey of the most direct cues to generic meaning in Polish; other cues include, for example, pragmatic context, world knowledge and prosody.

Generics in Polish
Polish has a palette of resources for making generics. Consider the cases below: Unlike English, Polish has no articles (which are probably the most direct formal cues to genericity in English) and thus relies on other linguistic devices to signal a generic interpretation (Gasz, 2013; Grzegorczykowa, 2001; Karczewski, 2016; Karczewski & Wajda, 2015; Kiklewicz, 2004; Smólska & Rusiecki, 1980). Moreover, generics have no single defining syntactic characteristic. As the above examples show, the subject noun phrase of a generic can be a bare plural (koty 'cats'), a bare singular (koń 'horse') or a mass noun (woda 'water'). Number alone, however, is not sufficient for a kind-referring interpretation: sentences with identical noun phrases (koty grzeją się w słońcu 'the cats are getting warm in the sun', koń uciekł ze stajni 'the horse escaped from the stable' or woda nam się kończy 'we're running out of water') can receive non-generic interpretations in Polish. Equally, no particular syntactic structure is directly associated with generic meaning; the kind term can occur in subject or object position (statystyczną analizę mowy umożliwił dopiero magnetofon 'the statistical analysis of speech was only made possible by the tape-recorder'). Most generic sentences have verbs in the present tense (są 'are' or jest 'is'), but sentences in other tenses and aspects are also possible and can receive a generic interpretation (Indianie oswoili psa 'Indians domesticated the dog'). None of the above characteristics, taken alone or together, guarantees a generic reading; whether an utterance receives a generic or a non-generic interpretation can only be assessed by weighing several of them.

Kinds of connections and types of generics
As noted above, many theoretical models have been developed to explain the complex nature of generics. In our study, we rely on the conceptually based approach (Prasada, Khemlani, Leslie, & Glucksberg, 2013), a non-quantificational framework for the study of generics that explores the nature of the relationships between kinds and their properties. Within this approach, three kinds of connections between kinds and properties have been identified: principled, statistical and causal. Principled connections concern properties (k-properties) that are essential to the kind, in the sense that they make the kind what it is (e.g., having four legs for a dog). Principled connections involve explanatory, normative and statistical dimensions. Statistical connections, on the other hand, concern properties (t-properties) that are prevalent among members of the kind but are not essential to it (e.g., having a radio for a car); they license neither formal explanations nor normative expectations. Finally, causal connections concern properties that are potentially dangerous (e.g., mauling children for a pit bull); they, too, license neither formal explanations nor normative expectations. Prasada et al. (2013) argue that these connections underlie different types of generics. For instance, principled connections are said to underlie majority characteristic generics (eagles fly high) and minority characteristic generics (snakes lay eggs). In the former type, the property is shared by the majority of the members of the kind; in the latter, a gender-related feature (laying eggs) is shared by only a minority. Other types of generics include quasi-definitional generics (ants are insects) and striking generics (pit bulls maul children) (see Leslie, 2007, 2008, for further types).

The generic overgeneralization effect
The distinction between generic and quantified generalizations is fundamental to the hypothesis at the core of this research, namely, the generics-as-default hypothesis. The hypothesis assumes that generics give voice to cognitively more fundamental (more primitive) generalizations than quantified statements do. Leslie (2007, 2008, 2012) draws a parallel between the hypothesis and the two-system view of cognition advocated by Kahneman (2003) and argues that generics belong to the automatic, effortless and cognitively basic System 1, whereas quantifiers belong to the rule-governed and extension-sensitive System 2. As argued by Leslie (2007), generics are more fundamental to cognition, and they are also easier for children to master (Gelman, 2003). The generics-as-default hypothesis has received some empirical support from research on children and adults, who sometimes treat universally quantified statements with all as generics (Hollander et al., 2002; Khemlani et al., 2007). As mentioned in the introduction, the phenomenon has been named the GOG effect and is defined as "overgeneralizing from the truth of a generic to the truth of the corresponding universal statement" (Leslie et al., 2011, p. 17). Thus, if people believe that the statement lions have manes is true, they will tend to accept a quantified statement such as all lions have manes, because, on this account, resorting to a default operation saves cognitive effort. The scope of the GOG effect is claimed to be limited (Khemlani et al., 2007), but it may affect statements with principled connections, such as majority characteristic universal statements (all eagles fly high) and minority characteristic universal statements (all lions have manes). The hypothesized GOG effect is claimed to provide evidence for the generics-as-default hypothesis (Leslie et al., 2011).

Experimental research on the GOG effect in adults
Research on the overgeneralization bias in adults has so far involved different types of quantifiers (all, some and all the) and predication types (characteristic, statistical and striking); however, the results indicate that the effect is largely limited to universal majority and minority characteristic generalizations with all. The experiments employed truth value judgment tasks in a multifactorial design, and the participants were native speakers of English and Greek.
In the first experiment on this tendency to overgeneralize, Khemlani et al. (2007) reported acceptance rates of 47% for false minority universals with all, compared with 89% for the equivalent generics. In a subsequent study by Leslie et al. (2011), participants judged majority universals as true 78% of the time and minority universals 51% of the time; the acceptance rates for the corresponding generics were 96% and 85%, respectively. The studies showed a robust tendency to accept false universal characteristic predications with all if their generic counterparts were true, and the authors concluded that the results provided evidence of the GOG effect. They acknowledged, however, that other factors may have prompted participants to agree with false universal generalizations, namely:
- quantification over sub-kinds of a given kind (statements such as all ducks lay eggs may be interpreted as 'all kinds of ducks lay eggs') (Khemlani et al., 2007; Leslie et al., 2011);
- lack of knowledge (people may not be aware of the gender restriction of certain features, e.g., that only female ducks lay eggs or only male lions have manes);
- domain restriction (Stanley & Szabó, 2000), which leads people to rely on a restricted subset of the kind; for example, all ducks lay eggs may be interpreted as 'all female ducks lay eggs' (Leslie et al., 2011).
Leslie et al. (2011) conducted four additional experiments to rule out these alternative explanations. The acceptance rates decreased, but the presumed GOG effect persisted at rates above 30%.
The hypothesis that the overgeneralization bias results from people resorting to easily available generic generalizations was questioned by Lazaridou-Chatzigoga and Stockall (2013) and Lazaridou-Chatzigoga et al. (2017, 2019). The researchers claimed that the experiments conducted by Leslie et al. (2011) did not effectively exclude sub-kind interpretations and, primarily, domain restriction, which in their view was responsible for the attested tendency to overgeneralize.
They suggested that if domain restriction were made salient, the effect would be eliminated or radically reduced. In their 2017 and 2019 experiments, the tested predications were presented in three contexts: a neutral context (which did not refer to the feature in the predicate), a context supporting the tested claim and a contradictory context (which provided exceptions to the claim and thus made domain restriction salient). The mean percentages of agreement with false majority universals were as follows: for English native speakers, neutral 80.56 (3.82), contradictory 48.15 (4.83) and supportive 87.96 (3.14); for Greek native speakers, neutral 70.54 (3.82), contradictory 51.79 (4.74) and supportive 78.57 (3.89). Context had a significant effect on the number of affirmative answers. Nevertheless, even when exceptions were made salient, a substantial proportion of participants persisted in judging false universals as true, so the overgeneralization error was not fully eliminated. These results do not seem to confirm the hypothesis that domain restriction accounts for the major part of the GOG effect.

Aims of the study
The aim of the study was to establish the extent to which native speakers of Polish accept false universal quantified statements with 'all' and thus to examine the possible scope of the GOG effect. We also investigated whether the internal conditions of the test affected participants' assessment of universally quantified statements by manipulating the test fillers. We used three groups of filler items: false generalizations (negative condition), true generalizations (positive condition) and a combination of false and true statements (neutral condition). All the tested universal statements with 'all' are logically false, and we thought it unlikely that participants would choose the same answer throughout a test. We therefore assumed that false fillers would prompt participants to accept the tested predications, whereas true fillers would prompt them to reject the tested statements. We hypothesized that there would be a statistically significant difference between the acceptance rates of universal statements in the negative and positive conditions, but that neither the negative nor the positive condition would differ from the neutral condition.

Design and materials
The study involved a truth value judgment task in which participants were asked to evaluate universal quantified generalizations with 'all' by deciding whether they were true or false. We conducted three tests, each consisting of 9 majority quantified statements, 9 minority quantified statements and 18 fillers. We also included a reference test, which measured the acceptance rates of the corresponding generic statements. The majority and minority statements used in the experiment were generalizations about animals and were selected on the basis of a pre-test conducted with 34 students of the University of Białystok. The following generalizations were tested in their generic and quantified forms:
Four conditions were manipulated:
1. Generic statements
2. Quantified statements with negative fillers (all fillers false); participants were prompted to accept the tested items
3. Quantified statements with positive fillers (all fillers true); participants were prompted to reject the tested items
4. Quantified statements with mixed fillers (50% true, 50% false); control condition
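The composition of the three quantified tests can be sketched in Python as below. This is a hypothetical reconstruction for illustration only: the item labels are placeholders (the actual statement list is not reproduced here), and the shuffled presentation order and the build_test helper are our assumptions, not the authors' materials or software.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    label: str                          # placeholder label, not a real test statement
    role: str                           # "majority", "minority" or "filler"
    filler_true: Optional[bool] = None  # truth value of a filler; None for critical items


def build_test(condition: str, seed: int = 0) -> List[Item]:
    """Assemble one 36-item quantified test: 9 majority + 9 minority critical
    items (all logically false universals) + 18 fillers whose truth values
    depend on the condition."""
    filler_values = {
        "negative": [False] * 18,           # all fillers false
        "positive": [True] * 18,            # all fillers true
        "mixed": [True] * 9 + [False] * 9,  # 50% true, 50% false (control)
    }
    if condition not in filler_values:
        raise ValueError(f"unknown condition: {condition!r}")

    items = [Item(f"majority_{i}", "majority") for i in range(9)]
    items += [Item(f"minority_{i}", "minority") for i in range(9)]
    items += [Item(f"filler_{i}", "filler", truth)
              for i, truth in enumerate(filler_values[condition])]
    random.Random(seed).shuffle(items)  # assumed random presentation order
    return items


# Usage: print the test length and the number of true fillers per condition.
for cond in ("negative", "positive", "mixed"):
    test = build_test(cond)
    print(cond, len(test), sum(1 for it in test if it.filler_true))
```

Keeping the critical items identical across conditions and varying only the filler truth values mirrors the manipulation described above: any difference in acceptance rates can then be attributed to the fillers.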

Participants and procedure
The sample consisted of 313 native speakers of Polish. We also included a reference group of 107 participants, who provided acceptance rates for the corresponding generic statements. The mean age of the sample was 34.61 years. Participants judged 36 statements: 9 majority quantified statements, 9 minority quantified statements and 18 fillers. The statements were presented on a computer screen, one per trial. Participants were asked to read each statement and assess its truth by choosing one of two options: prawdziwe ('true') or fałszywe ('false'). The answers were recorded by the experimenter.
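Since each participant judged 18 critical items, an individual acceptance rate can be computed as the percentage of prawdziwe ('true') answers to those items. The sketch below assumes that fillers are excluded from this score and that one participant's responses are stored as (role, answered_true) pairs; both the data layout and the acceptance_rates helper are hypothetical.

```python
from typing import Dict, List, Tuple

Response = Tuple[str, bool]  # (role, answered_true); role is "majority", "minority" or "filler"


def acceptance_rates(responses: List[Response]) -> Dict[str, float]:
    """Percentage of 'true' answers for one participant, by predication type.

    Fillers are skipped (an assumption): only the critical universal
    statements contribute to the score."""
    rates: Dict[str, float] = {}
    for role in ("majority", "minority"):
        answers = [ans for r, ans in responses if r == role]
        rates[role] = 100 * sum(answers) / len(answers)
    critical = [ans for r, ans in responses if r != "filler"]
    rates["overall"] = 100 * sum(critical) / len(critical)
    return rates


# Hypothetical participant: accepts 6 of 9 majority and 3 of 9 minority items.
participant = ([("majority", i < 6) for i in range(9)]
               + [("minority", i < 3) for i in range(9)]
               + [("filler", i % 2 == 0) for i in range(18)])
print(acceptance_rates(participant))
```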

Results
Figure 1 presents the distribution of the respondents' answers divided into four conditions and two predication types.
Figure 1 shows a difference in the percentage of affirmative responses between generic and quantified statements. For the quantified statements, respondents chose 'false' far more frequently, although cases in which a respondent rejected all the statements (represented by 0% on the x-axis) were relatively rare. Because responses in the generic condition were not normally distributed, non-parametric tests were used in the further analysis. The Kruskal-Wallis test showed a statistically significant difference in acceptance rates across the four conditions and two types of predication, majority and minority (chi-squared = 255.55, df = 3, p < 2.2e-16), and across the four conditions when the type of predication was not taken into account (chi-squared = 153.06, df = 3, p < 2.2e-16). Figure 2 shows the average percentage of 'true' answers for generics and for quantified statements divided into four conditions and types of predication, with 95% confidence intervals based on bootstrapping. Because the responses to the two predication types do not differ significantly, Figure 3 presents the average percentage of 'true' answers for generics and quantified statements in the four conditions without analysing the predication types separately.
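For readers who want to see what the two procedures named above compute, here is a stdlib-only Python sketch of the Kruskal-Wallis H test (with the usual tie correction and a chi-squared approximation for the p-value) and of a percentile bootstrap confidence interval for a mean acceptance rate. It is not the authors' analysis script: the percentile method, the 10,000 resamples and the example data are assumptions, and in practice a tested implementation such as scipy.stats.kruskal would be preferable.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-squared variable with integer df,
    built up two degrees of freedom at a time from df = 1 or df = 2."""
    if x <= 0:
        return 1.0
    k, q = (2, math.exp(-x / 2)) if df % 2 == 0 else (1, math.erfc(math.sqrt(x / 2)))
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(*groups: Sequence[float]) -> Tuple[float, int, float]:
    """Return (H, df, p) for independent groups, with H corrected for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


def bootstrap_ci(rates: Sequence[float], n_boot: int = 10_000,
                 level: float = 0.95, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean: resample participants with replacement."""
    rng = random.Random(seed)
    n = len(rates)
    means = sorted(sum(rng.choices(rates, k=n)) / n for _ in range(n_boot))
    lo = means[round((1 - level) / 2 * n_boot)]
    hi = means[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical per-participant acceptance rates (%) in four conditions.
generic = [100, 94, 89, 100, 83, 94]
negative = [67, 56, 72, 61, 50, 78]
positive = [44, 50, 39, 56, 61, 33]
mixed = [56, 50, 61, 44, 67, 55]
print(kruskal_wallis(generic, negative, positive, mixed))
print(bootstrap_ci(negative))
```

Resampling whole participants (rather than individual answers) keeps each respondent's answers together, which matches the per-respondent percentages plotted in the figures.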
The results showed a significant reduction in acceptance rates from generic to quantified statements. False universal statements were judged true 63% of the time in the negative condition, 49% of the time in the positive condition and 55% of the time in the neutral condition, compared with 90% in the reference generic test. The overgeneralization bias was considerable even when participants were encouraged to disagree with the statements. The Mann-Whitney U test showed significant differences (p < .001) between the generic statements and the quantified statements in each of the three filler conditions. The results of the tests with negative and positive fillers also differed significantly, although when predication types were analysed separately, the difference between minority statements in the negative and positive conditions was not significant (p = .0031). Thus, the use of negative rather than positive fillers may have boosted affirmative responses to majority predications.
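The pairwise comparisons reported above rely on the Mann-Whitney U test. A minimal stdlib-only sketch of the two-sided test is given below, using the normal approximation with a tie correction. It is our illustration, not the authors' code; because it omits the continuity correction and the exact small-sample p-values that library routines such as scipy.stats.mannwhitneyu can apply, its p-values may differ slightly from theirs. The example data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for x, z, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # U statistic for the first sample
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))         # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical per-participant acceptance rates (%): negative vs positive fillers.
negative = [67, 56, 72, 61, 50, 78, 83, 61]
positive = [44, 50, 39, 56, 61, 33, 50, 28]
print(mann_whitney_u(negative, positive))
```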
No significant differences were found between the negative filler and mixed filler conditions (p = .009) or between the positive filler and mixed filler conditions (p = .086). Because the test with a combination of true and false fillers was treated as the control condition, these results suggest that the type of filler has, in general, little or no impact on participants' agreement with false universal generalizations using 'all'.

General discussion
This study aimed to investigate the scope of the overgeneralization bias shown by adult native speakers of Polish. The results show a significant drop in acceptance rates from generic to quantified statements. Nonetheless, the overgeneralization bias, as demonstrated by the acceptance of false universal quantified statements with 'all', is substantial: it exceeds 50% in the neutral (mixed filler) condition and nearly reaches 50% in the positive condition, even though participants in that condition were encouraged to reject the statements. Importantly, false majority and minority quantified statements were accepted at similar levels; this may indicate the relevance of the principled connection and, indirectly, its boosting effect on the overgeneralization error. The results may thus be interpreted as further evidence in support of the GOG effect.
We hypothesized that the use of different fillers would affect participants' responses: negative fillers would increase acceptance rates, whereas positive fillers would decrease them. We also predicted that these fluctuations would not deviate significantly from the rates in the neutral condition. The results largely confirm our hypotheses. The type of filler did not have a major impact on whether people accepted or rejected false universal generalizations with 'all'. For the minority statements, the effect was even weaker than we had predicted: there was no significant difference between the two opposite conditions. The provision of negative fillers only may have raised the acceptance rates of quantified majority statements significantly relative to the positive condition, possibly because counterexamples to some of the tested statements were not easily accessible; for example, people may not have been aware that there are albino tigers.
The overgeneralization error thus appears to be open to experimental manipulation, both through different contexts (Karczewski, Wajda & Poniat, forthcoming; Lazaridou-Chatzigoga & Stockall, 2013; Lazaridou-Chatzigoga et al., 2017, 2019) and through different fillers, but it persists across various conditions. People's decisions to accept or reject a false universal statement may be influenced by a number of factors; however, research to date indicates that, when assessing the truth of universal statements, people rely on the truth conditions of the corresponding generic generalizations. The results of our experiment indicate that the tendency to accept false universal characteristic predications with 'all' also occurs in Polish, with error rates comparable to those reported for English and Greek, which suggests that it is a cross-linguistic rather than a language-specific phenomenon.