The Gender Novels Project


Gendered Pronoun Frequency Analyses

Several analyses of gendered pronouns in 19th-century novels were conducted. First, the average frequency of male and female pronouns in a 19th-century novel was analyzed. Second, the proportion of subject pronouns that are female was analyzed and compared with the proportion of all pronouns that are female. Lastly, the proportion of female pronouns that are subject pronouns and the proportion of male pronouns that are subject pronouns were analyzed.

Analysis #1 - Frequency of Gendered Pronouns

Code from gender_pronoun_freq_analysis.py was used to determine, for each novel in the Gutenberg corpus, the proportion of the pronouns in that novel that are female, and these proportions were averaged across novels. Results were then binned by author gender, date published, and location of publication.
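The calculation described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the pronoun sets, the function names female_pronoun_proportion and mean_by_group, the metadata field author_gender, and the toy texts are all assumptions made for the example.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed pronoun sets; the project's actual lists may differ.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def female_pronoun_proportion(text):
    """Return female / (female + male) pronoun counts, or None if there are none."""
    words = re.findall(r"[a-z']+", text.lower())
    female = sum(w in FEMALE_PRONOUNS for w in words)
    male = sum(w in MALE_PRONOUNS for w in words)
    total = female + male
    return female / total if total else None


def mean_by_group(novels, key):
    """Average the per-novel proportion within each value of a metadata field."""
    groups = defaultdict(list)
    for novel in novels:
        proportion = female_pronoun_proportion(novel["text"])
        if proportion is not None:
            groups[novel[key]].append(proportion)
    return {group: mean(values) for group, values in groups.items()}


# Toy inputs for illustration only; these are not texts from the corpus.
novels = [
    {"author_gender": "female", "text": "She told him that her sister knew."},
    {"author_gender": "male", "text": "He said his brother saw her."},
]
by_gender = mean_by_group(novels, "author_gender")
print(by_gender)
```

Averaging per-novel proportions, as here, weights every novel equally regardless of length; pooling raw counts across the corpus would instead give longer novels more influence.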

Overall

Frequency of gendered pronouns:

  • Female pronouns: 0.325
  • Male pronouns: 0.675

On average, a novel in the corpus contains 33.2% female pronouns and 66.8% male pronouns. This result has not been tested for statistical significance.

By Author Gender

Frequency of gendered pronouns, sorted by author gender:

  • Male author: 0.25
  • Female author: 0.53

On average, a novel by a male author in the corpus contains 24.6% female pronouns and a novel by a female author contains 52.8% female pronouns. This difference has been shown to be statistically significant at the p = 0.05 level by an independent t test.
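For readers unfamiliar with the test, the sketch below shows how a two-sample t statistic could be computed from two lists of per-novel proportions. It assumes the pooled-variance (Student's) form, which the text does not specify, and the input values are made up for illustration; they are not the project's data. The function name pooled_t_statistic is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns the statistic and its degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Made-up per-novel proportions, for illustration only.
female_authored = [0.50, 0.55, 0.60]
male_authored = [0.20, 0.25, 0.30]

t, df = pooled_t_statistic(female_authored, male_authored)
print(t, df)
```

The statistic is then compared against the critical value of the t distribution for the given degrees of freedom (2.776 for a two-tailed test at the 0.05 level with 4 degrees of freedom). In practice a library routine such as scipy.stats.ttest_ind returns both the statistic and the p-value directly.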

By Date

By Publication Location

  • United Kingdom: 0.33
  • United States: 0.34
  • Other: 0.33

No patterns or significant differences were found between categories for publication date or publication location.

Analysis #2 - Frequency of Gendered Subject Pronouns

Code from gender_pronoun_freq_analysis.py was used to determine, for each novel in the Gutenberg corpus, the proportion of subject pronouns in that novel that are female, and these proportions were averaged across novels. Results were then binned by author gender, date published, and location of publication. Finally, these data were compared with the data from Analysis #1 to determine whether gendered pronoun usage differs between subject and object pronouns.
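Restricting the count to subject pronouns might look like the sketch below. It assumes that "she" and "he" are the only subject forms counted; the project's actual definitions are not given here, and the function name female_subject_proportion is hypothetical.

```python
import re

# Assumed subject forms; the project's actual definitions may differ.
FEMALE_SUBJECT = {"she"}
MALE_SUBJECT = {"he"}


def female_subject_proportion(text):
    """Return the share of gendered subject pronouns that are female, or None."""
    words = re.findall(r"[a-z']+", text.lower())
    female = sum(w in FEMALE_SUBJECT for w in words)
    male = sum(w in MALE_SUBJECT for w in words)
    total = female + male
    return female / total if total else None


# Toy sentence for illustration only.
sample = "She saw him. He saw her. He left."
print(female_subject_proportion(sample))
```

In the toy sentence, "him" and "her" are ignored because they are not subject forms, so the result depends only on one "she" and two occurrences of "he".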

Overall

Frequency of gendered pronouns among subject pronouns:

  • Female pronouns: 0.325
  • Male pronouns: 0.675

On average, a novel in the corpus contains 32.5% female subject pronouns and 67.5% male subject pronouns. This result has not been tested for statistical significance.

This result was then compared with the overall results for Analysis #1. No significant difference was found at the p = 0.05 level.

By Author Gender

Frequency of female subject pronouns sorted by author gender:

  • Male author: 0.24
  • Female author: 0.52

On average, a novel by a male author in the corpus contains 24.0% female subject pronouns and a novel by a female author contains 52.0% female subject pronouns. This difference was shown to be statistically significant at the p = 0.05 level with an independent t test.

The results for female authors and male authors were compared with the results in Analysis #1. No significant difference was found at the p = 0.05 level. This implies that female and male authors do not use gendered pronouns any more or less frequently in the subject than they do overall.

By Date

By Publication Location

  • United Kingdom: 0.33
  • United States: 0.33
  • Other: 0.32

As in Analysis #1, no patterns or significant differences were found for date of publication or publication location.

Analysis #3 - Frequency of Subject Pronouns among Gendered Pronouns

Code from gender_pronoun_freq_analysis.py was used to determine the average proportion of the total number of pronouns of a given gender that are subject pronouns for each novel in the Gutenberg corpus. Then, results were binned by author gender, date published, and location of publication.

This analysis is separated into two parts, 3a and 3b. 3a is the analysis of female pronouns and 3b is the analysis of male pronouns.
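The quantity measured in 3a and 3b differs from Analysis #2 in its denominator: it is the share of one gender's pronouns that are subject forms. A minimal sketch follows, assuming a simple split of each gender's pronouns into subject and non-subject forms; the PRONOUNS table and the function name subject_share are hypothetical and may not match the project's code.

```python
import re

# Assumed pronoun forms; the project's actual lists may differ.
PRONOUNS = {
    "female": {"subject": {"she"}, "other": {"her", "hers", "herself"}},
    "male": {"subject": {"he"}, "other": {"him", "his", "himself"}},
}


def subject_share(text, gender):
    """Return the share of one gender's pronouns that are subject forms, or None."""
    words = re.findall(r"[a-z']+", text.lower())
    forms = PRONOUNS[gender]
    subject = sum(w in forms["subject"] for w in words)
    other = sum(w in forms["other"] for w in words)
    total = subject + other
    return subject / total if total else None


# Toy sentence for illustration only.
sample = "He gave her his book and she thanked him."
print(subject_share(sample, "female"), subject_share(sample, "male"))
```

Which forms count toward the denominator matters a great deal here. For example, whether possessives such as "his" and "her" are included changes the resulting share for each gender.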

Overall

Portion of gendered pronouns that are subject pronouns:

  • Male pronoun: 0.74
  • Female pronoun: 0.47

On average, for a given novel in the corpus, about 74% of the male pronouns are subject pronouns and about 47% of the female pronouns are subject pronouns. This difference was shown to be statistically significant at the p = 0.05 level.

3a - Female Pronouns

Portion of female pronouns that are subject pronouns:

By Author Gender

  • Male: 0.46
  • Female: 0.48

No significant difference at the p = 0.05 level was found for these categories.

By Publication Date

By Publication Location

  • United Kingdom: 0.46
  • United States: 0.47
  • Other: 0.47

No patterns or significant differences were found for any of the above categories.

3b - Male Pronouns

Portion of male pronouns that are subject pronouns:

By Author Gender

  • Male: 0.75
  • Female: 0.73

By Publication Date

By Publication Location

  • United Kingdom: 0.73
  • United States: 0.74
  • Other: 0.75

No patterns or significant differences were found for any of these categories.

Discussion of Results

Analysis #1 shows that female authors use female pronouns more often than male authors.

Analysis #2 shows that female authors use female subject pronouns more often than male authors, which is to be expected based on the results of Analysis #1. No significant difference was found between the results of Analysis #1 and Analysis #2. This implies that the proportion of subject pronouns that are one gender or another is the same as the proportion when considering all pronouns.

Analysis #3 shows that male pronouns are used more often in the subject than female pronouns. It was also found that there is no significant difference in male/female subject pronoun frequencies by author gender.

The comparison between Analysis #1 and Analysis #2, on the one hand, and the results of Analysis #3, on the other, appear to present conflicting conclusions. The comparison between #1 and #2 seems to imply that pronouns of a particular gender are used in the subject at the same rate that they are used overall. Analysis #3, however, indicates that most male pronouns (about 74%) are subject pronouns while fewer than half of female pronouns (about 47%) are.
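The tension can be made concrete with a rough arithmetic check. The sketch below combines the reported averages as if they were pooled counts, which is an assumption: the reported figures are averages of per-novel proportions, and such averages need not combine this way. The function name implied_female_share_of_subjects is hypothetical.

```python
def implied_female_share_of_subjects(female_share, female_subject_rate, male_subject_rate):
    """Female share of subject pronouns implied by an overall share and per-gender subject rates."""
    female_subjects = female_share * female_subject_rate
    male_subjects = (1 - female_share) * male_subject_rate
    return female_subjects / (female_subjects + male_subjects)


# Values listed above: 0.325 (Analysis #1), 0.47 and 0.74 (Analysis #3).
implied = implied_female_share_of_subjects(0.325, 0.47, 0.74)
print(round(implied, 3))  # 0.234
```

Under this pooled-count assumption, the Analysis #3 rates would imply that about 23% of subject pronouns are female, noticeably lower than the 32.5% reported in Analysis #2. The gap may reflect the averaging method, differences in which pronoun forms each analysis counts, or both.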

This may suggest that most protagonists in this literature are male. It would then be logical for male pronouns to be used more often and, since the male is the main character, for those pronouns to be subject pronouns. However, one would then expect male and female authors to differ in Analysis #3b as they do in Analysis #1. Since this is not the case, it seems that female authors do include more female characters, but these may be supporting characters who receive the action rather than perform it. Pronoun frequencies alone cannot support this conclusion, and more research is needed on this subject.

Overall, female authors from this time period use female pronouns more often on average than male authors. Male pronouns are used more often in the subject than the object, and female pronouns are used more often in the object than the subject. This result does not vary based on author gender.