"A Princess and a Guy Like Me?"

Gendered Dialogue in a Galaxy Far, Far Away

Number of Words By Gender

These visualizations show every word spoken in the Star Wars saga films broken down by gender. Any characters who have an unclear gender, such as Jawas, Gonk Droids, and porgs, are classified as unknown gender.

Dialogue Duration By Gender

These visualizations show the amount of time different genders speak in seconds, as indicated in subtitle files. In addition to spoken words, this also includes non-transcribable speech, such as R2-D2’s beeps, Chewbacca’s roars and howls, and noises from other droids and aliens.

How Star Wars Compares to Movies as a Whole

In April 2016, Hanah Anderson and Matt Daniels created what they called “The Largest Ever Analysis of Film Dialogue by Gender” using 2,000 movie scripts, as many as they could find, downloaded from the Internet. This analysis, which they stated was not meant to be fully academic, found that dialogue in Hollywood films is, in general, heavily male dominated.

[Figure: Anderson and Daniels's chart of films grouped by percentage of male dialogue]

In one of the charts they created, shown above, it is clear that the largest group of films falls into the 60% to 90% male range. Most Star Wars films fall into this group as well, with many in the high 80% range, or even the low 90% range in the case of A New Hope. Star Wars films thus follow the general pattern of Hollywood films, though possibly skewing even more male than the average.

Here is another view of their data: a scatterplot containing one dot for each of the 2,000 films, with each dot placed according to how heavily male or female the film's dialogue is:

[Figure: Anderson and Daniels's scatterplot of the 2,000 films, one dot per film]

Data Source

  1. The Phantom Menace: Dialogue - Characters - Characters with WordCounts
  2. Attack of the Clones: Dialogue - Characters - Characters with WordCounts
  3. Revenge of the Sith: Dialogue - Characters - Characters with WordCounts
  4. A New Hope: Dialogue - Characters - Characters with WordCounts
  5. The Empire Strikes Back: Dialogue - Characters - Characters with WordCounts
  6. Return of the Jedi: Dialogue - Characters - Characters with WordCounts
  7. The Force Awakens: Dialogue - Characters - Characters with WordCounts
  8. The Last Jedi: Dialogue - Characters - Characters with WordCounts
  9. The Rise of Skywalker: Dialogue - Characters - Characters with WordCounts

I used the English subtitle files for each Star Wars saga film to create the data set. For the original trilogy, I used subtitles created by fans specifically for the original versions of each film in Project Threepio. George Lucas famously made several changes to the original trilogy films in multiple “special editions” released in the late 90s and early 2000s. While the majority of these changes are visual, some involve alterations to the dialogue. I decided for the purposes of this project that the original, unaltered dialogue of the films was the most accurate for analyzing Star Wars films in the original, historical context that produced them.

George Lucas also made a handful of minor changes for the home release and later re-releases of the prequel trilogy, some of which involve dialogue changes. In this case, however, I opted to use the official English subtitles on Disney+, as the time between the original creation of the films and Lucas’s alterations is significantly shorter than it was for the original trilogy. Thus, the historical context does not strongly differ.

The sequel trilogy does not have any changes between theatrical and home release versions aside from a change to the music in the conclusion of The Force Awakens, so for those films I simply used the official English subtitles from Disney+.

Many previous studies of film dialogue have not used subtitles as their source of analysis, opting instead for a CSV version of the original screenplays. While these are likely less time consuming to create because the characters are already identified, the screenplay and the final film often differ, sometimes in significant ways.

Data Munging

The original subtitle files contained only the raw dialogue and the start and end timestamps for each line. I therefore had to go through and identify the speaker of each line of dialogue, which I typically did from memory or by consulting the film itself or the script. This was fairly straightforward for most of the dialogue, but Star Wars is rather notorious for giving very minor background characters not only names but also extensive backstories in supplementary media such as books and comics. For example, the Stormtrooper who exclaims “Look sir! Droids!” upon finding the escape pod in the desert in A New Hope is named Davin Felth and was featured extensively in the 1995 short story “When the Desert Wind Turns.” Many, if not most, of the characters with bit parts in the original trilogy at least have names, and I decided it was important to establish as many identities as I could to ensure I had identified the proper gender for these characters (more on that later). When a character was not identified in the screenplay, I typically confirmed the identity by Googling their dialogue, which would usually bring up their entry on Wookieepedia.
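As a rough illustration of this starting point, the sketch below reads subtitle cues and computes each cue's duration in seconds. It assumes the subtitles are in the common SRT layout (the exact file format is not stated here), and the `Cue` class, the `parse_srt` function, and the sample timestamps are all hypothetical. The `speaker` field starts out empty because speakers had to be identified by hand.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed SRT timing line, e.g. "00:01:02,500 --> 00:01:04,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float       # seconds from the start of the film
    end: float         # seconds from the start of the film
    text: str          # raw dialogue as it appears in the subtitle
    speaker: str = ""  # not in the file; filled in manually later

    @property
    def duration(self) -> float:
        return self.end - self.start


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert hour, minute, second, millisecond strings to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(raw: str) -> List[Cue]:
    """Split SRT text on blank lines and build one Cue per block."""
    cues = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is dialogue text
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(to_seconds(*g[:4]), to_seconds(*g[4:]), text))
                break
    return cues


# Illustrative input only: the timestamps are invented for this sketch.
sample = """1
00:00:01,000 --> 00:00:03,500
Look sir! Droids!

2
00:00:04,000 --> 00:00:05,250
(placeholder second line)
"""

cues = parse_srt(sample)
for cue in cues:
    print(f"{cue.duration:.2f}s  {cue.text}")
```

Summing `duration` over cues after speakers and genders are assigned would give a per-gender speaking time of the kind described under Dialogue Duration By Gender.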

I also decided to include as many non-human characters as were indicated in the subtitles, even if some of these alien characters might be more analogous to real-world animals. This includes Jawas, tauntauns, the wampa, and porgs, along with several droids. Many of these aliens and droids communicate using either non-English words or animalistic noises, but I noted their occurrence in the subtitles. While the subtitles also noted various human sounds, such as laughter, grunting, or screaming, I decided not to include them in the data set, as it would be too cumbersome to identify every occurrence in each film. I chose to identify only words for beings who communicate with words, and only sounds for beings who communicate with sounds alone.

Star Wars also features several fictional alien languages. Sometimes these languages are subtitled into English, in which case I simply put the English translation into the data set. In cases where the dialogue wasn’t translated, I used a fan-written transcription of the alien language if available. This occurred primarily for aliens in Jabba’s palace and the Ewoks in Return of the Jedi.

Gender Identity of Fictional Characters

The task of establishing which genders have a greater amount of dialogue in Star Wars films demands identifying the gender identity of every character with dialogue in the films. Though this is seemingly straightforward, a growing interrogation of the gender binary demands that we carefully consider how we assign gender to others. For gender theorists, gender is thought of as a personal expression of individuals, where the individual is the one who has ultimate authority over their gender identity. This is more challenging for fictional characters, who do not have the ability to proclaim their gender identity (unless an author explicitly writes them as doing so). Certainly, George Lucas conceived of and wrote Luke Skywalker as a man, and presumably most audience members see Luke as a male character; however, many fans engage with the gender of fictional characters in ways that depart from what the author intended. One example is “gender swap” or “gender bend,” in which fan creators draw fan art or write fanfiction in which characters have a different gender identity than that of the original text. A Google search for Luke Skywalker gender swap reveals numerous examples of edited screenshots depicting Luke Skywalker as a woman. Additionally, the practice of headcanon involves fans imagining the characters as always having had a gender identity that differs from the author’s intent. The Tumblr blog “Queering Star Wars” promotes the creation of “queer headcanons,” some of which involve depicting Star Wars characters as trans, genderqueer, or non-binary.

This issue is even more complex when it comes to both the droids and aliens of Star Wars, many of whom do not follow the Western standards of what constitutes a male or female identity. Even though droids do not reproduce or have a clear need to have a gender, in universe many droids are programmed with a gender identity. There is little in-universe explanation for why droids are gendered – Wookieepedia states that this is to make droids appear “more life-like,” but its only citation for this is the film A New Hope, which does not actually offer this explanation. From a Doylist perspective, Lucas created the original droids, R2-D2 and C-3PO, as “relatable” male characters in the 1970s, a couple of decades before the concept of non-binary genders was widely known. Furthermore, the majority of droids in the films are male, reinforcing the patriarchal notion that male is the default gender.

For the purposes of this project I decided to represent the canon of Star Wars as closely as possible. Since so many of the characters, even minor ones, have backstories, many are referred to with pronouns in supplementary materials, and thankfully Wookieepedia notes the gender of these characters on their respective pages. If the character’s gender is not clear either in the original film or supplementary materials, Wookieepedia puts the page in the category Individuals of unspecified gender. Accordingly, I used three gender categories: Male, Female, and Unknown gender, with the last category reserved for instances in which the character’s gender is unclear. Characters in that category include Gonk droids, Jawas, mouse droids, porgs, mynocks (who reproduce asexually and therefore might be agender), Gamorrean guards, and various unidentified Ewoks. Admittedly, there may be some cases in which Wookieepedia assumed a character’s gender based on their perceived gender expression instead of the use of pronouns. I did the same, especially in the sequel trilogy, where some very minor Stormtroopers and Resistance pilots are not identified, but their voices sound either clearly male or clearly female (notably, the sequel trilogy has several female Stormtroopers and Resistance pilots, unlike the original trilogy). I admit that this assumption reinforces the idea that the gender binary is the default, and therefore I may be too strongly assuming that the majority of characters, particularly human ones, fall into this binary. While I could, perhaps, record these background characters as having an unknown gender, I feel that doing so would not accurately reflect the intentions of the creators of Star Wars, even if these intentions perpetuate enby erasure.
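To make the three-category scheme concrete, here is a minimal sketch of how word counts might be totaled per gender once every line has a speaker. It is not the method actually used for this project: the function names, the `gender_of` lookup table, and the placeholder dialogue are hypothetical, and treating any speaker missing from the table as Unknown is a simplifying assumption of the sketch.

```python
from collections import Counter

CATEGORIES = ("Male", "Female", "Unknown")


def words_by_gender(lines, gender_of):
    """Total whitespace-separated words per gender category.

    lines:     iterable of (speaker, text) pairs
    gender_of: dict mapping speaker name to one of CATEGORIES
    """
    totals = Counter({category: 0 for category in CATEGORIES})
    for speaker, text in lines:
        # Assumption of this sketch: unlisted speakers count as Unknown
        gender = gender_of.get(speaker, "Unknown")
        totals[gender] += len(text.split())
    return totals


def shares(totals):
    """Convert raw totals into each category's fraction of all words."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in totals}
    return {category: n / grand_total for category, n in totals.items()}


# Hypothetical lookup table and placeholder dialogue, not real data.
gender_of = {
    "Luke Skywalker": "Male",
    "Leia Organa": "Female",
    "Jawa": "Unknown",
}
lines = [
    ("Luke Skywalker", "word word word"),
    ("Leia Organa", "word word"),
    ("Jawa", "word"),
    ("Unlisted Speaker", "word"),
]

totals = words_by_gender(lines, gender_of)
for category, share in shares(totals).items():
    print(f"{category}: {totals[category]} words ({share:.1%})")
```

Keeping the speaker-to-gender table separate from the dialogue rows means a single character's classification can be revisited without touching the transcribed lines.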

It is worthwhile to mention that, in recent years, Star Wars has begun to feature trans and non-binary characters in media outside the films. The first was Eleodie Maracavanya, a pirate featured in the 2016 book Aftermath: Life Debt, who uses the pronouns “zhe/zher,” and more recently Keo Venzee was featured prominently in the 2020 video game Star Wars: Squadrons and uses “they/them” pronouns. Keo was voiced by non-binary actor Bex Taylor-Klaus.