Ep4—Metadata Meta-Awareness Transcript
MOLLY: On a hot afternoon in July, I visited Kaytlin Bailey at her tiny apartment in Greenwich Village. There were bookshelves covering the walls. I scanned some of the titles – “Love For Sale.” “Taking the Crime out of Sex Work.” “Sex Workers Unite.” Kaytlin uses these books to do research for her podcast.
KAYTLIN: Every person on this planet knows, might even possibly love, somebody that has done some kind of sex work at some point in their life… we call it the oldest profession. It’s all kinds of people who have been doing all kinds of things for all kinds of reasons forever.
Listening to this podcast makes people that participate in this industry understand the richness of their own history and can give people the strength to come out. I think that we need to have an experience like what happened with the LGBT movement, is that the reason people believe all these false narratives about sex work is they don’t think they know a sex worker.
MOLLY: Kaytlin agreed to participate in our podcast preservation curriculum. I made this trip to her apartment to learn more about her and why she makes her podcast. Kaytlin told me that she’s a comedian and an amateur historian, and a former sex worker. This podcast is a space where it all comes together for her.
KAYTLIN: Welcome back to the OLDEST PROFESSION PODCAST, an irreverent history of audacious whores. Today, we are talking about every old pro’s hero, Marilyn Monroe.
MOLLY: Welcome to Preserve This Podcast, a show about how to save our podcasts. I’m Molly Schwartz. This podcast is brought to you by the Metropolitan New York Library Council with support from the Andrew W. Mellon Foundation.
MOLLY: Hi, it’s Molly here again. If this is your first time tuning into Preserve This Podcast, I suggest you go back and start listening at the beginning. This is a five-part series where we teach podcasters how to preserve their files. In the last two episodes, we focused on things that podcasters can do on their own – things like naming conventions and backing up hard drives.
In these next two episodes, I’m still gonna give you some tips for how to practice good personal digital hygiene. But I’m also going to go big picture on podcast systems.
Today, we’re tackling metadata. We’re giving Kaytlin some advice on how to use metadata to preserve her podcast.
Metadata is any information that describes what’s inside a file. So I want you to think about it like an access point into a file. Which is why I’m gonna be talking a lot about how metadata helps with podcast preservation AND podcast accessibility. And it makes sense that we think about preservation and accessibility together because preservation and access are just two sides of the same coin. In order for a podcast to be preserved, we need to ensure its long-term accessibility. All the metadata we create to help with preservation also has the added bonus of making your podcast more accessible TODAY. And also more discoverable.
If you’d like to follow along, you can download our zine from preservethispodcast.org. We’re starting on Page 12.
Kaytlin’s really passionate about her podcast. She’s read a lot of academic books and articles about the history of sex work. She wants to make these little-known histories more accessible to a broader audience.
KAYTLIN: A lot of academic writing is written for other academics, and so I feel like a bit of a popular interpreter that I sort of like suffer through research that is being done by folks and then try to translate those stories. I know how isolating the shame, stigma, and silence is of being a sexual minority and a member of a criminalized population, so just the specific, citable, referenceable knowledge that there are people that came before you, I think can be, can alleviate a lot of emotional suffering.
MOLLY: Kaytlin takes her job as a popular interpreter seriously. She presents dense information in a way that’s funny and relatable.
But… her podcast is made up of digital audio files.
Kaytlin can be as great of a storyteller as she wants. These stories won’t be accessible in the long term if she doesn’t pay attention to the technical needs of her podcast. Which isn’t something she enjoys.
KAYTLIN: I don’t know how to get the podcast on the internet and I certainly don’t know how to preserve things on the internet. I just discovered like 48 hours ago that not all of my tweets are available from forever ago and had like a panic attack about that. So I don’t know how the internet works, I don’t know how archiving works, so this is actually one of the reasons why I reached out to you is I think it would be deeply ironic if a project born from the impulse to stop the erasure of history got erased because I’m dumb with technology.
MOLLY: Yeah no, we’re all in that boat. I’m going to be on this journey with you [laughter] So, do you feel like you can access all of your past podcast episodes and where would you go to get them?
KAYTLIN: Yeah, I would go to iTunes and if that didn’t work, I would email Mary, probably drunk. Yeah that’s my whole plan.
MOLLY: Mary is Kaytlin’s old producer. Now she has a producer named Justine.
So they’re not anywhere like on your computers, your hard drives?
KAYTLIN: I think that they are on possibly Mary and possibly Justine’s computers and archives, but I don’t know that for sure. Or even what that means exactly.
MOLLY: If Kaytlin really wants to make the stories in her podcast accessible, then her actual podcast needs to be accessible. Which is why we’re focusing on metadata.
Metadata is the information that’s attached to Kaytlin’s MP3 files. The more descriptive and accurate that information is, the more it will make these files accessible.
That’s because metadata lets us know what’s inside a file, at a glance. Without opening it up. Here’s Mary Kidd, one of our archivists.
MARY: That’s what metadata is, it’s just kind of like a label on a package that you’re sending out in the mail. It’s the address and it doesn’t tell you exactly what’s actually inside of the box, but it does tell you something about it.
MOLLY: The actual audio inside of an MP3 file is like the inside of a package. And the data we have about that MP3 file – data like the date it was created, or the file type. These are all metadata. They’re like the address stamped on the outside of a package.
Dana Gerber-Margie, our other archivist, gives an example of how this works.
DANA: I take a lot of photos on my phone and it backs up to Google photos… I don’t really care about the aperture and stuff that is in the metadata. But Google still is trying to impose order. There’s an album of things, there’s an album of places, there’s an album of people’s faces. So it’s like even when I chose not to, Google still tries. They like auto-create all of this stuff.
MOLLY: Metadata captures information about how and when a file was created – like the way Google was encoding what kind of aperture Dana used to take her photos.
Without metadata, audio files are kind of like a black box. We don’t know what sounds are inside an MP3 file until we click on the file and open it up. But we need a reason to know why we should click on it in the first place. We need reasons to listen. Metadata gives us those reasons. It tells us what a podcast is about. And who made it. And when they made it. All of this information is attached to the file as metadata. And all these little pieces of metadata provide access points into the black box.
For those of you who are thinking – I don’t create metadata. And yet I know what’s in my MP3s and I share them around and people click on them – you’re wrong. You are creating metadata, you’re just doing it without realizing it. And your computer is doing a lot of it for you.
We’re going to show you how to create some metadata, intentionally. You know BEST what’s inside your MP3 files. You should take any opportunity you have to self-describe your content.
So, here are some tips for getting the most out of your metadata.
There are two places you can create metadata about podcast files. One is inside of the MP3 files you create. And the other is inside of the RSS feed. The RSS feed kind of wraps around your MP3 file, and distributes it to podcatchers. I’m gonna walk you through some best practices for doing both kinds of metadata entry – MP3 metadata, and RSS metadata.
So let’s start with MP3 files. All MP3 files have built-in space for metadata. It’s written into the code of the file format. These spaces are called ID3 tags. Podcasters can edit these ID3 tags directly.
But a lot of podcasters don’t. And by not filling this in, they’re missing an opportunity to help with preservation and accessibility.
If anyone is listening to a podcast outside of an app or a streaming player, then it’s the ID3 tags that make the episode title and artist and description show up properly.
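To make the idea of built-in tag space concrete, here is a minimal Python sketch of ID3v1, the oldest and simplest ID3 layout: a fixed 128-byte block at the very end of the file. It is an illustration only. Most current tools write the more flexible ID3v2 tags at the start of the file instead, the function names are made up for this example, and the year is a placeholder value.

```python
import struct

def build_id3v1(title, artist, album, year, comment):
    """Pack five text fields into a 128-byte ID3v1 tag (illustrative helper)."""
    def field(text, size):
        # Each field has a fixed width; longer text is cut, shorter text is zero-padded.
        return text.encode("latin-1", errors="replace")[:size].ljust(size, b"\x00")
    return (b"TAG" + field(title, 30) + field(artist, 30) + field(album, 30)
            + field(year, 4) + field(comment, 30) + bytes([255]))  # 255 = no genre set

def read_id3v1(data):
    """Return the ID3v1 fields found in the last 128 bytes of a file, or None."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None
    names = ("title", "artist", "album", "year", "comment")
    # Skip the 3-byte marker, read the five text fields, skip the genre byte.
    raw = struct.unpack("3x30s30s30s4s30sx", tag)
    return {n: v.split(b"\x00")[0].decode("latin-1").strip() for n, v in zip(names, raw)}

# Stand-in for real audio bytes, followed by a tag. The year is a placeholder.
fake_mp3 = b"\x00" * 64 + build_id3v1(
    "Metadata Meta-Awareness", "Molly Schwartz", "Preserve This Podcast", "2019", "Episode 4")
print(read_id3v1(fake_mp3))
```

A file with no tag simply returns None here, which is the black-box situation a listener ends up with when the tags were never filled in.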
If you want an example of the difference between a file that has ID3 tags filled in versus one that doesn’t, you can go to preservethispodcast.org. Download our episode 3 and then download our episode 4. Open them up on your own computer on iTunes. See if you can tell which one I’ve edited the ID3 tags on.
There are a couple of ways you can fill out these ID3 tags.
One way is to encode metadata into an MP3 file when you bounce it out from a digital audio workstation – also known as a DAW. A lot of DAWs have a form where you can encode metadata in the export process. But not all of them do. I use Reaper, and when I render a file as an MP3, it doesn’t let me encode ID3 tags.
If your DAW doesn’t let you encode ID3s, there are a couple of workarounds. One option is to edit ID3 tags using tools that were made for doing this. A couple examples are ID3 Editor, MP3Tag, and Tagr.
I prefer a different option. Which is to encode my ID3 tags in Audacity. Audacity is a free DAW that anyone can download. I just render the final version of my episode as an uncompressed WAV file, then open it in Audacity. Then I export it as an MP3 from Audacity. I personally just find it easier to do it this way. During the export process, a window pops up that lets me encode the ID3 tags. In the space where it says Artist Name, I fill out my name. Where it says Track Title, I fill out the name of my Episode Title. Album Title is the name of my podcast. Track Number is the episode number. Year is the year I’m posting it. Genre is Podcast. The Comments is a space where I can put an episode description.
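The field-by-field mapping described above can be written down as a small Python sketch. The labels on the left are the form fields named in this episode. The four-letter codes are the standard ID3v2 frame identifiers those kinds of fields usually correspond to. Which frames a particular tool actually writes can vary, and the function name and sample values are hypothetical.

```python
# Form label -> standard ID3v2 frame identifier.
ID3V2_FRAMES = {
    "Artist Name": "TPE1",
    "Track Title": "TIT2",
    "Album Title": "TALB",
    "Track Number": "TRCK",
    "Year": "TYER",      # ID3v2.3 name; ID3v2.4 uses TDRC instead
    "Genre": "TCON",
    "Comments": "COMM",
}

def episode_tags(host, episode_title, show, number, year, description):
    """Fill in the form the way the episode describes and key it by frame ID."""
    form = {
        "Artist Name": host,
        "Track Title": episode_title,
        "Album Title": show,
        "Track Number": str(number),
        "Year": str(year),
        "Genre": "Podcast",
        "Comments": description,
    }
    return {ID3V2_FRAMES[label]: value for label, value in form.items()}

# Sample values; the year and description are placeholders.
tags = episode_tags("Molly Schwartz", "Metadata Meta-Awareness",
                    "Preserve This Podcast", 4, 2019,
                    "How metadata helps with podcast preservation.")
for frame, value in tags.items():
    print(frame, "=", value)
```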
Let’s show Kaytlin how to put this into practice. Mary created this test MP3 file. It’s a recording of her talking about her favorite flavors of seltzer water.
MARY: Love seltzer, drink it pretty much everyday…
MOLLY: Then Mary shows Kaytlin how to view the metadata attached to this MP3.
They open the file in VLC, an audio player. Then they click “Window” in VLC’s top navigation bar. Then they select “Media Information.” You can also do this in iTunes by dragging an MP3 into the iTunes library, right-clicking on it, and selecting “Song info”. In both cases, a form pops up.
MARY: So, what you’re seeing here is a form, and this form corresponds to what are known as ID3 tags that live inside of an MP3.
MOLLY: Mary had already filled out some of the fields on her seltzer MP3. Like the episode title and description. But some of the fields were still blank.
MARY: So these are basically fields that you can fill out, so why don’t you just go ahead and fill out some fields here?
MARY: Artist, obviously
KAYTLIN: Right. Album, I would say “Thoughts on… ” I would say thoughts on… Description: Mary, a working artist, shares her thoughts and experience with seltzer…
MARY: A working artist, oh wow.
MOLLY: Kaytlin tries to save the metadata she’s created. But an error message pops up.
KAYTLIN: Error! Uh oh. Error while saving meta. VLC was unable to save the metadata. This would be the point where I would throw computer out of the window and leave.
MARY: Actually don’t throw the computer out yet. Click on OK. Shut down VLC and re-open it.
MOLLY: When Kaytlin re-opens the file, she sees the description has changed back to Mary’s original description.
MARY: And it says, encoded by iTunes.
MOLLY: This took us by surprise. This was not how we thought this exercise would go.
MARY: This might be an iTunes thing that’s like over-writing the metadata fields…
KAYTLIN: So like it didn’t save my title, it didn’t save my artist, it didn’t save my genre, it didn’t save my description. And it put its own shit in there.
MARY: I have a feeling iTunes is over-writing metadata fields.
KAYTLIN: Well Apple is over-writing a lot in our lives. So that makes perfect sense.
MARY: Again, our fate in the hands of…
KAYTLIN: Some company making choices we don’t understand based on principles we don’t respect. It’s starting to make the kind of fatalism of comedians, of like I produce stuff. Maybe someone will save it. Maybe they won’t. I’ll do more drugs. Like make a little more sense to me, because there’s kind of a fatalism as that technology outpaces our individual ability to keep up with it.
MOLLY: This whole experience with iTunes over-writing metadata was weird. But it’s actually a great lesson. Whenever we upload our MP3 files to a podcast host, they get re-processed. Sometimes, that changes the file. And it’s possible that the reprocessing could change the metadata.
Once your podcast is out on the internet, it’s out there. We can’t guarantee that someone won’t process it or download it or strip the metadata.
But, no matter what, when you fill out ID3 tags, YOU will always have that information on your file on your OWN computer. It’ll give you context about your files years from now, when what you’ve created isn’t fresh in your mind.
And there’s a good chance it will stick with your file when other people download it. And I can guarantee you that if you don’t encode ID3 tags, then that information DEFINITELY won’t be there.
So these ID3 tags will come in handy first of all for you, and second of all for anyone who listens to your file outside of an app or streaming service. Just ask anyone who listens to MP3s the old-school way, by downloading them onto their computer or… iPod.
I know that ID3 tagging sounds time intensive. But it really only takes a few minutes. I’ve started doing it. In fact, I left a little surprise in the metadata for this episode. If you download the MP3 of this podcast episode from our website, preservethispodcast.org, just open it in VLC or iTunes, you’ll be able to see the surprise that I left there.
OK, so that’s a little bit about how to edit MP3 metadata. Now we’re going to get into the metadata stored in RSS feeds.
Think of your RSS feed like a cocoon of text that wraps around your MP3 file. This text includes information about what your MP3 file is and where it came from. That text is, of course, metadata. Then podcatchers, like Apple Podcasts and Stitcher and RadioPublic, use the metadata in your RSS feed to deliver your podcast to the people who subscribe to it.
Podcast RSS feeds need to follow specific metadata formats. If they don’t follow these formats, then podcatchers won’t be able to grab them. Which means people won’t subscribe and listen! So it’s very important stuff.
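As a rough picture of that cocoon of text, here is a minimal Python sketch that builds a bare-bones RSS 2.0 feed with one episode. It is a sketch under stated assumptions: the example.com URLs, the file size, and the description text are placeholders, the function name is made up, and the extra tags that podcatchers such as Apple Podcasts expect are left out for brevity.

```python
import xml.etree.ElementTree as ET

def build_feed(show_title, show_link, show_description, episodes):
    """Return a minimal RSS 2.0 document as a string (illustrative helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Show-level metadata.
    ET.SubElement(channel, "title").text = show_title
    ET.SubElement(channel, "link").text = show_link
    ET.SubElement(channel, "description").text = show_description
    for ep in episodes:
        # Episode-level metadata wraps around a pointer to the MP3 file.
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "description").text = ep["description"]
        ET.SubElement(item, "enclosure", url=ep["url"],
                      length=str(ep["bytes"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")

feed = build_feed(
    "Preserve This Podcast",
    "https://example.com/",                      # placeholder URL
    "A show about how to save our podcasts.",
    [{"title": "Metadata Meta-Awareness",
      "description": "How metadata helps with preservation and access.",
      "url": "https://example.com/ep4.mp3",      # placeholder URL
      "bytes": 12345}],                          # placeholder file size
)
print(feed)
```

The enclosure element is the part that points at the audio file itself. Everything else in the item is descriptive text about it.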
Apple Podcasts has been the one setting the metadata standards for RSS. That’s because it’s been the most popular podcatcher since the birth of podcasting – it still gets around 70% of the podcast listening market.
So everyone just follows their RSS feed standards. But you might be thinking to yourself, how does that metadata get there?
That’s where podcast hosting platforms come in. If you don’t self-host your podcast, you’re using a hosting platform. Most of them charge a monthly fee. Kaytlin uses Libsyn. She likes using it to get stats about who’s listening.
KAYTLIN: This is the Libsyn thing. So we can see we have a listener in Nigeria which is exciting because we’re doing an episode on sex work in colonial Nigeria, Uganda… Russia, we have 6 listeners which is very exciting. China, 3. Australia, 222. That’s cool, and then here in the U.S…
MOLLY: A really important thing that these hosts do is fill in the RSS metadata fields for you. This happens when you upload your file.
So, where does the host get the data? I mean, Libsyn doesn’t know what’s inside Kaytlin’s audio files.
Well, Kaytlin gives that information to them.
This is important. When you upload your file to a podcast host – this is the second place that you’re touching metadata. The first place was the ID3 tags, so in the MP3 file itself. The second place is in the RSS feed, when you do the upload.
Every time we upload a podcast episode, we’re required to fill out some information. You should fill out all the fields that your host lets you – even the optional ones, like keywords and copyright information.
The most important fields to focus on here are the episode title and the episode description. Google’s search engine has started scraping podcast episode descriptions to deliver its search results. Which means if you have a good episode description, it makes it a lot more likely that your podcast will come up when someone searches for something related to it.
Right now, search algorithms can read text. But they can’t read audio. That means if you want your audio files to come up in search results, there needs to be text attached to the audio file. The text is attached to the file as metadata. That’s why the text you enter in when you upload your file is so important for discoverability. It’s the only place that you can control whether or not your podcast pops up in search results.
Include ANY words that you think people might type into Google to look for content like yours.
Think of writing a good description like sending off your podcast with a kiss. Bye podcast, have a fun trip on the internet. Make a lot of friends.
They grow up so fast!
The Preserve This Podcast team basically had one goal when it comes to metadata: we want to empower podcasters to edit their own metadata. And show them that they can use this metadata to make their files accessible, now and in the future. The power of metadata is that it lets you self-describe your content. When software systems auto-generate that content, they’re taking away some of that power from you.
In general, everything on the web is moving in the direction of auto-generated metadata that’s created by machine learning. Imagine if a recommendation algorithm, powered by machine learning, started tagging Kaytlin’s podcast. What kind of keywords would it come up with? Would it represent sex workers as criminals? Would it flag Kaytlin’s use of the word “whore” as a derogatory term, even though Kaytlin uses it affectionately?
I personally feel more comfortable with podcasters designating their own keywords.
Mary and I sat down to talk about this conundrum.
MARY: You realize that metadata was not put into like a file structure for the purpose of preservation and archiving. It was put there for other reasons. And so the fact that you can’t like toggle a button that says like, OK lock this metadata down, do not override this metadata. You have that option with other formats. For example, a floppy disk has this little tab you can move over with your finger and that will lock it down. Like if you put it into a drive that drive can’t override any data in that disk. You can’t do that with metadata, and that’s a fundamental flaw.
MOLLY: Right now so much happens on the back end. We don’t really feel confident telling you that your metadata will stick with your file as it goes through all these software systems.
But we came to a conclusion that there is another metadata tool out there. It’s a really good way to make your podcast more accessible. And it’s something that has nothing to do with tags in MP3 files or in RSS feeds. I’m talking about transcripts.
MARY: And transcriptions don’t live in any sort of like prescribed ID3 field or you know, metadata field in like a WAV file. They kind of sit outside of the file. Maybe one day there will be a place to put it in the file, but I don’t know if I would trust that. As long as it’s kind of like sitting on its own island and you can locate it very easily, then that is a lot more reliable than these sort of like fickle fields.
MOLLY: Transcripts could be the answer to our podcast metadata problem.
Transcripts are metadata. They provide machine-readable information about an audio file. They are information you provide, about your files, that will get scraped by search engines and deliver your podcast in search results.
They’re not attached directly to the file, like most metadata is. Which leaves it up to podcasters to link the audio file and the transcript together.
Most people do this by posting the file and the transcript together on their website. That’s what we do.
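One simple way to keep an audio file and its transcript linked on your own computer is to give them the same name with different extensions. The Python sketch below assumes that naming convention, which is an assumption for illustration and not something described in this episode. The function name and the file names are hypothetical.

```python
from pathlib import PurePosixPath

def pair_transcripts(filenames):
    """Map each MP3 name to its same-name .txt transcript, or None if missing."""
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix.lower() == ".mp3":
            # Look for a sibling file with the same stem and a .txt extension.
            candidate = str(path.with_suffix(".txt"))
            pairs[name] = candidate if candidate in names else None
    return pairs

# Hypothetical folder listing: episode 4 has a transcript, episode 3 does not.
listing = ["ep03.mp3", "ep04.mp3", "ep04.txt"]
for audio, transcript in pair_transcripts(listing).items():
    print(audio, "->", transcript or "no transcript yet")
```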
Having more transcripts will make podcasts more accessible. It will make them accessible to search engines, to screen readers, and to people speaking English as a second language.
Which seems like a good thing. What podcaster doesn’t want more people to listen to their show?
Well, for Kaytlin, it actually gets complicated. She wants people to find her show, but she’s nervous about attaching transcripts.
The transcripts that would make her files more discoverable to listeners would also make them more discoverable to people who might want to censor her work.
That’s a real possibility now. Because of a law called SESTA-FOSTA, which is cracking down on sex-work related media on the web.
KAYTLIN: The passage of SESTA-FOSTA provides the legal cover for the like seizure, erasure, locking out of a lot of this stuff. And so, I mean, there is a way to interpret through the law as it is broadly written that this podcast is quote unquote promoting prostitution, and that could justify a legal seizure and erasure of my work.
MOLLY: This whole episode, we’ve been treating discoverability as something that every podcaster wants. As an unqualified good thing. But we hadn’t thought about transcripts being used as tools of surveillance and policing – like they could be in Kaytlin’s case.
Kaytlin’s aware of the potential consequences. But she’s still thinking about posting transcripts. If anything, SESTA-FOSTA has just made her feel more determined to preserve her podcast.
KAYTLIN: It provides a sense of urgency behind trying to capture as much of this history as I can and trying to disseminate it as much as possible, because although websites can be seized, and audio files can be erased, you can’t take information out of people’s heads. So the more people that are listening to the podcasts, the more people that know these stories, then I imagine sort of playing a game of multi-generational telephone with these stories.
MOLLY: Podcast technology and search algorithms and speech-to-text software are all changing quickly. They’ll determine how podcasts get discovered in the future. We as individual creators don’t have much control over how that stuff works. We don’t even have that much information about them.
But what WE do know, as individual creators, is what the podcasts we’re making are all about. We should take every opportunity we have to self-describe our content. That means taking the time to edit ID3 tags, write good episode descriptions, and make transcripts.
I, for one, hope that I can still find today’s podcasts years from now. I can totally see future Molly listening to Kaytlin wax poetic about 16th century sex workers.
KAYTLIN: My patron saint is Veronica Franco. So Veronica Franco is one of the first published women in the western canon as a poet, she was a courtesan in the 16th century Venice…
MOLLY: If you would like to preserve your podcast for future listeners, visit our website at preservethispodcast.org. Subscribe to this podcast wherever you get your podcasts. And please take some time to rate and review our show.
Preserve This Podcast is made possible through generous funding from the Andrew W. Mellon Foundation. It’s produced by me, Molly Schwartz, at the Metropolitan New York Library Council. There’s a whole team of people that make this podcast happen. There’s our project leads Mary Kidd and Dana Gerber-Margie. Sarah Nguyen is our project coordinator. Allison Behringer is our story editor. Breakmaster Cylinder composed the theme music. Dalton Harts did the mixing and mastering and the music in the outro. The music in this episode is by Breakmaster Cylinder and Bluedot Sessions. We’ll be back in a couple of weeks with episode five. And we’re getting into… RSS.