Eight ways the media will mess up reporting on 2021 crime data
Last year the FBI reported that its estimated national murder rate increased from about 0.0051% of the population being murdered in 2019 to about 0.0065% in 2020. This was front-page news at nearly every major news organization, with almost all running a banner headline about murder increasing by around 30%, or about this being the single largest year-to-year increase on record. They would usually talk about rates per 100,000 rather than the percent-of-population number I’m using, though these are identical except for multiplying by a constant. Using my numbers, homicides barely budged from basically 0% of the population to, again, basically 0% of the population. Using what nearly everyone else uses - the 30% change and the rate per 100,000 - makes the change seem far more dramatic. So who’s right? We both are. And we’re both wrong.
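The percent-of-population and per-100,000 presentations are the same quantity scaled by a constant, which a few lines make concrete (a minimal sketch using the estimated rates quoted above):

```python
def percent_to_per_100k(percent_of_population: float) -> float:
    """Convert a percent-of-population rate to a rate per 100,000 people."""
    # percent / 100 gives the fraction of the population; * 100,000 rescales it.
    return percent_of_population / 100 * 100_000

rate_2019 = percent_to_per_100k(0.0051)  # about 5.1 murders per 100k
rate_2020 = percent_to_per_100k(0.0065)  # about 6.5 murders per 100k

# The same change reads as "basically 0% to basically 0%" in percent terms,
# but as a jump of nearly 30% in rate terms.
print(rate_2019, rate_2020, (rate_2020 - rate_2019) / rate_2019)
```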
In a few weeks, the FBI will release its 2021 data, and with it will come a wave of news reports. So, in this post, I’ll talk about how the media will mess up this reporting. To be specific, I’m talking here about national media organizations rather than local news. Local news tends to focus on crime in its local area, which is appropriate; national news organizations nationalize everything, which is highly inappropriate when talking about crime data. And major media organizations have the resources to solve some of the issues I’ll discuss, so I believe they’re more responsible for the problems in their reporting than local news. They also have a much broader reach, so any mistakes they make reach a far larger audience.
So what are these problems in the reporting? Usually, this isn’t intentionally downplaying or manipulating results like in my percent of the population example. Instead, decisions may seem reasonable but are still misleading or problematic in presenting the data.
Crime data is complex enough that even academics who supposedly care about being right when talking about crime data are often incorrect in their papers, so it makes sense that reporters will also make mistakes. So these problems may not be intentional - and often are motivated by reasonable decisions and incentives, as I’ll discuss - but the outcome of misleading readers is the same. I focus on the media since they’re the group responsible for how the vast majority of people - I’d say 99% or more - get their information about crime trends. The FBI may put out the data, but nearly everyone will read the news articles about it rather than look at the FBI’s data or reports themselves.
Before I get into how the media will mess up their reporting on this data, let’s take a step back and talk about what the data is. Since at least 1995, the FBI has released an annual report detailing crime in the US, called (aptly) Crime in the United States. They do this by releasing many tables with information about crimes, arrests, and data on police. They get this information from the approximately 18,000 police agencies in the US and compile it together for national and subnational (e.g., region, state) counts/rates of crimes and arrests. Since not all agencies report data (and some report only partial data such as reporting six months of data), some (but not all) of these tables estimate the missing data. They provide a large number of tables, getting into pretty good detail on the data. Still, the table that 90% of media reports care about is Table 1, which has national crime counts and rates of the seven crimes the FBI calls its “Index crimes.” And they don’t only release these tables; they also release the raw data which is used to generate these tables. And this raw data is far superior to the tables as it has information on every agency that reported and at a far more granular level of detail for crimes and arrests. It’s hardly an exaggeration to say that using only the tables gives you 1% of the information in the raw data.
In this post, I’ll be using language referring to what the reporter does. This is to keep my writing more straightforward and because, ultimately, the reporter is primarily responsible for what they write. But to be clear, these problems also stem from editors not having high enough standards or not providing enough time or resources to reporters to do these analyses well. And these reports are incredibly consequential since they are, I believe, the way that most people in the US learn about crime trends. Given how political crime trends are, it’s important not to mess up the reporting.
As an aside, 2021 data is the first year where the FBI has only accepted data in the National Incident-Based Reporting System (NIBRS) format and not the older, more widely used Uniform Crime Reporting (UCR) Program format. I’ll discuss more about NIBRS throughout this post, especially in the estimating data section. The fundamental fact is that NIBRS is far more detailed than UCR data but has fewer agencies reporting data. Since this is the first year with only NIBRS data, I expect that every single article will talk about how this is “new” data. This is incorrect. The first year that we have NIBRS data is 1991. Agencies could submit NIBRS data for 30 years. And some agencies have been reporting data using NIBRS for decades. If you’re writing as if NIBRS is some brand-new system that the FBI imposed suddenly, you’ve messed up before even mentioning anything about the data beyond its source.
So, here are the eight ways I think the media will mess up reporting on the FBI’s 2021 crime data.
Talking only about Index crimes (also called Part 1 crimes or “serious crimes,” or “major crimes”)
Using only the tables, not the raw data
Nationalizing crime data
Talking about counts, not rates
Explaining the findings
Not talking about how the FBI estimates missing data
Treating this data as crime data rather than as police data
Acting as if only the most recent year exists
Talking only about Index crimes (also called Part 1 crimes or “serious crimes,” or “major crimes”)
Open up any article about the FBI’s latest data release, and the first thing you’ll probably see is the change in murders from the past year. Next, you’ll see how overall crime - sometimes broken down by violent and property - changed. For example, here are articles in the New York Times, the Washington Post, and the Wall Street Journal, respectively, about the 2020 crime data release.
What are these “major”, “violent”, and “property” crimes? You may think the latter two categories are just the sum of all violent or property crimes, and that the “major” category is mostly violent crimes. That is a reasonable, but wrong, assumption. If someone punches you in the face, is that violent? If you’re a victim of fraud and lose thousands of dollars, are you a victim of a property crime? In both cases, these news organizations would say no. That’s because they go by what the FBI calls “Index crimes”, which are sometimes called “Part 1” crimes and are described as “serious”, “major”, or “overall” crimes in many news reports.
These are actually just a set of seven crimes chosen by the FBI in the late 1920s because they were considered serious and well reported: murder, aggravated assault, robbery, and rape as violent crimes, and theft, motor vehicle theft, and burglary as property crimes. To get the combined “major” crime category, people just sum up these seven crimes, effectively weighting each one the same. And doing this is mostly just measuring theft. In 2020 the FBI estimated there to be about 7.7 million Index crimes, with 4.6 million of them - or about 60% - being theft. When you think of “major” or “serious” or even “overall” crimes, do you think of a measure that is mostly theft? I don’t. Yet that’s what this measure is doing.

Now, reporters (and even some academics) can be forgiven for thinking that these seven crimes are all that’s available. The FBI certainly doesn’t make it easy to find other crimes. In their table showing national data, for example, only these seven crimes are available. The same is true on their official Crime Data Explorer (CDE) website’s page on crimes, though that page also includes arson.
To get simple assault data (i.e., getting punched in the face) you’d need to download the raw data and look at it yourself - and this raw data doesn’t estimate missing data. In the FBI’s older Uniform Crime Reporting (UCR) Program data, simple assault was the only crime other than the seven Index crimes available for measuring crime. But since it wasn’t included in the FBI’s reports (for crimes, though it was included for arrests) it almost always gets excluded. But how much does it actually matter?

Not including simple assault as a violent crime undercounts violence by a whole lot. In 2020, according to the UCR data, there were 1,210,712 violent crimes (murders, aggravated assaults, robberies, and rapes) and 2,429,376 simple assaults.
That is, there were almost exactly twice as many simple assaults as the four crimes included in the violent crime category. Include simple assault and you immediately triple the violent crime count/rate. So it makes an enormous difference. There are certainly extra steps involved in including simple assault, but it can make a monumental change in how the data is presented and in our understanding of crime levels/rates.

Simple assault isn’t the only other crime worth including, and it’s not even the only violent crime excluded. Intimidation and kidnapping are two examples of other important violent crimes (with intimidation also being quite common).
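As a quick check of the arithmetic, using the 2020 UCR figures above:

```python
index_violent = 1_210_712   # murders, aggravated assaults, robberies, rapes (2020 UCR)
simple_assault = 2_429_376  # simple assaults in the same 2020 UCR data

# Simple assaults are about twice the four Index violent crimes combined,
# so adding them roughly triples the "violent crime" total.
ratio = simple_assault / index_violent
total_multiplier = (index_violent + simple_assault) / index_violent

print(round(ratio, 2))             # ≈ 2.01
print(round(total_multiplier, 2))  # ≈ 3.01
```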
But since simple assault is an extremely common crime, and is included in the exact same dataset as the Index crimes when using UCR data, it’s a crime that should absolutely be included in reports about crime data. That the FBI doesn’t make it very easy - by not including it in their annual report or website - is no excuse, but I’ll have more on that in the next section.

At the absolute least, reporters need to talk about individual crimes other than just murder and define their crime categories. It is insufficient to talk about “violent” or “property” or “major” crimes without telling readers exactly which crimes are included - and which are excluded. Just copying the definitions that the FBI uses - which is what nearly all reporters and academics and really everyone does - is insufficient and beneath the standard of quality reporting. Most news articles never define their categories, never talk about crimes other than murder, or do so only at the very end of the article. For example, the New York Times article linked above does talk about what is included in the “major” crimes category, saying that “Murders tend to have the most devastating impact of all crimes and to attract the most attention, but they actually constitute a small percentage of major crimes, a classification that includes rape, armed assault, robbery, and car thefts.” This is technically correct in the same way that saying “The United States is a country that includes Florida, Maine, and Wyoming” is technically correct. You’re not wrong, but you’re missing the most important states - or, here, the most common Index crimes like theft and burglary.
I include this as the first way to mess up reporting because it is so incredibly easy to do right. Explain exactly what crimes are included, and what’s missing, and talk about the crimes without grouping them into arbitrary categories. Defining things is not hard. Talking about crimes other than murder isn’t hard, especially since the FBI has tables on all the other Index crimes. The hurdle is so low and still so many people hit it.
You might say that I’m being unfair because journalists have strict word limits (or time limits for those on TV) so can’t get into the gritty technical details. My response is that if you don’t have enough words to define what you’re talking about - literally just listing all the crimes - then you shouldn’t publish that article at all. And if the editors don’t allow the definition to be included, that’s a failure on their part. Likewise, you shouldn’t be grouping together crimes in categories with such a wide range in severity - theft does not belong in the same category as rape or murder. Talk about crime trends separately. If you need extra space in the column, cut the quote from the talking head. And let’s not pretend that these space limitations are completely immutable. They’re created by people who make decisions about what is important. When a news organization (or specifically, the editors and leadership of these organizations) makes the decision to be vague or leave terms undefined or not talk about specific offenses, these are all decisions that show what’s important to them. All of the major media organizations have now spent tens if not hundreds of thousands of words about Queen Elizabeth’s death. Surely they can spare a few hundred to report crime data properly.
To make this a bit simpler, consider if the news reported that someone had been arrested for “a crime.” Your first question would be “What crime?” “Crime” is so incredibly vague that it’s useless. Does “crime” mean you shoplifted or that you massacred an orphanage? Reporters need to be specific.
Using only the tables, not the raw data
If I gave you a smartphone and you only used it like a corded phone (even down to only using it when it was plugged in!) you’d probably consider that a huge waste of potential. If I gave you a cookbook of your favorite recipes and all you did was rely on the photo on the cover, you’d also consider that a waste. A similar kind of waste is going to come out of reporting on the FBI’s 2021 data. The FBI’s release every year is a series of tables (think, Excel files) with information about crime and arrest data.
These tables usually include national and state estimates as well as data on individual city-level agencies. They cover crimes, arrests, police employees, and breakdowns of certain crimes such as homicides by victim demographics. It’s not entirely clear which tables will be available for 2021, but we can expect fewer than normal. To this I say: if you rely only on the tables, you’re ignoring 99% of the data.

The power of NIBRS data - the beauty of this data - is that you can dive very deep into every single crime incident. If you want to know, for example, how many Asian women were seriously injured in a domestic violence assault that occurred outside their home, you can get that information using this data. But you can’t get anything close to that if you only use the FBI’s report. This isn’t dinging the FBI; there’s a serious limit to how much you can present in tables without ending up with hundreds or thousands (or hundreds of thousands) of them. And they release the raw data allowing for this full dive anyway. As far as I’m aware, the FBI will be releasing the raw data at the same time as their full report.
Even pretty simple questions quickly go beyond what’s available in the normal FBI report. For example, the FBI’s 2019 report had numbers of homicide victims by demographic (age, gender, race) but no interactions. If you wanted to know how many women were killed, you could get that. The number of victims older than 24 was also available. Want to know the number of women older than 24 who were murdered? You’re out of luck. You need to download the data and examine it yourself to answer that question.

Handling this data is not trivial. NIBRS data is pretty tough. But when this data is available, and when it’s absolutely crucial to answering even simple questions (how many women older than 24 were murdered?), it is necessary for news organizations to use it to answer those questions. You may say that I’m being unfair, that journalists shouldn’t be expected to be data scientists, especially since NIBRS data is complex to use. I agree; this really isn’t the responsibility of the journalist. But it is the responsibility of their organization. News organizations should employ data scientists to deal with complex data - NIBRS and otherwise - or work with consultants (paid or unpaid) for stories like the crime data release. They certainly do this for some projects, and I’m aware of at least a few reporters, such as Andrew Ba Tran at the Washington Post and Weihua Li at the Marshall Project, who already have the data skills to work with this data. So there is precedent for this, and major news organizations definitely have the money for it.
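The kind of interaction question described above becomes a simple filter once you have victim-level records. A sketch with made-up toy records (the field names here are illustrative, not the actual NIBRS segment layout):

```python
# Toy victim-level records; real NIBRS data spreads these fields across
# coded incident and victim segments, but the logic is the same.
victims = [
    {"offense": "murder", "sex": "F", "age": 31},
    {"offense": "murder", "sex": "F", "age": 19},
    {"offense": "murder", "sex": "M", "age": 45},
    {"offense": "robbery", "sex": "F", "age": 52},
]

# "How many women older than 24 were murdered?" -- unanswerable from the
# published tables, trivial with the raw data.
women_over_24_murdered = sum(
    1 for v in victims
    if v["offense"] == "murder" and v["sex"] == "F" and v["age"] > 24
)
print(women_over_24_murdered)  # 1
```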
This is also a relatively easy task since NIBRS data comes out at around the same time every year and in an identical format each year.

So all the code written to make graphs or tables for previous years (i.e., 1991-2020, the years currently available) will work on the 2021 data. Other than checking the new data, there doesn’t have to be any rush. Write and test the code using old data, rerun it when the 2021 data comes out, and you’ll have results as fast as your computer can process them.

I’m also pretty baffled at why any news organization employs people who mostly (or entirely) just quote from a government report for their story. Just trusting the government report (or anyone - I’ve seen a lot of articles about academic research that just quote the authors of the paper, which is just as bad) is not journalism. This isn’t because I think the FBI is releasing maliciously bad data or manipulating what they release. It’s because all data and all research are flawed - sometimes by a little, sometimes by a lot. Any reporter worth their salt will do more than just trust that what someone says is true. And even if the reports were perfect, they’re only a tiny subset of the data available. This was true with the UCR data and is especially true with NIBRS data. My advice to these organizations: fire any reporter who stops their investigation at the report, and save your money, since all they’re going to do is parrot the government. I’m not trying to be rude, but when there is the incredible wealth of information that is NIBRS data (and this is true even with how much data is missing) and all you do is rely on a few dozen numbers from an official report, how do you justify that? It’s lazy. It’s a waste of excellent data and a waste of the talent that I believe a lot of reporters have.
If I can build a website to do a lot of this deeper dive than the tables (though still more shallow than the raw data) over several days when I was bored, without any money or help, major news organizations with annual profits in the tens or hundreds of millions can do the same. I don’t say this to brag about my site - in fact, I think it has a lot of limitations. I say it because I know the potential for these organizations to create great tools that let normal people - people who don’t have the time or skills to analyze the data directly - try to understand the data and what it means for their (and their loved ones’) safety. Every year I hope some organization will create a tool like this. And every year I’m disappointed. If I can do it, so can well-resourced organizations. Or if they can’t, they can hire someone who can.
“But wait!” you may say, “you forgot that the FBI has a website allowing this kind of deep analysis of the data. Your own site isn’t the only NIBRS site out there. Stop lying!” Yes, this is true. The FBI does have an official site that shows time-series graphs (and includes downloadable tables for each graph) which is similar to my own site. The main difference between my site and the FBI’s site is in how much of the data is available. On the FBI’s site, data is only available since 1985, which is incomplete since UCR data is available as raw data since 1960.
And take a look at which crimes are available: only the seven Index crimes. This isn’t even complete for the more limited UCR data, as the raw data has simple assault and breaks down certain crimes - robbery and aggravated assault by weapon used, rape by attempted or completed. There’s no reason to limit what’s available. My own site, with a budget of $0 and created only by myself, has all agencies, all years (with monthly data available), for all crimes.

One concern with the FBI’s site is how it handles missing months of data. Or, to be more specific, that it doesn’t. Consider Philadelphia. According to the Philadelphia Police Department, 2019 and 2020 continued the trend of past years (though with far sharper increases than before) of rising murder counts, reaching 356 and 499 murders, respectively. So things were getting worse before 2019 and kept getting worse through 2019 and beyond.
Below, I show how the Crime Data Explorer displays murder counts in Philadelphia. I use all the default settings and choose the Philadelphia Police Department as the agency and murder as the outcome. According to this graph, things got worse starting in about 2013 or 2014 and then much better in 2019 and 2020: 2019 shows 265 murders while 2020 shows 201. Since reporting to the FBI is voluntary, some agencies don’t report at all, and others do so partially. In 2019 and 2020 Philadelphia reported only partial data, which is why the FBI records such low murder counts in those years. The obvious solution is to exclude years with only partial reporting, so that comparisons over years are apples-to-apples, or to indicate clearly which years have only partial data. This site does neither. As far as I am aware, there’s no place on the site that indicates which years have fewer than 12 months of data. That information is available if you look at the raw data directly - another reason to do so. This isn’t too concerning, since agencies tend to report either all 12 months or zero, but it is still a problem, and as seen with Philadelphia it can affect very large agencies.
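The apples-to-apples fix is straightforward once you count how many months each agency reported, which the raw data lets you do. A sketch with hypothetical agency-year records (the numbers and agency names are made up for illustration):

```python
# Hypothetical agency-year records; from the raw UCR/NIBRS data you can
# count the months each agency actually submitted.
agency_years = [
    {"agency": "Agency A", "year": 2019, "months_reported": 12, "murders": 40},
    {"agency": "Agency A", "year": 2020, "months_reported": 7,  "murders": 25},
    {"agency": "Agency B", "year": 2020, "months_reported": 12, "murders": 3},
]

# Keep only agency-years with all 12 months reported, so year-to-year
# comparisons aren't distorted by partial reporting.
complete = [r for r in agency_years if r["months_reported"] == 12]
print([(r["agency"], r["year"]) for r in complete])  # Agency A 2019, Agency B 2020
```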
My own site keeps only agencies that report 12 months of the year, avoiding this issue.
It also includes monthly data as an added check. My own site is, I think, a better source for agency-level monthly and annual NIBRS data than the Crime Data Explorer. But it’s also only a fraction of what’s fully available in the raw data. So it can be useful, but it’s not a replacement for looking at the data itself. Using either my own site or the FBI’s site is far better than relying only on the tables released in the report. And news organizations should make their own data tools that focus on the parts of the data they think are most important or relevant to their readers.

There’s really no good excuse to use only the tables the FBI includes in the reports and not their actual data. One potential argument is that these tables have national or state-level data, and that’s what’s really important to report. As I’ll explain in the next section, that’s a worse argument than you may think.
Nationalizing crime data
Crime is local. Policing is local. Even seemingly national trends like Covid are actually local - pick any day or week and Covid will be affecting some places much harder than others; it is not the same across the country.
So when you talk only about national crime trends, you’re really misrepresenting the data - or, to be more generous, giving only a small picture of what’s going on. Given how local crime is, national rates are almost never the same as local rates. For example, the national murder rate in 2020 was about 6.5 murders per 100,000. In Philadelphia it was about 32; in New York City it was about 5.6. So even in large cities, rates can be vastly different from national ones (and 5.6 is about 14% smaller than 6.5). Even trends differ over time. Just look at Philadelphia, which has seen increasing violent crime in recent years (even before 2020) while much of the country has had a decreasing trend.

I get the motivation behind talking about national crime numbers. For major news organizations, the audience is national, and readers expect national information. So by all means report these national numbers. Just don’t expect them to mean very much. And don’t stop at the national counts. You need to look further. News articles have word limits, so obviously you can’t just list out crime data for a bunch of cities - though you can make data tools that do.
But it is possible - even with all the missing data (which I talk about more below) - to talk about smaller areas than the entire country. Look at certain types of agencies (e.g., small agencies, university police departments) or show crimes by victim, offender, or incident characteristics. The more detailed you get, the better. Getting more detailed comes with the caveat that what you’re looking at may not be representative outside your limited scope of inquiry. For example, crime in small towns is probably not the same as in big cities. That’s okay. Not everything needs to be representative. Looking only nationally is a disservice to readers who care about crime in their community.

Let me give an example using Covid numbers. The New York Times (and every other major news organization) has an interactive graphic that lets users see current and historical Covid information (case counts, hospitalizations, deaths) in their county. Since it gives county-level information, readers get a much better understanding of their personal risk, because what really matters is what’s going on where they live. If they’re deciding whether to wear a mask, the case counts in their county matter a lot more - and are probably very different - than whatever is happening nationally. The same concept applies to crime. Crime, like Covid, is local, so we need local information to understand what’s going on and to make good decisions.
Talking about counts, not rates
The CDC estimates that the Spanish Flu (also known as the 1918 flu pandemic) killed about 675,000 people in the United States. According to a New York Times tracker, Covid has killed a little over one million people in the US. So clearly, Covid is almost 1.5 times as deadly as the Spanish Flu. The extremely obvious problem with this analysis is that there are a lot more people in the US today than in 1918. Controlling for population, Covid is only about half as deadly as the Spanish Flu (when using a super simple deaths/population measure). So if someone told you that Covid is worse than the Spanish Flu because it killed more people your likely response would be: “Counts are dumb, you need to talk about rates”.
This is the right response and is one that you should also have when talking about crime counts. And it’s pretty common for articles to use counts. For example, the New York Times article about the 2020 data release leads by saying that “there were an additional 4,901 homicides in 2020 compared with the year before, the largest leap since national records started in 1960”. Besides the fact that in 1960 there were only about 7,500 agencies (out of ~18k) reporting, using counts completely ignores the fact that populations change. The number of murders can increase every year and cities would still be safer if the population increased faster - though population alone is a poor denominator for making a crime rate, as I’ll discuss below.
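The count-to-rate conversion is trivial, which makes leading with raw counts hard to justify. A minimal sketch with illustrative numbers:

```python
def rate_per_100k(count: int, population: int) -> float:
    """Crime rate per 100,000 residents."""
    return count / population * 100_000

# The same count is a very different rate depending on population: a city
# whose population grows faster than its crime counts is getting safer.
print(rate_per_100k(100, 500_000))    # 20 per 100k
print(rate_per_100k(130, 1_300_000))  # 10 per 100k, despite more crimes
```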
Let’s look at an example of how using counts versus rates gives very different understandings of what’s happening to murder in a city. The difference between counts and rates doesn’t matter as much when looking at years close together, such as the previous year or two, but grows when comparing years a decade or more apart, since the population can be drastically different. And of course, population here means people who reside in the jurisdiction, even though some places - Las Vegas, any major city, vacation spots - have considerably more people who spend time there than live there.
In the below graph, we have murder counts in Las Vegas. There are data available since 1960 and the consistent story across the 60 years of data is that murder is getting worse. There are certain periods of decline but overall the number of murders is increasing. We started with the number of murders being in the single digits in the 1960s and ended with over 100 murders a year on average in recent years.
Vegas seems, indisputably, to be a dangerous city that’s gotten worse for decades. Now let’s look at the murder rate per 100k population, as shown below. It’s a very different story. The murder rate increased until the early 1980s and then had a fairly consistent decline over the remaining years of data. Even 2020, which had about 14% more murders than 2019, has one of the lowest rates on record. With a murder rate below 6 per 100k people, Las Vegas ranks as one of the safest big cities in the US - at least when looking at rates per total city population.
Rates are certainly better than counts because they adjust for the fact that crime counts are largely a function of population.
But even rates are a pretty imprecise measure because crime affects certain groups differently than others. For example, sexual offenses disproportionately affect women; murder disproportionately affects Black people; violent crime in general disproportionately affects men. Just using total population - while better than nothing - gives crime rates that are pretty far off for any particular victim group (e.g., groups broken down by race, by sex, by race and sex). This is similar to the issue with national data: it hides major differences between subgroups. Using a rate per 100k people basically assumes that crime is completely random and that there is no variation in the likelihood of victimization by group.

For example, according to NIBRS data, an astonishing 98.5% of rape victims in Denver in 2019 were women.
Since women are about half the population of Denver, the rape rate per 100k people - the typical way of measuring rates - is massively off the true rate for the likely victims. The rate for men is approximately zero; for women, it is double the traditional rate per 100k. One of the great things about NIBRS data is that you can dive deep into victim (and offender) characteristics and make these kinds of more precise measures of crime, adjusting for important traits. It’s certainly not perfect, especially if reporting differs by victim demographics - but it’s a whole lot better than the current rate per 100k people. Again, this is a reason to use the actual data.

Explaining the findings
The early 1990s were the high point of crime in the US and then saw a pretty steady decline that lasted until 2020. This happened for most crimes and in most places. Why did it happen - and for that matter why did it go up so much in the 1980s? That’s a matter of great debate in criminology and related fields. It’s led to a large number of papers on the topic - the most well-known is probably this paper by Steven Levitt - but hasn’t produced a consensus on exactly what the causes were.
Given that after decades of research and debate there’s still no consensus on the 1990s crime drop, you may think that reporters and researchers would be cautious about explaining current crime trends. But of course, they’re not.

To be clear, I’m not talking about explaining what is in the data or providing context about it. For example: descriptive information about how the data was generated (e.g., what data is available, how it was collected, whether missing data was estimated), or what is in it, such as breakdowns by victim demographics or incident characteristics (e.g., how commonly guns are used as a weapon now and historically, how often victims are injured). Explaining the “what” is good. Explaining how trends have changed is good. But explaining the “why” is generally quite bad.
Almost every article about this data release has quotes from “experts” who explain why crime has changed. These experts are generally researchers such as criminologists, but also members of advocacy groups or the leadership of police agencies (usually police chiefs or sheriffs).
There are really two ways that reporters talk about the “why” in their articles: 1) quote an expert for why crime is changing, and 2) say that some unnamed “researchers” say that XYZ caused the decline/increase. Let’s look at examples in turn.
The first way, quoting experts, is probably the most common. For example, here’s part of the Washington Post’s 2020 crime data reporting: “On Monday, gun control advocates said a large increase in first-time gun owners around the start of the pandemic probably played a significant role in the rise in shooting deaths. ‘We know having a gun in your home, having a gun in public, makes you less safe and more likely to be a victim and perpetrator of gun violence,’ said Ari Davis, a policy analyst at the Coalition to Stop Gun Violence”. Note first the lack of any citation for that claim (which, as I note below, may not be the fault of Mr. Davis). Even if we assume that the quote is correct - and my point here is only about backing up claims with evidence, not taking sides on any explanation - it’s incomplete. This is trying to explain the large increase in homicides in 2020 compared to 2019. It certainly cannot explain all of the change - and I am not saying that Mr. Davis was trying to - but as written, that’s what it appears to do. Even the sentence before it, which is not a quote, says that these unnamed “gun control advocates” argued that the increase in first-time gun owners “played a significant role in the rise in shooting deaths” [sic]. Did the increase in first-time gun owners cause an increase in murders? Potentially, to some degree, but there’s a huge range between explaining 0.01% of the increase and explaining all of it. Effect sizes matter!
Here’s the problem. There are a lot of causal relationships that affect crime. For example, when it is hotter than normal, violent crime goes up (even beyond the usual more-crime-in-summer, less-in-winter relationship). So if you ask me why murders went up in 2020 and I say that 2020 just had an atypically hot summer, I’d be completely right. And I would actually have a lot more evidence to back up my claim than other proposed causes have. 2020’s hot summer probably did cause an increase in murders and other crimes. But if I told you that the temperature played a “significant role” in the increase in gun murders, you’d probably ask me for evidence (or laugh in my face). This is the kind of skepticism you should have with everything. Claims without evidence - without direct evidence! - are not worth very much.
If an article lists three reasons why, for example, murders are increasing, how should readers think about that? To me, if you don’t explicitly say that these are only some of the possible explanations, those three will be read as the complete explanation for the murder increase. And even among those three, which one matters the most? How much of the increase does each explain? These are simple questions that are never answered in these articles. It’s almost impossible to properly measure how much one cause affects an outcome, which is all the more reason to include this uncertainty in the article rather than treating the proposed cause as a foregone conclusion. And let’s not pretend that incorrect or unsubstantiated claims are harmless. When a researcher is quoted in one of these major publications about why crime changed, that’s probably more impactful than the sum of their academic research. These claims will likely shape the crime debate and the policies used to address crime - and this is true even if they turn out to be wrong.
The second way that reporters explain crime trends is the worse one. Here’s an example from the Wall Street Journal about the 2020 data: “Some researchers blame frayed relations between law enforcement and Black communities after high-profile police killings, such as that of George Floyd in Minneapolis. Others point to the stress from the Covid-19 pandemic and the temporary shutdown of the court system”. The only link in that paragraph was to an article about George Floyd, not to any evidence supporting these claims. The “some researchers” part is the kicker. You can find “some researchers” to say anything, especially if they’re anonymous. In political science, for example, a tenured professor says that he has seen aliens, Atlantis, and the crucifixion of Jesus Christ. The American Society of Criminology has invited a professor to speak about Martians at their annual conference, again.
Eric Stewart is still a tenured professor at Florida State University even after he had six papers retracted (so far) because he made up data - and he was supported by an actual conspiracy by journal editors to ignore or downplay the claims of fraud (see here, here, and here). Credentials are not enough. You can find some anonymous researchers to say anything - assuming, of course, that the quote is based on interviews and not just reading some researchers’ tweets or skimming research article titles. There is an incredible amount of bad research on crime; you can’t just list factors that may be related to a crime trend without any evidence.

Now, these issues aren’t always the fault of the expert.
I’ve done several news interviews, and I may give a long explanation only to have the quote included in the article be a sentence or two. Explanations for crime are complex and usually come with many caveats; what’s included in news articles is usually confident and simple. And even when the expert talks about specific research, that research is almost never linked in the article. This would be an easy fix if reporters asked these experts to provide citations for their claims and then included links to those articles in the piece. The audience would probably not read the research articles - or understand them even if they did - but they should be included anyway. Claims need evidence, and an “expert’s” credentials are not evidence of anything. A claim without anything backing it up should be readily dismissed.

You may be wondering what evidence I want to see to support these claims. My standard is pretty low. Show me some research that demonstrates a causal relationship between the cause and effect you’re claiming.
For example, if you say that hotter summers lead to more murders, show me a paper with a causal design supporting that. Correlations are not enough. And then demonstrate - or at least argue - why the findings of this study apply to the current trend. Ideally, the research would be about this exact trend. It would use data from the year the FBI is reporting on and study the exact relationship you are arguing exists. You might say that it’s an impossible standard to expect research that requires data that isn’t released until just before the news article is published. To this, I say that if it’s impossible to do the research, then it’s also impossible to offer claims about the causes of crime trends. And also that you’re wrong.

How are you supposed to do research before the data is available? You can’t. At least not fully. But there is data available before the FBI releases its own. For example, Jeff Asher and Ben Horwitz have a data tool tracking in near-real-time the number of murders in over 90 US cities. The Council on Criminal Justice has a recent report on crime trends in 29 cities that provide “incident-level data in near real-time on their online portals”. There are data available. And in my experience (though mostly limited to larger agencies), police agencies are willing to give you recent incident-level data if you ask or FOIA them. Research is possible before the FBI releases its data. It will be a small and unrepresentative sample of the population, and it’s a lot more work to gather and clean than just waiting for the FBI data, but it is possible.
Treating this data as crime data, rather than as police data
I’ve talked about this data as “crime data” but a more specific term would be “police data” - or really “crimes reported to the police which the police report to the FBI”. There are some major distinctions between police data and ideal crime data. Ideal crime data would include every crime committed, with information about each incident. The data we get is what’s reported to the police and then passed along to the FBI, with large dropoffs of information at each stage. I’ll talk about the data not reported to the FBI in the next section, so here let’s talk about crimes not reported to the police at all.
Most crime is never reported to the police. In criminology, this is often referred to as the “dark figure” of crime. We know this because of the National Crime Victimization Survey, a survey that asks a nationally representative sample of people if they were the victim of a crime and, if so, whether they reported that crime to the police. The table below is from the Bureau of Justice Statistics report on the 2019 National Crime Victimization Survey, and the full report is available here. This table shows the percentage of victimizations, for a number of crimes, that were reported to the police.
More serious crimes are generally better reported, with violent crimes being more likely to be reported to the police than property crimes. There are also differences over time. Even though this table only shows two years, there are significant increases in reporting for intimate partner violence and significant decreases for robbery. There are also differences in reporting by victim demographics and location, though that’s not shown in this table. One notable thing about this table is that motor vehicle theft has only about an 80% reporting rate, even though many criminologists cite it as a reliable measure because it is so highly reported. 80% is certainly much better than other crimes, but missing 20% of the data is still a pretty big gap.

So how should the media talk about this under-reporting due to people not reporting crime to the police? You could take the NCVS numbers from 2019 or 2020 (or even 2021, which was recently released) and apply them to the national data. For example, the 2019 data says that 48.5% of burglaries were reported to the police. So you could just about double the number of burglaries the FBI says occurred and you’ll roughly account for non-reporting to the police. But this will be a very rough measure, especially since it’s an estimate used to adjust crime data that is itself an estimate. And it requires looking at only national or large subnational data, since NCVS doesn’t have local data (at least not in its public files, and even the private data is highly limited). I have a proposal for how to expand this data, available here, but that’s no solution for this year’s crime data. And there are differences in reporting by crime and victim characteristics. The power of NIBRS is getting into the details about these characteristics, so I think it’d be a disservice - and misleading - to apply broader NCVS reporting rates to small categories of NIBRS crimes.
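As a rough sketch, the adjustment is just dividing the reported count by the reporting rate. The FBI burglary count below is hypothetical; 48.5% is the 2019 NCVS reporting rate for burglary mentioned above.

```python
# Back-of-the-envelope adjustment for crimes never reported to police.
# The burglary count is hypothetical; 0.485 is the 2019 NCVS reporting
# rate for burglary. Both inputs are estimates, so the result is very
# approximate.

def adjust_for_nonreporting(reported_count, reporting_rate):
    """Scale a reported-crime count up to an estimated total."""
    return reported_count / reporting_rate

fbi_burglaries = 1_000_000  # hypothetical national count
estimated_total = adjust_for_nonreporting(fbi_burglaries, 0.485)
print(round(estimated_total))  # roughly double the reported count
```

Because 48.5% is close to half, the adjusted total comes out at roughly twice the reported count, which is all the "just double it" shortcut in the text amounts to.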
The best solution, I believe, is to just mention that the reported crime is only a subset, and sometimes a small one, of all crime.
And that readers should understand that what’s reported may not actually reflect true trends. For example, consider an agency that had 1,000 actual (that is, reported and non-reported) rapes in 2020. And we’ll assume that 34% of victims reported to police, which is the share according to NCVS in 2019 (rounded up from 33.9). So 1,000 actual rapes and 340 reported rapes. In 2021 let’s assume that rapes increased by 9% to 1,090 rapes and reporting decreased to 31%. That leaves us with 338 reported rapes, a small decrease in reported rapes even though actual rapes increased quite a bit.
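That arithmetic, worked through in code. The agency, the 9% increase, and the 31% rate are hypothetical numbers from the example above; 34% is the rounded 2019 NCVS reporting rate for rape.

```python
# Reported counts can fall even when actual crime rises, if the
# reporting rate drops. All numbers are the hypothetical ones from
# the example in the text.

actual_2020, rate_2020 = 1_000, 0.34     # 34%: rounded 2019 NCVS rate
actual_2021 = round(actual_2020 * 1.09)  # assume a 9% real increase
rate_2021 = 0.31                         # assume reporting drops to 31%

reported_2020 = round(actual_2020 * rate_2020)
reported_2021 = round(actual_2021 * rate_2021)
print(reported_2020, reported_2021)  # 340 338: a small apparent decline
```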
A related point is that since this is police data, we don’t have information about the other stages of the criminal justice system: courts and corrections. There’s one exception to this. Police may consider an incident cleared (essentially, whether they close the case or not) without an arrest being made if the prosecution declines to prosecute the case (though they still need to positively ID the suspect), so in these rare cases we do know something about prosecution. Otherwise, we have no information about what happens after the arrest is made. This is important because the police are only one actor in the criminal justice system. We need information about prosecutions and incarceration to really understand crime trends. If two police agencies each make arrests in 50% of cases, but one agency’s prosecutors convict 50% of those arrested while the other’s convict 100%, that’s important information to know. Of course, it alone won’t tell us everything about these arrests or how effective the criminal justice system is, but it’s an important piece of the puzzle.
This isn’t a call for the FBI to start collecting new data, but it is something that reporters should consider including in their articles. Police data is only part of crime data, and crime data is only part of criminal justice data. Especially in light of how much data will be missing this year (more on this in the next section), it’s important to note that we have a lot of police data but almost nothing from the courts/prosecutors, and little detailed data from prisons (though massively more than from prosecutors). And keep in mind that all FBI data is completely voluntary - at least nationally, though some states do require their agencies to report. How many prosecutor’s offices voluntarily release standardized, detailed data comparable to the FBI crime data? Almost none of them. Prison data is a bit weirder, since all prison agencies - or at least, I believe, the vast majority - release aggregate annual reports. But then you get seemingly random states giving a lot of information about prisoners, such as this site from Kentucky. Sure, it’s not as standardized as FBI data, but even the worst prison data is better than practically all court/prosecutor data.
Not talking about how the FBI estimates missing data
The report that the FBI will release will use an estimation method to deal with missing data. The FBI estimates data every year, but in 2021 it will use a new estimation method and will be estimating far more data than ever before. Since the FBI is now only accepting NIBRS data, and no longer accepting UCR data, far fewer agencies reported data.
The FBI has said that “Agencies contributing NIBRS data covered 65 percent of the United States population, compared to the 95 percent of the population covered by adding the converted data of NIBRS contributors to SRS data in prior years.” This is only partially correct, as it treats agencies reporting any amount of data as reporting agencies, even though partial reporters (e.g. those reporting fewer than 12 months of data) still need their missing data estimated. According to FBI data released by the Marshall Project, 2021 NIBRS data will have full data from agencies covering 52% of the population, partial data from agencies covering 12.6% of the population, and no data at all from agencies covering 35.3% of the population.

The method of estimating the missing data is a joint project between the FBI, the Bureau of Justice Statistics, and Research Triangle Institute (RTI) International, a research think tank. At the time of this writing, the FBI has not released detailed documentation about the procedure. What they have released so far, available here, provides only a vague overview of what they plan to do and has no information on how well the estimation procedure works. There’s also this RTI/BJS/FBI report that talks in more detail about exactly what variables will be estimated - but, again, it does not explain in any detail how the estimation process works or how good it is.
In this report, they do say that “A description of the data quality review process and any publication criteria applied to the estimates will be provided in conjunction with the release of the data”. So we should expect something about the estimation procedure when data is released, though it is unclear if this includes exactly how the estimation works or what the quality review process entails. Later in the report, they say that “Along with the methodology, documentation will be provided to help users understand the quality of the estimates and allow for the expansion of the set of indicators from which estimates are produced as more agencies transition to NIBRS”. So I think that this documentation is what they’re mentioning will be released alongside the data, but we’ll find out for sure when the data comes out.
Since they have yet to release their detailed documentation, I won’t offer my opinion on how good it is or how accurate their estimates will be. But let me be clear: their estimation will be wrong. Every estimate - no matter how large the sample is or how representative it is - will be wrong. That’s just a function of what estimates are. They try to be as close as possible to the true number but will inevitably be off by some amount. So it is essential that every news article talks about the fact that a lot of this data is estimated and how exactly that estimation works. And talk about how far off that estimation is from reality - any estimation documentation worth its salt will demonstrate how close the method comes when applied to data where the “true” number is known, such as using the method in past years. I have been personally assured that this is one of the validation methods that RTI used in developing their estimation procedure, so I expect to see it in the detailed documentation. If these kinds of validation exercises - these proofs that the estimation method works - are not available when the data is released, I recommend that you do not use any estimated numbers from the FBI’s report. Even taking their 65% of the population number at face value, estimating missing data for a non-random 35% of the population is a Herculean task. Extraordinary claims require extraordinary evidence.
This is absolutely critical - and something that is almost never mentioned in news articles about past FBI releases, even though every year does use estimation, though never as much as for 2021 (most recent years have data from agencies covering 90-95% of the population so the estimation never really matters).
This is where you, the reporter and editor, need to call in experts to talk about this. Forget the talking heads pretending to explain why crime changed. Understanding exactly what the data is and how accurate it is, is literally the first step to doing anything with it. Talk to some statisticians or others who can understand the methods behind the estimation process and explain how good (or bad) it is.

Estimating missing crime data is notoriously difficult. Many papers have been written on previous methods to impute missing crime data. The most famous is probably from Drs. Maltz and Targonski, who wrote that the county-level missing data estimation method developed by the National Archive of Criminal Justice Data (NACJD) “should not be used, especially in policy studies” because of all of the problems with the method.
Of course, past methods having problems does not mean that the current method is flawed. If anyone can accurately estimate missing crime data, I think it’s RTI. But even if this were an easy task - and all evidence says the opposite - when a large amount of data is missing (and agencies that report are certainly not a random sample of all agencies), the method of estimating that missingness is a fundamental part of reporting on that data.

Consider, for example, if the New York Times reported on a poll that said that only 40% of eligible voters would vote for President Biden in the 2024 election. They’d be absolutely remiss if they didn’t mention how the poll was conducted and who was surveyed. If it later turned out that the survey was a non-random sample of people in West Virginia (which Biden lost with only 30% of the vote in the 2020 general election) and the New York Times treated this as national data, readers would rightly be outraged. Even if the people who conducted the poll handled missing data so well that the 40% number was actually accurate nationwide - and even if the polling group was entirely transparent about what they did - it’d make the New York Times appear to be publishing bad data. This is an exaggeration, as the NIBRS data covers far more than a single state, but I think it’s an illustrative example. The NIBRS data is going to be flawed - that’s just a function of how much data is missing. Not being transparent about how the data was created, including how missingness was estimated, does a disservice to the data and to the news organizations themselves.
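The validation exercise described earlier - applying an estimation method to a past year where the true totals are known - can be sketched in miniature. Everything below is invented for illustration, and the per-capita imputation is far cruder than whatever RTI’s actual method does.

```python
# Toy validation of a missing-data estimation method: hide some known
# agencies, impute their murder counts from the reporters' per-capita
# rate, and compare the estimated total to the true total. All numbers
# are made up for illustration.

agencies = [  # (population, murders), all hypothetical
    (500_000, 60), (250_000, 20), (100_000, 5),
    (400_000, 30), (150_000, 12), (600_000, 45),
]
reporting, held_out = agencies[:4], agencies[4:]  # pretend 2 are missing

rate = sum(m for _, m in reporting) / sum(p for p, _ in reporting)
estimated = sum(m for _, m in reporting) + sum(p * rate for p, _ in held_out)
true_total = sum(m for _, m in agencies)

error_pct = (estimated - true_total) / true_total * 100
print(f"true={true_total} estimated={estimated:.0f} error={error_pct:+.1f}%")
```

Even in this tiny made-up example the estimate misses the truth by about 7%, which is the point: any honest documentation should report this kind of error from its own validation runs, and readers should know it before trusting estimated totals.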
I understand the incentive for news articles to be simple. To say that this is the number of crimes in 2021 and things are getting better/worse compared to last year. That’s not going to happen this year. We’re moving to a different, far more complex data source, with a massive amount of data being estimated. This will be the most un-simple year of crime data (so far!).
And if you report it like it’s simple, you’ve fucked up. You’ve failed at your primary goal as a reporter, which is to tell the truth.

I leave this as one of the later ways to mess up because the estimation is used for national and subnational (e.g. regions, states) estimates, which I’ve already said are flawed units when talking about crime. And when you use the raw data, nothing is estimated, so this problem is avoided entirely. But articles will inevitably use this estimated data - and probably only this data - so it is worth discussing.
The estimated data will also come with confidence intervals, but I have absolutely no advice on how to teach the average reader how to interpret confidence intervals properly. To all the reporters who will have to, you have my deepest sympathy.
Acting as only the most recent year exists
Most of the news reports use this data as follows: The FBI reported X murders in CURRENT-YEAR, a Y% change from PREVIOUS-YEAR. In most cases, year-over-year changes are not that helpful, as things tend to stay the same over a two-year period - though obviously some years, like 2020, are really weird. Crime can also fluctuate a fair amount over a single year without meaning anything. For example, below is a time-series graph showing murders in Detroit from 1960-2020. In 1991 there was a 5.6% increase from the previous year. Does this indicate anything meaningful? Potentially, but it misses the context that it followed a multi-year decline that started in 1988. Since we have data through 2020, we can also see that it was a single year of increased murder, followed by a continued decline in crime over the next two decades. Likewise, we can see that murders went up and down in the 1980s, but for most of that decade the trend was increasing crime, meaning that single-year comparisons were not very helpful.
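The same pattern shows up in a toy series (the counts below are invented, not Detroit’s): a single positive year-over-year change can sit inside a clear multi-year decline.

```python
# Hypothetical murder counts for nine consecutive years. One
# year-over-year change is positive even though the full-period
# trend is a steep decline - the context a single-year comparison misses.

murders = [600, 570, 540, 500, 470, 496, 460, 430, 400]

yoy_pct = [(curr - prev) / prev * 100
           for prev, curr in zip(murders, murders[1:])]
full_period_pct = (murders[-1] - murders[0]) / murders[0] * 100

print([round(p, 1) for p in yoy_pct])  # one ~+5.5% blip among declines
print(round(full_period_pct))          # -33: down a third overall
```

A reporter looking only at the blip year would write "murders up 5.5%"; a reporter with the full series would write "murders down a third over nine years, with one small interruption". Both numbers are correct; only one is informative.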
To really be useful, the articles need to compare this year’s crime data to more than just the previous year - or even the year before that, as I suspect many articles will compare 2021 to 2019 data. This means including a time-series graph showing multiple years of data to put the numbers in context. And some articles do include these kinds of graphs, which is great.
Though this invites another question: which years to include? Include data from the late 1980s or early 1990s and you include the peak of crime in the US historically, making any change in crime seem small.
Even the 30% increase in the murder rate in 2020 compared to 2019 (or really compared to any of the past several years) looks small compared to the national murder peak in the early 1990s. Include only the past decade - or even two decades - and it may seem to exaggerate any crime increase, as it starts from a much lower base rate. There’s really no good answer to this, but anything is better than just looking at the one or two years before.

The FBI should release data in October.
I generally hate the word “problematic” since it’s usually just used as a vague “you did something I don’t like” but I’ll try to be specific in this post.
“Index” is also sometimes spelled lowercase. Both ways of spelling are fine.
Data on arrests have more crime categories available, partly a function of how the data is collected.
This is certainly an undercount as not all agencies reported, and some agencies reported partial data.
Though if this trend is consistent over time, then changes in violent crime over time will not be affected - but of course, this is an empirical question and not one that can just be hand-waved away by assuming that everything is consistent over time or between places.
Neither intimidation nor kidnapping is available in UCR data, but both are available in NIBRS.
Arson is included in a separate UCR dataset.
Rape, aggravated assault (which includes more than just armed assault), robbery, and motor vehicle theft make up about 27% of Index crimes in 2020.
It’s actually kind of unclear exactly what national data will be available. An FBI press release says that no national tables will be available but the FBI, alongside the Bureau of Justice Statistics and Research Triangle Institute (RTI), is working to estimate national data. So I do believe that national data will be available in the normal FBI release.
I do not believe that city-level data is estimated.
As of a few years ago, you could download the raw data from the FBI’s Crime Data Explorer website. Before that, you had to request that they mail you a DVD with the data.
This is technically not true as some variables in NIBRS change over time, but this is quite rare. So in effect, data from any given year have the same variables and values as any other year.
NIBRS data is available since 1991 so years prior to this have zero data on the site.
To be clear, a budget of $0 means that I don’t spend any money to host or maintain the site and that I’ve never received any money for the site. I do spend $12 a year to purchase the domain name crimedatatool.com.
Though the way to measure the number of months reported is actually fairly tricky, especially when it comes to small agencies with few crimes.
This is also relevant for the dozens (hundreds) of Covid and crime papers that (very stupidly) treat Covid as a national trend. Just because Covid restrictions tended to start nationally at around the same time (mid-March 2020) doesn’t mean that behavior changed equally across the country. And behavior seemed, from my observations in Philadelphia and Texas, to be more a function of local case rates than any rules in place.
Or just use my site.
Researchers should also follow this and look at crime outcomes in places other than Chicago, New York, and one of the other largest cities.
Large cities will almost always have more crimes than small cities, even if they’re in fact safer than small cities when adjusting for population.
I say women but the more precise wording is females of all ages.
An economist having the most popular paper is typical.
Or at least the researchers willing to talk to reporters and give explanations aren’t.
To be fair, they have a 100% acceptance policy, so no standards at all.
And please note that I’m not saying that any proposed explanations are right or wrong. I really don’t care. I care about supporting claims with evidence. That should be the absolute lowest standard to explain anything in a news article.
It probably rarely is.
This is a pet peeve of mine because I prefer that researchers deal in fact and not guesses or hunches.
As an aside, if any major criminal justice funder wants to solve the problem of having to wait for FBI data, having funding to gather, clean, and standardize the data that is public is really low-hanging fruit. I think people don’t do it because it is such a hassle to do and there’s no real reward to doing so.
Note the single-year comparison.
I don’t want to get into the fight about who is responsible for how much missing data there is - and people seem to mostly put the blame on the FBI for switching to NIBRS starting in 2021, during Covid. But I will say that NIBRS data has existed since 1991, so every agency in the country had a full 30 years to transition over, and the FBI announced its start-of-2021 deadline back in 2016.
One thing I’ve been struck by when reading news articles about the FBI data release is how incredibly uncritical - bordering on naive - many reporters are about this data. There are so many flaws with crime data and reading these articles will tell you none of them.
Most of the field of criminology has taken these studies to heart and largely avoids using county-level data; other fields, such as economics, have ignored them.
2020 had more missing data than normal but not very much and it still used UCR data as an accepted source, so the data release was very similar to previous years. Crime trends certainly differed from previous years but the data release itself - that is, how much data was missing, what the sources of the data were, and how the FBI reported it - was normal.
If you think that the primary goal of crime reporting is actually entertainment and to sell advertising space, I appreciate your cynicism. Welcome to the club.
Historically means since the 1960s which is the earliest year available for UCR data in a machine-readable format.