California OpenJustice
Equity & equality in the CA criminal justice system
Code for San Francisco
Active Project
Project Status: beta
inferential statistics
predictive modeling
regression analysis
hypothesis testing
criminal justice
data science working group
data science
optimization modeling

CA Dept. of Justice OpenJustice Project

Modeling & Hypothesis Testing

Members of the Data Science Working Group (DSWG) have been charged with answering, via inferential statistics, some of the California Department of Justice's questions about criminal justice. These more pointed questions were inspired by the OpenJustice project's exploratory analyses.

Responsible DSWG Members:

  • Catherine Zhang
  • John Huynh
  • Saniya Jesupaul
  • Holly Davis
  • Brian Smith
  • Jude Calvillo

Status, as of September 14, 2016:

  • Prompts were verified by the CA DOJ’s OpenJustice team, and we’re now in regular contact.
  • Prompt #1 nearly complete (Anonymous Analyst)
  • Prompt #2 in draft stage (Catherine Zhang)
  • Prompt #3 in progress, but with important open questions (Matt Mollison)
  • Prompt #5 in progress (John Huynh and Saniya Jesupaul)
  • Numerous outside data sources being joined, explored, and shared (Brian Smith)
  • Continuing to gather additional data/features for all predictive modeling prompts.

Status, as of August 20, 2016:

The Prompts

  1. Which counties/agencies arrest African American juveniles at a statistically significantly higher rate than that of other counties/agencies?

    • Extending analysis to each ethnic group represented
    • Drilling down to felonies vs. misdemeanors
  2. For the same criminal offense, are particular ethnic juvenile groups more likely to be treated with harsher consequences by law enforcement?

  3. Statewide, what contextual and ethnic factors best predict the arrest of juveniles for felonies?

  4. Statewide, what contextual and ethnic factors best predict the arrest of juveniles for battery, specifically?

  5. For the resource allocation prompt: identify predictors of the statewide crime rate (i.e., not necessarily optimization; just a first, exploratory step, probably via a linear model).
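
As a rough illustration of the kind of test Prompt 1 calls for, the sketch below compares each county's juvenile arrest rate with the rate of all other counties combined, using a one-sided two-proportion z-test and a Bonferroni correction for the number of counties tested. This is only one possible approach and not necessarily the one the working group used. The function names, the county labels, and every count are made up for illustration; real inputs would come from the OpenJustice arrest data and matching population estimates.

```python
import math


def two_proportion_z_test(arrests_a, pop_a, arrests_b, pop_b):
    """One-sided z-test of H1: rate_a > rate_b, using a pooled standard error."""
    rate_a = arrests_a / pop_a
    rate_b = arrests_b / pop_b
    pooled = (arrests_a + arrests_b) / (pop_a + pop_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pop_a + 1 / pop_b))
    z = (rate_a - rate_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


def flag_counties(counts, alpha=0.05):
    """Return counties whose rate is significantly higher than the rest combined.

    counts maps county name -> (arrests, juvenile population).
    """
    total_arrests = sum(arrests for arrests, _ in counts.values())
    total_pop = sum(pop for _, pop in counts.values())
    # Bonferroni correction: one test per county.
    threshold = alpha / len(counts)
    flagged = {}
    for county, (arrests, pop) in counts.items():
        z, p_value = two_proportion_z_test(
            arrests, pop, total_arrests - arrests, total_pop - pop
        )
        if p_value < threshold:
            flagged[county] = (z, p_value)
    return flagged


# Hypothetical counts, for illustration only (not real data).
example_counts = {
    "County A": (120, 10_000),
    "County B": (45, 9_000),
    "County C": (60, 12_000),
}

flagged = flag_counties(example_counts)
for county, (z, p_value) in flagged.items():
    print(f"{county}: z = {z:.2f}, p = {p_value:.2e}")
```

The same comparison can be repeated per ethnic group, or separately for felonies and misdemeanors, by swapping in the corresponding counts. With many counties and small populations, an exact test or a regression with county-level effects may be a better fit than the normal approximation used here.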