Identifying sepsis in second-trimester pregnancy-loss complications

This is the code for a first-of-its-kind analysis ProPublica published as part of our February 2025 story “Texas Banned Abortion. Then Sepsis Rates Soared.” We found that the sepsis rate in second-trimester pregnancy loss hospitalizations increased by more than 50% after Texas’ near-total abortion ban went into effect in September 2021. The surge was most pronounced in cases in which the fetus may still have had a heartbeat when the patient arrived at the hospital.

Although the federal government and many states track severe complications in birth events using a federally established methodology, far less is known about complications that arise during a pregnancy loss. There is no federal methodology for tracking this, so we consulted with experts to craft one.

For this analysis, we purchased seven years (2017-2023) of Texas inpatient hospital discharge data from the Texas Department of State Health Services. The structure and contents are outlined in the data dictionary. Those wishing to re-create our Texas analysis will need to obtain the data from the department. Hospital discharge data is available for many states and is often structured similarly.

In addition to this repository, to fully understand our analysis, please refer to our methodology: “Texas Won’t Study How Its Abortion Ban Impacts Women, So We Did.”

Questions about the code in this repo? Email [email protected].

Step 0: Data processing and setup

We used Poetry to install Python libraries.

Before we could work with the data, we had to combine and transform the tab-delimited files we received. This repository does not include the code we used for this step, because this data transformation is bespoke to the format of the raw data.

We loaded the files into a PostgreSQL database, creating a main table called discharge with a primary key of record_id.

Each row had up to 36 ICD-10-CM diagnosis codes, including an admitting diagnosis, a primary diagnosis, up to 24 other diagnoses and up to 10 external cause codes. We removed those variables from the main table and transformed them from wide to long format, putting those rows into a table called discharge_diagnosis, linked to discharge by the record_id field.

All fields except for “admitting diagnosis” had an accompanying flag for “present on admission”; the value “Y” indicated that it was present at the time of admission. We added a “Y” flag to the present_on_admission column for all admitting diagnosis codes.

Each row also had up to 25 ICD-10-PCS procedure codes. We removed those variables from the main table and transformed them to a long format in a table called discharge_procedure, linked to discharge by the record_id format.

To connect to our database, we stored the values in a .env file in the root of the project. The env-sample file is an example of how that file must be structured for the database connection to be created in scripts/utils.py.

Step 1: Generate a coded file

In the file scripts/1_code_pregnancy_ends.ipynb, we generated a coded file with hospitalizations we were interested in and any relevant data for each hospitalization.

We first translated the methodology published by the Health Resources and Services Administration from SAS to Pandas in order to identify all hospitalizations in which a pregnancy ended and, within those hospitalizations, all rows with a diagnosis or procedure indicating severe maternal morbidity (SMM), based on the Centers for Disease Control and Prevention definition.

We then identified gestational week codes to allow us to narrow down to pregnancies that ended within a specific window, sepsis codes pertaining to abortive outcomes that weren’t included in the CDC sepsis definition, and codes for missed abortion and intrauterine fetal death — which both indicate no fetal heartbeat.

We then wrote this file out to a CSV.

Step 2: Analyze second-trimester pregnancy loss

In the file scripts/2_analyze_outcomes.ipynb, we used the coded file to identify hospitalizations in which a pregnancy ended between 13 and the end of 21 weeks and explore rates of sepsis within those rows.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
scripts		scripts
.env-sample		.env-sample
.gitignore		.gitignore
LICENSE		LICENSE
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Identifying sepsis in second-trimester pregnancy-loss complications

Step 0: Data processing and setup

Step 1: Generate a coded file

Step 2: Analyze second-trimester pregnancy loss

About

Uh oh!

Languages

License

propublica/tx-pregnancy-loss-sepsis-analysis

Folders and files

Latest commit

History

Repository files navigation

Identifying sepsis in second-trimester pregnancy-loss complications

Step 0: Data processing and setup

Step 1: Generate a coded file

Step 2: Analyze second-trimester pregnancy loss

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages