Reading the BTS T-100 Dataset — A 10-Minute Guide for Journalists and Analysts
The U.S. Department of Transportation Bureau of Transportation Statistics T-100 Segment dataset is the most comprehensive public source of commercial aviation traffic data in the world. It is also one of the most easily misread. This guide is the practical onboarding we wish had existed when we started building dashboards on top of it: what T-100 actually measures, the joins that unlock its real value, and the five common misreadings to avoid before publishing anything based on it.
What T-100 Segment actually is
T-100 Segment is the U.S. DOT Form 41 Schedule T-100 reporting requirement, applied to scheduled and non-scheduled commercial aviation. Every U.S. carrier — and every foreign carrier operating to or from a U.S. point — is required to report monthly per-segment traffic figures. The dataset captures origin and destination airports, aircraft type, scheduled departures, performed departures, available seats, transported passengers, freight and mail metrics, and a handful of other fields. The reporting threshold catches essentially every commercial flight that touches U.S. territory.
The dataset is published by BTS at transtats.bts.gov on a quarterly cadence, with each quarterly release covering data from approximately three months earlier. The annual aggregate (the file most analysts download) ships in mid-spring and covers the prior calendar year. CSVs are downloadable directly without registration. As a U.S. federal work the dataset is in the public domain and free to redistribute; attribution is customary rather than a licence condition.
What it does and does not measure
T-100 measures the segment: the leg between two airports, regardless of whether passengers booked the flight as a single segment or as part of a longer itinerary. A passenger flying LAX-JFK-LHR on one ticket appears in two segments: LAX-JFK (counted by the operating carrier) and JFK-LHR (counted by the operating carrier of that leg, which may be different). A leg that touches no U.S. point, such as an onward LHR-DOH connection, is not reported in T-100 at all. T-100 does not connect the legs into the original itinerary; that mapping requires the separate DB1B Origin and Destination dataset.
T-100 measures the operating carrier, not the marketing carrier. A flight sold by American Airlines but operated by British Airways under a codeshare appears in T-100 as a BA segment, with passengers credited to BA. For brand-share analysis where the marketing carrier matters, you need to layer codeshare data on top — a non-trivial join that we document on our codeshare/JV map.
T-100 does not measure cabin class. The transported-passengers field is total passengers on the flight, not premium-cabin passengers specifically. Cabin-class breakdowns require Form 41 Schedule P-12(b), which most carriers report under proprietary trade-secret protections that make it unavailable for public analysis.
The joins that unlock T-100's real value
T-100 Segment alone is route-and-carrier traffic. Joined with DB1B Market or DB1B Coupon (also from BTS), it becomes itinerary-level analysis — what passengers actually paid, where they connected, what cabin class they sat in. DB1B is sampled at 10% of all U.S. tickets, which limits its precision on long-tail routes but is plenty for trunk routes.
Joined with the FAA Aircraft Registry (a separate public dataset), T-100 supports fleet-level analysis, which is useful for fleet-utilization studies and aircraft-retrofit tracking. The join key is the carrier reporting code plus the aircraft type code, so the match resolves to a carrier's fleet of a given type rather than to an individual airframe. Tail-level identification requires DOT On-Time Performance data, which carries the actual tail registration on each flight.
Joined with the DOT Air Travel Consumer Report (ATCR), T-100 becomes route-and-carrier on-time and operational quality. ATCR is the source for the on-time performance figures we surface on each route page; T-100 provides the traffic context that makes the OTP figures meaningful (a 95% on-time rate on a 12-flights-a-year route is a less robust signal than the same rate on a 3,000-flights-a-year route).
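To make the robustness point concrete, here is a minimal sketch that puts a Wilson score interval around an on-time rate. The function name and the choice of the Wilson interval are ours (an assumption, not something ATCR or BTS prescribes); the two flight counts are the ones from the paragraph above.

```python
import math

def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an observed proportion p_hat over n flights.

    z = 1.96 corresponds to a roughly 95% confidence level.
    """
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Same 95% on-time rate, very different amounts of evidence behind it.
for flights in (12, 3000):
    low, high = wilson_interval(0.95, flights)
    print(f"{flights:>5} flights: interval {low:.3f} to {high:.3f}")
```

The interval for the 12-flight route spans tens of percentage points, while the 3,000-flight route is pinned down to within a couple of points, which is the sense in which traffic volume gives the OTP figure its meaning.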
Common misreadings — and how to avoid them
The most common misreading is treating T-100 segment counts as itinerary counts. A passenger on a connecting LAX-JFK-LHR itinerary contributes to both the LAX-JFK and JFK-LHR segment passenger totals. Summing segment passengers across a region therefore counts every connecting passenger once per leg flown. For itinerary-level analysis, switch to DB1B.
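The double-counting effect is easy to see on a toy example. The sketch below uses made-up itineraries (not BTS data) and expands each one into its segments, the way T-100 would see them.

```python
from collections import Counter

# Hypothetical itineraries: (ordered airports on the ticket, passengers).
itineraries = [
    (("LAX", "JFK", "LHR"), 100),  # connecting: flies two segments
    (("JFK", "LHR"), 150),         # nonstop: one segment
    (("LAX", "JFK"), 80),          # nonstop: one segment
]

# Build segment-level passenger totals, as a segment dataset would report them.
segment_pax = Counter()
for airports, pax in itineraries:
    for origin, dest in zip(airports, airports[1:]):
        segment_pax[(origin, dest)] += pax

total_segment_pax = sum(segment_pax.values())
total_travellers = sum(pax for _, pax in itineraries)

print(dict(segment_pax))
print("sum of segment passengers:", total_segment_pax)
print("actual travellers:        ", total_travellers)
```

The gap between the two totals equals the number of connecting passengers, each of whom shows up once per leg.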
The second most common is treating BTS-reported share as market share. Operating-carrier passenger counts on a route are not the same as commercial-share data: codeshare and joint-venture (JV) revenue allocations can shift the booking-side share substantially. T-100 share is "metal share", not "ticket share".
The third is using load factor as a demand signal without checking schedule changes. Load factor is passengers divided by available seats; a 95% load factor on a route where the carrier just cut capacity by 30% means something different from 95% on a stable route. Always cross-check available-seats trend before reading load factor as demand.
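A short worked example shows why the cross-check matters. The figures are invented for illustration; the only formula taken from the text is load factor as passengers divided by available seats.

```python
def load_factor(passengers: int, seats: int) -> float:
    """Passengers divided by available seats; NaN when no seats were flown."""
    return passengers / seats if seats else float("nan")

def pct_change(current: float, prior: float) -> float:
    """Fractional change from prior to current."""
    return current / prior - 1

# Hypothetical route: capacity cut by 30% year over year.
prior = {"passengers": 80_000, "seats": 100_000}
current = {"passengers": 66_500, "seats": 70_000}

lf_prior = load_factor(**prior)
lf_current = load_factor(**current)
seat_change = pct_change(current["seats"], prior["seats"])
pax_change = pct_change(current["passengers"], prior["passengers"])

print(f"load factor: {lf_prior:.0%} -> {lf_current:.0%}")
print(f"seats: {seat_change:+.1%}, passengers: {pax_change:+.1%}")
```

Here the load factor climbs from 80% to 95% even though fewer people flew the route, so the higher load factor reflects the capacity cut rather than stronger demand.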
The fourth is comparing year-over-year on routes with material schedule disruption. Routes that had pandemic-era cancellations, hurricane disruptions, or carrier-level operational issues create misleading YoY deltas. The DOT ATCR cancellation data is the cleanest filter for this.
The fifth is treating the most-recent quarter as definitive. T-100 data restates as carriers correct or refile reports — the figures you read from a March release for the previous October may differ slightly from the same data read six months later. For point-in-time analysis, capture the data at read time and note the release ID.
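One way to capture the data at read time is to store a small manifest next to every download. This is a sketch of our own convention, not a BTS feature: the function name, the manifest fields, and the `release_label` string (whatever identifier you use for the release) are all assumptions.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def snapshot_record(path: Path, release_label: str) -> dict:
    """Describe a downloaded file so a later analysis can prove which copy it used."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file": path.name,
        "release_label": release_label,
        "sha256": digest.hexdigest(),
        "bytes": path.stat().st_size,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
    }
```

If a later download of the same period produces a different hash, the figures have been restated and any comparison should say which snapshot it used.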
Practical workflow
Download the T-100 Segment annual release from transtats.bts.gov/PREZIP/. The file is large (typically 250-500 MB compressed; ~1.5 GB decompressed for the annual). Use a streaming CSV parser rather than loading the whole file into memory. Filter early to the carrier, route, or aircraft type subset you care about — full-dataset operations on a laptop will work but are slow.
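A minimal sketch of the stream-and-filter-early approach, using only the standard library. The column names (UNIQUE_CARRIER, ORIGIN, DEST, PASSENGERS, SEATS) are assumed to match the headers in your download, so check them against the BTS data dictionary; the sample rows are made up so the block runs without the real file.

```python
import csv
import io
from typing import Iterable, Iterator

def stream_segments(lines: Iterable[str], carrier: str,
                    origin: str, dest: str) -> Iterator[dict]:
    """Yield only matching rows, holding one row in memory at a time."""
    for row in csv.DictReader(lines):
        if (row["UNIQUE_CARRIER"], row["ORIGIN"], row["DEST"]) == (carrier, origin, dest):
            yield row

# Tiny in-memory stand-in for the real CSV (illustrative numbers only).
SAMPLE = """UNIQUE_CARRIER,ORIGIN,DEST,PASSENGERS,SEATS
BA,JFK,LHR,5200,6000
AA,JFK,LHR,4100,5000
BA,BOS,LHR,3000,3600
"""

matches = list(stream_segments(io.StringIO(SAMPLE), "BA", "JFK", "LHR"))
# Parse via float first in case the file stores counts as "5200.00".
total_pax = sum(int(float(row["PASSENGERS"])) for row in matches)
print(len(matches), "matching rows,", total_pax, "passengers")
```

For the real file, pass an open file handle instead of the StringIO object, for example `open(path, newline="")` where `path` points at your decompressed download.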
For premium-cabin analysis specifically, the join you usually want is T-100 Segment + the consolidator route catalog (or any other carrier-cabin source). Keys are typically (originCode, destCode) or a normalized route slug. Aggregate by (carrier, routeSlug) for per-route carrier breakdown; aggregate by (carrier) alone for system-wide figures.
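The join and the two aggregation levels can be sketched as follows. Everything here is illustrative: the `route_slug` format, the shape of the catalog, and the sample rows are assumptions, and the field names simply follow the (originCode, destCode) keys mentioned above.

```python
from collections import defaultdict

def route_slug(origin: str, dest: str) -> str:
    """Hypothetical normalized route key, e.g. 'jfk-lhr' (directional)."""
    return f"{origin.strip().lower()}-{dest.strip().lower()}"

# Made-up T-100-style rows, already filtered to the fields we need.
segments = [
    {"carrier": "AA", "originCode": "JFK", "destCode": "LHR", "passengers": 4100},
    {"carrier": "BA", "originCode": "JFK", "destCode": "LHR", "passengers": 5200},
    {"carrier": "BA", "originCode": "BOS", "destCode": "LHR", "passengers": 3000},
    {"carrier": "DL", "originCode": "ATL", "destCode": "CDG", "passengers": 2500},
]

# Stand-in for a route catalog keyed by slug (any carrier-cabin source would do).
catalog = {"jfk-lhr": {"cabin": "business"}, "bos-lhr": {"cabin": "business"}}

by_carrier_route = defaultdict(int)  # per-route carrier breakdown
by_carrier = defaultdict(int)        # totals across all joined routes

for seg in segments:
    slug = route_slug(seg["originCode"], seg["destCode"])
    if slug not in catalog:
        continue  # inner join: drop routes the catalog does not cover
    by_carrier_route[(seg["carrier"], slug)] += seg["passengers"]
    by_carrier[seg["carrier"]] += seg["passengers"]

print(dict(by_carrier_route))
print(dict(by_carrier))
```

Note that the carrier totals here cover only routes present in the catalog; for genuinely system-wide figures, aggregate before applying the join filter.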
For published analysis, attribute BTS as the source for the upstream traffic figures and your own joined view separately. The licence is permissive but attribution is the convention that keeps the data ecosystem honest. Our open dashboards publish the joined view at /data/airfare-trends with a downloadable CSV that walks through the join in flat-file form.
When T-100 is the wrong tool
T-100 stops being the right tool when you need foreign-carrier-only operations between non-U.S. points. London-Singapore on British Airways is outside T-100's scope because neither end touches U.S. territory. For non-U.S.-touching routes, the alternatives are commercial datasets (Cirium, OAG, Sabre) which are paid-licence and redistribution-restricted, or country-level publications from individual statistics agencies (which vary widely in scope and freshness).
T-100 is also the wrong tool for real-time analysis. The 3-4 month publishing lag means the latest available data is always at least a quarter old. For real-time operational signals, ADS-B feeds (FlightAware, Flightradar24) are closer to live but commercially licensed for most analytic uses.
For passenger-experience analysis (cabin product, lounge access, service ratings), T-100 carries no signal. Those questions live in user-generated review data, our own consolidator catalog, and editorial reviews, none of which are joinable to T-100 at the field level.
Where to go from here
For the practical application of T-100 to premium-cabin analysis, start with our open dashboard at /data/airfare-trends — every figure on it is reproducible from the BTS T-100 release plus our consolidator floors. The downloadable CSV walks through the join in flat-file form.
For the deeper analytical layer, the State of the Premium Cabin quarterly report at /reports/state-of-premium-cabin-2026-q2 narrows the dataset to the long-haul subset and frames the corridor-level patterns visible in the data. The methodology behind every number is documented inline.
For the BTS source itself, transtats.bts.gov is the canonical entry point. The T-100 Segment dataset is at the link at the bottom of the home page; DB1B is published on the same site as a separate database, the Airline Origin and Destination Survey, rather than as part of Form 41. The data dictionaries that explain every field are available as separate downloads; read them before attempting any non-trivial aggregation.