Summarizing Data¶
VisiData makes it pleasantly easy to summarize your data. There are three main types of summaries: Frequency tables, one-off calculations, and the Describe Sheet.
Frequency tables¶
Frequency tables are dead-simple, but also quite powerful. For the dead-simple usage: Navigate to any column, and then press Shift-F. If you did that on the first column (“OPERATOR”) of the FAA dataset, you should get something like this:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men OPERATOR ║↓count♯│ percent%│ histogram ~║ UNKNOWN ║ 23076 │ 31.42 │ ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ║ SOUTHWEST AIRLINES ║ 7752 │ 10.55 │ ■■■■■■■■■■■■ ║ BUSINESS ║ 5868 │ 7.99 │ ■■■■■■■■■ ║ AMERICAN AIRLINES ║ 4337 │ 5.90 │ ■■■■■■■ ║ DELTA AIR LINES ║ 2817 │ 3.84 │ ■■■■ ║ FEDEX EXPRESS ║ 2709 │ 3.69 │ ■■■■ ║ UNITED AIRLINES ║ 2194 │ 2.99 │ ■■■ ║ US AIRWAYS ║ 1885 │ 2.57 │ ■■■ ║ UPS AIRLINES ║ 1773 │ 2.41 │ ■■ ║ SKYWEST AIRLINES ║ 1769 │ 2.41 │ ■■ ║ JETBLUE AIRWAYS ║ 1740 │ 2.37 │ ■■ ║ EXPRESSJET AIRLINES║ 1347 │ 1.83 │ ■■ ║ AMERICAN EAGLE AIR…║ 1041 │ 1.42 │ ■ ║ ENVOY AIR ║ 883 │ 1.20 │ ■ ║ ALASKA AIRLINES ║ 835 │ 1.14 │ ■ ║ REPUBLIC AIRLINES ║ 804 │ 1.09 │ ■ ║ MESA AIRLINES ║ 693 │ 0.94 │ ■ ║ AIR WISCONSIN AIRL…║ 623 │ 0.85 │ ■ ║ PSA AIRLINES ║ 577 │ 0.79 │ ║ PRIVATELY OWNED ║ 516 │ 0.70 │ ║ PHI INC ║ 491 │ 0.67 │ ║ 2› faa-wildlife-strikes_OPERATOR_freq| Shift+F 282 bins
With just one keystroke, VisiData has already told us something useful about the dataset: That the “operators” associated with 31% of the wildlife strikes are, according to the FAA, “unknown.” We have also learned, from the “bins” mini-report at the bottom-right of the screen, that there are 282 distinct values in the “OPERATOR” column.
Multi-column frequencies¶
Sometimes you want to count how often combinations of columns occur. VisiData also makes this easy. First, turn the columns you want to count into “key” columns, using the ! button. Then, type gF.
For instance, if we wanted to count the combinations of the “OPERATOR” and “PERSON” fields, we’d hit ! on each of those columns — either from the main data sheet or in the Columns Sheet. Once you’ve done that, you should see something like this:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men OPERATOR │ PERSON ║<BIRDS_STRUCK │ EFFECT │ DAMAGE │ COST_REPAIRS │ REMAINS> BUSINESS │ Tower ║ 1 │ NONE │ M │ │ 0 BUSINESS │ Tower ║ 1 │ NONE │ M │ │ 0 BUSINESS │ Tower ║ 1 │ NONE │ M │ │ 0 DELTA AIR LINES │ Tower ║ 1 │ NONE │ M │ │ 0 BUSINESS │ Tower ║ 1 │ │ M │ │ 0 DELTA AIR LINES │ Tower ║ 1 │ Other │ M? │ │ 0 DELTA AIR LINES │ Tower ║ 1 │ NONE │ M? │ │ 0 BUSINESS │ Tower ║ 1 │ PRECAUTIONARY LAND…│ M? │ │ 0 ALLEGIANT AIR │ Tower ║ 1 │ NONE │ M? │ │ 0 TRANS STATES AIRLI…│ Tower ║ 1 │ NONE │ M? │ │ 0 BUSINESS │ Tower ║ 1 │ NONE │ M? │ │ 0 GOVERNMENT │ Tower ║ 1 │ NONE │ M? │ │ 0 AMERICAN AIRLINES │ Tower ║ 1 │ NONE │ N │ │ 0 EXPRESSJET AIRLINES│ Tower ║ 1 │ NONE │ N │ │ 0 MESA AIRLINES │ Tower ║ 1 │ NONE │ N │ │ 0 BUSINESS │ Tower ║ 1 │ PRECAUTIONARY LAND…│ N │ │ 0 DELTA AIR LINES │ Tower ║ 1 │ NONE │ N │ │ 0 DELTA AIR LINES │ Tower ║ 1 │ NONE │ N │ │ 0 LUFTHANSA │ Tower ║ 1 │ NONE │ N │ │ 0 BUSINESS │ Tower ║ 1 │ NONE │ N │ │ 0 SPIRIT AIRLINES │ Tower ║ 1 │ PRECAUTIONARY LAND…│ N │ │ 0 1› faa-wildlife-strikes| ! key-col 73448 rows
Then, type gF, which should result in something like this:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men OPERATOR │ PERSON ║↓count♯│ percent%│ histogram ~║ UNKNOWN │ Carcas…║ 22842 │ 31.10 │ ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ║ SOUTHWEST AIRLINES │ Pilot ║ 5481 │ 7.46 │ ■■■■■■■■■ ║ AMERICAN AIRLINES │ Pilot ║ 2240 │ 3.05 │ ■■■ ║ BUSINESS │ Pilot ║ 2061 │ 2.81 │ ■■■ ║ DELTA AIR LINES │ Airpor…║ 1793 │ 2.44 │ ■■ ║ BUSINESS │ Airpor…║ 1642 │ 2.24 │ ■■ ║ AMERICAN AIRLINES │ Airpor…║ 1481 │ 2.02 │ ■■ ║ FEDEX EXPRESS │ Air Tr…║ 1434 │ 1.95 │ ■■ ║ UNITED AIRLINES │ Airpor…║ 1351 │ 1.84 │ ■■ ║ UPS AIRLINES │ Air Tr…║ 1343 │ 1.83 │ ■■ ║ SOUTHWEST AIRLINES │ Airpor…║ 1293 │ 1.76 │ ■■ ║ BUSINESS │ Tower ║ 1094 │ 1.49 │ ■ ║ SKYWEST AIRLINES │ Airpor…║ 1024 │ 1.39 │ ■ ║ FEDEX EXPRESS │ Pilot ║ 963 │ 1.31 │ ■ ║ EXPRESSJET AIRLINES│ Airpor…║ 785 │ 1.07 │ ■ ║ JETBLUE AIRWAYS │ Airpor…║ 749 │ 1.02 │ ■ ║ US AIRWAYS │ Air Tr…║ 672 │ 0.91 │ ■ ║ US AIRWAYS │ Airpor…║ 656 │ 0.89 │ ■ ║ ALASKA AIRLINES │ Airpor…║ 599 │ 0.82 │ ║ DELTA AIR LINES │ Tower ║ 572 │ 0.78 │ ║ AMERICAN EAGLE AIR…│ Pilot ║ 535 │ 0.73 │ ║ 2› faa-wildlife-strikes_OPERATOR-PERSON_freq| processing… gShift+F 879 bins
To make it easier to read, type g_, which will expand the column widths:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men OPERATOR │ PERSON ║↓count♯│ percent%│ histogram UNKNOWN │ Carcass Found ║ 22842 │ 31.10 │ ■■■■■■■■■■■■■■■■■■■■■■■■■■… SOUTHWEST AIRLINES │ Pilot ║ 5481 │ 7.46 │ ■■■■■■■■■ AMERICAN AIRLINES │ Pilot ║ 2240 │ 3.05 │ ■■■ BUSINESS │ Pilot ║ 2061 │ 2.81 │ ■■■ DELTA AIR LINES │ Airport Operations ║ 1793 │ 2.44 │ ■■ BUSINESS │ Airport Operations ║ 1642 │ 2.24 │ ■■ AMERICAN AIRLINES │ Airport Operations ║ 1481 │ 2.02 │ ■■ FEDEX EXPRESS │ Air Transport Operations ║ 1434 │ 1.95 │ ■■ UNITED AIRLINES │ Airport Operations ║ 1351 │ 1.84 │ ■■ UPS AIRLINES │ Air Transport Operations ║ 1343 │ 1.83 │ ■■ SOUTHWEST AIRLINES │ Airport Operations ║ 1293 │ 1.76 │ ■■ BUSINESS │ Tower ║ 1094 │ 1.49 │ ■ SKYWEST AIRLINES │ Airport Operations ║ 1024 │ 1.39 │ ■ FEDEX EXPRESS │ Pilot ║ 963 │ 1.31 │ ■ EXPRESSJET AIRLINES │ Airport Operations ║ 785 │ 1.07 │ ■ JETBLUE AIRWAYS │ Airport Operations ║ 749 │ 1.02 │ ■ US AIRWAYS │ Air Transport Operations ║ 672 │ 0.91 │ ■ US AIRWAYS │ Airport Operations ║ 656 │ 0.89 │ ■ ALASKA AIRLINES │ Airport Operations ║ 599 │ 0.82 │ DELTA AIR LINES │ Tower ║ 572 │ 0.78 │ AMERICAN EAGLE AIRLINES │ Pilot ║ 535 │ 0.73 │ 2› faa-wildlife-strikes_OPERATOR-PERSON_freq| g_ resize-cols-max 879 bins
With just a few more keystrokes, we’ve learned something else: Virtually all wildlife strikes with “unknown” operators were identified based on carcasses found at the airport (rather than reports from pilots, for example).
Adding “aggregators”¶
By default, frequency tables just count the number of times each value appears. But in VisiData, you can specify additional calculations by setting the column’s “aggregators”. (You might remember this field from the Columns Sheet.) In VisiData’s aggregators include min
, max
, mean
, median
, sum
, distinct
, and others.
To add an aggregator to a column, navigate to that column and press +. VisiData will then prompt you to specify which aggregator you would like to add. You can type out your desired aggregator, or use Control-x to open the interactive chooser and select from a list of options.
For example, let’s go back to the original FAA data sheet. Let’s navigate to the “COST_REPAIRS” column, and then do the following:
- Press # to tell VisiData this is an integer column
- Press + to tell VisiData you want to add an aggregator
- At the prompt, type
sum
and then hit Enter to add the summation aggregator - Navigate to the “AIRPORT” column, and press Shift-F
You should see something like this, with the sum
calculation now appearing in your frequency table:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men AIRPORT ║↓count♯│ COST_REPAIRS_sum#║ UNKNOWN ║ 8424 │ 17931422 ║ DENVER INTL AIRPORT║ 2756 │ 9033148 ║ DALLAS/FORT WORTH …║ 2392 │ 927920 ║ CHICAGO O'HARE INT…║ 1583 │ 1146738 ║ JOHN F KENNEDY INTL║ 1302 │ 310917 ║ MEMPHIS INTL ║ 1217 │ 4507487 ║ SALT LAKE CITY INTL║ 1179 │ 6031324 ║ PHILADELPHIA INTL ║ 1131 │ 2213030 ║ ORLANDO INTL ║ 1026 │ 71500 ║ SACRAMENTO INTL ║ 1021 │ 3634816 ║ LA GUARDIA ARPT ║ 974 │ 381504 ║ CHARLOTTE/DOUGLAS …║ 960 │ 86960 ║ NEWARK LIBERTY INT…║ 917 │ 506489 ║ LOUISVILLE INTL AR…║ 841 │ 1366299 ║ AUSTIN-BERGSTROM I…║ 817 │ 545500 ║ LOUIE ARMSTRONG NE…║ 809 │ 406000 ║ KANSAS CITY INTL ║ 807 │ 75660 ║ HARTSFIELD - JACKS…║ 775 │ 284232 ║ GEORGE BUSH INTERC…║ 746 │ 52000 ║ DETROIT METRO WAYN…║ 734 │ 48983 ║ BALTIMORE/WASH INT…║ 691 │ 339300 ║ 3› faa-wildlife-strikes_AIRPORT_freq| Shift+F 1512 bins
By default, frequency tables are sorted by the “count” column, but you can sort them by any other column.
Note
When using aggregators, make sure that you’ve assigned the proper type (#
for integer columns, etc.) to the columns of interest, so that VisiData knows how to calculate the aggregations correctly.
One-off calculations¶
From any data sheet, you can also run a single calculation on all rows — or all selected rows — in a column. To do that, navigate to the column and type z+, which will bring up the same aggregator-choice prompt as above. Type the aggregator you want, and press Enter. At the bottom of the screen, you’ll see the result of the calculation.
You can try this on the FAA data sheet we’ve been working with. Navigate to the “COST_REPAIRS” column, and then do the following:
- Press # to tell VisiData this is an integer column (if you haven’t already)
- Type z+ to see the result of an aggregator
- Type
sum
and then hit Enter
At the bottom of the screen, you should see something like this, indicating that the total reported cost of repairs is $161,868,071:
DELTA AIR LINES │ Tower ║ 1 │ NONE │ N │ !│ 0 LUFTHANSA │ Tower ║ 1 │ NONE ┌─────────────────────────────| statuses |─┐ BUSINESS │ Tower ║ 1 │ NONE │ COST_REPAIRS_sum=161868071 │ SPIRIT AIRLINES │ Tower ║ 1 │ PRECAUTIO└──────────────────────────────────────────┘ 1› faa-wildlife-strikes| z+ memo-aggregate 73448 rows
Summarizing all columns¶
To get a bird’s-eye view of your entire dataset, press Shift-I, which will provide summary statistics for each of your columns:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men column ║ errors ♯│ nulls ♯│ distinct♯│ mode ~│ min ~│ max ~│ sum │ > OPERATOR ║ 0 │ 0 │ 282 │ UNKNOWN │ │ │ │ ATYPE ║ 0 │ 0 │ 400 │ UNKNOWN │ │ │ │ INCIDENT_DATE ║ 0 │ 0 │ 2343 │ 10/30/14…│ │ │ │ STATE ║ 0 │ 0 │ 63 │ │ │ │ │ AIRPORT ║ 0 │ 0 │ 1512 │ UNKNOWN │ │ │ │ PHASE_OF_FLT ║ 0 │ 0 │ 21 │ │ │ │ │ HEIGHT ║ 0 │ 0 │ 319 │ │ │ │ │ SPEED ║ 0 │ 0 │ 225 │ │ │ │ │ SPECIES ║ 0 │ 0 │ 641 │ Unknown …│ │ │ │ BIRDS_STRUCK ║ 0 │ 0 │ 5 │ 1 │ │ │ │ EFFECT ║ 0 │ 0 │ 8 │ None │ │ │ │ DAMAGE ║ 0 │ 0 │ 6 │ N │ │ │ │ COST_REPAIRS ║ 72145 │ 0 │ 478 │ 5000 │ 10 │ 6500000 │ 1618680…#│ 1 PERSON ║ 0 │ 0 │ 7 │ Carcass …│ │ │ │ REMAINS_COLLECTED ║ 0 │ 0 │ 2 │ 1 │ │ │ │ REMARKS ║ 0 │ 0 │ 61240 │ │ │ │ │ 4› faa-wildlife-strikes_describe| Shift+I 16 columns
In VisiData, this is called the “Describe Sheet”. You’ll notice that there are only min/max/median/etc. calculations for the columns we’ve given types — just the COST_REPAIRS
column so far. If we go back to the data sheet and tell VisiData that the HEIGHT
, SPEED
, and BIRDS_STRUCK
fields are numbers, too, then pressing Shift-I will result in something like this:
File Edit View Column Row Data Plot System Help | VisiData 3.0.2 | Alt+H for help men column ║ errors ♯│ nulls ♯│ distinct♯│ mode ~│ min ~│ max ~│ sum │ > OPERATOR ║ 0 │ 0 │ 282 │ UNKNOWN │ │ │ │ ATYPE ║ 0 │ 0 │ 400 │ UNKNOWN │ │ │ │ INCIDENT_DATE ║ 0 │ 0 │ 2343 │ 10/30/14…│ │ │ │ STATE ║ 0 │ 0 │ 63 │ │ │ │ │ AIRPORT ║ 0 │ 0 │ 1512 │ UNKNOWN │ │ │ │ PHASE_OF_FLT ║ 0 │ 0 │ 21 │ │ │ │ │ HEIGHT ║ 35629 │ 0 │ 318 │ 0 │ 0 │ 31300 │ 33937719#│ 5 SPEED ║ 48384 │ 0 │ 224 │ 140 │ 0 │ 374 │ 3614411 #│ 1 SPECIES ║ 0 │ 0 │ 641 │ Unknown …│ │ │ │ BIRDS_STRUCK ║ 7966 │ 0 │ 1 │ 1 │ 1 │ 1 │ 65482 #│ 1 EFFECT ║ 0 │ 0 │ 8 │ None │ │ │ │ DAMAGE ║ 0 │ 0 │ 6 │ N │ │ │ │ COST_REPAIRS ║ 72145 │ 0 │ 478 │ 5000 │ 10 │ 6500000 │ 1618680…#│ 1 PERSON ║ 0 │ 0 │ 7 │ Carcass …│ │ │ │ REMAINS_COLLECTED ║ 0 │ 0 │ 2 │ 1 │ │ │ │ REMARKS ║ 0 │ 0 │ 61240 │ │ │ │ │ 5› faa-wildlife-strikes_describe| Shift+I 16 columns