Summarizing Data

VisiData makes it pleasantly easy to summarize your data. There are three main types of summaries: Frequency tables, one-off calculations, and the Describe Sheet.

Frequency tables

Frequency tables are dead-simple, but also quite powerful. For the dead-simple usage: Navigate to any column, and then press Shift-F. If you did that on the first column (“OPERATOR”) of the FAA dataset, you should get something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 OPERATOR           ↓count♯| percent%| histogram                                         ~        
 UNKNOWN            ║ 23076 |   31.42 | ************************************************** ║        
 SOUTHWEST AIRLINES   7752 |   10.55 | ****************                                           
 BUSINESS             5868 |    7.99 | ************                                               
 AMERICAN AIRLINES    4337 |    5.90 | *********                                                  
 DELTA AIR LINES      2817 |    3.84 | ******                                                     
 FEDEX EXPRESS        2709 |    3.69 | *****                                                      
 UNITED AIRLINES      2194 |    2.99 | ****                                                       
 US AIRWAYS           1885 |    2.57 | ****                                                       
 UPS AIRLINES         1773 |    2.41 | ***                                                        
 SKYWEST AIRLINES     1769 |    2.41 | ***                                                        
 JETBLUE AIRWAYS      1740 |    2.37 | ***                                                        
 EXPRESSJET AIRLINES  1347 |    1.83 | **                                                         
 AMERICAN EAGLE AIR…  1041 |    1.42 | **                                                         
 ENVOY AIR             883 |    1.20 | *                                                          
 ALASKA AIRLINES       835 |    1.14 | *                                                          
 REPUBLIC AIRLINES     804 |    1.09 | *                                                          
 MESA AIRLINES         693 |    0.94 | *                                                          
 AIR WISCONSIN AIRL…   623 |    0.85 | *                                                          
 PSA AIRLINES          577 |    0.79 | *                                                          
 PRIVATELY OWNED       516 |    0.70 | *                                                          
 PHI INC               491 |    0.67 | *                                                          
2› faa-wildlife-strikes_OPERATOR_freq|                                 Shift+F         282 bins  •0 

With just one keystroke, VisiData has already told us something useful about the dataset: That the “operators” associated with 31% of the wildlife strikes are, according to the FAA, “unknown.” We have also learned, from the “bins” mini-report at the bottom-right of the screen, that there are 282 distinct values in the “OPERATOR” column.

Multi-column frequencies

Sometimes you want to count how often combinations of columns occur. VisiData also makes this easy. First, turn the columns you want to count into “key” columns, using the ! button. Then, type gF.

For instance, if we wanted to count the combinations of the “OPERATOR” and “PERSON” fields, we’d hit ! on each of those columns — either from the main data sheet or in the Columns Sheet. Once you’ve done that, you should see something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 OPERATOR           | PERSON ║<BIRDS_STRUCK | EFFECT             | DAMAGE | COST_REPAIRS | REMAINS> 
 BUSINESS           | Tower  ║ 1            | NONE               | M      |              | 0        
 BUSINESS           | Tower   1            | NONE               | M      |              | 0        
 BUSINESS           | Tower   1            | NONE               | M      |              | 0        
 DELTA AIR LINES    | Tower   1            | NONE               | M      |              | 0        
 BUSINESS           | Tower   1            |                    | M      |              | 0        
 DELTA AIR LINES    | Tower   1            | Other              | M?     |              | 0        
 DELTA AIR LINES    | Tower   1            | NONE               | M?     |              | 0        
 BUSINESS           | Tower   1            | PRECAUTIONARY LAND…| M?     |              | 0        
 ALLEGIANT AIR      | Tower   1            | NONE               | M?     |              | 0        
 TRANS STATES AIRLI…| Tower   1            | NONE               | M?     |              | 0        
 BUSINESS           | Tower   1            | NONE               | M?     |              | 0        
 GOVERNMENT         | Tower   1            | NONE               | M?     |              | 0        
 AMERICAN AIRLINES  | Tower   1            | NONE               | N      |              | 0        
 EXPRESSJET AIRLINES| Tower   1            | NONE               | N      |              | 0        
 MESA AIRLINES      | Tower   1            | NONE               | N      |              | 0        
 BUSINESS           | Tower   1            | PRECAUTIONARY LAND…| N      |              | 0        
 DELTA AIR LINES    | Tower   1            | NONE               | N      |              | 0        
 DELTA AIR LINES    | Tower   1            | NONE               | N      |              | 0        
 LUFTHANSA          | Tower   1            | NONE               | N      |              | 0        
 BUSINESS           | Tower   1            | NONE               | N      |              | 0        
 SPIRIT AIRLINES    | Tower   1            | PRECAUTIONARY LAND…| N      |              | 0        
1› faa-wildlife-strikes|                                              !  key-col     73448 rows  •0 

Then, type gF, which should result in something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 OPERATOR           | PERSON ↓count♯| percent%| histogram                                          
 UNKNOWN            | Carcas…║ 22842 |   31.10 | ************************************************** 
 SOUTHWEST AIRLINES | Pilot    5481 |    7.46 | ***********                                        
 AMERICAN AIRLINES  | Pilot    2240 |    3.05 | ****                                               
 BUSINESS           | Pilot    2061 |    2.81 | ****                                               
 DELTA AIR LINES    | Airpor…  1793 |    2.44 | ***                                                
 BUSINESS           | Airpor…  1642 |    2.24 | ***                                                
 AMERICAN AIRLINES  | Airpor…  1481 |    2.02 | ***                                                
 FEDEX EXPRESS      | Air Tr…  1434 |    1.95 | ***                                                
 UNITED AIRLINES    | Airpor…  1351 |    1.84 | **                                                 
 UPS AIRLINES       | Air Tr…  1343 |    1.83 | **                                                 
 SOUTHWEST AIRLINES | Airpor…  1293 |    1.76 | **                                                 
 BUSINESS           | Tower    1094 |    1.49 | **                                                 
 SKYWEST AIRLINES   | Airpor…  1024 |    1.39 | **                                                 
 FEDEX EXPRESS      | Pilot     963 |    1.31 | **                                                 
 EXPRESSJET AIRLINES| Airpor…   785 |    1.07 | *                                                  
 JETBLUE AIRWAYS    | Airpor…   749 |    1.02 | *                                                  
 US AIRWAYS         | Air Tr…   672 |    0.91 | *                                                  
 US AIRWAYS         | Airpor…   656 |    0.89 | *                                                  
 ALASKA AIRLINES    | Airpor…   599 |    0.82 | *                                                  
 DELTA AIR LINES    | Tower     572 |    0.78 | *                                                  
 AMERICAN EAGLE AIR…| Pilot     535 |    0.73 | *                                                  
2› faa-wildlife-strikes_OPERATOR-PERSON_freq|                         gShift+F         879 bins  •0 

To make it easier to read, type g_, which will expand the column widths:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 OPERATOR                | PERSON                   ↓count♯| percent%| histogram                   
 UNKNOWN                 | Carcass Found            ║ 22842 |   31.10 | **************************… 
 SOUTHWEST AIRLINES      | Pilot                      5481 |    7.46 | ***********                 
 AMERICAN AIRLINES       | Pilot                      2240 |    3.05 | ****                        
 BUSINESS                | Pilot                      2061 |    2.81 | ****                        
 DELTA AIR LINES         | Airport Operations         1793 |    2.44 | ***                         
 BUSINESS                | Airport Operations         1642 |    2.24 | ***                         
 AMERICAN AIRLINES       | Airport Operations         1481 |    2.02 | ***                         
 FEDEX EXPRESS           | Air Transport Operations   1434 |    1.95 | ***                         
 UNITED AIRLINES         | Airport Operations         1351 |    1.84 | **                          
 UPS AIRLINES            | Air Transport Operations   1343 |    1.83 | **                          
 SOUTHWEST AIRLINES      | Airport Operations         1293 |    1.76 | **                          
 BUSINESS                | Tower                      1094 |    1.49 | **                          
 SKYWEST AIRLINES        | Airport Operations         1024 |    1.39 | **                          
 FEDEX EXPRESS           | Pilot                       963 |    1.31 | **                          
 EXPRESSJET AIRLINES     | Airport Operations          785 |    1.07 | *                           
 JETBLUE AIRWAYS         | Airport Operations          749 |    1.02 | *                           
 US AIRWAYS              | Air Transport Operations    672 |    0.91 | *                           
 US AIRWAYS              | Airport Operations          656 |    0.89 | *                           
 ALASKA AIRLINES         | Airport Operations          599 |    0.82 | *                           
 DELTA AIR LINES         | Tower                       572 |    0.78 | *                           
 AMERICAN EAGLE AIRLINES | Pilot                       535 |    0.73 | *                           
2› faa-wildlife-strikes_OPERATOR-PERSON_freq|                g_  resize-cols-max       879 bins  •0 

With just a few more keystrokes, we’ve learned something else: Virtually all wildlife strikes with “unknown” operators were identified based on carcasses found at the airport (rather than reports from pilots, for example).

Adding “aggregators”

By default, frequency tables just count the number of times each value appears. But in VisiData, you can specify additional calculations by setting the column’s “aggregators”. (You might remember this field from the Columns Sheet.) In VisiData’s aggregators include min, max, mean, median, sum, distinct, and others.

To add an aggregator to a column, navigate to that column and press +. VisiData will then prompt you to specify which aggregator you would like to add. You can type out your desired aggregator, or use Control-x to open the interactive chooser and select from a list of options.

For example, let’s go back to the original FAA data sheet. Let’s navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column
  • Press + to tell VisiData you want to add an aggregator
  • At the prompt, type sum and then hit Enter to add the summation aggregator
  • Navigate to the “AIRPORT” column, and press Shift-F

You should see something like this, with the sum calculation now appearing in your frequency table:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 AIRPORT            ↓count♯| COST_REPAIRS_sum#                                                    
 UNKNOWN            ║  8424 |         17931422 ║                                                    
 DENVER INTL AIRPORT  2756 |          9033148                                                     
 DALLAS/FORT WORTH …  2392 |           927920                                                     
 CHICAGO O'HARE INT…  1583 |          1146738                                                     
 JOHN F KENNEDY INTL  1302 |           310917                                                     
 MEMPHIS INTL         1217 |          4507487                                                     
 SALT LAKE CITY INTL  1179 |          6031324                                                     
 PHILADELPHIA INTL    1131 |          2213030                                                     
 ORLANDO INTL         1026 |            71500                                                     
 SACRAMENTO INTL      1021 |          3634816                                                     
 LA GUARDIA ARPT       974 |           381504                                                     
 CHARLOTTE/DOUGLAS …   960 |            86960                                                     
 NEWARK LIBERTY INT…   917 |           506489                                                     
 LOUISVILLE INTL AR…   841 |          1366299                                                     
 AUSTIN-BERGSTROM I…   817 |           545500                                                     
 LOUIE ARMSTRONG NE…   809 |           406000                                                     
 KANSAS CITY INTL      807 |            75660                                                     
 HARTSFIELD - JACKS…   775 |           284232                                                     
 GEORGE BUSH INTERC…   746 |            52000                                                     
 DETROIT METRO WAYN…   734 |            48983                                                     
 BALTIMORE/WASH INT…   691 |           339300                                                     
3› faa-wildlife-strikes_AIRPORT_freq|                                  Shift+F        1512 bins  •0 

By default, frequency tables are sorted by the “count” column, but you can sort them by any other column.

Note

When using aggregators, make sure that you’ve assigned the proper type (# for integer columns, etc.) to the columns of interest, so that VisiData knows how to calculate the aggregations correctly.

One-off calculations

From any data sheet, you can also run a single calculation on all rows — or all selected rows — in a column. To do that, navigate to the column and type z+, which will bring up the same aggregator-choice prompt as above. Type the aggregator you want, and press Enter. At the bottom of the screen, you’ll see the result of the calculation.

You can try this on the FAA data sheet we’ve been working with. Navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column (if you haven’t already)
  • Type z+ to see the result of an aggregator
  • Type sum and then hit Enter

At the bottom of the screen, you should see something like this, indicating that the total reported cost of repairs is $161,868,071:

 BUSINESS           | Tower   1            | NONE               | N      |             !| 0        
 SPIRIT AIRLINES    | Tower   1            | PRECAUTIONARY LAND…| N      |             !| 0        
1› faa-wildlife-strikes| "sum" | COST_REPAIRS_sum=161868071   z+  memo-aggregate     73448 rows  •0 

Summarizing all columns

To get a bird’s-eye view of your entire dataset, press Shift-I, which will provide summary statistics for each of your columns:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 column             errors♯| nulls♯| distinct♯| mode~| min~| max~| sum | median~| mean%| stdev%   
 OPERATOR          ║      0 |     0 |      282 | UNKN…|     |     |     |        |     !|      !   
 ATYPE                   0 |     0 |      400 | UNKN…|     |     |     |        |     !|      !   
 INCIDENT_DATE           0 |     0 |     2343 | 10/3…|     |     |     |        |     !|      !   
 STATE                   0 |     0 |       63 |      |     |     |     |        |     !|      !   
 AIRPORT                 0 |     0 |     1512 | UNKN…|     |     |     |        |     !|      !   
 PHASE_OF_FLT            0 |     0 |       21 |      |     |     |     |        |     !|      !   
 HEIGHT                  0 |     0 |      319 |      |     |     |     |        |     !|      !   
 SPEED                   0 |     0 |      225 |      |     |     |     |        |     !|      !   
 SPECIES                 0 |     0 |      641 | Unkn…|     |     |     |        |     !|      !   
 BIRDS_STRUCK            0 |     0 |        5 | 1    |     |     |     |        |     !|      !   
 EFFECT                  0 |     0 |        8 | None |     |     |     |        |     !|      !   
 DAMAGE                  0 |     0 |        6 | N    |     |     |     |        |     !|      !   
 COST_REPAIRS      ║  72145 |     0 |      478 | 5000 | 10  | 650…| 16…#| 13468  | 1242…| 52603…║   
 PERSON            ║      0 |     0 |        7 | Carc…|     |     |     |        |     !|      !   
 REMAINS_COLLECTED       0 |     0 |        2 | 1    |     |     |     |        |     !|      !   
 REMARKS                 0 |     0 |    61240 |      |     |     |     |        |     !|      !   
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
4› faa-wildlife-strikes_describe|                                   Shift+I          16 columns  •0 

In VisiData, this is called the “Describe Sheet”. You’ll notice that there are only min/max/median/etc. calculations for the columns we’ve given types — just the COST_REPAIRS column so far. If we go back to the data sheet and tell VisiData that the HEIGHT, SPEED, and BIRDS_STRUCK fields are numbers, too, then pressing Shift-I will result in something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help               Ctrl+H for help menu        
 column             errors♯| nulls♯| distinct♯| mode~| min~| max~| sum | median~| mean%| stdev%   
 OPERATOR          ║      0 |     0 |      282 | UNKN…|     |     |     |        |     !|      !   
 ATYPE                   0 |     0 |      400 | UNKN…|     |     |     |        |     !|      !   
 INCIDENT_DATE           0 |     0 |     2343 | 10/3…|     |     |     |        |     !|      !   
 STATE                   0 |     0 |       63 |      |     |     |     |        |     !|      !   
 AIRPORT                 0 |     0 |     1512 | UNKN…|     |     |     |        |     !|      !   
 PHASE_OF_FLT            0 |     0 |       21 |      |     |     |     |        |     !|      !   
 HEIGHT              35629 |     0 |      318 | 0    | 0   | 313…| 33…#| 50     | 897.…| 1820.…   
 SPEED               48384 |     0 |      224 | 140  | 0   | 374 | 36…#| 140.0  | 144.…| 46.37    
 SPECIES                 0 |     0 |      641 | Unkn…|     |     |     |        |     !|      !   
 BIRDS_STRUCK      ║   7966 |     0 |        1 | 1    | 1   | 1   | 65…#| 1.0    | 1.00 |  0.00 ║   
 EFFECT                  0 |     0 |        8 | None |     |     |     |        |     !|      !   
 DAMAGE                  0 |     0 |        6 | N    |     |     |     |        |     !|      !   
 COST_REPAIRS        72145 |     0 |      478 | 5000 | 10  | 650…| 16…#| 13468  | 1242…| 52603…   
 PERSON            ║      0 |     0 |        7 | Carc…|     |     |     |        |     !|      !   
 REMAINS_COLLECTED       0 |     0 |        2 | 1    |     |     |     |        |     !|      !   
 REMARKS                 0 |     0 |    61240 |      |     |     |     |        |     !|      !   
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
5› faa-wildlife-strikes_describe|                                   Shift+I          16 columns  •0