Summarizing Data

VisiData makes it pleasantly easy to summarize your data. There are three main types of summaries: Frequency tables, one-off calculations, and the Describe Sheet.

Frequency tables

Frequency tables are dead-simple, but also quite powerful. For the dead-simple usage: Navigate to any column, and then press Shift-F. If you did that on the first column (“OPERATOR”) of the FAA dataset, you should get something like this:

 OPERATOR            count  #| percent  %| histogram                                              
 UNKNOWN            ‖   23076 |     31.42 | ************************************************** ‖    
 SOUTHWEST AIRLINES     7752 |     10.55 | ****************                                       
 BUSINESS               5868 |      7.99 | ************                                           
 AMERICAN AIRLINES      4337 |      5.90 | *********                                              
 DELTA AIR LINES        2817 |      3.84 | ******                                                 
 FEDEX EXPRESS          2709 |      3.69 | *****                                                  
 UNITED AIRLINES        2194 |      2.99 | ****                                                   
 US AIRWAYS             1885 |      2.57 | ****                                                   
 UPS AIRLINES           1773 |      2.41 | ***                                                    
 SKYWEST AIRLINES       1769 |      2.41 | ***                                                    
 JETBLUE AIRWAYS        1740 |      2.37 | ***                                                    
 EXPRESSJET AIRLINES    1347 |      1.83 | **                                                     
 AMERICAN EAGLE AIR…    1041 |      1.42 | **                                                     
 ENVOY AIR               883 |      1.20 | *                                                      
 ALASKA AIRLINES         835 |      1.14 | *                                                      
 REPUBLIC AIRLINES       804 |      1.09 | *                                                      
 MESA AIRLINES           693 |      0.94 | *                                                      
 AIR WISCONSIN AIRL…     623 |      0.85 | *                                                      
 PSA AIRLINES            577 |      0.79 | *                                                      
 PRIVATELY OWNED         516 |      0.70 | *                                                      
 PHI INC                 491 |      0.67 | *                                                      
 SHUTTLE AMERICA         467 |      0.64 | *                                                      
faa-wildlife-strikes_OPERATOR_freq|                                                F       282 bins 

With just one keystroke, VisiData has already told us something useful about the dataset: That the “operators” associated with 31% of the wildlife strikes are, according to the FAA, “unknown.” We have also learned, from the “bins” mini-report at the bottom-right of the screen, that there are 282 distinct values in the “OPERATOR” column.

Multi-column frequencies

Sometimes you want to count how often combinations of columns occur. VisiData also makes this easy. First, turn the columns you want to count into “key” columns, using the ! button. Then, type gF.

For instance, if we wanted to count the combinations of the “OPERATOR” and “PERSON” fields, we’d hit ! on each of those columns — either from the main data sheet or in the Columns Sheet. Once you’ve done that, you should see something like this:

 OPERATOR           | PERSON   ‖<BIRDS_STRUCK   | EFFECT             | DAMAGE   | COST_REPAIRS   |> 
 BUSINESS           | Tower    ‖ 1              | NONE               | M        |                |… 
 BUSINESS           | Tower     1              | NONE               | M        |                | 
 BUSINESS           | Tower     1              | NONE               | M        |                | 
 DELTA AIR LINES    | Tower     1              | NONE               | M        |                | 
 BUSINESS           | Tower     1              |                    | M        |                | 
 DELTA AIR LINES    | Tower     1              | Other              | M?       |                | 
 DELTA AIR LINES    | Tower     1              | NONE               | M?       |                | 
 BUSINESS           | Tower     1              | PRECAUTIONARY LAND…| M?       |                | 
 ALLEGIANT AIR      | Tower     1              | NONE               | M?       |                | 
 TRANS STATES AIRLI…| Tower     1              | NONE               | M?       |                | 
 BUSINESS           | Tower     1              | NONE               | M?       |                | 
 GOVERNMENT         | Tower     1              | NONE               | M?       |                | 
 AMERICAN AIRLINES  | Tower     1              | NONE               | N        |                | 
 EXPRESSJET AIRLINES| Tower     1              | NONE               | N        |                | 
 MESA AIRLINES      | Tower     1              | NONE               | N        |                | 
 BUSINESS           | Tower     1              | PRECAUTIONARY LAND…| N        |                | 
 DELTA AIR LINES    | Tower     1              | NONE               | N        |                | 
 DELTA AIR LINES    | Tower     1              | NONE               | N        |                | 
 LUFTHANSA          | Tower     1              | NONE               | N        |                | 
 BUSINESS           | Tower     1              | NONE               | N        |                | 
 SPIRIT AIRLINES    | Tower     1              | PRECAUTIONARY LAND…| N        |                | 
 EXPRESSJET AIRLINES| Tower     1              | NONE               | N        |                | 
faa-wildlife-strikes|                                                              !     73448 rows 

Then, type gF, which should result in something like this:

 OPERATOR           | PERSON    count  #| percent  %| histogram                                    
 UNKNOWN            | Carcass …‖   22842 |     31.10 | *******************************************… 
 SOUTHWEST AIRLINES | Pilot        5481 |      7.46 | ***********                                  
 AMERICAN AIRLINES  | Pilot        2240 |      3.05 | ****                                         
 BUSINESS           | Pilot        2061 |      2.81 | ****                                         
 DELTA AIR LINES    | Airport …    1793 |      2.44 | ***                                          
 BUSINESS           | Airport …    1642 |      2.24 | ***                                          
 AMERICAN AIRLINES  | Airport …    1481 |      2.02 | ***                                          
 FEDEX EXPRESS      | Air Tran…    1434 |      1.95 | ***                                          
 UNITED AIRLINES    | Airport …    1351 |      1.84 | **                                           
 UPS AIRLINES       | Air Tran…    1343 |      1.83 | **                                           
 SOUTHWEST AIRLINES | Airport …    1293 |      1.76 | **                                           
 BUSINESS           | Tower        1094 |      1.49 | **                                           
 SKYWEST AIRLINES   | Airport …    1024 |      1.39 | **                                           
 FEDEX EXPRESS      | Pilot         963 |      1.31 | **                                           
 EXPRESSJET AIRLINES| Airport …     785 |      1.07 | *                                            
 JETBLUE AIRWAYS    | Airport …     749 |      1.02 | *                                            
 US AIRWAYS         | Air Tran…     672 |      0.91 | *                                            
 US AIRWAYS         | Airport …     656 |      0.89 | *                                            
 ALASKA AIRLINES    | Airport …     599 |      0.82 | *                                            
 DELTA AIR LINES    | Tower         572 |      0.78 | *                                            
 AMERICAN EAGLE AIR…| Pilot         535 |      0.73 | *                                            
 BUSINESS           |               509 |      0.69 | *                                            
faa-wildlife-strikes_OPERATOR-PERSON_freq|                                        gF       879 bins 

To make it easier to read, type g_, which will expand the column widths:

 OPERATOR                | PERSON                    count#| percent%| histogram                   
 UNKNOWN                 | Carcass Found            ‖ 22842 |   31.10 | **************************… 
 SOUTHWEST AIRLINES      | Pilot                      5481 |    7.46 | ***********                 
 AMERICAN AIRLINES       | Pilot                      2240 |    3.05 | ****                        
 BUSINESS                | Pilot                      2061 |    2.81 | ****                        
 DELTA AIR LINES         | Airport Operations         1793 |    2.44 | ***                         
 BUSINESS                | Airport Operations         1642 |    2.24 | ***                         
 AMERICAN AIRLINES       | Airport Operations         1481 |    2.02 | ***                         
 FEDEX EXPRESS           | Air Transport Operations   1434 |    1.95 | ***                         
 UNITED AIRLINES         | Airport Operations         1351 |    1.84 | **                          
 UPS AIRLINES            | Air Transport Operations   1343 |    1.83 | **                          
 SOUTHWEST AIRLINES      | Airport Operations         1293 |    1.76 | **                          
 BUSINESS                | Tower                      1094 |    1.49 | **                          
 SKYWEST AIRLINES        | Airport Operations         1024 |    1.39 | **                          
 FEDEX EXPRESS           | Pilot                       963 |    1.31 | **                          
 EXPRESSJET AIRLINES     | Airport Operations          785 |    1.07 | *                           
 JETBLUE AIRWAYS         | Airport Operations          749 |    1.02 | *                           
 US AIRWAYS              | Air Transport Operations    672 |    0.91 | *                           
 US AIRWAYS              | Airport Operations          656 |    0.89 | *                           
 ALASKA AIRLINES         | Airport Operations          599 |    0.82 | *                           
 DELTA AIR LINES         | Tower                       572 |    0.78 | *                           
 AMERICAN EAGLE AIRLINES | Pilot                       535 |    0.73 | *                           
 BUSINESS                |                             509 |    0.69 | *                           
faa-wildlife-strikes_OPERATOR-PERSON_freq|                                        g_       879 bins 

With just a few more keystrokes, we’ve learned something else: Virtually all wildlife strikes with “unknown” operators were identified based on carcasses found at the airport (rather than reports from pilots, for example).

Adding “aggregators”

By default, frequency tables just count the number of times each value appears. But in VisiData, you can specify additional calculations by setting the column’s “aggregators”. (You might remember this field from the Columns Sheet.) In VisiData, there are eight main aggregators:

  • min
  • max
  • avg / mean
  • median
  • sum
  • distinct
  • count
  • keymax

To add an aggregator to a column, navigate to that column and press +. VisiData will then prompt you to specify which aggregator you would like to add. You can repeat the process to add as many aggregators as you’d like to any column. Make sure, though, that you’ve assigned the proper type (# for integer columns, etc.) to those columns, so that they’re calculated properly.

Once you’ve set your aggregators, those calculations will appear in your frequency tables.

For example, let’s go back to the original FAA data sheet. Let’s navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column
  • Press + to tell VisiData you want to add an aggregator
  • Type sum and then hit Enter to add the summation aggregator
  • Navigate to the “AIRPORT” column, and press Shift-F

You should see something like this:

 AIRPORT             count  #| sum_COST_REPAIRS  #                                                
 UNKNOWN            ‖    8424 |           17931422 ‖                                                
 DENVER INTL AIRPORT    2756 |            9033148                                                 
 DALLAS/FORT WORTH …    2392 |             927920                                                 
 CHICAGO O'HARE INT…    1583 |            1146738                                                 
 JOHN F KENNEDY INTL    1302 |             310917                                                 
 MEMPHIS INTL           1217 |            4507487                                                 
 SALT LAKE CITY INTL    1179 |            6031324                                                 
 PHILADELPHIA INTL      1131 |            2213030                                                 
 ORLANDO INTL           1026 |              71500                                                 
 SACRAMENTO INTL        1021 |            3634816                                                 
 LA GUARDIA ARPT         974 |             381504                                                 
 CHARLOTTE/DOUGLAS …     960 |              86960                                                 
 NEWARK LIBERTY INT…     917 |             506489                                                 
 LOUISVILLE INTL AR…     841 |            1366299                                                 
 AUSTIN-BERGSTROM I…     817 |             545500                                                 
 LOUIE ARMSTRONG NE…     809 |             406000                                                 
 KANSAS CITY INTL        807 |              75660                                                 
 HARTSFIELD - JACKS…     775 |             284232                                                 
 GEORGE BUSH INTERC…     746 |              52000                                                 
 DETROIT METRO WAYN…     734 |              48983                                                 
 BALTIMORE/WASH INT…     691 |             339300                                                 
 GENERAL EDWARD LAW…     686 |            7000577                                                 
faa-wildlife-strikes_AIRPORT_freq|                                                 F      1512 bins 

By default, frequency tables are sorted by the “count” column, but you can sort them by any other column.

One-off calculations

From any data sheet, you can also run a single calculation on all rows — or all selected rows — in a column. To do that, navigate to the column and type z+, which will bring up the same aggregator-choice prompt as above. Type the aggregator you want, and press Enter. At the bottom of the screen, you’ll see the result of the calculation.

You can try this on the FAA data sheet we’ve been working with. Navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column (if you haven’t already)
  • Type z+ to see the result of an aggregator
  • Type sum and then hit Enter

At the bottom of the screen, you should see something like this, indicating that the total reported cost of repairs is $161,868,071:

 SPIRIT AIRLINES    | Tower     1              | PRECAUTIONARY LAND…| N        |               ?| 
 EXPRESSJET AIRLINES| Tower     1              | NONE               | N        |               ?| 
faa-wildlife-strikes| "sum" | 161868071                                           z+     73448 rows 

Summarizing all columns

To get a bird’s-eye view of your entire dataset, press Shift-I, which will provide summary statistics for each of your columns:

 column             | errors  #| nulls  #| distinct  #| mode   | min   | max   | median   | mean  > 
 OPERATOR           |      [0] |     [0] |        282 | UNKNOWN|      |      |         |        PERSON             |      [0] |     [0] |          7 | Carcas…|      |      |         |        ATYPE              |      [0] |     [0] |        400 | UNKNOWN|      |      |         |        INCIDENT_DATE      |      [0] |     [0] |       2343 | 10/30/…|      |      |         |        STATE              |      [0] |     [0] |         63 |        |      |      |         |        AIRPORT            |      [0] |     [0] |       1512 | UNKNOWN|      |      |         |        PHASE_OF_FLT       |      [0] |     [0] |         21 |        |      |      |         |        HEIGHT             |      [0] |     [0] |        319 |        |      |      |         |        SPEED              |      [0] |     [0] |        225 |        |      |      |         |        SPECIES            |      [0] |     [0] |        641 | Unknow…|      |      |         |        BIRDS_STRUCK       |      [0] |     [0] |          5 | 1      |      |      |         |        EFFECT             |      [0] |     [0] |          8 | None   |      |      |         |        DAMAGE             |      [0] |     [0] |          6 | N      |      |      |         |        COST_REPAIRS       |  [72145] |     [0] |        478 | 5000  #| 10   #| 6500…#| 13468   #| 124227… REMAINS_COLLECTED  |      [0] |     [0] |          2 | 1      |      |      |         |        REMARKS            |      [0] |     [0] |      61240 |        |      |      |         |                                                                                                           
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
faa-wildlife-strikes_describe|                                                  I        16 columns 

In VisiData, this is called the “Describe Sheet”. You’ll notice that there are only min/max/median/etc. calculations for the columns we’ve given types — just the COST_REPAIRS column so far. If we go back to the data sheet and tell VisiData that the HEIGHT, SPEED, and BIRDS_STRUCK fields are numbers, too, then pressing Shift-I will result in something like this:

 column             | errors  #| nulls  #| distinct  #| mode   | min   | max   | median   | mean  > 
 OPERATOR           |      [0] |     [0] |        282 | UNKNOWN|      |      |         |        PERSON             |      [0] |     [0] |          7 | Carcas…|      |      |         |        ATYPE              |      [0] |     [0] |        400 | UNKNOWN|      |      |         |        INCIDENT_DATE      |      [0] |     [0] |       2343 | 10/30/…|      |      |         |        STATE              |      [0] |     [0] |         63 |        |      |      |         |        AIRPORT            |      [0] |     [0] |       1512 | UNKNOWN|      |      |         |        PHASE_OF_FLT       |      [0] |     [0] |         21 |        |      |      |         |        HEIGHT             |  [35629] |     [0] |        318 | 0     #| 0    #| 31300#| 50      #| 897.37  SPEED              |  [48384] |     [0] |        224 | 140   #| 0    #| 374  #| 140     #| 144.21  SPECIES            |      [0] |     [0] |        641 | Unknow…|      |      |         |        BIRDS_STRUCK       |   [7966] |     [0] |          1 | 1     #| 1    #| 1    #| 1       #|   1.00  EFFECT             |      [0] |     [0] |          8 | None   |      |      |         |        DAMAGE             |      [0] |     [0] |          6 | N      |      |      |         |        COST_REPAIRS       |  [72145] |     [0] |        478 | 5000  #| 10   #| 6500…#| 13468   #| 124227… REMAINS_COLLECTED  |      [0] |     [0] |          2 | 1      |      |      |         |        REMARKS            |      [0] |     [0] |      61240 |        |      |      |         |                                                                                                           
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
faa-wildlife-strikes_describe|                                                  I        16 columns