Summarizing Data

VisiData makes it pleasantly easy to summarize your data. There are three main types of summaries: Frequency tables, one-off calculations, and the Describe Sheet.

Frequency tables

Frequency tables are dead-simple, but also quite powerful. For the dead-simple usage: Navigate to any column, and then press Shift-F. If you did that on the first column (“OPERATOR”) of the FAA dataset, you should get something like this:

 OPERATOR            count  #| percent  %| histogram                                         ~    
 UNKNOWN            ‖   23076 |     31.42 | ************************************************** ‖    
 SOUTHWEST AIRLINES ‖    7752 |     10.55 | ****************                                       
 BUSINESS           ‖    5868 |      7.99 | ************                                           
 AMERICAN AIRLINES  ‖    4337 |      5.90 | *********                                              
 DELTA AIR LINES    ‖    2817 |      3.84 | ******                                                 
 FEDEX EXPRESS      ‖    2709 |      3.69 | *****                                                  
 UNITED AIRLINES    ‖    2194 |      2.99 | ****                                                   
 US AIRWAYS         ‖    1885 |      2.57 | ****                                                   
 UPS AIRLINES       ‖    1773 |      2.41 | ***                                              
 SKYWEST AIRLINES   ‖    1769 |      2.41 | ***                                                    
 JETBLUE AIRWAYS    ‖    1740 |      2.37 | ***                                                    
 EXPRESSJET AIRLINES‖    1347 |      1.83 | **                                                     
 AMERICAN EAGLE AIR…‖    1041 |      1.42 | **                                                     
 ENVOY AIR          ‖     883 |      1.20 | *                                                      
 ALASKA AIRLINES    ‖     835 |      1.14 | *                                                      
 REPUBLIC AIRLINES  ‖     804 |      1.09 | *                                                
 MESA AIRLINES      ‖     693 |      0.94 | *                                                      
 AIR WISCONSIN AIRL…‖     623 |      0.85 | *                                                      
 PSA AIRLINES       ‖     577 |      0.79 | *                                                      
 PRIVATELY OWNED    ‖     516 |      0.70 | *                                                      
 PHI INC            ‖     491 |      0.67 | *                                                      
 SHUTTLE AMERICA    ‖     467 |      0.64 | *                                                      
faa-wildlife-strikes_OPERATOR_freq|                                               F       282 bins  

With just one keystroke, VisiData has already told us something useful about the dataset: That the “operators” associated with 31% of the wildlife strikes are, according to the FAA, “unknown.” We have also learned, from the “bins” mini-report at the bottom-right of the screen, that there are 282 distinct values in the “OPERATOR” column.

Multi-column frequencies

Sometimes you want to count how often combinations of columns occur. VisiData also makes this easy. First, turn the columns you want to count into “key” columns, using the ! button. Then, type gF.

For instance, if we wanted to count the combinations of the “OPERATOR” and “PERSON” fields, we’d hit ! on each of those columns — either from the main data sheet or in the Columns Sheet. Once you’ve done that, you should see something like this:

 OPERATOR           | PERSON   ‖<BIRDS_STRUCK   | EFFECT             | DAMAGE   | COST_REPAIRS   |> 
 BUSINESS           | Tower    ‖ 1              | NONE               | M        |                |… 
 BUSINESS           | Tower    ‖ 1              | NONE               | M        |                |… 
 BUSINESS           | Tower    ‖ 1              | NONE               | M        |                |… 
 DELTA AIR LINES    | Tower    ‖ 1              | NONE               | M        |                |… 
 BUSINESS           | Tower    ‖ 1              |                    | M        |                |… 
 DELTA AIR LINES    | Tower    ‖ 1              | Other              | M?       |                |… 
 DELTA AIR LINES    | Tower    ‖ 1              | NONE               | M?       |                |… 
 BUSINESS           | Tower    ‖ 1              | PRECAUTIONARY LAND…| M?       |                |… 
 ALLEGIANT AIR      | Tower    ‖ 1              | NONE               | M?       |                |… 
 TRANS STATES AIRLI…| Tower    ‖ 1              | NONE               | M?       |                |… 
 BUSINESS           | Tower    ‖ 1              | NONE               | M?       |                |… 
 GOVERNMENT         | Tower    ‖ 1              | NONE               | M?       |                |… 
 AMERICAN AIRLINES  | Tower    ‖ 1              | NONE               | N        |                |… 
 EXPRESSJET AIRLINES| Tower    ‖ 1              | NONE               | N        |                |… 
 MESA AIRLINES      | Tower    ‖ 1              | NONE               | N        |                |… 
 BUSINESS           | Tower    ‖ 1              | PRECAUTIONARY LAND…| N        |                |… 
 DELTA AIR LINES    | Tower    ‖ 1              | NONE               | N        |                |… 
 DELTA AIR LINES    | Tower    ‖ 1              | NONE               | N        |                |… 
 LUFTHANSA          | Tower    ‖ 1              | NONE               | N        |                |… 
 BUSINESS           | Tower    ‖ 1              | NONE               | N        |                |… 
 SPIRIT AIRLINES    | Tower    ‖ 1              | PRECAUTIONARY LAND…| N        |                |… 
 EXPRESSJET AIRLINES| Tower    ‖ 1              | NONE               | N        |                |… 
faa-wildlife-strikes|                                                             !     73448 rows  

Then, type gF, which should result in something like this:

 OPERATOR           | PERSON    count  #| percent  %| histogram                                    
 UNKNOWN            | Carcass …‖   22842 |     31.10 | *******************************************… 
 SOUTHWEST AIRLINES | Pilot    ‖    5481 |      7.46 | ***********                                  
 AMERICAN AIRLINES  | Pilot    ‖    2240 |      3.05 | ****                                         
 BUSINESS           | Pilot    ‖    2061 |      2.81 | ****                                         
 DELTA AIR LINES    | Airport …‖    1793 |      2.44 | ***                                          
 BUSINESS           | Airport …‖    1642 |      2.24 | ***                                          
 AMERICAN AIRLINES  | Airport …‖    1481 |      2.02 | ***                                          
 FEDEX EXPRESS      | Air Tran…‖    1434 |      1.95 | ***                                          
 UNITED AIRLINES    | Airport …‖    1351 |      1.84 | **                                           
 UPS AIRLINES       | Air Tran…‖    1343 |      1.83 | **                                           
 SOUTHWEST AIRLINES | Airport …‖    1293 |      1.76 | **                                           
 BUSINESS           | Tower    ‖    1094 |      1.49 | **                                           
 SKYWEST AIRLINES   | Airport …‖    1024 |      1.39 | **                                           
 FEDEX EXPRESS      | Pilot    ‖     963 |      1.31 | **                                           
 EXPRESSJET AIRLINES| Airport …‖     785 |      1.07 | *                                            
 JETBLUE AIRWAYS    | Airport …‖     749 |      1.02 | *                                            
 US AIRWAYS         | Air Tran…‖     672 |      0.91 | *                                            
 US AIRWAYS         | Airport …‖     656 |      0.89 | *                                            
 ALASKA AIRLINES    | Airport …‖     599 |      0.82 | *                                            
 DELTA AIR LINES    | Tower    ‖     572 |      0.78 | *                                            
 AMERICAN EAGLE AIR…| Pilot    ‖     535 |      0.73 | *                                            
 BUSINESS           |          ‖     509 |      0.69 | *                                            
faa-wildlife-strikes_OPERATOR-PERSON_freq|                                       gF       879 bins  

To make it easier to read, type g_, which will expand the column widths:

 OPERATOR                | PERSON                    count#| percent%| histogram                   
 UNKNOWN                 | Carcass Found            ‖ 22842 |   31.10 | **************************… 
 SOUTHWEST AIRLINES      | Pilot                    ‖  5481 |    7.46 | ***********                 
 AMERICAN AIRLINES       | Pilot                    ‖  2240 |    3.05 | ****                        
 BUSINESS                | Pilot                    ‖  2061 |    2.81 | ****                        
 DELTA AIR LINES         | Airport Operations       ‖  1793 |    2.44 | ***                         
 BUSINESS                | Airport Operations       ‖  1642 |    2.24 | ***                         
 AMERICAN AIRLINES       | Airport Operations       ‖  1481 |    2.02 | ***                         
 FEDEX EXPRESS           | Air Transport Operations ‖  1434 |    1.95 | ***                         
 UNITED AIRLINES         | Airport Operations       ‖  1351 |    1.84 | **                          
 UPS AIRLINES            | Air Transport Operations ‖  1343 |    1.83 | **                          
 SOUTHWEST AIRLINES      | Airport Operations       ‖  1293 |    1.76 | **                          
 BUSINESS                | Tower                    ‖  1094 |    1.49 | **                          
 SKYWEST AIRLINES        | Airport Operations       ‖  1024 |    1.39 | **                          
 FEDEX EXPRESS           | Pilot                    ‖   963 |    1.31 | **                          
 EXPRESSJET AIRLINES     | Airport Operations       ‖   785 |    1.07 | *                           
 JETBLUE AIRWAYS         | Airport Operations       ‖   749 |    1.02 | *                           
 US AIRWAYS              | Air Transport Operations ‖   672 |    0.91 | *                           
 US AIRWAYS              | Airport Operations       ‖   656 |    0.89 | *                           
 ALASKA AIRLINES         | Airport Operations       ‖   599 |    0.82 | *                           
 DELTA AIR LINES         | Tower                    ‖   572 |    0.78 | *                           
 AMERICAN EAGLE AIRLINES | Pilot                    ‖   535 |    0.73 | *                           
 BUSINESS                |                          ‖   509 |    0.69 | *                           
faa-wildlife-strikes_OPERATOR-PERSON_freq|                                       g_       879 bins  

With just a few more keystrokes, we’ve learned something else: Virtually all wildlife strikes with “unknown” operators were identified based on carcasses found at the airport (rather than reports from pilots, for example).

Adding “aggregators”

By default, frequency tables just count the number of times each value appears. But in VisiData, you can specify additional calculations by setting the column’s “aggregators”. (You might remember this field from the Columns Sheet.) In VisiData, there are eight main aggregators:

  • min
  • max
  • avg / mean
  • median
  • q3/q4/q5/q10 (terciles/quartiles/quintiles/deciles)
  • sum
  • distinct
  • count
  • keymax

To add an aggregator to a column, navigate to that column and press +. VisiData will then prompt you to specify which aggregator you would like to add. You can repeat the process to add as many aggregators as you’d like to any column. Make sure, though, that you’ve assigned the proper type (# for integer columns, etc.) to those columns, so that they’re calculated properly.

Once you’ve set your aggregators, those calculations will appear in your frequency tables.

For example, let’s go back to the original FAA data sheet. Let’s navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column
  • Press + to tell VisiData you want to add an aggregator
  • Type sum and then hit Enter to add the summation aggregator
  • Navigate to the “AIRPORT” column, and press Shift-F

You should see something like this:

 AIRPORT             count  #| sum_COST_REPAIRS  #                                                
 UNKNOWN            ‖    8424 |           17931422 ‖                                                
 DENVER INTL AIRPORT‖    2756 |            9033148 ‖                                                
 DALLAS/FORT WORTH …‖    2392 |             927920 ‖                                                
 CHICAGO O'HARE INT…‖    1583 |            1146738 ‖                                                
 JOHN F KENNEDY INTL‖    1302 |             310917 ‖                                                
 MEMPHIS INTL       ‖    1217 |            4507487 ‖                                                
 SALT LAKE CITY INTL‖    1179 |            6031324 ‖                                                
 PHILADELPHIA INTL  ‖    1131 |            2213030 ‖                                                
 ORLANDO INTL       ‖    1026 |              71500 ‖                                                
 SACRAMENTO INTL    ‖    1021 |            3634816 ‖                                                
 LA GUARDIA ARPT    ‖     974 |             381504 ‖                                                
 CHARLOTTE/DOUGLAS …‖     960 |              86960 ‖                                                
 NEWARK LIBERTY INT…‖     917 |             506489 ‖                                                
 LOUISVILLE INTL AR…‖     841 |            1366299 ‖                                                
 AUSTIN-BERGSTROM I…‖     817 |             545500 ‖                                                
 LOUIE ARMSTRONG NE…‖     809 |             406000 ‖                                                
 KANSAS CITY INTL   ‖     807 |              75660 ‖                                                
 HARTSFIELD - JACKS…‖     775 |             284232 ‖                                                
 GEORGE BUSH INTERC…‖     746 |              52000 ‖                                                
 DETROIT METRO WAYN…‖     734 |              48983 ‖                                                
 BALTIMORE/WASH INT…‖     691 |             339300 ‖                                                
 GENERAL EDWARD LAW…‖     686 |            7000577 ‖                                                
faa-wildlife-strikes_AIRPORT_freq|                                                F      1512 bins  

By default, frequency tables are sorted by the “count” column, but you can sort them by any other column.

One-off calculations

From any data sheet, you can also run a single calculation on all rows — or all selected rows — in a column. To do that, navigate to the column and type z+, which will bring up the same aggregator-choice prompt as above. Type the aggregator you want, and press Enter. At the bottom of the screen, you’ll see the result of the calculation.

You can try this on the FAA data sheet we’ve been working with. Navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column (if you haven’t already)
  • Type z+ to see the result of an aggregator
  • Type sum and then hit Enter

At the bottom of the screen, you should see something like this, indicating that the total reported cost of repairs is $161,868,071:

 SPIRIT AIRLINES    | Tower    ‖ 1              | PRECAUTIONARY LAND…| N        |               !|… 
 EXPRESSJET AIRLINES| Tower    ‖ 1              | NONE               | N        |               !|… 
faa-wildlife-strikes| "sum" | 161868071                                          z+     73448 rows  

Summarizing all columns

To get a bird’s-eye view of your entire dataset, press Shift-I, which will provide summary statistics for each of your columns:

 column              errors  ♯| nulls  ♯| distinct  ♯| mode  ~| min  ~| max  ~| median  ~| mean  > 
 OPERATOR           ‖        0 |       0 |        282 | UNKNOWN|       |       |          |       ! ATYPE              ‖        0 |       0 |        400 | UNKNOWN|       |       |          |       !‖
 INCIDENT_DATE      ‖        0 |       0 |       2343 | 10/30/…|       |       |          |       !‖
 STATE              ‖        0 |       0 |         63 |        |       |       |          |       !‖
 AIRPORT            ‖        0 |       0 |       1512 | UNKNOWN|       |       |          |       !‖
 PHASE_OF_FLT       ‖        0 |       0 |         21 |        |       |       |          |       !‖
 HEIGHT             ‖        0 |       0 |        319 |        |       |       |          |       !‖
 SPEED              ‖        0 |       0 |        225 |        |       |       |          |       !‖
 SPECIES            ‖        0 |       0 |        641 | Unknow…|       |       |          |       !‖
 BIRDS_STRUCK       ‖        0 |       0 |          5 | 1      |       |       |          |       !‖
 EFFECT             ‖        0 |       0 |          8 | None   |       |       |          |       !‖
 DAMAGE             ‖        0 |       0 |          6 | N      |       |       |          |       ! COST_REPAIRS       ‖    72145 |       0 |        478 | 5000   | 10    | 65000…| 13468    | 124227…‖
 PERSON             ‖        0 |       0 |          7 | Carcas…|       |       |          |       ! REMAINS_COLLECTED  ‖        0 |       0 |          2 | 1      |       |       |          |       !‖
 REMARKS            ‖        0 |       0 |      61240 |        |       |       |          |       !                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
faa-wildlife-strikes_describe|                                                 I        16 columns  

In VisiData, this is called the “Describe Sheet”. You’ll notice that there are only min/max/median/etc. calculations for the columns we’ve given types — just the COST_REPAIRS column so far. If we go back to the data sheet and tell VisiData that the HEIGHT, SPEED, and BIRDS_STRUCK fields are numbers, too, then pressing Shift-I will result in something like this:

 column              errors  ♯| nulls  ♯| distinct  ♯| mode  ~| min  ~| max  ~| median  ~| mean  > 
 OPERATOR           ‖        0 |       0 |        282 | UNKNOWN|       |       |          |       ! ATYPE              ‖        0 |       0 |        400 | UNKNOWN|       |       |          |       !‖
 INCIDENT_DATE      ‖        0 |       0 |       2343 | 10/30/…|       |       |          |       !‖
 STATE              ‖        0 |       0 |         63 |        |       |       |          |       !‖
 AIRPORT            ‖        0 |       0 |       1512 | UNKNOWN|       |       |          |       !‖
 PHASE_OF_FLT       ‖        0 |       0 |         21 |        |       |       |          |       !‖
 HEIGHT             ‖    35629 |       0 |        318 | 0      | 0     | 31300 | 50       | 897.37 ‖
 SPEED              ‖    48384 |       0 |        224 | 140    | 0     | 374   | 140.0    | 144.21 ‖
 SPECIES            ‖        0 |       0 |        641 | Unknow…|       |       |          |       ! BIRDS_STRUCK       ‖     7966 |       0 |          1 | 1      | 1     | 1     | 1.0      |   1.00 ‖
 EFFECT             ‖        0 |       0 |          8 | None   |       |       |          |       !‖
 DAMAGE             ‖        0 |       0 |          6 | N      |       |       |          |       !‖
 COST_REPAIRS       ‖    72145 |       0 |        478 | 5000   | 10    | 65000…| 13468    | 124227…‖
 PERSON             ‖        0 |       0 |          7 | Carcas…|       |       |          |       ! REMAINS_COLLECTED  ‖        0 |       0 |          2 | 1      |       |       |          |       !‖
 REMARKS            ‖        0 |       0 |      61240 |        |       |       |          |       !                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
                                                                                                    
faa-wildlife-strikes_describe|                                                 I        16 columns