An Introduction To Data Science On The Linux Command Line

Submitted by
Style Pass
2021-07-24 02:30:05

     This article gives a brief overview of a number of different Linux commands, with special emphasis on how each command can be used for data science tasks.  The goal is to convince the reader that each of these commands can be extremely useful, and to show what role each one can play when manipulating or analyzing data.

     Many readers are likely already familiar with the '|' symbol, but if not, it's worth pointing out in advance:  The output of any command discussed in the next few sections can be 'piped' directly into the input of another command using the '|' symbol!  This means that the specialized tasks done by each command can be chained together to make short but extremely powerful mini programs, all directly on the command line!
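
     As a rough illustration of chaining commands with '|', the sketch below counts which value appears most often in a small file.  The sample data is made up for this example, and 'sort', 'uniq' and 'head' are standard utilities chosen here only to demonstrate the pipe itself.

```shell
#!/bin/sh
# Hypothetical sample data: one survey answer per line.
data=$(mktemp)
printf 'python\nr\npython\njulia\nr\npython\n' > "$data"

# Each '|' feeds the output of the command on its left
# into the input of the command on its right:
#   sort      groups identical lines next to each other
#   uniq -c   collapses each group and prefixes it with a count
#   sort -rn  orders the counted lines from largest to smallest
#   head -n 1 keeps only the first (most frequent) line
top=$(sort "$data" | uniq -c | sort -rn | head -n 1)
echo "$top"

# Clean up the temporary file.
rm -f "$data"
```

     Each stage does one small job, so any stage can be swapped or removed without touching the others.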

     What is grep?  'grep' is a tool for extracting matching text from files or streams.  Its control flags and options let you be very selective about which subset of the text you'd like to extract.  Grep is generally used as a 'line-oriented' tool, which means that when matching text is found, grep prints the whole line that contains it, although you can use the '-o' flag to print only the matched part of the line.
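
     The difference between the default line-oriented behaviour and the '-o' flag can be sketched as follows.  The log contents and the 'job=' field are invented for this example.  The '-E' flag (extended regular expressions) is an addition not mentioned above, used here so that '+' works as a quantifier.

```shell
#!/bin/sh
# Hypothetical log file with three lines.
log=$(mktemp)
printf 'INFO job=17 status=ok\nERROR job=18 status=failed\nINFO job=19 status=ok\n' > "$log"

# Default behaviour: print every whole line that contains a match.
matching_lines=$(grep 'ERROR' "$log")
echo "$matching_lines"

# With -o: print only the matched text itself, one match per output line.
job_ids=$(grep -oE 'job=[0-9]+' "$log")
echo "$job_ids"

# Clean up the temporary file.
rm -f "$log"
```

     The first command returns the complete ERROR line, while the second returns only the 'job=...' fragments from every line that has one.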
