Sed & Awk

Dale Dougherty, Arnold Robbins

Mentioned 8

Explains the progression in Unix from grep to sed and awk, describes how to write sed scripts, covers common programming constructs, and details awk's built-in functions

More on Amazon.com

Mentioned in questions and answers.

I have a big HTML file that has lots of markup that looks like this:

<p class="MsoNormal" style="margin: 0in 0in 0pt;">
  <span style="font-size: small; font-family: Times New Roman;">stuff here</span>
</p>

I'm trying to do a Vim search-and-replace to get rid of all class="" and style="" but I'm having trouble making the match ungreedy.

My first attempt was this

%s/style=".*?"//g

but Vim doesn't seem to like the ?. Unfortunately removing the ? makes the match too greedy.

How can I make my match ungreedy?

G'day,

Vim's regexp processing is not too brilliant. I've found that the regexp syntax for sed is about the right match for vim's capabilities.

I usually set the search highlighting on (:set hlsearch) and then play with the regexp after entering a slash to enter search mode.

Edit: Mark, that trick to minimise greedy matching is also covered in Dale Dougherty's excellent book "Sed & Awk" (sanitised Amazon link).

Chapter Three "Understanding Regular Expression Syntax" is an excellent intro to the more primitive regexp capabilities involved with sed and awk. Only a short read and highly recommended.

HTH

cheers,

Are there any good books for a relatively new but not totally new *nix user to get a bit more in depth knowledge (so no "Linux for dummies")? For the most part, I'm not looking for something to read through from start to finish. Rather, I'd rather have something that I can pick up and read in chunks when I need to know how to do something or whenever I have one of those "how do I do that again?" moments. Some areas that I'd like to see are:

  • command line administration
  • bash scripting
  • programming (although I'd like something that isn't just relevant for C programmers)

I'd like this to be as platform-independent as possible (meaning it has info that's relevant for any linux distro as well as BSD, Solaris, OS X, etc), but the unix systems that I use the most are OS X and Debian/Ubuntu. So if I would benefit the most from having a more platform-dependent book, those are the platforms to target.

If I can get all this in one book, great, but I'd rather have a bit more in-depth material than coverage of everything. So if there are any books that cover just one of these areas, post it. Hell, post it even if it's not relevant to any of those areas and you think it's something that a person in my position should know about.

I've wiki'd this post - could those with sufficient rep add in items to it.

System administration, general usage books

Programming:

Specific tools (e.g. Sendmail)

Various of the books from O'Reilly and other publishers cover specific topics. Some of the key ones are:

Some of these books have been in print for quite a while and are still relevant. Consequently they are also often available secondhand at much less than list price. Amazon marketplace is a good place to look for such items. It's quite a good way to do a shotgun approach to topics like this for not much money.

As an example, in New Zealand technical books are usurously expensive due to a weak kiwi peso (as the $NZ is affectionately known in expat circles) and a tortuously long supply chain. You could spend 20% of a week's after-tax pay for a starting graduate on a single book. When I was living there just out of university I used this type of market a lot, often buying books for 1/4 of their list price - including the cost of shipping to New Zealand. If you're not living in a location with tier-1 incomes I recommend this.

E-Books and on-line resources (thanks to israkir for reminding me):

  • The Linux Documentation project (www.tldp.org), has many specific topic guides known as HowTos that also often concern third party OSS tools and will be relevant to other Unix variants. It also has a series of FAQ's and guides.

  • Unix Guru's Universe is a collection of unix resources with a somewhat more old-school flavour.

  • Google. There are many, many unix and linux resources on the web. Search strings like unix commands or learn unix will turn up any amount of online resources.

  • Safari. This is a subscription service, but you can search the texts of quite a large number of books. I can recommend this as I've used it. They also do site licences for corporate customers.

Some of the philosophy of Unix:

I recommend the Armadillo book from O'Reilly for command line administration and shell scripting.

alt text

Jason,

Unix Programming Environment by Kernighan and Pike will give you solid foundations on all things Unix and should cover most of your questions regarding shell command line scripting etc.

The Armadillo book by O'Reilly will add the administration angle. It has served me well!

Good luck!

The aforementioned Unix Power Tools is a must. Other classics are sed&awk and Mastering Regular Expressions. I also like some books from the O'Reilly "Cookbook" series:

Possible Duplicate:
What are the differences between Perl, Python, AWK and sed?
What is the difference between sed and awk?

Maybe not a very specific question, but I am confusing aboutthe differences among grep, awk and sed in terms of their role in unix/linux system admin and text processing.

Short definition:

grep: search for specific terms in a file

#usage
$ grep This file.txt
Every line containing "This"
Every line containing "This"
Every line containing "This"
Every line containing "This"

$ cat file.txt
Every line containing "This"
Every line containing "This"
Every line containing "That"
Every line containing "This"
Every line containing "This"

Now awk and sed are completly different than grep. awk and sed are text processors. Not only do they have the ability to find what you are looking for in text, they have the ability to remove, add and modify the text as well (and much more).

awk is mostly used for data extraction and reporting. sed is a stream editor
Each one of them has its own functionality and specialties.

Example
Sed

$ sed -i 's/cat/dog/' file.txt
# this will replace any occurrence of the characters 'cat' by 'dog'

Awk

$ awk '{print $2}' file.txt
# this will print the second column of file.txt

Basic awk usage:
Compute sum/average/max/min/etc. what ever you may need.

$ cat file.txt
A 10
B 20
C 60
$ awk 'BEGIN {sum=0; count=0; OFS="\t"} {sum+=$2; count++} END {print "Average:", sum/count}' file.txt
Average:    30

I recommend that you read this book: Sed & Awk: 2nd Ed.

It will help you become a proficient sed/awk user on any unix-like environment.

I used sed for a batch ptovess where I could not do it with awk. Vould awk have done it? Or is it more a matter of choice and call awk and sed equivalent for the usage. They both do the common search replace similar with i/o. Is there a good example what can't be done with one that the other can?

G'day,

Awk is more powerful. sed tends to be more limited in what it can do.

Sed is good for line-based changes to data. It has some simple looping constructs, the usual ed/ex/vi regexp stuff and substitution things, compound statements, decisions etc. Most people use it for modifying piped data.

Awk is good for filtering or rearranging data. It mostly gets used for reporting.

I'd suggest having a look at Dale Dougherty's excellent book "sed & awk" (sanitised Amazon link). BTW It's got one of the best explanations of regexps in there as well.

Many people would say use Perl anyway! (-:

Edit: Forgot to say that the awk language is quite C like which is no surprise given that the 'k' in awk stands for Brian Kernighan. Yes. That Brian Kernighan!

Also, sed only works on data streams whereas awk works on both data streams and files.

HTH

cheers,Sed only works on data streams whereas awk works on both data streams and files.

Is there a simple and lightweight program to search over a text file and replace a string with regex?

sed or awk. I recommend the book sed&awk to master the subject or the booklet sed&awk pocket reference for a quick reference. Of course mastering regular expressions is a must...

All of my codes fail. They should print "he", "hello" and "5 \n 3", respectively:

awk -v e='he' {print $e}       // not working, why?
awk NF { print hello }
awk { print ARGV[5,3] }

Are there some simple examples about AWK?

I use this site. The O'Reilly Sed & Awk book is good as well.

I'm trying to parse this HTML block:

<div class="v120WrapperInner"><a href="/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&amp;adtype=pyv&amp;event=ad&amp;usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=" title="The Valley Downs Chicago"><img class="vimg120" alt="The Valley Downs Chicago" src="http://i2.ytimg.com/vi/91sYT_8CN8Q/1.jpg">

to capture the redirect link:

/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&amp;adtype=pyv&amp;event=ad&amp;usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=

and video title:

The Valley Downs Chicago

When I use this simple Perl code:

foreach $_ (@promotedVideos)
{
   if (/\s<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/six)
   {
     print $1;
     print $2;
   }
}

nothing prints. While I'm troubleshooting this, I thought I'd ask you the experts if you see anything wrong or problematic. Thanks so much in advance for your help!

G'day,

If you're having problems understanding regexp's can I suggest having a read of the regexp intro in Dale Dougherty's excellent book "sed & awk" (sanitised Amazon link).

Definitely one of the best intro's to regexp's around.

HTH

cheers,

In a series of similar questions, what is the best AWK reference you've ever seen?

If there isn't really one (I've yet to find the grail), perhaps we could compile one in a separate question.

Whenever I needed to know something about AWK I always went to my copy of UNIX in a Nutshell. Now I see that there is a SED/AWK in a Nutshell book and pocket guide. The former was enough for what I needed. Haven't used the latter two.