text processing | Eikonal Blog

2010.10.05

sed tricks

Filed under: scripting, transformers — Tags: sed, text processing, transformers — sandokan65 @ 15:58

These one-liners are collected from various sites and articles on the web; see the list of Sources at the bottom of this posting.

Deleting all empty lines from the input file:
```
sed '/^$/d' INPUTFILE
```
In-place replacement:
```
sed -i '/^$/d' INPUTFILE
```
In-place replacement with backup of original file:
```
sed -i.bak '/^$/d' INPUTFILE
```
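A minimal sketch of how the backup suffix behaves, assuming GNU sed and a scratch directory (the file name `notes.txt` and the variable `workdir` are made up for illustration). Whatever follows `-i` is appended to the original file name, so `-i.bak` leaves `notes.txt.bak` behind, while `-ibak` would leave `notes.txtbak`.

```shell
# Scratch directory so the real working directory is not touched (illustrative names only).
workdir=$(mktemp -d)
printf 'one\n\ntwo\n\n' > "$workdir/notes.txt"

# Delete empty lines in place; the untouched original is kept as notes.txt.bak.
sed -i.bak '/^$/d' "$workdir/notes.txt"

cat "$workdir/notes.txt"   # edited copy: empty lines removed
ls "$workdir"              # lists both the edited file and the backup
```

If the edit turns out to be wrong, the backup can simply be moved back over the edited file.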
In-place deletion of all lines containing a string in a file:
```
sed -i '/WORDTOBEDELETED/d' INPUTFILE
```
How to replace only the first occurrence of a string match in a file (the 0,/regex/ address is a GNU sed extension):
```
sed '0,/THISSTRING/s//TOTHATSTRING/' INPUTFILE
```
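A small sketch of the first-occurrence trick on throwaway input, assuming GNU sed. The helper name `replace_first` and the sample words are hypothetical. The address `0,/re/` covers everything up to and including the first matching line, and the empty pattern in `s//.../` reuses that same regex.

```shell
# replace_first PATTERN REPLACEMENT: filter stdin, changing only the first match.
# Both arguments are pasted into the sed script, so they must not contain "/".
replace_first() {
    sed "0,/$1/s//$2/"
}

printf 'apple\nbanana\napple\n' | replace_first apple cherry
# prints:
# cherry
# banana
# apple
```

Because the substitution has no `g` flag, only the first match on that first matching line is replaced.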

Append environment variable PATH with sed:

sed -e '/^PATH/s/"$/:\/usr\/lib\/myprog\/bin"/g' -i /etc/environment

Remove all whitespace from the beginning of lines:
```
sed 's/^[ \t]*//g' foo
```
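Not every sed understands `\t` inside a bracket expression, so here is a hedged variant using the POSIX class `[[:blank:]]` (space and tab). The function name `ltrim_lines` is an illustrative choice, not something from the sources.

```shell
# ltrim_lines: strip leading spaces and tabs from every line read on stdin.
ltrim_lines() {
    sed 's/^[[:blank:]]*//'
}

printf '  two spaces\n\tone tab\nnone\n' | ltrim_lines
# prints:
# two spaces
# one tab
# none
```

The `g` flag is not needed here: the pattern is anchored with `^`, so it can match at most once per line.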
Deleting the leading / from src attributes in all HTML files in the current folder:
```
sed -i 's/src="\//src="/g' *.html
```
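The escaped slash makes that expression hard to read. As a sketch, the same substitution can be written with `|` as the delimiter of the `s` command, which sed allows. The function name `strip_src_slash` and the sample markup are made up for the demonstration.

```shell
# strip_src_slash: drop the leading "/" right after src=" on stdin.
# Using | as the delimiter means the / in the pattern needs no backslash.
strip_src_slash() {
    sed 's|src="/|src="|g'
}

echo '<img src="/img/a.png"><script src="/js/b.js"></script>' | strip_src_slash
# prints: <img src="img/a.png"><script src="js/b.js"></script>
```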
Greedy matching:
```
% echo "<b>foo</b>bar" | sed 's/<.*>//g'
bar
```

Non-greedy matching (sed has no lazy quantifiers; a negated character class emulates one):

% echo "<b>foo</b>bar" | sed 's/<[^>]*>//g'
foobar

Sources:

How do you delete empty lines using ‘sed’? – http://ksearch.wordpress.com/2010/09/25/delete-empty-lines-using-sed/
How to replace the first occurrence only (of a string match) in a file, using sed – http://techteam.wordpress.com/2010/09/14/how-to-replace-the-first-occurrence-only-of-a-string-match-in-a-file-using-sed/
Append environment variable PATH with sed (2010.09.09) – http://kdguntu.wordpress.com/2010/09/09/append-path/
Unix tip #3: Introduction to Find, Grep, Sed – http://developmentality.wordpress.com/2010/09/07/unix-tip-3-introduction-to-find-grep-sed/
util: sed for text parsing – http://zosim26.wordpress.com/2010/09/01/util-sed-for-text-parsing/
To delete the / after the src From HTML Files – http://somepalli.wordpress.com/2010/08/05/to-delete-the-word-after-the-src-from-html-files/
“sed – non greedy matching” by Christoph Sieghart (2008.07.08) – http://0x2a.at/b/sed–non-greedy-matching

References

UNIX man pages : sed – http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
sed – http://www.opengroup.org/onlinepubs/007908799/xcu/sed.html
“Sed – An Introduction and Tutorial” by Bruce Barnett – http://www.grymoire.com/Unix/Sed.html
man sed (1posix) – stream editor – http://pwet.fr/man/linux/commandes/posix/sed
Get better at awk/sed/grep – http://kristianrumberg.wordpress.com/2010/09/01/get-better-at-awksedgrep/

Related here: Command line based text replace – https://eikonal.wordpress.com/2010/07/13/command-line-based-text-replace/.

Related here: Scripting languages – https://eikonal.wordpress.com/2010/06/15/awk-sed/ | Unix tricks – https://eikonal.wordpress.com/2011/02/15/unix-tricks/ | SED tricks – https://eikonal.wordpress.com/2010/10/05/sed-tricks/ | Memory of things disappearing > nmap stuff > getports.awk – https://eikonal.wordpress.com/2010/06/23/memory-of-things-disappearing-nmap-stuff-getports-awk/ | AWK – https://eikonal.wordpress.com/2011/09/30/awk/


2010.07.13

Command line based text replace

Filed under: perl, transformers, unix — Tags: awk, bash, command line replace, grep, Perl, regexp, regular expressions, sed, text processing, transformers, vi — sandokan65 @ 13:36

sed

sed 's/Mark Monre/Marc Monroe/' 1.txt > 2.txt

find ./* -type f -exec sed -i 's/OLD-STRING/NEW-STRING/g' {} \;

The “replace” command

Syntax:

replace OLD-STRING NEW-STRING < INPUT-FILE > OUTPUT-FILE

Example:
```
$ replace UNIX Linux < oldfile > newfile
```
Example:
```
$ cat /etc/passwd | replace : '|'
```
Partial support for regular expressions: \^ matches the start of a line, and $ matches the end of a line.
Example: replace the IP address 192.168.1.2 wherever it appears at the start of a line:
```
$ replace \^192.168.1.2 192.168.5.10 < oldfile > newfile
```
A bash script, ‘fixer.sh’:

#!/bin/bash
replace CHANGEFROM CHANGETO < "$1" > "$1.tmp"
rm "$1"
mv "$1.tmp" "$1"

Now run this command line:
```
$ grep CHANGEFROM * | cut -d':' -f1 | xargs -n 1 fixer.sh
```
The result is that all files in the directory (or whatever you grep for) will be changed automagically.
Just make sure the grep doesn’t include the fixer script itself, or it will die half-way through the changes when its execute permissions are reset!
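The same grep-then-fix workflow can be sketched without the `replace` utility or a temporary file, assuming GNU sed (for `-i`). Here `grep -l` prints each matching file name once, and the loop edits those files in place. The helper name `replace_in_dir` and the demo files are hypothetical.

```shell
# replace_in_dir DIR FROM TO: edit every file directly inside DIR that contains FROM.
# FROM is treated as a basic regex; neither FROM nor TO may contain "/".
replace_in_dir() {
    grep -l -- "$2" "$1"/* 2>/dev/null |
    while IFS= read -r file; do
        sed -i "s/$2/$3/g" "$file"
    done
}

# Demonstration on scratch files (illustrative names).
demo=$(mktemp -d)
echo 'CHANGEFROM here' > "$demo/a.txt"
echo 'nothing to do'   > "$demo/b.txt"
replace_in_dir "$demo" CHANGEFROM CHANGETO
cat "$demo/a.txt" "$demo/b.txt"
# prints:
# CHANGETO here
# nothing to do
```

Since `sed -i` rewrites only the files that `grep -l` reported, files without a match keep their timestamps.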

Perl

Perl Pie:

perl -p -i -e 's/hello/goodbye/g' textfile.txt

http://www.debian-administration.org/articles/298 has a fine article and discussion on Perl Pie.

perl -p -i -e 's/\|00000000\.00\|/||/g' myfile.txt

Sources:

How do I replace text string in many files at once? – http://www.cyberciti.biz/tips/how-do-i-replace-text-string-in-many-files-at-once.html

Related: Regular expressions – https://eikonal.wordpress.com/2010/04/02/regular-expressions/ | Perl online – https://eikonal.wordpress.com/2010/02/15/perl-online/


2010.04.02

Regular expressions

Filed under: regexp, scripting, unix — Tags: .Net, awk, bash, bookmarklets, grep, Java, JavaScript, JScript, Perl, PHP, regular expressions, Ruby, scripting, sed, text processing, transformers, VBScript — sandokan65 @ 14:52

Sites

Regular Expressions specification – http://pubs.opengroup.org/onlinepubs/007908799/xbd/re.html
at Wikipedia – http://en.wikipedia.org/wiki/Regular_expression
Regular Expressions at Open Directory – http://www.dmoz.org/Computers/Programming/Languages/Regular_Expressions/
Regular-Expressions.info site – http://www.regular-expressions.info/
a table of Regular Expressions rules – http://www.3gwt.net/demo/regexps.html
Regular expressions in Ruby – http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ; {Ruby}
Regular Expressions Library – http://regexlib.com/
JavaScript 5.6/VBScript Regular Expression Syntax – http://www.regextester.com/jssyntax.html; {VBScript}
PCRE Regular Expression Pattern Syntax Reference (PHP preg*) – http://www.regextester.com/pregsyntax.html; {PHP}
POSIX 1003.2 Regular Expression Pattern Syntax Reference (PHP ereg*) – http://www.regextester.com/eregsyntax.html; {PHP}
java.util.regex – http://download.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html; {Java}
A Tao of Regular Expressions – http://www.cs.colorado.edu/~schenkc/UNIX_Regular_Expressions.pdf
Steven Levithan’s blog “Flagrant Badassery” – A JavaScript and regular expression centric blog – http://blog.stevenlevithan.com/; {JavaScript}
“Regex Guru” blog by Jan Goyvaerts – http://www.regexguru.com/
Lesson: Regular Expressions (at Java Tutorials) – http://download.oracle.com/javase/tutorial/essential/regex/index.html; {Java}
PerlRE module – http://perldoc.perl.org/perlre.html; {Perl}
“Microsoft Beefs Up VBScript with Regular Expressions” – http://msdn.microsoft.com/en-us/library/ms974570.aspx; {VBScript}
“.NET Framework Regular Expressions” – http://msdn.microsoft.com/en-us/library/hs600312.aspx; {.Net}
Regular Expressions and Other Pattern Matching – http://billposer.org/Linguistics/Computation/Resources.html#patterns
“Structural Regular Expressions” by Rob Pike – http://doc.cat-v.org/bell_labs/structural_regexps/
“Regular Expressions” at MDN Dev Center – https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Regular_Expressions; {JavaScript}
“Regular Expression HOWTO” by A.M. Kuchling – http://www.amk.ca/python/howto/regex/

Tools

Standalone tools:

RegexBuddy [COMMERCIAL] by Jan Goyvaerts of Just Great Software (http://www.just-great-software.com/): http://www.regexbuddy.com/ | Trial – http://www.regexbuddy.com/RegexBuddyCookbook.exe
PowerGREP [COMMERCIAL] by Just Great Software (http://www.just-great-software.com/): http://www.powergrep.com/ | Demo/trial – http://download.jgsoft.com/powergrep/SetupPowerGREPDemo.exe
RegexMagic [COMMERCIAL] by Just Great Software (http://www.just-great-software.com/): http://www.regexmagic.com/ | Demo/trial – http://download.jgsoft.com/magic/SetupRegexMagicDemo.exe
JavaScript Regex Syntax Highlighter by Steven Levithan – http://stevenlevithan.com/regex/syntaxhighlighter/
XRegexp, a JavaScript library by Steven Levithan – http://xregexp.com/
Expresso (at UltraPico) – http://www.ultrapico.com/Expresso.htm
Wingrep [COMMERCIAL] – http://wingrep.com/
RegexRenamer – http://regexrenamer.sourceforge.net/

Online testers:

RegexPal by Steven Levithan – http://regexpal.com/ — a JavaScript regular expression tester.
XRegExp by Steven Levithan – http://blog.stevenlevithan.com/archives/xregexp-1-0
Regular expression tool – http://regex.larsolavtorvik.com/

NRegex – http://www.nregex.com/nregex/default.aspx

NRegex bookmarklet – Drag this to your browser’s Bookmarks Toolbar

javascript:var%20r%20=%20window.prompt('Enter%20regular%20expression');
location.href='http://www.nregex.com/nregex/default.aspx?regex='%20+%20encodeURIComponent(r);

Rubular – a Ruby regular expression editor – http://www.rubular.com/
Regexp Editor – Java applet by Serbei Evdokimov – http://www.myregexp.com/ | same with a signed JAR – http://www.myregexp.com/signedJar.html | Source code (“Regex Util”) at Sourceforge – http://sourceforge.net/projects/regex-util/develop
“My Regex Tester” – http://www.myregextester.com/
REGex TESTER (ver. 1.5.3) by Venimus – http://www.regextester.com/
Regex Tester 2.0 alpha – http://www.regextester.com/index2.html
Regular Expression Test Page by FileFormat.Info – http://www.fileformat.info/tool/regex.htm
Oscar Steele’s reAnimator – http://osteele.com/tools/reanimator/
- Visualizing Regular Expressions – http://osteele.com/archives/2006/02/reanimator

Books

“Regular Expressions Cookbook” by Jan Goyvaerts & Steven Levithan (O’Reilly; 2009.05; ISBN-10: 0596520689; ISBN-13: 978-0596520687) – http://www.amazon.com/dp/0596520689/
- Book page at O’Reilly – http://oreilly.com/catalog/9780596520694
- Book’s home page – http://www.regular-expressions-cookbook.com/
- “Regular Expressions for Regular Programmers” (book review; Coding Horror; 2009.06.08) – http://www.codinghorror.com/blog/2009/06/regular-expressions-for-regular-programmers.html
“Mastering Regular Expressions” by Jeffrey E.F. Friedl (O’Reilly; 2006.08.15; ISBN-10: 0596528124; ISBN-13: 978-0596528126) – http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/ | home at O’Reilly – http://oreilly.com/catalog/9781565922570/
“Regular Expression Pocket Reference: Regular Expressions for Perl, Ruby, PHP, Python, C, Java and .NET (Pocket Reference)” by Tony Stubblebine (O’Reilly; 2007.07.25; ISBN-10: 0596514271; ISBN-13: 978-0596514273) – http://www.amazon.com/Regular-Expression-Pocket-Reference-Expressions/dp/0596514271/
“Sams Teach Yourself Regular Expressions in 10 Minutes” by Ben Forta (Sams; 2004.03.05; ISBN-10: 0672325667; ISBN-13: 978-0672325663) – http://www.amazon.com/Teach-Yourself-Regular-Expressions-Minutes/dp/0672325667/

Tidbits

Sources: The above links.

[abc] – A single character: a, b or c
[^abc] – Any single character but a, b, or c
[a-z] – Any single character in the range a-z
[a-zA-Z] – Any single character in the range a-z or A-Z
^ – Start of line
$ – End of line
\A – Start of string
\z – End of string
. – Any single character
\s – Any whitespace character
\S – Any non-whitespace character
\d – Any digit
\D – Any non-digit
\w – Any word character (letter, number, underscore)
\W – Any non-word character
\b – A word boundary (a zero-width position, not a character)
(…) – Capture everything enclosed
(a|b) – a or b
a? – Zero or one of a
a* – Zero or more of a
a+ – One or more of a
a{3} – Exactly 3 of a
a{3,} – 3 or more of a
a{3,6} – Between 3 and 6 of a
^[ \t]*$ – Match a blank line (empty, or only spaces and tabs)
\d{2}-\d{5} – Validate an ID number consisting of 2 digits, a hyphen, and another 5 digits

Special common strings:

Personal Name: ^[\w\.\']{2,}([\s][\w\.\']{2,})+$
Username: ^[\w\d\_\.]{4,}$
Password at least 6 symbols: ^.{6,}$
Password or empty input: ^.{6,}$|^$
email: ^[\_]*([a-z0-9]+(\.|\_*)?)+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
Email address: [A-Za-z0-9_.%+-]+@[A-Za-z0-9_.%+-]+\.[A-Za-z]{2,4}
US phone: \W?\d{3}\W?\d{3}\W?\d{4}
US Phone number: ^\+?[\d\s]{3,}$
US Phone with code: ^\+?[\d\s]+\(?[\d\s]{10,}$
URL: \b\w+://(\w|-|\.|/)+(/|\b)
US Social Security Number (SSN): \d{3}-\d{2}-\d{4}
US ZIP: \d{5}(-\d{4})?
IP (v4) address: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
IP (v4) address: \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
IP (v4) address: ^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])(\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])){3}$
IP (v4) address: \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
IP (v6) address:
MAC address: ^([0-9a-fA-F][0-9a-fA-F]:){5}([0-9a-fA-F][0-9a-fA-F])$
Positive Integers: ^\d+$
Negative Integers: ^-\d+$
Integer: ^-{0,1}\d+$
Positive Number: ^\d*\.{0,1}\d+$
Negative Number: ^-\d*\.{0,1}\d+$
Positive Number or Negative Number: ^-{0,1}\d*\.{0,1}\d+$
Floating point number: [-+]?([0-9]*\.[0-9]+|[0-9]+)
Floating point number: [-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
Roman number: ^(?i:(?=[MDCLXVI])((M{0,3})((C[DM])|(D?C{0,3}))?((X[LC])|(L?XX{0,2})|L)?((I[VX])|(V?(II{0,2}))|V)?))$
Domain Name: ^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$
Domain Name: ^([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
Windows File Name: (?i)^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)[^\\\./:\*\?\"\|][^\\/:\*\?\"\|]{0,254}$
Date in format yyyy-MM-dd: (19|20)\d\d([- /.])(0[1-9]|1[012])\2(0[1-9]|[12][0-9]|3[01])
Date (dd mm yyyy, d/m/yyyy, etc.): ^([1-9]|0[1-9]|[12][0-9]|3[01])\D([1-9]|0[1-9]|1[012])\D(19[0-9][0-9]|20[0-9][0-9])$
Year 1900-2099: ^(19|20)[\d]{2,2}$
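As a hedged illustration of putting one of these patterns to work from the command line, the sketch below checks a candidate string against the range-checked IPv4 pattern using `grep -E`. POSIX extended regular expressions have no `(?:...)` groups, so plain groups and `^`/`$` anchors are used instead; the names `is_ipv4` and `octet` are invented for this example.

```shell
# One octet, 0-255, as in the stricter IPv4 patterns listed above.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# is_ipv4 STRING: succeed (exit 0) only if STRING is a dotted quad with every octet in 0-255.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq "^$octet(\.$octet){3}\$"
}

for candidate in 192.168.1.2 256.1.1.1 10.0.0; do
    if is_ipv4 "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: invalid"
    fi
done
# prints:
# 192.168.1.2: valid
# 256.1.1.1: invalid
# 10.0.0: invalid
```

The simpler `\d{1,3}` pattern from the list would accept 256.1.1.1, which is why the longer alternation is worth the extra characters when validating input.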

Related (here at this blog):
Command line based text replace – https://eikonal.wordpress.com/2010/07/13/command-line-based-text-replace/ |
Perl online – https://eikonal.wordpress.com/2010/02/15/perl-online/


2010.02.15

Perl online

Filed under: perl, programming languages — Tags: Linux Magazine, Performance COmputing Magazine, Perl, perl hash, Perl Journal, programming language, Randal Schwartz, regexp, regular expressions, scripting, SysAdmin Magazine, text processing, transformers, Unix Review Magazine, Web Techniques Magazine — sandokan65 @ 14:51

Perldoc – http://perldoc.perl.org/
Perl FAQ – http://faq.perl.org/
Perl Monks – http://www.perlmonks.org/
Randal Schwartz’s collections of his columns: http://www.stonehenge.com/merlyn/columns.html
- The Linux Magazine Perl columns – http://www.stonehenge.com/merlyn/LinuxMag/
- The Perl Journal Perl columns – http://www.stonehenge.com/merlyn/PerlJournal/
- The SysAdmin/PerformanceComputing/UnixReview Perl columns – http://www.stonehenge.com/merlyn/UnixReview/
- The WebTechniques Perl columns – http://www.stonehenge.com/merlyn/WebTechniques/
“Higher-Order Perl” book online: http://hop.perl.plover.com/book/
About.com> Perl – http://perl.about.com/
Perl Circus – http://www.perlcircus.org/
Accumulated Perl Tips and Tricks – http://perl-tricks.blogspot.com/
Perl 6 Tricks and Treats – http://szabgab.com/perl6_tricks_and_treats.html
The top 10 tricks of Perl one-liners – http://blog.ksplice.com/2010/05/top-10-perl-one-liner-tricks/
Perl tips – http://www.edlin.org/perl/
Perl tricks by NEIL KANDALGAONKAR – http://montreal.pm.org/tech/neil_kandalgaonkar.shtml
20 Perl Tips And Tricks – http://www.programmersheaven.com/2/20-PERL-Tips
PerlMeme – http://perlmeme.org/ | HowTos – http://perlmeme.org/howtos/
Alex Batko’s pages on Perl – http://www.cs.mcgill.ca/~abatko/computers/programming/perl/
Perl core development – http://dev.perl.org/
Bytes.com’s Perl answers – http://bytes.com/topic/perl/answers/
Perl tips at DevDaily – http://www.devdaily.com/perl/
Perl Regular Expressions by Example – http://www.somacon.com/p127.php

Hashes

“Perl Hash Howto” by Alex Batko – http://www.cs.mcgill.ca/~abatko/computers/programming/perl/howto/hash/
PERL Hashes – http://www.tizag.com/perlT/perlhashes.php
Perl hash print – how to print the elements of a hash in Perl – http://www.devdaily.com/blog/post/perl/how-print-values-items-elements-perl-hash
PERL Hash Variable – http://www.tutorialspoint.com/perl/perl_hashes.htm

Files

“How to Read and Write Files in Perl” (About.com > Perl) – http://perl.about.com/od/perltutorials/a/readwritefiles.htm

Chomp()

“Using the Perl chomp() function” – http://perlmeme.org/howtos/perlfunc/chomp_function.html
chomp() changes [in Perl 6] – http://dev.perl.org/perl6/rfc/58.html
http://perldoc.perl.org/functions/chomp.html
“Removing new line character from a string” – http://bytes.com/topic/perl/answers/728645-removing-new-line-character-string

Control structures

“For Loop – Beginning Perl Tutorial, Control Structures” – http://perl.about.com/od/perltutorials/a/forloop_2.htm

Tidbits

Rename files

Alex Batko says (at http://www.cs.mcgill.ca/~abatko/computers/programming/perl/):

Here is a brilliant program for renaming one or more files according to a specified Perl expression. I found it on page 706 of Programming Perl (3rd edition).

#!/usr/bin/perl
$op = shift;
for( @ARGV ) {
    $was = $_;
    eval $op;
    die if $@;
    rename( $was, $_ ) unless $was eq $_;
}

In the code above, the second last line calls the built-in function “rename”, not the program itself (which is named “rename.pl”). Below are a few examples of use.

% rename.pl 's/\.htm/\.html/' *.htm         # append an 'l'
% rename.pl '$_ .= ".old"' *.html           # append '.old'
% rename.pl 'tr/A-Z/a-z/' *.HTML            # lowercase
% rename.pl 'y/A-Z/a-z/ unless /^Make/' *   # lowercase

Printing hashes

Starting with an input file with data in two columns separated by a comma (,):

#!/bin/perl -t

my %TempHash = ();
my $InputFile = shift;
print "Input file = ",$InputFile,"\n";

my ($line,$column1,$column2,);

#reading input file to generate hash
open (INPUTSTREAM, '<',  $InputFile) || die ("Could not open $InputFile");
while ( $line = <INPUTSTREAM> ) {
	chomp $line;    # strip the trailing newline from $line (a bare chomp acts on $_)
        #print $line;
	($column1, $column2) = split ',', $line;
        $TempHash{$column1}=$column2;
        #print $column1," ==> ",$TempHash{$column1};
}
close (INPUTSTREAM);

## printing hash - way #1
print "The following are in the DB: ",join(', ',values %TempHash),"\n";

## printing hash - way #2
while (($key, $value) = each %TempHash)
{
     print "$key ==> $value\n";
}

## printing hash - way #3
foreach $key (sort keys %TempHash){
   print "$key ==> $TempHash{$key}\n";
}

Removing white spaces

Sources:

Perl trim function to strip whitespace from a string – http://www.somacon.com/p114.php

# Declare the subroutines
sub trim($);
sub ltrim($);
sub rtrim($);

# Perl trim function to remove whitespace from the start and end of the string
sub trim($)
{
	my $string = shift;
	$string =~ s/^\s+//;
	$string =~ s/\s+$//;
	return $string;
}
# Left trim function to remove leading whitespace
sub ltrim($)
{
	my $string = shift;
	$string =~ s/^\s+//;
	return $string;
}
# Right trim function to remove trailing whitespace
sub rtrim($)
{
	my $string = shift;
	$string =~ s/\s+$//;
	return $string;
}

# Create a test string (padded with whitespace on both sides)
my $string = "  \t  Hello world!   ";

# Here is how to output the trimmed text "Hello world!"
print trim($string)."\n";
print ltrim($string)."\n";
print rtrim($string)."\n";

Related: Regular Expressions – https://eikonal.wordpress.com/2010/04/02/regular-expressions/ | Command line based text replace – https://eikonal.wordpress.com/2010/07/13/command-line-based-text-replace/

