Linux Grep Commands For TAR file to identify a pattern

Ranch Hand
Posts: 76

For gzip-compressed files (*.gz), we use the zgrep command to search for a pattern.

I have created two gzip files, one.gz and two.gz.

I then created a tar archive, TestTar.tar, containing those two files.

[b]How can we find a pattern inside the tar file without extracting it?

Is there a command for this?[/b]
lowercase baba
Posts: 12760
Without extracting at all? I think you have to extract them to some extent, but you don't have to extract them to disk. The -O flag extracts the members to stdout, which you can then pipe to zgrep, so something like:

tar -xOf <your.tar.file> | zgrep <yourpattern>
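One drawback of a single pipe is that the matches no longer say which member they came from. Here is a minimal sketch of one way around that, assuming a POSIX shell and a tar that supports -O. The function name search_tar is made up for illustration, and member names containing newlines are not handled.

```shell
# Hypothetical helper: search every member of a tar archive for a pattern
# without writing the members to disk.
search_tar() {
    archive=$1
    pattern=$2
    # List the member names, then stream each member to stdout one at a time.
    tar -tf "$archive" | while IFS= read -r member; do
        case $member in
            */) continue ;;   # skip directory entries
        esac
        # zgrep decompresses gzip data arriving on stdin before matching.
        # The inner loop prefixes every hit with the member name.
        tar -xOf "$archive" "$member" | zgrep "$pattern" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$member" "$line"
            done
    done
}
```

With the files from the question, you would call it as: search_tar TestTar.tar <yourpattern>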
Posts: 20982
A ZIP file, as Windows people know it, is both an archive (a bundle of files and directories contained within a single file) and a compressed file. The same goes for the Java jar utility, since a JAR is just a ZIP archive with an extra file or so of metadata inside.

To produce a "zipped tar", though, the process is to first archive everything with tar*, then gzip the archive file. The result is also known as a "tarball". Unlike zip, gzip does not archive; it just crunches a single file.

The convention for such compressed archives is to suffix them with either ".tar.gz" or ".tgz". By default, the gzip utility takes a file "f", crunches it down, and replaces it with "f.gz".
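The two-step process can be sketched roughly as follows. The function name make_tarball is hypothetical, and it assumes you run it from the directory that contains the one you want to archive.

```shell
# Hypothetical helper: build a tarball in two explicit steps.
make_tarball() {
    dir=$1
    tar -cf "$dir.tar" "$dir"   # step 1: archive only, no compression
    gzip "$dir.tar"             # step 2: gzip replaces $dir.tar with $dir.tar.gz
}
```

After make_tarball proj you should be left with proj.tar.gz and no proj.tar, which is the "replace" behaviour described above.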

About the time Linux first began to catch on, some genius realized that they could save some typing by making tar run the gzip (or some other selected) utility against the archive it was working with. That's the "z" option now widely seen in tar commands.
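For a tarball that was compressed as a whole (.tar.gz or .tgz), the "z" option combines nicely with -O for searching. This is only a sketch: the name grep_tgz is made up, and the matches will not say which member they came from.

```shell
# Hypothetical helper: search the contents of a gzip-compressed tarball.
grep_tgz() {
    archive=$1   # e.g. something.tar.gz or something.tgz
    pattern=$2
    # x = extract, z = run the archive through gzip, O = write member data
    # to stdout instead of to disk, f = read the named archive file.
    tar -xzOf "$archive" | grep -e "$pattern"
}
```

Note that this does not fit the original question as is, because there the members are gzipped and the tar itself is not.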

ZIP-style compression scrambles the data so much that plain grep has no way to pattern-match against a gzipped file (whether it's tar'ed or not). There is a form of grep that uncompresses on the fly to do its comparisons, though: that's the zgrep already mentioned in this thread. Where no such tool exists, that's why pipelining is so popular in the Unix world.
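As an illustration of the pipelining approach, here is a small sketch that decompresses to a pipe and lets ordinary grep do the matching. The function name grep_gz is hypothetical.

```shell
# Hypothetical helper: search a single gzip file by piping it through gzip.
grep_gz() {
    file=$1
    pattern=$2
    # gzip -cd writes the decompressed data to stdout and leaves the
    # original .gz file untouched; grep then sees plain text.
    gzip -cd "$file" | grep -e "$pattern"
}
```

For a single file this should give the same matches as running zgrep on it directly.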

Note that the zip utility strives for good compression, but no compression algorithm suits every file. In fact, apply compression to the wrong data and the "compressed" file can end up larger than the original one! The ZIP format defines several compression methods, although common zip tools mostly use deflate and fall back to storing the file uncompressed when compressing would not help. You can see this at work if you watch the output from zip, since it reports what it did with each file as it processes it. If a file is listed simply as "stored", that means that the best compression for it was no compression at all.
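You can watch a "compressed" file come out larger than its input with a quick experiment. The sketch below uses gzip rather than zip, since that is the tool in this thread, and feeds it random bytes, which are essentially incompressible. The function name gzip_size_of_random is made up, and it assumes /dev/urandom and head -c are available.

```shell
# Hypothetical helper: print the gzip-compressed size of N random bytes.
gzip_size_of_random() {
    n=$1
    # Random data has no redundancy to remove, so the output is the input
    # plus gzip's header, trailer and block overhead.
    head -c "$n" /dev/urandom | gzip -c | wc -c | tr -d ' '
}
```

Running gzip_size_of_random 4096 should print a number slightly above 4096.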

So you can see that zip compresses each file in a ZIP archive individually, while tar/gzip compresses the archive as a unit.

*  For those who never worked with old time mainframes and minicomputers, "tar" means "tape archive" from back when most data storage was on reels of magnetic tape.