August Feng

Effect of file cardinality on zipping performance

About

Just documenting some quick experiments on how zipping performance changes as the number of files varies.

Creating the data for the experiment

  (set! *random-state* (random-state-from-platform))

  (use-modules (ice-9 binary-ports)
               (ice-9 textual-ports)
               (ice-9 popen)
               (ice-9 format)
               (srfi srfi-19))

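  ;; Return the output of "ls -l" as a string, closing the pipe afterwards.
  (define ls
    (lambda ()
      (let* ((port (open-input-pipe "ls -l"))
             (output (get-string-all port)))
        (close-pipe port)
        output)))

  ;; Create a temporary directory and return its path
  ;; (the trailing newline printed by mktemp is dropped).
  (define mktempdir
    (lambda ()
      (let* ((port (open-input-pipe "mktemp -d"))
             (output (get-string-all port)))
        (close-pipe port)
        (string-drop-right output 1))))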

  (define zip
    (lambda (zipfile)
      (let ((cmd (format #f "zip ~s *" zipfile)))
        (system cmd))))

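  ;; Call the thunk f and return the elapsed time as an SRFI-19
  ;; time-duration. The monotonic clock is not affected by
  ;; wall-clock adjustments during the run.
  (define time
    (lambda (f)
      (let ((start (current-time time-monotonic)))
        (f)
        (time-difference (current-time time-monotonic) start))))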

  (define kb
    (lambda (n)
      (let ((kb-in-bytes 1024))
        (* n kb-in-bytes))))

  (define mb
    (lambda (n)
      (let ((mb-in-bytes (kb 1024)))
        (* n mb-in-bytes))))

  (define gb
    (lambda (n)
      (let ((gb-in-bytes (mb 1024)))
        (* n gb-in-bytes))))

  (define write-random-data
    (lambda (port n)
      (unless (equal? n 0)
        (put-u8 port (random 256))
        (write-random-data port (- n 1)))))

  (define create-random-file
    (lambda (filename n)
      (let ((port (open-file filename "wb")))
        (write-random-data port n)
        (close-port port))))

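  ;; Create n files of l random bytes each. With (ice-9 format), "~r"
  ;; spells the number out in English words, so that is the file name.
  (define create-files
    (lambda (n l)
      (unless (equal? n 0)
        (create-random-file (format #f "~r" n) l)
        (create-files (- n 1) l))))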

  (begin
    (let ((directory "/tmp/work"))
      (chdir directory)
      (create-files 1000 (mb 1))
      (let ((t (time (lambda () (zip "foobar.zip")))))
        (display (time-second t)))))

Results

compression ratio

The compression ratio is 0%, which makes sense: random data has no redundancy for the compressor to exploit.

In fact, when compressing a single file, the archive turned out to be bigger than the input, since zip adds its own headers on top of the data.
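
A figure like this can be checked by comparing file sizes. The following is only a minimal sketch: the helper names file-size and space-saving-percent are hypothetical and not part of the script above, and "compression ratio" is taken here to mean the percentage of space saved.

```scheme
;; Hypothetical helpers, not part of the original script.

;; Size in bytes of the file at PATH.
(define (file-size path)
  (stat:size (stat path)))

;; Percentage of space saved by compression.
;; 0 means no gain; a negative value means the archive is bigger
;; than the input.
(define (space-saving-percent input-bytes archive-bytes)
  (exact->inexact (* 100 (- 1 (/ archive-bytes input-bytes)))))
```

For example, (space-saving-percent 1000 250) evaluates to 75.0, and passing (file-size "foobar.zip") as the second argument together with the total input size gives the value for the experiment.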

compression time

The compression time was the same for 1000 files of 1 MB each and for 1 file of 1 GB: both runs took 19 seconds.
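
The two runs did not process exactly the same amount of data: with the kb, mb and gb helpers above, 1000 files of 1 MB add up to 1000 MiB, while one 1 GB file is 1024 MiB. A small worked calculation of the implied throughput, assuming the 19-second figure for both runs; the names mib and throughput-mib-per-second are hypothetical and not part of the script:

```scheme
;; Hypothetical helpers, not part of the original script.

;; Number of bytes in one MiB.
(define mib (* 1024 1024))

;; Average throughput in MiB/s for BYTES processed in SECONDS.
(define (throughput-mib-per-second bytes seconds)
  (exact->inexact (/ bytes mib seconds)))

;; 1000 files of 1 MiB each in 19 seconds: about 52.6 MiB/s
(display (throughput-mib-per-second (* 1000 mib) 19))
(newline)

;; 1 file of 1024 MiB in 19 seconds: about 53.9 MiB/s
(display (throughput-mib-per-second (* 1024 mib) 19))
(newline)
```

Since time-second reports whole seconds only, a difference of about 2% in input size could easily be hidden in the measurement.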