Metadata in images: More comprehensive extraction

2013-11-27

A more complete extraction of image metadata than the mini-code in "Nyhetsanalys: Sunt förnuft när det gäller bildanalysen". Possible collisions between variables deeper in the tree are not handled. They still need to be kept in mind, and any collisions can be dealt with as they arise. There is some redundancy between the three modules used, so for quick-and-dirty indexing on limited hardware there is room to optimize.
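One way to keep track of such collisions is to record which module produced each value. The following is only a minimal sketch under that assumption: the store `%metadata_by_source` and the helpers `record_field` and `conflicting_keys` are hypothetical names, not part of the code in this post, and the values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical store: key -> value -> source module -> count.
my %metadata_by_source;

# Record one key/value pair together with the module that produced it.
sub record_field
{
    my ($source, $key, $value) = @_;

    $metadata_by_source{$key}{$value}{$source}++;
}

# Return the keys that have more than one value and more than one source,
# i.e. candidates for a collision between modules.
sub conflicting_keys
{
    my @conflicts;

    foreach my $key (sort keys %metadata_by_source)
    {
        my @values = keys %{ $metadata_by_source{$key} };
        my %sources;

        foreach my $value (@values)
        {
            $sources{$_} = 1 foreach keys %{ $metadata_by_source{$key}{$value} };
        }

        if ( @values > 1 && keys(%sources) > 1 )
        {
            push @conflicts, $key;
        }
    }

    return @conflicts;
}

# Invented values: the two sources agree on width but not on height.
record_field("Image::Info",     "width",  "380");
record_field("Image::ExifTool", "width",  "380");
record_field("Image::Info",     "height", "215");
record_field("Image::ExifTool", "height", "216");

print join(",", conflicting_keys()), "\n";
```

A field that legitimately carries several values from a single module is not reported, since only one source is involved.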


For the interested reader, the example output for an image from Reuters also gives a complementary, related but simpler example of the method discussed briefly in "Snowden-filerna: Att detektera manipulerad information". One example is the preferences of a person or the bias of an organization, where certain programs, image sizes and so on are more or less normal (in themselves, or given the value of other metadata).
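The idea of a value being "normal given the value of other metadata" can be sketched as a conditional frequency over earlier images. This is an illustrative assumption about how such a check could look, not the method from the post referred to above; `build_profile` and `conditional_frequency` are hypothetical helpers, and the agency and tool names are invented.

```perl
use strict;
use warnings;

# Count how often each value of $field occurs together with each value of $given.
sub build_profile
{
    my ($records, $given, $field) = @_;
    my %profile;

    foreach my $r (@{$records})
    {
        next unless defined $r->{$given} && defined $r->{$field};
        $profile{ $r->{$given} }{ $r->{$field} }++;
    }

    return \%profile;
}

# Relative frequency of the record's $field value among earlier records that
# share its $given value. Returns undef when there is nothing to compare with.
sub conditional_frequency
{
    my ($profile, $record, $given, $field) = @_;

    return unless defined $record->{$given} && defined $record->{$field};

    my $seen = $profile->{ $record->{$given} };
    return unless $seen;

    my $total = 0;
    $total += $_ foreach values %{$seen};

    return ( $seen->{ $record->{$field} } || 0 ) / $total;
}

# Invented history: one source that mostly uses the same program.
my @earlier = (
    { "credit" => "AGENCY A", "originating program" => "TOOL X" },
    { "credit" => "AGENCY A", "originating program" => "TOOL X" },
    { "credit" => "AGENCY A", "originating program" => "TOOL X" },
    { "credit" => "AGENCY A", "originating program" => "TOOL Y" },
);

my $profile = build_profile(\@earlier, "credit", "originating program");

my $usual = conditional_frequency($profile,
    { "credit" => "AGENCY A", "originating program" => "TOOL X" },
    "credit", "originating program");

my $odd = conditional_frequency($profile,
    { "credit" => "AGENCY A", "originating program" => "TOOL Z" },
    "credit", "originating program");

printf "%.2f %.2f\n", $usual, $odd;
```

A low frequency is only a hint that something deviates from the usual pattern. It says nothing by itself about whether an image has been manipulated.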



Example: Metadata output


For many (but not all) fields, documentation comes with the ready-made modules used, which are available on search.cpan.org. Other metadata that may occur is assorted legacy, more or less faithful to how it was intended to be used, and sometimes with its own formatting in the data fields.


Bits Per Sample 8

Color Components 3

Comment CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 95


Current IPTC Digest bf21543e5c98c2174bac65abbe29c7ca

Directory test

Encoding Process Baseline DCT, Huffman coding

ExifTool Version Number 9.27

File Access Date/Time 2013:11:27 10:32:53+01:00

File Creation Date/Time 2013:11:27 10:33:31+01:00

File Modification Date/Time 2013:11:27 10:33:45+01:00

File Name a1.jpg

File Permissions rw-rw-rw-

File Size 41 kB

File Type JPEG

Image Height 215

Image Size 380x215

Image Width 380

JFIF Version 1.01

MIME Type image/jpeg

Resolution Unit None

SamplesPerPixel 3

X Resolution 1

Y Cb Cr Sub Sampling YCbCr4:2:0 (2 2)

Y Resolution 1

by-line DENIS BALIBOUSE

caption/abstract European Union foreign policy chief Catherine Ashton (3rd L) delivers a statement during a ceremony next to British Foreign Secretary William Hague, Germany's Foreign Minister Guido Westerwelle, Iranian Foreign Minister Mohammad Javad Zarif, Chinese Foreign Minister Wang Yi, U.S. Secretary of State John Kerry, Russia's Foreign Minister Sergei Lavrov and French Foreign Minister Laurent Fabius (L-R) at the United Nations in Geneva November 24, 2013. Iran and six world powers reached a breakthrough agreement early on Sunday to curb Tehran's nuclear programme in exchange for limited sanctions relief, in a first step towards resolving a dangerous decade-old standoff. REUTERS/Denis Balibouse (SWITZERLAND - Tags: POLITICS ENERGY TPX IMAGES OF THE DAY)

category I

city GENEVA

color_type YCbCr

country/primary location code CHE

country/primary location name Switzerland

credit REUTERS

date created 20131124

edit status CORRECTION

file_ext jpg

file_media_type image/jpeg

fixture identifier GM1E9BO0W9W01

headline European Union foreign policy chief Catherine Ashton delivers a statement during a ceremony at the United Nations in Geneva

height 215

image type 3S

keywords :rel:d:bm:GF2E9BO09X801

language identifier en

object name IRAN-NUCLEAR-DEAL/

original transmission reference DBA01

originating program JPEGTOII2/MED

program version 1.0.0.16

source X90072

supplemental category DIP POL ENR tpx

time created 053640+0000

urgency 4

width 380

writer/editor DBA/KR
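The fields "date created" and "time created" in the output above illustrate field-specific formatting: the date is written as CCYYMMDD and the time as HHMMSS followed by a UTC offset. A minimal sketch of combining them into one ISO 8601 timestamp could look as follows. The helper name `iptc_datetime_to_iso` is hypothetical, and values in any other legacy format are deliberately not converted.

```perl
use strict;
use warnings;

# Combine an IPTC date (CCYYMMDD) and time (HHMMSS+HHMM or HHMMSS-HHMM)
# into an ISO 8601 string. Returns nothing if the date has another format,
# and only the date if the time has another format.
sub iptc_datetime_to_iso
{
    my ($date, $time) = @_;

    return unless defined $date && $date =~ /^(\d{4})(\d{2})(\d{2})$/;

    my $iso = "$1-$2-$3";

    if ( defined $time && $time =~ /^(\d{2})(\d{2})(\d{2})([+-])(\d{2})(\d{2})$/ )
    {
        $iso .= "T$1:$2:$3$4$5:$6";
    }

    return $iso;
}

# Values taken from the example output above.
print iptc_datetime_to_iso("20131124", "053640+0000"), "\n";
# prints 2013-11-24T05:36:40+00:00
```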

Code


Perl.


use FileHandle;
use Image::Info qw(image_info dim);
use Image::EXIF;
use Image::ExifTool qw(:Public);
use Image::IPTCInfo;

my $debug = 1;
my %metadata_image;

&run_it("_RULE_bRITANNIA",
 "test/" . "a1.jpg");

sub run_it()
{
    my $session_id = $_[0];
    my $file = $_[1];

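    if ( length($session_id) < 3 )
    {
 die "Session id too short: $session_id\n";
    }

    # Only checks that the file can be opened for reading.
    my $fp = FileHandle -> new($file);

    if ( ! $fp )
    {
 die "Cannot open $file: $!\n";
    }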

    $fp -> close();

    &sense__image__init_session($session_id);

    #.................................

    &sense__image__iptc($session_id,$file);
    &sense__image__elif_tags($session_id,$file);
    &sense__image__image_info($session_id,$file);

    if ( $debug )
    {
 &power_print();
    }

    #.................................

    &sense__image__end_session();
}

sub sense__image__init_session()
{
    undef %metadata_image;

    return $_[0];
}

sub sense__image__end_session()
{
    undef %metadata_image;

    return 1;
}
    
sub sense__image__iptc()
{
    # Legacy formatting occurs in the values of the data fields.

    my $file_name = $_[1];
    my $info = new Image::IPTCInfo($file_name);

    # The constructor returns undef when the file has no IPTC data.
    if ( ! $info )
    {
 return 0;
    }

    my %db = %{$info};

    if ( ! %db )
    {
 return 0;
    }


    my @keys = keys %db;
    my $i = 0;

    my $dirty = 0;

    while ( $i < @keys )
    {
 if ( ! ( ref ( $db{$keys[$i]} ) eq "HASH" ) )
 {
     goto abc;
 }

 my @gg = keys %{$db{$keys[$i]}};
 my $k = 0;
     
 while ( $k < @gg )
 {
     my @ww;
     if ( ref ( $db{$keys[$i]} -> {$gg[$k]} ) eq "ARRAY" )
     {
  @ww = @{$db{$keys[$i]} -> {$gg[$k]}};
     }
     else
     {
  $ww[0] = $db{$keys[$i]} -> {$gg[$k]};
     }
     
     my $cc = 0;

     while ( $cc < @ww )
     {
  my $value = $ww[$cc];
  
  if ( length($value) > 0 )
  {
      # If metadata collides: add a stop field or handle it some other way :-D
      $metadata_image{$gg[$k]} -> {$value}++;
      
      $dirty = 1;
  }
  
  $cc++;
     }

     $k++;
 }

      abc:

 $i++;
    }

    return $dirty;
}    

sub sense__image__image_info()
{
    my $file_name = $_[1];

    my %info = %{image_info($file_name)};
    my @keys = keys %info;
    my $i = 0;
    my $dirty = 0;

    while ( $i < @keys )
    {
 if ( 
     ( $keys[$i] eq "color_type" ) ||
     ( $keys[$i] eq "file_media_type" ) ||
     ( $keys[$i] eq "file_ext" ) ||
     ( $keys[$i] eq "width" ) ||
     ( $keys[$i] eq "height" ) ||
     ( $keys[$i] eq "SamplesPerPixel" ) ||
     ( $keys[$i] eq "Interlace" ) ||
     ( $keys[$i] eq "Compression" ) ||
     ( $keys[$i] eq "Gamma" ) ||
     ( $keys[$i] eq "LastModificationTime" ) 
     )
 {

     if ( length($info{$keys[$i]}) > 0 )
     {
  $metadata_image{$keys[$i]} -> {$info{$keys[$i]}}++;
  $dirty = 1;
     }
 }


 $i++;
    }

    return
 $dirty;
}

sub sense__image__elif_tags()
{
    my $file_name = $_[1];

    # Roughly re-uses the example from the module documentation...

    my $exifTool = new Image::ExifTool;
    $exifTool->Options(Unknown => 1);
    my $info = $exifTool->ImageInfo($file_name);

    my $group = '';
    my $tag = '';
    my $c1 = 0;
    my $dirty = 0;

    foreach $tag ($exifTool->GetFoundTags('Group0'))
    {
 if ($group ne $exifTool->GetGroup($tag))
 {
     $group = $exifTool->GetGroup($tag);
 }

 my $val = $info->{$tag};

 if (ref $val eq 'SCALAR') 
 {

     if ($$val =~ /^Binary data/)
     {
  $val = "($$val)";
     } 
     else 
     {
  my $len = length($$val);
  $val = "(Binary data $len bytes)";
     }
 }

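 # $val is now either the value itself or, for binary data, a short description of it.
 # $value below holds the human-readable description of the tag and is used as the key.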
 my $value = 
     $exifTool->GetDescription($tag);

 if ( 
     ( ! ( index($val,"Bad IPTC data") != -1 ) ) &&
     ( length($tag) > 0 )
     )
 {
     $metadata_image{$value} -> {$val}++;     
     $dirty = 1;
 }

 if ( $c1 > 200 )
 {
     goto safety;
 }

 $c1++;
    }

  safety:

    return
 $dirty;
}

sub power_print()
{
    my $out = FileHandle -> new("debug.tmp","w") || die "Cannot write debug.tmp: $!\n";

    my @gg = sort keys %metadata_image;
    my $i = 0;

    print
 @gg . "\n";

    while ( $i < @gg )
    {
 my @hh = sort keys %{$metadata_image{$gg[$i]}};
 my $k = 0;

 while ( $k < @hh )
 {
     print
  $out
  $gg[$i] . "\t" . $hh[$k] . "\n";

     $k++;
 }

 print
     $out
     "\n";

 $i++;
    }

    $out -> close();
}
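The file written by power_print has one tab-separated key and value per line, with an empty line after each key. A minimal sketch of reading that format back into a hash could look as follows. The helper `read_metadata_dump` is a hypothetical name, and the sketch assumes that no value contains a newline, which long captions are not guaranteed to satisfy.

```perl
use strict;
use warnings;

# Read "key<TAB>value" lines from a filehandle into key -> list of values.
# Empty lines and lines without a tab are skipped.
sub read_metadata_dump
{
    my ($fh) = @_;
    my %fields;

    while ( my $line = <$fh> )
    {
        chomp $line;
        next if $line eq "";

        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value;

        push @{ $fields{$key} }, $value;
    }

    return \%fields;
}

# A few lines in the same layout as debug.tmp, read from an in-memory
# filehandle so that the example does not depend on a file on disk.
my $sample = "city\tGENEVA\n\ncredit\tREUTERS\n\nwidth\t380\n\n";

open(my $in, '<', \$sample) || die "Cannot open in-memory sample: $!\n";
my $fields = read_metadata_dump($in);
close($in);

print join(",", sort keys %{$fields}), "\n";
# prints city,credit,width
```

To read the real file, the in-memory handle would be replaced by one opened on debug.tmp.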