ImageEn for Delphi and C++ Builder
 How to Improve Blank Page Detection
T O P I C    R E V I E W
Sidney Egnew Posted - Nov 18 2024 : 08:44:00
I have over 110,000 PDF files that were scanned in color in duplex mode. I used the code shown below to classify the documents as follows:
53,077 - Classified as all blank backs using 99.8% threshold
19,252 - Classified as duplex
35,365 - Waiting to be classified

// Check only the back sides: assuming ImageIndex is zero-based, backs are 1, 3, 5, ...
v_Index := 1;
v_MinPercent := MaxInt;  // sentinel: no back page has fallen below the blank threshold
repeat
  v_ImageView.ClearAll;
  v_ImageView.IO.Params.FileName := v_FileName;
  v_ImageView.IO.Params.ImageIndex := v_Index;
  v_ImageView.IO.LoadFromFilePDF(v_FileName);
  // Percentage of the page covered by its most common color
  v_ColorPercent := v_ImageView.Proc.GetDominantColor(v_RGB);
  if v_ColorPercent < 99.8 then
    v_MinPercent := Min(v_MinPercent, v_ColorPercent);
  v_Index := v_Index + 2;
until (v_Index >= v_ImageView.IO.Params.ImageCount);
if v_MinPercent = MaxInt then
  UpdateDuplex(p_ScanPageNo, -1, 'N')
else
  UpdateDuplex(p_ScanPageNo, v_MinPercent, 'Y');

I sampled documents that were classified as duplex with percentages of 96-97%, and 55% of the sampled documents were classified incorrectly. Since more than half of all documents classified as duplex have a percentage of 96% or higher, a very large number of documents are likely to have been classified incorrectly.

The backs in the sample with content showed text, watermarks, handwriting, and in a few instances paper damage. The documents that should have been classified as Non-Duplex are clearly blank to the human eye. What can be done to improve the image classification?

Thanks
4   L A T E S T    R E P L I E S    (Newest First)
xequte Posted - Nov 19 2024 : 22:14:37
Hi Sidney

You can just ignore the border area when testing if blank:

// Test if the image is blank (with 1% threshold and ignoring the border area)
threshold  := 1.0;  // Allow 1% of image to be a different color
borderPerc := 10;   // Border area is 10% of width/height
ImageEnView1.SelectionBase := iesbBitmap;
ImageEnView1.Select( MulDiv( ImageEnView1.IEBitmap.Width, borderPerc, 100 ),
                     MulDiv( ImageEnView1.IEBitmap.Height, borderPerc, 100 ),
                     MulDiv( ImageEnView1.IEBitmap.Width, 100 - borderPerc, 100 ),
                     MulDiv( ImageEnView1.IEBitmap.Height, 100 - borderPerc, 100 ));
if ImageEnView1.Proc.GetDominantColor(cl) >= 100 - threshold then
  ShowMessage('Image is blank!')
else
  ShowMessage('Image is NOT blank!');
ImageEnView1.Deselect();
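
For the roller marks near the left edge, the same approach can be narrowed to trim only that side. The following is a minimal sketch, not a tested implementation: the function name IsBlankIgnoringLeftEdge and its parameters are hypothetical, the unit names in the comment are assumptions, and the only ImageEn calls used are the ones that already appear in the code above.

```delphi
// Assumed units: Windows (for MulDiv), imageenview, hyiedefs (for TRGB).
//
// Hypothetical helper: returns True when the image, minus a strip along its
// left edge, is dominated by a single color.
//   LeftTrimPerc - width of the ignored left strip, as a percentage of the image width
//   Threshold    - percentage of the remaining area allowed to differ in color
function IsBlankIgnoringLeftEdge(View: TImageEnView;
  LeftTrimPerc: Integer; Threshold: Double): Boolean;
var
  cl: TRGB;
begin
  View.SelectionBase := iesbBitmap;
  // Select everything to the right of the trimmed strip. This assumes the
  // right/bottom coordinates may equal the bitmap width/height.
  View.Select(MulDiv(View.IEBitmap.Width, LeftTrimPerc, 100),
              0,
              View.IEBitmap.Width,
              View.IEBitmap.Height);
  try
    Result := View.Proc.GetDominantColor(cl) >= 100 - Threshold;
  finally
    // Always clear the selection so later operations see the whole image
    View.Deselect();
  end;
end;
```

A call such as IsBlankIgnoringLeftEdge(ImageEnView1, 5, 0.2) would ignore the left 5% of the page and keep the 99.8% threshold from the original post. Both values are starting points to tune against sample pages.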


Nigel
Xequte Software
www.imageen.com
Sidney Egnew Posted - Nov 19 2024 : 21:33:56
I will keep your solution in mind. But I am not too concerned about the smudges as there are watermarks and other valid markings that might be lost. Going forward, I can ask users if they want to ignore the backs when the dominant color is close. Many documents are only front and back and those are indexed anyway.

I am more interested in detecting the roller marks. They seem to be predominately near the left edge of the paper. How can I trim a bit off the images before checking for the dominant color?

Thanks
xequte Posted - Nov 19 2024 : 18:10:02
Hi Sidney

Yes, these images contain artifacts that reduce the percentage of the dominant color.

I think you need to have a special case for images that are almost blank like this (e.g. >98%).

For those "maybe blank" images do a further test. Reduce the number of colors/merge similar colors and then reperform GetDominantColor().

For example, your image, 5714771_B.png (which has a gray smudge), returns 98.6% for GetDominantColor(), but I could easily increase this to nearly 100% by reducing the color depth, e.g. using thresholding or adjusting the levels.
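
To illustrate why merging similar colors raises the dominant-color percentage, here is a small self-contained console sketch. It does not use ImageEn at all: the page is a synthetic array of gray values, and DominantPercent and MergeSimilarTones are hypothetical stand-ins for GetDominantColor() and for whichever color-reduction method turns out to work best.

```delphi
program MergeTonesDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  PixelCount = 1000;

type
  TGrayPixels = array[0..PixelCount - 1] of Byte;

// Percentage of pixels that hold the single most frequent gray value
function DominantPercent(const Pixels: TGrayPixels): Double;
var
  Counts: array[0..255] of Integer;
  i, Best: Integer;
begin
  FillChar(Counts, SizeOf(Counts), 0);
  for i := 0 to PixelCount - 1 do
    Inc(Counts[Pixels[i]]);
  Best := 0;
  for i := 0 to 255 do
    if Counts[i] > Best then
      Best := Counts[i];
  Result := 100.0 * Best / PixelCount;
end;

// Merge similar tones by snapping each value down to the start of its bucket
procedure MergeSimilarTones(var Pixels: TGrayPixels; BucketSize: Integer);
var
  i: Integer;
begin
  for i := 0 to PixelCount - 1 do
    Pixels[i] := Byte((Pixels[i] div BucketSize) * BucketSize);
end;

var
  Page: TGrayPixels;
  i: Integer;
begin
  // Synthetic back page: white paper (255) with a faint smudge (250) on 2% of it
  for i := 0 to PixelCount - 1 do
    Page[i] := 255;
  for i := 0 to 19 do
    Page[i] := 250;

  Writeln(Format('Before merging: %.1f%%', [DominantPercent(Page)]));
  MergeSimilarTones(Page, 32);  // 255 and 250 fall into the same bucket
  Writeln(Format('After merging:  %.1f%%', [DominantPercent(Page)]));
end.
```

In this synthetic case the smudge and the paper collapse into one tone, so the dominant share rises from 98% to 100%. Dark ink would land in a different bucket and still lower the percentage. Fixed buckets can also split two nearly identical tones that straddle a bucket boundary, which is one reason to compare several reduction methods on real scans.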

You should try the Every Method demo and add this to the end of the PerformOperation method:

  dd := DestIEViewer.Proc.GetDominantColor( rgb );
  Desc := Desc + format( ' + Dom Color: %s (%s%%)', [ ColorToHex( TRGB2TColor( rgb )), FloatToStrF( dd, ffGeneral, 4, 4 )]);

Then you can try out the various color adjustment and depth methods to find which gives the most reliable result (without increasing the rate of false positives).
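
The "maybe blank" special case described above could be organized as three bands. This is a sketch under stated assumptions: ClassifyBack and the TBackClass names are hypothetical, 99.8 is the threshold from the original post, and 98.0 is only the example value mentioned in this reply.

```delphi
program BackPageBands;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TBackClass = (bcBlank, bcMaybeBlank, bcContent);

const
  BlankThreshold = 99.8;  // threshold used in the original post
  MaybeThreshold = 98.0;  // example lower bound for the "maybe blank" band

// Map a dominant-color percentage to one of three bands
function ClassifyBack(DominantPercent: Double): TBackClass;
begin
  if DominantPercent >= BlankThreshold then
    Result := bcBlank
  else if DominantPercent > MaybeThreshold then
    Result := bcMaybeBlank   // candidate for a second test after color reduction
  else
    Result := bcContent;
end;

const
  ClassNames: array[TBackClass] of string =
    ('blank', 'maybe blank: retest', 'content');
begin
  Writeln(ClassNames[ClassifyBack(99.9)]);
  Writeln(ClassNames[ClassifyBack(98.6)]);  // value reported for 5714771_B.png
  Writeln(ClassNames[ClassifyBack(96.5)]);
end.
```

Since the sampled documents at 96-97% were often blank to the eye, the lower bound may need to sit below 98 for this data set. Only pages in the middle band would need the slower second test.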

Nigel
xequte Posted - Nov 18 2024 : 22:14:59
Hi Sidney

Can you save some of the pages that are mis-classified to PNG files and post or email them to us?


Nigel