Creating searchable images in PDFs via the SDK (using .NET)
Good afternoon,
We're starting to roll out a PDF document-generation solution based on XSL-FO transformation of XML into PDF files, and it works fine. One current constraint is that embedded images are not searchable by users.
Now, I understand this can in fact be done manually when Acrobat Professional is installed, using its text recognition/OCR functionality, and I'm wondering whether the same technology is available through the SDK.
As I haven't worked with the SDK myself yet, I'd like to know how it performs when used in .NET/C#-based applications. Are there specific constraints or limitations it might cause in that environment?
The way we want to implement this is to add an (optional) post-processing step after the XSL-FO transformation (which takes place on a central server, with a queue etc.) that performs text recognition. Since we create a couple hundred thousand pages each day, reliability and scalability are big factors. One document-generation machine/server creates a couple of documents in parallel on a standard multicore machine with a couple of gigs of RAM. Has anyone tested the SDK's functionality extensively under heavy load?
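To put the load described above in numbers, here is a minimal sketch in Python. It assumes "a couple hundred thousand pages each day" means roughly 200,000 pages spread evenly over 24 hours, and it models the optional OCR stage as a pluggable function run over a bounded worker pool. `ocr_step`, `post_process` and `max_workers` are hypothetical names used only for illustration; no particular OCR engine or Adobe API is implied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def required_pages_per_second(pages_per_day: int) -> float:
    """Average throughput needed to keep up with a daily page volume."""
    return pages_per_day / SECONDS_PER_DAY


def post_process(
    pdf_paths: Iterable[str],
    ocr_step: Optional[Callable[[str], str]] = None,  # hypothetical OCR hook
    max_workers: int = 4,
) -> List[str]:
    """Run the optional OCR step over generated PDFs with bounded parallelism.

    If no OCR step is configured, documents pass through unchanged.
    Results are returned in the same order as the input paths.
    """
    paths = list(pdf_paths)
    if ocr_step is None:
        return paths
    # The pool caps how many documents are processed at once on one server.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_step, paths))
```

At the assumed 200,000 pages per day, `required_pages_per_second(200_000)` is about 2.3 pages per second sustained around the clock, and proportionally more if generation is concentrated in business hours. For a CPU-bound OCR engine, a process pool would usually be a better fit than threads.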
I'm double-checking because we've hit several walls in the past due to under-communicated limitations of third-party vendors...
Cheers, and thanks,
-Jörg Battermann
> I'm wondering whether the same technology is available through the SDK.
Yes, because the SDK automates Acrobat. It is not a self-contained SDK or a redistributable: applications developed with the SDK require a copy of Acrobat installed on the same machine in order to run.
> The way we want to implement this is to add an (optional) post-processing step after the XSL-FO transformation (which takes place on a central server, with a queue etc.) that performs text recognition
Then you'll want to look at a different product, since Acrobat's EULA prevents it from being installed on a server as part of a purely automated workflow.
> Since we create a couple hundred thousand pages each day, reliability and scalability are big factors.
Again, Acrobat is not technically suitable for that type of environment.
> Has anyone tested the SDK's functionality extensively under heavy load?
No, since, again, the SDK automates Acrobat, and Acrobat is neither technically suited nor licensed for that type of high-availability server environment.
Have you contacted Adobe to find out whether any of the LiveCycle server products suit your needs? LiveCycle PDF Generator has OCR functionality (I'm not sure whether it would directly suit your needs; you should contact Adobe directly for an answer), but I know it can OCR file types such as TIFF, and it is designed to work in a high-availability server environment.