Rajeev left an excellent question in the comments of another post:
"I am trying to extract text from a pdf document using ColdFusion. But I need to invoke this as a web service from my PHP code.
I am able to use 'cfpdf' tag with 'extracttext' but as I said I need to run this from PHP. But the 'http://localhost/CFIDE/services/pdf.cfc?wsdl' WSDL has 'extractpages' and it does not have 'extracttext'.
So is it possible to somehow invoke CF as a webservice from PHP and use it to extract text from a pdf document?"
The clear answer is YES, absolutely.
What are the services?
Now, I love the ColdFusion services. I've blogged about them before, and have presented on the 'awesomeness' of using them within a Flex/AIR application to utilise some ColdFusion functionality in a desktop app.
This is a perfect example of their intended use: using some of the powerful ColdFusion features and tags inside a PHP project. Accessing the power, versatility and rapid development/implementation of ColdFusion in an external project is what the CFaaS features were made for. After all, we need to share the love with other developers and languages, and help to make their lives easier and more productive. :)
Relating to the question at hand, the first and most important thing to note is that the exposed services in ColdFusion 9 are nothing more than ColdFusion components, accessed through their WSDL.
These 'default' service CFCs are stored within the /CFIDE/services directory of your CF installation.
The PDF Service, although fairly well packed with useful methods, does indeed lack the extractText action that we have available in tag form. For a full list of available methods in all of the exposed services, feel free to reference the ColdFusion as a Service Cheat Sheet, available as a PDF to download and cherish forever.
Create a web service
ColdFusion has always been awesome at creating web services. On a rudimentary level, if you can write a CFC, you can write a web service. It's as simple as that.
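To illustrate the point, here is a minimal sketch of a CFC that doubles as a web service. The file name hello.cfc and the sayHello method are hypothetical, invented for this example only; the one thing that matters is the access="remote" attribute on the function.

```cfml
<!--- hello.cfc: a hypothetical component saved somewhere under the webroot --->
<cfcomponent output="false" hint="I am a tiny example web service.">

    <!--- access="remote" is what exposes this function as a web service method --->
    <cffunction name="sayHello" access="remote" returntype="string" output="false"
                hint="I return a greeting for the given name.">
        <cfargument name="name" type="string" required="false" default="world" />
        <cfreturn "Hello, " & arguments.name & "!" />
    </cffunction>

</cfcomponent>
```

Assuming the file sits in the webroot of a local server on port 8500, requesting hello.cfc?wsdl should return the generated WSDL for it.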
So, let's jump straight in and build a CFC that will satisfy the task at hand and allow us to use the extractText PDF action.
Create a new CFC called 'pdfText.cfc' and save it in the /CFIDE/services directory within your webroot. This will put it alongside the other exposed service layer components.
/CFIDE/services/pdfText.cfc
<cfcomponent displayname="pdfText" output="false" extends="CFIDE.services.base"
             hint="I am the pdfText component, and I'm going to let you extract text from a PDF.">

    <!--- We are extending the CFIDE.services.base component.
          This will allow us to access some security methods. --->

    <cffunction name="extractText" access="remote" returntype="Any" output="false"
                hint="I extract text from the PDF. Output in either XML or String format.">

        <cfargument name="serviceusername" required="false" type="String" default=""
                    hint="I am the service username used to access the web services." />
        <cfargument name="servicepassword" required="false" type="String" default=""
                    hint="I am the service password used to access the web services." />
        <cfargument name="source" required="true" type="String"
                    hint="I am the absolute or relative path of the PDF file|PDF document variable| cfdocument variable" />
        <cfargument name="password" required="false" type="String" default=""
                    hint="I am the password of the PDF document." />
        <!--- required is false here so that the default of '*' can actually apply --->
        <cfargument name="pages" required="false" type="String" default="*"
                    hint="The page numbers from where the text needs to be extracted from the PDF document. Wildcard '*' is ALL pages." />
        <cfargument name="returnData" required="false" type="String" default="xml"
                    hint="I am the return format for the extracted text. String or XML." />

        <!--- var scope the return variable --->
        <cfset var pdfData = '' />

        <!--- We have extended the /CFIDE/services/base Class and so we now have
              access to its methods. Let's use these for security to ensure that
              there is no 'illegal' access to this service. --->

        <!--- Run the isAllowed method to check the username and password,
              AND check the IP address matches that set within the
              administrator security roles --->
        <cfif super.isAllowed(
                    username=arguments.serviceusername,
                    password=arguments.servicepassword,
                    service='PDF')
              AND super.isAllowedIP(
                    username=arguments.serviceusername,
                    service='PDF')>

            <!--- The PDF tag, action set to 'extractText' --->
            <cfpdf action="extracttext"
                   source="#arguments.source#"
                   pages="#arguments.pages#"
                   type="#arguments.returnData#"
                   useStructure="true"
                   honourspaces="true"
                   password="#arguments.password#"
                   name="pdfData" />

        </cfif>

        <cfreturn pdfData />

    </cffunction>

</cfcomponent>
Why the security?
The CFC would work perfectly well without extending the CFIDE.services.base component and without the added security checks. If you want to use it without those, you can. I have added them as an extra level of security, to protect against unwarranted access to the service layer.
As we are emulating and 'hooking into' the existing exposed services, it would be prudent to err on the side of caution and follow the same security measures. By default, all access to the exposed services is restricted within the ColdFusion administrator.
To access these, a user permission must be created, complete with allowed IP address/range. My previous post on using ColdFusion as a Service has the step-by-step instructions on setting up this access.
Calling the new web service is incredibly easy, and the call can be made directly within the browser to view the data:
http://localhost:8500/CFIDE/services/pdfText.cfc?wsdl&method=extractText&serviceusername=user&servicepassword=pass&source=/path/to/pdf
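Before wiring up the PHP side, it can help to confirm that the SOAP route works by consuming the WSDL from a ColdFusion test page. The following is a minimal sketch, assuming the same local server on port 8500 and the placeholder credentials and PDF path from the URL above; the result variable name extractedText is arbitrary. Every argument is passed explicitly, since SOAP calls tend to be less forgiving about omitted optional arguments than URL calls.

```cfml
<!--- Consume the new service through its WSDL, much as an external client would --->
<cfinvoke webservice="http://localhost:8500/CFIDE/services/pdfText.cfc?wsdl"
          method="extractText"
          returnvariable="extractedText">
    <!--- Placeholder credentials: use the service user set up in the administrator --->
    <cfinvokeargument name="serviceusername" value="user" />
    <cfinvokeargument name="servicepassword" value="pass" />
    <!--- Placeholder path to the PDF on the ColdFusion server --->
    <cfinvokeargument name="source" value="/path/to/pdf" />
    <cfinvokeargument name="password" value="" />
    <cfinvokeargument name="pages" value="*" />
    <cfinvokeargument name="returnData" value="string" />
</cfinvoke>

<!--- An empty result suggests the username, password or IP check was refused --->
<cfdump var="#extractedText#" />
```

If this returns the text as expected, a PHP SOAP client pointed at the same WSDL address should be able to call extractText with the same six arguments.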
All done
There you have it. A new ColdFusion 'exposed service' component, developed to use the tags and functionality you are currently missing, extending the security restriction model of the default services.