Friday, June 22, 2007

Custom Document Parsers Part 2

Finally I have it up and running. We had to contact MS Support to get all the info required.

I have updated the MSDN Wiki with all the IIDs involved.

Another VERY important thing: you don't have to add anything to the DOCPARSE.XML file. Instead, you register the parser through the object model (OM).

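// Get the farm-wide web service that owns the pluggable parser table.
SPFarm farm = SPFarm.Local;
SPWebService service = farm.Services.GetValue<SPWebService>();
// Map the file extension to the ProgID of the custom COM parser.
SPDocumentParser customParser = new SPDocumentParser(CUSTOMPARSER_PROGID, extension);
service.PluggableParsers.Add(extension, customParser);
// Persist the change to the farm configuration.
service.Update();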

/Jonas


Tuesday, May 29, 2007

Custom Document Parsers in SharePoint

For the last few weeks I have been working on implementing a custom document parser in SharePoint v3. I was pleasantly surprised by Andrew May's excellent postings about Document Parsers. http://blogs.msdn.com/andrew_may/archive/2006/07/21/SharePointBeta2DocumentParserOverview4.aspx



So I thought this was going to be pretty straightforward, since there's actually documentation.



I was wrong....



A custom parser is a COM object (I don't know why they chose this route). This didn't bother me since I was quite happy to do some COM programming again ;) Both the SDK documentation http://msdn2.microsoft.com/en-us/library/aa543908.aspx and Andrew's posts describe the interfaces ALMOST in detail.



What is missing is the IIDs for the COM interfaces. Without these GUIDs you can't create a custom parser... Since I'm used to browsing through the registry and using OleView from the good old COM days, I thought it was going to be easy to find this "last" piece of information. But I couldn't find any ISPDocumentParser entry under the interfaces section, and instantiating the OTB Office parser from OleView didn't show any interfaces either....

OK, there's always Google, so let's google it. I found one post where another developer was asking the same question: where's the IID for ISPDocumentParser? Unfortunately there was no answer.

I decided to replace the SharePoint Office parser with my own "parser" that only traced out all QueryInterface calls made on it. I did this by creating a COM object with the same CLSID as the SharePoint Office parser and updating the registry to point to my DLL instead of the SharePoint parser.

Here's what I saw:

WSS is querying for two interfaces:

1) {E19C7100-9709-4DB7-9373-E7B518B47086}
2) {9E13184F-C136-41D4-899D-4331DB736BA1}

But I didn't know which one of them was for the ISPDocumentParser.

I decided to create an instance of the Office parser and forward these calls to it to see if it supports both of them. After doing this there was one left:

Let me introduce you to {9E13184F-C136-41D4-899D-4331DB736BA1}
This is the IID for ISPDocumentParser.
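
The two probing steps above (trace every QueryInterface call, then forward it to a real parser and watch the result) can be sketched roughly like this. This is not the code I actually used: it is a portable illustration that builds without windows.h, so Guid, HResult, Queryable, StandInParser, TracingParser and RunTraceDemo are all made-up names standing in for GUID, HRESULT, IUnknown and the real objects. The stand-in parser's refusal of the first IID is invented for the demo; only the two IID values and the meaning of the second one come from the trace.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the Windows GUID struct (assumption: same field layout).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

inline bool operator==(const Guid& a, const Guid& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof(a.data4)) == 0;
}

// Formats a GUID registry-style: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.
inline std::string FormatGuid(const Guid& g) {
    char buf[39];  // 38 characters plus the terminating NUL
    std::snprintf(buf, sizeof(buf),
                  "{%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X}",
                  static_cast<unsigned>(g.data1), static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  static_cast<unsigned>(g.data4[0]), static_cast<unsigned>(g.data4[1]),
                  static_cast<unsigned>(g.data4[2]), static_cast<unsigned>(g.data4[3]),
                  static_cast<unsigned>(g.data4[4]), static_cast<unsigned>(g.data4[5]),
                  static_cast<unsigned>(g.data4[6]), static_cast<unsigned>(g.data4[7]));
    return buf;
}

// The two IIDs that showed up in the trace.
const Guid kIidFirstSeen = {0xE19C7100, 0x9709, 0x4DB7,
                            {0x93, 0x73, 0xE7, 0xB5, 0x18, 0xB4, 0x70, 0x86}};
const Guid kIidSPDocumentParser = {0x9E13184F, 0xC136, 0x41D4,
                                   {0x89, 0x9D, 0x43, 0x31, 0xDB, 0x73, 0x6B, 0xA1}};

// Stand-ins for HRESULT, S_OK and E_NOINTERFACE.
using HResult = std::int32_t;
const HResult kOk = 0;
const HResult kNoInterface = static_cast<HResult>(0x80004002u);

// Minimal stand-in for the QueryInterface part of IUnknown.
struct Queryable {
    virtual ~Queryable() = default;
    virtual HResult QueryInterface(const Guid& iid, void** ppv) = 0;
};

// Hypothetical inner object playing the role of the real parser.
// Which IIDs it accepts here is an assumption made for the demo.
class StandInParser : public Queryable {
public:
    HResult QueryInterface(const Guid& iid, void** ppv) override {
        if (iid == kIidSPDocumentParser) {
            *ppv = this;
            return kOk;
        }
        *ppv = nullptr;
        return kNoInterface;
    }
};

// Forwards every QueryInterface call to the inner object and logs
// the requested IID together with the result code that came back.
class TracingParser : public Queryable {
public:
    explicit TracingParser(Queryable& inner) : inner_(inner) {}

    HResult QueryInterface(const Guid& iid, void** ppv) override {
        HResult hr = inner_.QueryInterface(iid, ppv);
        char line[64];
        std::snprintf(line, sizeof(line), "%s -> 0x%08X",
                      FormatGuid(iid).c_str(), static_cast<unsigned>(hr));
        log_.push_back(line);
        return hr;
    }

    const std::vector<std::string>& Log() const { return log_; }

private:
    Queryable& inner_;
    std::vector<std::string> log_;
};

// Replays the two queries seen in the trace and returns the log lines.
inline std::vector<std::string> RunTraceDemo() {
    StandInParser inner;
    TracingParser tracer(inner);
    void* p = nullptr;
    tracer.QueryInterface(kIidFirstSeen, &p);
    tracer.QueryInterface(kIidSPDocumentParser, &p);
    return tracer.Log();
}
```

Calling RunTraceDemo() returns one log line per query, each with the IID in registry format followed by the result code. In a real COM object the same idea lives inside your IUnknown::QueryInterface implementation, and the log would go to a trace output or a file instead of a vector.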

Here's my IDL

[
    object,
    uuid(9E13184F-C136-41D4-899D-4331DB736BA1),
    oleautomation,
    nonextensible,
    helpstring("ISPDocumentParser Interface"),
    pointer_default(unique)
]
interface ISPDocumentParser : IUnknown
{
    HRESULT Parse([in] ILockBytes *pilb, [in] IParserPropertyBag *pibag, [out] VARIANT_BOOL *pfChanged);
    HRESULT Demote([in] ILockBytes *pilb, [in] IParserPropertyBag *pibag, [out] VARIANT_BOOL *pfChanged);
    HRESULT ExtractThumbnail([in] ILockBytes *pilb, [in] IStream *pistmThumbnail);
}

I was now able to do some more investigation by wrapping the interface pointer returned by the Office parser. This way I was able to look at the property bag before and after the Office parser had parsed the file.

Now there's "only" one problem left. SharePoint NEVER tries to instantiate any of my custom parsers... I'm adding them to the Web Server Extensions\12\CONFIG\DOCPARSE.XML file and doing an iisreset, but no luck.

To be continued.

/Jonas

Saturday, February 17, 2007

CAS and SharePoint

I have been working with SharePoint (WSS2 and SPS 2003) for about a year and a half now, and the most frustr... challenging part has been dealing with Code Access Security and trying to convince others that we are now developing in an environment where we don't have unrestricted access to the server(s).

A typical example is writing web parts that, in the initial design, saved their configuration settings in XML files... You might shake your head, but it's true ;)

If you think that's a good idea, let me ask you: where do you save these files? How do you ensure that the code can access these files? How do you ensure that every user running the code can access these files?

You have a solution to this too!!

In a load-balanced environment, how do you ensure that all these files are in sync? How many copies do you have? What happens when the server admin adds a new web front end?

OK, so this is a rookie mistake, but a lot of us will run into it because we are coming from ASP.NET or other environments and must now use SharePoint and thus play by some rules.

What I will try to do is write a few posts about Code Access Security, show how it works, and help those of you who are new to WSS, MOSS or SPS have a much easier transition than I had.

More to come.
/Jonas