Friday, September 12, 2008

Custom Parsers continued

It has been a while since I posted anything new about developing custom parsers.

The short of it is that it requires a substantial investment of time, and if at all possible I would stay away from it until the feature has matured. I suggest that you implement the same functionality using event handlers instead. That area is well understood and doesn't require you to deploy a COM parser.

Since it's a COM object, it can't be deployed using a WSP solution. It requires an MSI installer, because COM registration needs administrative rights to write to the registry.

The main problem we have run into is the "complete" lack of detailed API documentation.

I would suggest that you develop the parser in unmanaged code, or at least put a wrapper layer between the parser interfaces and your managed code. This layer shields you from the unfortunate decision to use LPCSTRs (ANSI strings) as input/output parameters in the parser interfaces.
These out parameters are especially troublesome: you can't use straightforward marshalling, because the memory is owned by the parser and is invalidated every time you update a property.

I have no idea why MS didn't follow the de facto standard for memory ownership in COM and use BSTRs.

Your wrapper layer will return automation types to your managed code so you can use regular marshalling.
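
To illustrate the ownership problem, here is a minimal portable C++ sketch. It does not use the real parser interfaces: MockParserBag and CopyOutValue are hypothetical names, and the mock simply reuses one internal buffer so that a pointer handed out earlier sees different data after the next update. The only point is that the wrapper copies the borrowed ANSI string into storage it owns before it touches the parser again.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for a parser-owned property store. It hands out
// raw const char* pointers into an internal buffer that is overwritten on
// every update, mimicking the ownership rule described above.
class MockParserBag {
public:
    void SetValue(const char* value) {
        // Overwrite in place, so pointers handed out earlier now see new data.
        std::strncpy(buffer_, value, sizeof(buffer_) - 1);
        buffer_[sizeof(buffer_) - 1] = '\0';
    }

    // Returns a borrowed pointer; the caller does not own this memory.
    const char* GetValue() const { return buffer_; }

private:
    char buffer_[64] = {0};
};

// Wrapper-layer helper: copy the borrowed ANSI string into storage the
// caller owns, before any further call can change the parser's buffer.
inline std::string CopyOutValue(const MockParserBag& bag) {
    const char* borrowed = bag.GetValue();
    assert(borrowed != nullptr);
    return std::string(borrowed);
}
```

In a real wrapper you would presumably go one step further and convert the owned copy to a BSTR (for example with MultiByteToWideChar and SysAllocString) before handing it to managed code.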

You should also be aware of the fact that you can NOT write parsers for the most common image formats like jpg, gif, tiff, bmp and others.
This is because SharePoint handles these formats internally (I think to support picture libraries) and it won't call your parser. You can't even "hook" the out-of-the-box parser the way you can with the Office parser, since the image parser is not an external COM object. It looks to me like it's an internal object.

Here are some resources that you can look into.

ZipParser a custom parser on CodePlex written in C++

A discussion thread on MSDN related to custom document parsers.

Good luck, and if you have more information related to custom parsers please share it here.

Thanks
/Jonas



Friday, June 22, 2007

Custom Document Parsers Part 2

Finally I have it up and running. We had to contact MS Support to get all the info required.

I have updated the MSDN Wiki with all the IIDs involved.

Another VERY important thing is that you don't have to add anything to the DOCPARSE.XML file. Instead, you register the parser through the object model (OM):

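// CUSTOMPARSER_PROGID is the ProgID of your registered COM parser;
// extension is the file extension it should handle.
SPFarm farm = SPFarm.Local;
SPWebService service = farm.Services.GetValue<SPWebService>();
SPDocumentParser customParser = new SPDocumentParser(CUSTOMPARSER_PROGID, extension);
service.PluggableParsers.Add(extension, customParser);
service.Update(); // persist the change to the farm configuration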

/Jonas






Tuesday, May 29, 2007

Custom Document Parsers in SharePoint

For the last few weeks I have been working on implementing a custom document parser in SharePoint v3. I was pleasantly surprised by Andrew May's excellent postings about Document Parsers. http://blogs.msdn.com/andrew_may/archive/2006/07/21/SharePointBeta2DocumentParserOverview4.aspx



So I actually thought this was going to be pretty straightforward, since there's actually documentation...



I was wrong....



A custom parser is a COM object (I don't know why they chose this route). This didn't bother me since I was quite happy to do some COM programming again ;) Both the SDK documentation http://msdn2.microsoft.com/en-us/library/aa543908.aspx and Andrew's posts describe the interfaces ALMOST in detail.



What is missing are the IIDs for the COM interfaces. Without these GUIDs you can't create a custom parser... Since I'm used to browsing through the registry and using OleView from the good old COM days, I thought it was going to be easy to find this "last" piece of information. But I couldn't find any ISPDocumentParser entry under the interfaces section, and instantiating the out-of-the-box (OTB) Office parser from OleView didn't show any interfaces either....

OK, there's always Google, so let's google it. I found one post where another developer was asking the same question: where's the IID for ISPDocumentParser? Unfortunately there was no answer.

I decided to replace the SharePoint Office parser with my own "parser" that only traced out all QueryInterface calls made on it. I did this by creating a COM object with the same CLSID as the SharePoint Office parser and updating the registry to point to my DLL instead of the SharePoint parser.

Here's what I saw:

WSS is querying for two interfaces:

1) {E19C7100-9709-4DB7-9373-E7B518B47086}
2) {9E13184F-C136-41D4-899D-4331DB736BA1}

But I didn't know which one of them was for the ISPDocumentParser.

I decided to create an instance of the Office parser and forward these calls to it to see if it supports both of them. After doing this there was one left:

Let me introduce you to {9E13184F-C136-41D4-899D-4331DB736BA1}
This is the IID for ISPDocumentParser.
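
The tracing and forwarding steps above can be sketched roughly as follows. This is a portable C++ simulation, not real COM code: IIDs are plain strings instead of GUID structs, and MockInnerParser and TracingParser are hypothetical names I made up for the illustration. The shim records every IID it is asked for and forwards the query to the inner object, so you can see which requests the inner parser actually answers.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real Office parser: it "supports" a fixed
// set of interface IDs, given here as strings rather than GUID structs.
class MockInnerParser {
public:
    explicit MockInnerParser(std::set<std::string> supported)
        : supported_(std::move(supported)) {}

    bool QueryInterface(const std::string& iid) const {
        return supported_.count(iid) != 0;
    }

private:
    std::set<std::string> supported_;
};

// Tracing shim: logs every requested IID and forwards the query to the
// inner parser, remembering which requests succeeded.
class TracingParser {
public:
    explicit TracingParser(const MockInnerParser& inner) : inner_(inner) {}

    bool QueryInterface(const std::string& iid) {
        const bool ok = inner_.QueryInterface(iid);
        requested_.push_back(iid);
        if (ok) {
            supported_.push_back(iid);
        }
        return ok;
    }

    const std::vector<std::string>& Requested() const { return requested_; }
    const std::vector<std::string>& Supported() const { return supported_; }

private:
    const MockInnerParser& inner_;
    std::vector<std::string> requested_;
    std::vector<std::string> supported_;
};
```

A real shim would implement IUnknown::QueryInterface, compare REFIID values, and hand back wrapped interface pointers; the sketch only shows the bookkeeping that lets you narrow two candidate IIDs down to one.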

Here's my IDL

[
    object,
    uuid(9E13184F-C136-41D4-899D-4331DB736BA1),
    oleautomation,
    nonextensible,
    helpstring("ISPDocumentParser Interface"),
    pointer_default(unique)
]
interface ISPDocumentParser : IUnknown
{
    HRESULT Parse([in] ILockBytes *pilb, [in] IParserPropertyBag *pibag, [out] VARIANT_BOOL *pfChanged);
    HRESULT Demote([in] ILockBytes *pilb, [in] IParserPropertyBag *pibag, [out] VARIANT_BOOL *pfChanged);
    HRESULT ExtractThumbnail([in] ILockBytes *pilb, [in] IStream *pistmThumbnail);
}

I was now able to do some more investigation by wrapping the interface pointer returned by the Office parser. This way I was able to look at the property bag before and after the Office parser had parsed the file.

Now there's "only" one problem left. SharePoint is NEVER trying to instantiate any of my custom parsers... I'm adding them to the Web Server Extensions\12\CONFIG\DOCPARSE.XML file and I'm doing an iisreset, but no luck.

To be continued.

/Jonas

Saturday, February 17, 2007

CAS and SharePoint

I have been working with SharePoint WSS2 SPS 2003 for about a year and a half now and what has been the most frustr... challenging part has been dealing with Code Access Security and trying to convince others that we are now developing in an environment where we don't have unrestricted access to the server(s).

Typical examples are writing web parts that in the initial design saved their configuration settings in xml files... You might shake your head but it's true ;)

If you say that's a good idea, let me ask you: where do you save these files? How do you ensure that the code can access these files? How do you ensure that every user running the code can access these files?

You have a solution to this too!!

In a load-balanced environment, how do you ensure that all these files are in sync? How many copies do you have? What happens when the server admin adds a new web front end?

OK so this is a rookie mistake, but a lot of us will run into it because we are coming from ASP.NET or other environments and we must now use SharePoint and thus play by some rules.

What I will try to do is write a few posts about Code Access Security, try to show how it works, and help those of you who are new to WSS, MOSS, SPS... have a much easier transition than I had.

More to come.
/Jonas