Thursday 3 November 05
I have uploaded the latest version of Sled (2.0.2) to SourceForge. It includes the fixes described in my last couple of entries below.
Archive for November 2005
I have been load testing Sled and CopperCore again. I’ve managed to update the Sled player so that it doesn’t generate errors, but it is still quite slow at times, especially when there are multiple users (I’ve tested with up to 15 concurrent users).
The main bottleneck in the Sled player appeared to be the server-side transformation of the XML (I’d put some counters in the code to find where it was slow), so I tried caching some of the transformed content (parts of the pages which didn’t tend to change from hit to hit). However, the slow transformation was usually the one that generates the activity menu tree, which I can’t cache very easily because it changes so much depending on which user is logged in and which page they are visiting. The XML transformation isn’t the only speed problem, though; it’s just part of it.
I also went back to testing the CopperCore player, and this time I did manage to generate the same error I was getting with Sled (although Sled now retries if an error is generated). The reason it didn’t come up before, I found, was that I was only hitting the frame page and not the individual frames, so I amended the load testing script to hit the individual frames as well. It’s difficult to compare the response times between Sled and the CC player, because Sled generates the whole page all at once, whereas CC needs a number of hits to get the equivalent content displayed (and the XSL transformation isn’t being done in CC). But from the results it looks like the generation of the activity tree is also the slowest part of the CC player.
Overall, the speed problem seems to be a combination of a number of factors. So far I’ve still only tested when the server has a couple of UoLs & a couple of users, so I’ll run the same tests again once I’ve added more users & UoLs to see if that slows it down significantly further. We were finding that the speed on a standalone installation with just a few users and UoLs was fine, but on our Sled demo site, where we have around 10 UoLs and over 50 users (registered, not concurrent), the speed slowed right down – even though the users aren’t accessing the site simultaneously.
The fix I have put in Sled is really just a sticking plaster, not a full & proper solution, and there could also be other reasons for the player being slow under load that I haven’t spotted! Please feel free to let me know if you have any suggestions as to how it could be made faster.
I’ve spent a bit of time trying to sort out the performance problems with Sled – and the error messages that get generated. I’ve been trying to recreate the problems by using a load testing tool (JMeter) on the Sled/CC installation on my laptop. By forcing about 5 or 6 virtual users to access the site at the same time (and go through a sequence of hitting several pages on the Sled site) I could get errors to be generated, or get pages returned which were blank. I also tried to hit the CopperCore player in the same way but couldn’t get the same problems created.
I think that the error messages that appeared in the browser were actually due to an error that had occurred earlier, on another page request, which meant that a Java object wasn’t created properly. When a later page needed to refer to that object, it didn’t exist, and that produced the error.
I’m not 100% sure, but I think the root cause of this is Java threading with the CCSI module, as the errors were generated when the system tried to create objects defined in the CCSI module. If this is the case, I think I can explain why this doesn’t affect the CopperCore player in the same way. The CC player is essentially a single piece of code, written so that it can only process one request at a time (subsequent requests have to wait until the first request has completed), which means that it will only ever contact the CCSI module with one request at a time. However, the Sled player is coded differently: although the Struts framework is thread safe and will handle multiple concurrent users, this doesn’t stop it making multiple requests to the CCSI module at the same time, which is where the problem occurs.
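To illustrate the difference described above, here is a minimal sketch of funnelling every call to a shared module through a single lock, so that only one request reaches it at a time. The class name `SerializedGateway` and its `call` method are hypothetical and are not part of Sled, CopperCore or CCSI; this is not a claim about how the CC player is actually implemented, and it uses lambdas from later Java versions than were available when this was written.

```java
import java.util.function.Supplier;

/**
 * Hypothetical wrapper (not part of Sled or CopperCore): every request to
 * the shared module goes through one lock, so calls never overlap.
 */
final class SerializedGateway {
    private final Object lock = new Object();

    /** Runs the request while holding the lock; other callers wait their turn. */
    <T> T call(Supplier<T> request) {
        synchronized (lock) {
            return request.get();
        }
    }
}
```

The trade-off is the one noted in the post: requests queue behind each other, so this avoids concurrent access at the cost of throughput under load.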
I’ve slightly altered the Sled code so that when it tries to create one of the CCSI objects, it checks that the object is valid (i.e. not null and hasn’t generated an error). If it’s not, the thread tries again a few milliseconds later, and keeps trying until a valid object has been created. Testing this again with the load tester seems to have stopped the error messages appearing to the user. However, I am still sometimes getting deadlock exceptions, which I think are arising in the CC code, but these don’t seem to be affecting the actual operation and output back to the user – so it seems that Java is just reporting this as an error but is coping OK with it.
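The retry approach described above could look roughly like the following sketch. It is not the actual Sled code: `RetryingFactory`, `createWithRetry` and its parameters are hypothetical names, and unlike the fix described (which keeps trying until it succeeds) this version gives up after a fixed number of attempts so that a permanent failure cannot hang the request.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not the actual Sled code): retry until a valid object appears. */
final class RetryingFactory {
    private RetryingFactory() {}

    /**
     * Calls creator until it returns a non-null object without throwing,
     * pausing pauseMillis between attempts and giving up after maxAttempts.
     */
    static <T> T createWithRetry(Supplier<T> creator, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T candidate = creator.get();
                if (candidate != null) {
                    return candidate; // valid object: stop retrying
                }
            } catch (RuntimeException e) {
                lastError = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", e);
                }
            }
        }
        throw new IllegalStateException(
                "No valid object after " + maxAttempts + " attempts", lastError);
    }
}
```

A retry like this hides the symptom from the user but does not remove the underlying contention, which is consistent with the exceptions still appearing in the logs.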
I’m sure that I’ve not resolved this in the ‘proper’ way – I’m sure there must be better ways of doing it, but I need to be a better java programmer to do that!
However, I’m not sure that this will fix the speed of the Sled application. The machine I’m testing it on only has a couple of UoLs and a couple of users on it, and the speed problems seemed to mainly occur once there were more users & UoLs than this. I’ll add some more users and UoLs and keep testing to see what happens. The speed that’s being reported by the load tester is an average of about 7 seconds per request – which is really quite bad! – but that’s with 10 users hitting the site, making about 250 page requests in total over a period of about 1 minute. The CopperCore player is much quicker than this (around 1 second per request for the same number of page hits over the same length of time). I’m sure much of this is to do with the fact that the Sled player is doing transformations server side (so doing file reads to get the XSL), so this could be speeded up by maybe caching the XSL files or something similar.
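One way the XSL caching idea could be approached is with the standard JAXP `Templates` interface, which holds a compiled stylesheet that can be reused across requests and threads. The sketch below is only an illustration under assumptions: `StylesheetCache`, its `transform` method and the loader parameter are hypothetical names, not Sled code, and the stylesheet is passed in as text so the example stays self-contained (in practice the loader would be where the file read happens).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical cache (not Sled code): load and compile each stylesheet once, then reuse it. */
final class StylesheetCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final ConcurrentMap<String, Templates> compiled = new ConcurrentHashMap<>();

    /** Transforms xml with the named stylesheet; xslLoader runs only on a cache miss. */
    String transform(String name, Supplier<String> xslLoader, String xml) {
        Templates templates = compiled.computeIfAbsent(name, key -> compile(xslLoader.get()));
        try {
            // Templates is thread safe; each Transformer is used by one thread only.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed for " + name, e);
        }
    }

    /** The factory is not guaranteed to be thread safe, so compilation is serialized. */
    private synchronized Templates compile(String xslText) {
        try {
            return factory.newTemplates(new StreamSource(new StringReader(xslText)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Could not compile stylesheet", e);
        }
    }
}
```

This only removes the repeated file read and stylesheet compilation; the transformation itself still runs on every request, so it would not help if the activity tree transformation is inherently slow.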
Once I’ve tried load testing with more users and UoLs I’ll be able to see if it slows down significantly more or not – if it does then I’ll try and find out what is really slowing the processing down – but if not, it might not be a good use of my time to spend ages just trying to cut a second or so off the processing time – especially as it hasn’t really been built with production or heavy use in mind.