Start the C# Coding
The Math Part
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace MySuperCoolRemoteUDF.MathTools
{
public class MCForecastTool
{
...
public MCForecastTool(List<double> Prices)
{
...
Initialize();
}
public double PedictNextPrice()
{
double absolutePriceChange = lastPrice - beforeLastPrice;
double rnd1 = RandomNumberGenerator.NextSystem();
double rnd2 = RandomNumberGenerator.NextSystem();
double propUpTomorrow = 1d - IndicatorState.ProbUp;
double propDownTomorrow = IndicatorState.ProbUp;
short tomorrowsDirection = (short)0;
short direction = absolutePriceChange < 0 ? (short)-1 : (short)1;
...
double priceChange = CalculateCurrentAVG();
double lambda = 1d / priceChange;
double rnd3 = RandomNumberGenerator.NextExponential(lambda);
double priceTomorrow = ...;
return priceTomorrow;
}
...
}
}
The UDF Part
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MySuperCoolRemoteUDF.MathTools;
using ExcelDna.Integration;
namespace MySuperCoolRemoteUDF
{
public class MonteCarloRunner
{
[ExcelFunction(Description = "Glass bowls the next stock price",
Category = "My UDFs", IsClusterSafe=true)]
public static double NextPricePlease(double Iterations, object[,] HistoricalPrices)
{
List<double> Prices = new List<double>(HistoricalPrices.Length);
// read the Excel Cells:
foreach (object HistoricalPriceObj in HistoricalPrices)
{
double Price = (double) HistoricalPriceObj;
Prices.Add(Price);
}
int IterationsInt = (int)Iterations;
List<MCForecastTool> MCToolList = new List<MCForecastTool>(IterationsInt);
// Create N-many forecast tools:
for (int i = 0; i < IterationsInt; i++)
MCToolList.Add(new MCForecastTool(Prices));
double tempSum = 0d;
// Predict N-many forecast prices
foreach (MCForecastTool MCFT in MCToolList)
tempSum += MCFT.PedictNextPrice();
// Tomorrow's price is the mean of all predictions:
double NextPrice = tempSum / (double)IterationsInt;
return NextPrice;
}
}
}
Configure Excel-DNA
<DnaLibrary> <ExternalLibrary Path="MySuperCoolRemoteUDF.dll" /> </DnaLibrary>
Use the UDF in Excel 2010
Local UDF Run
By default every UDF will be executed locally. A user must explicitely activate remote UDF execution on an HPC Server 2008 R2 cluster. Let’s use a local run first to create a new Excel 2010 workbook. As you can see in the screenshot below the spreadsheet will contain historical prices of 20 stocks. These prices could be real- or near-time prices but in our sample they are randomized. Starting from Now we will use the SuperCoolRemoteUDF to predict tomorrows price of every stock. The day after tomorrow will be predicted based on the historical prices from row B to row L plus the predicted price of row M. Thus we have dependencies between every cell starting from row N to row Q.
The local computation of one cell (like M8) takes approximately 6.8 seconds using 250,000 iterations. To calculate every cell row M, N, O, P and Q in parallel we need at least a 20 core machine. Assuming a typical 4 core workstation the whole matrix will be calculated within 20 / 4 * 5 * 6,8 = 170 seconds.
Remote UDF Run
Using an HPC Server 2008 R2 cluster we can scale the computational power to thousands of cores. If there were no dependencies between the cells we would be able to calculate every cell in parallel within the cluster. But since there are dependencies between the rows M to Q we can only calculate row by row.
To enable a parallel run we just need to allow user-defined XLL functions to run on a cluster. That’s it. The spreadsheet will immediately re-calculate its values once they update on a cluster. This reduces the pure computational time from 170 seconds to 20 / 20 * 5 * 6,8 = 35 seconds. Approximately 3-5 seconds need to be added to create a session and send and receive the message requests.




Thanks for this. I have set up a cluster to run some Quant code, once I used the correct version of Excel DNA all worked perfectly. Version 59433 (from codeplex) works perfectly.
Sure! Govert did a fantastic job…
When passing a UDF along to a cluster are there any restrictions or drawback associated with code that already has threading in it? Does it matter if my function were something like
[ExcelFunction(IsClusterSafe=true)]
public static double test()
{
double r1 = 0;
Thread t1 = new Thread(new ThreadStart(() => r1 = expensiveFunction()));
t1.Start();
double r2 = 0;
Thread t2 = new Thread(new ThreadStart(() => r2 = expensiveFunction()));
t2.Start();
t1.Join();
t2.Join();
return r1 + r2;
}
Or should I leave the workload distribution to the cluster? I suppose I’m curious as to what qualifies as “cluster safe”.
I’m new to all things cluster related, so pardon my ignorance. Any help would be appreciated.
Thanks.
Rob
Rob, you should distrubute what makes sense
ie. I’ve seen that running 12 processes on a 12 core machine with core affinity executes faster than 1 process with 12 threads on the same machine. Clusters are build not bought. This means you need to test what is faster. Run it with both expensiveFunctions in 1 UDF call and run it with both expensiveFunctions – 1 in 1 UDF call each…
Hope this makes sense. Torsten