The past few weeks have not produced many conclusions. Before this week, the only decisions I had made concerned how to transform the variables and how to accommodate the 3D structure of the data. These weeks have mostly consisted of filling in knowledge gaps and learning about the relationships in my data. I also revisited a few inversions to try to improve the results.

I started by studying grouping in the data. The variables I was interested in were texture, bulk density and SOC. The idea was to try to define physical structures in the field. For instance, we know that there are paleochannels with high sand content. The question I wanted to answer was whether I could reconstruct the paleochannels by statistically grouping these variables. One motivation was that grouping offered a way of dealing with the three-dimensional structure of the data, which presents some interesting challenges. First, the vertical distances (less than 1 m) are orders of magnitude smaller than the horizontal distances (up to 750 m). Because of this, 3D kriging routines are unlikely to yield good results. Even if the vertical variation is isolated, the range of spatial correlation is probably larger than the depth of the soil measurements. Layering of the soil also presents challenges because measurements in different layers may not be correlated at all. One paper in the literature used a soil-structure-based approach, which motivated me to try a similar method. One key difference between that study and this one is the depth of the soil profile. Their profile was about 15 m thick, and they collected profile information (layer type and depth) for each core. From the profile information they created 3D structures and studied the spatial variability within each structure. My hope was that I could isolate the same information with factorial kriging, because one range of covariation would represent the within-structure variance and another, larger range would represent the between-structure variance. This may still be an outcome, but so far I have only been partially successful at isolating the structures. However, the exercise was not futile: I discovered that the top two layers (8 cm and 28 cm) are very similar to each other and different from the bottom two layers (48 cm and 68 cm).
The bottom two layers are not as similar to each other as the top two, but they are still similar. For now, I will keep each layer as an independent dataset; combining layers may be considered in the future. The main reason for not combining them now is that averaging the hydraulic curves is not a straightforward task.
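Factorial kriging rests on a variogram model built from nested structures, each with its own range, so that a short-range (within-structure) component and a long-range (between-structure) component can be estimated separately. A minimal Python sketch of such a nested model, assuming spherical structures: the 209 m short range is taken from the factor map in Figure 1, while the nugget, the sills and the 600 m long range are hypothetical values chosen only for illustration.

```python
import numpy as np


def spherical(h, a):
    """Unit-sill spherical structure: rises to 1 at the range a and stays flat beyond it."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r**3


def nested_variogram(h, nugget, structures):
    """Nugget effect plus a sum of spherical structures given as (sill, range) pairs.

    In factorial kriging each (sill, range) pair defines one spatial component,
    e.g. within-structure (short range) and between-structure (long range) variation.
    """
    h = np.asarray(h, dtype=float)
    gamma = np.where(h > 0, nugget, 0.0)  # the nugget is a jump for any h > 0
    for sill, a in structures:
        gamma = gamma + sill * spherical(h, a)
    return gamma


# Hypothetical model: 209 m short range (Figure 1) and an assumed 600 m long range.
NUGGET = 0.1
STRUCTURES = [(0.4, 209.0), (0.5, 600.0)]

lags = np.array([0.0, 50.0, 209.0, 400.0, 600.0, 750.0])
for h, g in zip(lags, nested_variogram(lags, NUGGET, STRUCTURES)):
    print(f"h = {h:5.0f} m  gamma = {g:.3f}")
```

Beyond 209 m the short structure has reached its sill, so any further increase in gamma comes only from the long-range component; that separation of scales is what factorial kriging exploits when it maps each factor on its own.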

After defining the data grouping, I studied the distributions of the variables. Many of the variables approximated a normal distribution, but some were very odd. In the literature it is common to find lognormal distributions of alpha, n and Ks. The distributions of theta s and theta r vary from study to study. I did not find anything in the literature about l. This is likely because many studies use a fixed value of 0.5 or -1 instead of a fitted value. For my data, alpha and Ks fit a log transformation very well. The parameter n, however, was best fit by a log(log(n)) transformation (which is defined because n > 1 in the van Genuchten model). The parameter l is very odd, and no combination of transformations gave it a normal distribution. Theta r was normal, and theta s was not exactly normal but not too extreme. The only other interesting variable was bulk density, which has a bimodal "bump" at very low values. The other variables were mostly normal but showed some skew and kurtosis. When I discussed these findings with Dr. Morari, he suggested a Gaussian anamorphosis in Isatis (the spatial statistics software). The procedure builds a transformation function (the anamorphosis) from the empirical cumulative distribution and approximates it with a finite series of Hermite polynomials. Each Hermite polynomial, multiplied by the Gaussian density, is proportional to a derivative of that density, so the expansion is somewhat similar to a Taylor series approximation. The algorithm then inverts this function to transform the sample to a normal distribution. I still have a few questions about the process, such as how it is affected by outliers. I will address this and some related questions in the future.
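The Hermite-series idea can be sketched in Python with NumPy's probabilists' Hermite polynomials (`numpy.polynomial.hermite_e`). This is a simplified stand-in, not the Isatis algorithm: normal scores are assigned from ranks, the anamorphosis phi(y) is fitted to the raw values by ordinary least squares (`hermefit`) rather than by whatever coefficient estimation Isatis performs, and the forward transform inverts the fitted phi numerically on a grid. The skewed sample is synthetic.

```python
from statistics import NormalDist

import numpy as np
from numpy.polynomial import hermite_e as H  # probabilists' Hermite polynomials He_k


def normal_scores(z):
    """Rank-based normal-score transform: raw values -> standard normal quantiles.

    Ties are broken arbitrarily by argsort, so duplicated values get distinct scores.
    """
    z = np.asarray(z, dtype=float)
    ranks = np.argsort(np.argsort(z))  # 0 .. n-1
    p = (ranks + 0.5) / z.size         # plotting positions strictly inside (0, 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf(float(pi)) for pi in p])


def fit_anamorphosis(z, degree=8):
    """Fit z = phi(y) = sum_k c_k He_k(y) by least squares on the normal scores.

    Returns the Hermite coefficients and the range of scores used in the fit.
    """
    z = np.asarray(z, dtype=float)
    y = normal_scores(z)
    coef = H.hermefit(y, z, degree)
    return coef, (y.min(), y.max())


def gaussian_to_raw(y, coef):
    """Back-transform: evaluate phi(y) to return Gaussian values to raw units."""
    return H.hermeval(y, coef)


def raw_to_gaussian(z_new, coef, y_range, n_grid=2001):
    """Forward transform by inverting phi on a grid limited to the fitted score range.

    Values outside the fitted range are clamped to its ends (np.interp behavior).
    """
    y_grid = np.linspace(y_range[0], y_range[1], n_grid)
    z_grid = H.hermeval(y_grid, coef)
    if np.any(np.diff(z_grid) <= 0):
        raise ValueError("fitted phi is not monotone; try a lower degree")
    return np.interp(z_new, z_grid, y_grid)


# Usage with a hypothetical right-skewed sample (a stand-in for a parameter such as Ks).
rng = np.random.default_rng(0)
ks = np.exp(rng.normal(1.0, 0.5, size=200))
coef, y_range = fit_anamorphosis(ks, degree=6)
y = normal_scores(ks)
print("mean/std of Gaussian scores:", y.mean().round(3), y.std().round(3))
print("max back-transform error:", np.abs(gaussian_to_raw(y, coef) - ks).max())
```

Because the series is a polynomial, it can turn non-monotone outside the range of the fitted scores, which is why the forward transform is restricted to that range and checks monotonicity. The extreme scores are also probably where outliers would have the most leverage on a least-squares fit like this one.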

As of today, I have successfully run all of the types of analysis that I will need to complete my research. This doesn't sound like much, but it has taken several weeks to learn the Isatis software package. It is very powerful but not straightforward, and the help files are poor. Figures 1 and 2 show very preliminary results from a factorial kriging analysis (a map of one factor at one spatial scale) and from a multi-collocated cokriging interpolation. These are crude representations, just to show you what I am working on.
Figure 1. Map of factor 1 at short range (209 m). It mostly represents variation in density.
Figure 2. Bulk density estimates from multi-collocated cokriging with elevation data.

The most important decision I have made involves cokriging. When I gave my proposal presentation, I was unaware of how many options there are for a cokriging model, and I have spent quite a bit of time studying them. One goal of this work is to create an accurate and detailed map of each hydraulic parameter of the van Genuchten model. We only know these values at the sample locations, of which there are fewer than 50. Interpolations made using only these points will have high uncertainty at locations far from the points. To improve the results and reduce uncertainty, we want to incorporate information from variables that are sampled at more locations. In our case, we have two apparent electrical conductivity measurements (each sampled at 9000+ locations) and elevation measurements (sampled at 1500+ locations). One option is kriging with external drift (KED). This model interpolates the dense variable to the estimation locations and uses a linear regression on it to model the trend (mean) of the variable of interest. The classic example for this method is using elevation to predict temperature. Temperature is only measured at weather stations, but elevation is known almost everywhere (on a 3-10 m resolution grid). The regression might estimate, for example, that temperature falls by about 0.65 °C per 100 m of elevation gain (roughly the standard atmospheric lapse rate), and the final temperature surface is then essentially a scaled version of the elevation surface. However, more external drift variables can be added, in which case the outcome is not as simple. The major drawback of KED is that it ignores the spatial structure of the correlation between the variables. To include that spatial structure, cokriging is used. Before discussing the cokriging models, there are some important notes about the data. Each dataset was measured at different locations; in other words, no two datasets have measurements at the same location.
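The KED idea can be sketched as an ordinary kriging system with one extra unbiasedness constraint per drift variable. The Python below is a minimal sketch under stated assumptions, not the Isatis implementation: an exponential residual covariance with a hypothetical 200 m range, a single external drift (for example elevation) known at the samples and at the target, and made-up station data with a hypothetical trend of -0.0065 °C per m.

```python
import numpy as np


def exp_cov(h, sill=1.0, a=200.0):
    """Exponential covariance of the residuals; sill and range are hypothetical."""
    return sill * np.exp(-np.asarray(h, dtype=float) / a)


def ked_estimate(xy, z, s, xy0, s0, cov=exp_cov):
    """Kriging with external drift at a single target point.

    xy : (n, 2) sample coordinates      z  : (n,) primary values
    s  : (n,) drift values at samples   s0 : drift value at the target xy0
    The mean of z is modeled as b0 + b1 * s; the last two equations force
    sum(w) = 1 and sum(w * s) = s0, so the estimator is unbiased for any b0, b1.
    """
    xy = np.asarray(xy, dtype=float)
    s = np.asarray(s, dtype=float)
    n = len(s)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(xy - np.asarray(xy0, dtype=float), axis=-1)

    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = cov(d)
    A[:n, n] = A[n, :n] = 1.0
    A[:n, n + 1] = A[n + 1, :n] = s
    b = np.concatenate([cov(d0), [1.0, s0]])

    w = np.linalg.solve(A, b)[:n]  # drop the two Lagrange multipliers
    return float(w @ np.asarray(z, dtype=float)), w


# Hypothetical stations: coordinates (m), elevation (m) and temperature (deg C).
xy = np.array([[0, 0], [500, 0], [0, 500], [500, 500], [250, 100]], dtype=float)
elev = np.array([100.0, 300.0, 200.0, 500.0, 150.0])
temp = 20.0 - 0.0065 * elev + np.array([0.2, -0.1, 0.0, 0.1, -0.2])

est, w = ked_estimate(xy, temp, elev, xy0=[250.0, 250.0], s0=400.0)
print(f"KED estimate at the target: {est:.2f} deg C")
print("weights:", np.round(w, 3), "sum:", round(w.sum(), 6))
```

Because the weights must reproduce s0, the estimate follows the elevation trend, which is why a single-drift KED surface looks like a rescaled elevation map. The residuals are kriged with a covariance that says nothing about how the two variables co-vary in space, which is the limitation that cokriging addresses.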
Having no collocated measurements is problematic when modeling the spatial cross-correlation, because no pairs are available to estimate the cross-variation at small scales (points at the same location are separated by 0 m). One workaround is to interpolate the dense variables to a grid and then move each sample of the undersampled variable of interest to its nearest grid node. If the grid resolution is fine enough and the dense variable is dense enough, this can be done without adding much error. The result is a partially heterotopic dataset in which the dense variable is known "everywhere". This workaround will be used for this study. There are many options for cokriging, but the most applicable for this study are simple cokriging (with heterotopic data) and multi-collocated cokriging. Both approaches use the collocated data points (points where both variables are known) to model the spatial cross-correlation. The difference is in the neighborhood. Simple cokriging uses all of the points of the dense variable to estimate the variable of interest at a location. Multi-collocated cokriging uses only the points where both variables are known plus one additional point: the dense variable at the location being estimated. Multi-collocated cokriging has two advantages. The first is computational. Solving the kriging system requires inverting an n × n matrix, where n is the number of points included in the calculation, so reducing n simplifies the calculation. Numerical problems also arise because closely spaced samples of the dense variable are strongly autocorrelated, which can make the matrix nearly singular (ill-conditioned). The second advantage is that it avoids the screening effect discussed in Xu et al. (1992), where the abundance of dense data near the estimation location screens out the influence of data farther away. Results obtained with multi-collocated cokriging are typically very good because the model is quite robust. Multi-collocated cokriging will be used to interpolate the hydraulic parameters for this study.
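The grid workaround and the multi-collocated neighborhood can be sketched together in Python. This is a minimal sketch, not the Isatis implementation, and it rests on several assumptions: both variables are standardized (zero mean, unit variance) so simple cokriging applies; their covariances follow an intrinsic correlation model, C11(h) = C22(h) = C(h) and C12(h) = rho C(h), with an exponential C and a hypothetical 209 m scale; and the grid, sample coordinates and values are invented. The neighborhood holds the variable of interest at the n samples and the dense variable at the same n nodes plus the target, so the system has 2n + 1 equations. With roughly 50 samples and 9,000 conductivity points, that is 101 equations instead of the 9,050 a full simple cokriging neighborhood would need.

```python
import numpy as np


def snap_to_grid(xy, x0, y0, dx):
    """Move each point to its nearest node of a regular grid (origin x0, y0; spacing dx).

    Returns integer node indices (ix, iy) and the node coordinates.
    """
    ij = np.rint((np.asarray(xy, dtype=float) - [x0, y0]) / dx).astype(int)
    return ij, np.column_stack([x0 + ij[:, 0] * dx, y0 + ij[:, 1] * dx])


def exp_cov(h, a=209.0):
    """Unit-sill exponential covariance; the scale a is a hypothetical choice."""
    return np.exp(-np.asarray(h, dtype=float) / a)


def pairwise(p, q):
    """Euclidean distances between every row of p and every row of q."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)


def mcc_system(xy, xy0, rho, a=209.0):
    """Simple multi-collocated cokriging weights and variance for standardized variables.

    Primary data at the n nodes in xy; secondary data at the same n nodes plus xy0.
    Assumes intrinsic correlation: C11 = C22 = C, C12 = rho * C.
    Nodes must be distinct and xy0 must not coincide with a sample node.
    """
    xy = np.asarray(xy, dtype=float)
    xy0 = np.asarray(xy0, dtype=float)
    n = len(xy)
    sec = np.vstack([xy, xy0])  # secondary locations, target last

    c12 = rho * exp_cov(pairwise(xy, sec), a)
    K = np.block([[exp_cov(pairwise(xy, xy), a), c12],
                  [c12.T, exp_cov(pairwise(sec, sec), a)]])
    k0 = np.concatenate([exp_cov(np.linalg.norm(xy - xy0, axis=1), a),
                         rho * exp_cov(np.linalg.norm(sec - xy0, axis=1), a)])
    w = np.linalg.solve(K, k0)
    variance = 1.0 - w @ k0  # C11(0) = 1 for standardized data
    return w[:n], w[n:], variance


# Hypothetical standardized dense variable (e.g. ECa) on a 10 m grid covering 750 m x 750 m.
rng = np.random.default_rng(1)
sec_grid = rng.normal(size=(76, 76))  # indexed [ix, iy]

# Hypothetical primary samples (a standardized hydraulic parameter) at raw coordinates.
samples = np.array([[12.3, 48.9], [203.1, 97.4], [377.8, 402.2],
                    [590.4, 150.6], [688.0, 711.5], [140.2, 620.7]])
z1 = np.array([0.3, -1.1, 0.8, 0.1, -0.4, 1.2])

ij, nodes = snap_to_grid(samples, 0.0, 0.0, 10.0)
target_ij = (40, 40)  # grid node at (400 m, 400 m)
target = np.array([400.0, 400.0])

lam, nu, var = mcc_system(nodes, target, rho=0.7)
z2 = np.append(sec_grid[ij[:, 0], ij[:, 1]], sec_grid[target_ij])
print(f"estimate = {lam @ z1 + nu @ z2:.3f}, cokriging variance = {var:.3f}")
```

With rho = 0 the dense-variable weights vanish and the result reduces to simple kriging of the primary samples, which is a quick sanity check on any implementation of this system.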