Multivariate Adaptive Regression Splines (earth)

Multivariate adaptive regression splines (MARS) is a non-parametric regression method that extends a linear model with non-linear interactions.

This module borrows the implementation of the technique from the Earth R package by Stephen Milborrow.

Example

>>> import Orange
>>> data = Orange.data.Table("housing")
>>> c = Orange.regression.earth.EarthLearner(data, degree=2, terms=10)
>>> print c
MEDV =
   23.587
   +11.896 * max(0, RM - 6.431)
   +1.142 * max(0, 6.431 - RM)
   -0.612 * max(0, LSTAT - 6.120)
   -228.795 * max(0, NOX - 0.647) * max(0, RM - 6.431)
   +0.023 * max(0, TAX - 307.000) * max(0, 6.120 - LSTAT)
   +0.029 * max(0, 307.000 - TAX) * max(0, 6.120 - LSTAT)
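Each term in the printed model is a hinge function of the form max(0, x - c) or max(0, c - x), or a product of two hinges (degree=2 allows such pairwise interactions). As an illustration only, the sketch below re-evaluates the model printed above by hand in plain Python. The coefficients and knots are copied from that output. The helpers `hinge` and `predict_medv` are hypothetical and are not part of Orange.

```python
def hinge(x):
    """Hinge function used in MARS terms: max(0, x)."""
    return max(0.0, x)


def predict_medv(rm, lstat, nox, tax):
    """Re-evaluate the printed housing model for a single instance.

    Coefficients and knots are copied from the printed model above.
    """
    return (23.587
            + 11.896 * hinge(rm - 6.431)
            + 1.142 * hinge(6.431 - rm)
            - 0.612 * hinge(lstat - 6.120)
            - 228.795 * hinge(nox - 0.647) * hinge(rm - 6.431)
            + 0.023 * hinge(tax - 307.000) * hinge(6.120 - lstat)
            + 0.029 * hinge(307.000 - tax) * hinge(6.120 - lstat))


# At the knots every hinge is zero, so only the intercept remains.
print(predict_medv(rm=6.431, lstat=6.120, nox=0.647, tax=307.0))
```

Raising RM above its knot activates only the max(0, RM - 6.431) term, provided NOX stays at or below its knot.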
class Orange.regression.earth.EarthLearner(degree=1, terms=21, penalty=None, thresh=0.001, min_span=0, new_var_penalty=0, fast_k=20, fast_beta=1, pruned_terms=None, scale_resp=True, store_instances=True, **kwds)

Earth learner class. Supports both regression and classification problems. For classification, class values are expanded into continuous indicator columns (one for each value if the number of values is greater than 2), and a multi-response model is fit to these new columns. The resulting classifier then computes the response values for new instances and uses them to select the final predicted class.
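The class expansion described above can be sketched in plain Python. This is an illustration under assumptions, not Orange's code. `expand_class` and `pick_class` are hypothetical helpers. The binary case is assumed to use a single 0/1 column for the second class value, with the prediction thresholded at 0.5.

```python
def expand_class(labels, values):
    """Expand class labels into continuous indicator columns.

    With more than two class values, one 0/1 column per value is created.
    With two values (assumed behaviour), a single column marks the second value.
    """
    if len(values) > 2:
        return [[1.0 if label == v else 0.0 for v in values] for label in labels]
    return [[1.0 if label == values[1] else 0.0] for label in labels]


def pick_class(responses, values):
    """Select the predicted class from the multi-response model's outputs."""
    if len(values) > 2:
        # The indicator column with the largest response wins.
        best = max(range(len(values)), key=lambda i: responses[i])
        return values[best]
    # Binary case: threshold the single response at 0.5 (assumption).
    return values[1] if responses[0] >= 0.5 else values[0]


values = ["setosa", "versicolor", "virginica"]
print(expand_class(["versicolor", "setosa"], values))
print(pick_class([0.1, 0.2, 0.9], values))
```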

class Orange.regression.earth.EarthClassifier(domain, best_set, dirs, cuts, betas, subsets=None, rss_per_subset=None, gcv_per_subset=None, instances=None, multitarget=False, expanded_class=None, original_domain=None, **kwargs)

Earth classifier.

base_features()

Return a list of features for the included Earth terms. The attributes can be used in Orange’s domain translation (i.e. they define the proper get_value_from functions).

base_matrix(instances=None)

Return the base matrix (bx) of the Earth model for the given instances. If instances is not supplied, the base matrix of the training instances is returned. The base matrix is a len(instances) x num_terms matrix that holds, for each instance, the computed value of every term in the model (not multiplied by beta).

Parameters: instances (Orange.data.Table) – Input instances for the base matrix.
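To illustrate what the base matrix contains, the sketch below computes bx for a toy model in plain Python. The term representation is a hypothetical choice for this example and not the internal layout EarthClassifier uses (its constructor takes dirs, cuts and betas arrays). Here each term is a list of (attribute index, knot, direction) hinge factors, and the intercept is an empty product. The functions `term_value`, `base_matrix` and `predict` are standalone illustrations, not the methods documented here. Predictions are the rows of bx multiplied by the coefficient vector beta.

```python
def term_value(instance, term):
    """Value of one term: a product of hinge factors (1.0 for the intercept)."""
    value = 1.0
    for attr, knot, direction in term:
        # direction +1 gives max(0, x - knot); -1 gives max(0, knot - x)
        value *= max(0.0, direction * (instance[attr] - knot))
    return value


def base_matrix(instances, terms):
    """len(instances) x len(terms) matrix of term values (not multiplied by beta)."""
    return [[term_value(inst, term) for term in terms] for inst in instances]


def predict(bx_row, betas):
    """Prediction for one instance: dot product of its bx row with the betas."""
    return sum(b * v for b, v in zip(betas, bx_row))


terms = [
    [],                              # intercept
    [(0, 2.0, +1)],                  # max(0, x0 - 2)
    [(0, 2.0, -1)],                  # max(0, 2 - x0)
    [(0, 2.0, +1), (1, 1.0, +1)],    # max(0, x0 - 2) * max(0, x1 - 1)
]
betas = [1.0, 3.0, -0.5, 2.0]
instances = [[3.0, 4.0], [1.0, 0.0]]

bx = base_matrix(instances, terms)
print(bx)
print([predict(row, betas) for row in bx])
```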
evimp(used_only=True)

Return the estimated variable importances.

Parameters: used_only – if True, return importances only for the attributes used in the model.

predict(instance)

Predict the response value(s).

Parameters: instance (Orange.data.Instance) – Data instance.

to_string(percision=3, indent=3)

Return a string representation of the model.

used_attributes(term=None)

Return the attributes used in the term with the given index. If no term is given, return all attributes used in the model.

Parameters: term (int) – term index

Utility functions

Orange.regression.earth.gcv(rss, n, n_effective_params)

Return the generalized cross-validation (GCV) estimate.

gcv = rss / (n * (1 - n_effective_params / n)^2)

Parameters:
  • rss – Residual sum of squares.
  • n – Number of training instances.
  • n_effective_params – Number of effective parameters.
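The formula above can be transcribed directly into Python for checking values by hand. This sketch re-implements the formula and is not the module's own code.

```python
def gcv(rss, n, n_effective_params):
    """Generalized cross-validation: rss / (n * (1 - n_effective_params / n) ** 2)."""
    # float() avoids integer division under Python 2.
    return rss / (n * (1.0 - float(n_effective_params) / n) ** 2)


# 100 instances, RSS of 10 and 5 effective parameters:
# 10 / (100 * 0.95 ** 2) = 10 / 90.25
print(gcv(10.0, 100, 5))
```

As n_effective_params approaches n, the denominator shrinks toward zero. GCV therefore penalizes model complexity even when the RSS stays the same.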
Orange.regression.earth.plot_evimp(evimp)

Plot the variable importances as returned by EarthClassifier.evimp().

import Orange
data = Orange.data.Table("housing")
c = Orange.regression.earth.EarthLearner(data, degree=3)
Orange.regression.earth.plot_evimp(c.evimp())
[Figure: earth-evimp.png, plot of the Earth variable importances]

The left axis shows the nsubsets measure, and the right axis shows the normalized RSS and GCV.

Orange.regression.earth.bagged_evimp(classifier, used_only=True)

Extract the combined (averaged) evimp from a BaggedClassifier instance.

Example:

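import Orange
from Orange.ensemble.bagging import BaggedLearner
from Orange.regression.earth import EarthLearner, bagged_evimp
data = Orange.data.Table("housing")
bc = BaggedLearner(EarthLearner(degree=3, terms=10), data)
bagged_evimp(bc)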
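The phrase "combined (average)" can be sketched in plain Python. This illustration rests on assumptions and is not bagged_evimp's implementation. `average_evimp` is a hypothetical helper. Each ensemble member's importances are modeled as a dict mapping attribute names to (nsubsets, rss, gcv) tuples. An attribute missing from a member's dict is assumed to contribute zeros to the average.

```python
def average_evimp(evimps):
    """Average per-attribute importance tuples over ensemble members.

    evimps: list of dicts {attribute name: (nsubsets, rss, gcv)}.
    Attributes absent from a member count as (0, 0, 0) (an assumption).
    """
    names = set()
    for imp in evimps:
        names.update(imp)
    averaged = {}
    for name in names:
        rows = [imp.get(name, (0.0, 0.0, 0.0)) for imp in evimps]
        # zip(*rows) groups the nsubsets, rss and gcv values column by column
        averaged[name] = tuple(sum(col) / float(len(rows)) for col in zip(*rows))
    return averaged


members = [
    {"RM": (9.0, 100.0, 100.0), "LSTAT": (7.0, 60.0, 58.0)},
    {"RM": (8.0, 95.0, 96.0)},
]
result = average_evimp(members)
# List attributes by averaged nsubsets, highest first.
for name, scores in sorted(result.items(), key=lambda kv: -kv[1][0]):
    print("%s: %s" % (name, scores))
```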
class Orange.regression.earth.ScoreEarthImportance(t=10, degree=2, terms=10, score_what='nsubsets', cached=True)

A subclass of Orange.feature.scoring.Score that scores features based on their importance in the Earth model using bagged_evimp.

Examples

import Orange

l1 = Orange.multitarget.earth.EarthLearner(name="earth")
l2 = Orange.multitarget.binary.BinaryRelevanceLearner(
    learner=Orange.regression.mean.MeanLearner, name="Majority")
learners = [l1, l2]
# Load a data set with several continuous target variables
data = Orange.data.Table('multitarget-synthetic')

results = Orange.evaluation.testing.cross_validation(learners, data, 3)

# One RMSE per learner, averaged over the targets
scores = Orange.multitarget.scoring.mt_average_score(
    results, Orange.evaluation.scoring.RMSE)

print "Regression - multitarget-synthetic.tab"
print "%18s  %6s" % ("Learner    ", "RMSE")
for learner, score in zip(learners, scores):
    print "%18s  %1.4f" % (learner.name, score)
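The example prints a single RMSE for each learner, averaged over the target variables. The sketch below shows that computation in plain Python. It assumes mt_average_score computes the score for each target and then takes the unweighted mean, and it does not reproduce Orange's code. `rmse` and `mt_average_rmse` are hypothetical helpers.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error for a single target."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / float(n))


def mt_average_rmse(actual_rows, predicted_rows):
    """Unweighted mean of per-target RMSE values.

    actual_rows / predicted_rows: one list of target values per instance.
    """
    targets = list(zip(*actual_rows))          # one column per target
    predictions = list(zip(*predicted_rows))
    scores = [rmse(t, p) for t, p in zip(targets, predictions)]
    return sum(scores) / len(scores)


actual = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
predicted = [[1.0, 12.0], [2.0, 18.0], [3.0, 30.0]]
print(mt_average_rmse(actual, predicted))
```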
    
