Lasso on dense and sparse data

We show that linear_model.Lasso produces the same results whether the input is stored as a dense array or as a sparse matrix, and that when the data is actually sparse, fitting on the sparse representation is faster.
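For reference, the objective that Lasso minimizes, as given in the scikit-learn documentation, can be written as follows. The symbols n (number of samples) and w (coefficient vector) are notation introduced here and do not appear in the script; since the script sets fit_intercept=False, no intercept term is included.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```

A larger alpha puts more weight on the L1 penalty and drives more coefficients to exactly zero.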

print(__doc__)
 
from time import time
from scipy import sparse
from scipy import linalg
 
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

The two Lasso implementations on dense data

print("--- Dense matrices")
 
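# 200 samples, 5000 features: a fully dense design matrix.
X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
# The same values in a sparse container, so the sparse code path is exercised.
X_sp = sparse.coo_matrix(X)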
 
alpha = 1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
 
t0 = time()
sparse_lasso.fit(X_sp, y)
print("Sparse Lasso done in %fs" % (time() - t0))
 
t0 = time()
dense_lasso.fit(X, y)
print("Dense Lasso done in %fs" % (time() - t0))
 
print("Distance between coefficients : %s"
      % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))
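A single time() measurement can be noisy, so the timings printed above may vary from run to run. As a minimal sketch, assuming a hypothetical helper named best_of that is not part of the original script, each fit could be repeated and the shortest time kept. The workload below is a stand-in so the snippet runs on its own; in the script it would be something like lambda: dense_lasso.fit(X, y).

```python
from time import perf_counter


def best_of(fit, repeats=3):
    """Return the shortest wall-clock time, in seconds, over several calls."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        fit()  # the callable under test, e.g. lambda: dense_lasso.fit(X, y)
        timings.append(perf_counter() - start)
    return min(timings)


# Stand-in workload (an assumption for illustration, not a Lasso fit).
elapsed = best_of(lambda: sum(i * i for i in range(10_000)))
print("Best of 3 runs: %fs" % elapsed)
```

Taking the minimum rather than the mean reduces the influence of unrelated background activity on the reported time.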

The two Lasso implementations on sparse data

print("--- Sparse matrices")
 
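# Zero out every entry below 2.5 so that only a small fraction stays nonzero.
Xs = X.copy()
Xs[Xs < 2.5] = 0.0
# Build a sparse matrix from the thresholded data and convert it to CSC format.
Xs = sparse.coo_matrix(Xs)
Xs = Xs.tocsc()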
 
print("Matrix density : %s %%" % (Xs.nnz / float(X.size) * 100))
 
alpha = 0.1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
 
t0 = time()
sparse_lasso.fit(Xs, y)
print("Sparse Lasso done in %fs" % (time() - t0))
 
t0 = time()
dense_lasso.fit(Xs.toarray(), y)
print("Dense Lasso done in %fs" % (time() - t0))
 
print("Distance between coefficients : %s"
      % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))
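The density printed by this section can be estimated ahead of time. The sketch below assumes that make_regression, with its default settings, draws each feature value from a standard normal distribution, so the share of entries that survive the 2.5 threshold is the upper tail probability of that distribution. The variable names are illustrative and not taken from the script.

```python
from statistics import NormalDist

threshold = 2.5
n_samples, n_features = 200, 5000

# Probability that a standard normal draw is at least the threshold
# (assumes the features are standard normal, see the note above).
expected_density = 1.0 - NormalDist().cdf(threshold)
expected_nnz = expected_density * n_samples * n_features

print("Expected density : %.2f %%" % (expected_density * 100))
print("Expected nonzeros: about %d" % round(expected_nnz))
```

Under that assumption well under 1 % of the entries remain nonzero, which is the regime where a sparse representation is expected to pay off.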

Links:

http://scikit-learn.org/stable/auto_examples/linear_model/lasso_dense_vs_sparse_data.html

2025-01-10 15:47:30
