We show that linear_model.Lasso produces the same results for dense and sparse data, and that when the data is sparse, fitting on the sparse representation is faster.
print(__doc__)

from time import time

from scipy import sparse
from scipy import linalg

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
The two Lasso implementations on Dense data
print("--- Dense matrices")

# Fully dense regression problem, also stored in a sparse container
X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
X_sp = sparse.coo_matrix(X)

alpha = 1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)

# Fit on the sparse container
t0 = time()
sparse_lasso.fit(X_sp, y)
print("Sparse Lasso done in %fs" % (time() - t0))

# Fit on the dense array
t0 = time()
dense_lasso.fit(X, y)
print("Dense Lasso done in %fs" % (time() - t0))

print("Distance between coefficients : %s"
      % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))
The two Lasso implementations on Sparse data
print("--- Sparse matrices")

# Zero out every entry below 2.5 so that the matrix becomes mostly zeros
Xs = X.copy()
Xs[Xs < 2.5] = 0.0
Xs = sparse.coo_matrix(Xs)
Xs = Xs.tocsc()

print("Matrix density : %s %%" % (Xs.nnz / float(X.size) * 100))

alpha = 0.1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)

# Fit on the CSC matrix
t0 = time()
sparse_lasso.fit(Xs, y)
print("Sparse Lasso done in %fs" % (time() - t0))

# Fit on the same data converted back to a dense array
t0 = time()
dense_lasso.fit(Xs.toarray(), y)
print("Dense Lasso done in %fs" % (time() - t0))

print("Distance between coefficients : %s"
      % linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))
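To get a feel for why the sparse representation can pay off after thresholding, the sketch below compares the storage needed by a dense array and by its CSC counterpart. It is only an illustration under stated assumptions: the data comes from a plain Gaussian generator rather than make_regression, and the names X_demo, X_csc and csc_nbytes are hypothetical helpers that do not appear in the example above.

```python
import numpy as np
from scipy import sparse

# Assumption: standard normal data stands in for the example's design matrix.
rng = np.random.RandomState(0)
X_demo = rng.randn(200, 5000)

# Same thresholding as in the example: keep only entries >= 2.5.
X_demo[X_demo < 2.5] = 0.0
X_csc = sparse.csc_matrix(X_demo)


def csc_nbytes(m):
    """Bytes used by the three arrays backing a CSC matrix (hypothetical helper)."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


density = X_csc.nnz / X_demo.size
dense_bytes = X_demo.nbytes
sparse_bytes = csc_nbytes(X_csc)

print("Matrix density : %.2f %%" % (density * 100))
print("Dense storage  : %d bytes" % dense_bytes)
print("CSC storage    : %d bytes" % sparse_bytes)
```

With so few stored entries, any routine that only visits the non-zero values has far less data to touch, which is consistent with the faster sparse fit reported by the example. Actual timings still depend on the solver and the machine.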