The problems with parallelism in the preconditioner have led to searches for other preconditioners. Often simple diagonal scaling is an adequate preconditioner and this is trivially parallelizable. For results on a Connection Machine, see . Often this approach leads to a significant increase in iteration steps. Still another approach is to use polynomial preconditioning: , i.e. , for some suitable jth degree polynomial. This preconditioner can be implemented by forming only matrix--vector products, which, depending on the structure of A, are easier to parallelize . For one often selects a Chebychev polynomial, which requires some information on the spectrum of A.
Finally we point out the possibility of using the truncated Neumann series for the approximate inverse of A, or parts of L and U. Madsen et al.  discuss approximate inversion of A, which from the implementation point of view is equivalent to polynomial preconditioning. In  the use of truncated Neumann series for removing some of the recurrences in the triangular solves is discussed. This approach leads to only fine-grained parallelism (vectorization).