CSR merge path SPMV fails on NVIDIA GB10 DGX Spark #1981

@Slaedr

Description

I have not yet been able to diagnose the cause of this issue. The test binaries test/matrix/csr_kernels2_cuda and test/matrix/matrix_cuda fail on the DGX Spark (NVIDIA GB10) in a release build. These are the tests in csr_kernels2 that fail:

[ RUN      ] Csr.SimpleApplyIsEquivalentToRefWithMergePath
/home/aditya/code/ginkgo_orig/test/matrix/csr_kernels2.cpp:353: Failure
Relative error between dresult and expected is 0.73649633698692007
	which is larger than r<value_type>::value (which is 2.2204460492503131e-15)
dresult saved as Csr.SimpleApplyIsEquivalentToRefWithMergePath.dresult.mtx
expected saved as Csr.SimpleApplyIsEquivalentToRefWithMergePath.expected.mtx


[  FAILED  ] Csr.SimpleApplyIsEquivalentToRefWithMergePath (9 ms)
[ RUN      ] Csr.SimpleApplyIsEquivalentToRefWithMergePathUnsorted
/home/aditya/code/ginkgo_orig/test/matrix/csr_kernels2.cpp:365: Failure
Relative error between dresult and expected is 0.73000055491306148
	which is larger than r<value_type>::value (which is 2.2204460492503131e-15)
dresult saved as Csr.SimpleApplyIsEquivalentToRefWithMergePathUnsorted.dresult.mtx
expected saved as Csr.SimpleApplyIsEquivalentToRefWithMergePathUnsorted.expected.mtx


[  FAILED  ] Csr.SimpleApplyIsEquivalentToRefWithMergePathUnsorted (11 ms)
[ RUN      ] Csr.AdvancedApplyIsEquivalentToRefWithMergePath
/home/aditya/code/ginkgo_orig/test/matrix/csr_kernels2.cpp:376: Failure
Relative error between dresult and expected is 0.73414369462669749
	which is larger than r<value_type>::value (which is 2.2204460492503131e-15)
dresult saved as Csr.AdvancedApplyIsEquivalentToRefWithMergePath.dresult.mtx
expected saved as Csr.AdvancedApplyIsEquivalentToRefWithMergePath.expected.mtx


[  FAILED  ] Csr.AdvancedApplyIsEquivalentToRefWithMergePath (8 ms)
...
[ RUN      ] Csr.SimpleApplyToDenseMatrixIsEquivalentToRefWithMergePath
/home/aditya/code/ginkgo_orig/test/matrix/csr_kernels2.cpp:432: Failure
Relative error between dresult and expected is 0.72785922538979009
	which is larger than r<value_type>::value (which is 2.2204460492503131e-15)
dresult saved as Csr.SimpleApplyToDenseMatrixIsEquivalentToRefWithMergePath.dresult.mtx
expected saved as Csr.SimpleApplyToDenseMatrixIsEquivalentToRefWithMergePath.expected.mtx


[  FAILED  ] Csr.SimpleApplyToDenseMatrixIsEquivalentToRefWithMergePath (9 ms)
[ RUN      ] Csr.AdvancedApplyToDenseMatrixIsEquivalentToRefWithMergePath
/home/aditya/code/ginkgo_orig/test/matrix/csr_kernels2.cpp:443: Failure
Relative error between dresult and expected is 0.72562353484944564
	which is larger than r<value_type>::value (which is 2.2204460492503131e-15)
dresult saved as Csr.AdvancedApplyToDenseMatrixIsEquivalentToRefWithMergePath.dresult.mtx
expected saved as Csr.AdvancedApplyToDenseMatrixIsEquivalentToRefWithMergePath.expected.mtx


[  FAILED  ] Csr.AdvancedApplyToDenseMatrixIsEquivalentToRefWithMergePath (9 ms)

In the other test suite (test/matrix/matrix_cuda), these tests fail:

[  FAILED  ] Matrix/CsrWithMergePathStrategy.SpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.AdvancedSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.MixedSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.MixedAdvancedSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.MixedInputSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.MixedInputAdvancedSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.MixedOutputSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
[  FAILED  ] Matrix/CsrWithMergePathStrategy.MixedOutputAdvancedSpMVIsEquivalentToRef, where TypeParam = CsrWithMergePathStrategy
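
For scale, the tolerance printed in the logs above (2.2204460492503131e-15) is ten times double-precision machine epsilon, while the reported relative errors are around 0.73. That points to a grossly wrong result rather than rounding noise. The sketch below shows one way to inspect a pair of result/expected vectors once they have been read from the saved .mtx files. It assumes a Frobenius-norm relative error, which may not match how the test assertion normalizes, and the function names `relative_error` and `worst_entries` are hypothetical helpers, not part of Ginkgo. Loading the files is not shown.

```python
import math
import sys

# Tolerance printed in the log: ten times double-precision machine epsilon.
TOLERANCE = 10 * sys.float_info.epsilon


def relative_error(result, expected):
    """Frobenius-norm relative error (assumed metric, see lead-in)."""
    if len(result) != len(expected):
        raise ValueError("size mismatch")
    diff = math.sqrt(sum((r - e) ** 2 for r, e in zip(result, expected)))
    norm = math.sqrt(sum(e ** 2 for e in expected))
    # Fall back to the absolute error when the reference is all zeros.
    return diff / norm if norm > 0.0 else diff


def worst_entries(result, expected, count=5):
    """Indices of the entries with the largest absolute difference."""
    order = sorted(
        range(len(expected)),
        key=lambda i: abs(result[i] - expected[i]),
        reverse=True,
    )
    return order[:count]
```

Listing the worst entries can help show whether the wrong values cluster in particular rows, which may narrow down where the merge path kernel goes wrong.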
