
Term OOC covariance TSMM #2529

Open
tuluyhansozen wants to merge 1 commit into apache:main from AdityaPandey2612:aditya-amls-test-branch

tuluyhansozen commented Jul 2, 2026

This PR adds out-of-core (OOC) support for covariance and improves OOC support for transpose-self matrix multiplication (TSMM).

Changes include:

  • Add CovarianceOOCInstruction for unweighted and weighted covariance.
  • Register OOC covariance parsing through OOCInstructionParser and OOCType.COV.
  • Skip zero-weight covariance updates to avoid NaN results.
  • Extend OOC TSMM to support multi-tile outputs and right-side TSMM cases.
  • Add OOC tests for:
    • cov(A, B)
    • cov(A, B, W)
    • left/right TSMM
    • dense/sparse inputs
    • single-tile and multi-tile output cases
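For readers less familiar with the multi-tile case: with block size `bLen`, output tile (i, j) of the left TSMM `t(X) %*% X` is the sum over all row blocks k of `t(X[k,i]) %*% X[k,j]`. Because the result is symmetric, only tiles with i <= j need to be computed. The plain-Java sketch below illustrates that decomposition on an in-memory dense matrix. It is not SystemDS code. `BlockedTsmmSketch`, `block`, `tmult`, and `tsmmLeft` are hypothetical names, `double[][]` stands in for `MatrixBlock`, and there is no streaming, joining, or caching.

```java
import java.util.Arrays;

// Hypothetical illustration of a tiled left TSMM, t(x) %*% x; not the PR's implementation.
final class BlockedTsmmSketch {

    // Copy the (rb, cb) tile of x for block size bLen; edge tiles may be smaller.
    static double[][] block(double[][] x, int rb, int cb, int bLen) {
        int r0 = rb * bLen, c0 = cb * bLen;
        int rows = Math.min(bLen, x.length - r0);
        int cols = Math.min(bLen, x[0].length - c0);
        double[][] t = new double[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                t[r][c] = x[r0 + r][c0 + c];
        return t;
    }

    // t(a) %*% b for two tiles that cover the same row range.
    static double[][] tmult(double[][] a, double[][] b) {
        int n = a.length, p = a[0].length, q = b[0].length;
        double[][] out = new double[p][q];
        for (int k = 0; k < n; k++)
            for (int i = 0; i < p; i++)
                for (int j = 0; j < q; j++)
                    out[i][j] += a[k][i] * b[k][j];
        return out;
    }

    // Left TSMM assembled from per-tile partial products.
    static double[][] tsmmLeft(double[][] x, int bLen) {
        int m = x.length, n = x[0].length;
        int numRowBlocks = (m + bLen - 1) / bLen;
        int numColBlocks = (n + bLen - 1) / bLen;
        double[][] out = new double[n][n];
        for (int i = 0; i < numColBlocks; i++) {
            for (int j = i; j < numColBlocks; j++) {      // upper triangle of tiles only
                for (int k = 0; k < numRowBlocks; k++) {  // sum partials over row blocks
                    double[][] part = tmult(block(x, k, i, bLen), block(x, k, j, bLen));
                    for (int r = 0; r < part.length; r++) {
                        for (int c = 0; c < part[0].length; c++) {
                            out[i * bLen + r][j * bLen + c] += part[r][c];
                            if (i != j) // mirror off-diagonal tiles into the lower triangle
                                out[j * bLen + c][i * bLen + r] += part[r][c];
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // Tiles of size 2 give a 2x2 grid of output tiles (multi-tile case).
        System.out.println(Arrays.deepToString(tsmmLeft(x, 2)));
    }
}
```

The right TSMM `X %*% t(X)` follows the same pattern with the roles of row and column blocks swapped. This matches how the diff swaps the join and output indexes based on `_type.isLeft()`.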

tuluyhansozen force-pushed the aditya-amls-test-branch branch from 7f235ad to 0c121eb on July 2, 2026 09:59
tuluyhansozen marked this pull request as ready for review on July 2, 2026 10:04

codecov Bot commented Jul 2, 2026

Codecov Report

❌ Patch coverage is 86.00000% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.67%. Comparing base (c099a2f) to head (0c121eb).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ime/instructions/ooc/CovarianceOOCInstruction.java | 82.60% | 5 missing and 3 partials ⚠️ |
| ...s/runtime/instructions/ooc/TSMMOOCInstruction.java | 88.00% | 3 missing and 3 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2529      +/-   ##
============================================
+ Coverage     71.66%   71.67%   +0.01%     
- Complexity    49338    49357      +19     
============================================
  Files          1580     1581       +1     
  Lines        190516   190597      +81     
  Branches      37364    37373       +9     
============================================
+ Hits         136525   136608      +83     
+ Misses        43464    43460       -4     
- Partials      10527    10529       +2     

tuluyhansozen (Author)

@janniklinde We resolved the conflicts, fixed the failing checks, and created a fresh, clean PR.

janniklinde (Contributor) left a comment

Thank you for the PR. Overall, these changes look very good to me. I left some minor comments in the code. @tuluyhansozen @122Astha @AdityaPandey2612

Comment on lines +66 to +67
if(w2 == 0)
return cov1;

Contributor

Why is this change necessary?

It's a reliability fix. During the weighted-mean computation on a sparse matrix, we ran into a division by zero, which produces NaN. Since a zero weight has no effect on the aggregate, I simply return the original covariance unchanged.
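To illustrate where the NaN comes from, the sketch below merges two partial aggregates (total weight, weighted means, co-moment) using the standard pairwise update formulas. This is an assumption about the general shape of the update, not the PR's `CovarianceOOCInstruction` code; `CovState`, `merge`, `add`, and `covariance` are hypothetical names, and records require Java 16+. When the running aggregate is still empty (weight 0) and the incoming value also has weight 0, the total weight is 0 and the mean update evaluates 0/0, which is NaN in Java floating point. Returning the first aggregate unchanged, as the `if(w2 == 0) return cov1;` guard does, avoids this.

```java
// Hypothetical partial aggregate for weighted covariance; not SystemDS code.
record CovState(double w, double meanX, double meanY, double coMoment) {

    static final CovState EMPTY = new CovState(0, 0, 0, 0);

    // Merge two partial aggregates with pairwise mean/co-moment updates.
    static CovState merge(CovState a, CovState b) {
        // Zero-weight guard: b contributes nothing. Without it, merging into an
        // empty state (a.w == 0) makes w == 0 below, and dx * b.w / w is 0/0 = NaN.
        if (b.w == 0)
            return a;
        double w = a.w + b.w;
        double dx = b.meanX - a.meanX;
        double dy = b.meanY - a.meanY;
        return new CovState(w,
            a.meanX + dx * b.w / w,
            a.meanY + dy * b.w / w,
            a.coMoment + b.coMoment + dx * dy * a.w * b.w / w);
    }

    // Add one weighted observation (x, y) as a single-point aggregate.
    CovState add(double x, double y, double weight) {
        return merge(this, new CovState(weight, x, y, 0));
    }

    // Sample covariance treating weights as frequencies; assumes w > 1.
    double covariance() {
        return coMoment / (w - 1);
    }

    public static void main(String[] args) {
        // The first observation has weight 0 and is skipped by the guard.
        CovState s = EMPTY.add(1, 2, 0).add(2, 4, 1).add(3, 7, 1).add(4, 9, 2);
        System.out.println(s.covariance()); // 2.25
    }
}
```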

Comment on lines -62 to -92
-	public void processInstruction( ExecutionContext ec ) {
+	public void processInstruction(ExecutionContext ec) {
 		MatrixObject min = ec.getMatrixObject(input1);
-		int nRows = (int) min.getDataCharacteristics().getRows();
-		int nCols = (int) min.getDataCharacteristics().getCols();
-		int bLen = min.getDataCharacteristics().getBlocksize();

-		OOCStream<IndexedMatrixValue> qIn = min.getStreamHandle();
+		int numRowBlocks = Math.toIntExact(min.getDataCharacteristics().getNumRowBlocks());
+		int numColBlocks = Math.toIntExact(min.getDataCharacteristics().getNumColBlocks());
+		int blocksPerJoinGroup = _type.isLeft() ? numColBlocks : numRowBlocks;
+		int partialsPerOutput = _type.isLeft() ? numRowBlocks : numColBlocks;
+
+		OOCStreamable<IndexedMatrixValue> inputStreamable = min.getStreamable();
+		final boolean createdCache = !inputStreamable.hasStreamCache();
+		final CachingStream inputCache = createdCache ? new CachingStream(min.getStreamHandle())
+			: inputStreamable.getStreamCache();
+
+		OOCStream<List<IndexedMatrixValue>> groupedPartials = createWritableStream();
+		OOCStream<IndexedMatrixValue> partials = createWritableStream();
+		OOCStream<IndexedMatrixValue> out = createWritableStream();
+		addOutStream(out);
+		ec.getMatrixObject(output).setStreamHandle(out);
+
+		CompletableFuture<Void> joinFuture = joinManyOOC(inputCache.getReadStream(), inputCache.getReadStream(), groupedPartials,
+			this::createPartialOutputTiles, this::getJoinIndex, this::getJoinIndex,
+			blocksPerJoinGroup, blocksPerJoinGroup);
+		CompletableFuture<Void> expandFuture = expandOOC(groupedPartials, partials, values -> values);
+
+		BinaryOperator plus = InstructionUtils.parseBinaryOperator(Opcodes.PLUS.toString());
+		CompletableFuture<Void> outFuture = groupedReduceOOC(partials, out, (left, right) -> {
+			MatrixBlock result = ((MatrixBlock) left.getValue()).binaryOperations(plus, right.getValue());
+			left.setValue(result);
+			return left;
+		}, partialsPerOutput);
+
+		propagateFailuresToOutput(out, List.of(joinFuture, expandFuture, outFuture));
+
+		outFuture.whenComplete((result, error) -> {
+			if(createdCache)
+				inputCache.scheduleDeletion();
+		});
+	}
+
+	private long getJoinIndex(IndexedMatrixValue value) {
+		return _type.isLeft() ? value.getIndexes().getRowIndex() : value.getIndexes().getColumnIndex();
+	}

-		//validation check TODO extend compiler to not create OOC otherwise
-		if( (_type.isLeft() && nCols > bLen)
-			|| (_type.isRight() && nRows > bLen) )
-		{
-			throw new UnsupportedOperationException();
+	private long getOutputIndex(IndexedMatrixValue value) {
+		return _type.isLeft() ? value.getIndexes().getColumnIndex() : value.getIndexes().getRowIndex();
+	}
+
+	private List<IndexedMatrixValue> createPartialOutputTiles(IndexedMatrixValue left, IndexedMatrixValue right) {
+		long leftIndex = getOutputIndex(left);
+		long rightIndex = getOutputIndex(right);
+		if(leftIndex > rightIndex)
+			return List.of();
+
+		MatrixBlock leftBlock = (MatrixBlock) left.getValue();
+		MatrixBlock rightBlock = (MatrixBlock) right.getValue();
+		if(leftIndex == rightIndex) {
+			MatrixBlock diagonal = leftBlock.transposeSelfMatrixMultOperations(new MatrixBlock(), _type);
+			return List.of(new IndexedMatrixValue(new MatrixIndexes(leftIndex, rightIndex), diagonal));
+		}

-		//int dim = _type.isLeft() ? nCols : nRows;
-		MatrixBlock resultBlock = null;
-
-		OOCStream<MatrixBlock> tmpStream = createWritableStream();
-
-		mapOOC(qIn, tmpStream,
-			tmp -> ((MatrixBlock) tmp.getValue())
-			.transposeSelfMatrixMultOperations(new MatrixBlock(), _type));
-
-		MatrixBlock tmp;
-		while ((tmp = tmpStream.dequeue()) != LocalTaskQueue.NO_MORE_TASKS) {
-			if (resultBlock == null)
-				resultBlock = tmp;
-			else
-				resultBlock.binaryOperationsInPlace(plus, tmp);

Contributor

These changes look good for the general case. However, the more general execution path is more expensive. I'd like you to keep the special case where we can avoid creating a CachingStream and using the heavier primitives (when only a single output tile is produced).
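Below is a minimal sketch of the special case the reviewer wants to keep. It assumes the case is triggered by the same condition as the removed validation check in the diff: for left TSMM the column count fits into one block (for right TSMM, the row count), so the whole result is a single tile. In that case every input tile contributes to the same output tile. One pass with a running sum, like the previous `mapOOC`/`dequeue` accumulation loop, is then enough, and neither a `CachingStream` nor the join/expand/grouped-reduce pipeline is needed. The names `SingleTileTsmmSketch`, `singleOutputTile`, and `tsmmLeftSingleTile` are hypothetical, and an `Iterator` over in-memory tiles stands in for the OOC stream.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of a single-output-tile TSMM fast path; not SystemDS code.
final class SingleTileTsmmSketch {

    // Same condition as the removed validation check: the TSMM result is a single
    // tile when the dimension that becomes the output size fits into one block.
    static boolean singleOutputTile(boolean isLeft, long rows, long cols, int bLen) {
        return isLeft ? cols <= bLen : rows <= bLen;
    }

    // Left TSMM with one output tile: each incoming row-block tile adds
    // t(tile) %*% tile to the same accumulator, in a single pass over the input.
    static double[][] tsmmLeftSingleTile(Iterator<double[][]> rowBlocks, int nCols) {
        double[][] acc = new double[nCols][nCols];
        while (rowBlocks.hasNext()) {
            double[][] tile = rowBlocks.next();
            for (double[] row : tile)
                for (int i = 0; i < nCols; i++)
                    for (int j = 0; j < nCols; j++)
                        acc[i][j] += row[i] * row[j];
        }
        return acc;
    }

    public static void main(String[] args) {
        // A 3x2 matrix streamed as two row-block tiles (block size 2).
        List<double[][]> tiles = List.of(
            new double[][] {{1, 2}, {3, 4}},
            new double[][] {{5, 6}});
        if (singleOutputTile(true, 3, 2, 2))
            System.out.println(Arrays.deepToString(tsmmLeftSingleTile(tiles.iterator(), 2)));
    }
}
```

When `singleOutputTile` returns false, the multi-tile join-based path from the PR would be used instead.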

Contributor

Please assert that OOC operators were used (similar to the TSMM test).

Contributor

Please assert that the OOC operator was used.
