Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong pdlapiv or pdlapv2 output in ScaLAPACK #90

Open
j7168908jx opened this issue Jan 23, 2024 · 2 comments
Open

Wrong pdlapiv or pdlapv2 output in ScaLAPACK #90

j7168908jx opened this issue Jan 23, 2024 · 2 comments

Comments

@j7168908jx
Copy link

j7168908jx commented Jan 23, 2024

Summary: I found an erroneous output when calling pdlapv2 with the following reproducible example. With inspection of the source code pdlapv2.f I think it's an algorithm error. (Or I misunderstood the usage of this routine). Please help!

I am trying to write some c++ code that calls ScaLAPACK and I encountered this problem.
I want to do a column reordering for a distributed matrix. Following the explanation together with the source code in ScaLAPACK, I find the result confusing. Here is a minimum 4 processes 7x7 matrix, 6x6 block-cyclic example:

#include <cstdlib>
#include <cassert>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>
#include "mpi.h"

// build with:
//  mpicxx --std=c++17 minimal.cpp -L/home/xli/software/scalapack-2.2.0 -L/home/xli/.local/lib -L/home/xli/.local/lib64 -lscalapack -llapack -lopenblas -lgfortran -o minimal.x

// run with
//  OMP_NUM_THREADS=1 LD_LIBRARY_PATH=/home/xli/.local/lib:/home/xli/.local/lib64:/opt/MPICH/4.2.0/lib:$LD_LIBRARY_PATH mpirun -np 4 ./minimal.x

#define ASSERT_DHIF(cond, msg) if (!(cond)) { std::cerr << msg << std::endl; exit(1); }
#define LOGDEBUG if (myid == 0) std::cout

int myid;

extern "C" {

  void Cblacs_get(const int ictxt, const int what, int *val);
  void Cblacs_pinfo(int *myrank, int *nprocs);
  void Cblacs_gridinit(int *ictxt, const char *order, const int nprow, const int npcol);
  void Cblacs_gridinfo(const int ictxt, int *nprow, int *npcol, int *myrow, int *mycol);
  void Cblacs_gridexit(const int ictxt);
  void descinit_(int *desc,
      const int *m, const int *n, const int *mb, const int *nb, const int *irsrc, const int *icsrc, const int *ictxt, const int *lld, const int *info);

  void pdlapiv_(char *direc, char *rowcol, char *pivroc, int *m, int *n, double *A, int *IA, int *JA, int *DESCA, int *IPIV, int *ip, int *jp, int *descip, int *iwork);
  void pdlapv2_(char *direc, char *rowcol, int *m, int *n, double *A, int *IA, int *JA, int *DESCA, int *IPIV, int *ip, int *jp, int *descip);

  // compute LOCr or LOCc (local size of data for distributed array)
  int numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
}

void random_fill_matrix(double *A, int locr, int locc) {
  std::srand(42+myid);
  for (int i = 0; i < locr; i++)
    for (int j = 0; j < locc; j++)
      A[i + j * locr] = (double)rand() / RAND_MAX;
}

std::string print_mat(std::vector<double> &A, int locr, int locc) {
  std::stringstream ss;
  for (int i = 0; i < locr; i++) {
    for (int j = 0; j < locc; j++)
      ss << std::fixed << std::setprecision(5) << A[i + j * locr] << " ";
    ss << "\n";
  }
  return ss.str();
}

std::ostream& operator<<(std::ostream& os, const std::vector<int>& v) {
  os << "{";
  for (auto it = v.begin(); it != v.end(); ++it) {
    os << *it;
    if (it != v.end() - 1)
      os << ", ";
  }
  os << "}";
  return os;
}

void validate_pdlapiv() {

  int numprocs, info, ictxt;
  int myrow, mycol;
  int nprow = 2, npcol = 2;
  int zero = 0, one = 1, two = 2;

  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

  ASSERT_DHIF(numprocs == 4, "This test must be run with 4 processes");

  LOGDEBUG << "Start of test case" << std::endl;

  // Initialize BLACS context
  Cblacs_pinfo(&myid, &numprocs);
  Cblacs_get(0, 0, &ictxt);
  Cblacs_gridinit(&ictxt, "Row", nprow, npcol);
  Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

  LOGDEBUG << "finish cblacs init" << std::endl;

  // Define matrix dimensions and block size
  int m = 7, n = 7, mb = 6, nb = 6;
  double *A, *B;

  // Initialize matrix descriptor for A
  int locr = numroc_(&m, &mb, &myrow, &zero, &nprow); // Local number of rows of A
  int locc = numroc_(&n, &nb, &mycol, &zero, &npcol); // Local number of columns of A
  int lldA = std::max(1, locr); // Local leading dimension of A

  int descA[9], descB[9];
  descinit_(descA, &m, &n, &mb, &nb, &zero, &zero, &ictxt, &lldA, &info);
  ASSERT_DHIF(info == 0, ("Error in descinit, info=" + std::to_string(info)).c_str());

  descinit_(descB, &m, &n, &mb, &nb, &zero, &zero, &ictxt, &lldA, &info);
  ASSERT_DHIF(info == 0, ("Error in descinit, info=" + std::to_string(info)).c_str());

  int descip[9];
  int nnb = n + nb * 2;
  descinit_(descip, &two, &nnb, &one, &nb, &zero, &zero, &ictxt, &lldA, &info);
  ASSERT_DHIF(info == 0, ("Error in descinit, info=" + std::to_string(info)).c_str());

  LOGDEBUG << "locr and locc: " << locr << " " << locc << std::endl;

  // get decip's locr and locc
  int locrip = numroc_(&two, &one, &myrow, &zero, &nprow);
  int loccip = numroc_(&nnb, &nb, &mycol, &zero, &npcol);

  LOGDEBUG << "locrip and loccip: " << locrip << " " << loccip << std::endl;

  // Allocate memory for local array A
  std::vector<double> vA(locr * locc, 0);
  std::vector<double> vB(locr * locc, 0);
  A = vA.data();
  B = vB.data();

  // Initialize matrix A with random values and make a copy because PSGEQPF will overwrite it
  random_fill_matrix(A, locr, locc);
  std::copy(A, A + locr * locc, B);

  LOGDEBUG << "Everything in A: \n" << print_mat(vA, locr, locc) << std::endl;
  LOGDEBUG << "Everything in B: \n" << print_mat(vB, locr, locc) << std::endl;

  std::vector<int> ipiv;
  if (myid % 2 == 0) {
    ipiv = std::vector<int> {7, 2, 3, 4, 5, 6};
  } else {
    ipiv = std::vector<int> {1};
  }

  LOGDEBUG << "IPIV: " << ipiv << std::endl;

  char direc = 'F', rowcol = 'C';

  // append several 0 to ipiv
  ipiv.insert(ipiv.end(), 6, 0);
  LOGDEBUG << "IPIV: " << ipiv << std::endl;
  assert(ipiv.size() == loccip);

  // print out all the parameters
  LOGDEBUG << "Everything that will be used to call pdlapiv: " << std::endl;

  // todo: pdlapv2 still not working
  pdlapv2_(&direc, &rowcol, &m, &n, A, &one, &one, descA, ipiv.data(), &one, &one, descip);

  LOGDEBUG << "Everything in A: \n" << print_mat(vA, locr, locc) << std::endl;
  LOGDEBUG << "Everything in B: \n" << print_mat(vB, locr, locc) << std::endl;

  Cblacs_gridexit(ictxt);

}


int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  validate_pdlapiv();
  MPI_Finalize();
  return 0;
}
             

The program runs but with errornous output. Which I expect to exchange the first column and the last column of A.

(The output is limited to only the first process can output)

Start of test case
finish cblacs init
locr and locc: 6 6
locrip and loccip: 1 12
Everything in A:
0.03347 0.32996 0.69064 0.42249 0.20627 0.25013
0.63656 0.86362 0.30166 0.02492 0.36499 0.76538
0.31786 0.13573 0.10675 0.76067 0.08167 0.55096
0.56465 0.74327 0.98176 0.21885 0.45440 0.51865
0.57384 0.55523 0.72855 0.62887 0.79613 0.58372
0.27477 0.82960 0.91368 0.96540 0.25208 0.11995

Everything in B:
0.03347 0.32996 0.69064 0.42249 0.20627 0.25013
0.63656 0.86362 0.30166 0.02492 0.36499 0.76538
0.31786 0.13573 0.10675 0.76067 0.08167 0.55096
0.56465 0.74327 0.98176 0.21885 0.45440 0.51865
0.57384 0.55523 0.72855 0.62887 0.79613 0.58372
0.27477 0.82960 0.91368 0.96540 0.25208 0.11995

IPIV: {7, 2, 3, 4, 5, 6}
IPIV: {7, 2, 3, 4, 5, 6, 0, 0, 0, 0, 0, 0}
Everything that will be used to call pdlapiv:
Everything in A:
0.03347 0.32996 0.69064 0.42249 0.20627 0.25013
0.63656 0.86362 0.30166 0.02492 0.36499 0.76538
0.31786 0.13573 0.10675 0.76067 0.08167 0.55096
0.56465 0.74327 0.98176 0.21885 0.45440 0.51865
0.57384 0.55523 0.72855 0.62887 0.79613 0.58372
0.27477 0.82960 0.91368 0.96540 0.25208 0.11995

Everything in B:
0.03347 0.32996 0.69064 0.42249 0.20627 0.25013
0.63656 0.86362 0.30166 0.02492 0.36499 0.76538
0.31786 0.13573 0.10675 0.76067 0.08167 0.55096
0.56465 0.74327 0.98176 0.21885 0.45440 0.51865
0.57384 0.55523 0.72855 0.62887 0.79613 0.58372
0.27477 0.82960 0.91368 0.96540 0.25208 0.11995

where I expect the second output of A will have a different first column.

With this question in mind, I went to check pdlapv2.f and found the code at line 252-287 handles the case Forward and Column (specified in the parameters).

It seems like this code iterates the column block jb from the start to the end; broadcasting it to every process in the row, and everyone then does the same job to switch columns to the corresponding ones.

So in my case, I set the pivot 7, 2, 3, 4, 5, 6 | 1 such that first process has the first 6 columns, and the second process has the last one column.

They work together and then first found 7 when iterating 7,2,3,4,5,6 not in correct place ( line 274 in pdlapv2.f ), and
do a switch 7 <--> 1. Then they found 1 not in correct place when iterating the second block 1, and they do again, which is errornous, switch 1 <--> 7.

This makes the output errornous, and I want to know if this is intended (which means I misunderstood the routine), or this is a bug.

Thanks for any help!

@j7168908jx j7168908jx changed the title Parameter for pdlapiv or pdlapv2 in ScaLAPACK Wrong pdlapiv or pdlapv2 output in ScaLAPACK Mar 8, 2024
@j7168908jx
Copy link
Author

I raised this issue two months ago when I didn't inspect the source code.

Now I have checked the source code and I think it is a bug inside ScaLAPCK. I am happy to provide any assistance needed, please help!

@j7168908jx
Copy link
Author

With further inspection and I finally get the idea of how ipiv actually is.
From the discussion in https://icl.utk.edu/lapack-forum/viewtopic.php%3Ff=2&t=1747.html#p4904

In LAPACK there are two sorts of index arrays: IPIV (eg DGETRF) and JPVT (eg DGEQPF).
IPIV is a sequence of swaps and JPVT is a column permutation.

And I also find the correct routine, defined in the driver p*qrdriver.f. But why is this routine defined there rather than being a unique file in the SRC folder? There is no reference to this routine in the user's guide as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant