Unique values in a 1-D array and their locations #670

Beliavsky · 2022-07-30T14:57:34Z

Given a 1-D array of character variables, integers, or a user-defined type such as dates, if there are repeated values one may want to create factors (the R term) that are integer variables corresponding to the unique values. Below is a small code that does this inefficiently for an array of character variables. A library subroutine that works for any data type for which an equality operator is defined would be useful.

module factors_mod
implicit none
private
public :: compress
contains
pure subroutine compress(words,factors,values)
character (len=*)         , intent(in)               :: words(:)   ! (n) 
integer                   , intent(out)              :: factors(:) ! (n) integer values corresponding to words(:)
character (len=len(words)), allocatable, intent(out) :: values(:)  ! unique values of words(:)
integer :: i,n,nfac,imatch
n = size(words)
if (size(factors) /= n) error stop "in compress, need size(words) == size(factors)"
if (n < 1) then
   allocate (values(0))
   return
end if
allocate (values(n))
values(1) = words(1)
factors(1) = 1
nfac = 1
do i=2,n
   imatch = findloc(values(:nfac),words(i),dim=1)
   if (imatch == 0) then
      nfac = nfac + 1
      factors(i) = nfac
      values(nfac) = words(i)
   else
      factors(i) = imatch
   end if
end do
values = values(:nfac)
end subroutine compress
end module factors_mod

program xfactors
use factors_mod, only: compress
implicit none
integer, parameter :: n = 5, nlen = 1
character (len=nlen) :: words(n)
character (len=nlen), allocatable :: values(:)
integer :: factors(n)
words = ["a","c","a","b","c"]
call compress(words,factors,values)
print "(a,*(1x,a))", "data:",words
print "(a,*(1x,i0))", "factors:",factors
print "(a,*(1x,a))", "values:",values
end program xfactors
! output:
! data: a c a b c
! factors: 1 2 1 3 2
! values: a c b

The text was updated successfully, but these errors were encountered:

arjenmarkus · 2022-08-04T12:18:21Z

I wonder why you say that this code is inefficient. I tried to think of different implementations, but the code as written seems to be O(N**2), with sorted data you might get down to O(N*ln(N)), but that would presume the data can be sorted (in that case, you would need more functionality than just an equality operation). Just wondering if you have a particular algorithm in mind. Op za 30 jul. 2022 om 16:57 schreef Beliavsky ***@***.***>:

…

Given a 1-D array of character variables, integers, or a user-defined type such as dates, if there are repeated values one may want to create factors <https://www.stat.berkeley.edu/~spector/s133/factors.html> (the R term) that are integer variables corresponding to the unique values. Below is a small code that does this inefficiently for an array of characters. A library subroutine that works for any data type for which an equality operator is defined would be useful. module factors_mod implicit none private public :: compress contains pure subroutine compress(words,factors,values) character (len=*) , intent(in) :: words(:) ! (n) integer , intent(out) :: factors(:) ! (n) integer values corresponding to words(:) character (len=len(words)), allocatable, intent(out) :: values(:) ! unique values of words(:) integer :: i,n,nfac,imatch n = size(words) if (size(factors) /= n) error stop "in compress, need size(words) == size(factors)" if (n < 1) then allocate (values(0)) return end if allocate (values(n)) values(1) = words(1) factors(1) = 1 nfac = 1 do i=2,n imatch = findloc(values(:nfac),words(i),dim=1) if (imatch == 0) then nfac = nfac + 1 factors(i) = nfac values(nfac) = words(i) else factors(i) = imatch end if end do values = values(:nfac) end subroutine compress end module factors_mod program xfactors use factors_mod, only: compress implicit none integer, parameter :: n = 5, nlen = 1 character (len=nlen) :: words(n) character (len=nlen), allocatable :: values(:) integer :: factors(n) words = ["a","c","a","b","c"] call compress(words,factors,values) print "(a,*(1x,a))", "data:",words print "(a,*(1x,i0))", "factors:",factors print "(a,*(1x,a))", "values:",values end program xfactors ! output: ! data: a c a b c ! factors: 1 2 1 3 2 ! values: a c b — Reply to this email directly, view it on GitHub <#670>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAN6YRY7BRKCUS5X7S4HYZLVWU7GTANCNFSM55DLAWRQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

jvdp1 · 2022-08-04T20:17:51Z

Thank you @Beliavsky for this proposition.

A library subroutine that works for any data type for which an equality operator is defined would be useful.

This can be easiliy done with fypp (see for example stdlib_sorting_sort.fypp.
If you submit a PR with a first draft of this procedure, we can help you with fypp if needed.

My current main questions concern the module (name) in which such a procedure should go, and the API of this procedure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unique values in a 1-D array and their locations #670

Unique values in a 1-D array and their locations #670

Beliavsky commented Jul 30, 2022 •

edited

arjenmarkus commented Aug 4, 2022 via email

jvdp1 commented Aug 4, 2022

Unique values in a 1-D array and their locations #670

Unique values in a 1-D array and their locations #670

Comments

Beliavsky commented Jul 30, 2022 • edited

arjenmarkus commented Aug 4, 2022 via email

jvdp1 commented Aug 4, 2022

Beliavsky commented Jul 30, 2022 •

edited