Skip to content
This repository has been archived by the owner on May 14, 2024. It is now read-only.

Support unicode DN #957

Open
pyhedgehog opened this issue Nov 23, 2023 · 8 comments
Open

Support unicode DN #957

pyhedgehog opened this issue Nov 23, 2023 · 8 comments

Comments

@pyhedgehog
Copy link

pyhedgehog commented Nov 23, 2023

Actual result

> require('ldapjs').parseDN('cn=проба,ou=пера,dc=example,dc=com').toString()
'cn=\\d0\\bf\\d1\\80\\d0\\be\\d0\\b1\\d0\\b0,ou=\\d0\\bf\\d0\\b5\\d1\\80\\d0\\b0,dc=example,dc=com'

Expected result

> require('ldapjs').parseDN('cn=проба,ou=пера,dc=example,dc=com').toString()
'cn=проба,ou=пера,dc=example,dc=com'

In wild life

At least microsoft AD supports (and returns) utf8 DNs.

@jsumners
Copy link
Member

The string is correct. The spec only allows ASCII characters. Any other characters must be escaped. See #860 (comment).

@pyhedgehog
Copy link
Author

Sorry for opening this issue without founding this comment...
But then I've returned from ideal RFC world to real AD life.
I've searched for some DN with non-ascii chars both in first and second RDN and done something like:

base1='cn=проба,ou=пера,dc=example,dc=com'
base2=String(ldapjs.parseDN(base1))
base3='ou=пера,dc=example,dc=com'
base4=String(ldapjs.parseDN(base3))
// check1
ldap.search(base1,{scope:'base'},cb)
// check2
ldap.search(base2,{scope:'base'},cb)
// check3
ldap.search(base3,{scope:'sub'},cb)
// check4
ldap.search(base4,{scope:'sub'},cb)

All searches got Uncaught LDAPError [NoSuchObjectError]: No Such Object error.
Then I've returned to shell:

# ldapsearch $ldap_auth_args -b 'cn=проба,ou=пера,dc=example,dc=com' -s base -LLL '' dn
dn:: Y2490L/RgNC+0LHQsCxvdT3Qv9C10YDQsCxkYz1leGFtcGxlLGRjPWNvbQ==
# ldapsearch $ldap_auth_args -b 'cn=\d0\bf\d1\80\d0\be\d0\b1\d0\b0,ou=\d0\bf\d0\b5\d1\80\d0\b0,dc=example,dc=com' -s base -LLL '' dn
No such object (32)
Matched DN: dc=example,dc=com
Additional information: 0000208D: NameErr: DSID-0310028D, problem 2001 (NO_OBJECT), data 0, best match of:
        'dc=example,dc=com'

And same for OU/sub search — raw unicode argument works, but "escaped" version doesn't. In ldapjs both formats fails.

PS: Obviously I've replaced real DNs in examples, but sense is kept intact.
PPS: How to get from returned "escaped" format of DN "raw" one?

@jsumners
Copy link
Member

What version of ldapjs and @ldapjs/filter are you using?

@pyhedgehog
Copy link
Author

$ npm ls --depth=2|grep ldap
@...
└─┬ [email protected]
  ├─┬ @types/[email protected]
  └─┬ [email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]
    ├── @ldapjs/[email protected]

@ncphins
Copy link

ncphins commented Jan 8, 2024

This is the same error I have been seeing for quite some time. We have not moved to 3.x because of it. It does not reproduce with the openLDAP container. But it does with 2 production MSAD servers.

I am trying to get some logs from the servers. I have a theory, but I need logs to be able to support it.

@ncphins
Copy link

ncphins commented Jan 9, 2024

Here is what I am seeing. I cannot reproduce this with the openldap server. But let me see if I can explain what I am seeing to see if it helps. I really want to see if I can provide some assistance.

I have 2 production MS AD Servers. Using ldapsearch, both servers behave the same when searching for a DN.
WORKS: ldapsearch -x -b "CN=Joe Glaßer,OU=users,OU=Germany,DC=FOO,DC=com" -H ldap://SERVER.COM:389 -D "CN=LDAP-Read,DC=FOO,DC=com" -W -s base "(objectclass=*)" dn cn
DOES NOT WORK: ldapsearch -x -b "CN=Joe Gla\C3\9Fer,OU=users,OU=Germany,DC=FOO,DC=com" -H ldap://SERVER.COM:389 -D "CN=LDAP-Read,DC=FOO,DC=com" -W -s base "(objectclass=*)" dn cn

The second gives Object Not Found.

However, both of those commands work fine with the OpenLDAP server used for the unit tests. I have that running in a docker container and have the same user configured. I can search either with the escaped version and without the escaped version. One interesting thing is the logs for the container seem to show non-escaped for both but I am skeptical of the log formatting.

659da634 conn=1009 op=2 SRCH base="cn=Joe Glaßer,dc=adserver,dc=com" scope=0 deref=0 filter="(&(objectClass=*))"
659da634 conn=1009 op=2 SRCH attr=dn cn
659da634 conn=1009 op=2 SEARCH RESULT tag=101 err=0 nentries=1 text=

I did some tests on the DN itself.

This code will dump the parsed DN, then do thorough the DN popping the RDNs and dumping them using toString.

const parsedDN = parseDN('cn=Joe Glaßer,dc=adserver,dc=com');
console.log(parsedDN.toString());
let rdn = parsedDN.pop();
 while (rdn) {
     console.log(rdn.toString());
     rdn = parsedDN.pop();
 }

The output looks like:

cn=Joe Gla\c3\9fer,dc=adserver,dc=com
dc=com
dc=adserver
cn=Joe Gla\c3\9fer

However, if I add the {unescaped: true} option to the console.log(rdn.toString({unescaped: true})) then I get the "expected" output.

cn=Joe Gla\c3\9fer,dc=adserver,dc=com
dc=com
dc=adserver
cn=Joe Glaßer

When I look at the search code, the baseDN that is passed in is converted to a DN if a string is passed in (via the parseDN method) or just set if it is a DN object already.

My theory is that when the search is formatted to be sent to the server, a call is made on the DN using toString. And that results in the escaped value being sent to the server. That is fine for the test OpenLDAP server (as far as my testing goes) but that fails for both of my production MS AD servers as they do not like the escaped value. There is no {unescaped: true} option for the DN.toString() but even if here were, there would be no way to tell the search code to use it. But changing the DN.toString method to use the {unescaped: true} option when concatenating the RDNs makes it work great for both my servers and maintains compatibility with the OpenLDAP test container as well.

I wish I could generate a test that fails, but I cannot using the container.

@jsumners
Copy link
Member

jsumners commented Jan 9, 2024

Thank you for the detailed diagnosis. It is appreciated. I am convinced this is the same problem under discussion in ldapjs/filter#9 (comment). I need to find some time to re-learn a lot of this stuff and determine how to properly solve it. Basically, we need to replace \c3 with the byte 0xc3 but leave \\c3 alone when building the BER instance.

@Foorack
Copy link

Foorack commented Jan 24, 2024

Was just bitten by this as well. 🙂 For example: CN=Faxälv\, Max,OU=Users,DC=example,DC=com is impossible for ldapjs to fetch, as it encodes "ä" into byte sequence which Microsoft AD refuses to understand.

NOT RECOMMENDED/NOT A FIX: Replacing all contents of ./node_modules/@ldapjs/dn/lib/utils/escape-value.js to the following debug-contents to """properly""" (against spec, but Microsoft-compatible) handle Unicode; A.K.A escape those characters in the embeddedReservedChars list but leave everything else intact, temporarily solves the issue.

'use strict'
/// DO NOT USE IN PRODUCTION - NOT TESTED - PURELY FOR DEBUG
/// This file is for patching the ldapjs library at the following location:
/// @ldapjs/dn/lib/utils/escape-value.js
/// in order to """properly""" (against spec) handle Unicode. This is against RFC 4514,
/// but is required for Microsoft Active Directory to respond to queries with Unicode characters.

module.exports = function escapeValue(value) {
  if (typeof value !== 'string') {
    throw Error('value must be a string')
  }

  // 1. Split up input into individual UTF-8 characters.
  const toEscape = value.split('');
  // 2. Loop over all forbidden characters, and escape them.
  const embeddedReservedChars = [
      '"',
      '+',
      ',',
      ';',
      '<',
      '>'
  ];
  const escaped = [];
  for (let i = 0; i < toEscape.length; i++) {
    const char = toEscape[i];
    if (embeddedReservedChars.includes(char)) {
      escaped.push('\\' + char.charCodeAt(0).toString(16).padStart(2, '0'));
    } else {
      escaped.push(char);
    }
  }
  return escaped.join('');
}

Looking in Wireshark the query now looks like this:
image
and with this it successfully gives me a result. 🥳

Comparing this to the old/broken query:
image

I do not know what the "fix" for this is; I just wanted to share my findings/experience.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants