Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add basic NFC support #114

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

[WIP] Add basic NFC support #114

wants to merge 7 commits into from

Conversation

Crote
Copy link
Contributor

@Crote Crote commented Jun 27, 2020

Recently I bought a fancy ACR122U NFC reader because I got tired of plugging in my Yubikey all the time.
To my great surprise, Firefox didn't support Webauthn via NFC yet.
So I took it upon myself to try implementing this myself as #51 isn't really showing a lot of progress 😉 .

About the implementation:
The old library was tightly coupled to wrapping all APDU command in a HID layer. This has been reworked in a backend-agnostic variant, making it possible to add new communication methods.
The NFC implementation is done using pcsc-rust, which should provide bindings on Windows, Linux, and MacOS.
To use NFC, use --features nfc. This will also disable USB support for now.
This is my first time writing Rust, so there's probably a lot of improvement to be made style-wise.

Work remaining:

  • The implementation currently provides an interface similar to the OS-agnostic one, as StateMachine only allows for a single Transaction backend at the moment. This means that enabling NFC currently disables USB, so StateMachine should be reworked.
  • The backend-agnostic interface has only been implemented for Linux at the moment. The other platforms must also be converted. This should be relatively easy, but I don't have access to those platforms so I left that for a later point in time.
  • This breaks a lot of tests. I'm deferring a rewrite of them as the current code structure is just a first draft.

Open issues:

  • It seems that there is no way to easily distinguish between a USB-attached smartcard and an NFC reader. This means that we will also try to connect to a USB-attached U2F device as if it is attached via NFC. This will, of course, not work. It seems to be harmless, but perhaps it could interfere with the proper HID implementation, so we should probably try to prevent this from happening.

@Crote Crote marked this pull request as draft June 27, 2020 14:19
@jcjones
Copy link
Contributor

jcjones commented Jun 29, 2020

Thank you for this, @Crote! I'm happy to start reviewing this for inclusion into Firefox when you think it ready. I'm expecting to start work in earnest on authenticator-rs improvements here in late July myself, for reference.

@jcjones
Copy link
Contributor

jcjones commented Jul 27, 2020

I'm doing a re-layout right now, and probably going to end up splitting out platform code into code for USB HID vs other transports. I can certainly take on helping to rebase this patch if you need the assistance, but apologies in advance!

@jcjones jcjones changed the base branch from master to main July 27, 2020 21:23
@Crote
Copy link
Contributor Author

Crote commented Aug 2, 2020

Actually, that's the very reason why it stalled: I made an attempt to support multiple transports at the same time, but it quickly turned into a complete rewrite of basically the whole architecture. As a novice Rust programmer, that turned out to be way above my pay grade.

I'll look into it some more in the coming week!

@jcjones
Copy link
Contributor

jcjones commented Aug 2, 2020

Probably won't be ready this week. Hopefully next week though! :)

@jcjones
Copy link
Contributor

jcjones commented Aug 5, 2020

#123 is the sketch of how I think this is going to work. Note that I also want to rework the directory hierarchy (#124), but I don't want to do that until my set of re-arch changes in #121, #122, #123 are merged.

@jcjones
Copy link
Contributor

jcjones commented Aug 19, 2020

I've merged the auth_service branch, so if you'd like to look at what this might look like as its own subfolder, feel free! I still want to do the mass rename of the u2f code, which will necessarily move the HID stuff away from just src, but for now you could just assume it's there and we refactor accordingly later.

@Crote
Copy link
Contributor Author

Crote commented Aug 26, 2020

There are still a few issues left to be worked out, but most of the big stuff is done.
The crucial part for now is the refactoring of the code which previously was USB-only. I've tried to avoid altering the existing code too much, but some pretty dramatic changes were required to make some of that code reusable.

Could you give your thoughts on the overall structure so far? I've tried to separate the refactoring into logical commits for easy reviewing.

@jcjones
Copy link
Contributor

jcjones commented Aug 26, 2020

I will review today! Thanks!

@jcjones jcjones self-requested a review August 26, 2020 16:48
Copy link
Contributor

@jcjones jcjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really quite pleased with how this looks within this framework. I feel like I now really don't know how to move/rename things to make more sense, but that's not your/this PR's problem. :)

@jcjones
Copy link
Contributor

jcjones commented Aug 28, 2020

(oops, hit enter too soon)

I haven't done the nitty-gritty review here, but I'm pleased with the refactoring around the device info trait, and the rework of how the state machines start and the APDU movement. I just want to run through those with the proverbial fine comb.

Also I am not sure what the best testing strategy for this will be. Do you know of any virtual NFC devices we could fake pcsc to use? Or are we similarly stuck there like we are with USB?

@Crote
Copy link
Contributor Author

Crote commented Aug 29, 2020

Yeah, it's turning into a bit of a mess. The term "U2F" is starting to lose all meaning to me.
It feels like there's a lot of cleanup possible with manager.rs / monitor.rs / transaction.rs, but I haven't looked too deep into it.

There's still an issue in the way the NFCManager works though, as it seems that the old transaction is never actually cancelled. There seems to be some sort of race condition there or something.

I'm not that familiar with testing this. We wouldn't even need a virtual NFC device, any virtual smart card reader would be fine. pcsc-lite has a pscs-spy tool allowing you view the communication with the reader, but that would still require something to actually talk to. Then again, my entire knowledge of this stuff comes from doing research to make this PR in the first place, so perhaps I've just missed it.

@Crote
Copy link
Contributor Author

Crote commented Oct 24, 2020

@jcjones I've been working on it again after some time off.

Basically, the first five commits are done and should be merge-able.


I've got NFC just about working, but I'm running into some bigger architectural issues. Mainly, the way the callbacks are structured.

Right now, there's a status callback, and the StateCallback. The latter is quite finicky to work with due to the whole observer stuff. The observer only fires on the original, so any clone won't result in a transport cancel. Besides not being documented at all, it doesn't make sense to me behaviour-wise.

Ideally, I'd say there are three kinds of callbacks:

  1. A success. The obvious one, should result in a cancel on all transports. This should only happen once, of course.
  2. Initiation errors. When a request is malformed, or no transport is available. This should result in a termination and no further callbacks will happen.
  3. Timeouts. They are currently considered fatal errors. For example, the Linux USB backend returns a U2FTokenError when this happens. This should just be a, well, timeout. Nothing more than a "you didn't get a response in time, so you'll never get one" and definitely not an error.
  4. Temporary errors. This is the challenging one. What if a device returns a corrupt message? Or initiating a single transport results in an error? Or some other weird transient error because, well, we're dealing with hardware. This should just result in a warning. Sure, they're errors, but they might just go away if the user tries again. Currently they are basically considered fatal errors, yet they are anything but. Ideally you'd want to just handle them as some kind of warning status message, but the current infrastructure makes that quite hard.

So basically, the current division between status- and state-callback make it impossible to properly handle the type of events which actually happen, and they make the proper cancelling of transports extremely difficult.

What are your thoughts on this? Should I rework it to use a single callback and overhaul all the error types? Is there something I'm missing?

@jcjones
Copy link
Contributor

jcjones commented Oct 26, 2020

Thank you for continuing to work on this! I'm taking holiday this week, but I didn't want to leave this with no reply, so here's a shorter-ish reply (I've been thinking this through all weekend).

I think you're right that we need multiple callbacks. I'm not totally clear on your suggestion of what the third one to add would be, though.

We currently have a "completion" callback that indicates success, timeout, or critical error. We also have a status callback that indicates devices coming and going. I think I understand you're suggesting adding a third callback for transient errors that don't halt.

If I'm correct in that, it seems like that could be just another condition type added to the status callback, at least from the Rust side, since the callback enum objects can be made to contain all the necesssary data. I agree, however, that the C API would require a separate callback.

Assuming the above to all be correct, I'm fine with either plumbing a third channel as a non-critical error message channel, or adding to the existing status channel type.

I was pondering this exact situation with the CTAP2 PIN protocol, actually, and imagining adding to the status channel, something like:

pub enum StatusUpdate {
    DeviceAvailable { dev_info: u2ftypes::U2FDeviceInfo },
    DeviceUnavailable { dev_info: u2ftypes::U2FDeviceInfo },
    Success { dev_info: u2ftypes::U2FDeviceInfo },
+  Condition { dev_info: u2ftypes::U2FDeviceInfo, details: TemporaryCondition },
}

+ pub trait TemporaryCondition {
+    // ... stuff
+ }

I think that would end up being easier than overhauling the error types, but I haven't figured out what traits such an object would have. Then again, maybe it's better to just have more StatusUpdate types, like one for TransientError and one for PINInvalid and one for PINUnset and so on.

Gotta run to do holiday things... thank you again @Crote, I am really looking forward to merging this into Firefox!

@Crote
Copy link
Contributor Author

Crote commented Oct 26, 2020

That's not quite what I meant. The main issue I ran into, is that there are multiple callbacks to begin with.
Instead, I'd suggest merging them into a single callback (at least on the Rust side), with a more comprehensive callback object.

Something like this makes the most sense to me:

enum StatusEvent {
  DeviceAvailable { ... },    // New device found
  DeviceUnavailable { ... },  // Connection to old device lost
}

enum ResultEvent {
  Success { ... },            // Everything went well!
  Timeout,                    // Too much time elapsed, so we aborted
  Unable,                     // Turns out we have no viable transports,
                              // so a successful result is impossible
}

enum ErrorEvent {
  // All but the last case could be transient
  // The user should probably be prompted to
  // retry the device which caused the error
  U2FError { U2FTokenError }, // We successfully talked to a device using U2F,
                              // but we used the wrong command in this state or something
  ProtocolError { ... },      // We tried talking to a device, but we got garbage we can't decode
  CommunicationError { ... }, // The OS gave us a device, but we got an error trying to use it.
                              // Maybe a write to a file descriptor failed or something? 
  TransportError { ... },     // We tried using a transport, but it gave an error.
                              // Maybe we got a permissions error trying to enumerate devices or something?
                              // In any case, this transport is unusable now.
}

enum AuthEvent {
  Update { StatusEvent },     // Something changed, but we can safely ignore this
  Warning { ErrorEvent },     // Something went wrong, we should probably tell the user
  Finished { ResultEvent },   // We're done, this is the last callback which will happen
}

And probably a kind of internal error somewhere, but that's not that important right now.

The callback would simply accept an AuthEvent, and pattern matching it would be trivial: the AuthenticatorService can simply cancel all transports on a Finished, all StatusEvents are probably only logged for debugging, ErrorEvents are displayed to the user to tell them to try again, the ResultEvent is a natural result from the whole process - and is the only thing sent into the channel as is currently done in main.rs.

As to the interface on the C side, I don't know enough about it to form a meaningful opinion. But combining two C-side callbacks into a single Rust-side callback should be trivial, right?

My main intention is to have a single, clear, and unambiguous way to tell the caller what is happening. I'm not sure if this approach is the right one, but something in this direction is probably needed.

Enjoy your holiday!

@jcjones
Copy link
Contributor

jcjones commented Oct 26, 2020

Oh, I like that very much. I'll leave more for later, but I would happily handle that patch.

Now to your earlier point - you said the first 5 commits of 7 here are ready for review? I can at least start taking a look in evenings this week...

@Crote
Copy link
Contributor Author

Crote commented Oct 27, 2020

Yeah, the first five are basically just refactoring to make it possible to sanely implement an NFC transport. This branch should pass all tests and linters after every single commit and I've tried to make them all self-containing, so the first five should be completely safe to merge already.

Some parts (especially the fifth commit) do make clear that quite a lot of the existing code should probably be cleaned up afterwards, but I wanted to avoid doing basically a complete rewrite without knowing for sure that this is the right approach to use.

The names used could probably really use a second pair of eyes, as I seem to be horrible at naming things. But there's no need to hurry. Why not just read a book in your evenings or something? It's your holiday, after all!

Copy link
Contributor

@jcjones jcjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As requested, this is the first five commits. I like what I see, and but for the nits here, would merge the first five now. I'll review the next commits next, tomorrow (Friday).

}

// Size of header + data + 2 zero bytes for maximum return size.
let mut bytes = vec![0u8; U2FAPDUHEADER_SIZE + data.len() + 2];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe construct this the same as you do size down in serialize_short?


let mut size = 5; // class, ins, p1, p2, response size field
if !data.is_empty() {
size += 1 + data.len(); // data size field and data itself
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a bit more description, because it appears that the data-size field is bytes[4], accounted for in the initial value of 5. This is actually because of the overflow zeroes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already correct: when there is no command data payload (Ne=0), you omit the command length bytes (Lc).

@@ -66,7 +66,84 @@ impl StateMachine {
Default::default()
}

pub fn register(
pub fn register<T>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'm not a big fan of the _register / register<T> naming convention, but it's a fast rename I think.

My preference would be that the generics-using register/sign, being static methods that take the exact device to work upon, be named register_dev and sign_dev, so that usage is like
StateMachine::register_dev(dev, flags..)

Whereas the instance methods remain as just register/sign so they go back to usage of sm.register(flags..)

What do you think?

return Err(io_err("payload length > 2^8"));
}

let mut size = 5; // class, ins, p1, p2, response size field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 4: the response length bytes (Le) are omitted when the desired response length (Ne) is zero.

bytes[1] = ins;
bytes[2] = p1;
// p2 is always 0, at least, for our requirements.
bytes[4] = data.len() as u8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be skipped when data.len() == 0

bytes[2] = p1;
// p2 is always 0, at least, for our requirements.
// lc[0] should always be 0.
bytes[5] = (data.len() >> 8) as u8;
Copy link
Contributor

@micolous micolous Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be skipped when data.len() = 0.

Also, Lc should be entirely skipped when there is no data, not even a prefixed zero byte.

// When sending zero data, the two data length bytes should be omitted.
// Luckily, all later bytes are zero, so we can just truncate.
if data.is_empty() {
bytes.truncate(bytes.len() - 2);
Copy link
Contributor

@micolous micolous Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong - if Nc is zero (no command data length), then it should be omitted.

In extended mode:

  • If Ne is non-zero, then the last three (if Nc=0) or two (if Nc>0) bytes are the Ne (which haven't been included here).

  • If Ne is zero, then there should be no trailing bytes.

At the moment I think you're trying to go for maximum length (65536 bytes), which should have a message ending in 00 00 00.

(Edit: this previously incorrectly stated that Ne is always 3 bytes in extended mode, when it can sometimes be 2... also the U2F spec confusingly is "based" on ISO7816-4 but then is actually incompatible... except some revisions of the spec (not the newest) actually correct this 😢 )

@micolous
Copy link
Contributor

For context on my drive-by commentary above, I've been having a look at smart card support in another FIDO2 library, which depends on this library. That other library has errors in it's ISO 7816 implementation as well, which I'm in the process of fixing up. Given the complexity of the standards, I thought I should have a peek at what's happening here as well.

FYI: some tokens don't actually implement ISO 7816 correctly, which can cause your local tests to "unexpectedly pass" but fail with other tokens. The FIDO2 NFC specs aren't very easy to read either, and they gloss over some important details that you'll only find in the ISO 7816-3/-4 specs.

@riastradh
Copy link

Is anyone actively working on this or planning to work on it in the near, foreseeable future?

The branch currently doesn't merge cleanly because the following changes conflict:

If I merge this branch instead with 660a701, the parent of the commit 47d6920 which introduced the conflicts, the merge is clean and the code compiles, but the example program fails here:

https://github.com/Crote/authenticator-rs/blob/0de730f112e07908b481f38842b1c43682f6724d/src/nfc.rs#L108-L112

This happens because, as implemented, a single NFCManager can only do one request in its lifetime -- there's no mechanism to queue multiple requests the way U2FManager does -- and the example program tries to reuse it both to make a credential and to authenticate with it.

If I insert manager.cancel() between the manager.register(...) logic and the manager.sign(...) logic, I can verify that the code works to make and authenticate credentials with a real fido card, both via contact smartcard and via NFC. (I can also verify that all automatic tests pass.)

But I think that's the wrong approach -- should do it like U2FManager in manager.rs to allow multiple queued operations, and perhaps even factor the runloop and queue management logic out of U2FManager and NFCManager.

Happy to draft a patch to do this, and happy to do a bit more work to resolve the merge conflict, but if the work is already in progress, don't want to step on anyone's toes.

@jschanck
Copy link
Collaborator

I'm not working on this. @msirringhaus are you?

Active development is happening on the ctap2-2021 branch. We're about a month away from being able to cut that over to main. We'll be removing a lot of duplicate / unnecessary code at the same time. So if you're willing to wait, it's going to be a lot easier to work on then.

Feel free to start now if you'd prefer. I can take a closer look at the patch attached to this PR on Monday and let you know what I think needs to be changed.

@riastradh
Copy link

I took a closer look at this branch than just by blindly merging and testing, and I took a look at the ctap2-2021 branch too in order to estimate what work would be involved in bringing NFC support into it.

This nfc branch retrofits the NFC/smartcard support into two different sets of StateMachine methods: register/sign, which takes an existing U2FDevice (renamed APDUDevice) and does the u2f_register/sign operations; and _register/_sign (new names for what register/sign do in main and ctap2-2021), which tries to discover devices using an OS-dependent transaction and calls register/sign.

  • NFCManager is an AuthenticatedTransport that uses a single PCSC context to discover cards on the fly and call the StateMachine's register/sign methods.
  • U2FManager is an AuthenticatedTransport that calls the StateMachine's _register/_sign methods to discover HIDs on the fly and call the StateMachine's register/sign methods.

In other words, the NFCManager inverts the discovery vs operation logic from U2FManager: NFCManager discovers cards first, then uses StateMachine just for operation; U2FManager uses StateMachine to do discovery and operation, as in main and ctap2-2021. But the two kinds of manager do the same operation under the hood -- the same CTAP1 command/response protocol -- just on a different underlying protocol transport (PCSC library calls vs OS HID I/O).

This is a little awkward, and the ctap2-2021 branch looks like it will make this approach a little more difficult because there are two different state machines, one for CTAP1 and one for CTAP2, which would both need to be adapted for the retrofit.

It seems to me it would make more sense if either:

  1. Transaction could enumerate both OS HIDs and PCSC cards, although that might require something like making each OS's DeviceBuildParameters be a disjoint union of OS HIDs and PCSC cards (if PCSC is relevant -- not on macOS, for instance, I think) and dispatching the sum type in each OS's Monitor and Device::new; or
  2. StateMachine were parametrized by different types of transactions, say OSTransaction and PCSCTransaction, which would probably require making Transaction/Device/DeviceBuildParameters as used by StateMachine be traits/parametric rather than structs -- requiring mostly mechanical changes to the OS interfaces and a little surgery to the types in StateMachine and src/transport.

It looks like some renovation might still be under way in the organization of src/transport, judging by all the TODO comments, so maybe I should hold off on trying to take any of this up in the ctap2-2021 branch until that renovation has settled further.

@msirringhaus
Copy link
Contributor

I am not (yet) actively working on this, but it is on my (ever-growing) TODO-list.

This is a little awkward, and the ctap2-2021 branch looks like it will make this approach a little more difficult because there are two different state machines, one for CTAP1 and one for CTAP2, which would both need to be adapted for the retrofit.

That is one of the things that will go away, once we merge over to main. We have currently a lot of code-duplication, because we needed a way to switch back and forth between old and new code in case a critical bug is encountered. Once we are satisfied that the new code works, we'll remove the old completely.

In general, I'd say it will be easiest to
1.) wait for the merge and correspondig code cleanup
2.) abandon this PR completely and just copy&paste some bits from it to a new PR, as many things have changed since this PR was issued.

I would need to experiment a bit with this myself, in order to figure out, how exactly this should be integrated into the existing code.

@dsseng
Copy link

dsseng commented Nov 7, 2024

Is anyone working/planning to work on this or contributions are welcome?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants