Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast coercion date and time #417

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

ermolaev
Copy link

@ermolaev ermolaev commented Feb 2, 2021

I benchmarked dry-types and activemodel and noticed that dry-types has low perfomance, and I dip inside code rails

https://github.com/rails/rails/blob/main/activemodel/lib/active_model/type/helpers/time_value.rb#L72
https://github.com/rails/rails/blob/main/activemodel/lib/active_model/type/date.rb#L30

rails use regexp for frequent case

@ermolaev ermolaev requested a review from solnic as a code owner February 2, 2021 22:06
#
# Comparison:
# DRY 2021-02-03T00:10:54.597+03:00: 157549.5 i/s
# AM 2021-02-03T00:10:54.597+03:00: 40502.8 i/s - 3.89x (± 0.00) slower
Copy link
Author

@ermolaev ermolaev Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its confused me, and I found strange code in rails and created issue
rails/rails#41316

Copy link
Member

@solnic solnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to work on this. Performance improvements are always welcome! In my experience, regexps are typically very slow in Ruby but let's see where this will go 😄 I didn't know Date.parse is so slow btw.

ISO_DATE = /\A(\d{4})-(\d\d)-(\d\d)\z/.freeze
def fast_string_to_date(string)
if string =~ ISO_DATE
::Date.new $1.to_i, $2.to_i, $3.to_i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread-safe so it cannot use these nasty regexp globals

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually is. These are pseudo-global

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://ruby-doc.org/core-2.7.2/Regexp.html#class-Regexp-label-Special+global+variables

These global variables are thread-local and method-local variables.

@flash-gordon
Copy link
Member

@ermolaev why isn't it a patch to Ruby?

@flash-gordon
Copy link
Member

I mean it sounds strange we invent "performance-optimized" code and copy it between line libraries... What if there's another, even faster set of regex? :)

@flash-gordon
Copy link
Member

Just what I said lol rails/rails@9ea15f1

@flash-gordon
Copy link
Member

Another thought (sorry for spamming): we should provide a way to fix the format to ISO8601. Mb it should be a different key, e.g. :iso_time and :iso_date. It'll be 💯 faster.

@ermolaev
Copy link
Author

ermolaev commented Feb 3, 2021

why isn't it a patch to Ruby?

ruby has other methods for parsing time in strict formats

Time.iso8601("2011-10-05T22:26:12-04:00")
Time.strptime("2000-10-31", "%Y-%m-%d") 

this methods is fast, but this is a very strict format logic, this methods throw exeptions, and for fallback we need handle it, but exeptions is very slow

Dry::Types["params.date"] Dry::Types["json.date"] apply string in customs format and it more robust code for parsing times

Another thought (sorry for spamming): we should provide a way to fix the format to ISO8601. Mb it should be a different key, e.g. :iso_time and :iso_date. It'll be 100 faster.

I think new keys will be confuse developers, and nobody will use them, also better to encapsulate the parsing details in the library

@flash-gordon
Copy link
Member

this methods is fast, but this is a very strict format logic, this methods throw exeptions, and for fallback we need handle it, but exeptions is very slow

But we don't define Date.parse and Time.parse either. Shouldn't they be improved in Ruby instead? Currently, what we discuss here started as a custom parsing method for parsing time in the mysql adapter (!) in rails (rails/rails@bd087e6). It looks like nobody tried to backport it to Ruby itself.

I think new keys will be confuse developers, and nobody will use them, also better to encapsulate the parsing details in the library

These are strong assumptions and I don't share them. It's true that being faster by default is better but we also need to consider the price (support, bugs, etc). In my opinion, parsing dates without a predefined format is a potential source of errors. Delegating format detection to Time.parse you create an implicit dependency on how it operates. Not only it's slower, but it can also change quite randomly accepting more/fewer formats. This is likely not a problem for params but I'd expect json types to be more strict.

This is what I use personally

      TypeContainer.register(
        'json.iso_time',
        ::Types::Time.constrained(
          gteq: Time.new(2014, 1, 1),
          lteq: Time.new(2038, 1, 1)
        ).constructor { |value, &failed|
          Try[::ArgumentError] { ::Time.iso8601(value.sub(' ', 'T')) }.value_or {
            failed.(value)
          }
        }
      )

I'm not against the change (though I'm not sure about the implementation) but I think we should try adding it to Ruby first.

Side note: is there a safe strptime in Ruby?

@flash-gordon
Copy link
Member

I looked it up, seems like using Date._strptime or Date._iso8601 would be faster and cleaner. I checked, they are even defined in JRuby. WDYT?

@ermolaev
Copy link
Author

ermolaev commented Feb 3, 2021

I mean it sounds strange we invent "performance-optimized" code and copy it between line libraries... What if there's another, even faster set of regex? :)

Ruby has regexp for Time.iso8601 https://github.com/ruby/ruby/blob/v3_0_0/lib/time.rb#L614 😱

I looked it up, seems like using Date._strptime or Date._iso8601 would be faster and cleaner. I checked, they are even defined in JRuby. WDYT?

Date benchmark

ISO_DATE = /\A(\d{4})-(\d\d)-(\d\d)\z/
string = '2011-12-03'
Benchmark.ips do |x|
  x.report('parse') do |n|
    while n > 0
      Date.parse(string)
      n-=1
    end
  end

  x.report('_parse') do |n|
    while n > 0
      ::Date.new(*::Date._parse(string, false).values_at(:year, :mon, :mday))
      n-=1
    end
  end

  x.report('_iso8601') do |n|
    while n > 0
      ::Date.new(*::Date._iso8601(string).values_at(:year, :mon, :mday))
      n-=1
    end
  end

  x.report('_strptime') do |n|
    while n > 0
      ::Date.new(*Date._strptime(string, '%Y-%m-%d').values_at(:year, :mon, :mday))
      n-=1
    end
  end

  x.report('_strptime.values') do |n|
    while n > 0
      ::Date.new(*Date._strptime(string, '%Y-%m-%d').values)
      n-=1
    end
  end

  x.report('regexp') do |n|
    while n > 0
      if string =~ ISO_DATE
        ::Date.new($1.to_i, $2.to_i, $3.to_i)
      end
      n-=1
    end
  end

  x.compare!
end

# Comparison:
#     _strptime.values:  1282303.3 i/s
#            _strptime:  1194136.2 i/s - same-ish: difference falls within error
#               regexp:   997567.6 i/s - 1.29x  (± 0.00) slower
#             _iso8601:   529221.1 i/s - 2.42x  (± 0.00) slower
#                parse:   282310.3 i/s - 4.54x  (± 0.00) slower
#               _parse:   270369.9 i/s - 4.74x  (± 0.00) slower

Time benchmark

ISO_DATETIME = /
        \A
        (\d{4})-(\d\d)-(\d\d)(?:T|\s)            # 2020-06-20T
        (\d\d):(\d\d):(\d\d)(?:\.(\d{1,6})\d*)?  # 10:20:30.123456
        (?:(Z(?=\z)|[+-]\d\d)(?::?(\d\d))?)?     # +09:00
        \z
      /x.freeze
def fast_string_to_time(string)
  return unless ISO_DATETIME =~ string

  usec = $7.to_i
  usec_len = $7&.length
  if usec_len&.< 6
    usec *= 10**(6 - usec_len)
  end

  if $8
    offset = $8 == "Z" ? 0 : $8.to_i * 3600 + $9.to_i * 60
  end

  ::Time.local($1.to_i, $2.to_i, $3.to_i, $4.to_i, $5.to_i, $6.to_i, usec, offset)
end

string = '2021-02-04T01:48:47.551+03:00'

Benchmark.ips do |x|
  x.report('parse') do |n|
    while n > 0
      ::Time.parse(string)
      n-=1
    end
  end

  x.report('iso8601') do |n|
    while n > 0
      ::Time.iso8601(string)
      n-=1
    end
  end

  x.report('_parse') do |n|
    while n > 0
      ::Time.new(*::Date._parse(string, false).values_at(:year, :mon, :mday, :hour, :min, :sec, :sec_fraction))
      n-=1
    end
  end

  x.report('_iso8601') do |n|
    while n > 0
      ::Time.new(*::Date._iso8601(string).values_at(:year, :mon, :mday, :hour, :min, :sec, :sec_fraction))
      n-=1
    end
  end

  x.report('_strptime') do |n|
    while n > 0
      ::Time.new(*Date._strptime(string, '%Y-%m-%dT%H:%M:%S.%N%z').values_at(:year, :mon, :mday, :hour, :min, :sec, :sec_fraction))
      n-=1
    end
  end

  x.report('regexp') do |n|
    while n > 0
      fast_string_to_time(string)
      n-=1
    end
  end

  x.compare!
end

# Comparison:
#               regexp:   201472.4 i/s
#            _strptime:   159658.5 i/s - 1.26x  (± 0.00) slower
#             _iso8601:   140898.1 i/s - 1.43x  (± 0.00) slower
#              iso8601:   138998.2 i/s - 1.45x  (± 0.00) slower
#               _parse:    66379.7 i/s - 3.04x  (± 0.00) slower
#                parse:    57152.7 i/s - 3.53x  (± 0.00) slower

@flash-gordon
Copy link
Member

Thanks for this! A minor heads up. I benchmarked various options and looked into the ruby sources. It turns out that ruby itself relies on regular expressions internally. My benchmarking showed approx 20% difference between ._sptrptime and regexes though ._strptime is more accurate (doesn't parse invalid dates while regexes aren't so strict). I'm not yet sure which way is better, I'll get back to it sometime soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants