
Suggestion - Post library download statistics/analytics numbers & graphs #1078

Open
Jakobud opened this issue Mar 22, 2013 · 35 comments

@Jakobud

Jakobud commented Mar 22, 2013

Does CDNJS keep track of how many times individual files are pulled from its CDN? It would be absolutely awesome if CDNJS did this and then had up-to-date charts and data tables showing how many downloads each file is getting.

This would be epic because it would help developers choose which library version they want their users downloading. The developer would obviously want a version that is compatible with their code, but among those, they would choose the one that has been downloaded the most over the past X amount of time.

For example, let's say the latest jQuery just gets released and put on CDNJS. A couple of days pass and the stats for jQuery look like this for the past week:

jQuery 1.9.1 = 20,000 downloads
jQuery 1.9.0 = 50,000 downloads
jQuery 1.8.3 = 560,000 downloads
jQuery 1.8.2 = 120,000 downloads
etc...

The developer can look at this and know that their visitors are more likely to already have jQuery 1.8.3 cached, as opposed to 1.9.1, since 1.9.1 is new. So as long as their code is 1.8.3-compatible, they would choose that one.

And since these numbers change over time, maybe a month later the developer comes back to CDNJS and sees that the 1.9.1 stats are now higher than 1.8.3's. So again, as long as their code is 1.9.1-compliant, they could safely switch their site to 1.9.1, since their visitors are now more likely to already have it cached.

Does this make sense? To me it would be EXTREMELY useful. The whole point of CDNJS is for developers to share libraries and resources. So over time, as more and more libraries and versions of those libraries are added to CDNJS, a tool like this would be invaluable for developers to make informed decisions based on which libraries and resources are being shared the most.



@ryankirkman
Member

@Jakobud Great suggestion Jake. You're absolutely right that this would be really useful, and it is a popular request: #405

We're brainstorming solutions right now, so we're glad to have you as part of the conversation.

@Lockyc
Contributor

Lockyc commented Mar 23, 2013

Closed old issue #405; continue the conversation here.

@thomasdavis
Member

Tagged as high priority, anyone have any brilliant ideas yet on how to parse a few billion lines?

@Jakobud
Author

Jakobud commented Jun 27, 2013

How many lines is the typical log file? Do you split the log files up to one per day or smaller? Do the log files simply say what http://path/file was downloaded? Or do they reference database row IDs (IDs of each filename, which I assume are stored in a database)?

@ryankirkman
Member

Each edge location (currently 23) is treated independently of every other.

So what we have is one or more log files per edge location per day, and we
are getting a significant number of hits.


@Jakobud
Author

Jakobud commented Jun 28, 2013

If you could post excerpts of the log files, that would be a place to start.

@Jakobud
Author

Jakobud commented Jul 31, 2013

Any progress on this? You guys need any help with it? I know there are probably a lot of huge log files, but I think it would only be a matter of a simple Python script that streamed in the log files and saved the data out to a database or something like that. It would be a long-running process, but it probably wouldn't be that complicated really.
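To make that concrete, here's a minimal sketch of such a script. It assumes the logs are in Common Log Format and that request paths follow the cdnjs `/ajax/libs/<name>/<version>/` layout; the real log format would need to be confirmed first.

```python
import re
from collections import Counter

# Request-line pattern for Common Log Format entries. The
# /ajax/libs/<name>/<version>/ path layout mirrors cdnjs URLs,
# but this is an assumption about what the logs contain.
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')
LIB_PATH = re.compile(r'^/ajax/libs/(?P<lib>[^/]+)/(?P<ver>[^/]+)/')

def count_downloads(lines):
    """Stream log lines and tally hits per (library, version)."""
    counts = Counter()
    for line in lines:
        req = REQUEST.search(line)
        if not req:
            continue
        lib = LIB_PATH.match(req.group('path'))
        if lib:
            counts[(lib.group('lib'), lib.group('ver'))] += 1
    return counts

sample = [
    '1.2.3.4 - - [22/Mar/2013:10:00:00 +0000] "GET /ajax/libs/jquery/1.8.3/jquery.min.js HTTP/1.1" 200 93636',
    '1.2.3.5 - - [22/Mar/2013:10:00:01 +0000] "GET /ajax/libs/jquery/1.9.1/jquery.min.js HTTP/1.1" 200 92629',
    '1.2.3.6 - - [22/Mar/2013:10:00:02 +0000] "GET /ajax/libs/jquery/1.8.3/jquery.min.js HTTP/1.1" 200 93636',
]
stats = count_downloads(sample)
# stats == {('jquery', '1.8.3'): 2, ('jquery', '1.9.1'): 1}
```

Since this streams line by line, memory stays flat no matter how big the log files get; the counter could be flushed to a database periodically.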

@Jakobud
Author

Jakobud commented Aug 7, 2014

FYI, I don't know if cdnjs utilizes AWS services on the backend or not, but this is an interesting article that is potentially very relevant to this issue:

http://aws.amazon.com/blogs/aws/all-your-data-fluentd/

It discusses using software called Fluentd to stream logfile changes into data storage. So for CDNJS, it could stream library access logs into some sort of usage database that could be used to display usage statistics.

@Jakobud
Author

Jakobud commented Aug 7, 2014

Also, FYI, you guys could get someone to help you with a solution for this if you could divulge details about your logging: how it works, where the files are stored, give us access to a day's or week's worth of logs, etc. Someone could figure out a solution for you.

@Jakobud
Author

Jakobud commented Aug 7, 2014

Another suggestion for you guys: just make your logs public. Put them up on AWS S3 or something and allow anyone to grab them. I GUARANTEE someone (probably multiple people) will come up with an analytics solution for you.

@Jakobud
Author

Jakobud commented Dec 11, 2014

Just wanted to reach out regarding this issue again. I'll say it again: provide some example log files and someone somewhere will put together a parser for you that pulls library download stats.

@PeterDaveHello
Contributor

ping @thomasdavis

@IonicaBizau
Contributor

Creating an API service for cdnjs would be nice. Something like:

api.cdnjs.com/lib/jquery/stats

Then we could use this service to fetch the stats on the cdnjs website. 🍀
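As a sketch of what that could look like: the endpoint above doesn't exist yet, so the response below is entirely hypothetical (both the URL and the JSON shape are assumptions), but a consumer could pick the most-cached version from it like this:

```python
import json

# Hypothetical response body from api.cdnjs.com/lib/jquery/stats.
# The endpoint and JSON shape are assumptions, not an existing API;
# the numbers echo the example earlier in this thread.
payload = json.loads("""
{
  "library": "jquery",
  "period": "7d",
  "downloads": {"1.9.1": 20000, "1.9.0": 50000,
                "1.8.3": 560000, "1.8.2": 120000}
}
""")

def most_cached_version(stats):
    """Return the version with the most downloads, i.e. the one
    visitors are most likely to already have cached."""
    return max(stats["downloads"].items(), key=lambda kv: kv[1])[0]

print(most_cached_version(payload))  # 1.8.3
```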

@PeterDaveHello
Contributor

Stats from the website are easy, but people want stats from the CDN. I remember that Cloudflare didn't give us that info or access logs.

cc @thomasdavis @ryankirkman @terinjokes

@ryankirkman
Member

We can get access to the logs, but the log volume is so large that we need to figure out an aggregation strategy.

@davidbau
Contributor

Approximate stats would be nearly as good. If log volume is a problem, logs could be sampled.
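A quick sketch of how sampling could work: keep each log line with probability `rate` and scale the kept tallies by `1/rate`, which gives approximately unbiased totals while only a fraction of lines need full parsing and storage. (The paths here are synthetic; in practice the sampling would happen at the edge, before lines are ever shipped.)

```python
import random
from collections import Counter

def estimate_counts(paths, rate, seed=0):
    """Approximate per-path hit counts by keeping each line with
    probability `rate` and scaling the kept tallies by 1/rate."""
    rng = random.Random(seed)
    kept = Counter(p for p in paths if rng.random() < rate)
    return {path: round(n / rate) for path, n in kept.items()}

# 100,000 synthetic hits: 90% to one version, 10% to another.
hits = ['/jquery/1.8.3'] * 90_000 + ['/jquery/1.9.1'] * 10_000
est = estimate_counts(hits, rate=0.01)
# est should land within a few percent of the true 90,000 / 10,000
# split, even though only ~1,000 lines were kept.
```

The relative error shrinks as volume grows, which is exactly the regime where parsing every line is too expensive.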

@thomasdavis
Member

That is true! Even one day of traffic * 30 would be interesting enough.

@Jakobud
Author

Jakobud commented May 26, 2015

Where are the logs now? Are they accessible in any form? I would think dumping daily logs on some S3 storage would be feasible and then someone could write something that parses them.

@IonicaBizau
Contributor

I would be excited to write a tool to parse the logs! I'm involved in some statistics & visualization projects anyway, so that would be awesome. 🎇

@Jakobud
Author

Jakobud commented May 26, 2015

Like I said before, all CDNJS needs to do is make the logs accessible in some form, and someone will step up to write a cool parser to generate usage stats.

@PeterDaveHello
Contributor

We are working on it now; the IP addresses in the logs are sensitive, so we should be careful.

@fj

fj commented Jul 13, 2015

Any update on this? Throwing my hat in the ring as another person who'd be willing to write a parser.

@PeterDaveHello
Contributor

Hey all, I'm afraid not; there are some more important issues, but we will try our best to have this feature ASAP.

@PeterDaveHello
Contributor

BTW, thanks to everyone who wants to write a parser for us. If you don't mind, you can still contribute to other parts of cdnjs, like the Bower auto-updater or something. Thanks!

@Jakobud
Author

Jakobud commented Nov 19, 2015

Any more updates on this one? It's been over 2 1/2 years. Have you guys just considered making your logs publicly accessible in some form?

Help us help you!

@PeterDaveHello
Contributor

ping @thomasdavis @ryankirkman @terinjokes @drewfreyling ...

@Jakobud
Author

Jakobud commented Nov 19, 2015

Hey, so I know that back on #405 the issue was money. The logs are in Common Log Format; however, pulling down the logs for 5 million hits is $300 per day or something like that. (2 1/2 years later, you guys probably get WAY more than 5 million hits a day.)

So the solution thrown out there was to set up a parser on an EC2 instance. This would be the best solution. As long as your EC2 instance is in the same region as your S3 bucket, there is no cost to transfer your log files from S3 to the EC2 instance.

So essentially, the solution would be a daily task along these lines:

  1. EC2 instance starts up
  2. Script pulls logs for last 24 hours from S3 container
  3. Script parses logs
  4. Script deletes local log
  5. Script dumps the data in whatever form you want into some database somewhere
  6. Script terminates EC2 instance

So the cost would be absolutely minimal. You would only pay for the time the instance is active. Scheduling an EC2 instance to start every 24 hours shouldn't be too hard, and I'm pretty sure you can terminate an EC2 instance programmatically from within itself.

Just a thought. It honestly wouldn't be too terribly difficult to figure out...
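Steps 3-5 of the daily task above could be as simple as aggregating into a table keyed by day. Here's a sketch using SQLite as a stand-in for "some database somewhere"; the schema is made up, and the S3 download and instance start/stop steps are omitted since they depend on the actual AWS setup.

```python
import sqlite3
from collections import Counter

def store_daily_counts(con, day, hits):
    """Steps 3-5 of the daily task: aggregate one day of parsed hits
    and upsert them into a downloads table. `hits` is an iterable of
    (library, version) tuples pulled out of the day's logs; the raw
    log can be deleted once this commits."""
    counts = Counter(hits)
    con.execute("""CREATE TABLE IF NOT EXISTS downloads (
                       day TEXT, library TEXT, version TEXT, hits INTEGER,
                       PRIMARY KEY (day, library, version))""")
    con.executemany(
        "INSERT OR REPLACE INTO downloads VALUES (?, ?, ?, ?)",
        [(day, lib, ver, n) for (lib, ver), n in counts.items()])
    con.commit()

con = sqlite3.connect(":memory:")  # stand-in for the real stats DB
store_daily_counts(con, "2015-11-19",
                   [("jquery", "1.8.3"), ("jquery", "1.8.3"),
                    ("jquery", "1.9.1")])
rows = list(con.execute(
    "SELECT library, version, hits FROM downloads ORDER BY hits DESC"))
# rows == [('jquery', '1.8.3', 2), ('jquery', '1.9.1', 1)]
```

The composite primary key makes re-running a day's job idempotent, which matters if the scheduled task ever retries.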

@Jakobud
Author

Jakobud commented Nov 19, 2015

Actually an even better solution would be using AWS Data Pipeline

http://aws.amazon.com/documentation/data-pipeline/

And AWS Elastic Map Reduce

https://aws.amazon.com/elasticmapreduce/

Those tools are made to do exactly what you guys need to do: analyze data/logs in a cost-efficient manner.

@ryankirkman
Member

Hi Jake,

The solution you proposed is very elegant, but unfortunately we don't use CloudFront for hosting the CDN anymore. Cloudflare is the primary network provider.

As for a stats solution, we don't have a good answer yet, sorry Jake.

@PeterDaveHello
Contributor

@ryankirkman can we estimate the disk space we need per day? Maybe I can find the storage.

@Jakobud
Author

Jakobud commented Nov 19, 2015

Are Cloudflare logs accessible to you in some form, downloadable or via an API or anything? Also, EC2 transfer pricing:

Data Transfer IN To Amazon EC2 From Internet $0.00 per GB

https://aws.amazon.com/ec2/pricing/

So I assume that means you could programmatically pull in Cloudflare logs and parse them or do whatever, and it would still only cost you for the time the EC2 instance is active.

@dazbradbury

Looks like this issue has been pretty stagnant. Is there now an alternative / feasible solution for determining library usage stats or percentages?

Taking the jQuery example: as a site owner you care about the % of users arriving with the required jQuery version already cached, and any stats cdnjs can provide would be awesome in determining that.

@MattIPv4
Member

Currently waiting on Cloudflare to establish a way for us to have stats/log access for the cdnjs.cloudflare.com domain. Will post updates as I get them.

@MattIPv4
Member

Noted from #6186 that more in-depth stats would be useful such as country breakdowns.

@MattIPv4
Member

MattIPv4 commented Jun 6, 2019

@dknecht Please can we use this issue to track any updates on further stats/log access for the cdnjs.cloudflare.com domain. Thanks :)

14 participants