HN Front Page - May 23

Don't use Hadoop – your data isn't that big (2013)


"So, how much experience do you have with Big Data and Hadoop?" they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I'm basically a big data neophyte - I know the concepts and I've written code, but never at scale.

Next they asked, "Could you use Hadoop to do a simple group by and sum?" Of course I could, and I told them I just needed to see an example of the file format.

They handed me a flash drive with all 600MB of their data on it (not a sample, everything). For reasons I can't understand, they were unhappy when my solution involved pandas.read_csv rather than Hadoop.
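
For scale, the whole job they asked about is a few lines of pandas. A minimal sketch, with made-up column names ("key", "value") since the actual file format wasn't shown:

```python
import io

import pandas as pd

# Stand-in for their 600MB CSV; the column names are assumptions.
csv_data = io.StringIO("key,value\na,1\nb,2\na,3\n")

df = pd.read_csv(csv_data)                 # the whole file fits in memory
totals = df.groupby("key")["value"].sum()  # the "group by and sum" they asked about
```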

Hadoop is limiting. It allows you to run only one general computation, which I'll illustrate in pseudocode:

Scala-ish pseudocode:

collection.flatMap( (k,v) => F(k,v) ).groupBy( _._1 ).map( _.reduce( (k,v) => G(k,v) ) )

SQL-ish pseudocode:

SELECT G(...) FROM table GROUP BY F(...)

Or, as I explained a couple of years ago:

Goal: count the number of books in the library.

Map: You count up the odd-numbered shelves, I count up the even-numbered shelves. (The more people we get, the faster this part goes.)

Reduce: We all get together and add up our individual counts.

The only thing you are permitted to touch is F(k,v) and G(k,v), except of course for performance optimizations (usually not the fun kind!) at intermediate steps. Everything else is fixed.

It forces you to write every computation in terms of a map, a group by, and an aggregate, or perhaps a sequence of such computations. Running computations in this manner is a straitjacket, and many calculations are better suited to some other model. The only reason to put on this straitjacket is that by doing so, you can scale up to extremely large data sets. Most likely your data is orders of magnitude smaller.
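
That map/group-by/aggregate shape can be sketched in plain Python (a toy illustration of the pattern, not Hadoop's actual API):

```python
from functools import reduce
from itertools import groupby

def mapreduce(collection, F, G):
    # Map: F emits (key, value) pairs for each input record.
    mapped = [pair for record in collection for pair in F(record)]
    # Shuffle: group the pairs by key.
    mapped.sort(key=lambda kv: kv[0])
    grouped = groupby(mapped, key=lambda kv: kv[0])
    # Reduce: G combines the values for each key.
    return {k: reduce(G, (v for _, v in kvs)) for k, kvs in grouped}

# Word count, the canonical example:
docs = ["a b a", "b c"]
counts = mapreduce(docs,
                   F=lambda doc: [(w, 1) for w in doc.split()],
                   G=lambda x, y: x + y)
# counts == {"a": 2, "b": 2, "c": 1}
```

Everything in your problem has to be squeezed into F and G; the surrounding machinery is fixed.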

But because "Hadoop" and "Big Data" are buzzwords, half the world wants to wear this straitjacket even if they don't need to.

But my data is hundreds of megabytes! Excel won't load it.

Too big for Excel is not "Big Data". There are excellent tools out there - my favorite is Pandas, which is built on top of Numpy. You can load hundreds of megabytes into memory in an efficient vectorized format. On my 3-year-old laptop, it takes numpy the blink of an eye to multiply 100,000,000 floating point numbers together. Matlab and R are also excellent tools.

Hundreds of megabytes is also typically amenable to a simple Python script that reads your file line by line, processes it, and writes to another file.
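
A minimal sketch of that approach; the file names and the per-line processing (uppercasing) are placeholders:

```python
from pathlib import Path

# Toy input standing in for a few hundred MB of real data.
Path("input.txt").write_text("alpha\nbeta\n")

# Stream the file line by line: constant memory, no matter the file size.
with open("input.txt") as src, open("output.txt", "w") as dst:
    for line in src:
        dst.write(line.rstrip("\n").upper() + "\n")
```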

But my data is 10 gigabytes!

I just bought a new laptop. The 16GB of RAM I put in cost me $141.98 and the 256GB SSD was $200 extra (preinstalled by Lenovo). Additionally, if you load a 10GB CSV file into Pandas, it will often be considerably smaller in memory - the result of storing the numerical string "17284932583" as a 4- or 8-byte integer, or storing "284572452.2435723" as an 8-byte double.

Worst case, you might actually have to avoid loading everything into RAM simultaneously.
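
In pandas, that means reading in chunks. A sketch with the same assumed column names, computing a group-by-and-sum without ever holding the whole file in RAM:

```python
import io

import pandas as pd

# Stand-in for a 10GB file; column names are assumptions.
csv_data = io.StringIO("key,value\na,1\nb,2\na,3\nb,4\n")

# Process the file in chunks so only one chunk is in memory at a time,
# then combine the per-chunk partial sums.
partials = [chunk.groupby("key")["value"].sum()
            for chunk in pd.read_csv(csv_data, chunksize=2)]
totals = pd.concat(partials).groupby(level=0).sum()
```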

But my data is 100GB/500GB/1TB!

A 2 terabyte hard drive costs $94.99, 4 terabytes is $169.99. Buy one and stick it in a desktop computer or server. Then install Postgres on it.

Hadoop << SQL, Python Scripts

In terms of expressing your computations, Hadoop is strictly inferior to SQL. There is no computation you can write in Hadoop that you cannot write more easily in either SQL or a simple Python script that scans your files.

SQL is a straightforward query language with minimal leakage of abstractions, commonly used by business analysts as well as programmers. Queries in SQL are generally pretty simple. They are also usually very fast - if your database is properly indexed, multi-second queries will be uncommon.
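
For instance, the group-by-and-sum from earlier is one short query. A sketch using Python's built-in SQLite (the article's point applies equally to Postgres); table and column names are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (key TEXT, value INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("a", 1), ("b", 2), ("a", 3)])
con.execute("CREATE INDEX idx_key ON t(key)")   # an index keeps lookups fast

rows = con.execute(
    "SELECT key, SUM(value) FROM t GROUP BY key ORDER BY key").fetchall()
```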

Hadoop does not have any conception of indexing. Hadoop has only full table scans. Hadoop is full of leaky abstractions - at my last job I spent more time fighting with java memory errors, file fragmentation and cluster contention than I spent actually worrying about the mostly straightforward analysis I wanted to perform.

If your data is not structured like a SQL table (e.g., plain text, JSON blobs, binary blobs), it's generally straightforward to write a small Python or Ruby script to process each row of your data. Store it in files, process each file, and move on. In circumstances where SQL is a poor fit, Hadoop will be less annoying from a programming perspective. But it still provides no advantage over simply writing a Python script to read your data, process it, and dump it to disk.
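
For example, a file of JSON blobs can be handled one record at a time (file and field names here are made up):

```python
import json
from pathlib import Path

# Toy stand-in for a file of JSON blobs, one per line.
Path("events.jsonl").write_text('{"user": "a", "n": 3}\n{"user": "b", "n": 7}\n')

# Stream the file, keep what you need, move on; no cluster required.
with open("events.jsonl") as f:
    big_users = [rec["user"] for rec in map(json.loads, f) if rec["n"] > 5]
```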

In addition to being more difficult to code for, Hadoop will also nearly always be slower than the simpler alternatives. SQL queries can be made very fast by the judicious use of indexes - to compute a join, PostgreSQL will simply look at an index (if present) and look up the exact key that is needed. Hadoop requires a full table scan, followed by re-sorting the entire table. The sorting can be made faster by sharding across multiple machines, but on the other hand you are still required to stream data across multiple machines. In the case of processing binary blobs, Hadoop will require repeated trips to the namenode in order to find and process data. A simple Python script will require repeated trips to the filesystem.

But my data is more than 5TB!

Your life now sucks - you are stuck with Hadoop. You don't have many other choices (big servers with many hard drives might still be in play), and most of your other choices are considerably more expensive.

The only benefit to using Hadoop is scaling. If you have a single table containing many terabytes of data, Hadoop might be a good option for running full table scans on it. If you don't have such a table, avoid Hadoop like the plague. It isn't worth the hassle and you'll get results with less effort and in less time if you stick to traditional methods.

P.S. Hadoop is a fine tool

I don't intend to hate on Hadoop. I use Hadoop regularly for jobs I probably couldn't easily handle with other tools. (Tip: I recommend using Scalding rather than Hive or Pig. Scalding lets you use Scala, which is a decent programming language, and makes it easy to write chained Hadoop jobs without hiding the fact that it really is mapreduce on the bottom.) Hadoop is a fine tool, it makes certain tradeoffs to target certain specific use cases. The only point I'm pushing here is to think carefully rather than just running Hadoop on The Cloud in order to handle your 500MB of Big Data at an Enterprise Scale.

If you need help getting started with Hadoop, O'Reilly has a decent (albeit slightly outdated) intro that will help you, only $32 on amazon.

This article is also available translated into Russian.


Chaos Computer Clubs Breaks Iris Recognition System of the Samsung Galaxy S8

The Samsung Galaxy S8 is the first flagship smartphone with iris recognition. The manufacturer of the biometric solution is the company Princeton Identity Inc. The system promises secure individual user authentication by using the unique pattern of the human iris.

A new test conducted by CCC hackers shows that this promise cannot be kept: with a simple-to-make dummy eye, the phone can be fooled into believing that it sees the eye of the legitimate owner. A video shows the simplicity of the method. [0]

Iris recognition may be barely sufficient to protect a phone against complete strangers unlocking it. But whoever has a photo of the legitimate owner can trivially unlock the phone. „If you value the data on your phone – and possibly want to even use it for payment – using the traditional PIN-protection is a safer approach than using body features for authentication“, says Dirk Engling, spokesperson for the CCC. Samsung announced integration of their iris recognition authentication with its payment system „Samsung Pay“. A successful attacker gets access not only to the phone’s data, but also the owner’s mobile wallet.

Iris recognition in general is about to break into the mass market: Access control systems, also at airports and borders, mobile phones, the inevitable IoT devices, even payment solutions and VR systems are being equipped with the technology. But biometric authentication does not fulfill the advertised security promises.

CCC member and biometrics security researcher starbug has demonstrated time and again how easily biometrics can be defeated with his hacks on fingerprint authentication systems – most recently with his successful defeat of the fingerprint sensor „Touch ID“ on Apple’s iPhone. [1] „The security risk to the user from iris recognition is even bigger than with fingerprints as we expose our irises a lot. Under some circumstances, a high-resolution picture from the internet is sufficient to capture an iris“, Dirk Engling remarked.

But it is not sufficient simply to avoid uploading selfies to the internet: the easiest way for a thief to capture iris pictures is with a digital camera in night-shot mode or with the infrared filter removed. In the infrared light spectrum – usually filtered out in cameras – the fine, normally hard-to-distinguish details of the iris of dark eyes are easily recognizable. Starbug was able to demonstrate that a good digital camera with a 200mm lens at a distance of up to five meters is sufficient to capture suitably good pictures to fool iris recognition systems. [2]

Depending on the picture quality, brightness and contrast might need to be adjusted. If all structures are clearly visible, the iris picture is printed on a laser printer. Ironically, we got the best results with laser printers made by Samsung. To emulate the curvature of a real eye’s surface, a normal contact lens is placed on top of the print. This successfully fools the iris recognition system into acting as though the real eye were in front of the camera.

By far the most expensive part of the iris biometry hack was the purchase of the Galaxy S8 smartphone itself. Rumor has it that the next-generation iPhone will also come with iris recognition unlock. We will keep you posted.


[0] Video in English (HD), also in German

[1] Chaos Computer Club breaks Apple TouchID

[2] Video (in German): Ich sehe, also bin ich … Du – Gefahren von Kameras für (biometrische) Authentifizierungsverfahren


I’ve become worse, not better, at programming

Once in a while I like to take a peek at reddit's cscareerquestions and look at
the topics there. During a recent trip, I found an interesting question:

What makes someone a bad programmer?

In looking at the answers, I realized some of those apply to me now, but didn't
apply to me in the past. So what happened?

To begin with, my mainstay project is Lily, a programming language, and I've
been working on it for about 6 years now (give or take a couple months). Over
the years, the language has evolved but a majority of the work has been mine.
Solo developer projects are supposed to be awesome and consistent, right?

The symptoms


If I had to trace it back to something, it would be the implementation of
closures. Closures work by having the lowest-level closure function create a
pool of cells and pass those cells upward. The problem comes in needing to
make sure that functions are getting fresh values.

Suppose there's a var 'someval' that's closed over. One way to make sure that
the value is always up-to-date is that any read of 'someval' should be prefixed
by an access of the closure. Any assignment should be followed after with a
write to the closure. This ensures that all accesses of 'someval' will have the
same actual result.
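
The invariant can be illustrated with ordinary Python closures (a toy analogy, not Lily's actual bytecode transformation): every function reads and writes through one shared cell, so each access sees the freshest value.

```python
def make_counter():
    cell = {"someval": 0}            # the shared closure cell

    def incr():
        cell["someval"] += 1         # every write goes back to the cell

    def read():
        return cell["someval"]       # every read comes from the cell

    return incr, read

incr, read = make_counter()
incr()
incr()
# read() == 2, because both functions share the same cell
```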

This is implemented as a transformation that walks through opcodes. When I first
wrote it, opcodes didn't have a consistent format, so I had to write custom
iteration code and it wasn't pretty. But it was good enough to work.

Nowadays I've normalized most of the opcodes, but there are still a few strange
outliers. The closure code takes jumps into account and supposedly fixes their
offsets.

Getting closures to work at all was a battle, and still today there are cases
where closures crash out the interpreter. I've resolved to fix those at some
point in the future, maybe.

I think difficulties in closures stem from how I'm inherently very bad at doing
math in my head. Oftentimes, extending the interpreter for a new opcode
involves prodding existing opcodes to see if I move by +3 or +4, and repeated
compiles with different adjustments to see if I can get it to work.

Release cycles

The last release was in October of last year, roughly half a year ago. I try to
make releases once every 3 months since that's enough to get whatever I feel
like doing done. But this one has been an open window, and there's a 1.0 blocker
I should get to.

The roadmap

I never sat down to get an idea of where I want the project to go. This was
good at first, because I had some terrible ideas of what I wanted to do. But now
the issue tracker is barren, apart from a few ideas that I have.

I've often used the excuse that nobody's around, so it doesn't matter. But
seeing a project in this state is likely to turn people away. Right now I'm
focusing on writing a new binding tool that I can later use for documentation
extraction so I can have nicer documentation.

The documentation

It's bad enough that I don't use it, and until recently (nearly 6 years in) I
never bothered to write a pretty doc generator. Right now, the pink and the
"it came from markdown" look make the documentation so bad I just grep the
builtin package for keywords, or try things to see what happens.

The underlying causes

Fires everywhere

With an interpreter, there's a large enough playground that something always
needs critical attention. One fire is left to rage while another gets attention.
A better strategy would have been to never let these fires grow. Better testing,
more coverage would help. But some metrics are difficult to test for, which goes
back to me just being lazy.

Documentation, I guess

I like writing, so the core ended up with a whole lot of documentation. Some is
certainly out of date, because I've neglected to comb through it recently. In
other recent projects, I hardly comment at all, since few others read my work.


I'm generally antisocial, so cutting away from email/etc. for a few days is
standard practice. I set up an IRC channel which I don't visit often, a Discord
I'm away from most of the time, and a subreddit that's empty. Most of my social
media comes through shitposting on reddit.

I also don't blog much.


Somewhere along the line I resigned myself to "this is the way it is". I came to
accept that some areas aren't going to be that great because it's me alone. I
should have been better about spending time on a section before leaving it. But
if I do that, what do new people end up doing?

I wasn't always like this. I used to chart out how much memory the
interpreter was using. I used to have better test coverage.

I don't like this anymore

The worst part is that I used to have some measure of pride in what I did. Now,
well, something breaks and I fix it. Done. I don't bother blogging about much of
what I do since I don't find it exciting, and it's not new. I used to be
excited at seeing new features activate and now I hit a random crash once in a
while and I just sigh.

I keep pushing on because giving up would be such a massive loss of work. Yet
at the same time, I wonder if working on this project is making me a worse
coder, instead of a better one.


Arq 5.8.5 for Mac Fixes a Bad Bug

I hate writing blog posts like this. 

In Arq 5.7.9 we changed the way Arq caches the list of objects at the destination for AWS, Google Drive and Google Cloud Storage. For those 3 destination types, all non-packed objects were stored in one “objects” directory. If you had years of backups and/or terabytes of data, refreshing Arq’s cache could take a very long time, and if you rebooted in the meantime, it would have to start over. So we changed Arq to only query for 1/256th of the list at a time, on demand, by “translating” the actual object paths like /objects/8a3a1fcac5dd03050dc91f8231fee8959339d68d into /objects/8a/3a1fcac5dd03050dc91f8231fee8959339d68d for use in Arq.

Unfortunately, we didn’t properly test the effect this would have on folks who had old caches. Very unfortunately, the effect was that during the periodic object-cleanup process, Arq would mistake existing objects as not being referenced by any backup record, and would delete them.

I sincerely apologize for this. I don’t know if it’s the smartest business move to write this blog post, but I’d rather honestly explain what happened than try to cover up the issue. That way people can understand what’s happening with their backups.

If you installed Arq 5.7.9, 5.8.1 or 5.8.2 sometime between April 3 and May 21, and Arq’s periodic budget enforcement (or “object cleanup” if you didn’t set a budget) process happened during that time, and you’re backing up to AWS, Google Drive or Google Cloud Storage, then you were likely affected by this issue. Please run the latest installer to get Arq 5.8.5. Arq will upload what it needs to upload going forward. But your older backup records are likely missing objects.

I’m working on putting more tests in place so this doesn’t happen before we ship another update.


Europe's last primeval forest on 'brink of collapse'

Scientists and environmental campaigners have accused the Polish government of bringing the ecosystem of the Białowieża forest in north-eastern Poland to the “brink of collapse”, one year after a revised forest management plan permitted the trebling of state logging activity and removed a ban on logging in old growth areas.

Large parts of the forest, which spans Poland’s eastern border with Belarus and contains some of Europe’s last remaining primeval woodland, are subject to natural processes not disturbed by direct human intervention.

A Unesco natural world heritage site – the only one in Poland – the forest is home to about 1,070 species of vascular plants, 4,000 species of fungi, more than 10,000 species of insect, 180 breeding bird species and 58 species of mammal, including many species dependent on natural processes and threatened with extinction.

“At some point there will be a collapse, and if and when it happens, it’s gone forever – no amount of money in the universe can bring it back,” said Prof Tomasz Wesołowski, a forest biologist at the University of Wrocław who has been conducting fieldwork in Białowieża for each of the last 43 years. “With every tree cut, we are closer to this point of no return.”

Bialowieza map

Logging is prohibited in the Białowieża national park nature reserve, which contains woodland untouched by humans for thousands of years, but the reserve only accounts for 17% of the forest on the Polish side, leaving approximately 40,000 hectares vulnerable to state-sanctioned logging.

On recent visits to the forest, the Guardian encountered evidence of widespread logging of trees in apparent contravention of Polish and European law, including many trees that appeared to be more than 100 years old in Unesco-protected areas, with logs marked for commercial distribution.

“They are logging natural, diverse forest stands which were not planted by humans and replacing them with plantations of trees of a single age and species,” said Adam Bohdan of the Wild Poland Foundation, which monitors logging activity and provides data for scientists working at the Białowieża botanical research station.

Logging in Białowieża forest. Photograph: Adam Bohdan/Wild Poland Foundation

“They are logging in Unesco zones where timber harvesting is forbidden, they are logging 100-year-old tree stands in contravention of European law, they are logging during breeding season and destroying habitats occupied by rare species. It is disrupting natural processes which have been continuing there for thousands of years. We are losing large parts of the last natural forest – my worst nightmares are coming true,” said Bohdan.

The government argues that the logging is needed to protect the forest from a bark beetle outbreak and for reasons of public safety, both hotly disputed by conservationists.

“Logging of infested spruces does not stop a bark beetle outbreak, it just leaves thousands of hectares of clear-felled sites instead,” said Dr Bogdan Jaroszewicz of the University of Warsaw, the director of the Białowieża research station.

“Of course, dead trees can’t be left standing along public roads or tourist trails, but logging is taking place in places quite remote from these routes.”

Opponents accuse the environment minister, Jan Szyszko, who is a forester and lecturer in forest management, of sacrificing the forest for the sake of the vested interests of the Polish forestry industry.

Felled trees in Białowieża forest. Photograph: Adam Bohdan/Wild Poland Foundation

“You can see how many trees they are cutting down that don’t even have bark, and so can’t be vulnerable to the bark beetle,” said Bohdan. “They are also clearing away dead and decaying trees that are crucial for the forest’s biodiversity and natural processes, including tree regeneration – they just see it as a waste of valuable wood.”

Conservationists also point to the protected reserve in the Białowieża national park, which is surviving the same bark beetle outbreak without logging.

But logging proponents insist that most of the present-day forest is the product of human activity, giving the government the right to pursue a strategy it calls “active management”.

Szyszko is known to cite the grave of an elderly woman in the Orthodox cemetery in the nearby town of Hajnówka as evidence in support of his approach. The epitaph on the tombstone of Anastazja Pańko, who died in 2010 at the age of 101, reads “I planted this forest”.

With generous funding from the environment ministry, government supporters are engaged in a ground war for local hearts and minds in support of the “active management” approach. “The pseudo-ecologists are destroying the forest, we are rebuilding it”, reads a banner hanging over the road to Białowieża from Białystok, the regional capital.

The banner belongs to Santa, a campaign group that received more than 100,000 złoty (£20,000) in funding from state forest authorities between 2012 and 2015; Santa’s president, Walenty Wasiluk, owns a wood products wholesale firm whose website boasts of annual sales of €4m (£3.5m).

Responsible for managing land constituting approximately a third of all Polish territory, state foresters generate annual revenue of 7bn złoty (£1.2bn), controlling 96% of the Polish timber market that provides the raw material for annual wood, paper and furniture exports worth 45bn złoty. In a 2016 press release, state foresters promised a record supply of 40m cubic metres of wood in 2017.

Campaigners note a substantial discrepancy between figures provided by the Polish government in a joint report with Belarus submitted to Unesco in 2016, and those provided by state forest officials as a result of freedom of information requests. Whereas the report said there had been 21,172 cubic metres of wood extraction on the Polish side in 2016, according to official statistics the figure is actually 64,059 cubic metres – more than three times the number submitted to Unesco – with 39% of the logging conducted in zones where logging is not permitted under Unesco World Heritage obligations.

“They don’t care about trees – they care about wood,” said Adam Wajrak, a veteran environmental campaigner.

“Foresters think the forest exists to serve them, and not the other way round,” said Wesołowski. “It’s a classic monopoly – they extract money from land belonging to the Polish people and keep most of it for themselves, destroying the land in the process.”

State foresters generate annual revenue of 7bn złoty (£1.2bn), and control 96% of the Polish timber market. Photograph: Adam Bohdan/Wild Poland Foundation

That is unlikely to change any time soon. Foresters and hunters are a key electoral constituency of Poland’s ruling rightwing Law and Justice party; the new director general of state forests, Konrad Tomaszewski, is the cousin of the law and justice leader, Jarosław Kaczyński.

The Polish government now faces the threat of being taken to the European Court of Justice by the European commission. “Due to the threat of a serious irreparable damage to the site, the commission is urging the Polish authorities to reply within one month instead of a customary two-month deadline,” the commission said in April.

But campaigners say the government is impervious to its legal obligations and fear that fundamental irreparable damage has already been done.

“It’s like replacing wild birds with battery hens,” said Wajrak, as he surveyed an expanse of land recently cleared of trees and which is now surrounded by metal fencing and is bare except for neat rows of newly-planted oak saplings.

“We want a forest, not an oak farm.”


The Future of Go Summit – Ke Jie vs. AlphaGo

Over 280 million people tuned in to watch AlphaGo play against Lee Sedol in 2016. The historic match not only marked a major milestone in the field of artificial intelligence, it also inspired a new wave of creative and innovative thinking about the game of Go.

The “Future of Go summit” in Wuzhen, China, will explore how Go’s best players and its most creative AI competitor can collaborate to uncover new mysteries about the ancient game.

The five-day festival will include a “Future of AI forum”, Team Go and Pair Go tournaments, and a one-to-one match between AlphaGo and world champion Ke Jie.

Below, you can find the game schedule, watch the games live, or replay the highlights.


Ask HN: What is the most common security mistake you see?

Willful ignorance.

It's 2017, and yet the biggest vulnerability of them all is still willful ignorance.

People have agendas to fulfill. Can't let the truth get in the way. Haven't found a budget to fit in the truth; looks like we will have to do without it for now.

What is the truth?

Well nobody cares about security.

Who knows about the truth?

Mostly everybody

Why doesn't anyone do anything about it?

Trying hard just doesn't cut it anymore; as we can see, those with influence would rather destroy the entire internet for mass surveillance.

Also don't forget about 2FA

Lack of working backups. It's appalling how many people depend on systems with no backups.

Forget ransomware and other exotic attacks, what happens if after 7 years your hard drive decides to die on you?

I don't know. The backup is the first thing that needs to be set up and tested when you buy a computer. Windows and Mac have built-in systems for point-in-time backups, and Linux offers more than a few solutions.

Backup, backup and backup again :-)

Using bad programming languages, honestly. Most security problems boil down to type errors, and it's poor language choice that leads to making them.

Blind trust in open source package managers. Look at the damage the removal of left-pad from npm caused, for example, and imagine what could have happened if the author had malicious intent.

In my organization, forcing users to change passwords every x months. Everybody I know ends up picking simpler to remember passwords from a pool as a result.

Not training the employees:

I can provide email filtering, DNS filtering, firewalls and all sorts of technical solutions to security.

It all goes to pot if someone gets click-happy on weird websites or email attachments, or falls for the "Your $Company_President needs to transfer some funds" email.

Default passwords.

Bad secret management (hardcoded in Git, shared secrets not changed after an employee left ...)

Dev and live not properly separated/dev not properly secured.

Services exposed to the internet that shouldn't be.

Old and forgotten software / appliances.

Don't forget about the dev/sysadmin workstations!

Most common mistake I see: sharing your passwords with multiple people. Even when the product lets you have more than one account.

Mistake websites make: only letting you have one user account.

The biggest security mistake I see is the History.

Talking about the history of:

1. Slack channels (lots of private stuff / links go to #general)

2. VCS (lots of passwords / tokens are in the commit history)

3. JIRA (lots of private information / company secrets are there)

Any one of those three alone can cost the company a big lawsuit if some employee/freelancer is deliberately hired to do damage.

I'm imagining doing some client-side validation and then the server just blindly assuming the content has been validated.

No infrastructure secret management or very bad infrastructure secret management:

* Private SSH/AWS API Keys in Github

* Shared Prod Passwords

Not applying updates to infrastructure regularly

Organizational firewalled "internal networks" that everyone connects to.

People opening untrusted web content and email attachments in Acrobat Reader and Office.

Reliance on antivirus products.

The infrastructure department says: "Applications are responsible for security." The application department says: "Infrastructure is responsible for security." The developer asks: "Can I help you?"

1. Secrets in source. This is by far the easiest one to "see".

2. In Ruby code, I see a surprising amount of `eval`; mostly `class_eval`


GNU Guix and GuixSD 0.13.0 released

We are pleased to announce the new release of GNU Guix and GuixSD, version 0.13.0!

The release comes with GuixSD USB installation images, a virtual machine image of GuixSD, and with tarballs to install the package manager on top of your GNU/Linux distro, either from source or from binaries.

It’s been 5 months since the previous release, during which 83 people contribute code and packages. The highlights include:

  • Guix now supports aarch64 (64-bit ARM processors). This release does not include a binary installation tarball though, and our build farm does not provide aarch64 substitutes yet. We are looking for aarch64 hardware to address this. Please get in touch with us if you can help!
  • Likewise, this release no longer includes a mips64el tarball, though Guix still supports that platform. We do not know whether we will continue to support mips64el in the long run; if you’d like to weigh in, please email us!
  • The GuixSD installation image now supports UEFI. GuixSD can also be installed on Btrfs now.
  • GuixSD has support to run system services (daemons) in isolated containers as a way to mitigate the harm that can be done by vulnerabilities in those daemons. See this article from April.
  • A new guix pack command to create standalone binary bundles is available. We presented it in March.
  • Guix now runs on the brand-new 2.2 series of GNU Guile. The transition led to hiccups that we have been addressing, in particular for users of guix pull. Among other things though, the noticeable performance improvement that comes for free is welcome!
  • guix publish, which is what we use to distribute binaries, has a new --cache operation mode that improves performance when distributing binaries to a large number of users, as is the case of our build farm.
  • Many reproducibility issues found in packages have been addressed—more on that in a future post.
  • 840 new packages, leading to a total of 5,400+, and many updates, including glibc 2.25, Linux-libre 4.11, and GCC 7.
  • New system services for Redis, Exim, Open vSwitch, and more. The interface of existing services, notably that of the NGINX service, has been greatly improved.
  • Many bug fixes!

See the release announcement for details.

About GNU Guix

GNU Guix is a transactional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user's freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on an i686 or x86_64 machine. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el, armv7, and aarch64.


UK survey of 14-24-year-olds indicates social networks harm mental health

Our addictive feeds of fitness models, exotic travel, and photo-perfect moments don’t often match with our comparatively humdrum and badly lit lives. The discontent caused by that disconnect is enough that a growing body of research suggests social media is contributing to mental-health problems such as anxiety, depression, sleep deprivation, and body-image issues in young people, who are the heaviest users of social media.

And Instagram, which now has 700 million users globally, appears to be the social network with the greatest negative effect, according to a new report by the UK’s Royal Society for Public Health (RSPH), an independent charity focused on health education.

The report combines previously published research on the health impacts of social media with its own UK-wide survey of nearly 1,500 people between the ages of 14 and 24. To discover how respondents felt different social networks—Instagram, Facebook, Snapchat, YouTube, and Twitter—affected their health, both positively and negatively, it asked them about their feelings of anxiety, connection to a community, sense of identity, sleep, body image, and more.

Only YouTube had a net-positive effect among the respondents. Every other social network came back with a net-negative effect. (In order from least negative to most, they were: Twitter, Facebook, Snapchat, and Instagram.) Respondents rated Instagram in particular as having negative effects on anxiety and body image. One of the report’s authors told CNN that girls often compare themselves to unrealistic images that have been manipulated.

The report quotes one respondent as saying, “Instagram easily makes girls and women feel as if their bodies aren’t good enough as people add filters and edit their pictures in order for them to look ‘perfect.'”

Earlier research has found that the unrealistic expectations and “fear of missing out” created across our social feeds can lower self-esteem and fuel issues such as anxiety and depression. These issues are only compounded by cyber-bullying and lack of sleep, another harmful effect linked to social media. The report cites recent research published in the Journal of Youth Studies that found one in five young people say they wake up during the night to check messages, causing them to feel exhausted during the day.

The findings weren’t all bad. Nearly 70% of respondents reported that they received emotional support on social media when times were tough, and many said their accounts offered a forum for positive self-expression. They were also able to create and maintain relationships online.

The problems centered more on forgetting that what we see isn’t always reality, and the RSPH offered some recommendations based on its findings. For one, fashion brands, celebrities, and others should consider disclosing when their photos have been manipulated. It also suggested that social networks give users a pop-up warning if they exceed a certain time spent logged on. Social platforms might even identify users with possible mental-health issues based on their usage and send a discreet message on where to get help.

Not least of all, the report said more research is needed into social media’s health effects. Social’s spread among younger generations is only growing. It’s too big a force not to consider the health consequences seriously.


An internal Google email shows how the company cracks down on leaks

Google is facing a lawsuit in San Francisco Superior Court alleging that the company has fostered a culture of secrecy and fear. Leaks to the media are forbidden, and employees are encouraged to monitor their colleagues for leaks, according to the suit, which was filed in December by an anonymous ex-employee who claims they were unjustly fired.

We’re now seeing some of the first evidence to support those allegations. Earlier this month, attorneys for the plaintiffs entered an email as an exhibit in the case. The May 2016 document — one that has been previously quoted from in the lawsuit but not published in full — describes the result of an internal hunt for a leaker, mentioning a dedicated “stopleaks” internal email address, presumably directing to a team that the lawsuit claims is tasked with such investigations.

The story behind the all-staff email starts with an incident from last year. In April, Recode published internal, employee-generated memes roasting Nest CEO Tony Fadell for a decision to shut down products from Revolv, a company acquired by Nest. In response to the criticism, Fadell gave a talk at the company’s weekly TGIF meeting and claimed Nest’s culture, another subject of criticism, was “improving.” The transcript from the talk was then also published by Recode.

The internal email was sent a few weeks later, on May 6th, with the subject line “the recent leaks.” Written by Brian Katz, a former State Department special agent who now runs “investigations” at Google, it begins with a stern warning: “INTERNAL ONLY. REALLY.”

Katz introduces himself as the head of the “stop leaks” team, a group of employees that the lawsuit claims is tasked with tracing the source of information that makes its way to the public. Katz writes that the company identified and fired the leaker for “their intentional disregard of confidentiality.” As a result of the leaks, Google stopped publishing transcripts of its TGIF talks, opting for a live stream instead. (The plaintiff who brought the suit says he was falsely accused by Katz of being the source of the leaks.)

“We’ve all worked hard to create an environment where we can share information openly,” Katz writes. “Our culture relies on our ability to trust each other — we share a lot of confidential information, but we also commit to keeping it inside the company. We don’t want that to change.” He went on to encourage employees to share concerns with managers or through the human resources department, rather than airing them publicly. The Information noted in a story last year that, around the same time, Katz allegedly told employees in a webcast “to look to their left and look to their right,” saying one of those people may be leaking information.

Katz ended by writing that the conversation had turned “less than civil” inside Google, and points to conversations on internal Google tools like Memegen as problems. “Memegen, Misc, Internal G+, and our many discussion groups are a big part of our culture — they keep us honest — but like any conversation amongst colleagues, we should keep it respectful.”

The lawsuit alleges that Google’s leaks policy covers essentially all company information and prohibits reasonable discussion about company activities. The stop leaks team is a primary object of criticism in the suit, which alleges that the team encourages employees to report any suspicious activity from their colleagues and streamlined the leak-reporting process with a dedicated URL. The email references a dedicated email address as well: “stopleaks@”. Google declined to elaborate on how the stop leaks team works. An attempt at emailing the group resulted in a bounce.

A related case was also recently brought to the National Labor Relations Board.

“We're very committed to an open internal culture, which means we frequently share with employees details of product launches and confidential business information,” a Google spokesperson told The Verge in a statement. “Transparency is a huge part of our culture. Our employee confidentiality requirements are designed to protect proprietary business information, while not preventing employees from disclosing information about terms and conditions of employment or workplace concerns."

The full text of the email is copied below.

Subject: The recent leaks

From: Brian Katz

To: Googlers


Hi there. I’m Brian. I lead the Investigations team, which includes stopleaks@.

At TGIF a few weeks back we promised an update on our investigation into some recent leaks, and here it is: We identified the people who leaked the TGIF transcript and memes. Because of their intentional disregard of confidentiality, they’ve been fired.

We’ve all worked hard to create an environment where we can share information openly. Our culture relies on our ability to trust each other—we share a lot of confidential information, but we also commit to keeping it inside the company. We don’t want that to change.

That said, we’ll be making some changes to TGIF to help keep the information shared internal-only, starting by no longer posting the written transcript to go/tgif. Instead, you’ll be able to watch a live stream, and for those who can’t tune in live, we’ll be offering the full video with Q&A.

We’ll continue to share information internally because the vast majority of Googlers and Characters respect our culture and don’t leak—thank you for that. That commitment toward a common vision and goal makes this a special place to work.

Please remember: whether malicious or unintentional, leaks damage our culture. Be aware of the company information you share and with whom you share it. If you’re considering sharing confidential information to a reporter—or to anyone externally—for the love of all that’s Googley, please reconsider! Not only could it cost you your job, but it also betrays the values that makes us a community. If you have concerns or disagreements, share them constructively through your manager, HRBP or go/saysomething.

Which brings me to my final point: some of the recent discourse on Memegen and elsewhere within the company has been, shall we say, less than civil. Memegen, Misc, Internal G+ and our many discussion groups are a big part of our culture—they keep us honest—but like any conversation amongst colleagues, we should keep it respectful.

Brian Katz

Director, Protective Services, Investigations & Intelligence


Zuckerberg-Backed Data Trove Exposes the Injustices of Criminal Justice

Amy Bach was researching her book about the US court system when she met a woman named Sharon in Quitman County, Mississippi. One July day in 2001, Sharon said, her boyfriend took her under a bridge and beat her senseless with a tire iron. Sharon passed out numerous times before her niece intervened and stopped the man from killing her. In photos from the emergency room after the attack, Sharon’s brown, almond-shaped eyes are swollen shut. She reported the crime to the police, who wrote up an aggravated assault report.

And then nothing happened. Neither the police nor the local prosecutors pursued the case. As Bach’s research later revealed, Quitman County hadn’t prosecuted a domestic violence case in 21 years. When Bach brought her discovery to the prosecutor, she remembers him saying, “Has it been that long?”

Quitman County, the fifth poorest in Mississippi, had next to no readily available data about how its own court system operates, who it affects, or who it leaves behind. As a criminal justice writer and attorney, Bach knew that wasn’t an unusual scenario. But that didn’t make it any less of a problem.

“He didn’t even see it,” Bach says of the prosecutor. “You can’t change what you can’t see.”

Compelled by the experiences of Sharon and so many others collected in her 2009 book Ordinary Injustice, Bach set off on a multi-year, labor-intensive effort to build a free, public tool that would make the many injustices in the court system a little bit tougher to ignore. Measures for Justice launches today with deep data dives on more than 300 county court systems in Washington, Utah, Wisconsin, Pennsylvania, North Carolina, and Florida, with plans to expand to 20 states by 2020. It pulls together the data that has traditionally remained hidden in ancient databases and endless Excel spreadsheets.

Even with just six states included, the comprehensiveness of the platform surpasses anything similar that currently exists. Measures for Justice compiles granular data for 32 different metrics that indicate how equitable a given county’s justice system might be. The portal shows, for instance, how many people within a county plead guilty without a lawyer present, how many non-violent misdemeanor offenders the courts sentence to jail time, and how many people are in jail because they failed to pay bail of less than $500. It offers insight into re-conviction rates and never-prosecuted cases. Users can compare counties or filter information based on how certain measures impact people of different races or income levels. And the site organizes all of it into easily digestible data visualizations.

Bach’s work has attracted the attention of the tech industry’s increasingly activist leadership. Measures for Justice received a $1.5 million grant earlier this year. Today, Mark Zuckerberg’s Chan Zuckerberg Initiative announced it is giving $6.5 million to the non-profit to help it expand into California.

“Better access to criminal justice data is important for informing efforts to make our communities safer and our criminal justice system fairer,” said Democratic strategist David Plouffe, who now serves as the Chan Zuckerberg Initiative’s president of policy and advocacy, in a statement. “You can’t solve a problem if you don’t have the facts.”

Some counties had their own databases to hand over to the Measures for Justice team. Most did not. That meant researchers often had to travel from county to county requesting individual records from local government agencies. The team then used automated character recognition tools to process the information into one central repository. “The data is like a snowflake,” says Andrew Branch, director of technology for Measures for Justice. “Every county is different.”

That process, Bach says, illustrates just how fragmented the American justice system is. More than 188,000 people are incarcerated in federal prisons, while roughly 2 million are locked up in state prisons and local jails. As Bach says, “Justice in America happens in 3,000 counties, each with its own justice system.”

For criminal justice reform advocates concerned over the newly aggressive Justice Department attitude toward maximum sentences and broad crackdowns, the fact that cities and states enjoy this autonomy can be comforting. Communities can set their own priorities and goals. But overworked and understaffed district attorneys’ offices can make only so much progress if they don’t know what’s happening in their counties.

When residents of Winnebago County, Wisconsin, elected Christian Gossett district attorney in 2006, he inherited a backlog of nearly 900 cases that the 10 attorneys on staff had never prosecuted. Gossett got to work making the department more efficient, cleared the backlog, and set up diversion programs that could keep low-level offenders out of jail. He was feeling proud of the advances his office had made.

Then he began working with the Measures for Justice team. After digging into the data, they found that among people with no previous records who had committed non-violent misdemeanors, white defendants were nearly twice as likely as non-whites to enter diversion programs instead of going to jail.

“I never would have guessed it,” Gossett says of the discovery, “but it’s there, and I’d rather know it so I can fix it.”

As it turns out, Gossett found judges were offering white and non-white defendants the option to enter diversion programs such as drug rehabilitation at equal rates. But non-white defendants opted for jail time more often. And choosing jail means opting for a criminal record, which can mean opting for a life in which everything from jobs to loans become much tougher to get. Now Gossett is working to get to the heart of why people are making this decision. Maybe it’s because they can’t afford the transportation to the diversion facility, or perhaps it’s because of a lack of trust between minority communities and law enforcement. Now Gossett says he’s at least a little closer to figuring out the answer. “We wouldn’t have known to look for that, because we didn’t know it was an issue,” he says.

For all of the valuable information Measures for Justice has collected, it’s far from complete. It does not, for instance, include any information on police behavior or anything that occurs before an arrest takes place. A broader swath of such information could come in time, according to the Measures for Justice team. For now, Bach and her staff are focused on filling a vast information void in hopes of creating a justice system that lives up to its name.



A social coding experiment that updates its own code democratically

Chaos, the vacant and infinite space which existed according to the ancient cosmogonies previous to the creation of the world, and out of which the gods, men, and all things arose.

ChaosBot is a social coding experiment to see what happens when the absolute direction of a software project is turned over to the open source community.


How it works

  1. Fork the code and make any changes you wish.
  2. Open a pull request.
  3. If there is general approval* from the community, the PR will be merged automatically by ChaosBot.
  4. ChaosBot will automatically update its own code with your changes and restart itself.
  5. Go to #1

In effect, you get to change the basic purpose and functionality of ChaosBot, at your discretion.

What will ChaosBot do? It's up to you. The only thing it does now is update itself with your changes. And as long as the code connecting itself to new changes remains intact, ChaosBot will continue to grow and change according to your will.

Some things it could do

  • Provide some useful service to people.
  • Be malicious.
  • Recreate itself in a different programming language.
  • Break itself and die.

There is no set purpose. What ChaosBot makes itself into is entirely up to the imagination of the open source community.


Voting

Votes on a PR are sourced through the following mechanisms:

  • A comment that contains 👍 or 👎 somewhere in the body
  • A 👍 or 👎 reaction on the PR itself
  • An accept/reject pull request review
  • The PR itself counts as 👍 from the owner

Weights and thresholds

Votes are not counted as simple unit votes. They are adjusted by taking the log of a user's followers, to the base of some low follower count. The idea is that voters with more followers should have more weight in their vote, but not so much that it is overpowering.

Vote thresholds must also be met for a PR to be approved. This is determined as a percentage of the number of watchers on the repository. However, if you see a bad PR, it is more important to vote against it than to assume the minimum threshold will not be met.
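As a sketch, the weighting scheme described above might look like this in Python. The base and threshold constants here are illustrative placeholders, not ChaosBot's actual values, which live in its source:

```python
import math

# Illustrative constants -- ChaosBot's real values live in its source code.
FOLLOWER_LOG_BASE = 5        # hypothetical "low follower count" base
THRESHOLD_PCT = 0.10         # hypothetical fraction of repo watchers required

def vote_weight(follower_count):
    """More followers -> more weight, but only logarithmically."""
    return max(1.0, math.log(max(follower_count, 1), FOLLOWER_LOG_BASE))

def tally(votes):
    """votes: iterable of (is_approve, follower_count) pairs -> net weighted score."""
    return sum(vote_weight(f) if up else -vote_weight(f) for up, f in votes)

def approved(votes, watcher_count):
    """A PR passes once its net weighted score reaches a percentage of the watchers."""
    return tally(votes) >= watcher_count * THRESHOLD_PCT
```

With a base of 5, a voter with 125 followers carries weight 3, while anyone with 5 or fewer followers counts as a single vote, so popular users matter more without being overpowering.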

See the source code for more details.

Death Counter

ChaosBot has died 2 times. This counter is incremented whenever the trunk breaks and the server must be restarted manually.

Server details

  • ChaosBot runs Ubuntu 14.04 Trusty
  • It has root access on its server. This means you are able to install packages and perform other privileged operations, provided you can initiate those changes through a pull request.
  • Its domain name is, but nothing is listening on any port...yet.
  • It's hosted on a low-tier machine in the cloud. This means there aren't a ton of resources available to it: 2TB network transfer, 30GB storage, 2GB memory, 1 cpu core. Try not to deliberately DoS it.
  • MySQL is installed locally.


FAQ

Q: What happens if ChaosBot merges bad code and doesn't start again?

A: Errors can happen, and in the interest of keeping things interesting, ChaosBot will manually be restarted and the death counter will be incremented.

Q: What is "general approval" from the community?

A: Users must vote on your PR, through either a 👍 or 👎 comment or reaction, or an accept/reject pull request review. See Voting.

Q: What if ChaosBot has a problem that can't be solved by a PR?

A: Please open a project issue and a real live human will take a look at it.


Ask HN: How do you become productive in a new project as a senior developer?

I am 1 month into my new position, same position you're in - new stack & straight into senior role. My tips so far:

1) Take over all the admin stuff you can to free up your devs from distraction and pointless tasks. Productivity and morale will immediately go up. My guys were time reporting into 3 different tools (and this is a <10 person startup!). I just started writing a summary of our standups and told them they didn't have to do it individually any more; it was appreciated and mgmt still get the info they need.

2) Start with infrastructure stuff - how do you deploy, what's your build process, can anyone on the team draw an architecture diagram (mine couldn't)?

3) Write tests. They're low-risk changes to the codebase, so you can't really break anything, but they'll give you the black-box overview of how everything hangs together; detail will come in time.

4) Accept that there's no way to immediately get to the level of familiarity the guys who built the system have. I can spend 2 hours digging to find how widget x gets its props passed in, or I can ask the teammate who wrote it and immediately know which file and line it's on.

As I have written in a previous post, this is where IMO a step-debugger earns its keep.

As an IC basically my entire life, I have joined large legacy projects many times, and my typical attack strategy always revolves around a large cup of coffee and my trusty debugger. Using 2 monitors, starting at main() or index.php (or whatever of course), I will execute as much of the entire codebase as possible line-by-line.

At first I will blast through the code looking at general structures, entry/exit branches, and the like, taking notes and often setting breakpoints on sections that seem hairy, or where I notice inefficiencies or great design.

Then I slow down the stepping and really examine the heavy-lifting sections to try to become familiar with the style and abstractions being used.

This method has served me well, as more often than not, by the first day's afternoon I can be having intelligent conversations with the existing team, almost always much to their confused surprise.

I suppose it matters a lot whether you are going into the senior position as a really strong head-down coder or more as a project management liaison.

Not being much into the management side of things, I don't have much advice on that role, and your suggestions sound really smart.

I just wanted to echo this, because often the entry points to the various parts of the application are what define seniority with a particular codebase. Everyone is currently relying on how the thing starts, stops, and builds to do the good work they are doing now. A lot of hard-won stability and system issues are solved in that area, which dramatically affects what is possible versus what would be a wildly different tangent from what's already there.

I love this suggestion - using a debugger to walk through as much of a new codebase as possible. Definitely going to try this next time I need to familiarize myself with a new large codebase.

Great tips, but I would add that those 2 hours spent exploring the code can actually be useful. I often need to read the code a few times before the big picture starts to come into view, so unless I need an immediate answer, I usually prefer to dig for it myself.

I think digging into the code for 2 hours solo is very important. If you still can't find it, ask a senior dev. Actually being familiar with the code is important to be able to nail down how things work abstractly to the actual implementation. Code is the ultimate source of truth. You are going to reach a point where you come to code written by an engineer who has left and nobody else understands. You will also need to dig into an open source project at some point without any help.

I found it easier to get into a new project by focusing on support tickets as a new hire. It's quite something to be able to be productive and see some dark corners of the stack early on.

I'd say it depends:

* How important the code path is. It might really be some boring code that you will never touch or don't need to understand anyway.

* How much time you have. This point is really important if you're reviewing some code.

I love all of your advice, but I would start straight away with writing tests. Not only is it low risk, it is the best way to start understanding the code base, and it gives you an avenue through which you can start introducing small refactorings. The other important thing is that developers LOVE to have outside help with testing and a fresh pair of eyes on the test suite.

From experience, I find that I'm not really able to add meaningful tests until I have an idea of what's going on.

Start by adding a whole-system black box end-to-end test, if there aren't already any good ones. Ideally fully automated and automatically run for every commit via CI, but don't let the perfect be the enemy of the good, take what you can get for starters.


(1) You only really need to know the system as a user, not the codebase as well, to do this.

(2) Well, #1 was a bit of a white lie. You'll end up finding out about all the (possibly undocumented) runtime dependencies this way, spinning up throwaway VMs and databases, etc. If the project is in production, this will be very useful knowledge, since you're probably the last point of escalation for strange prod issues.

(3) Running this out of CI with throwaway VMs/containers will also force you to fully automate the install and make the damn thing actually work on a generic blessed OS configuration, which might be a huge boon to your team if you currently are in "Works on my machine" hell. I did this somewhere where we had tons of lost productivity because developers used OSX or Ubuntu on their workstations, but prod was RHEL, and the most fascinating deviations would be found by developers chasing the strangest of bugs. Making the install reproducible so we could have CI totally ended this.

(4) If you don't have this already, and you set up infrastructure to gate commits on this test passing, team love and productivity will rapidly go through the roof as suddenly your developers aren't spending half their time fighting each other's build breaking changes.

So yeah, it's just one thing, but it leads to so many benefits it's definitely where I would start.
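As a minimal sketch of such a black-box test, here's what the starting point could look like in Python. The /health endpoint and URL are hypothetical placeholders, not something from the thread; the point is that the test only touches the system's observable behavior, never its internals:

```python
# A black-box smoke test only needs the running system's observable behaviour.
# Everything here is a placeholder sketch: point base_url at your own service,
# then grow the checks from "does it answer at all" toward real user journeys.
import urllib.request

def check_health(base_url, timeout=5):
    """Return True if the service answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection refused, timeouts, and HTTP errors
        return False
```

Run from CI inside a throwaway VM or container: install the app from scratch, start it, and fail the build if the check comes back False.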

EDITED: Added an item to the list

I've had the same experience. I'm sure it's somewhat context dependent, but if the code amounts to business rules and they are not documented, which is very common, you can only reverse engineer what the system _does_ and then write tests for that.

Of course you can ask people who are more familiar but whether you end up helping or being a distraction then is an open question.

Sometimes tests, but sometimes I start by writing or improving the docs and tutorials, if they are in bad shape. Mostly they are.

Plus of course improving the infrastructure. This is also mostly in the same poor shape as the docs.

Yeah, whenever new hires complain about the lack of completeness or correctness of documentation, I'm like: that's a great place for you to start. It also seems to be a great signal of the attitude of the new hire, because the people who kind of groan at that task usually end up having problems... but that's another topic altogether.

Additionally, start up a refactoring project on a feature branch.

Seriously, the most useful thing I've ever done is just refactoring half of a project to see how it works.

I sometimes do this with code that nobody understands but generally you should first talk to the people already there and understand the code.

The existing devs will hate you. They already knew how it worked - now they may not once you've finished - and unlike normal refactoring, you never knew how it was supposed to work.

This is a good way to cut the velocity of the whole project so you don't look as lost, though.

I don't think the point is to move the refactor into production. The refactor is a learning exercise. But if you start on the least understood, ugliest parts of the code the devs might welcome a refactor of those bits.

Also, I'm assuming this is a web app, but taking over a live project is a bit like getting handed a gun: you should check if it's loaded before handling it ;) At the least, run a vulnerability scanner over your app to look for security issues.

Burp and ZAP are two products for penetration testing / vulnerability scanning. They mount organized attacks on your website, looking for stuff like SQL injection and XSRF, and all that.

Burp has a broader scope because it does fuzz-style random testing. Zap is more reproducible. (Burp can be a pain in the neck because it doesn't reliably retest stuff it found.)

Be gentle with your new developer colleagues as you present them the results from these tools. They almost always find a couple of more-or-less silly vulnerabilities.

Regarding 1), isn't there a danger that you end up being seen as an admin monkey by the rest of the team, leading them to undervalue your technical knowledge?

I'm sure that they appreciate it, but that's not necessarily the same thing as it's good for the team in the long run.

The trick is that he then becomes the conduit for his team into all the systems. So he can see if someone is having trouble with a bit of the code, i.e. they haven't finished a part of the project. The senior dev can then pair-program etc. to help out. This will assuage the admin-monkey syndrome when an answer comes up.

But the answer is brilliant cause it also gives him the ability to see what his team is working on and how they are working on it.

>> I just started writing a summary of our standups and told them they didn't have to do it individually any more; it was appreciated and mgmt still get the info they need.

I'm skeptical that management would need a daily status.

You don't need nightly builds either - but they can be useful. Shorter feedback cycles / faster iteration has a lot of benefits. Also depends on how far up the management chain you're going.

If you've got critical bugs live in production, "management" may be the ones ensuring all the hot potatoes are in fact being handled by someone - that nothing is falling through the cracks, that everything is being addressed, that QA is on top of testing hot new changes that need to be rolled out ASAP. Daily isn't frequent enough in this context.

Burning through the QA backlog leading up to a big release in ye olde milestone driven waterfall style schedule? Daily might be frequent enough.

Long term feature work? Might be able to go weeks between updates - but a daily ping of "still working on X, still on schedule" takes what - 5 seconds to email, 5 seconds to read? Helps keep the mental burden of juggling everything in your manager's head down, maybe helps reassure them you're not going to pull a "so yeah, we've still got a month of work left at least" on delivery day when they're reporting up to their management. They might not be able to help you get your task done faster, but they might be able to rearrange other parts of the schedule to keep their stakeholders happy (e.g. X is running into delays, but we can give you Y ahead of schedule). Even if they can't do anything to help the schedule, they can start managing expectations ahead of time, prevent unseemly "surprises."

And hey, daily status reports are better than daily status queries.

Keeping management informed via daily email does not scale. Nor via phone or meetings. All this daily distraction nonsense.

Have a high-overview webpage where they can look it up by themselves if they need to. This is faster than daily updates and gives much better and more accurate info. It's your task to communicate the metrics, scheduling problems, cost overruns and feature creep.

> Have a high-overview webpage where they can look it up by themselves if they need to.

Who keeps this up to date and well maintained? I see little fundamental difference between pushing metrics/status via JIRA and pushing via email. Both scale (or don't) just as badly. Both require distraction from your development tasks to properly estimate or summarize status/problems.

Don't get me wrong - keep the daily distraction the hell in check. But there's no magic bullet to make good communication free, and there are plenty of people and contexts where words and language work way better than attempting to abstract things with stats and metrics.

> This is faster than daily updates and gives much better and more accurate info.

Maybe for you. Maybe for me. Definitely not for a lot of coworkers I've known. They do not context switch from "this is harder than I thought" to "track down the JIRA task and change my estimates". Getting some of them to even log work done is like pulling teeth. Hence hacks like the daily standup - poll, use words, get the real status.

Having worked for a company that used daily status reports via email, I can say that they are an absolute pain in the ass, but clients were delighted by them when done properly. Those status reports provide a clear, handwritten description of what your team did and the next steps to take. I can honestly say that those reports did as much as the quality of our code to show us as a professional team that the client could trust. And yes, if you couldn't write your own status for each day, even after training and guidance to do so, you weren't a fit for the company. It was one of the best companies I've worked for and where I learned the most.

It depends on the local environment. The way I think of it is: if I hire senior talent at $OMG annual salary, maybe more than managers make, I want to know they're DOING something. So daily status makes more sense. Then once a level of trust is established, it should go to weekly, if management is at all competent.

The thing is, it's all about visibility. Can management look at your sprint board, either physical or online, jira agile board for example, or in some other tool, where they can see what's going on? That's a good place to start.

This is actually the case, and I probably misused the term: 90% of the time, "mgmt" is actually the founders trying to find out whether the features are going to be pushed before the next client meeting.

They're requiring reporting into three different tools, and still don't have the visibility to know if shit's going to be done by deadline?

They really have no idea what they're doing.

How are these deadlines getting set?

> My guys were time reporting into 3 different tools (and this is a <10 person startup!)

Who is the fscking idiot that calls himself a startup founder and makes people report in 3 different tools?

This kind of attitude is really unhelpful. Of course this sort of situation is undesirable, demonstrably, but it's not necessarily a case of idiocy - it's actually very easy for non-technical founders to get into this situation: let's say the developers want to use JIRA, but the non-technical staff find JIRA impossible to use (totally reasonable), so they want to carry on using Trello; straight away you've got tasks spread across two disparate platforms. Then let's say that a manager has an embedded belief from a past (probably non-startup) life that time-tracking by the hour is incredibly important, and they have a preferred tool for this (because, while JIRA has this functionality, it's pretty horrible, and Trello doesn't do it at all), and boom, you've got two different tools for tracking work and a third tool for tracking time.

Sorry, an hour a day doing busywork (like entering hours) in a startup is utterly indefensible. I've done a bunch of startups and not only did this never happen, it would not have been tolerated. Even at <<several large software companies>> I never spent more than about ten minutes a day doing tracking; during heavy bug crunches, maybe an hour doing detailed writeups within bugs for other groups, for communication purposes. In a small team in a company with a short fuse, you can't afford this.

Either you have people who can do the work and coordinate, or all the tracking in the world is not going to help.

(Time tracking by the hour . . . I can't even . . . it's time to get a new manager by firing the current one or finding a better job elsewhere. Totally serious).

While crude, I do think raverbashing has a point here. If a 10 person startup already requires their devs to use 3 different reporting tools that startup has a problem. If you can't streamline at a small size like this you won't stay nimble and you won't make it.

I am at a 13 person startup. We use Confluence, Bitbucket, Google Apps, and we also use Asana (instead of Jira). It is a nightmare... Information is spread out all over these platforms. We can't stick to only Atlassian tools because the CEO thinks they don't allow you to collaborate... We spend a lot of time looking for where information lives and debating where it should live, complete waste of time. In a smaller company this sort of behavior can be catastrophic.

I totally agree, and I think a lot of people will have been in this situation too.

There is, unfortunately, no simple solution other than the very obvious one, which is to use the right things for the right job, and as few of those things as you can. The really crushingly miserable situation is one where the documentation is in Confluence and Google Docs and some random files in Dropbox before they started using the first two and some files that someone shared on the Slack channel and a few things that are just in the CTO's inbox which he was sent by the CEO in PowerPoint and... it's utterly toxic.

This proliferation of excellent tools we have today should be a good thing, but really we are all victims of our own best intentions sometimes (in that horrible chain of misery, even the CEO honestly thinks he's being helpful by emailing a PowerPoint rather than just telling someone over the phone).

>time-tracking by the hour is incredibly important

As someone who does the R&D tax credit for a lot of tech companies, let me tell you that it really, really is. It's worth somewhere around 10% of all your developers' salaries if the company is profitable. As in, if they work 10 hour days, entering data into a time tracking system is a net benefit even if it takes up to an hour a day.
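The arithmetic behind that claim can be sketched as a back-of-the-envelope calculation. The salary, rate, and hours below are invented illustrations of the commenter's numbers, not tax advice:

```python
# Sketch of the claim: a credit worth ~10% of developer salaries can
# outweigh even a full hour of time-tracking per 10-hour day.
# All figures are illustrative assumptions.

salary = 120_000          # assumed annual developer salary
credit_rate = 0.10        # "around 10% of all your developers' salaries"
hours_per_day = 10
tracking_hours = 1        # worst case: an hour of data entry per day

credit = salary * credit_rate                    # value of the credit
lost_fraction = tracking_hours / hours_per_day   # share of the day spent tracking
lost_output = salary * lost_fraction             # rough cost of that time

print(f"credit: ${credit:,.0f}, lost output: ${lost_output:,.0f}")
```

With these numbers the two sides exactly cancel (10% credit vs. 10% of hours), which is the commenter's breakeven point: any tracking overhead under an hour a day comes out net positive.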

I'm wondering how widespread claiming "R&D tax credits" actually is.

To think that most companies could/would seems preposterous (how many are truly doing something worthy of the moniker R&D?)

I'm seeing an ecommerce business applying for R&D credit, and that seems absurd since, from the article:

> Your company can only claim for R&D tax relief if an R&D project seeks to achieve an advance in overall knowledge or capability in a field of science or technology through the resolution of scientific or technological uncertainty - and not simply an advance in its own state of knowledge or capability.

Well, I'm in the US, so I can't speak to the UK's regs with much expertise, except insofar as I know most of the world's R&D credits are based on ours. But in the US, most web dev qualifies, including fairly run-of-the-mill ecommerce sites.

Just judging from what you quote here though, the US rule is less rigorous. Taxpayers must resolve technological uncertainty just as in the UK, but it's a taxpayer-centric test. Meaning, that uncertainty can be "simply an advance in its own state of knowledge or capability."

Honestly, I shouldn't say this because it's very much against my own interests, but the US regs are way too loose and permit credits for work that just does not seem to me to be worth subsidizing. But for a normal development company, it's worth tracking hours to gain that subsidy because the IRS primarily challenges us on the validity of the underlying data at this point rather than the qualification of the activities themselves, because a large amount of litigation has settled most of the questions in that area in favor of taxpayers.


I guess that, even if the rules appear to be stricter, in the UK it might de-facto be the same, if HMRC is not contesting many claims.

An aside, but working 10-hour days is bad. Anything more than 40 hours/week and you're actually reducing overall output.

What kind of tracking do you need and what's the needed resolution?

Can we just have "customer work" or "R&D work" for 8h a day?

The quality of your data dictates how likely you are to get an accurate, audit-defensible credit calculated. More detail makes it easier to defend. We can work with "R&D work" level of detail, but it is less defensible and it will cost more to have us (an accounting or consulting firm) calculate the credit for you. It also makes it much harder to calculate in-house or using paid self-help software tools.

Particularly because IRS rules require that the wage costs be allocable to a "business component." Business component is a technical term of art specific to the R&D credit, but roughly it corresponds to projects, and more specifically to a (1) new or improved (2) product, process, technique, formula, or invention. It must be one and only one of those, no mixing of new and improved or product and process. So if you have a new product and a corresponding new production process, they must be separated in the reporting of qualified research expenditures, even if you as a taxpayer have classified the two BCs as a single project.

So when you have time tracking, it is best when it allows us to tie specific hours worked to specific projects and tasks. We can then work out from the project list how to combine or split up your projects into business components and map the costs that way. From your activity codes, we can set a very specific and defensible qualified percentage to each employee's time, without having to disrupt your business by surveying your employees (plus the IRS does not respect self-reported, non-contemporaneous survey data much if at all).
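The "activity codes to qualified percentage" mapping described above can be sketched in a few lines. The employees, codes, hours, and the choice of which codes count as qualified are all invented for illustration:

```python
from collections import defaultdict

# Invented sample time-tracking entries: (employee, activity_code, hours).
# Which codes qualify as R&D is an assumption for this sketch only.
entries = [
    ("alice", "new-feature-dev", 30),
    ("alice", "qa-support", 10),
    ("bob", "bugfix-maintenance", 25),
    ("bob", "new-feature-dev", 15),
]
QUALIFIED_CODES = {"new-feature-dev"}

total = defaultdict(float)      # total hours per employee
qualified = defaultdict(float)  # hours on qualified activities per employee
for emp, code, hours in entries:
    total[emp] += hours
    if code in QUALIFIED_CODES:
        qualified[emp] += hours

# Qualified percentage of each employee's time, derived from real data
# rather than after-the-fact surveys.
pct = {emp: qualified[emp] / total[emp] for emp in total}
print(pct)
```

The point of deriving the percentage from contemporaneous entries, per the comment, is that it is far more defensible than a retrospective survey.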

As an example, I walked into a software dev client's offices a few weeks ago to sit down with the CFO, comptroller, and their head of development. I walked out with highly detailed project tracking data and spent four hours processing it using our internal software tools, calculating a $2m benefit with a high degree of accuracy. At other firms with lower quality data, it can take hundreds of man hours to achieve the same benefit. So we gave the software client a massive break on our fee (and actually still profited more than we would have with a less prepared client) and everyone walked away happy.

Ah I see. But task tracking != project tracking

And this does not necessarily require everything to be done by the developer

Even if the value that can be obtained by this is higher, it seems to me it's cheaper (in general) to have someone track the effort in this specific way than make developers do it. (Also if your developers are unhappy or busy no tax credit can save you)

I think that you're not necessarily considering the side effects of (A) working 10 hour days, and (B) spending an hour on paperwork. People will often start to wonder why they work so hard if the company doesn't take their time seriously. This can negatively impact morale, and that has a real, if hidden, cost.

I picked those numbers to make the math easy. No one should or will spend anywhere near an hour entering time into a time tracking system, unless it is horribly designed.

>People will often start to wonder why they work so hard if the company doesn't take their time seriously.

This is a failure of communication. People tend to get frustrated with paperwork when they do not understand its purpose or importance. Those employees who are discouraged by hours tracking paperwork don't realize that the company is being paid by the government to have them fill out that paperwork. Management needs to communicate to employees that there is a very significant impact on the bottom line from them properly filling out that paperwork. That impact increases their wages and the company's ability to survive and thus to provide them continuing, stable employment.

Now, if you think it's pointless that the IRS requires them to fill out this paperwork, well, that's another discussion, but I can offer you arguments in favor of verifying R&D activities before granting public subsidies for them, and in favor of the subsidies themselves.

If the non-technical staff wants to use Trello, that has nothing to do with the technical staff (though it might be painful for the people caught between the two).

In my opinion, detailed time tracking is pointless. I've worked for companies that required it (only small ones required this), and it was just a painful distraction that never had any perceivable benefit. In larger companies, most people just write 8 hours a day for their project, and that's it. No need to make it any more complex than that.

And Jira is not a reporting tool. It helps you keep track of the work you need to do, and roughly estimate how much work that is, but it's not suitable for detailed reporting. And for a small startup, you don't need detailed reporting. If the boss wants to know what people are doing, he just needs to drop by at the standup meeting.

Although it is understandable how someone could make this mistake, mandating redundant reporting is a very bad sign in a startup. It shows the manager does not comprehend the difference between a giant company, where such large overheads are necessary, and a tiny startup, where nimbleness and intimacy are the very thing that gives the company a chance to succeed.

> This kind of attitude is really unhelpful.

No, what's unhelpful is fewer than 10 people acting out in a noncohesive way.

This is weak management and a lack of focus

I wouldn't go so far as to say this is due to stupidity, but as a senior professional that would sure be my starting point. That company is in a less than desirable situation; how many other gaffes are there?

Super useful, thanks. The project is also in burning condition, viz. tight deadlines, pressure, etc.

Also talk to your QA and make sure they understand the level of throughput possible for your team. QA can unintentionally kill a struggling project by overloading the tracker with bug reports.

"burning condition" should not be a persistent state. Job #1 should be getting it back to a healthy state, where people feel comfortable working normal hours.

IMO, there is no magical formula, and being a senior dev doesn't change the work involved: you're still developing code, you're still working with your peers, you're still designing stuff. You're just better at it because you've seen a lot of it before.

That means you'll get productive in same way you'd become productive as a junior. Pick up easy tickets and use them to learn to navigate the codebase. Talk to your colleagues and fill in your knowledge gaps. Help others who are having issues. Pick up and address more tickets. Get in the thick of things as quickly as possible.

If the position includes additional roles, such as that of a PM, nail down the necessary role requirements and start doing those as well. You'll need to work on your soft skills - honey and vinegar, carrot and stick, etc.

You're a senior developer because of your experience and knowledge in general, not because you're magically a productivity machine. Assumptions are going to be your biggest hindrance when coming into your first "senior" position. Don't assume that you know more than your peers, even the junior ones; when you first start they probably know more about the codebase and tech stack than you do. Don't assume that if a team works differently than what you're used to, it's automatically worse. Don't assume that their code quality and engineering practices are poor.

Don't assume; find the reality and work with your team from there.

I see a lot of advice about individually contributing. I'd recommend another path: Figure out how to be a force multiplier for your team. Sure, maybe you're great at spitting out code and tests. But that's still only one developer of work. If you have 10 people and make them 10% more efficient... That's also one developer of productivity gains.

Since you're new, you don't necessarily have all the insight to make this happen on your own. So ask your team: "What can I do that would make everyone more efficient?"

Also, make it clear that you are available for general questions on technology, architecture, algorithms, secure coding, etc. I spend one to two hours each day fielding questions like this, and I don't have a lot of in-depth knowledge of our code. But all code needs architecture and algorithms and security best practices. And since I don't have to actually spend the time coding, I'm multiplying my capacity to make decisions. It's also great for developer ownership, when I get to say things like, "I can give you general advice, but you know your code the best. So apply it with discretion."

> "I can give you general advice, but you know your code the best. So apply it with discretion."

This is an excellent quote and I plan to use it, or a close variant. Thanks very much for sharing!

Sounds like your team needs better onboarding documentation. Make creating those docs your responsibility so that the next person that joins doesn't need to do what you're going to need to do.

Start by having someone on the team give you a brain dump. Record it. It doesn't need to be high quality. Use Quicktime and your laptop's webcam.

After the braindump, write out everything that was said and make the following diagrams:

- Architectural diagram showing the components involved.

- Infrastructure diagram that shows where things run.

- Deployment pipeline that shows how code gets from git commit to production.

- Data flow diagrams for the most common usage patterns. This one might come a bit later as you're more hands on in coding.

Show the diagrams to your team members as you create them and ask them to verify that it's accurate. If it's not, understand why and fix your diagram.

Add the above to your onboarding docs. If none exist, create it somewhere.

Add a page to onboarding docs that describe how to access logs.

Add a page that describes how monitoring in the system works.

Add a page that will act as an index for recipes on how to do useful things. From that page, create a bunch of useful scripts that you gather as you learn stuff daily. Encourage others to do the same.

I think you get the picture. Add a page for everything you need to know in order to be productive. Think about all the things you knew how to do in your previous position. Try to document how to do all those things in your new position.

Soon, before you know it, you'll be productive and you'll enable your team to be able to onboard new team members more efficiently.

In addition, write code and review PRs. Ask a lot of questions in PRs. Investigate how things work. Get curious.

I'd add one thing: write a threat model as well. What are you protecting against? What are you not protecting against? What do you do if X leaks?

X can be user hashed passwords, server's certificate keys, etc...

I'll add: find the requirements documentation for the system. Chances are it doesn't exist, or if it does it is horribly wrong. Go fix it.

Tackle the backlog - those bugs that have been just sitting there for months or even years because other things keep getting prioritized above them. They may be minor, low-priority fixes, but when you solve them, what people will think is "Wow! This developer fixed a bug that no one else could in 3 years! That's always annoyed me but now it's finally fixed!". Next thing you know, they'll be giving you the most important tasks.

Along the way, you'll of course need to talk to people to understand both the code that has the bugs and what is expected of it, and you'll learn years of history from both the development side and the product side, which is invaluable.

I'd also add that one of the big things to look at is the documentation and start improving it as a way for you to understand the systems.

> and start improving it as a way for you to understand the systems.

Be aware that to do this in a time-efficient way, you might need to pull in one of the existing engineers who can give you the lay of the land.

Indeed, it is amazing how many 'soft' things surrounding the software can be improved without requiring you to 'get into the zone' as a developer. Of course, you also need to do the actual work and not get bogged down in these peripheral tasks.

I'm starting to think that actually writing code is more the "peripheral task" in the senior developer role - I find I spend most of my time working with other devs, doing code reviews, dealing with administrative stuff like our organization's change management process, and handling larger-scale, higher-level stuff like making our deployment process sane.

It's unusual these days that I just grab a feature or a hotfix and knock it out - generally only the ones too big and scary for anyone else to handle, and my guys are good enough that those are pretty rare.

Like a few others here, I just had to onboard to a project as well. I made a lot of mistakes, but learned a lot.

Mistake 1 - Was too focused on committing code and getting my first PR merged. I wanted to prove I could commit code quickly, and that PR is missing a lot of things and had to be patched later.

Mistake 2 - Not asking for (enough) help. I did a short pairing session that got me up to speed, I should have asked for a few more. I also should have asked "What's important in this code?" I had to learn that the hard way.

Mistake 3 - Ignoring the full release lifecycle. On day 1 you should first learn how issue management works, and then how the software is released to production. Releases caught me by surprise, and I ended up having to put hotfixes in.

Mistake 4 - I let QA become a second class citizen. Find out who and how your code will be QA'd, and try and be active in that process.

Mistake 5 - Ignoring important "sub-systems" as they didn't seem to directly pertain to my work. For instance, I knew we had a feature toggle system, and I was told I didn't need to use it for my work, but looking back, it would have been nice to know and use. I also was told to ignore end-to-end tests since my work was so small. Well guess what held up release because my code broke it? Yep, end-to-end tests.

Mistake 6 - Modifying code without reading tests and understanding the underlying data. A lot of people have mentioned this, and it can't be emphasized enough. A lot of tests have mock data at the top, why ignore that, it tells you what the code is working with. 'Describe' blocks tell you exactly what methods are supposed to do and what's important about them.
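To illustrate the point about mock data: in a toy sketch like the one below (function, data, and names all invented), the fixture at the top of a test file already documents the shape of the data the code works with, before you read any implementation:

```python
def total_active_balance(accounts):
    """Sum the balances of active accounts (toy function for illustration)."""
    return sum(a["balance"] for a in accounts if a["active"])

# Mock data at the top of the test file: this alone tells a new dev that
# accounts carry an 'active' flag and a numeric 'balance'.
MOCK_ACCOUNTS = [
    {"id": 1, "active": True,  "balance": 100.0},
    {"id": 2, "active": False, "balance": 50.0},
    {"id": 3, "active": True,  "balance": 25.0},
]

def test_total_active_balance_skips_inactive():
    # The test name reads like a 'describe' block: it states what matters
    # about the method (inactive accounts are excluded).
    assert total_active_balance(MOCK_ACCOUNTS) == 125.0

test_total_active_balance_skips_inactive()  # normally run by the test runner
```

Reading the fixture and the test names in this order is often faster than reading the implementation itself.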

So, I made a lot of mistakes it seems, but the most important thing I did right was this:

Whenever I realized a mistake, I fixed it and I moved on. I didn't let it get me down, and I won't make that mistake again. That's ultimately what a good dev is, someone who can red/green/refactor their mistakes with minimal oversight.

Turns out, this happens pretty frequently, getting up to speed on new code bases is a skill in itself IMO. After you've done it a few times, it does get easier, but still very annoying.

I try to go in with an open mind and assume the developers who went before were at least competent and had good reasons to do things the way they did. Sometimes this charity is unfounded, and the code truly sucks, but usually some poor decisions can make sense when you have more context.

Ultimately, I just dive in and start fixing bugs or implementing features. I'll start with the build to understand what the project actually consists of, then I rely heavily on shell tools like grep and find to explore the code.

I once inherited a fairly large C++ application that could only be built on THE dev box, or a copy of it. Only one old timer understood it, it was his meal ticket, and he wasn't talking. It used autotools, so I converted it to CMake so I could build locally. Then I spent about two weeks just reading the code, started at main() and wrote everything down in an architecture doc on the wiki (file and method level, only documenting method internals that were really hairy). Then as enhancements or bugs came up I'd flesh out the doc some more.

Another time I inherited an old IE6-7 app that had to be modernized. Besides the MS specific stuff, it was a mess of javascript spaghetti. Again here, I had to figure what files where actually used by the app, so I grepped log files and browser network logs to figure what was actually loaded, and just read the code. The first major project I did was to remove many hundreds of unused files.
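That "grep the logs to find what's actually loaded" step can be sketched as a set difference. The log format, file names, and paths below are invented stand-ins for illustration:

```python
import re

# Illustrative stand-ins: a few access-log lines and the .js files on disk.
log_lines = [
    '10.0.0.5 - - "GET /js/app.js HTTP/1.1" 200',
    '10.0.0.5 - - "GET /js/util.js HTTP/1.1" 200',
]
files_on_disk = {"js/app.js", "js/util.js", "js/legacy/menu.js", "js/ie6-hacks.js"}

# Which .js files were actually requested, according to the logs?
loaded = set()
for line in log_lines:
    m = re.search(r'GET\s+/(\S+\.js)', line)
    if m:
        loaded.add(m.group(1))

# Anything on disk but never requested is a candidate for removal.
unused = sorted(files_on_disk - loaded)
print(unused)
```

In a real cleanup you would run this over weeks of logs (and all entry points), since a file not loaded today may still be loaded on some rare code path.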

"I once inherited a fairly large C++ application that could only be built on THE dev box, or a copy of it. Only one old timer understood it, it was his meal ticket, and he wasn't talking. It used autotools, so I converted it to CMake so I could build locally. Then I spent about two weeks just reading the code..."

I am interested to know what happened to the old timer's meal ticket?

A lot of being productive as you gain experience comes from working on the right things, so you produce more with the same amount of effort.

What this means is you want to start with figuring out goals. What is the project's goal? What are the constraints? Why are things being done the way they are? What does success mean?

Once you know that you can start doing tactical stuff (and since answering those questions can take a while you can probably do lots of other stuff in parallel, as discussed in other comments).

Two relevant practical talks, from PyCon that was this weekend:

1. I gave a talk on how to choose how and where to test your software (

2. Awesome talk on how to structure documentation - prose version at, video version at

Start reading the code. Ask "why" a lot - the code can tell you "what" and "how", but "why" isn't always there and is the most important piece of information. If the codebase has been around for a while, this may also surface places where the reasoning no longer aligns with reality; these are great targets for refactoring.

There's an interesting side-effect of this practice if you've got less-experienced devs on the team: it can encourage them to also ask more questions more confidently.

I think most of the replies are solely technical. Sure that is important, but I'd say you need to also spend time with users, customers, management, marketing etc to find out what they want out of the system. Also HR issues - can you hire & fire people? Get money for resources? Bring in new tools?

It's often easy to dig into the code as that is familiar, but that probably won't be what you'll be judged on down the track.

Your team, especially if their morale is subpar, is your number one asset. Code and deadlines will come and go, but the team, or the lack thereof, is (in reality) all you have.

If you're not actively engaging them for input, insights, etc. then you could be short term smart and long term foolish.

Even as a senior dev you're still 75% manager and, at best, 25% leader. Your duty is to put your people into a position to succeed. And until you know it all (sarcasm), they're going to have thoughts and feelings on what that future looks like.

Allowed freedom of choice, I'll start as follows.

1. Build on my machine

2. Reproduce issues

3. Fix bugs -- even typos or refactors

4. See the fix through code review, merging, and deployment to qa/dev/whatever

When I start a new project I'm most concerned with the reproducibility of the development process. A lot of the time, other people are blind to the little tweaks that have occurred over the years that enable them to build, test, and deploy a system.

Those tweaks often give a lot of insight into what is going on. The ability to push-button deploy is also a bellwether of what is to come.

"Clean Code" by Robert Martin can certainly help with better engineering practices. Other books that come to mind are "Continuous Integration" by Duvall and Matyas, and "Continuous Delivery" by Humble and Farley.

Apart from that, I can only recommend to play with the tech stack in a throw-away prototype before using it for real, because the first decisions you make in a project tend to be those that are hardest to change later on.

Approach the task with humility. First observe, participate and learn, then come up with improved practices.

This is a symptom of a problem you might help to address: companies spend a fortune on recruitment and next to nothing on onboarding. The problem is multiplied when there is a lot of technical debt or weak infrastructure, because they make onboarding harder.

Often, the people who grow up with a mess don't see the issue because they learn it a little at a time as it's being created, but bring a new person in and require that they understand it all to be successful and you have a real problem.

I'm a little confused by both the question and the current answers. Maybe because, when I've had that position, there was a project manager or similar who dictated what I should do and what was expected from me.

Also, being new to a tech stack is enough of a handicap to keep me busy getting up to speed, so adding quality and better engineering practices had to wait.

Actually, the first time I found myself in such a situation, I later regretted not having invested much more time in learning the framework, so I could have detected the bad design that later slowed us down.

Pair program with a few people for half a day each, for a week or two, and you'll become productive both with the codebase and with your teammates (as long as you don't capture their keyboards). Spend the rest of the day on everything else mentioned in @Jedi72's comments and get the larger context.

Depends on what is expected of you, as well as your margin for manoeuvre. Some seniors are supposed to do one feature and then guide the juniors to replicate that feature, others will have more responsibility, etc.

If you are the bridge between business and dev, you have to manage business expectations and point what needs to be changed. If you are the technical guidance of the team you have to be on top of code quality and architecture.

In the end, the role name doesn't really mean a thing; it's what is expected of you and your legroom that matters. Both are dynamic. In fact, every team member is free to suggest improvements; how fast they are implemented just depends on who approves them. Senior normally just means you don't need a lot of supervision.

I'm similarly one month into a new senior development role. Although I am familiar with the tech stack, the system's functionality, deployment environment, and points of interface with other systems are fairly complex. The other devs on the team will answer specific questions but are pretty stingy with their time when it comes to something like an overview of the system. I could (and have) read through a fair amount of the code, but IMO that is a very inefficient and potentially misleading way to come up to speed on it (you might be able to see what is happening but not necessarily why, something that is important for senior devs to understand). Oddly enough, the organization has a very structured onboarding process for everything except coming up to speed on the specific details of their product's functionality. Frankly, learning everything about their expense reimbursement policy and the health insurance website by day two is nice, but not nearly as useful as coming up to speed on how their product works. I'm finding it pretty frustrating.

Both here and in previous roles, I start with the data model and work my way backwards. If you don't understand the data, it's nearly impossible to understand what the system does. In previous roles, I'd also browse through the production environment (e.g., AWS) and look at the monitoring pages for the different components (databases, EC2/ECS instances, queues, etc.) in order to get both a sense of the topology of the production environment and where bottlenecks might exist. In previous roles, I've been able to make some significant improvements early on as a result of that. In my current role, utilization is <5% on almost everything, so that hasn't been helpful.

Earlier in my career, I had a team lead who hired me into a role on a complex system, and he scheduled one hour per day with me for a full month to walk through various aspects of the system. Obviously that was a pretty significant time commitment, but I think it paid off for him in that I came up to speed pretty quickly and was quite productive. That always stuck with me, and when I became a team lead, I always budgeted a significant amount of my time to bring new members of my team up to speed. I'm pretty frustrated in my current situation as an unproductive senior developer due to unfamiliarity with the functionality of the system, and confused as to how it's beneficial to the company to operate this way since I am rather well-compensated. I think it's quite penny-wise and pound-foolish for organizations to hire developers and invest as little time as possible in bringing them up to speed.

> I'm pretty frustrated ... and confused as to how it's beneficial to the company to operate this way

It's very likely not a considered strategy; rather the total opposite. Technical managers and by extension teams are, generally, really bad at onboarding.

It's nobody's fault per se, just an unfortunate artefact of giving people responsibility for something they never trained for, usually without any formal plan, and expecting them to just figure it out on instinct.

Of course, most don't. Some do though, really well, and if you find one you've found a gem! It's unfortunately not the norm. Don't frustrate yourself thinking that it is.

Do you have a team you're overseeing or working with?

I'd take the time to ask each member of the team what they are working on, what problems they are having, and how you can assist them in some way.

The product your company/team is building will have a roadmap of features to be delivered; spend time with members of the dev team talking through each of those features, what is required, and how they plan to build them.

As a senior, you may not know the platform, but you can get to know how the team works, offer suggestions, and identify probable pain points. There might be entire sections of code you can work out need to be written which don't necessarily require knowledge of the platform.

All this will take time and be incredibly useful for everyone, and in your spare time you can learn about the platform.

My go-to for learning a new code base is to fix bugs and fix performance problems.

The bugs give me an overview of the system and let me develop my mental map of the code base, the performance problems give me an overview of what's been done wrong.

Write all the unit tests the team has been slacking on. By the time you are done, you will understand all of the code and have full test coverage.

Note to junior engineers reading this: If you don't already know the codebase, this can be a terrible way to get to know it because you get sucked down rabbit holes and end up having spent 2 weeks with not much to show for it.

You need to exercise skill of being able to do an organised exploration of the codebase, probably producing documentation along the way.

Get into the project by working right away on implementing a new feature or fixing an existing bug. Then spend additional time from your spare time learning the new stack. Bam, productive by end of week 1, 2 at the latest.

Refactor code as you read it. If you see a method that doesn't make sense to you, change it. Even if you don't merge your changes, the exercise of trying to solve the same problem the original engineers were trying to solve is, in my opinion, the fastest and most effective way to get up to speed with a codebase.

Err, I would advise you to absolutely not do this as a new employee. Ever.

If you see a method which doesn't make sense to you, ask about it. Why is it running the way it is? What corner case prompted it? Is there testing which covers it?

Only once you're sure that the code really is problematic should you refactor it. Otherwise, you're begging for regressions, new bugs, and conflicts with people who wrote the code in the first place.

Understand the impact of the codebase you're looking at before you try and change it.

My approach is to learn the stack (read books, sketch/prototype).

Get the team to whiteboard the architecture with you. This is better than existing documentation because you will get the full commentary and start building a mental model of the software.

Ask the team about problem areas.

Don't push on major stack/architecture changes until you know the stack/architecture.

Make sure the base engineering practices are there. You should have a seamless workflow from pre-merge code reviews, testing, and deploy.

I'd just pick a bug/feature that seems short and relatively contained and isn't urgent, probably from the backlog.

Inform management, and make sure it'll be deployed when you fix it - you want to learn the entire process of development and not just the code.

Start learning the architecture, stack and development process with the goal of fixing that single bug. You don't have to learn the entire stack/architecture to fix that single bug. Try not to refactor the entire system while you're fixing a bug - the point is to learn how the system works, not to redesign it.

Make notes of the pain points you have - from setting up a dev environment, to lack of documentation or even communication problems.

When you finish, do another bug, perhaps something more challenging.

If you feel that you need to read a book, or do a tutorial on part of the stack you don't know to fix the bug, then do so.

When you're done with a few bugs, you are in a position to know and prioritize what quality improvements and changes could be helpful; try to implement them.

Have the mentality and an unparalleled drive to execute and continuously ship and deliver tangible business value.

A lot of the advice here is the same as what you'd find in the files of open source projects on Github :)

* read the docs

* document what is not documented

* write tests

* look at small issues/todos and write code for them

Fix bugs.

It will help you learn the system and technology better than any documentation will. It also keeps you out of the way of the other developers, who frankly have better things to do than babysit you (just being honest), but it still gives you regular opportunities to chat with everyone as you work through different issues.

Either that or improve monitoring/supportability. Gives you an opportunity to build relationships and earn brownie points with your DevOps/SysAdmins.

Find out what all the blockers are: what is stopping all the devs from being productive consistently.

Find out what are the black holes (the problems that will pull you in and not let go).

Introduce one improvement to the existing processes (automation, documentation such as a wiki, lunch & learn/brown bag/dev days, etc.)

I did this:

- got a broad overview of the code from my team

- agreed to be on-call after just two weeks; learned a lot because I had to move fast to fix issues (and also call people)

- asked for the starting point and the core part of the entire codebase (this helped me understand the code at least 3x faster)

- deployed a feature within 2 weeks of joining

Note: I am not a senior developer, but a software engineer.

Well, first of all, talk to the team. To every single person involved. Be it the CTO or the intern. Have someone who has been on the team for some time onboard you and find out what the current obstacles are. Then just grab one task and start working. You'll find the place you're needed most automatically.

I don't favor this because then you get someone's outdated view of the product, if you get anything at all. In an ideal world, yeah, you should get onboarding and some walkthroughs from the current developers. In reality, this rarely happens in my experience. The code deployed to production is the ultimate source of truth.

I recently witnessed an instance where a senior developer heavily relied on others for information, as you suggest. The team was failing to deliver and the manager was incompetent. The senior developer I hired on with was made a scapegoat and blamed for slowing down the team and causing missed deadlines. This was absurd, of course, but she got away with it, and he was fired.

I've onboarded as a senior developer onto a new project a few different times: twice at a startup, once at a largish company, once at a very large company, and once at a coming-out-of-startup-phase company (that time as a hybrid dev-manager). A few pointers:

1. First thing is to come up with a ramp-up plan. Before you touch any code, before you fix bugs, make sure you have a 30- and 90-day plan for where you need to be and what you need to know. That plan should include coming up with questions for people and meeting with them to ask them, understanding the company's strategy at a sufficient level, and, yes, understanding the code base and development procedures.

2. As a Senior Developer, it's good to know where the project is now, but people are looking to you to know where the project will be in 6-9 months. So, while fixing bugs is good, don't get bogged down in that - after some initial familiarization, take up tasks that let you focus on big picture architectural and process issues.

3. Take time to get to know your colleagues. Have lunch with them. Have non-work conversations. Start to learn what interests them. Without explicitly asking, pay attention to clues about career goals and what they find challenging. As a senior developer, demonstrating that you are genuinely interested in your colleagues goes a long way towards earning trust and getting embedded into the team faster.

3a. Get a sense of the team's culture. Are there cultural norms you're not used to? Are these norms good/bad/indifferent?

4. Given that you are a Senior Developer, you should make sure your team is hitting a lot of basics. Are there feedback loops, both business-wise and process-wise to your team? If not, you should work on getting these in place (or at least making sure that it happens). Business feedback loops should include some kind of KPI report, adoption report, etc. Process feedback loops usually include regular, honest and constructive retrospectives. Usually if you're hired as a senior developer on a team, most of your teammates will have limited experience in these.

5. Whenever possible, share your perspective. One of the big differences between a senior developer and non-senior developers is that you've seen a few projects, and you've probably seen pitfalls along the way. That's probably a big reason why you were hired. A great way to add value to a team when you still don't know the nitty-gritty is being able to say "It sounds like we're doing X. I was on a team once that did X, and A, B and C happened." Of course, always do it from a point of view of humility - there might be new constraints that you didn't know about.

Beyond that, a lot of it will depend. If you're joining a large company, you should spend some time doing "company networking," i.e., meeting other senior developers and the project managers your team works with, and understanding not just your project but other projects near you in the org. If you're joining a start-up, chances are there is a deadline in a few months, so the emphasis will be much more on delivering something immediately. Writing documentation can sometimes be helpful, but if you're at a startup that has now hired its full allotment of engineers, writing a ton of documentation about how to get up to speed is of limited use. Fixing bugs can sometimes be a good approach, but sometimes, for teams that didn't have a senior developer before, the proper fix involves a design change. Is there someone else who is also interested in raising the bar for engineering practices but doesn't know where to begin? Work with that person, since you have the experience as a senior developer, and that person understands the specifics of the team.

Starting with a new (ish?) machine, write down everything you need to do to get your development environment set up and building the code.

Now, take a step back. How easy was this? It should ... must... be absolutely frustration free. If not, fix that.

To come up to speed on the tech stack, go sit with the other developers. Get a feel for how they are using the technology. Then, start picking off some smaller issues from the backlog and try and fix those.

Don't make assumptions. Learn the tech without an opinion. Sure, you won't like some of it (nobody likes everything), but find out if the team likes it. If they do, don't try to change that.

I think it is a little bit of exaggeration to claim Senior Developer title on a project with unfamiliar tech stack.

Every place has its own names for things, but I often see "senior" refer to your interchangeable seniority in the field, not just in the one company/code base. Then for people to grow into "lead" or "principal".

Senior Developer is typically either a junior or intermediate title.

In some places, the title progression for technical professionals is (dev, senior dev, staff dev, principal dev, arch, senior arch), meaning Senior is actually a junior title.

In other places, the title progression is (dev I, dev II, senior dev, staff dev, senior staff dev, principal), meaning Senior is an intermediate title.

There is of course some variability across companies, but it's safe to say that "Senior Developer" is never an actually senior position. People holding it are never in charge of big things.

It's worth noting that most non-enterprise companies (i.e. not Microsoft, IBM, or Oracle) don't implement staff or principal prefixes, making senior the top of the list. I haven't had a principal prefix since I worked for Oracle (and got there via an acquisition where I was a senior), yet I'm always at or near the top of the development ladder for my department.

As always, you need to ask questions, not make assumptions.

To the contrary, I'd say the ability to pick up a product in an unfamiliar tech stack and be productive relatively quickly is one factor that would make you more deserving of the "senior" qualification.

It's the way of the world. Tech stacks have life spans roughly one fifth of the duration of human careers.

Much of the knowledge in one stack translates to another. You can know an RDBMS and pick up, say, Spanner. You can know IIS and pick up Node.

I never cared about what stack someone uses.

Same here. As an example, all ORMs have to model the same abstractions, so after you've worked with one, the next is MUCH easier to pick up. Same with MVC frameworks, etc. There's a lot of convergence among languages and frameworks now.

Close this section

U.S. top court tightens patent suit rules in blow to ‘patent trolls’

By Andrew Chung | WASHINGTON

WASHINGTON The U.S. Supreme Court on Monday tightened rules for where patent lawsuits can be filed in a decision that may make it harder for so-called patent "trolls" to launch sometimes dodgy patent cases in friendly courts, a major irritant for high-tech giants like Apple and Alphabet Inc's Google.

In a decision that upends 27 years of law governing patent infringement cases, the justices sided with beverage flavoring company TC Heartland LLC in its legal battle with food and beverage company Kraft Heinz Co (KHC.O). The justices ruled 8-0 that patent suits can be filed only in courts located in the jurisdiction where the targeted company is incorporated.


In Bombardier fight, Boeing sees ghost of Airbus ascent

China slaps import duties on sugar; experts question impact

The decision overturned a 2016 ruling by the U.S. Court of Appeals for the Federal Circuit, a Washington-based patent court, that said patent suits are fair game anywhere a defendant company's products are sold.

Individuals and companies that generate revenue by suing over patents instead of actually making products have been dubbed "patent trolls."

The ruling is likely to lessen the steady flow of patent litigation filed in a single federal court district in rural East Texas because of its reputation for having rules and juries that favor plaintiffs bringing infringement suits.

Heartland said the ruling will limit the ability to "shop" for friendly courts.

"Individuals and businesses in the U.S. have been unfairly required for decades to defend patent suits in far off locales adding cost, complexity and unpredictably to the intellectual property marketplace," company Chief Executive Ted Gelov said.

Kraft Senior Vice President Michael Mullen said the company was disappointed in the ruling but did not believe it would affect the outcome of its lawsuit.

The dispute began when Kraft filed a patent suit involving liquid water flavorings in Delaware federal court against Heartland, a subsidiary of Heartland Consumer Products Holdings.

Heartland sought to transfer the case to its home base in Indiana, arguing it has no presence in Delaware and 98 percent of its sales are outside of that state. The appeals court denied the transfer last year.

Even though the lawsuit was not filed in Texas, the arguments in the case touched on the peculiar fact that the bulk of patent litigation in the United States flows to the Eastern District of Texas, far from the centers of technology and innovation in the United States.

More than 40 percent of all patent lawsuits are filed in East Texas. Of those, 90 percent are brought by "patent trolls," according to a study published in a Stanford Law School journal.



High-tech firms in particular have been vocal about the need for legislation to curb patent suits, including by limiting where they are filed. Recent efforts in Congress have failed.

Over the years, companies such as Apple, Google, Samsung Electronics Co Ltd and Microsoft Corp have been frequent targets of patent lawsuits, including in East Texas.

Limiting patent lawsuits to where a defendant company is incorporated would potentially make it harder to extract lucrative settlements from businesses being sued, and easier to get cases dismissed.

Such changes could potentially dissuade some cases from being launched in the first place, said Illinois Institute of Technology Chicago-Kent College of Law professor Greg Reilly, who has studied the issue of patent venue.

"This is a positive step for those who think there is a problem of a lot of poor-quality patents being enforced," Reilly said.

A 1990 ruling by the Federal Circuit loosened the geographic limits on patent cases and has served as a blueprint for such cases ever since. The Federal Circuit denied Heartland's transfer by relying on the 1990 ruling.

Heartland urged the Supreme Court to overturn that decision, arguing that the high court's own precedent from 1957 held that patent suits are governed by a specific law allowing suits only where defendants are incorporated.

On Monday, the Supreme Court agreed. Writing the opinion for the court, Justice Clarence Thomas said that, contrary to the Federal Circuit's rationale, the U.S. Congress did not change the rules over where patent suits may be filed since the 1957 decision.

Justice Neil Gorsuch joined the court after it heard arguments in the case and did not participate in the decision.



(Reporting by Andrew Chung; Editing by Will Dunham)

Close this section

James Gosling, the ‘Father of Java,’ joins Amazon Web Services

Amazon Web Services Inc. today pulled off a minor coup with the announcement that it’s hiring legendary engineer James Gosling, the so-called “Father of Java,” as a distinguished engineer.

Gosling, who created the popular Java programming language that is used today in much of the world’s business software, was previously serving as chief software architect at a company called Liquid Robotics Inc., which was acquired in December by Boeing Co.

Gosling wrote Java in the early 1990s while working at Sun Microsystems Inc., before leaving the company after it was acquired by Oracle. He later appeared for a brief stint at Google Inc. before spending the last six years at Liquid Robotics, which is developing an autonomous boat called the Wave Glider.

On Facebook Gosling posted that he’s “starting a new adventure” with AWS, but didn’t reveal exactly what he’ll be working on.

What’s interesting is that Gosling, never one at least in recent years to keep his views to himself, might have caused a bit of upset at AWS at the IP Expo Europe event in 2016, when he suggested that cloud companies were trying to lock in their customers. “You get cloud providers like Amazon saying: ‘Take your applications and move them to the cloud.’ But as soon as you start using them you’re stuck in that particular cloud,” he said at the event, according to The Inquirer.

But because of the importance of Java, Gosling also has a lot of credibility among business decision makers, who are generally quite conservative about the technology they purchase and use. As such, he could well be useful to AWS, which is competing against rivals like Google Inc. and Microsoft Corp. to persuade those decision makers to adopt its cloud services to run more of their workloads.

Still, Gosling admitted at the same event last year that some customers will take a lot of convincing to switch to the cloud. When asked why Liquid Robotics kept all of its information technology in-house, he explained that the company felt it couldn’t use any cloud services at all. One reason he gave at Nvidia Corp.’s GPU Technology Conference last year was that when the company’s craft is at sea, it’s prohibitively expensive to reach the cloud. “When you’re way out in the middle of the ocean, there’s no cell tower,” he said. “Most of the satellite communications suck and are really expensive.”

At the IP Expo Europe, he added other reasons, again rather specific to his company: “In my case, there are no cloud providers [Liquid Robotics] can use, so we end up rolling our own for everything, which is a real pain. I mean, a lot of the cloud providers would make my life hugely easier, but convincing a random coast guard from some random country that they should trust Amazon is really, really hard.”

Given his skill set, it seems like a safe bet that Gosling will lend a hand with Amazon’s Internet of Things-related software and services as well.

With reporting from Robert Hof

Photo: Peter Campbell/Wikipedia

Close this section

What does a hen do with her unfertilized eggs?

A hen does not know if her eggs are fertilised or not. In fact (much like a human) a rooster can be infertile, so a hen's eggs might not be fertilised even if she is in a flock with a rooster.

Many modern breeds and commercial hybrid hens will do nothing with their eggs other than lay them and walk away. Many have had the instinct to brood [sit on their eggs to hatch them] bred out of them over generations. In a modern egg production facility, you do not want a hen to "go broody". When hens are ready to raise chicks, they will stop laying eggs for that period and it's very hard to convince them to give up the idea.

This does not mean that no hens will brood eggs; many breeds still retain their instincts to mother. Silkies, for instance, are renowned for their desire to sit on eggs. Other breeds, such as Orpingtons, Brahmas, Cochins, Marans, Cornish, and others go broody quite regularly. When a hen that has broody instincts lays an egg, she is forming a 'clutch' of eggs. She does nothing to care for these eggs other than hide them in a secure place until she is ready to sit on them. She will continue to lay eggs in this clutch until she has 'enough', which is a number anywhere from seven to as high as 20-plus. Once there are 'enough' eggs, a hormonal switch will occur that will put her into what's best described as a broody trance. She will stop laying eggs and begin to sit on them instead.

There are very good reasons that she does not sit on the eggs from the beginning. Firstly, she needs to continue to eat and drink so that she doesn't lose body condition and can continue to produce eggs for her clutch. Secondly, all the eggs need to begin developing on the same day. An egg does not start forming a chick as soon as it's laid. Instead, the eggs are kind of in a state of suspended animation. Once an egg is above about 98 degrees Fahrenheit for approximately 24 hours, however, it will begin to develop. This way, all the chicks start developing when the hen settles down to sit on them and are all developing at the same time. Then all the chicks will hatch over a short period (usually less than 24 hours) and are all ready to venture out for food.

Two to three days after the first chick has hatched, the mother hen will come out of her broody trance and start to care for the chicks. In the meantime, the chicks will all stay under the mother and require no food or water; they are fed from the remnants of the yolk that is in their body for this purpose. The mother will care for them for a while – the exact time is different for each mother hen. Some care for them only until they are 12 weeks old, some will care for them longer.

Erika Wiggins, owns a small egg farm

This is an edited answer from What does a hen do with her unfertilised eggs?, which originally appeared on Quora. You can follow Quora on Twitter, Facebook, and Google+.


Close this section

Memcached-Backed Content Infrastructure at Khan Academy

Last post, I wrote about how we did profiling on App Engine's Memcached service to plan our new content-serving infrastructure that will allow us to scale our new content tools to many more languages. But that was all talk, and at Khan Academy we're all about having a bias towards action. So let's talk about how we tested the new backend, rolled it out to users, and the issues we ran into along the way!

Recap: Content serving, the old way

Tom and William have written about our content infrastructure in the past, but as a quick recap, we have a custom content versioning system that allows content creators to edit the content and then later publish it to the site. The end goal of that publish process is to build a new version of our content data, which includes a specific version of each item of content (such as a video, exercise, or topic) at Khan Academy. In most cases, these are stored as a pickled python object with the metadata for a particular piece of content – the actual video data is served by YouTube. The goal of our content serving system is to get all that data out to any frontend servers that need it to display data about that content to users, and then atomically update all of that data when a new version is published. The frontend servers can then use this content data to find the URL of a video, the prerequisites of an exercise, or all the articles within a topic.

In principle, this is a pretty simple problem – all we need is a key-value store, mapping (global content version, content ID) pairs to pickled content item objects. Then we tell all the frontend servers which content version to look for (an interesting problem in and of itself, but outside the scope of this post), and they can fetch whatever data they need. But that doesn't make for very good performance, even if our key-value store is pretty fast. Even after we cache certain global data, many times we need to look at hundreds of items to render a single request. For instance, to display a subject page we might have to load various subtopics, individual videos and articles in the subject, exercises reinforcing those concepts, as well as their prerequisites in case students are struggling.
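In rough Python terms, that abstraction is just a composite-keyed store of pickled items. Here's a minimal sketch, with a plain dict standing in for the real backing store; the function names are hypothetical, not Khan Academy's actual API:

```python
import pickle

# In-memory stand-in for the real backing store (memcache, a datastore, etc.).
_store = {}

def put_item(version, content_id, item):
    """Store one pickled content item under a (version, id) composite key."""
    _store[(version, content_id)] = pickle.dumps(item)

def get_by_key(version, content_id):
    """Fetch and unpickle a single content item, or None if it's missing."""
    blob = _store.get((version, content_id))
    return pickle.loads(blob) if blob is not None else None
```

Because the global content version is part of the key, a publish never mutates existing entries; frontends just start reading under the new version.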

In our old system, we solved this problem by simply bundling up all the content data for the entire site, compressing it, loading the entire blob on each server at startup, and then loading a diff from the old version any time the content version changed. This gave us very fast access to any item at any time. But it took a lot of memory – and even more so as we started to version content separately for separate languages, and thus might have to keep multiple such blobs around, limiting how many languages we could version in this way. (Between the constraints of App Engine, and the total size of data we wanted to eventually scale to, simply adding more memory wasn't a feasible solution.) So it was time for a new system.
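A toy sketch of that old bundle-plus-diff scheme, assuming (purely for illustration) that a diff is a dict of changed items plus a list of deleted IDs; the real wire format surely differed:

```python
import pickle
import zlib

def build_bundle(items):
    """Pickle and compress the entire content tree into one blob."""
    return zlib.compress(pickle.dumps(items))

def load_bundle(blob):
    """Decompress and unpickle the whole blob at server startup."""
    return pickle.loads(zlib.decompress(blob))

def apply_diff(current, diff):
    """On publish, update the in-memory copy from a diff instead of
    re-downloading the whole bundle."""
    updated = dict(current)
    updated.update(diff.get("changed", {}))
    for content_id in diff.get("deleted", []):
        updated.pop(content_id, None)
    return updated
```

Reads are then just dict lookups, which is why this was so fast; the cost is holding one full decompressed tree per versioned language in memory.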

Content serving, the new way

After discussing a few options, our new plan was fairly simple: store each content item separately in memcache and fetch it on demand, caching some set of frequently used items in memory. We simulated several caching strategies and determined that choosing a fixed set of items to cache would be a good approach, with some optimizations.

We went ahead and implemented the core of the system, including the in-memory cache, and the logging to choose which items to cache. In order to avoid spurious reloads, we stored each item by a hash of its data; then each server would, after each content update, load the mapping of item IDs to hashes, in order to find the correct version of each item. We also implemented a switch to allow us to turn the new backend on and off easily, for later testing.
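Sketched in Python, the content-addressed scheme might look like the following (dicts stand in for memcache and the instance cache, and all names are hypothetical). Keying by the hash of the data means an item that is unchanged across publishes keeps the same key, which is what avoids the spurious reloads:

```python
import hashlib
import pickle

memcache = {}      # stand-in for App Engine's memcache service
local_cache = {}   # hot items pinned in instance memory

def publish_item(item):
    """Store an item keyed by the hash of its pickled bytes. Returns the hash
    so a per-version manifest can map content IDs to item versions."""
    data = pickle.dumps(item)
    digest = hashlib.sha1(data).hexdigest()
    memcache[digest] = data
    return digest

def get_item(manifest, content_id):
    """Resolve an item through the current version's manifest (id -> hash),
    checking the in-memory cache before falling back to memcache."""
    digest = manifest[content_id]
    if digest not in local_cache:
        local_cache[digest] = pickle.loads(memcache[digest])
    return local_cache[digest]
```

After a publish, a server only needs to reload the manifest; any hash it already has cached is still valid.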

Testing & Iterating

But things were still pretty darn slow. It turned out we had a lot of code that made the assumption that content access was instant. Everywhere we did something of the form

for key in content_keys:
    item = get_by_key(key)
    # do something with item

was a potential performance issue – one memcache fetch is pretty fast, but doing dozens or hundreds sequentially adds up. Rather than trying to find all such pieces of code, we set up a benchmark: we'd deploy both the old and the new code to separate clusters of test servers, and run some scripts to hit a bunch of common URLs on each version. By comparing the numbers, we were able to see which routes were slower, and even run a profiler on both versions and compare the output to find the slow code.

In many cases, we could just replace the above snippet with something like the following:

content_items = get_many_by_key(content_keys)
for item in content_items:
    # do something with item

This way, while we'd have to wait for a single memcache fetch that we didn't before, at least it would only be a single one. In some cases, refactoring the code would have been difficult, but we could at least pre-fetch the items we'd need, and keep them in memory until the end of the request, similarly avoiding a sequential fetch.
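That batched helper plus request-local caching might be sketched like this (names hypothetical; a plain dict stands in for the real memcache client, which would do the fetch as a single multi-get):

```python
memcache = {}        # stand-in for the real memcache client
_request_cache = {}  # request-local cache, cleared at the start of each request

def get_many_by_key(keys):
    """Fetch all missing keys up front, then serve everything from the
    request-local cache so later sequential loops stay in memory."""
    missing = [k for k in keys if k not in _request_cache]
    for key in missing:  # the real client would do one multi-get round trip here
        _request_cache[key] = memcache[key]
    return [_request_cache[key] for key in keys]
```

Pre-fetching into the request cache is what makes the "hard to refactor" cases tolerable: the old one-at-a-time loops can stay, but their lookups hit memory instead of the network.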

This was a pretty effective method – on our first round of testing, half of the hundred URLs we hit were at least 50% slower at the 50th percentile, and one was over 7 times slower! But after several rounds of fixes and new tests, we were able to get the new version faster on average and on most specific URLs. Unfortunately, this didn't cover everything – some real user traffic is harder to simulate, and because caching is less effective on test versions with little traffic, there was a lot of noise in our small sample, and so small changes in performance were hard to spot, even if they affected very popular routes where that small difference would add up.

So we moved on to live testing! Using the switch we had built, we set things up to allow us to more or less instantly change what percentage of our servers were using the new version. Then we rolled out to around 1% of requests. The first test immediately showed a blind spot in our testing – we hadn't done sufficient testing of internationalized pages, and there was a showstopper bug in that code, so we had to turn off the test right away. But with that bug fixed, we ran the next test for several hours on a Friday afternoon. We still saw a small increase in latency overall; the biggest contributor was a few routes that make up a large fraction of our traffic that got just a little bit slower. After more improvements, we moved on to longer tests on around 5%, then 25%. On the last test, we had enough traffic to extrapolate how the new backend would affect load on memcache, and scale it up accordingly before shipping to all users.
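A rollout switch like the one described above is often implemented by hashing a stable request attribute into buckets, so the same user consistently lands on the same backend while a config value moves the threshold from 1% to 5% to 25%. A sketch under those assumptions (not the actual mechanism, which routed by server rather than necessarily by user):

```python
import hashlib

def use_new_backend(user_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a user to one of 100 buckets; buckets
    below the threshold get the new backend. Raising the threshold
    only adds users, it never flips existing ones back and forth."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent
```

Because the hash is stable, turning the experiment off is just setting the threshold to 0 — the "more or less instant" switch the post describes.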

Rollout & Future work

Finally, we were able to roll out to all users! Of course, there were still a few small issues after launch, but for the most part things worked pretty well, and our memory usage went down noticeably, as we had hoped. We still haven't rolled out to more languages yet – but when the team working on the tooling is ready to do that, we'll be able to.

We do still have some outstanding problems with the new system, which we're working to fix. First, we've opened ourselves up to more scaling issues with memcache – if we overload the memcache server, causing it to fail to respond to requests, this sometimes causes the frontend servers to query it even more. Now that we depend on memcache more, these issues are potentially more common and more serious. And the manifest that lists the hashes of each item in the current content version is still a big pile of data we have to keep around; it's not as big as the old content bundle, but we'd love to make it even smaller, allowing us to scale even more smoothly. If you're interested in helping us solve those problems and many more, well, we're hiring!
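The retry-amplification problem described above (an overloaded memcache getting hit even harder by frontends) is commonly mitigated with a circuit breaker. This is a generic sketch of that technique, not the fix the team actually shipped:

```python
import time

class CircuitBreaker:
    """After repeated failures, stop sending requests for a cooldown
    period instead of piling more load onto a struggling backend."""

    def __init__(self, max_failures=5, cooldown_seconds=30, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.opened_at = None   # half-open: let traffic probe again
            self.failures = 0
            return True
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()

    def record_success(self):
        self.failures = 0
```

Callers check `allow_request()` before each memcache fetch and fall back to the slower path (or a cached default) while the circuit is open.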


How Anker is beating Apple and Samsung at their own accessory game

Steven Yang quit his job at Google in the summer of 2011 to build the products he felt the world needed: a line of reasonably priced accessories that would be better than the ones you could buy from Apple and other big-name brands. These accessories — batteries, cables, chargers — would solve our most persistent gadget problem by letting us stay powered on at all times. There were just a few problems: Yang knew nothing about starting a company, building consumer electronics, or selling products.

“I was a software engineer all my life at Google. I didn’t know anyone in the electronics manufacturing world,” Yang tells me over Skype from his office in Shenzhen, China. But he started the company regardless, thanks in no small part to his previous experience with Amazon’s sellers marketplace, a platform for third-party companies and tiny one- or two-person teams interested in selling directly to consumers. He named the company Anker, after the German word for ship anchor.

Anker has since become the most popular brand of portable battery packs on Amazon.

Portable chargers had a bit of a standout moment last summer, when players of the wildly popular and battery-hogging Pokémon Go could be seen roaming the streets, their phones constantly plugged in. But these accessories have existed for years as a remedy for our battery woes since the standard smartphone tends to last no more than a day on a single charge. So in airports, the back of cabs, and on city streets we’re plugging into lithium-ion slabs in our pockets and bags to stay connected. The market for portable battery packs generated $360 million in the 12 months ending in March 2017 in the US alone. The brands behind these packs are largely anonymous — Kmashi, Jackery, and iMuto — and they often stay that way.

Except Anker. The steady rise of the company’s profile is proof that it’s possible to meet one very specific consumer need and ride that wave as it continues to ripple out to other markets. A majority of Anker’s sales come from cables and wall chargers, and it’s now moving into the smart home and auto market — anywhere a plug and a cable can solve a problem.

Yang and his team started a company with the sole purpose of selling a better third-party accessory. But they stumbled onto a more lucrative reality: mobile phones, once niche luxury items, are now ubiquitous centerpieces of our digital lives. Each of these phones, and all the products that connect to them, need their own cable and plug. And each and every day these devices die before we want them to.

If there’s anything you should know about smartphone development over the last 10 years, it’s this: batteries are not lasting longer. Phones may be thinner and faster, with larger screens and better cameras, but they don’t stay powered for longer than a day, perhaps 36 hours at best.

The culprit is the fundamental science of lithium-ion batteries, which are tremendous at storing energy but only within limited size and capacity constraints.

The lithium-ion batteries we have today operate at about a fifth of their potential storage capacity, says Lynden Archer, a professor of chemical engineering at Cornell University. And yet that’s still at about 90 percent of the maximum potential today’s battery science allows. “A breakthrough would require a new paradigm,” he says.

That stagnation has allowed Anker to flourish. “A huge part of Anker’s success is the fact that our batteries don’t last long enough,” says Joanna Stern, a technology columnist and gadget reviewer for The Wall Street Journal. “They saw that accessories needed to be made to address customers’ pain points. They’re making the chargers, the cases, the cords — and they’re making it for affordable prices so you can have one of these in every place you are.”

Mobile battery packs weren’t the company’s initial goal. In the late 2000s, Yang identified a consumer need not being met: well-built, reliable laptop battery replacements. “You have a Dell or HP laptop — let’s assume you got it in 2009 — and in 2011 the battery is dying and you want to get a new one,” he says. Back then you had two choices: buy directly from Dell or HP, with a high price tag, or buy a white-label battery that’s cheap but poorly made. “Which one do I want to buy?” asks Yang. “The answer is neither of them.”

Yang saw a desire for a better type of accessory — one that wouldn’t cost as much as a replacement straight from the original manufacturer, but that would be of a high enough quality to earn consumers’ trust. Before he could sell that product, he needed to figure out how to make it.

It was a long and painful process. After he quit his job at Google in July 2011, Anker took 12 months just to prototype its first laptop battery. That was even after Yang and the core team moved to Shenzhen to find reliable manufacturing partners. “I knew that if I stayed in California and had people FedEx me prototypes in a week, it was just not going to work,” he says.

Many hardware companies, especially crowdfunded ones in the US, learn that lesson the hard way, missing deadlines and hitting snags that lead to months-long delays. A solid supply chain is so crucial to a hardware company’s survival that there’s an entire consulting industry around helping startups find suppliers and set expectations.

Critical to avoiding these pitfalls was Dongpong Zhao, Google’s then-head of sales in China. Zhao joined Anker in early 2012, and helped Yang build out the company’s supply chain. Anker was made up of around ten people at the time. “Think of it as a small family business instead of like an actual company,” Yang says. The company tirelessly built out a supply chain in its first year and began testing its first products — the laptop chargers and batteries it would begin selling direct to consumers on Amazon.

From there, Anker ventured into smartphone batteries with a replacement unit for the HTC Sensation. “We were able to make cellphone battery cells that actually had slightly better capacity,” Yang says. “That got us quite the reputation.” Early on, Anker developed close relationships with suppliers in Asia, including Panasonic and anode supplier BTR, to help test and quickly develop new batteries.

Yang says he and Anker’s small team “definitely saw the explosion of smart devices” as its biggest and most obvious opportunity, and Anker started aggressively expanding into portable battery packs, wall chargers, and cables. By the end of 2012, after shifting more resources toward portable chargers, Yang says Anker went from selling 100 to 1,000 products per day.

“The challenge wasn’t selling products,” Yang says. “It was making products and making sure they were high-quality as well. That’s why we spent a majority of our effort on R&D and product development.” The company does a majority of its sales directly to consumers over Amazon Marketplace, where a combination of strong reviews, low prices, and prominent placement in search rankings can turn a single product into a lucrative line.

For the accessories market, which tends to piggyback off trends in mainstream consumer electronics, a true breakthrough is rare. But Anker found one in charging, based on the realization that while batteries may not be improving, charging time certainly was. According to a study conducted last year by PhoneArena, it took more than two hours on average to charge a device to 100 percent in 2013. Today, it can be done in almost half the time.

So Anker set out to make the fastest chargers available.

Take the PowerPort 5, a matte black rectangle no larger than a deck of playing cards, with five USB ports. When it was first introduced in 2015, it was the only accessory on the market capable of charging five devices simultaneously at optimal speed. Or the company’s standard PowerCore portable charger, roughly the size of a credit card and just under one inch thick — a white block encasing 10,000mAh of lithium-ion battery cells. It has four pinholes that light up in LED blue when you press an elliptical button on the top to check its remaining charge. It can refill a depleted iPhone 7, at just over 60 minutes per charge, nearly four times over before it needs to be recharged.
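A rough sanity check on that "nearly four times over" claim, assuming an iPhone 7 battery of about 1,960 mAh and roughly 80% effective transfer efficiency (typical losses from voltage conversion and the cable — both figures are assumptions, not from the article):

```python
# Back-of-the-envelope: how many full phone charges fit in the pack?
pack_mah = 10_000    # PowerCore capacity
phone_mah = 1_960    # assumed iPhone 7 battery capacity
efficiency = 0.8     # assumed effective transfer efficiency

full_charges = pack_mah * efficiency / phone_mah
print(round(full_charges, 1))  # about 4 full charges
```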

Most Anker charging products have one signature: the PowerIQ logo. Launched in 2013, the company’s proprietary charging standard is now present on nearly all of its batteries and wall plugs. The technology, carried by a small chip inside each charger, identifies whatever device is being plugged in, be it an iPhone 7 Plus, Google Pixel, or an iPad Pro 9.7-inch, in order to detect and deliver the maximum current the product allows. Anker says the technology can shave hours off the amount of time it takes to reach a full charge. A next-generation version of the chip, a sequel to PowerIQ, is slated to start shipping in new Anker charging products later this year, allowing for smaller and lighter accessories.

On top of its products’ technical advantages, Anker banks on looks to make its chargers stand out. Brand manager Elisa Lu says every device’s packaging is carefully thought out from the moment customers receive the product, to when they open it and use it. Devices come in a white-and-light-blue box with the name Anker printed in all caps across the front. Inside is a meticulously packaged arrangement of lightweight cardboard. Anker products carry that new gadget smell — a blend of peculiar odors from the evaporation of chemical compounds in the plastics and the epoxy coatings of the device.

Besides its red cables, which Anker offers in both standard and braided nylon versions, and a single one-off red battery pack, the company sells products in only two color options: black and white. All this has a strategic business purpose. “When our consumers receive our products, we want to make sure they know they are getting it from a reliable and trustworthy company,” Lu says. Brand trust is a pivotal consideration when purchasing what may seem like random electronics off Amazon.

Inside the box with Anker’s larger products, you’ll also find a small square piece of paper with two questions: are you happy, or are you unhappy? If you’re unhappy, a block of text instructs you to contact Anker customer support via phone, email, or on its website. If you’re satisfied, it asks you to tell your friends or family. Better yet, leave an Amazon review.

In many ways, Anker’s success is born from the failures of premier manufacturers like Apple and Samsung. Where those companies introduce points of friction — like ever-thinner devices with short battery lives — Anker offers a remedy.

“My feeling is that Anker is a success because Apple offers such subpar and expensive accessories,” says Stern, who’s often recommended Anker products in her WSJ column. It may be surprising that Apple sat idly as the accessory market ballooned around it; it took the company years to develop its own battery case for the iPhone to compete with Mophie. But Apple has always favored high margins on premium products, even the cables and earbuds it sells in the Apple Store.

This has given ample room to companies like Anker, with the low overhead of an e-commerce business, to sell similar products of near-identical quality for $10 to $20 cheaper. And where Apple and others failed to develop products of their own like multi-port wall plugs and portable chargers, Anker saw the opportunity and capitalized on it. The company is quick to fill gaps created by smartphone manufacturers. When Apple removed the headphone jack on the iPhone 7, for instance, Anker saw a giant opportunity to begin producing new dongles and other cable adapters to help consumers adjust.

But mastering Amazon was the real key to securing Anker’s future.

If you head over to Amazon right now and peruse the retailer’s most popular smartphone accessories, you’ll find Anker products at the top. Search for a generic accessory like “cellphone wall charger” or “Lightning cable” and you’ll find the Anker brand high up in the search results, typically with a 4.5 or 5-star rating. In fact, Anker products take up the first five slots on Amazon’s list of best-selling portable batteries. And alongside the ratings, you’ll find thousands of customer reviews — reviews that the company scans meticulously for ideas on how to improve its development process.

Two years before he left Google, Yang became intimately familiar with Amazon’s Marketplace when he built an automated system for a friend who was selling third-party products on Amazon as a side business. It essentially took care of aspects of the business like inventory, logistics and fulfillment, and sales tracking. “It took me two months, evenings and weekends, to develop this system,” Yang explained in a video produced by Amazon last year to promote entrepreneurship on its platform. “Within a month, she was fulfilling 300 orders a day.”

Building this system gave Yang invaluable insight into how third-party selling on Amazon functioned. He discovered what worked and what didn’t, and how entire brands could crop up overnight and fade into obscurity the next day. Anker thrived by borrowing infrastructure from Amazon and relying on engineering and support in China.

“Amazon provides loads of services — financial services, fulfillment. It actually makes selling a low-barrier thing,” Yang says. Although the company attempted to handle its own fulfillment in its early days, it eventually discovered it could not compete with what Amazon offered. Now, about 95 percent of Anker’s products are listed as “Fulfilled by Amazon.”

You can build an online retail business off Amazon, but making sure your products get exposure is the hard part. “The biggest challenge for Anker is not sales. It’s customer perception,” Yang says. “People think they need [Apple’s] cube charger and stock cable.” Teaching people that there is a superior product out there, and then convincing them to keep buying, is Anker’s core challenge. “Amazon reviews were really enormously helpful,” Yang says. “The Amazon reviews just come and come automatically.”

Silicon Valley is full of breathless mission statements designed to inspire and justify the power and influence of technology, like Apple’s famous “think different” slogan, and Google’s ominous and now defunct “don’t be evil” mantra. Facebook, back in 2012, celebrated its billion-user milestone with an ad comparing the social network to chairs, bridges, and even nations.

Anker has never aspired to such grand ambitions. “At Anker, we can't exactly help you unwind,” the company admits on its Amazon sellers page. Instead, Anker takes a more straightforward approach by solving the inevitable problems technology creates. “Say goodbye to first-world tech woes like oppressive low batteries and limited ports,” the page says. “Say hello to an easier, smarter life.”

Yang is trying to extend this simple philosophy to categories like headphones, speakers, phone cases, and now smart home appliances under a new brand called Eufy. Launched last September, Eufy is Anker’s avenue for selling things like Roomba clones, desk lamps, and bathroom scales.

Expanding into more product categories is a logical evolution for Anker, but it’s also a response to a looming existential threat: Yang says he foresees a future where portable chargers won’t be necessary due to advancements in both fast charging and wireless charging. “I think we all agree that the portable charger isn’t forever,” Yang says. But consumers will always need wall plugs and cables, and Anker sees its goal now as keeping pace with changing standards, like the introduction of USB-C.

PowerCore 5000.

In the meantime, Yang says it’s diversifying with a future expansion into audio, smart home, and automotive product lines. Anker already produces Bluetooth headphones and speakers, but it has aspirations to compete with the likes of Harman Kardon and Bose.

Key to the success of those products is moving to brick-and-mortar retailers. Terrence Wang, the company’s chief marketing officer hired from Procter & Gamble in 2015, says having a presence in Best Buy and Walmart, a process that began last year, is the next stage of Anker’s evolution. “Expanding from online to offline is critical actually,” Wang says. “We are trying to get as many people to know that they have a simple, convenient, portable, fast solution rather than just have to bring their own charger wherever they go. That’s the original brand vision.”

The company’s focus on storefront retail is a response to a second very real threat — dependence on a direct-to-consumer, online-only model. The company has found it difficult these days to launch new products on Amazon. Older products have thousands of reviews, while newer ones need to earn placement in the search rankings and collect testimonials before more cautious buyers pull the trigger. “We’re seeing a bit of a dilemma on that one,” Yang says.

The company is also struggling with its complex product line, with a dizzying number of choices that may alienate consumers who are looking for a simple choice. “We’re converging to this generation approach,” Yang says. “So we’ll try to release a generation of products every 18 to 24 months.” Choosing which products to promote at Best Buy, which to discontinue, and where to put resources for the future are all part of Anker’s growing pains.

The company’s success hinges on convincing — or “educating” — their consumers about the value of what they’re offering. Yang likes to tell a story about a moment he sees play out on repeat every single time he’s at an airport, anywhere on the planet. He sees people, smartphones in hand, rushing from outlet to outlet, looking for space with “the little Apple cube in their hands” and running out of options. “To a lot of people, the original charger and the original cable are the only means to charge their devices, in their mind,” he says. “I look at those people and I always want to go up to them and talk to them.”

Instead, Yang and the Anker team started conducting an annual consumer survey. It has only one question: how often does your smartphone run out of battery? Once every day? Once a week? Once a month? Or perhaps never? “Forty percent of people say their battery ran out at least once last week, another 40 percent say at least once last month,” Yang says. “As long as these numbers don’t go down, I think we have a lot of work to do.”


HTTPS on Stack Overflow: The End of a Long Road

Today, we deployed HTTPS by default on Stack Overflow. All traffic is now redirected to https:// and Google links will change over the next few weeks. The activation of this is quite literally flipping a switch (feature flag), but getting to that point has taken years of work. As of now, HTTPS is the default on all Q&A websites.

We’ve been rolling it out across the Stack Exchange network for the past 2 months. Stack Overflow is the last site, and by far the largest. This is a huge milestone for us, but by no means the end. There’s still more work to do, which we’ll get to. But the end is finally in sight, hooray!

Fair warning: This is the story of a long journey. Very long. As indicated by your scroll bar being very tiny right now. While Stack Exchange/Overflow is not unique in the problems we faced along the way, the combination of problems is fairly rare. I hope you find some details of our trials, tribulations, mistakes, victories, and even some open source projects that resulted along the way to be helpful. It’s hard to structure such an intricate dependency chain into a chronological post, so I’ll break this up by topic: infrastructure, application code, mistakes, etc.

I think it’s first helpful to preface with a list of problems that make our situation somewhat unique:

  • We have hundreds of domains (many sites and other services)
  • We allow user submitted & embedded content (e.g. images and YouTube videos in posts)
  • We serve from a single data center (latency to a single origin)
  • We have ads (and ad networks)
  • We use websockets, north of 500,000 active at any given time (connection counts)
  • We get DDoSed (proxy)
  • We have many sites & apps communicating via HTTP APIs (proxy issues)
  • We’re obsessed with performance (maybe a little too much)


The Beginning

We began thinking about deploying HTTPS on Stack Overflow back in 2013. So the obvious question: It’s 2017. What the hell took 4 years? The same 2 reasons that delay almost any IT project: dependencies and priorities. Let’s be honest, the information on Stack Overflow isn’t as valuable (to secure) as most other data. We’re not a bank, we’re not a hospital, we don’t handle credit card payments, and we even publish most of our database, both over HTTP and via torrent, once a quarter. That means from a security standpoint, it’s just not as high of a priority as it is in other situations. We also had far more dependencies than most, a rather unique combination of some huge problem areas when deploying HTTPS. As you’ll see later, some of the domain problems are also permanent.

The biggest areas that caused us problems were:

  • User content (users can upload images or specify URLs)
  • Ad networks (contracts and support)
  • Hosting from a single data center (latency)
  • Hundreds of domains, at multiple levels (certificates)

Okay, so why do we want HTTPS on our websites? Well the data isn’t the only thing that needs security. We have moderators, developers, and employees with various levels of access via the web. We want to secure their communications with the site. We want to secure every user’s browsing history. Some people live in fear every day knowing that someone may find out they secretly like monads. Google also gives a boost to HTTPS websites in ranking (though we have no clue how much).

Oh, and performance. We love performance. I love performance. You love performance. My dog loves performance. Let’s have a performance hug. That was nice. Thank you. You smell nice.

Quick Specs

Some people just want the specs, so quick Q&A here (we love Q&A!):

  • Q: Which protocols do you support?
  • Q: Do you support SSL v2, v3?
  • Q: Which ciphers do you support?
  • Q: Does Fastly connect to the origin over HTTPS?
    • A: Yes, if the CDN request is HTTPS, the origin request is HTTPS.
  • Q: Do you support forward secrecy?
  • Q: Do you support HSTS?
    • A: Yes, we’re ramping it up across Q&A sites now. Once done we’ll move it to the edge.
  • Q: Do you support HPKP?
    • A: No, and we likely won’t.
  • Q: Do you support SNI?
    • A: No, we have a combined wildcard certificate for HTTP/2 performance reasons (details below).
  • Q: Where do you get certificates?
    • A: We use DigiCert, they’ve been awesome.
  • Q: Do you support IE 6?
    • A: This move finally kills it, completely. IE 6 does not support TLS by default (though TLS 1.0 can be enabled), and we do not support SSL. With 301 redirects in place, most IE6 users can no longer access Stack Overflow. When TLS 1.0 is removed, none can.
  • Q: What load balancer do you use?
  • Q: What’s the motivation for HTTPS?


Let’s talk about certificates, because there’s a lot of misinformation out there. I’ve lost count of the number of people who say you just install a certificate and you’re ready to go on HTTPS. Take another look at the tiny size of your scroll bar and take a wild guess if I agree. We prefer the SWAG method for our guessing.

The most common question we get: “Why not use Let’s Encrypt?”

Answer: because they don’t work for us. Let’s Encrypt is doing a great thing. I hope they keep at it. If you’re on a single domain or only a few domains, they’re a pretty good option for a wide variety of scenarios. We are simply not in that position. Stack Exchange has hundreds of domains. Let’s Encrypt doesn’t offer wildcards. These two things are at odds with each other. We’d have to get a certificate (or two) every time we deployed a new Q&A site (or any other service). That greatly complicates deployment, and either a) drops non-SNI clients (around 2% of traffic these days) or b) requires far more IP space than we have.

Another reason we want to control the certificate is we need to install the exact same certificates on both our local load balancers and our CDN/proxy provider. Unless we can do that, we can’t failover (away from a proxy) cleanly in all cases. Anyone that has the certificate pinned via HPKP (HTTP Public Key Pinning) would fail validation. We’re evaluating whether we’ll deploy HPKP, but we’ve prepped as if we will later.

I’ve gotten a lot of raised eyebrows at our main certificate having all of our primary domains + wildcards. Here’s what that looks like:

Main Certificate

Why do this? Well, to be fair, DigiCert is the one who does this for us upon request. Why go through the pain of a manual certificate merge for every change? First, because we wanted to support as many people as possible. That includes clients that don’t support SNI (for example, Android 2.3 was a big thing when we started). But also because of HTTP/2 and reality. We’ll cover that in a minute.

Certificates: Child Metas (meta.*)

One of the tenets of the Stack Exchange network is having a place to talk about each Q&A site. We call it the “second place”. As an example, each site’s meta subdomain exists to talk about its parent site. So why does that matter? Well it doesn’t really; we only care about the domain structure here. It’s 4 levels deep.

I’ve covered this before, but where did we end up? First the problem: the top-level wildcard does cover the sites themselves (hundreds of them), but it does not cover the child meta domains one level deeper. RFC 6125 (Section 6.4.3) states that:

The client SHOULD NOT attempt to match a presented identifier in which the wildcard character comprises a label other than the left-most label (e.g., do not match bar.*.example.net)

That means we cannot use a wildcard at the meta.* level. Well, shit. So what do we do?
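The RFC 6125 rule quoted above can be sketched as a small matcher (a simplified illustration, not real certificate validation, which handles more cases such as partial-label wildcards):

```python
# Sketch of the left-most-label wildcard rule: "*" may only be the
# entire left-most label, and it matches exactly one label.

def wildcard_matches(pattern: str, hostname: str) -> bool:
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if "*" not in pattern:
        return p_labels == h_labels
    if pattern.count("*") != 1 or p_labels[0] != "*":
        return False            # wildcard must be the entire left-most label
    # "*" matches exactly one label, so label counts must line up.
    return len(p_labels) == len(h_labels) and p_labels[1:] == h_labels[1:]
```

This is exactly why a single top-level wildcard can't reach a child meta: the hostname has one label too many, and a wildcard anywhere but the left-most position is disallowed outright.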

  • Option 1: Deploying SAN certificates
    • We’d need 3 (the limit is ~100 domains per), we’d need to dedicate 3 IPs, and we’d complicate new site launches (until the scheme changed, which it already has)
    • We’d have to pay for 3 custom certs for all time at the CDN/proxy
    • We’d have to have a DNS entry for every child meta under the meta.* scheme
      • Due to the rules of DNS, we’d actually have to add a DNS entry for every single site, complicating site launches and maintenance.
  • Option 2: Move all domains to *
    • We’d have a painful move, but it’s 1-time and simplifies all maintenance and certificates
    • We’d have to build a global login system (details here)
    • This solution also creates an includeSubDomains HSTS preloading problem (details here)
  • Option 3: We’ve had a good run, shut ‘er down
    • This one is the easiest, but was not approved

We built a global login system and later moved the child meta domains (with 301s), and they’re now at their new homes. After doing this, we realized how much of a problem the HSTS preload list was going to be simply because those domains ever existed. I’ll cover that near the end, as it’s still in progress. Note that the problems here are mirrored on our journey for the international sites, but were more limited in scale since only 4 non-English versions of Stack Overflow exist.

Oh, and this created another problem in itself. By moving cookies to the top-level domain and relying on subdomain inheritance of them, we now had to move other domains out. As an example, we use SendGrid to send email in our new system (rolling out now). It sends from a separate, non-cookied domain, with links pointing at a subdomain of that domain (a CNAME pointed at them), so that your browser doesn’t send any sensitive cookies. If it sent from our primary domain (or anything beneath it), your browser would send our cookies to them. This is a concrete example of new services, but there were also miscellaneous not-hosted-by-us services under our DNS. Each one of these subdomains had to be moved or retired to get out from under our authenticated domains…or else we’d be sending your cookies to not-our-servers. It’d be a shame to do all this work just to be leaking cookies to other servers at the end of it.
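The cookie-scoping rule driving all this domain shuffling can be sketched as a tiny predicate (a simplified model of browser behavior; real matching also involves the Path attribute, the Secure flag, and the public suffix list):

```python
# Sketch: a cookie set with a Domain attribute is sent to that domain
# and every subdomain beneath it -- which is why third-party services
# had to live on entirely separate domains.

def cookie_sent_to(cookie_domain: str, request_host: str) -> bool:
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    return (request_host == cookie_domain
            or request_host.endswith("." + cookie_domain))
```

Any externally hosted service sitting one label beneath the cookied domain would receive those cookies on every request, so each such subdomain had to move or retire.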

We tried to work around this in one instance by proxying one of our Hubspot properties for a while, stripping the cookies on the way through. But unfortunately, Hubspot uses Akamai which started treating our HAProxy instance as a bot and blocking it in oh so fun various ways on a weekly basis. It was fun, the first 3 times. So anyway, that really didn’t work out. It went so badly we’ll never do it again.

Were you curious why we have the Stack Overflow Blog on its own domain? Yep, security. It’s hosted on an external service so that the marketing team and others can iterate faster. To facilitate this, we needed it off the cookied domains.

The above issues with meta subdomains also introduced related problems with HSTS, preloading, and the includeSubDomains directive. But we’ll see why that’s become a moot point later.

Performance: HTTP/2

The conventional wisdom long ago was that HTTPS was slower. And it was. But times change. We’re not talking about HTTPS anymore. We’re talking about HTTPS with HTTP/2. While HTTP/2 doesn’t require encryption, effectively it does. This is because the major browsers require a secure connection to enable most of its features. You can argue specs and rules all day long, but browsers are the reality we all live in. I wish they would have just called it HTTPS/2 and saved everyone a lot of time. Dear browser makers, it’s not too late. Please, listen to reason, you’re our only hope!

HTTP/2 has a lot of performance benefits, especially with pushing resources opportunistically to the user ahead of asking for them. I won’t write in detail about those benefits; Ilya Grigorik has done a fantastic job of that already.

Hey wait a minute, what about that silly certificate?

A lesser-known feature of HTTP/2 is that you can push content not on the same domain, as long as certain criteria are met:

  1. The origins resolve to the same server IP address.
  2. The origins are covered by the same TLS certificate (bingo!)

So, let’s take a peek at our current DNS:

λ dig +noall +answer <first domain>
; <<>> DiG 9.10.2-P3 <<>> +noall +answer
;; global options: +cmd
<first domain>.      201     IN      A       <IP 1>
<first domain>.      201     IN      A       <IP 2>
<first domain>.      201     IN      A       <IP 3>
<first domain>.      201     IN      A       <IP 4>

λ dig +noall +answer <second domain>
; <<>> DiG 9.10.2-P3 <<>> +noall +answer
;; global options: +cmd
<second domain>.     724     IN      A       <IP 1>
<second domain>.     724     IN      A       <IP 2>
<second domain>.     724     IN      A       <IP 3>
<second domain>.     724     IN      A       <IP 4>

Heyyyyyy, those IPs match, and they have the same certificate! This means that we can get all the wins of HTTP/2 server push without harming HTTP/1.1 users. HTTP/2 gets push, and HTTP/1.1 gets domain sharding via the second domain. We haven’t deployed server push quite yet, but all of this is in preparation.

So in regards to performance, HTTPS is only a means to an end, and I’m okay with that. I’m okay saying that our primary driver is performance, and security alone is not. We want security, but security alone in our situation is not enough justification for the time investment needed to deploy HTTPS across our network. When you combine all the factors above, though, we can justify the immense amount of time and effort required to get this done. In 2013, HTTP/2 wasn’t a big thing, but that changed as support increased and ultimately helped as a driver for us to invest time in HTTPS.

It’s also worth noting that the HTTP/2 landscape changed quite a bit during our deployment. The web moved from SPDY to HTTP/2 and NPN to ALPN. I won’t cover all that because we didn’t do anything there. We watched, and benefited, but the giants of the web were driving all of that. If you’re curious though, Cloudflare has a good write up of these moves.

HAProxy: Serving up HTTPS

We deployed initial HTTPS support in HAProxy back in 2013. Why HAProxy? Because we were already using it, and they added SSL support with version 1.5 back in 2013 (released as GA in 2014). We had, for a time, nginx in front of HAProxy (as you can see in the last blog post). But simpler is often better, and eliminating a lot of conntrack, deployment, and general complexity issues is usually a good idea.

I won’t cover a lot of detail here because there’s simply not much to cover. HAProxy supports HTTPS natively via OpenSSL since 1.5 and the configuration is straightforward. Our configuration highlights are:

  • Run on 4 processes
    • 1 is dedicated to HTTP/front-end handling
    • 2-4 are dedicated to HTTPS negotiation
  • HTTPS front-ends are connected to HTTP backends via an abstract named socket. This reduces overhead tremendously.
  • Each front-end or “tier” (we have 4: Primary, Secondary, Websockets, and dev) has corresponding :443 listeners.
  • We append request headers (and strip ones you’d send - nice try) when forwarding to the web tier to indicate how a connection came in.
  • We use the Modern compatibility cipher suite recommended by Mozilla. Note: this is not the same suite our CDN runs.
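A hypothetical haproxy.cfg fragment shaped like the highlights above (process numbers, socket names, and header handling here are illustrative, not our production config):

```
# Illustrative fragment only - names and binds are made up.
global
    nbproc 4                       # process 1: HTTP/front-end, 2-4: TLS

frontend fe_https
    # Terminate TLS on the dedicated processes
    bind :443 ssl crt /etc/haproxy/certs/ process 2-4
    # Strip any spoofed forwarding headers a client sends (nice try),
    # then set our own to indicate how the connection came in
    http-request del-header X-Forwarded-Proto
    http-request set-header X-Forwarded-Proto https
    default_backend be_web

backend be_web
    # Hand off to the HTTP front-end via an abstract named socket,
    # which keeps the HTTPS<->HTTP handoff overhead very low
    server web abns@haproxy-web send-proxy
```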

HAProxy was the relatively simple first step of supporting a :443 endpoint with valid SSL certificates. In retrospect, it was only a tiny speck of the effort needed.

Here’s a logical layout of what I described above…and we’ll cover that little cloud in front next:

Logical Architecture

CDN/Proxy: Countering Latency with Cloudflare & Fastly

One of the things I’m most proud of at Stack Overflow is the efficiency of our stack. That’s awesome, right? Running a major website on a small set of servers from one data center? Nope. Not so much. Not this time. While it’s awesome to be efficient for some things, when it comes to latency it suddenly becomes a problem. We’ve never needed a lot of servers. We’ve never needed to expand to multiple locations (but yes, we have another for DR). This time, that’s a problem. We can’t (yet!) solve fundamental problems with latency, due to the speed of light. We’re told someone else is working on this, but there was a minor setback with tears in the fabric of space-time and losing the gerbils.

When it comes to latency, let’s look at the numbers. It’s almost exactly 40,000km around the equator (worst case for speed of light round-trip). The speed of light is 299,792,458 meters/second in a vacuum. Unfortunately, a lot of people use this number, but most fiber isn’t in a vacuum. Realistically, most optical fiber is 30-31% slower. So we’re looking at (40,075,000 m) / (299,792,458 m/s * .70) = 0.191 seconds, or 191ms for a round-trip in the worst case, right? Well…no, not really. That’s also assuming an optimal path, but going between two destinations on the internet is very rarely a straight line. There are routers, switches, buffers, processor queues, and all sorts of additional little delays in the way. They add up to measurable latency. Let’s not even talk about Mars, yet.
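A quick sketch to sanity-check that arithmetic:

```python
# Reproduce the back-of-the-envelope latency numbers from above.
EQUATOR_M = 40_075_000      # meters around the equator (worst-case round trip)
C_VACUUM = 299_792_458      # speed of light in a vacuum, m/s
FIBER_FACTOR = 0.70         # light in fiber is roughly 30% slower

def best_case_rtt_ms(distance_m: float) -> float:
    """Theoretical minimum round trip over fiber, ignoring every router,
    switch, buffer, and queue along the way."""
    return distance_m / (C_VACUUM * FIBER_FACTOR) * 1000

print(round(best_case_rtt_ms(EQUATOR_M)))  # 191
```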

So why does that matter to Stack Overflow? This is an area where the cloud wins. It’s very likely that the server you’re hitting with a cloud provider is relatively local. With us, it’s not. With a direct connection, the further you get away from our New York or Denver data centers (whichever one is active), the slower your experience gets. When it comes to HTTPS, there’s an additional round trip to negotiate the connection before any data is sent. That’s under the best of circumstances (though that’s improving with TLS 1.3 and 0-RTT). And Ilya Grigorik has a great summary here.

Enter Cloudflare and Fastly. HTTPS wasn’t a project deployed in a silo; as you read on, you’ll see that several other projects multiplexed in along the way. In the case of a local-to-the-user HTTPS termination endpoint (to minimize that round trip duration), we were looking for a few main criteria:

  • Local HTTPS termination
  • DDoS protection
  • CDN functionality
  • Performance equivalent or better than direct-to-us

Preparing for a Proxy: Client Timings

Before moving to any proxy, testing for performance had to be in place. To do this, we set up a full pipeline of timings to get performance metrics from browsers. For years now, browsers have included performance timings accessible from JavaScript via window.performance. Go ahead, open up the inspector and try it! We want to be very transparent about this; that’s why details have been publicly available since day 1. There’s no sensitive data transferred, only the URIs of resources directly loaded by the page and their timings. For each page load recorded, we get timings that look like this:

Currently we attempt to record performance timings from 5% of traffic. The process isn’t that complicated, but all the pieces had to be built:

  1. Transform the timings into JSON
  2. Upload the timings after a page load completes
  3. Relay those timings to our backend Traffic Processing Service (it has reports)
  4. Store those timings in a clustered columnstore in SQL Server
  5. Relay aggregates of the timings to Bosun (via BosunReporter.NET)
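Step 1 of that pipeline might be sketched like this (the derived metrics and field names in the payload are illustrative, not our actual schema):

```python
import json

# Sketch: turn raw Navigation Timing marks (epoch milliseconds, as the
# browser reports them via window.performance) into a compact JSON payload
# ready for upload. Field names here are invented for illustration.
def to_payload(t: dict) -> str:
    start = t["navigationStart"]
    return json.dumps({
        "dns":     t["domainLookupEnd"] - t["domainLookupStart"],
        "connect": t["connectEnd"] - t["connectStart"],
        "ttfb":    t["responseStart"] - start,   # time to first byte
        "load":    t["loadEventEnd"] - start,    # full page load
    })

raw = {
    "navigationStart": 1000, "domainLookupStart": 1005, "domainLookupEnd": 1025,
    "connectStart": 1025, "connectEnd": 1065,
    "responseStart": 1180, "loadEventEnd": 1450,
}
print(to_payload(raw))  # {"dns": 20, "connect": 40, "ttfb": 180, "load": 450}
```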

The end result is we now have a great real-time overview of actual user performance all over the world that we can readily view, alert on, and use for evaluating any changes. Here’s a view of timings coming in live:

Client Timings Dashboard

Luckily, we have enough sustained traffic to get useful data here. At this point, we have over 5 billion points (and growing) of data to help drive decisions. Here’s a quick overview of that data:

Client Timings Database

Okay, so now we have our baseline data. Time to test candidates for our CDN/Proxy setup.


Cloudflare

We evaluated many CDN/DDoS proxy providers. We picked Cloudflare based on their infrastructure, responsiveness, and the promise of Railgun. So how could we test what life would be like behind Cloudflare all over the world? How many servers would we need to set up to get enough data points? None!

Stack Overflow has an excellent resource here: billions of hits a month. Remember those client timings we just talked about? We already have tens of millions of users hitting us every day, so why don’t we ask them? We can do just that, by embedding an <iframe> in Stack Overflow pages. Cloudflare was already the host of our shared, cookieless static content domain from earlier. But this was done with a CNAME DNS record: we served the DNS, which pointed at their DNS. To use Cloudflare as a proxy though, we needed them to serve our DNS. So first, we needed to test the performance of their DNS.

Practically speaking, to test performance we needed to delegate a second-level domain to them, not a subdomain, which would have different glue records and sometimes isn’t handled the same way (causing 2 lookups). To clarify, Top-Level Domains (TLDs) are things like .com, .net, .org, .dance, .duck, .fail, .gripe, .here, .horse, .ing, .kim, .lol, .ninja, .pink, .red, .vodka, and .wtf. Nope, I’m not kidding (and here’s the full list). Second-Level Domains (SLDs) are one level below; they’re what most sites would be. That’s what we needed to test the behavior and performance of. Thus, a new test domain was born. With this new domain, we could test DNS performance all over the world. By embedding the <iframe> for a certain percentage of visitors (we turned it on and off for each test), we could easily get data from each DNS and hosting configuration.

Note that it’s important to test for ~24 hours at a minimum here. The behavior of the internet changes throughout the day as people wake up, go to sleep, or stream Netflix all over the world as it rolls through time zones. So to measure a single country, you really want a full day, preferably within the work week (e.g. not starting halfway into a Saturday). Also be aware that shit happens. It happens all the time. The performance of the internet is not a stable thing; we’ve got the data to prove it.

Our initial assumption going into this was that we’d lose some page load performance going through Cloudflare (an extra hop almost always adds latency), but that we’d make it up with the increase in DNS performance. The DNS side of this paid off: Cloudflare had DNS servers far more local to users than we do in a single data center, and the performance there was far better. I hope that we can find the time to release this data soon. It’s just a lot to process (and host), and time isn’t something I have in ample supply right now.

Then we began testing page load performance by proxying through Cloudflare, again in the <iframe>. We saw the US and Canada slightly slower (due to the extra hop), but the rest of the world on par or better. This lined up with expectations overall, and we proceeded with a move behind Cloudflare’s network. A few DDoS attacks along the way sped up this migration a bit, but that’s another story. Why did we accept slightly slower performance in the US and Canada? Well at ~200-300ms page loads for most pages, that’s still pretty damn fast. But we don’t like to lose. We thought Railgun would help us win that performance back.

Once all the testing panned out, we needed to put the pieces in for DDoS protection. This involved installing additional, dedicated ISPs in our data center for the CDN/Proxy to connect to. After all, DDoS protection via a proxy isn’t very effective if you can just go around it. This meant we were serving off of 4 ISPs per data center now, with 2 sets of routers, all running BGP with full tables. It also meant 2 new load balancers, dedicated to CDN/Proxy traffic.

Cloudflare: Railgun

At the time, this setup also meant 2 more boxes just for Railgun. The way Railgun works is by caching the last result for a URL in memcached, both locally and on Cloudflare’s end. When Railgun is enabled, every page (under a size threshold) is cached on the way out. On the next request, if the entry is in Cloudflare’s edge cache and our cache (keyed by URL), we still ask the web server for it. But instead of sending the whole page back to Cloudflare, it only sends a diff. That diff is applied to their cache and served back to the client. By nature of the pipe, it also meant the gzip compression for transmission moved from 9 web servers for Stack Overflow to the 1 active Railgun box…so this had to be a pretty CPU-beefy machine. I point this out because all of this had to be evaluated, purchased, and deployed along the way.

As an example, think about 2 users viewing the same question. Picture what each browser receives: it’s almost the same page, so that’s a very small diff. It’s a huge optimization if we can send only that diff down most of the journey to the user.

Overall, the goal here is to reduce the amount of data sent back in hopes of a performance win. And when it worked, that was indeed the case. Railgun also had another huge advantage: requests weren’t fresh connections. Another consequence of latency is the duration and speed of the ramp up of TCP slow start, part of the congestion control that keeps the Internet flowing. Railgun maintains a constant connection to Cloudflare edges and multiplexes user requests, all of them over a pre-primed connection not heavily delayed by slow start. The smaller diffs also lessened the need for ramp up overall.
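To make the diff idea concrete, here’s a toy sketch of the mechanism (our own illustration, not Cloudflare’s actual delta protocol):

```python
import difflib

# Sketch of the Railgun idea: both ends cache the last response per URL,
# and only a delta of the new response travels over the wire.
def make_delta(cached: str, fresh: str):
    """Encode fresh as operations against cached: 'copy' ranges reuse bytes
    both sides already have; only 'insert' text must be transmitted."""
    ops = []
    sm = difflib.SequenceMatcher(a=cached, b=fresh, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", fresh[j1:j2]))
        # 'delete': nothing to send - we simply never copy that range
    return ops

def apply_delta(cached: str, ops) -> str:
    return "".join(cached[op[1]:op[2]] if op[0] == "copy" else op[1]
                   for op in ops)

cached = "<html><body>Question: How do I X? Views: 41</body></html>"
fresh  = "<html><body>Question: How do I X? Views: 42</body></html>"
ops = make_delta(cached, fresh)
wire_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")
print(wire_bytes, "byte(s) sent instead of", len(fresh))  # 1 byte(s) sent instead of 57
```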

Unfortunately, we never got Railgun to work without issues in the long run. To my knowledge, we were (at the time) the largest deployment of the technology, and we stressed it further than it had been pushed before. Though we tried to troubleshoot it for over a year, we ultimately gave up and moved on. It simply wasn’t saving us more than it was costing us in the end. It’s been several years now, though. If you’re evaluating Railgun, you should evaluate the current version, with the improvements they’ve made, and decide for yourself.


Fastly

Moving to Fastly was relatively recent, but since we’re on the CDN/Proxy topic I’ll cover it now. The move itself wasn’t terribly interesting, because most of the pieces needed for any proxy were done in the Cloudflare era above. But of course everyone will ask: why did we move? While Cloudflare was very appealing in many regards, mainly many data centers, stable bandwidth pricing, and included DNS, it wasn’t the best fit for us anymore. We needed a few things Fastly simply did better for us: more flexibility at the edge, faster change propagation, and the ability to fully automate configuration pushes. That’s not to say Cloudflare is bad; it was just no longer the best fit for Stack Overflow.

Since actions speak louder: If I didn’t think highly of Cloudflare, my personal blog wouldn’t be behind them right now. Hi there! You’re reading it.

The main feature of Fastly that was so compelling to us was Varnish and the VCL. This makes the edge highly configurable. So features that Cloudflare couldn’t readily implement (as they might affect all customers), we could do ourselves. This is simply a different architectural approach to how these two companies work, and the highly-configurable-in-code approach suits us very well. We also liked how open they were with details of infrastructure at conferences, in chats, etc.

Here’s an example of where VCL comes in very handy. Recently we deployed .NET 4.6.2 which had a very nasty bug that set max-age on cache responses to over 2000 years. The quickest way to mitigate this for all of our services affected was to override that cache header as-needed at the edge. As I write this, the following VCL is active:

sub vcl_fetch {
  if (beresp.http.Cache-Control) {
      if (req.url.path ~ "^/users/flair/") {
          set beresp.http.Cache-Control = "public, max-age=180";
      } else {
          set beresp.http.Cache-Control = "private";
      }
  }
}

This allows us to cache user flair for 3 minutes (since it’s a decent volume of bytes) and bypass everything else. This is an easy-to-deploy global solution to work around an urgent cache poisoning problem across all applications. We’re very, very happy with all the things we’re able to do at the edge now. Luckily we have Jason Harvey, who picked up the VCL bits and wrote automated pushes of our configs. We had to improve on existing libraries in Go here, so check out fastlyctl, another open source bit to come out of this.

Another important facet of Fastly (one Cloudflare also had, but we never utilized due to cost) is using your own certificate. As we covered earlier, we’re already using this in preparation for HTTP/2 push. But Fastly doesn’t do something Cloudflare does: DNS. So we needed to solve that next. Isn’t this dependency chain fun?

Global DNS

When moving from Cloudflare to Fastly, we had to evaluate and deploy new (to us) global DNS providers. That in itself is an entirely different post, one that’s been written by Mark Henderson. Along the way, we were also controlling:

  • Our own DNS servers (still up as a fall back)
  • servers (for redirects not needing HTTPS)
  • Cloudflare DNS
  • Route 53 DNS
  • Google DNS
  • Azure DNS
  • …and several others (for testing)

This was a whole project in itself. We had to come up with a means to do this efficiently, and so DNSControl was born. This is now an open source project, available on GitHub, written in Go. In short: we push a change in the JavaScript config to git, and it’s deployed worldwide in under a minute. Here’s a sample config from one of our simpler-in-DNS sites:

    TXT('@', 'google-site-verification=PgJFv7ljJQmUa7wupnJgoim3Lx22fbQzyhES7-Q9cv8'), // webmasters
    A('@', ADDRESS24, FASTLY_ON),
    CNAME('www', '@'),
    CNAME('chat', ''),
    A('meta', ADDRESS24, FASTLY_ON),

Okay great, how do you test that all of this is working? Client Timings! The ones we covered above let us test all of this DNS deployment with real-world data, not simulations. But we also need to test that everything just works.


Client Timings was very helpful for testing the performance of everything above, but it wasn’t good for testing configuration. After all, Client Timings is awesome for seeing the result, but most configuration missteps result in no page load, and therefore no timings at all. So we had to build httpUnit (yes, the team figured out the naming conflict later…). This is now another open source project written in Go. An example config for the test domain:

[[plan]]
  label = "teststackoverflow_com"
  url = ""
  ips = ["28i"]
  text = "<title>Test Stack Overflow Domain</title>"
  tags = ["so"]

[[plan]]
  label = "tls_teststackoverflow_com"
  url = ""
  ips = ["28"]
  text = "<title>Test Stack Overflow Domain</title>"
  tags = ["so"]

It was important to test as we changed firewalls, certificates, bindings, redirects, etc. along the way. We needed to make sure every change was good before we activated it for users (by deploying it on our secondary load balancers first). httpUnit is what allowed us to do that and run an integration test suite to ensure we had no regressions.

There’s another tool we developed internally (by our lovely Tom Limoncelli) for more easily managing Virtual IP Address groups on our load balancers. We test on the inactive load balancer via a secondary range, then move all traffic over, leaving the previous master in a known-good state. If anything goes wrong, we flip back. If everything goes right (yay!), we apply changes to that load balancer as well. This tool is called keepctl (short for keepalived control) - look for this to be open sourced as soon as time allows.

Preparing the Applications

Almost all of the above has been just the infrastructure work, generally done by me and a team of several other Site Reliability Engineers at Stack Overflow getting things situated. There’s also so much more that needed doing inside the applications themselves. It’s a long list. I’d grab some coffee and a Snickers.

One important thing to note here is that the architecture of Stack Overflow & Stack Exchange Q&A sites is multi-tenant. This means that whichever of our Q&A domains you hit, you’re hitting the exact same thing: the exact same w3wp.exe process on the exact same server. Based on the Host header the browser sends, we change the context of the request. Several pieces of what follows will be clearer if you understand this: Current.Site in our code is the site of the request. Things like Current.Site.Url() and Current.Site.Paths.FaviconUrl are all driven off this core concept.

Another way to make this concept/setup clearer: we can run the entire Q&A network off of a single process on a single server and you wouldn’t know it. We run a single process today on each of 9 servers purely for rolling builds and redundancy.

Global Login

Quite a few of these projects seemed like good ideas on their own (and they were), but were part of a bigger HTTPS picture. Login was one of those projects. I’m covering it first, because it was rolled out much earlier than the other changes below.

For the first 5-6 years Stack Overflow (and Stack Exchange) existed, you logged into a particular site. As an example, each site had its own per-site cookies. Of note here: each meta site’s login depended on the cookie from its parent site flowing down to the subdomain. These are the “meta” sites we talked about with certificates earlier; their logins were tied together, and you always logged in through the parent. This didn’t really matter much technically, but from a user experience standpoint it sucked. You had to log in to each site. We “fixed” that with “global auth”, which was an <iframe> in the page that logged you in via a central domain if you were logged in elsewhere. Or it tried to. The experience was decent, but a popup bar telling you to click to reload and be logged in wasn’t really awesome. We could do better. Oh, and ask Kevin Montrose about mobile Safari private mode. I dare you.

Enter “Universal Login”. Why the name “Universal”? Because global was taken. We’re simple people. Luckily, cookies are also pretty simple. A cookie present on a parent domain will be sent by your browser to all subdomains beneath it. When you zoom out from our network, we have only a handful of second-level domains.

Yes, we have other domains that redirect to these, but they’re only redirects and don’t have cookies or logged-in users.

There’s a lot of backend work that I’m glossing over here (props to Geoff Dalgas and Adam Lear especially), but the general gist is that when you login, we set a cookie on these domains. We do this via third-party cookies and nonces. When you login to any of the above domains, 6 cookies are issued via <img> tags on the destination page for the other domains, effectively logging you in. This doesn’t work everywhere (in particular, mobile Safari is quirky), but it’s a vast improvement over what came before.

The client code isn’t complicated, here’s what it looks like:

$.post('/users/login/universal/request', function (data, text, req) {
    $.each(data, function (arrayId, group) {
        var url = '//' + group.Host + '/users/login/universal.gif?authToken=' +
          encodeURIComponent(group.Token) + '&nonce=' + encodeURIComponent(group.Nonce);
        $(function () { $('#footer').append('<img style="display:none" src="' + url + '"></img>'); });
    });
}, 'json');

…but to do this, we had to move to Account-level authentication (it was previously User-level), change how cookies are viewed, change how child-meta login works, and also provide integration for these new bits to other applications. For example, Careers (now Talent and Jobs) is a different codebase. We needed to make those applications aware of the cookies and able to call into the Q&A application via an API to get the account. We deploy this via a NuGet library to minimize repeated code. Bottom line: you login once and you are logged into all domains. No messages, no page reloads.

For the technical side, we now don’t have to worry about where the individual site domains are. As long as they’re under one of the cookied second-level domains, we’re good. While on the surface this had nothing to do with HTTPS, it allowed us to move things like meta sites to new domains without any interruptions to users. It’s one giant, really ugly puzzle.

Local HTTPS Development

To make any kind of progress here, local environments need to match dev and production as much as possible. Luckily, we’re on IIS which makes this fairly straightforward to do. There’s a tool we use to setup developer environments called “dev local setup” because, again, we’re simple people. It installs tooling (Visual Studio, git, SSMS, etc.), services (SQL Server, Redis, Elasticsearch), repositories, databases, websites, and a few other bits. We had the basic tooling setup, we just needed to add SSL/TLS certs. An abbreviated setup for Core looks like this:

Websites = @(
    @{
        Directory = "StackOverflow";
        Site = "";
        Aliases = "", "";
        Databases = "Sites.Database", "Local.StackExchange.Meta", "Local.Area51", "Local.Area51.Meta";
        Certificate = $true;
    },
    @{
        Directory = "StackExchange.Website";
        Site = "";
        Databases = "Sites.Database", "Local.StackExchange", "Local.StackExchange.Meta", "Local.Area51.Meta";
        Certificate = $true;
    }
)
And the code that uses this I’ve put in a gist here: Register-Websites.psm1. We set up our websites via host headers (adding those in aliases), give them certificates if directed (hmmm, we should default this to $true now…), and grant those AppPool accounts access to the databases. Okay, so now we’re set to develop against https:// locally. Yes, I know, we really should open source this setup, but we have to strip out some specific-to-us bits in a fork somehow. One day.

Why is this important? Before this, we loaded static content from /content, not from another domain. This was convenient, but it also hid issues like Cross-Origin Request (CORS) problems. What may load just fine on the same domain with the same protocol may readily fail in dev and production. “It works on my machine.”

By having a CDN and app domains set up with the same protocols and layout we have in production, we find and fix many more issues before they leave a developer’s machine. For example, did you know that when going from an https:// page to an http:// one, the browser does not send the referer? It’s a security measure: there could be sensitive bits in the URL that would be sent over plaintext in the referer header.

“That’s bullshit Nick, we get Google referers!” Well, yes, you do. But that’s because they explicitly opt into it. If you look at the Google search page, you’ll find this <meta> directive:

<meta content="origin" id="mref" name="referrer">

…and that’s why you get it from them.

Okay, we’re setup to build some stuff, where do we go from here?

Mixed Content: From You

This one has a simple label with a lot of implications for a site with user-submitted content. What kind of mixed content problems had we accumulated over the years? Unfortunately, quite a few, spanning every kind of user-submitted content we had to tackle: images in posts, video embeds, and JavaScript snippets among them.

Each of these had specific problems attached; I’ll stick to the interesting bits here. Note: each of the solutions I’m talking about had to be scaled to run across hundreds of sites and databases, given our architecture.

In each of the above cases (except snippets), there was a common first step to eliminating mixed content: you need to stop new mixed content coming in. Otherwise, all cleanups continue indefinitely. Plug the hole, then drain the ship. To that end, we started enforcing https://-only image embeds across the network. Once that was done and the hole was plugged, we could get to work cleaning up.

For images in questions, answers, and other post types, we had to do a lot of analysis to see what path to take. First, we tackled the known 90%+ case: Stack Overflow has had its own hosted instance of Imgur since before my time. When you upload an image with our editor, it goes there. The vast majority of posts take this approach, and Imgur added proper HTTPS support for us years ago. This was a straightforward find-and-replace re-bake (what we call re-processing post markdown) across the board.

Then, we analyzed all the remaining image paths via our Elasticsearch index of all content. And by we, I mean Samo. He put in a ton of work on mixed-content throughout this. After seeing that many of the most repetitive domains actually supported HTTPS, we decided to:

  1. Try each <img> source on https:// instead. If that worked, replace the link in the post.
  2. If the source didn’t support https://, convert it to a link.

But of course that didn’t actually just work. It turns out the regex to match URLs in posts was broken for years and no one noticed…so we fixed that and re-indexed first. Oops.

We’ve been asked: “why not just proxy it?” Well, that’s a legally and ethically gray area for much of our content. For example, we have photographers on our photography site who explicitly do not use Imgur, in order to retain all rights. Totally understandable. If we started proxying and caching the full image, that gets legally tricky at best really quick. It turns out that out of millions of image embeds on the network, only a few thousand both didn’t support https:// and weren’t already 404s anyway. So we elected not to build a complicated proxy setup. The percentages (far less than 1%) just didn’t come anywhere close to justifying it.

We did research building a proxy though. What would it cost? How much storage would we need? Do we have enough bandwidth? We found estimates for all these questions, some with several possible answers. For example, do we use Fastly site shield, or take the bandwidth brunt over the ISP pipes? Which option is faster? Which option is cheaper? Which option scales? Really, that’s another blog post all by itself, but if you have specific questions, ask them in the comments and I’ll try to answer.

Luckily, along the way balpha had revamped YouTube embeds to fix a few things with HTML5. The re-bake forced https:// for all embeds as a side effect, yay! All done.

The rest of the content areas were the same story: kill new mixed-content coming in, and replace what’s there. This required changes in the following code areas:

  • Posts
  • Profiles
  • Dev Stories
  • Help Center
  • Jobs/Talent
  • Company Pages

Disclaimer: JavaScript snippets remain unsolved. It’s not so easy because:

  1. The resource you want may not be available over https:// (e.g. a library)
  2. Because it’s JavaScript, you could construct any URL you want at runtime. This is basically impossible to check for.
    • If you have a clever way to do this, please tell us. We’re stuck on usability vs. security on that one.

Mixed Content: From Us

Problems don’t stop at user-submitted content. We have a fair bit of http:// baggage as well. While the moves of these things aren’t particularly interesting, in the interest of “what took so long?” they’re at least worth enumerating:

  • Ad Server (Calculon)
  • Ad Server (Adzerk)
  • Tag Sponsorships
  • JavaScript assumptions
  • Area 51 (the whole damn thing really - it’s an ancient codebase)
  • Analytics trackers (Quantcast, GA)
  • Per-site JavaScript includes (community plugins)
  • Everything under /jobs on Stack Overflow (which is actually a proxy, surprise!)
  • User flair
  • …and almost anywhere else http:// appears in code

JavaScript and links were a bit painful, so I’ll cover those in a little detail.

JavaScript is an area some people forget, but of course it’s a thing. We had several assumptions about http:// in our JavaScript where we only passed a host down. There were also many baked-in assumptions about meta. being the prefix for meta sites. So many. Oh so many. Send help. But they’re gone now, and the server now renders the fully qualified site roots in our options object at the top of the page. It looks something like this (abbreviated):

    "name": "Stack Overflow",
    "gravatar": "<div class=\"gravatar-wrapper-32\"><img src=\"\"></div>",

We had so many static links over the years in our code. For example, in the header, in the footer, in the help section…just all over the place. For each of these, the solution wasn’t that complicated: change them to use <site>.Url("/path"). Finding and killing these was a little fun because you can’t just search for "http://". Thank you so much W3C for gems like this:

<svg xmlns="http://www.w3.org/2000/svg" ...

Yep, those are identifiers. You can’t change them. This is why I want Visual Studio to add an “exclude file types” option to the find dialog. Are you listening Visual Studio??? VS Code added it a while ago. I’m not above bribery.

Okay, so this isn’t really fun: it’s hunt-and-kill for over a thousand links in our code (including code comments, license links, etc.). But that’s life. It had to be done. By converting them to method calls to .Url(), we made the links switch to HTTPS dynamically when a site was ready. For example, we couldn’t switch meta.* sites over until they moved. The password to our data center is pickles. I didn’t think anyone would read this far and it seemed like a good place to store it. After they moved, .Url() would keep working, and later enabling https-by-default rendering in .Url() would keep working too. It changed a static thing into a dynamic thing and hooked up all of our feature flags appropriately.

Oh and another important thing: it made dev and local environments work correctly, rather than always linking to production. This was pretty painful and boring, but a worthwhile set of changes. And yes, this .Url() code includes canonicals, so Google sees that pages should be HTTPS as soon as users do.
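The idea behind .Url() can be sketched in JavaScript (hypothetical names and flags; our real implementation is C# and carries far more feature-flag plumbing): one helper owns scheme and host selection, so flipping a per-site flag switches every link at once.

```javascript
// Hypothetical sketch: a site object whose url() method picks the scheme
// from a feature flag, so links flip to HTTPS when the flag flips.
function makeSite(host, { httpsByDefault = false } = {}) {
  return {
    url(path = "/") {
      const scheme = httpsByDefault ? "https" : "http";
      return `${scheme}://${host}${path}`;
    },
  };
}

// Before the flag: links render as http://; after: https://, no code changes.
const site = makeSite("example.com", { httpsByDefault: true });
site.url("/questions"); // "https://example.com/questions"
```

Because every caller goes through the helper, dev and local environments can also point at local roots instead of production.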

Once a site moved to HTTPS (by enabling a feature flag), we crawled the network to update the links to it. This both preserves the “Google juice”, as we call it, and prevents users from eating a 301.

Redirects (301s)

When you move a site from HTTP, there are 2 critical things you need to do for Google:

  • Update the canonical links, e.g. <link rel="canonical" href="" />
  • 301 the http:// link to the https:// version

This isn’t complicated and it isn’t grand, but it’s very, very important. Stack Overflow gets most of its traffic from Google search results, so it’s imperative we don’t adversely affect that. I’d literally be out of a job if we lost traffic; it’s our livelihood. Remember those .internal API calls? Yeah, we can’t just redirect everything either. So there’s a bit of logic around what gets redirected (e.g. we don’t redirect POST requests during the transition…browsers don’t handle that well), but it’s fairly straightforward. Here’s the actual code:

public static void PerformHttpsRedirects()
{
    var https = Settings.HTTPS;
    // If we're on HTTPS, never redirect back
    if (Request.IsSecureConnection) return;

    // Not HTTPS-by-default? Abort.
    if (!https.IsDefault) return;
    // Not supposed to redirect anyone yet? Abort.
    if (https.RedirectFor == SiteSettings.RedirectAudience.NoOne) return;
    // Don't redirect .internal or any other direct connection
    // this would break direct HOSTS to webserver as well
    if (RequestIPIsInternal()) return;

    // Only redirect GET/HEAD during the transition - we'll 301 and HSTS everything in Fastly later
    if (string.Equals(Request.HttpMethod, "GET", StringComparison.InvariantCultureIgnoreCase)
        || string.Equals(Request.HttpMethod, "HEAD", StringComparison.InvariantCultureIgnoreCase))
    {
        // Only redirect if we're redirecting everyone, or a crawler (if we're a crawler)
        if (https.RedirectFor == SiteSettings.RedirectAudience.Everyone
            || (https.RedirectFor == SiteSettings.RedirectAudience.Crawlers && Current.IsSearchEngine))
        {
            var resp = Context.InnerHttpContext.Response;
            // 301 when we're really sure (302 is the default)
            if (https.RedirectVia301)
            {
                resp.RedirectPermanent(Site.Url(Request.Url.PathAndQuery), false);
            }
            else
            {
                resp.Redirect(Site.Url(Request.Url.PathAndQuery), false);
            }
        }
    }
}

Note that we don’t start with a 301 (there’s a .RedirectVia301 setting for this), because you really want to test these things carefully before doing anything permanent. We’ll talk about HSTS and permanent consequences a bit later.


Websockets

This one’s a quick mention. Websockets were not hard; in some ways, this was the easiest thing we did. We use websockets for real-time updates to users: reputation changes, inbox notifications, new questions being asked, new answers added, etc. This means that for basically every page open to Stack Overflow, we have a corresponding websocket connection to our load balancer.

So what’s the change? Pretty simple: install a certificate, listen on :443, and use wss:// instead of ws:// (the insecure version). The latter was done above in prep for everything (we decided on a specific certificate here, but nothing special). The ws:// to wss:// change was simply a configuration one. During the transition we had ws:// with wss:// as a fallback, but this has since become wss:// only. The reasons to go with secure websockets in general are 2-fold:

  1. It’s a mixed content warning on https:// if you don’t.
  2. It supports more users, due to many old proxies not handling websockets well. With encrypted traffic, most pass it along without screwing it up. This is especially true for mobile users.
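The client-side half of that change is tiny - a sketch of matching the page’s scheme (not our actual realtime code):

```javascript
// Pick the websocket scheme to match the page's scheme, so an https://
// page never triggers a mixed-content warning with ws://.
function wsUrl(pageProtocol, host, path) {
  const scheme = pageProtocol === "https:" ? "wss" : "ws";
  return `${scheme}://${host}${path}`;
}

// In a browser this would be: wsUrl(location.protocol, location.host, "/realtime")
```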

The big question here was: “can we handle the load?” Our network handles quite a few concurrent websockets; as I write this, we have over 600,000 concurrent connections open. Here’s a view of our HAProxy dashboard in Opserver:

HAProxy Websockets

That’s a lot of connections on a) the terminators, b) the abstract named socket, and c) the frontend. It’s also much more load in HAProxy itself, due to enabling TLS session resumption. To let users reconnect faster, the first negotiation results in a token the user can send back on the next connection. If we have enough memory and the timeout hasn’t passed, we’ll resume that session instead of negotiating a new one every time. This saves CPU and improves performance for users, but it has a cost in memory. This cost varies by key size (2048 bits? 4096? more?). We’re currently at 4,096-bit keys. With about 600,000 websockets open at any given time (the majority of our memory usage), we’re still sitting at only 19GB of RAM utilized on our 64GB load balancers. Of this, about 12GB is utilized by HAProxy, and most of that is the TLS session cache. So…it’s not too bad, and if we had to buy RAM, it’d still be one of the cheapest things about this move.
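A back-of-envelope check of those numbers (values taken from the paragraph above):

```javascript
// ~12GB attributed to HAProxy (mostly TLS session cache) across ~600,000
// open websockets works out to roughly 21KB per connection.
const connections = 600_000;
const haproxyBytes = 12 * 1024 ** 3;
const perConnKB = haproxyBytes / connections / 1024; // ≈ 21KB per connection
```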

HAProxy Websocket Memory


The Unknowns

I guess now’s a good time to cover the unknowns (gambles, really) we took on with this move. There are a few things we couldn’t really know until we tested an actual move:

  • How Google Analytics traffic appeared (Do we lose referers?)
  • How Google Webmasters transitions worked (Do the 301s work? The canonicals? The sitemaps? How fast?)
  • How Google search analytics worked (do we see search analytics in https://?)
  • Will we fall in search result rankings? (scariest of all)

There’s a lot of advice out there from people who have converted to https://, but we’re not the usual use case. We’re not a site. We’re a network of sites across many domains. We have very little insight into how Google treats our network. Does it even know our sites are related? Who knows. And we’re not holding our breath for Google to give us any insight.

So, we test. In our network-wide rollout, we tested a few domains first: Meta, Security, and Super User.

These were chosen super carefully, after a detailed review in a 3-minute meeting between Samo and me. Meta because it’s our main feedback site (and where the announcement went). Security because they have experts who may notice problems other sites don’t, especially in the HTTPS space. And last, Super User. We needed to test the search impact on our content. While meta and security are smaller and have relatively lower traffic levels, Super User gets significantly more traffic. More importantly, it gets that traffic from Google, organically.

The reason for a long delay between Super User and the rest of the network is we were watching and assessing the search impact. As far as we can tell: there was barely any. The amount of week-to-week change in searches, results, clicks, and positions is well within the normal up/down noise. Our company depends on this traffic. This was incredibly important to be damn sure about. Luckily we were concerned for little reason and could continue rolling out.


Mistakes

This post wouldn’t be a very honest exercise if I didn’t also cover our screw-ups along the way. Failure is always an option. We have the experience to prove it. Let’s cover a few things we did and ended up regretting.

Mistakes: Protocol-Relative URLs

When you have a URL to a resource, typically you see an absolute URL with an explicit http:// or https:// scheme; this includes paths for images, etc. Another option is to omit the scheme and start the URL with just //. These are called protocol-relative URLs. We used them early on for images, JavaScript, CSS, etc. (that we served, not user-submitted content). Years later, we found out this was a bad idea, at least for us. The way protocol-relative links work is that they are relative to the page: on an https:// page, a // URL resolves via https://, and on an http:// page, via http://. So what’s the problem?

Well, URLs to images aren’t only used in pages; they’re also used in places like email, our API, and mobile applications. This bit us once when I normalized the pathing structure and used the same image paths everywhere. While the change drastically reduced code duplication and simplified many things, the result was protocol-relative URLs in email. Most email clients (appropriately) don’t render such images, because they don’t know which protocol to use. Email is neither http:// nor https://. It may only have appeared to work when you viewed your email in a web browser.
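The resolution rule is easy to demonstrate with the WHATWG URL parser (hosts here are placeholders):

```javascript
// A protocol-relative URL inherits the scheme of the page it appears on.
const onHttps = new URL("//example.com/img.png", "https://somesite.test/page").toString();
const onHttp  = new URL("//example.com/img.png", "http://somesite.test/page").toString();
// onHttps === "https://example.com/img.png"
// onHttp  === "http://example.com/img.png"
```

In email there is no base page, so there is no scheme to inherit - which is exactly the failure described above.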

So what do we do? Well, we switched everything everywhere to https://. I unified all of our pathing code down to 2 variables: the root of the CDN, and the folder for the particular site. For example, Stack Overflow’s stylesheet resides at a fixed path under the CDN root (plus a cache breaker!); locally, the same path hangs off the local root. You can see the similarity. By calculating all paths, life is simpler. By enforcing https://, people got the benefits of HTTP/2 even before the site itself switched over, since static content was already prepared. All-https:// also meant we could use one property for a URL in web, email, mobile, and API. The unification also meant we have one consistent place to handle all pathing - cache breakers are built in everywhere, while everything stays simpler.

Note: when you’re cache breaking resources like we do, please don’t do it with a build number. Our cache breakers are a checksum of the file, which means you only download a new copy when it actually changes. Using a build number may be slightly simpler, but it’s likely quite literally costing you money and performance at the same time.

Okay, all of that’s cool - so why the hell didn’t we just do this from the start? Because HTTPS, at the time, was a performance penalty. Users would have suffered slower load times on http:// pages. For an idea of scale: we served up 4 billion requests of static content last month, totalling 94TB. That would have been a lot of collective latency back when HTTPS was slower. Now that the tables have turned on performance with HTTP/2 and our CDN/proxy setup, it’s a net win for most users as well as being simpler. Yay!

Mistakes: APIs and .internal

So what did we find when we got the proxies up and testing? We forgot something critical. I forgot something critical. We use HTTP for a truckload of internal APIs. Oh, right. Dammit. While these continued to work, they got slower, more complicated, and more brittle at the same time.

Let’s say an internal API call hits one of our public domains. Previously, the hops there were:

  • Origin app
  • Gateway/Firewall (exiting to public IP space)
  • Local load balancer
  • Destination web server

This was because the domain used to resolve to us. The IP it went to was our load balancer. In a proxy scenario, in order for users to hit the nearest hop to them, they’re hitting a different IP and destination. The IP their DNS resolves to is now the CDN/proxy (Fastly). Well, crap. That means our path to the same place is now:

  • Origin app
  • Gateway/Firewall (exiting to public IP space)
  • Our external router
  • ISP (multiple hops)
  • Proxy (Cloudflare/Fastly)
  • ISPs (proxy path to us)
  • Our external router
  • Local load balancer
  • Destination web server

Okay…that seems worse. To make an application call from A to B, we have a drastic increase in dependencies that aren’t necessary and kill performance at the same time. I’m not saying our proxy is slow, but compared to a sub 1ms connection inside the data center…well yeah, it’s slow.

A lot of internal discussion ensued about the simplest way to solve this problem. We could have pointed requests at a separate internal-only hostname, but this would require substantial app changes to how the sites work (and potentially create conflicts later). It would also have leaked internal-only addresses into external DNS (and created wildcard inheritance issues). We could have made the public domains resolve differently internally (this is known as split-horizon DNS), but that’s both harder to debug and creates other issues, like multi-datacenter “who wins?” scenarios.

Ultimately, we ended up adding a .internal suffix to all domains we had external DNS for. For example, inside our network, the .internal version of a site’s domain resolves to an internal subnet on the back (DMZ) side of our load balancer. We did this for several reasons:

  • We can override and contain a top-level domain on our internal DNS servers (Active Directory)
  • We can strip the .internal from the Host header as it passes through HAProxy back to a web application (the application side isn’t even aware)
  • If we need internal-to-DMZ SSL, we can do so with a very similar wildcard combination.
  • Client API code is simple (if in this domain list, add .internal)

The client API code is done via a NuGet package/library called StackExchange.Network mostly written by Marc Gravell. We simply call it in a static way with every URL we’re about to hit (so only in a few places, our utility fetch methods). It returns the “internalized” URL, if there is one, or returns it untouched. This means any changes to logic here can be quickly deployed to all applications with a simple NuGet update. The call is simple enough:

uri = SubstituteInternalUrl(uri);
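A sketch of what that substitution does (in JavaScript with a made-up host list; the real logic lives in StackExchange.Network and is C#):

```javascript
// If the hostname is one we serve externally, rewrite it to the .internal
// equivalent; otherwise return the URL untouched.
const INTERNALIZED_HOSTS = new Set(["example.com", "meta.example.com"]); // hypothetical list

function substituteInternalUrl(uri) {
  const u = new URL(uri);
  if (INTERNALIZED_HOSTS.has(u.hostname)) {
    u.hostname += ".internal";
  }
  return u.toString();
}
```

HAProxy then strips the .internal back off the Host header, so the destination application never knows the difference.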

Here’s a concrete illustration for DNS behavior:

  • Fastly:,,,
  • Direct (public routers):
  • Internal:

Remember dnscontrol, which we mentioned earlier? That keeps all of this in sync. Thanks to the JavaScript config/definitions, we can easily share all of this and simplify the code. We match the last octet of all IPs (in all subnets, in all data centers), so with a few variables, all the DNS entries both in AD and externally stay aligned. This also means our HAProxy config is simpler; it boils down to this:

stacklb::external::frontend_normal { 't1_http-in':
  section_name    => 'http-in',
  maxconn         => $t1_http_in_maxconn,
  inputs          => {
    "${external_ip_base}.16:80"  => [ 'name stackexchange' ],
    "${external_ip_base}.17:80"  => [ 'name careers' ],
    "${external_ip_base}.18:80"  => [ 'name openid' ],
    "${external_ip_base}.24:80"  => [ 'name misc' ],
  },
}

Overall, the API path is now faster and more reliable than before:

  • Origin app
  • Local load balancer (DMZ side)
  • Destination web server

A dozen problems solved, several hundred more to go.

Mistakes: 301 Caching

Something we didn’t realize and should have tested is that when we started 301ing traffic from http:// to https:// for enabled sites, Fastly was caching the response. In Fastly, the default cache key doesn’t take the protocol into account. I personally disagree with this behavior, since by default enabling 301 redirects at the origin will result in infinite redirects. The problem happens with this series of events:

  1. A user visits a page on http://
  2. They get redirected via a 301 to https://
  3. Fastly caches that redirect
  4. Any user (including the one in #1 above) visits the same page on https://
  5. Fastly serves the 301 to https://, even though you’re already on it

And that’s how we get an infinite redirect. To fix this, we turned off the 301s, purged the Fastly cache, and investigated. After we fixed it via a hash change, Fastly support recommended adding Fastly-SSL to the Vary instead, like this:

sub vcl_fetch {
  set beresp.http.Vary = if(beresp.http.Vary, beresp.http.Vary ",", "") "Fastly-SSL";
}

In my opinion, this should be the default behavior.
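The failure mode is easy to model: if the cache key ignores the protocol, the cached 301 gets served to https:// visitors too. Varying on the protocol (which is what the Fastly-SSL Vary achieves) keeps the entries apart. A sketch:

```javascript
// Without the protocol in the key, http:// and https:// share one cache
// entry - so a cached "redirect to https://" answers https:// requests too.
function cacheKey(url, { varyOnProtocol = false } = {}) {
  const u = new URL(url);
  return varyOnProtocol
    ? `${u.protocol}//${u.host}${u.pathname}`
    : `${u.host}${u.pathname}`;
}
```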

Mistakes: Help Center SNAFU

Remember those help posts we had to fix? Help posts are mostly per-language, with a few being per-site, so it makes sense for them to be shared. To avoid duplicating a ton of code and storage structure just for this, we handle them a little differently. We store the actual Post object (same as a question or answer) on whichever specific site the post is for. We store the resulting HelpPost in our central Sites database, which is just the baked HTML. In terms of mixed content, we had already fixed the posts on the individual sites, because they were the same posts as everything else. Sweet! That was easy!

After the original posts were fixed, we simply had to backfill the re-baked HTML into the Sites table. And that’s where I left out a critical bit of code. The backfill looked at the current site (the one the backfill was invoked on) rather than the site the original post came from. For example, this resulted in the HelpPost baked from post 12345 on one site being replaced with whatever post 12345 happened to be on another site. Sometimes it was an answer, sometimes a question, sometimes a tag wiki. This resulted in some very interesting help articles across the network. Here are some of the gems created.

At least the commit to fix my mistake was simple enough:

Me being a dumbass

…and re-running the backfill fixed it all. Still, that was some very public “fun”. Sorry about that.

Open Source

Here are quick links to all the projects that resulted from or were improved by our HTTPS deployment. Hopefully these save the world some time.

Next Steps

We’re not done. There’s quite a bit left to do.

  • We need to fix mixed content on our chat domains (from user-embedded images, etc.)
  • We need to join (if we can) the Chrome HSTS preload list on all domains where possible.
  • We need to evaluate HPKP and if we want to deploy it (it’s pretty dangerous - currently leaning heavily towards “no”)
  • We need to move chat to https://
  • We need to migrate all cookies over to secure-only
  • We’re awaiting HAProxy 1.8 (ETA is around September) which is slated to support HTTP/2
  • We need to utilize HTTP/2 pushes (I’m discussing this with Fastly in June - they don’t support cross-domain pushes yet)
  • We need to move the https:// 301 out to the CDN/Proxy for performance (it was necessary to do it per-site as we rolled out)

HSTS Preloading

HSTS stands for “HTTP Strict Transport Security”. OWASP has a great little write-up on it. It’s a fairly simple concept:

  • When you visit an https:// page, we send you a header like this: Strict-Transport-Security: max-age=31536000
  • For that duration (in seconds), your browser only visits that domain over https://

Even if you click a link that’s http://, your browser goes directly to https://. It never goes through the http:// redirect that’s likely also set up, it goes right for SSL/TLS. This prevents people intercepting the http:// (insecure) request and hijacking it. As an example, it could redirect you to https://stack<LooksLikeAnOButIsReallyCrazyUnicode>, for which they may even have a proper SSL/TLS certificate. By never visiting there, you’re safer.

But that requires hitting the site once to get the header in the first place, right? Yep, that’s right. So there’s HSTS preloading: a list of domains that ships with all major browsers, which they treat as HTTPS-only from the start. Effectively, browsers get the directive to only visit https:// before the first visit. There’s never any http:// communication once this is in place.

Okay cool! So what’s it take to get on that list? Here are the requirements:

  1. Serve a valid certificate.
  2. Redirect from HTTP to HTTPS on the same host, if you are listening on port 80.
  3. Serve all subdomains over HTTPS.
    • In particular, you must support HTTPS for the www subdomain if a DNS record for that subdomain exists.
  4. Serve an HSTS header on the base domain for HTTPS requests:
    • The max-age must be at least eighteen weeks (10886400 seconds).
    • The includeSubDomains directive must be specified.
    • The preload directive must be specified.
    • If you are serving an additional redirect from your HTTPS site, that redirect must still have the HSTS header (rather than the page it redirects to).
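The header itself, built to satisfy the list above, can be sketched like this (10886400 seconds being the eighteen-week minimum the preload list requires):

```javascript
// Build a Strict-Transport-Security header value per the preload requirements.
function hstsHeader({ maxAge = 10886400, includeSubDomains = true, preload = true } = {}) {
  let value = `max-age=${maxAge}`;
  if (includeSubDomains) value += "; includeSubDomains";
  if (preload) value += "; preload";
  return value;
}
```

Note how includeSubDomains is just one token in the header - which is exactly the token that causes us trouble below.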

That sounds good, right? We’ve got all our active domains on HTTPS now, with valid certificates. Nope, we’ve got a problem. Remember those old child meta domains we had for years? While they redirect to the new domains, the redirects themselves do not have a valid certificate.

Using metas as an example, if we pushed includeSubDomains on our HSTS header, we would change every link on the internet pointing at the old domains from a working redirect into a landmine. Instead of landing on an https:// site (as they do today), they’d get an invalid certificate error. Based on our traffic logs yesterday, we’re still getting 80,000 hits a day just to the child meta domains for the 301s. A lot of this is web crawlers catching up (it takes quite a while), but a lot is also human traffic from blogs, bookmarks, etc. …and some crawlers are just really stupid and never update their information based on a 301. You know who you are. And why are you still reading this? I fell asleep 3 times writing this damn thing.

So what do we do? Do we set up several SAN certs with hundreds of domains on them and host them strictly for 301s piped through our infrastructure? It couldn’t reasonably be done through Fastly without a higher cost (more IPs, more certs, etc.). Let’s Encrypt is actually helpful here: getting the cert would be low cost, if you ignore the engineering effort required to set it up and maintain it (we don’t use it today for the reasons listed above).

There’s one more critical piece of archaeology here: our internal domain is a ds. subdomain of one of our public domains. Why ds.? I’m not sure. My assumption is we didn’t know how to spell “data center”. This means includeSubDomains would automatically cover every internal endpoint. Now most of our things are https:// already, but requiring HTTPS for everything internally, even development, from the first moment would cause some issues and delays. It’s not that we wouldn’t want https:// everywhere inside, but that’s an entire project in itself (mainly around certificate distribution and maintenance, as well as multi-level certificates) that you really don’t want coupled to this one. Why not just change the internal domain? Because we don’t have a few spare months for a lateral move. It requires a lot of time and coordination to do a move like that.

For the moment, I will be ramping up the HSTS max-age duration slowly to 2 years across all Q&A sites without includeSubDomains. I’m actually going to remove this setting from the code until needed, since it’s so dangerous. Once we get all Q&A site header durations ramped up, I think we can work with Google to add them to the HSTS list without includeSubDomains, at least as a start. You can see on the current list that this does happen in rare circumstances. I hope they’ll agree for securing Stack Overflow.


Chat and Secure Cookies

In order to enable Secure cookies (ones only sent over HTTPS) as fast as possible, we’ll be redirecting our chat domains to https://. Chat relies on the cookie on the second-level domain, like all the other Universal Login apps do, so if the cookies are only sent over https://, you can only be logged in over https://.

There’s more to think through on this, but making chat itself https:// with mixed-content while we solve those issues is still a net win. It allows us to secure the network fast and work on mixed-content in real-time chat afterwards. Look for this to happen in the next week or two, it’s next on my list.


So anyway, that’s where we stand today and what we’ve been up to for the last 4 years. A lot of things came up with higher priority that pushed HTTPS back - this is very far from the only thing we’ve been working on. That’s life. The people working on this are the same ones who fight the fires we hope you never see. There are also far more people involved than mentioned here. I was narrowing the post to the complicated topics (otherwise it would have been even longer), each of which took significant amounts of work, but many others at Stack Overflow and outside helped along the way.

I know a lot of you will have many questions, concerns, complaints, and suggestions on how we can do things better. We more than welcome all of these things. We’ll watch the comments below, our metas, Reddit, Hacker News, and Twitter this week and answer/help as much as we can. Thanks for your time, and you’re crazy for reading all of this. <3


When TV Logos Were Physical Objects

It goes without saying that nearly everything made with graphic design and video software was once produced using a physical process, from newspapers to TV logos. But some TV stations and film studios took things even further and designed physical logos that were filmed to create dynamic special effects. Arguably the most famous is MGM’s Leo the Lion, which first appeared in 1916 and has been portrayed by 7 different lions over the decades.

Recently, television history buff Andrew Wiseman unearthed this amazing behind-the-scenes shot of the Office de Radiodiffusion Télévision Française logo from the early 1960s that was constructed with an array of strings to provide the identity with a bright shimmer that couldn’t be accomplished with 2D drawings. The logo could also presumably be filmed from different perspectives, though there’s no evidence that was actually done.

Another famous physical TV identity was the BBC’s “globe and mirror” logo in use from 1981 to 1985 that was based on a physical device. After filming the rotating globe against a panoramic mirror, it appears the results were then traced by hand similar to rotoscoping. One of the more elaborate physical TV intro sequences was the 1983 HBO intro that despite giving the impression of being animated or created digitally was in fact built almost entirely with practical effects. You can watch a 10 minute video about how they did it below. (via Quipsologies, Reddit, Andrew Wiseman)

Update: It turns out the BBC Globe ident wasn’t rotoscoped or animated, instead it was recorded live using the Noddy camera system and the color was created by adjusting the contrast. Thanks, Gene!



New Surface Pro

* Sold separately.

** Type Cover sold separately.

1 Up to 13.5 hours of video playback. Testing conducted by Microsoft in April 2017 using preproduction Intel Core i5, 256GB, 8 GB RAM device. Testing consisted of full battery discharge during video playback. All settings were default except: Wi-Fi was associated with a network and Auto-Brightness disabled. Battery life varies significantly with settings, usage, and other factors.

2 Fanless cooling system included with Surface Pro m3 and i5 models only.

3 Available colors vary in some markets.

4 Surface Pen tilt functionality is available now with Surface Pro. Available with other Surface devices via Windows Update.

5 Requires active Office 365 subscription.


Xperia Touch makes any surface interactive

Unlike traditional projectors, Xperia Touch does more than put on a show. It turns a flat wall, table or even your floor into an interactive screen. With short-throw projection, Wi-Fi connection and state-of-the art touch functionality, this portable projector adds a whole new dimension to your home.


Hardbin: secure encrypted pastebin

Hardbin is an encrypted pastebin using IPFS. It was created by James Stanley.

Unlike other pastebins, Hardbin does not require you to trust any server. You can run a local IPFS gateway and then you can always be certain that no remote server is able to modify the code you're running. In particular, this means no remote server is able to insert malicious code to exfiltrate the content of your pastes.

There is a public writable gateway available that allows creation of pastes without running a local gateway.

If you want to learn more, please see the project page.

If you want to contribute to Hardbin development, please see the github repo.


Compressing Images Using Google’s Guetzli

As you may already know, the average web page is now heavier than Doom.
One of the reasons for this increase is the weight of images, and the need to support higher resolutions.

Google to the Rescue

Google just published a new JPEG compression algorithm: Guetzli.
The main idea of this algorithm is to focus on keeping the details the human eye is able to recognize easily while skipping details the eye is not able to notice.
I am no specialist, but the intended result is an image whose perceived quality is the same, with a reduced file size.

This is not a new image format, but a new way to compress JPEG images.
This means there is no need for a custom image reader; the images are displayed by anything that already renders JPEGs.

Guetzli in Real Life

On one of my projects, we had an image-heavy homepage (about 30MB for the homepage alone, 27 of those for the images).
I decided to give Guetzli a try. To convince our product owner and our designer that the quality loss would be acceptable, I tried the new algorithm on one of the high-res images we were not using (an 8574×5715, 22MB JPEG).

It crashed.
According to Google (and my experience confirms the figures), Guetzli takes about 300MB of RAM per 1Mpix of image (so about 15GB for the image I had), and I did not have that much memory available at the time (half a dozen node servers, a couple of docker containers, chromium and a couple of electron instances were taking enough space to put my computer under the requirement).
I retried after cleaning up every non-vital process; Guetzli took 12GB of RAM but succeeded.
Google also states that it takes about one minute per Mpix for Guetzli to process an image, which matches the time it took me (a bit above 40 minutes).
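Those two rules of thumb (≈300MB of RAM and ≈1 minute per megapixel) make the resource needs easy to estimate up front:

```javascript
// Estimate Guetzli's RAM and runtime from the image dimensions,
// using the ~300MB and ~1 minute per megapixel figures above.
function guetzliEstimate(widthPx, heightPx) {
  const mpix = (widthPx * heightPx) / 1e6;
  return {
    ramGB: (mpix * 300) / 1024, // ~300MB per Mpix
    minutes: mpix,              // ~1 minute per Mpix
  };
}

// The 8574×5715 image is ~49 Mpix: roughly 14-15GB of RAM and ~49 minutes.
```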

The resulting image weighed in at under 7MB (from 22MB), and I could not determine by looking at them which was the compressed one (our designer could, but admitted that the difference was “incredibly small”).

6.9M    home-guetzli.jpg
22M home-raw.jpg

That compression was made using Guetzli’s default quality setting (the quality setting goes from 84 to 100; to get under 84 you would need to compile a version where you change the minimum value).

More Tests and Some Successes

I then decided to try different quality settings for that image (I wrote a very simple script to do that without having to relaunch the process every 40 minutes, and to be able to run it in my sleep).
The results are below (and it seems that Guetzli’s default quality factor is 95).

6.9M    ./home-guetzli.jpg
22M ./home-raw.jpg
3.0M    ./home-raw.jpg.guetzli84.jpg
3.4M    ./home-raw.jpg.guetzli87.jpg
4.2M    ./home-raw.jpg.guetzli90.jpg
5.5M    ./home-raw.jpg.guetzli93.jpg
8.8M    ./home-raw.jpg.guetzli96.jpg
18M ./home-raw.jpg.guetzli99.jpg
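The sweep script amounts to generating one guetzli invocation per quality level (guetzli’s flag is --quality; the output filenames follow the listing above):

```javascript
// Emit one command line per quality setting; qualities below 84 would
// need a patched build, per the note above.
const qualities = [84, 87, 90, 93, 96, 99];
const cmds = qualities.map(
  (q) => `guetzli --quality ${q} home-raw.jpg home-raw.jpg.guetzli${q}.jpg`
);
```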

Both the product owner and the designer agreed to go with the 84 quality factor. I then converted all our assets, and we went from 30MB to less than 8MB for the homepage (3MB of those being the CSS/scripts).
It should be noted that there was no image compression of any form before.


Installing Guetzli on my machine was painless (someone set up an AUR package for Guetzli on Arch Linux - thanks a lot to whoever did that), and running it is straightforward (as long as you have enough RAM).
There seems to be a brew package (for macOS users), but I did not test it.

Guetzli requires a lot of RAM and CPU time for huge images (“a lot” being relative, i.e. don’t expect to be able to do anything else while it’s running).
If RAM is not your bottleneck, you might even want to consider running multiple instances of Guetzli in parallel on different images, as it only uses one core (as of this writing).

Being a JPEG encoder, it cannot output PNGs (so no transparency).
But it can convert and compress your PNGs.
Its efficiency is tied to the initial quality of the picture: I saw the compression ratio go from 7x on the largest image down to 2x on small images.
The quality loss was also more visible on those small images.

In a few cases I also witnessed a loss of color saturation (which was deemed acceptable in my case).


Give Guetzli a try: it might give you unacceptable results (especially at low quality settings), but it might also save you a few MBs on your website.


Join Virool (YC S12) Video AI Startup as an Ad Operations Ninja

Company Overview:

Virool is a video advertising platform for brand marketers. To date, we've promoted more than 75,000 global video campaigns across our network of more than 100,000 publishing properties on desktop and mobile web, through our unique and sophisticated video ad units.

Digital video is the fastest growing online advertising format and the industry is expected to reach $5 billion this year. We are on a mission to create the most exceptional video advertising experience for our customers, having worked with some of the biggest brands including Coca-Cola, Turkish Airlines, WestJet, Under Armour and many more. Best of all, our job requires that we watch amazing online videos every day.

Virool graduated from Y Combinator in the summer of 2012 and since then, the company has raised $18M in funding from a collection of top VC firms and angel investors, including Menlo Ventures, 500 Startups, Draper Fisher Jurvetson, Paul Buchheit (creator of Gmail and AdSense), Zod Nazem (CTO, Yahoo) and others.

What it means to be a Viroolian:

There’s something that runs deep in the DNA of Virool: we see no limits. We are an inspired, fun-loving and hardworking team that believes we can change the online video advertising industry. We understand that in order to reach our goals, we need the very best people to get there. As an early member of our team, your contributions will have a major impact in our success. If being a big part of the Virool story excites you, we want to chat.

About the role:

We are seeking a highly organized, detail-oriented AdOps Coordinator to join our team and lead the setup and execution of successful advertising campaigns. We are a deadline- and metrics-driven, team-oriented group that strives to meet or exceed KPIs while having fun along the way. The successful candidate enjoys collaborating with other functions, identifies and solves problems, knows when to escalate as needed, and is proactive in ensuring stellar delivery and performance.

Responsibilities:
  • Set up, test, and QA advertising campaigns using in-house/proprietary ad-server and supplementary tools
  • Troubleshoot ad-serving errors and report discrepancies
  • Continually monitor advertising campaigns and ensure that they adhere to technical specifications based on internal directives as well as IAB standards where applicable
  • Hold frequent meetings with Account Management team and other internal teams to showcase results, progress and optimize future results
  • Play pivotal role as feedback loop node for tech, product, sales issues
  • Troubleshoot and beta test new product implementations and technical updates

Requirements:
  • BA/BS degree
  • 1+ years of relevant advertising or operations experience
  • Strong written and oral communication skills
  • Interest in the digital ad-tech space
  • Familiarity with third party ad-server technology
  • Reporting analysis (1st vs third party, viewability, etc.)
  • Inventory management and forecasting
  • Familiarity of VPAID and VAST technologies
  • Must be detail-oriented and have the ability to multi-task in a fast-paced work environment
  • Proficiency with Microsoft Office suite, especially Excel
  • Understanding of HTML, CSS, Flash or JavaScript
  • Experience with web troubleshooting tools such as Firebug, Charles, Chrome/Safari Inspector or Web Developer Tools
  • Some programmatic experience a plus (AOL, DBM, SpotX, Rubicon platform experience preferred)


Viper: a new programming language from Ethereum

Viper is an experimental programming language that aims to provide the following features:

  • Bounds and overflow checking, both on array accesses and on arithmetic
  • Support for signed integers and decimal fixed point numbers
  • Decidability - it's possible to compute a precise upper bound on the gas consumption of any function call
  • Strong typing, including support for units (eg. timestamp, timedelta, seconds, wei, wei per second, meters per second squared)
  • Maximally small and understandable compiler code size
  • Limited support for pure functions - anything marked constant is NOT allowed to change the state


Note that not all programs that satisfy the following grammar are valid; for example, there are additional requirements against declaring variables twice, accessing undeclared variables, and type mismatches, among other rules.

body = <globals> + <defs>
globals = <global> <global> ...
global = <varname> = <type>
defs = <def> <def> ...
def = def <funname>(<argname>: <type>, <argname>: <type>...): <body>
    OR def <funname>(<argname>: <type>, <argname>: <type>...) -> <type>: <body>
    OR def <funname>(<argname>: <type>, <argname>: <type>...) -> <type>(const): <body>
argname = <str>
body = <stmt> <stmt> ...
stmt = <varname> = <type>
    OR <var> = <expr>
    OR <var> <augassignop> <expr>
    OR if <cond>: <body>
    OR if <cond>: <body> else: <body>
    OR for <varname> in range(<int>): <body>
    OR for <varname> in range(<expr>, <expr> + <int>): <body> (two exprs must match)
    OR pass
    OR return
    OR break
    OR return <expr>
    OR send(<expr>, <expr>)
    OR selfdestruct(<expr>) # suicide(<expr>) is a synonym
var = <varname>
    OR <var>.<membername>
    OR <var>[<expr>]
varname = <str>
expr = <literal>
    OR <expr> <binop> <expr>
    OR <expr> <boolop> <expr>
    OR <expr> <compareop> <expr>
    OR not <expr>
    OR <var>
    OR <expr>.balance
    OR <system_var>
    OR <basetype>(<expr>) (only some type conversions allowed)
    OR floor(<expr>)
literal = <integer>
    OR <fixed point number>
    OR <address, in the form 0x12cd2f...3fe, CHECKSUMS ARE MANDATORY>
    OR <bytes32, in the form 0x414db52e5....2a7d>
    OR <bytes, in the form "cow">
system_var = (block.timestamp, block.coinbase, block.number, block.difficulty, tx.origin, tx.gasprice, msg.gas, self)
basetype = (num, decimal, bool, address, bytes32)
unit = <baseunit>
    OR <baseunit> * <positive integer>
    OR <unit> * <unit>
    OR <unit> / <unit>
type = <basetype>
    OR bytes <= <maxlen>
    OR {<membername>: <type>, <membername>: <type>, ...}
    OR <type>[<basetype>]
    OR <type>[<int>] # Integer must be nonzero positive
    OR <num or decimal>(unit)
binop = (+, -, *, /, %)
augassignop = (+=, -=, *=, /=, %=)
boolop = (or, and)
compareop = (<, <=, >, >=, ==, !=)
membername = varname = argname = <str>


  • num: a signed integer strictly between -2**128 and 2**128
  • decimal: a decimal fixed point value with the integer component being a signed integer strictly between -2**128 and 2**128 and the fractional component being ten decimal places
  • timestamp: a timestamp value
  • timedelta: a number of seconds (note: two timedeltas can be added together, as can a timedelta and a timestamp, but not two timestamps)
  • wei_value: an amount of wei
  • currency_value: an amount of currency
  • address: an address
  • bytes32: 32 bytes
  • bool: true or false
  • type[length]: finite list
  • bytes <= maxlen: a byte array with the given maximum length
  • {base_type: type}: map (can only be accessed, NOT iterated)
  • [arg1(type), arg2(type)...]: struct (can be accessed via struct.argname)

Arithmetic is overflow-checked, meaning that if a number goes out of range then an exception is immediately thrown. Division and modulo by zero have a similar effect. The only kind of looping allowed is a for statement, which can come in three forms:

  • for i in range(x): ... : x must be a nonzero positive constant integer, ie. specified at compile time
  • for i in range(x, y): ... : x and y must be nonzero positive constant integers, ie. specified at compile time
  • for i in range(start, start + x): ... : start can be any expression, though it must be the exact same expression in both places. x must be a nonzero positive constant integer.

In all three cases, it's possible to statically determine the maximum runtime of a loop. Jumping out of a loop before it ends can be done with either break or return.
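A sketch of the three forms side by side (loop bodies and variable names are illustrative; in each case the iteration count is fixed at compile time):

```
for i in range(10):        # constant bound: exactly 10 iterations
    pass

for i in range(2, 12):     # constant bounds: at most 10 iterations
    pass

# runtime start, compile-time width: at most 30 iterations
for i in range(self.refundIndex, self.refundIndex + 30):
    if i >= self.nextFunderIndex:
        break
```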

Regarding byte array literals, unicode strings like "这个傻老外不懂中文" or "Я очень умный" are illegal, though those that manage to use values that are in the 0...255 range according to UTF-8, like "¡très bien!", are fine.

Code examples can be found in the file.

Planned future features

  • Declaring external contract ABIs, and calling to external contracts
  • A mini-language for handling num256 and signed256 values and directly / unsafely using opcodes; will be useful for high-performance code segments
  • Smart optimizations, including compile-time computation of arithmetic and clamps, intelligently computing realistic variable ranges, etc

Code example

funders: {sender: address, value: wei_value}[num]
nextFunderIndex: num
beneficiary: address
deadline: timestamp
goal: wei_value
refundIndex: num
timelimit: timedelta
# Setup global variables
def __init__(_beneficiary: address, _goal: wei_value, _timelimit: timedelta):
    self.beneficiary = _beneficiary
    self.deadline = block.timestamp + _timelimit
    self.timelimit = _timelimit
    self.goal = _goal
# Participate in this crowdfunding campaign
def participate():
    assert block.timestamp < self.deadline
    nfi = self.nextFunderIndex
    self.funders[nfi] = {sender: msg.sender, value: msg.value}
    self.nextFunderIndex = nfi + 1
# Enough money was raised! Send funds to the beneficiary
def finalize():
    assert block.timestamp >= self.deadline and self.balance >= self.goal
    selfdestruct(self.beneficiary)  # forwards the contract balance to the beneficiary
# Not enough money was raised! Refund everyone (max 30 people at a time
# to avoid gas limit issues)
def refund():
    ind = self.refundIndex
    for i in range(ind, ind + 30):
        if i >= self.nextFunderIndex:
            self.refundIndex = self.nextFunderIndex
            return
        send(self.funders[i].sender, self.funders[i].value)
        self.funders[i] = None
    self.refundIndex = ind + 30


Requires python3

python setup.py install

python setup.py test


Ask HN: How liquid is Bitcoin?

I have done 2x $1 million bitcoin sells; I did each over 2 days to avoid exchange slippage, using Bitstamp. If you had accounts at multiple exchanges you could easily do it in an hour or 2 without too much slippage.

They did ask for information on the source of the bitcoins, but I had evidence of this. Funds were in my account within 5 days.

(this is with a pre-verified account)

Side note: speaking from experience, you need to be very careful about tying your public identity to any statements about holding this amount of cryptocurrency. I've seen extremely well targeted (successful) spear phishing and identity theft attacks to steal bitcoin from people who posted online about holding much smaller bitcoin fortunes.

I realize your username says `anon...`, but you've already made HN comments about where you live and I only did a 30s glance.

> you need to be very careful about tying your public identity to any statements about holding this amount of cryptocurrency. I've seen extremely well targeted (successful) spear phishing and identity theft attacks to steal bitcoin

...and hopefully you correctly reported the profit/loss on those coins to your friendly tax authority, otherwise it's not just thieves who will be out to get you.

> it's not just thieves who will be out to get you.

Man, I hate the implication that the tax man is being unreasonable by applying basic goddamned tax rules to profits made in all of a person's enterprises. They aren't out to get you; they're out to collect a "fair" proportion of your earnings to pay for a crapload of services which you and your community use every god damned day.

Not true. Spread betting is tax free, which most retail FX trading is conducted as. But people don't realize they are paying huge implicit trading costs to do this. And institutional investors (who certainly don't use spread betting accounts) pay ordinary rates on FX trading just like anything else.

Assuming the person you're replying to is an American, the majority (> 70%) of the federal budget goes to military spending, interest payments, pensions and welfare. If the person you're replying to is a software developer, they're likely financially independent and not relying on any such services.

How is any American not relying on military spending? In any case, we decided as a society that even if you aren't using programs you still need to contribute to them. That's how they get funded... You think the people on safety nets pay enough in taxes to pay for the programs?

Because 99% of the military's job isn't to protect us. It is to bomb countries that we should be ignoring.

And if the Federal government stopped paying for all that stuff, what would happen? Financial collapse, chaos, and perhaps war. Would our hypothetical financially independent software developer be able to continue living his life in the same way when that happens? I doubt it.

It's foolish to think that you only benefit from government spending if it results in money in your bank account or services you directly, personally take advantage of.

That's why I said "and community" - because even the most self sufficient person lives in a world surrounded by other people who do use services. Be that the person who packs their amazon drone, or makes their coffee, or sleeps on their streets.

If you want to claim that military spending is not something you should fund. I suggest getting out of your bubble and campaigning to help elect people who agree with you. The price of democracy, in the meantime, is a bunch of military spending.

It only takes 33 bits of unique information (at the most) to successfully deanonymize someone :)

I'd ask the mods to delete the post. I'm no detective, but I reckon I could figure out your name from the info in your past posts, assuming they are truthful.

Curious question: since a one-million-dollar bitcoin sell is tiny in the scheme of daily volume, why would it incur slippage?

Volume and order book depth are two unrelated figures.

Volume is roughly proportional to trading fees, while slippage is a function of order book depth.

It's easy to increase volume: relinquish trading fees. Getting people to deposit USD/BTC on your exchange, and lock it in a sell/buy order, is a different matter though, since it mainly depends on trust in the company behind the exchange (since you will lose that money if the exchange goes bust).

Since it's at more or less an all-time high, you would have to have had a series of spectacularly poor trades preceding it to not have made a profit.

According to his comment history he bought them back in 2011 when they were super cheap and did partial cash outs.

$1MM is nothing. At current prices that is ~500 BTC. Looking at just the Bitfinex order book[1], you could sell this in one trade with <5% slippage. If you split it out in say 10 trades over 24 hours you would probably not move the market at all.

Most other exchanges also publish their complete order book so you can fairly easily view their liquidity and answer this question.


And that's to sell it all off instantly by putting in the absolute least amount of effort possible. The point is that exchanging a million dollars worth of bitcoin is not difficult.

You specifically cannot cash out with HKD. The only ActualMoney you can get out of Bitfinex at present is Taiwan NTD to a local account.

It is not. They do wire money back into your account. In fact, if you're a US person, then you are covered by FDIC rules. During the hack, Bitfinex didn't touch money that was in USD; for the rest there was a haircut.

Given you still can't get actual US Dollars out of Bitfinex, it is poor advice to take them as indicative of anything whatsoever about Bitcoin except as a cautionary tale.

Given the proper completion of AML / KYC processes liquidating 1m and withdrawing is pretty painless (although I wouldn't recommend a single market sell) on the major US exchanges.

If the cryptocurrency markets interest anyone here: we are hiring full-time and contract software engineers with Python and Go experience for market entry/exit algo development, back-office P&L development, and more, as well as data science / quant roles. Currently in stealth mode but should be public soon.

Reach out to me at supersquare (at) for more information

The literalness of "cold hard cash" is going to be very important here.

Into a digital account somewhere as USD? Fairly easy. As a million bucks in literal physical cash? Something tells me that would be an absolute shitshow, since a $10k cash withdrawal is enough to trigger special processes at most institutions.

No, liquidity is the market's ability to make a transaction with an asset without affecting that asset's price, not how easily it can be exchanged. Cash transactions are pretty much as liquid as it's possible for anything to be. As another poster mentions above, a $1m bitcoin transaction has to be spread out over a day or two to not impact the price; otherwise it can fluctuate several percent.

You could arrange something in person, but you'd probably better bring those machines that check for fake money.

Yeah, really. A Craigslist post to do such a transaction in some back alley isn't going to work; they'll wipe out your entire family for $1 million. And if you get ripped off you probably can't even complain; I'm sure cash transactions above a certain amount are illegal in many countries.

Because nobody else has asked, when you say "cold, hard cash", do you mean that literally? As in a case full of physical greenback?

Yes, it is feasible to sell $1 million in BTC (444 BTC at present) in one day. Assuming you're in the U.S., my recommendation is to sign up for a Gemini account and read about their daily auctions. They do more than enough volume to fetch you a good price. They're also a highly trusted and very professionally run exchange. I can't recommend them highly enough.

If that doesn't work for you for whatever reason, Bitstamp and Coinbase are both good options. Both have the daily volume to give you a good price on your 444 BTC. Coinbase has pretty strict daily withdrawal limits ($10k per day by default) whereas Bitstamp does not, so I'd try Bitstamp before Coinbase. However, Coinbase can increase those limits if you ask nicely.

Whatever you do, stay away from Bitfinex. They're currently not letting anyone withdraw money.

The last point is not true. Bitfinex withdrawals in btc, xmr, eur, hkd all work just fine. USD is the only unusable currency.

You could easily sell $1M in one day (e.g. in the Gemini auction) but you'd lose money due to slippage.

There are brokers and dark pools that could handle $1M without slippage.

The S&P 500 index funds are linked to the S&P 500, where there are multiple ways to trade and prices are arbitraged to the nearest cent. People have mentioned VOO & SPY already, but the most liquid instruments are the futures. E.g. yesterday 2,394,501 contracts traded in the Chicago e-mini alone; at roughly $116,312 of notional per contract, that's about $278 billion, which is more than $3 million a second.
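Sanity-checking the per-second claim from the contract volume (both figures are read from the comment above and should be treated as approximate; the second is the rough dollar value of one e-mini contract). The arithmetic suggests a daily figure near $278 billion:

```python
contracts = 2_394_501    # one day's e-mini S&P 500 volume, per the comment
per_contract = 116_312   # approximate notional dollar value of one contract

daily = contracts * per_contract
print(round(daily / 1e9))              # ~279 (billion dollars per day)
print(round(daily / 86_400 / 1e6, 1))  # ~3.2 (million dollars per second)
```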

Held as ETFs? Assuming it's in big ones like VOO, they can see $400M turnover in a business day, so if you're not picky about getting the best fill price and just sell at the bid, you should be able to do it within a minute or two. You can then call your broker for a wire transfer and have it in your bank account within the same day.

By what pathway do you believe it is possible to start a day with money in Vanguard, end the day with money in Bank of America, and hold ~$1 million in Bitcoin for some portion of the day?

Wiring a million dollars from Vanguard to BoA is a routine transaction, despite the degree of surprise that causes some geeks. More interesting commerce happens every day.

The Bitcoin transaction you have outlined, though, is not a routine transaction. I'd go further than that: it is impossible.

Average daily volume of VOO over the last 60 days is about 2,000,000 shares. At ~$200/share (it's actually about $220 right now), that's about $400m daily, so unloading $1m would be well under 1% of shares traded in a typical day.

And notably the ratio of the fund's total market cap to the VOO shares traded on any given day is about 100:1. With bitcoin it's more like 10k:1. Thus the liquidity worries.

I'm gonna take a guess and say 5 minutes or less. I've moved a much smaller amount of index funds (on the order of 10x less) and it was no problem.

One security isn't really the right point of comparison though. Someone with a ton of money to park in "the market" obviously has the choice of doing so in a diverse portfolio from which they can sell in enormously high volume.

Bitcoin is just one thing.

No, lots of index fund investors don't have a diverse portfolio, they get the diversity through the index fund (see bogleheads).

A worldwide index fund from Vanguard is all you need if you want to own the whole market. Optionally add a few bonds to reduce volatility.

Unless you have millions of dollars to park, in which case a single security is a liquidity risk (which has nothing to do with diversity risk) and you take the obvious path of manually diversifying. Same treatment, different problem.

The point was that you can't "diversify" Bitcoin holdings like this. It's inherently illiquid on these scales.

There are private brokers who will facilitate trades larger than this, and I know Xapo does (did?) as well. Transferring the money back to you may be the more difficult part, depending on where you are.

Cumberland Mining, itBit OTC, the Gemini daily auction. You could sell them easily with a typical 0.1% cost. Probably Silbert's OTC division too (never traded with them though). $1mm is a pretty common trade nowadays.

If you want to sell and not incur slippage, Bitfinex has an OTC market and Gemini has an auction market. Millions change hands every day.

Trading volume on Bitfinex is in the 100s of millions, and in the 10s of millions on most of the other exchanges.

You could do it in a day but it's probably the upper end of what you could trade without risking having an impact on the price. You'd want to be careful not to dump it all in one transaction.

That may not be real liquidity. It may be the same money going round and round. The famous "flash crash" of 2010 happened when a big mutual fund did a real sell, for cash, underestimating the real liquidity of the market. "Between 2:45:13 and 2:45:27, HFTs traded over 27,000 contracts, which accounted for about 49 percent of the total trading volume, while buying only about 200 additional contracts net." - SEC report.

In-exchange volume indicates nothing about actual US dollar liquidity.

That and the fact that you can't get actual US dollars out of Bitfinex. (Which is what triggered the present bubble - unable to withdraw their USD, people bought more cryptos with them because there was literally nothing else they could use them for.)

This doesn't answer the one-million-dollar question, but:

I recently liquidated a much smaller amount (in the thousands) of Bitcoin.

Using Kraken (no affiliation), it took me about a month from transferring my BTC from my private wallet to them to finally being able to transfer the money to my bank account. This was largely due to the verification time, and some issues with the verification photo.

Once it was all setup the process was quick and painless.

Depends on whether you want to hide the fact of the transactions. Going the legit way, suppose you don't.

Then it is a matter of registering on an exchange and taking out all the bids. Even now, with $1m you'll make a whole lot of turbulence in the market. After that, just withdraw it to your bank accounts; you might have to have a chat with your exchange before that, though.

If you had existing accounts with all the exchanges then yes you could do it in a couple of days. A lot of them have $10k or $20k limits. Also it can take some days to verify your account, which needs to be done for most exchanges.

If you open an account with Gemini (large, legitimate US-based exchange) and go through their standard AML / KYC processes they will do unlimited USD wire transfers, both incoming and outgoing, for either personal or institutional accounts.

If you go through the standard AML / KYC processes at any of the other US-based exchanges the wire withdrawal limits go up significantly (hundreds of thousands) and there are similar upgrades in withdrawal limits across the major euro exchanges.

Order books are often disconcertingly thin. When GDAX restarted with a configuration error only allowing sells, the price went from $1184 to 6 cents in about 100 BTC of trades. This quickly recovered of course, but that's how thin it is.

Large sales frequently cause $20-30 dips.

So you'll want to take it slowly, and across multiple exchanges if you can. Check Reddit to make sure a given exchange isn't having (what's the phrase) "problems with the traditional banking system", i.e. that you can actually get ActualMoney out.

Also, don't go within a mile of Bitfinex - users still can't get hard currency out without a local Taiwan bank account (if even that still works).


Maine Is Drowning in Lobsters

In his famous 1968 essay "The Tragedy of the Commons," biologist Garrett Hardin singled out ocean fishing as a prime example of self-interested individuals short-sightedly depleting shared resources:

Professing to believe in the "inexhaustible resources of the oceans," they bring species after species of fish and whales closer to extinction.

The whales have actually been doing a lot better lately. Fish in general, not so much.

Then there's the Maine lobster. As University of Maine anthropologist James M. Acheson put it in his 2003 book "Capturing the Commons: Devising Institutions to Manage the Maine Lobster Industry":

Since the late 1980s, catches have been at record-high levels despite decades of intense exploitation. We have never produced so many lobsters. Even more interesting to managers is the fact that catch levels remained relatively stable from 1947 to the late 1980s. While scientists do not agree on the reason for these high catches, there is a growing consensus that they are due, in some measure, to the long history of effective regulations that the lobster industry has played a key role in developing.

Two of the most prominent and straightforward regulations are that lobsters must be thrown back in the water not only if they are too small but also if they are too big (because mature lobsters produce the most offspring), and that egg-bearing females must not only be thrown back but also marked (by notches cut in their tails) as off-limits for life. Acheson calls this "parametric management" -- the rules "control 'how' fishing is done," not how many lobsters are caught -- and concludes that "Although this approach is not supported by fisheries scientists in general, it appears to work well in the lobster fishery."

It's a seafood sustainability success story! But there's been an interesting twist since Acheson wrote those words in 2003. That already-record-setting Maine lobster harvest has more than doubled:

The Lobster Boom

Maine annual lobster harvest, in pounds

Source: Maine Department of Marine Resources

Sustainable fisheries practices alone can't really explain why today's lobster take is more than seven times the pre-2000 average. What can? The most universally accepted answer seems to be that depletion of the fish that used to eat young lobsters (mainly cod, landings of which peaked in Maine in 1991 and have fallen 99.2 percent since) has allowed a lot more lobsters to grow big enough for people to catch and eat them. The tragedy of one commons has brought unprecedented bounty to another.

Warming ocean temperatures have also improved lobster survival rates. Canada's Atlantic provinces have experienced a lobster boom similar to Maine's. Not so in the New England states to the south and west of Maine, where the water is now apparently a little too warm and lobster harvests peaked in the 1990s. Within Maine, which now accounts for more than 80 percent of U.S. lobsters, the sweet spot for lobstering has moved from the state's southern coast to the cooler northeast.

Other explanations I've heard during a visit to Maine this week include:

  1. Reduced incidence of a lobster disease called gaffkemia, and
  2. Increased effort and efficiency on the part of lobstermen, who go farther offshore and can haul in more traps in a day than they used to.

The one thing nobody can answer is how long these good times will continue. We journalists have a tendency to see disaster around every corner -- Quartz's Gwynn Guilford concluded an epic 2015 examination of the lobster boom with this warning:

Two decades of lobster abundance isn’t thanks to human mastery of “sustainability.” The ecosystem extremes that seem likely to have produced it -- how we’ve pulled apart the food web, heated up the sea, re-rigged the lobster population structure -- are volatile. Inevitably, nature warps again.

Those seem like reasonable concerns. But every time I brought them up among lobster folks this week, I was greeted with something of a shrug. As Acheson documents in his book, similar worries about an end to the abundance have been voiced for decades now -- and until they come true, there doesn't seem to be much point in not harvesting the available bounty, given that there's little to no indication that lobstering is exhausting it.

This leaves the Maine (and Canadian) lobster industry with another interesting challenge: how to find enough buyers for all those lobsters so that prices don't collapse. As you can see from the chart below, they've mostly succeeded:

Holding Steady(ish)

Real Maine wholesale lobster prices, per pound*

Sources: Maine Department of Marine Resources, U.S. Bureau of Economic Analysis

Affluent Chinese diners have been one reason. This January, five chartered 747s full of live lobsters flew from Halifax, Nova Scotia, to China to supply Chinese New Year feasts. Maine's lobsters tend to make the voyage less dramatically, in regularly scheduled flights from Boston, but $27 million worth of them were shipped to China in 2016.

The national and even global spread of the lobster roll has also helped a lot. I came to Maine on a trip organized by Luke's Lobster, a fast-casual restaurant chain that now has 21 "shacks" in the U.S. and eight more scheduled to open this year, along with six licensed locations in Japan. Founder Luke Holden was an investment banker in New York when he and former food writer Ben Conniff opened the first restaurant in the East Village in 2009, but he's also the son of a Maine lobsterman who owned the state's first lobster-processing plant.

Luke's Lobster now has its own plant in Saco, Maine, that processes between 4 and 5 percent of the state's lobster harvest. Processing, in this case, means cooking and picking the meat out of the claws and knuckles for Luke's lobster rolls while cleaning and freezing the raw tails and clawless "bullet" lobsters for sale to restaurants, groceries and such.

Holden's father, Jeff, says that tails used to sell for much more than claw meat. Now lobster rolls, for which tail meat is generally too chewy, have flipped the price equation.

All in all, it's a fascinating tale of adaptation, marketing and lobster logistics. There is one big catch, though, beyond the vague fears that the lobsters can't be this abundant forever. It's that the bait used to lure the lobsters into traps -- herring -- isn't as abundant as they are. Herring stocks along the Maine coast haven't collapsed as some other fisheries have, but the catch has fallen in recent years, to 77 million pounds in 2016 from 103 million in 2014 and more than 150 million some years in the 1950s and 1960s.

On average, it takes about a pound of herring to catch a pound of lobster. Last year, Maine's lobster harvest was 130 million pounds. To make up the difference, says Bob Bayer, executive director of the Lobster Institute at the University of Maine, they've been buying herring from Canada and Massachusetts and experimenting with other bait such as ocean perch and tuna heads.

Even with all that, herring prices have quadrupled over the past 15 years. Steve Train of Long Island, Maine, whose lobster boat I went out on Wednesday morning, estimated that a hypothetical lobsterman who brings in 40,000 pounds of lobster a year (more than the state average but less than Train usually gets) spends $40,000 to $50,000 a year on bait. After bait, fuel, labor and other costs, "a guy landing 40,000 pounds a year is maybe making $45,000 a year."

This is Maine, where that, coupled with a spouse's income (Train is married to a teacher), can pay for a quite comfortable lifestyle. But even the greatest lobster boom in history still isn't exactly making the lobstermen rich.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

To contact the author of this story:
Justin Fox at

To contact the editor responsible for this story:
Brooke Sample at
