Hey there, my name's Mitch and I'm a software/web developer at N.C. State! Check out the about page or see below for my most recent blog posts.

Arroz con pollo recipe

This is a divergence from my usual technically-oriented posts. I’ve never put a recipe up on my website, but I made up a pretty good arroz con pollo recipe that I’ve cooked on a few occasions. It takes a while to make, but it’s really good, hit-the-spot, hearty, Cuban-ish food.

Serves 6

Ingredients - need prep

  • Chicken - 3 breasts, cut to ~1.5 inch pieces
  • Half a red onion (cut to 3/4 inch squares)
  • Red & green bell pepper - half a pepper of each (cut into strips)
  • 3 sweet potatoes (large dice - 3/4 inch)
  • Garlic (finely chopped, or if you’re lazy like me, a bit of garlic powder)

Ingredients - no prep

  • Black rice and wild rice (1 cup each)
  • Olive oil (drizzle on sweet potatoes)
  • Black beans, half drained (2 cans - ~30 oz)
  • Diced tomatoes (1 can)
  • 2 limes

Ingredients - spices

  • Italian seasoning/oregano (pick one)
  • Rosemary
  • Cinnamon
  • Cumin
  • Some salt/pepper
  • Chile powder
  • Cilantro and extra lemon/lime(s) for garnish

Prep

  1. [3 sweet potatoes] [olive oil] [Rosemary, cumin, cinnamon, salt/pepper] Preheat oven to 400°F. Peel and dice 3 sweet potatoes into approximately 3/4 inch cubes. Spread onto a sheet pan and drizzle with a bit of olive oil for moisture. Season with rosemary leaves, cumin, cinnamon, salt, and pepper. Use a spatula to lightly toss the chunks until the seasoning and oil are evenly distributed. Bake at 400°F for 50 minutes.
  2. [Half a red bell pepper] [Half a green bell pepper] [Half a red onion] Chop half a red and half a green bell pepper (or any combination of pepper colors) into 1/2 inch by 3/4 inch strips and put in a bowl for later cooking. Chop half a red onion into 3/4 inch squares and mix with bell peppers.
  3. [Cumin, chile powder, salt, pepper, garlic powder or finely chopped garlic] Prepare the spice mix by combining some ratio of cumin, chile powder, salt, pepper, and garlic powder in a small bowl. I’m not sure how much of each; just kinda splash it in there until there’s enough. Or add it directly to the chicken/veggie sauté as needed.

Cook

  1. [1 cup black rice, 1 cup wild rice] Bring 1 cup of black rice and 1 cup of wild rice to a boil in uh…however much water the rice bag calls for, and for however long it says. Squeeze juice from 2 limes into rice and stir.
  2. [3 chicken breasts] [1 can diced tomatoes] [2 cans black beans, only 1 drained] Prep the chicken by trimming off the fat and cutting it into large chunks (~1.5 inch). Sauté 5-7 minutes or until almost golden brown, then add the vegetables from Prep step 2. Add half of the spice mix and sauté ~5 minutes, until the vegetables are cooked. Add 1 can of diced tomatoes, 1 can of drained black beans, and 1 can of UNdrained black beans; bring to a simmer for 5 minutes, stirring until everything is mixed.
  3. When the sweet potatoes are done, remove them from the oven and add them to the simmering chicken and beans, along with the rest of the spice mix.
  4. [Cilantro] Serve the chicken mix as a topping over the rice, with cilantro as a garnish.

Goes well with cilantro lime corn (recipe soon to be posted?) and refried plantains (also soon to be posted?)

My Job: curating SE data at openscience.us

Back in January, I started a part-time job at N.C. State, joining my colleague and good friend Carter Pape in developing version 4 of openscience.us/repo, a long-term repository for software engineering (SE) research data. I wasn’t too knowledgeable about what I was getting into, but through the past few months I’ve gained some insight into the philosophy of SE research – and how Dr. Tim Menzies and his brainchild OpenScience is making software engineering research a more reproducible and replicable process.

Software Analytics

Software analytics deals with analyzing data gathered from software engineering projects to gain insights that produce actionable advice for improving software engineering practice. Such advice can include using

  • XML descriptions of design patterns to recommend particular designs [1],
  • software process models to learn effective project changes [2], and
  • bug databases to learn defect predictors that guide inspection teams to where the code is most likely to fail [3-5].

A common problem in software analytics, and SE research generally, is that the SE data many research papers used to reach their conclusions is not provided with the paper. An essential paradigm of the scientific method is that results must be both reproducible and replicable. Reproducibility is the ability to reproduce an experiment, e.g. to take somebody’s previous experiment and rerun it with either their data or your own (it depends on who you ask – the precise definition is a bit fuzzy). Replicability is achieved when the same results are obtained from the same experimental methods with the same data.

So…when the data used in a particular study or experiment is not provided to the academic community, the study or experiment is irreproducible and therefore irreplicable. There’s solid evidence of this, as stated on the openscience.us/ssj/manifesto page:

There are very few replications of prior SE results. For example, in 2010, Robles published a retrospective study of the 171 papers published in the Mining Software Repositories (MSR) conference [106]. He found that over 95% of those papers were unreproducible, since their associated data was no longer on-line. This lack of availability of old MSR data was discussed, at length, at MSR 2013. According to those participants, the single biggest contributor to this issue was the lack of a free-to-use long-term storage facility for big files. For example, free services like GitHub or GoogleCode impose limits of just a few gigabytes on the total repository size.

So not only is some data not being published (therefore breaking the academic research model), but the data that is published tends to go missing over time. As the manifesto states, a reliable long-term storage repository for data simply didn’t exist. That’s where I come in!

The tera-PROMISE repository

My job involves two main branches of work: developing the actual site with HTML, CSS, and the Jekyll framework, and curating/adding research data as it is submitted. This work is also described in the OpenScience manifesto:

SOLUTION #4: Create a large free-to-use repository for SE research products. To this end, we have created a large repository for storing the data, plus a discussion site for those contents. We calculate that this repository requires one petabyte of storage.

So I build the actual site with Jekyll, a “simple, blog-aware, static site generator”; add datasets to the site as Jekyll posts; and upload the data itself to the SVN repository. The Jekyll site hosts the data descriptions and context notes, along with a link to the data in the SVN repository, which is hosted separately at N.C. State University. A rough sketch of that workflow is below.
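Here’s roughly what adding one dataset looks like. The file names, front matter fields, and repository paths are made up for illustration – the actual tera-PROMISE layout and fields may differ:

# Hypothetical example of adding a dataset (paths and fields are illustrative).
# 1. Put the raw data files into the SVN repository hosted at N.C. State.
svn add defect/new-dataset
svn commit -m "Add new-dataset defect data"

# 2. Describe the dataset as a Jekyll post that links to the SVN copy.
cat > _posts/2015-06-01-new-dataset.md <<'EOF'
---
layout: post
title: "new-dataset"
category: defect
---
Context notes on how the data was collected, plus a link to the
files in the SVN repository.
EOF

# 3. Regenerate the static site.
jekyll build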

And that’s my job! There are currently over 100 datasets housed in the tera-PROMISE repository, some of which are circa 2004, sorted into 18 categories. If you happen to be a researcher who wants data to be uploaded, feel free to browse around the current projects and fill out a Google Form with the appropriate information. You can also email openscience.content@gmail.com if you prefer.

These references were gathered from this IEEE article on software analytics, co-authored by Dr. Menzies.

[1] F. Palma, H. Farzin, and Y.-G. Gueheneuc, “Recommendation System for Design Patterns in Software Development: A DPR Overview,” Proc. 3rd Int’l Workshop Recommendation Systems for Software Eng., IEEE, 2012, pp. 1–5.

[2] D. Rodríguez et al., “Multiobjective Simulation Optimisation in Software Project Management,” Proc. Genetic and Evolutionary Computation Conf., ACM, 2011, pp. 1883–1890.

[3] T. Menzies, J. Greenwald, and A. Frank, “Data Mining Static Code Attributes to Learn Defect Predictors,” IEEE Trans. Software Eng., Jan. 2007; http://menzies.us/pdf/06learnPredict.pdf.

[4] T.J. Ostrand, E.J. Weyuker, and R.M. Bell, “Where the Bugs Are,” Proc. 2004 ACM SIGSOFT Int’l Symp. Software Testing and Analysis, ACM, 2004, pp. 86–96.

[5] S. Kim et al., “Predicting Faults from Cached History,” Proc. Int’l Conf. Software Eng., IEEE CS, 2007, pp. 489–498.

My first bash function

I found a nice little article today on learning terminal commands in Linux. The first suggestion in the article was to echo a random command from the /bin directory every time an instance of bash starts up. I took their suggestion of wrapping the output in a cowsay speech bubble slightly further.

cowsay is a little program that prints text from standard input in a speech bubble coming from a cow (there are other animals/characters you can specify). All I did was plop the cowsay command from the MakeUseOf article into a bash function called cowtip in my ~/.bashrc and run it every time a terminal window starts. Here’s the code:

# CowTip of the day!
function cowtip {
    # Pick a random cow figure (strip the .cow extension) and pair it with
    # the one-line description of a random command from /bin.
    cowsay -f $(ls /usr/share/cowsay/cows | shuf -n 1 | cut -d. -f1) \
        "$(whatis $(ls /bin) 2>/dev/null | shuf -n 1)"
}
cowtip

Just put it in your ~/.bashrc and it’ll spit out a random command and its description every time you start the terminal or run cowtip!
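If you want to try it in a shell that’s already open, just reload your config and call the function (assuming you added it to your ~/.bashrc as shown above):

source ~/.bashrc   # re-read the config so the cowtip function is defined
cowtip             # print a tip right away

which prints something like: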

 ___________________________________
< dir (1) - list directory contents >
 -----------------------------------
       \   ^__^
        \  (oo)\_______
           (__)\       )\/\
               ||----W |
               ||     ||

The scary yet promising implications of AI

This post was inspired by a long, thought-provoking post about artificial intelligence (AI) from the very interesting blogging site WaitButWhy.com. A big part of my educational enlightenment has been figuring out just what it is that I can do to make an impact in the world, hopefully a good one, on hopefully a large scale. After spending the better part of an afternoon reading and mulling over the Wait But Why article, I came away with the foggy idea that cognitive computing, machine learning, artificial intelligence, and other related subfields of computer science offer ample opportunity to make a huge impact by advancing the technology we have today into the superintelligent machines that will most likely be commonplace in the future. Instead of making you read the entire ~23,000-word two-part article, I’ll give you a super-condensed version in italics below (note: I have no intention of taking credit for the hard work of the people at waitbutwhy.com – this summary is just a condensation of their really long article).

A quick summary

Basically, human progress is currently here:

Just at the edge of an AI explosion. [Courtesy of waitbutwhy.com]

Artificial intelligence (AI) technology might be about to skyrocket thanks to the Law of Accelerating Returns. We don’t exactly know when (or if) it’ll happen, but experts predict it’s anywhere from 7 to 100 years away. And if it does reach the point of no return (known as the singularity – when AI explosively surpasses human intelligence), it’ll be a fun ride. Researchers have also classified AI into three calibers: ANI, AGI, and ASI, standing for Artificial Narrow Intelligence, Artificial General Intelligence, and Artificial Superintelligence. We’re currently at ANI, with things like Siri, Google Translate, and Watson (the computer that won Jeopardy). AGI is as intelligent as a human, and ASI is anything past AGI, i.e. past the level of human intelligence. Once we build an AI with human capabilities, it’ll likely go through recursive self-improvement (e.g. machine learning) and then become ASI really quickly.

The problem with ASI being smarter than us is that we wouldn’t know how to control it. It could do unbelievably good things or unbelievably bad things: figure out how to make us an immortal species (see picture), invent technology for us, and solve all our problems, or destroy every living thing on the planet in any of a myriad of ways. This immense power is what scares many people (including me).

The proverbial balance beam. [Courtesy of waitbutwhy.com]

The article then goes on to describe a doomsday-type scenario in which a robot is given the task of emulating handwriting by constantly improving, without regard to the human consequences of doing so, and the robot proceeds to kill the rest of the universe to gather the resources it needs to keep practicing its handwriting. This doomsday robot isn’t necessarily evil – computers aren’t ‘friendly’ or ‘evil’; we just anthropomorphize them, i.e. think of them as human. Since human values weren’t hard coded into the robot from the start, it didn’t consider the survival of humans to be an imperative goal, so it simply went on to wipe out humans to gather resources for its task.

So how do we control AI?

AI systems have the potential to drastically alter the current path of life as we know it, whether that path is good, bad, or both. The implications are pretty scary, but the scariest possibility is that we won’t understand how the AI systems we’ve created work, and therefore won’t know how to control them.

Figuring out how to gain control of artificial superintelligence and the self-improving decision making of AI before it actually surpasses us could be the difference between us controlling AI and our AI controlling us. And we need to do it fast, because we don’t really know how much longer we have.

In my mind, the key difference between a computer we can control long-term and one we cannot is whether or not it self-improves. If it self-improves, it could do so unpredictably, in ways that create undesirable behaviors. The intelligent decisions a computer makes that affect humanity must somehow be moral decisions, made in the best interest of human rights. But what’s moral? And how do we govern an AI’s morality?


Isaac Asimov, author of Runaround and the Three Laws of Robotics. [Courtesy of mentalfloss.com]

You may have heard of Isaac Asimov’s Three Laws of Robotics, originally published in Runaround, a short story from 1942. They are as follows:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey the orders given it by human beings, except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

The problem with these is that they were formed as the basis of a short story, and not from a well-formed scientific approach. They’re too ambiguous, and on top of that, there’s no scientific basis suggesting that they would suffice as valid parameters from which to construct ASI systems.

But is there really any scientific evidence we can pull from that gives us insight into which ethical paradigms most successfully guide an AGI’s behavior in favor of humanity? Not really. We haven’t developed an AGI yet, and the only way to gather evidence about AGI behavior is to build AGIs and study them. Our situation is paradoxical: to develop good ethical laws for AGI/ASI before those systems exist, we would need an AGI to study and experiment on.

Effective technical design paradigms take time and experimentation to mature properly, and sometimes good paradigms haven’t matured until the technology has already seen widespread mainstream adoption. Andrew Tanenbaum noted this in his computer networking textbook: the OSI model was developed, but before its standards could mature into a better networking paradigm, technology companies started heavily investing in TCP/IP products, and the belatedly developed OSI protocols were left by the wayside. Tanenbaum calls this the apocalypse of the two elephants.

The same could happen with AGI. The rate of progress in AI is so volatile that we don’t know when we’ll achieve ASI, but when we do, we should hope that solid fundamentals have been developed to ensure we stay in control of its behavior. Fortunately, there’s been considerable development in the field of machine ethics, which could provide the more rigorous successor to Asimov’s Three Laws of Robotics that researchers have been looking for.

The successor to Asimov’s Three Laws

Machine ethics is a relatively new field, only recently coming into focus as the capabilities and sophistication of computers have made artificial intelligence a more realistic endeavor. Basically, it studies the moral behavior of artificially intelligent machines, the potential impact that morality could have on AI behavior, and the creation of ‘ethical agents’, a term defined in James Moor’s paper, “The Nature, Importance and Difficulty of Machine Ethics”.

Machine ethics considers a number of different strategies for controlling the morality of AI behavior. One such consideration is which algorithms are safe to use in AI programming: algorithms such as neural networks and genetic algorithms are too indecipherable to be considered safe, because it’s difficult to see how they make decisions. An AI built on decision trees or Bayesian networks would be safer, because the implementations of those algorithms are transparent and easy to inspect, according to researcher Nick Bostrom (see this article).

Another consideration in controlling AI behavior is whether or not we can use machine learning techniques to ‘learn’ morality. Some researchers suggest that an AGI should be programmed to dynamically analyze the ethical consequences of its own actions, rather than rely on a predetermined list of rules to follow. This would be harder to implement, but more flexible and adaptable to different situations.

Where are we now?

We ultimately don’t know when (or even if) artificial intelligence will reach a singularity, so there’s no telling what exactly is going to happen. However, the rate of progress of machine ethics and artificial intelligence ethics is definitely nonzero. Products like IBM’s Watson are getting at least within the ballpark of human decision making. Watson can answer impressively complex Jeopardy! questions (as it famously did to beat the two best human Jeopardy! players ever), by combining massive amounts of computing power and massively sophisticated combinations of algorithms in natural language processing and data retrieval to come up with hypotheses and their probabilities of correctness.

Watson is an impressive step forward in cognitive computing, and it emulates human brain function on a primitive level: instead of calculating or computing an answer, it searches for one based on prior ‘experience’ in the form of indexed data. It’s very good at what it was designed to do, but it’s still far from the general intelligence of the human brain. The jump from ANI (narrow intelligence) to AGI is huge, and it’ll take some pretty giant leaps to get there. Still, Watson is a remarkably advanced product that pushed the capabilities of natural language processing and cognitive computing far beyond what had been done before.

In academic research, a great deal of progress is being made. Researchers have begun developing automated tools (here’s another) based on genetic algorithms that can fix bugs. It’ll be fascinating to watch what happens as research continues to push the extent of human knowledge in artificial intelligence, machine learning, and so many other fields even further.


Human morality is and always will be a field of deep divides. A quick Google search of social issues gives plenty of varying and often extremist opinions. We, as humans, can’t and probably never will agree on the ethicality of certain decisions, and it would certainly be orders of magnitude more difficult to hard code ‘correct’ ethical behavior into an AGI or ASI. Nonetheless, the progress of academic research in both machine ethics and artificial intelligence (and other fields) will ultimately cause advancements in our understanding of the computing systems of the future, and it’ll be fascinating to watch.