Hey there, my name's Mitch and I'm a software/web developer at N.C. State! Check out the about page or see below for my most recent blog posts.

A tidbit of Java magic - integer caching

I came across a code snippet on StackOverflow:

public class IntegerCaching {
    public static void main(String[] args) {
        Integer a = 42;
        Integer b = 42;
        System.out.println(a == b);
        Integer c = 555;
        Integer d = 555;
        System.out.println(c == d);
    }
}

The first print statement prints true and the second one prints false.

Wait, what?

The second comparison makes sense. In Java, == compares references to see whether they point to the same object in memory (primitives, by contrast, are compared by value). So c == d should return false, since c and d are two different objects. Comparing the wrapped values is what equals() is for: Integer overrides Object.equals() to compare the underlying int values, so c.equals(d) is true even though c == d is not. So why does the first comparison evaluate to true?

The answer can be found in the Java Language Specification, chapter 5 (section 5.1.7, Boxing Conversion). When a primitive is autoboxed into its wrapper type, the result may come from a cache instead of being a freshly allocated object. Caching commonly used values avoids allocating a new object every time one of them is boxed, and the specification guarantees that boxing any of these values twice yields the same object:

  • true, false
  • any byte, or a char in the range \u0000 to \u007f
  • int or short in the inclusive range of -128 to 127
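The list above can be checked directly. Autoboxing compiles to a call to the wrapper's valueOf method, so this sketch (BoxingCacheDemo is a hypothetical class name; any recent JDK is assumed) boxes each value twice with valueOf and compares the two results with ==:

```java
// Hypothetical demo: box the same value twice via valueOf (what autoboxing uses)
// and report whether both boxing conversions returned the very same object.
class BoxingCacheDemo {

    static boolean sameBoolean(boolean v) {
        return Boolean.valueOf(v) == Boolean.valueOf(v);
    }

    static boolean sameByte(byte v) {
        return Byte.valueOf(v) == Byte.valueOf(v);
    }

    static boolean sameChar(char v) {
        return Character.valueOf(v) == Character.valueOf(v);
    }

    static boolean sameShort(short v) {
        return Short.valueOf(v) == Short.valueOf(v);
    }

    static boolean sameInt(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(sameBoolean(true));      // true
        System.out.println(sameByte((byte) -100));  // true: every byte value is cached
        System.out.println(sameChar('A'));          // true: 'A' is \u0041
        System.out.println(sameShort((short) 127)); // true
        System.out.println(sameInt(-128));          // true
        System.out.println(sameInt(555));           // false with default JVM settings
    }
}
```

Only the "true" cases are guaranteed by the specification; whether values outside the ranges are shared is left to the implementation.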

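So in the snippet above, a == b is true because 42 lies between -128 and 127. The statement Integer a = 42; is compiled into a call to Integer.valueOf(42), which hands back a previously cached Integer object instead of creating a new one, so a and b refer to the same object. 555 falls outside that range, so c and d are boxed into two separate objects (at least with the default JVM settings).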
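To make the mechanism concrete, here is a simplified sketch of the idea behind Integer.valueOf. It uses a hypothetical IntBox class rather than the real java.lang.Integer (whose internal cache differs in detail), which also lets it create distinct objects without the deprecated Integer constructor:

```java
// Hypothetical immutable wrapper standing in for java.lang.Integer,
// illustrating a valueOf-style cache of small values.
final class IntBox {
    private static final int LOW = -128;
    private static final int HIGH = 127;

    // One shared IntBox per value in [LOW, HIGH], built once when the class loads.
    private static final IntBox[] CACHE = new IntBox[HIGH - LOW + 1];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new IntBox(LOW + i);
        }
    }

    private final int value;

    private IntBox(int value) {
        this.value = value;
    }

    // Plays the role of Integer.valueOf: reuse a cached box when the value is
    // in range, otherwise allocate a fresh one.
    static IntBox valueOf(int v) {
        if (v >= LOW && v <= HIGH) {
            return CACHE[v - LOW];
        }
        return new IntBox(v);
    }

    int intValue() {
        return value;
    }

    // Value equality, like Integer.equals.
    @Override
    public boolean equals(Object o) {
        return o instanceof IntBox && ((IntBox) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    public static void main(String[] args) {
        System.out.println(valueOf(42) == valueOf(42));        // true: same cached box
        System.out.println(valueOf(555) == valueOf(555));      // false: two fresh boxes
        System.out.println(valueOf(555).equals(valueOf(555))); // true: same value
    }
}
```

The trade-off is a small table built once up front in exchange for never allocating boxes for the most common small values again.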

Note that this only applies when ints are boxed, whether by autoboxing or by an explicit Integer.valueOf() call. If two Integer objects are explicitly constructed with new (not autoboxed from the primitive type)…

// The Integer(int) constructor is deprecated since Java 9, but it still forces a new object.
Integer e = new Integer(4);
Integer f = new Integer(4);
System.out.println(e == f);

…then the comparison will be false, since both Integer references were explicitly initialized as separate objects.
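Since identity depends on the cache, the safe way to compare boxed values is by value. A small sketch (BoxedCompare and sameValue are hypothetical names) using equals(), java.util.Objects.equals() for null safety, and explicit unboxing:

```java
import java.util.Objects;

// Hypothetical helper: value-based comparisons of boxed Integers
// that do not depend on the cache.
class BoxedCompare {

    // Null-safe value comparison; Objects.equals delegates to Integer.equals.
    static boolean sameValue(Integer x, Integer y) {
        return Objects.equals(x, y);
    }

    public static void main(String[] args) {
        Integer c = 555;
        Integer d = 555;
        System.out.println(c == d);                       // reference comparison: false by default
        System.out.println(c.equals(d));                  // true: compares the int values
        System.out.println(c.intValue() == d.intValue()); // true: compares primitives
        System.out.println(c == 555);                     // true: c is unboxed because one side is an int
        System.out.println(sameValue(null, null));        // true
        System.out.println(sameValue(c, null));           // false
    }
}
```

Note the mixed case: when one side of == is a primitive int, Java unboxes the other side and compares values, so the surprise only happens when both operands are Integer references.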

Interesting stuff!

Arroz con pollo recipe

This is a departure from my usual technically oriented posts. I’ve never put a recipe up on my website, but I came up with a pretty good arroz con pollo recipe that I’ve made on a few occasions. It’s kinda long to make, but it’s really good, hearty, hit-the-spot Cuban-ish food.

Serves 6

Ingredients - need prep

  • Chicken - 3 breasts, cut to ~1.5 inch pieces
  • Half a red onion (cut to 3/4 inch squares)
  • Red & green bell pepper - half a pepper each kind (cut into strips)
  • 3 sweet potatoes (large dice - 3/4 inch)
  • Garlic (finely chopped, or if you’re lazy like me, a bit of garlic powder)

Ingredients - no prep

  • Black rice and wild rice (1 cup each)
  • Olive oil (drizzle on sweet potatoes)
  • Black beans (2 cans, ~30 oz total; drain only one can)
  • Diced tomatoes (1 can)
  • 2 limes

Ingredients - spices

  • Italian seasoning/oregano (pick one)
  • Rosemary
  • Cinnamon
  • Cumin
  • Some salt/pepper
  • Chile powder
  • Cilantro and extra lemon/lime(s) for garnish

Prep

  1. [3 sweet potatoes] [olive oil] [Rosemary, cumin, cinnamon, salt/pepper] Preheat oven to 400°F. Peel and dice 3 sweet potatoes into approximately 3/4 inch cubes. Spread them onto a sheet pan and drizzle with a bit of olive oil for moisture. Season with rosemary leaves, cumin, cinnamon, salt, and pepper. Use a spatula to lightly toss the chunks until the seasoning and oil are evenly distributed. Bake at 400°F for 50 minutes.
  2. [Half a red bell pepper] [Half a green bell pepper] [Half a red onion] Chop half a red and half a green bell pepper (or any combination of pepper colors) into 1/2 inch by 3/4 inch strips and put in a bowl for later cooking. Chop half a red onion into 3/4 inch squares and mix with bell peppers.
  3. [Cumin, chile powder, salt, pepper, garlic powder or finely chopped garlic] Prepare the spice mix by combining some ratio of cumin, chile powder, salt, pepper, and garlic powder in a small bowl. Not sure how much of each, just kinda splash it in there until there’s enough. Or add the spices directly to the chicken/veggie sauté as needed.

Cook

  1. [1 cup black rice, 1 cup wild rice] Bring 1 cup of black rice and 1 cup of wild rice to a boil in uh…however much water the rice bag calls for, and for however long it says. Squeeze juice from 2 limes into rice and stir.
  2. [3 chicken breasts] [1 can diced tomatoes] [2 cans black beans, only 1 drained] Prep the chicken by trimming off the fat and cutting it into large chunks (~1.5 inch). Sauté 5-7 minutes or until almost golden brown, then add the vegetables from prep step 2. Add half of the spice mix and sauté ~5 minutes, until the vegetables are cooked. Add 1 can of diced tomatoes, 1 can of drained black beans, and 1 can of UNdrained black beans; bring to a simmer for 5 minutes, stirring until mixed.
  3. When the sweet potatoes are done, remove them from the oven and add them to the chicken/bean simmer along with the rest of the spice mix.
  4. [Cilantro] Serve the rice with the chicken mix as a topping, garnished with cilantro.

Goes well with cilantro lime corn (recipe soon to be posted?) and refried plantains (also soon to be posted?)

My Job: curating SE data at openscience.us

Back in January, I started a part-time job at N.C. State, joining my colleague and good friend Carter Pape in developing version 4 of openscience.us/repo, a long-term repository for software engineering (SE) research data. I didn’t know much about what I was getting into, but over the past few months I’ve gained some insight into the philosophy of SE research, and into how Dr. Tim Menzies and his brainchild, OpenScience, are making software engineering research a more reproducible and replicable process.

Software Analytics

Software analytics deals with the analysis of data gathered from software engineering to gain insights that can produce actionable advice to improve software engineering. Such advice can include using

  • XML descriptions of design patterns to recommend particular designs [1],
  • software process models to learn effective project changes [2], and
  • bug databases to learn defect predictors that guide inspection teams to where the code is most likely to fail [3-5].

A common problem in software analytics, and in SE research generally, is that the data many research papers used to reach their conclusions is not provided with the paper. An essential paradigm of the scientific method is that results must be both reproducible and replicable. Reproducibility is the ability to reproduce an experiment, i.e. to take somebody’s previous experiment and rerun it with either their data or your own (it depends on who you ask; the precise definition is a bit fuzzy). Replicability is achieved when the same results are obtained from the same experimental methods with the same data.

So…when the data used in a particular study or experiment is not provided to the academic community, the study or experiment is irreproducible and therefore irreplicable. There’s solid evidence of this, as stated on the openscience.us/ssj/manifesto page:

There are very few replications of prior SE results. For example, in 2010, Robles published a retrospective study of the 171 papers published in the Mining Software Repositories (MSR) conference [106]. He found that over 95 of those papers were unreproducible, since their associated data was no longer on-line. This lack of availability of old MSR data was discussed, at length, at MSR 2013. According to those participants, the single biggest contributor to this issue was the lack of a free-to-use long-term storage facility for big files. For example, free services like GitHub or GoogleCode impose limits of just a few gigabytes on the total repository size.

So not only is some data not being published (therefore breaking the academic research model), but the data that is published tends to go missing over time. As the manifesto states, a reliable long-term storage repository for data simply didn’t exist. That’s where I come in!

The tera-PROMISE repository

My job involves two main branches of work: developing the site itself with HTML, CSS, and the Jekyll framework, and curating and adding research data as it is submitted. This work is also described in the OpenScience manifesto:

SOLUTION #4: Create a large free-to-use repository for SE research products. To this end, we have created a large repository for storing the data, plus creating a discussion site for those contents calculate, that this repository requires one petabyte of storage.

So I work on building the actual site with Jekyll, a “simple, blog-aware, static site generator”, and adding datasets to the site as Jekyll posts and uploading them to the SVN repository. The Jekyll site hosts the data descriptions and context notes to the data, along with a link to the data in the SVN repository, hosted separately at N.C. State University.

And that’s my job! There are currently over 100 datasets housed in the tera-PROMISE repository, sorted into 18 categories, some dating back to around 2004. If you happen to be a researcher who wants data to be uploaded, feel free to browse around the current projects and fill out a Google Form with the appropriate information. You can also email openscience.content@gmail.com if you prefer.

These references were gathered from this IEEE article on software analytics, co-authored by Dr. Menzies.

[1] F. Palma, H. Farzin, and Y.-G. Gueheneuc, “Recommendation System for Design Patterns in Software Development: A DPR Overview,” Proc. 3rd Int’l Workshop Recommendation Systems for Software Eng., IEEE, 2012, pp. 1–5.

[2] D. Rodríguez et al., “Multiobjective Simulation Optimisation in Software Project Management,” Proc. Genetic and Evolutionary Computation Conf., ACM, 2011, pp. 1883–1890.

[3] T. Menzies, J. Greenwald, and A. Frank, “Data Mining Static Code Attributes to Learn Defect Predictors,” IEEE Trans. Software Eng., Jan. 2007; http://menzies.us/pdf/06learnPredict.pdf.

[4] T.J. Ostrand, E.J. Weyuker, and R.M. Bell, “Where the Bugs Are,” Proc. 2004 ACM SIGSOFT Int’l Symp. Software Testing and Analysis, ACM, 2004, pp. 86–96.

[5] S. Kim et al., “Predicting Faults from Cached History,” Proc. Int’l Conf. Software Eng., IEEE CS, 2007, pp. 489–498.

My first bash function

I found a nice little article today on learning terminal commands in Linux. The first suggestion in the article was to echo a random command from the /bin directory every time an instance of bash starts up. I took their suggestion of wrapping the output in a cowsay speech bubble slightly further.

cowsay is a little program that prints text (passed as arguments or on standard input) in a speech bubble coming from a cow (there are other animals/characters that you can specify). All I did was plop the cowsay command from the MakeUseOf article into a bash function called cowtip in my ~/.bashrc and call it every time a terminal window starts. Here’s the code:

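# CowTip of the day!
function cowtip {
   # Random cow: list the cow files, pick one, and strip the ".cow" extension.
   # Random tip: look up whatis descriptions for every command in /bin and pick one.
   cowsay -f "$(ls /usr/share/cowsay/cows | shuf -n 1 | cut -d. -f1)" $(whatis $(ls /bin) 2>/dev/null | shuf -n 1)
}
cowtip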

Just put it in your ~/.bashrc and it’ll spit out a random command and its description every time you start the terminal or run cowtip!

 ___________________________________
< dir (1) - list directory contents >
 -----------------------------------
       \   ^__^
        \  (oo)\_______
           (__)\       )\/\
               ||----W |
               ||     ||