kraegar

joined 2 years ago
[–] kraegar 4 points 2 years ago

I feel this has already been the case for longer than people think. AI/ML has been its own subspecialty of SWE for years. There is some low-hanging fruit that using sklearn or copying and pasting from Stack Overflow will let you pick, but for the most part the advanced features require professional specialization.

One thing that bothers me is that subject matter expertise is often ignored. General AI researchers can be helpful, but oftentimes having SME context AND an AI skillset will be way more valuable. For LLMs it may be fine since they produce a generalized solution to a general problem, but application-specific tasks require relevant knowledge and an understanding of pros/cons within the use case.

It feels like a hot take, but I think that undergraduate degrees should establish a base knowledge in a domain, with AI then introduced at the graduate level. Even if you are not using the undergraduate domain knowledge, it should be transferable to other domains and help you understand how to solve problems with AI within the context of a professional domain.

[–] kraegar 1 points 2 years ago (1 children)

Depending on what config data you need it might be a good idea to use environment variables. If all you need are server locations and credentials then environment variables are likely your best bet.
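A minimal sketch of the environment-variable approach, assuming a Python project. The variable names `APP_SERVER_URL`, `APP_USERNAME`, and `APP_PASSWORD` and the helper `load_config` are made up for illustration, not taken from any particular tool:

```python
import os


def load_config(env=os.environ):
    """Read server location and credentials from environment variables.

    The variable names below are hypothetical; swap in whatever
    your deployment actually sets.
    """
    required = ("APP_SERVER_URL", "APP_USERNAME", "APP_PASSWORD")
    # Fail early with a clear message instead of a KeyError deep in the app.
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "server_url": env["APP_SERVER_URL"],
        "username": env["APP_USERNAME"],
        "password": env["APP_PASSWORD"],
    }
```

Passing the mapping in as an argument (instead of reading `os.environ` directly inside the function) makes it easy to test with a plain dict.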

If you need fancy JSON or something else, global variables are nice.

[–] kraegar 1 points 2 years ago

I think this is the beauty of federation. Everything is open and free to all rather than a company being able to lock in your personally created content.

For example, I wanted to learn about NLP and am working on building a bot to monitor sentiment and check for hate speech in Lemmy content. I am still at the brainstorming/research phase, but the accessibility of Lemmy makes it really nice.

Pythorhead was made for this exact purpose.

[–] kraegar 3 points 2 years ago

I normally try and do "fun" work. This largely depends on how autonomous your job is. I was a PhD student doing research for a company and I received very little oversight for 3 years.

The supervision I did receive was great though. They understood needing to take a break and slow down. At those points I would generally read papers, watch PyData talks (highly recommend them, like inspirational TED talks for data people), or contribute to open source to learn about new tools or design paradigms.

[–] kraegar 1 points 2 years ago

Welp. This is me. I spent a few hours debugging a failing test that was caused by a package update. If only I checked the changelogs...

[–] kraegar 5 points 2 years ago

Open source contribution can be really great. I started contributing to a Python project that I have used extensively and it 100% improved my coding. Depending on the project, it can also let you interact with more experienced devs and get feedback on your work.

[–] kraegar 4 points 2 years ago

This has been my experience too. A junior dev at my last company kept trying to use ChatGPT to generate Docker Compose files and wondered why they generally didn't work.

My research has been on time series forecasting, which is tangentially related to NLP. People are shocked when I point out to them that all these models do is predict the next token. Weather forecasting has been a good analogy for why long AI-generated texts are extra bad: weather forecasts get worse as the horizon increases.
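A toy sketch of the horizon effect, assuming the simplest possible setup: a Gaussian random walk with a naive "hold the last value" forecast. This is not a weather model or an LLM, and the function name and parameters are made up for illustration. It only shows forecast error growing as the horizon gets longer:

```python
import math
import random


def horizon_rmse(n_paths=2000, max_horizon=20, sigma=1.0, seed=0):
    """RMSE of a naive forecast at each horizon on simulated random walks."""
    rng = random.Random(seed)
    sq_err = [0.0] * max_horizon
    for _ in range(n_paths):
        value = 0.0  # last observed value; the naive forecast stays at 0.0
        for h in range(max_horizon):
            value += rng.gauss(0.0, sigma)  # the series drifts one more step
            sq_err[h] += value ** 2  # squared error of the forecast at step h + 1
    return [math.sqrt(total / n_paths) for total in sq_err]


rmse = horizon_rmse()
for h in (1, 5, 10, 20):
    print(f"horizon {h:2d}: RMSE {rmse[h - 1]:.2f}")
```

For a random walk the error should grow roughly like the square root of the horizon, so the 20-step forecast ends up several times worse than the 1-step one.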

Despite all my gripes about LLMs, I must say that Copilot has saved me from writing TONS of boilerplate code and unit tests.

[–] kraegar 8 points 2 years ago (3 children)

I think there is definitely some echo chambering, since the average person isn't generally aware of AI. At the same time, mainstream media has been picking up the hype train a lot recently.

People hear my grad school studies involve AI/ML and I instantly get bombarded with questions about ChatGPT.

[–] kraegar 1 points 2 years ago

I started my MSc and part of it involved building a ray-tracing simulation. I built it in MATLAB, but the technical debt quickly became so high that I had to rebuild it all in Python.

MATLAB does have classes, but they are hot garbage. Distributed computing is also awful (I moved to Python and PySpark in quick succession and life got a lot better).

The only industry jobs I have seen requesting MATLAB were at legacy companies like Telesat, and I wouldn't be surprised if they were moving towards Python since the license fees are insane.

[–] kraegar 2 points 2 years ago

I have heard of SageMath, but never used it.

I have never seen people use tooling like that in industry. It could be that I am simply not interacting with companies that do, but across all the modeling I have done I have yet to see symbolic math.

The closest thing I have used to symbolic math is PyMC which is a bit of a stretch.

[–] kraegar 2 points 2 years ago

My research area has been in time series forecasting and unsupervised anomaly detection, but it is SOMEWHAT related to NLP.

Papers with Code had a few potential implementations: https://paperswithcode.com/paper/hyena-hierarchy-towards-larger-convolutional

I am always skeptical of papers. They could have good results, but how much did they adjust their experiment to look good on paper?
