Lead Data Engineer - Testing and Measurement

Location: Minneapolis, Minnesota, United States
Posted: Mar 17, 2017
Closes: Mar 29, 2017
Category: Business, Engineering
Employment Status: Full Time
PRIMARY FUNCTION

The Lead Data Engineer is responsible for the development of high-performance, distributed computing tasks using Big Data technologies such as Hadoop, NoSQL, text mining, and other distributed-environment technologies, based on the needs of the organization. The role is also responsible for analyzing, designing, programming, debugging, and modifying software enhancements and/or new products used in distributed, large-scale analytics solutions.

PRINCIPAL DUTIES AND RESPONSIBILITIES

Designing and Implementation
    • Solid understanding of data structures, algorithms, object oriented design and patterns
    • Experience building highly scalable solutions in data storage, real-time analysis, and reporting for multi-terabyte data sets, using technologies such as Hive and HBase, real-time technologies such as Spark and Apache Apex, and the Hadoop ecosystem in general
    • Expert knowledge of databases: relational (e.g. SQL), unstructured (e.g. HBase/Cassandra), and column stores (e.g. Vertica/Redshift)
    • Understanding of Data Science & Machine Learning Methodologies including Statistics and Mathematical Modelling
    • Highly effective communication and collaboration skills
    • Experience working with data at scale, specifically with distributed systems
    • Rigor in A/B testing, automated testing, and other engineering best practices
    • Assist in the definition of software architecture to ensure that the online organization's software solutions are built within a consistent framework
    • Assist in the decision-making process related to the selection of software architecture solutions
    • Implement architectures to handle web-scale data and its organization
    • Execute strategies that inform data design and architecture in partnership with enterprise-wide standards
Scope of Work
    • Experience working in an agile environment with rapid iterations of small amounts of functionality delivered frequently
    • Integrate data sources, including interacting with APIs; create and maintain data schemas; and store data in our database
    • Help with data pipeline related components and architecture
    • Design and oversee the entire data pipeline, from Postgres data layout and SQL queries to large batch-processing jobs, MapReduce frameworks, and data warehouse/cloud storage
    • Translate strategic requirements into effective solutions that meet business needs
    • Review and approve specifications to ensure consistency in approach and use
    • Serve as a technical and business-savvy resource to partners and software engineering personnel on a range of software design issues
    • Perform systems and applications performance characterization and trade-off studies through analysis and simulation
    • Research improvements in coding standards
MINIMUM REQUIREMENTS:
  • 5-7 years' experience in developing software applications, including analysis, design, coding, testing, deployment, and support of applications
  • BS degree in Computer Science, Applied Mathematics, Physics, Statistics or area of study related to data sciences and data mining
  • Proficient in application/software architecture (Definition, Business Process Modeling, etc.)
  • Understand application/software development and design
  • Understanding of Java and its build tools (e.g. Maven, Gradle)
  • Collaborative personality, able to engage in interactive discussions with the rest of the team
  • Background in one or more big data technologies (such as Hadoop, Spark, Storm)
